Go's string as slice of bytes • Alvin Lucillo

💻 Tech

Go source files are UTF-8, so string literals hold UTF-8 encoded text, in which each Unicode code point takes 1 to 4 bytes. This matters because manipulating a string byte by byte can produce unexpected results. For example, a character encoded in 3 bytes occupies 3 indices in the string value. Reading a single byte therefore does not necessarily give you a character, since a character may span up to 4 bytes. In this case, the 3 bytes must be decoded together as one character. It's important to note that a string is in effect a read-only slice of bytes under the hood.
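To make this concrete, here is a minimal sketch that compares byte-level and character-level views of the same string. The sample string `"a世"` and the helper `describe` are my own picks for illustration; they are assumptions, not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: it reports the byte length and the
// rune (code point) count of s.
func describe(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "a世" // 'a' is 1 byte in UTF-8, '世' is 3 bytes

	byteLen, runeCount := describe(s)
	fmt.Println(byteLen, runeCount) // 4 2

	// Indexing yields a single byte, not a character.
	fmt.Printf("%x\n", s[1]) // e4, the first byte of '世'

	// range decodes UTF-8 and yields each rune with its starting byte index.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r) // 0:a 1:世
	}
	fmt.Println()

	// Converting to []byte copies the underlying bytes.
	fmt.Println([]byte(s)) // [97 228 184 150]
}
```

If you need to work with characters rather than bytes, iterating with `range` or converting to `[]rune` is usually the safer choice than indexing directly.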