Strings, Bytes and Runes in Go

Working with text is a fundamental part of programming, so it pays to understand how Go handles strings.

This post covers strings, bytes and runes in Go, explains how they differ and demonstrates the basic operations on them.

Strings

A string is an immutable sequence of bytes. It can hold arbitrary data, but in most cases it represents a sequence of human-readable characters.
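To make the immutability concrete, here is a minimal sketch. The helper name replaceFirstByte is hypothetical and exists only for this illustration. It shows that a string's bytes cannot be assigned to directly, so a change has to be made on a []byte copy.

```go
package main

import "fmt"

// replaceFirstByte is a hypothetical helper for this illustration.
// It returns a copy of s with its first byte replaced by b.
// Strings are immutable, so the change is made on a []byte copy.
func replaceFirstByte(s string, b byte) string {
	if len(s) == 0 {
		return s
	}
	buf := []byte(s) // copies the string's bytes into a new slice
	buf[0] = b
	return string(buf) // copies the bytes back into a new string
}

func main() {
	original := "hello"
	// original[0] = 'j' // would not compile: strings are immutable

	changed := replaceFirstByte(original, 'j')
	fmt.Println(original) // hello
	fmt.Println(changed)  // jello
}
```

The original string is left untouched; only the copy differs.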

In order to define a string literal we use double quotes like this:

"Here is an example of a string"

To learn more about working with strings, please check the strings package documentation.

If you try to put more than one character in single quotes, the code will not compile and you will see an error message:

variable := 'abc'
// Compile error:
// more than one character in rune literal

This is because single quotes are reserved for single characters called runes (described below).

It is also possible to define a string using back quotes like this:

`Hello there`

It’s called a raw string because, unlike a string in double quotes, it is not interpreted: escape sequences such as \n are not replaced with anything. Please take a look at this code snippet:

interpretedString := "Hello\nWorld!"
rawString := `Hello\nWorld!`
fmt.Println(interpretedString)
fmt.Println("---")
fmt.Println(rawString)
// Output: 
// Hello
// World!
// ---
// Hello\nWorld!

As you can see, the \n (new line) in interpretedString is replaced with an actual line break, while in rawString it is treated literally (as it is).

Bytes

As I wrote before, a string is a sequence of bytes (it behaves much like a read-only []byte). This has consequences when we loop over it with a classic for loop and an index, because we then go through its bytes, not its characters.

Let’s take an emoji which is stored in 4 bytes and try to loop through it:

emoji := "🏍"
for i := 0; i < len(emoji); i++ {
	fmt.Printf("%x ", emoji[i])
}
// Output:
// f0 9f 8f 8d

As you can see, it prints four values (each shown as two hexadecimal digits), which represent the four bytes.

It’s worth mentioning that the len() function returns the number of bytes in a string, not the number of characters (so in our case it returns 4).

Runes

Under the hood, a rune is an integer (rune is an alias for int32) which represents a Unicode code point.

A Unicode code point is a unique number assigned to a character. In UTF-8, the encoding Go uses for source code and string literals, a code point takes from 1 to 4 bytes.

As I mentioned before, runes are written in single quotes, so let’s check it out:

emoji := '🏍'
fmt.Printf("type: %T\ncharacter: %c\ncode point: %U\n", emoji, emoji, emoji)
// Output:
// type: int32
// character: 🏍
// code point: U+1F3CD

Please note that the %c format specifier is used to print the emoji. If you print the rune without it, you get its integer value:

fmt.Println(emoji)
// Output:
// 127949
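Another way to get the character back is to convert the rune to a string. In the sketch below, runeToString is a hypothetical wrapper added only so the conversion has a name. The resulting string holds the UTF-8 encoding of the code point.

```go
package main

import "fmt"

// runeToString is a hypothetical wrapper for this illustration.
// Converting a rune to a string yields the UTF-8 encoding of its code point.
func runeToString(r rune) string {
	return string(r)
}

func main() {
	emoji := '🏍'
	s := runeToString(emoji)

	fmt.Println(s)      // 🏍
	fmt.Println(len(s)) // 4
}
```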

Runes and Strings

You may now be wondering how to check the “real” length of the string (by counting characters, not bytes). If so, please see this code snippet:

text := "Here is an example of a string with emoji: 🏍"
numberOfRunes := utf8.RuneCountInString(text)
lengthInBytes := len(text)

fmt.Printf("Length in bytes: %v\nNumber of runes: %v", lengthInBytes, numberOfRunes)
// Output:
// Length in bytes: 47
// Number of runes: 44

We need to use the utf8.RuneCountInString function (from the unicode/utf8 package) to calculate the “real” length of a string (in fact, the number of runes in it). As you can see, there are more bytes than runes, because one character (the emoji) is stored as four bytes.

The next thing you may be wondering is how to iterate through runes (not bytes) in a string. Here is an example:

text := "🏍💨🚓🚨🔊🚧📃📂💸"
for i, c := range text {
	fmt.Printf("char: %c Code Point: %U i: %d\n", c, c, i)
}

// Output:
// char: 🏍 Code Point: U+1F3CD i: 0
// char: 💨 Code Point: U+1F4A8 i: 4
// char: 🚓 Code Point: U+1F693 i: 8
// char: 🚨 Code Point: U+1F6A8 i: 12
// char: 🔊 Code Point: U+1F50A i: 16
// char: 🚧 Code Point: U+1F6A7 i: 20
// char: 📃 Code Point: U+1F4C3 i: 24
// char: 📂 Code Point: U+1F4C2 i: 28
// char: 💸 Code Point: U+1F4B8 i: 32

We need to use a for loop with the range keyword, which iterates over a string’s runes (not bytes).

Please note that the i variable stores the byte index at which the particular rune starts, and the c variable stores the rune itself.
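In the example above every emoji takes four bytes, so the index grows in even steps. With characters of different widths the steps are uneven, as the following sketch shows (runeOffsets is a hypothetical helper written for this illustration).

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper for this illustration.
// It returns the starting byte index of every rune in s.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s { // i is the byte index where each rune starts
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	// "aé€🏍": the characters take 1, 2, 3 and 4 bytes respectively.
	text := "a\u00e9\u20ac\U0001F3CD"

	fmt.Println(runeOffsets(text)) // [0 1 3 6]
	fmt.Println(len(text))         // 10
}
```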

Conclusion

Understanding strings, bytes, and runes in Go is essential for handling text correctly. Please remember that:

strings are immutable byte sequences, often representing text (there are two types of string literals: interpreted and raw)
runes represent Unicode code points, the unique numbers assigned to characters
utf8.RuneCountInString counts the runes in a string
range iterates through a string’s runes (not bytes)

I hope you found this post useful and learned something! Have a great day! 🙂
