Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use, including most Chinese, Japanese and Korean characters.
However, by measuring string positions in bytes instead of "characters", most algorithms can be adapted for UTF-8 easily and efficiently.
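As a small illustration of working in byte positions, the following Python sketch searches an encoded string and slices it at the resulting byte offset. The sample text and variable names are chosen here for illustration only.

```python
text = "naïve café"
data = text.encode("utf-8")

# Search the encoded bytes; the result is a byte offset, not a character index.
needle = "café".encode("utf-8")
byte_pos = data.find(needle)

# A well-formed UTF-8 needle can only match at a character boundary,
# so slicing at the byte offset yields a valid UTF-8 string.
tail = data[byte_pos:].decode("utf-8")

# For comparison: the same search measured in code points.
char_pos = text.find("café")

print(byte_pos, char_pos, tail)
```

The two offsets differ because "ï" occupies two bytes, but the search itself never needs to decode anything.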
A UTF-8 decoder should be prepared for: the red invalid bytes in the above table; an unexpected continuation byte; a leading byte not followed by enough continuation bytes (which can happen in simple string truncation); an overlong encoding as described above; a sequence that decodes.
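The checks listed above might be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `decode_utf8` is hypothetical, the set of invalid bytes (0xC0, 0xC1, 0xF5 to 0xFF) is assumed from RFC 3629 rather than taken from the table, and the final code point range check is likewise an assumption based on RFC 3629.

```python
def decode_utf8(data: bytes) -> str:
    """Strict UTF-8 decoder sketch; raises ValueError on malformed input."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # single-byte (ASCII) character
            out.append(chr(b))
            i += 1
            continue
        if b <= 0xBF:                     # 0x80..0xBF outside a sequence
            raise ValueError(f"unexpected continuation byte at offset {i}")
        if 0xC2 <= b <= 0xDF:
            need, cp, min_cp = 1, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            need, cp, min_cp = 2, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF4:
            need, cp, min_cp = 3, b & 0x07, 0x10000
        else:                             # assumed invalid: 0xC0, 0xC1, 0xF5..0xFF
            raise ValueError(f"invalid byte 0x{b:02X} at offset {i}")
        for k in range(1, need + 1):
            # Truncated input or a non-continuation byte both end up here.
            if i + k >= n or not (0x80 <= data[i + k] <= 0xBF):
                raise ValueError(f"leading byte at offset {i} lacks continuation bytes")
            cp = (cp << 6) | (data[i + k] & 0x3F)
        if cp < min_cp:                   # value would fit in a shorter sequence
            raise ValueError(f"overlong encoding at offset {i}")
        # Assumption from RFC 3629: surrogates and values above U+10FFFF are invalid.
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point U+{cp:04X} at offset {i}")
        out.append(chr(cp))
        i += need + 1
    return "".join(out)
```

A production decoder would usually offer a replacement-character mode instead of raising, but raising keeps each error case easy to see.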
"RFC 3629 UTF-8, a transformation format of ISO 10646".
The colors indicate how bits from the code point are distributed among the UTF-8 bytes.
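The bit distribution can also be written out in code. The sketch below is a hand-rolled encoder for a single code point, with the hypothetical name `encode_code_point`; it only illustrates the shifts and masks, and it does not reject surrogate code points.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one code point as UTF-8 by distributing its bits by hand."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x80:
        # 0xxxxxxx: all 7 bits in one byte
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx: 5 high bits, then 6 low bits
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 bits
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 3 + 6 + 6 + 6 bits
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# U+20AC (euro sign) lies in the three-byte range.
print(encode_code_point(0x20AC).hex())  # e282ac
```

Each continuation byte carries exactly six bits of the code point behind the fixed `10` prefix, which is why the shifts step in multiples of six.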
Thus, many text processors, parsers, protocols, file formats, text display programs, etc., which use ASCII characters for formatting and control purposes, will continue to work as intended by treating the UTF-8 byte stream as a sequence of single-byte characters, without decoding the multi-byte sequences.
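A short Python sketch of this property, using a made-up comma-separated record: the line is split on the ASCII comma purely at the byte level, and the fields are decoded only afterwards.

```python
# A record with ASCII delimiters and non-ASCII field contents.
line = "名前,Zoë,München\n".encode("utf-8")

# Byte-level processing: every byte of a multi-byte sequence is >= 0x80,
# so it can never be mistaken for the ASCII comma (0x2C) or newline (0x0A).
fields = line.rstrip(b"\n").split(b",")

# Decoding happens only when the text itself is needed.
decoded = [f.decode("utf-8") for f in fields]
print(decoded)
```

The splitting step would work unchanged in code written before UTF-8 existed, as long as it passes bytes above 0x7F through untouched.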