U+FEFF Zero Width No-Break Space
U+FEFF was added in Unicode version 1.1 in 1993. It belongs to the block Arabic Presentation Forms-B.
This character is a Format character with the script property Common, that is, it is not tied to any specific script. The character is also known as BOM and ZWNBSP.
The glyph is not a composition. It has no designated width in East Asian texts. In bidirectional text it acts as Boundary Neutral. When changing direction it is not mirrored. U+FEFF prohibits a line break around it.
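Several of the properties described above can be checked with Python's standard `unicodedata` module. This is only a sketch: the values come from whichever Unicode database ships with the interpreter, and the `props` dictionary is an illustrative name rather than part of any API.

```python
import unicodedata

ch = "\ufeff"

# Collect the properties mentioned in the prose above.
props = {
    "name": unicodedata.name(ch),                          # 'ZERO WIDTH NO-BREAK SPACE'
    "category": unicodedata.category(ch),                  # 'Cf' (Format)
    "bidirectional": unicodedata.bidirectional(ch),        # 'BN' (Boundary Neutral)
    "mirrored": unicodedata.mirrored(ch),                  # 0 (not mirrored)
    "east_asian_width": unicodedata.east_asian_width(ch),  # 'N' (neutral)
    "decomposition": unicodedata.decomposition(ch),        # '' (no decomposition)
    "combining": unicodedata.combining(ch),                # 0 (not reordered)
}

for key, value in props.items():
    print(f"{key}: {value!r}")
```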
Wikipedia has the following information about this code point:
The byte-order mark (BOM) is a particular usage of the special Unicode character code, U+FEFF ZERO WIDTH NO-BREAK SPACE, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:
- the byte order, or endianness, of the text stream in the cases of 16-bit and 32-bit encodings;
- the fact that the text stream's encoding is Unicode, to a high level of confidence;
- which Unicode character encoding is used.
BOM use is optional. Its presence interferes with the use of UTF-8 by software that does not expect non-ASCII bytes at the start of a file but that could otherwise handle the text stream.
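As a small illustration of the interference described above, the following sketch shows how a UTF-8 BOM survives a plain `utf-8` decode as a leading U+FEFF, while Python's `utf-8-sig` codec removes it. The sample bytes are made up for the example.

```python
# Hypothetical file content: a UTF-8 BOM followed by ASCII text.
data = b"\xef\xbb\xbfhello"

# The plain codec keeps the BOM as a leading U+FEFF character.
plain = data.decode("utf-8")
print(repr(plain))      # '\ufeffhello'

# The "utf-8-sig" codec strips one leading BOM if present.
stripped = data.decode("utf-8-sig")
print(repr(stripped))   # 'hello'

# Input without a BOM decodes unchanged with "utf-8-sig".
no_bom = b"hello".decode("utf-8-sig")

# Encoding with "utf-8-sig" writes the BOM back out.
round_trip = "hello".encode("utf-8-sig")
```

Software that compares the first characters of a file against an expected ASCII prefix (a shebang line, for example) would see the extra character in `plain` and may fail on it.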
Unicode can be encoded in units of 8-bit, 16-bit, or 32-bit integers. For the 16- and 32-bit representations, a computer receiving text from arbitrary sources needs to know which byte order the integers are encoded in. The BOM is encoded in the same scheme as the rest of the document and becomes a noncharacter Unicode code point if its bytes are swapped. Hence, the process accessing the text can examine these first few bytes to determine the endianness, without requiring some contract or metadata outside of the text stream itself. Generally the receiving computer will swap the bytes to its own endianness, if necessary, and would no longer need the BOM for processing.
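The byte-order check described above can be sketched in Python as follows. The function name `utf16_byte_order` is hypothetical, and the sketch assumes the input is already known to be UTF-16 (a UTF-32 little-endian stream begins with the same two bytes).

```python
import codecs
from typing import Optional

# U+FEFF serialized in both byte orders.
be = "\ufeff".encode("utf-16-be")   # b'\xfe\xff'
le = "\ufeff".encode("utf-16-le")   # b'\xff\xfe'

# Reading the little-endian bytes as big-endian yields U+FFFE,
# which is a noncharacter, so the mismatch is detectable.
swapped = le.decode("utf-16-be")    # '\ufffe'


def utf16_byte_order(data: bytes) -> Optional[str]:
    """Return 'big' or 'little' based on a leading BOM, else None."""
    if data[:2] == codecs.BOM_UTF16_BE:
        return "big"
    if data[:2] == codecs.BOM_UTF16_LE:
        return "little"
    return None


sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
print(utf16_byte_order(sample))     # little

# The generic "utf-16" codec reads the BOM, picks the byte order,
# and drops the BOM from the decoded text.
print(sample.decode("utf-16"))      # hi
```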
The byte sequence of the BOM differs per Unicode encoding (including ones outside the Unicode standard such as UTF-7, see table below), and none of the sequences is likely to appear at the start of text streams stored in other encodings. Therefore, placing an encoded BOM at the start of a text stream can indicate that the text is Unicode and identify the encoding scheme used. This use of the BOM is called a "Unicode signature".
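A minimal sketch of using the BOM as a Unicode signature is shown below. The helper `sniff_bom` and the `_SIGNATURES` table are illustrative names, not a standard API. The sketch covers only UTF-8, UTF-16, and UTF-32 and leaves out UTF-7, and a missing BOM says nothing about the actual encoding.

```python
import codecs
from typing import Optional, Tuple

# UTF-32 signatures are listed first because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding name, BOM length), or (None, 0) if no BOM matches."""
    for bom, encoding in _SIGNATURES:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


raw = b"\xef\xbb\xbfhello"
encoding, skip = sniff_bom(raw)
if encoding is not None:
    # Decode the payload after the signature bytes.
    print(encoding, raw[skip:].decode(encoding))   # utf-8 hello
```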
Representations
System | Representation |
---|---|
Nº | 65279 |
UTF-8 | EF BB BF |
UTF-16 | FE FF |
UTF-32 | 00 00 FE FF |
URL-Quoted | %EF%BB%BF |
HTML hex reference | `&#xFEFF;` |
Wrong windows-1252 Mojibake | ï»¿ |
alternate | BYTE ORDER MARK |
abbreviation | BOM |
abbreviation | ZWNBSP |
alias | BOM |
alias | ZWNBSP |
Adobe Glyph List | zerowidthjoiner |
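Most of the representations in the table can be derived from the code point itself. The following sketch does so with the Python standard library. The variable names are arbitrary, `bytes.hex` with a separator assumes Python 3.8 or later, and the UTF-16 and UTF-32 values are the big-endian forms.

```python
from urllib.parse import quote

ch = "\ufeff"
utf8 = ch.encode("utf-8")

decimal = ord(ch)                                     # 65279
utf8_hex = utf8.hex(" ").upper()                      # 'EF BB BF'
utf16_hex = ch.encode("utf-16-be").hex(" ").upper()   # 'FE FF'
utf32_hex = ch.encode("utf-32-be").hex(" ").upper()   # '00 00 FE FF'
url_quoted = quote(ch)                                # '%EF%BB%BF'
html_hex = f"&#x{ord(ch):X};"                         # '&#xFEFF;'

# Decoding the UTF-8 bytes with the wrong codec produces the mojibake.
mojibake = utf8.decode("cp1252")                      # 'ï»¿'

print(decimal, utf8_hex, utf16_hex, utf32_hex, url_quoted, html_hex, mojibake)
```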
Complete Record
Property | Value |
---|---|
Age | 1.1 (1993) |
Unicode Name | ZERO WIDTH NO-BREAK SPACE |
Unicode 1 Name | BYTE ORDER MARK |
Block | Arabic Presentation Forms-B |
General Category | Format |
Script | Common |
Bidirectional Category | Boundary Neutral |
Combining Class | Not Reordered |
Decomposition Type | none |
Bidi Mirrored | ✘ |
Grapheme Cluster Break | Control |
NFC Quick Check | Yes |
NFD Quick Check | Yes |
NFKC Quick Check | Yes |
NFKD Quick Check | Yes |
Word Break | Format |
Sentence Break | Format |
East Asian Width | neutral |
Hangul Syllable Type | Not Applicable |
Joining Group | No_Joining_Group |
Joining Type | Transparent |
Line Break | Word Joiner |
Numeric Type | none |
Numeric Value | not a number |
Vertical Orientation | R |