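The standard was proposed in 1991 by the non-profit Unicode Consortium (Unicode Inc.).[4][5] It makes it possible to encode a very large number of characters from different writing systems. A document encoded in Unicode can mix Chinese characters, mathematical symbols, Greek, Latin and Cyrillic letters, and musical notation symbols, with no need to switch code pages.[6]
The standard consists of two main parts: the universal character set (UCS, universal character set) and the family of encodings (UTF, Unicode transformation format). The universal character set lists the characters allowed by the standard. It assigns each of them a code, which is a non-negative integer usually written in hexadecimal. The encoding family defines how UCS character codes are converted for transmission in a stream or storage in a file.
Codes in the Unicode standard are divided into several areas. The area from U+0000 to U+007F contains the ASCII characters, and their codes coincide with their ASCII codes. Further areas hold the characters of other writing systems, punctuation marks and technical symbols. Some codes are reserved for future use.[7] Cyrillic characters are assigned the ranges U+0400 to U+052F, U+2DE0 to U+2DFF and U+A640 to U+A69F (see Cyrillic in Unicode).[8]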
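A small Python sketch can check the two facts stated above. First, code points U+0000 to U+007F equal their ASCII byte values. Second, Cyrillic letters fall into the listed ranges. The range list is copied from the text, and the helper names are hypothetical. Newer Unicode versions have added further Cyrillic blocks, which this sketch deliberately ignores.

```python
# Cyrillic ranges exactly as listed in the text (later Unicode versions add more blocks).
CYRILLIC_RANGES = [(0x0400, 0x052F), (0x2DE0, 0x2DFF), (0xA640, 0xA69F)]


def in_ascii_area(ch: str) -> bool:
    """True for U+0000..U+007F, where the Unicode code equals the ASCII code."""
    return ord(ch) <= 0x7F


def in_listed_cyrillic_ranges(ch: str) -> bool:
    """True if the character's code point lies in one of CYRILLIC_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CYRILLIC_RANGES)


if __name__ == "__main__":
    for ch in ["A", "\u044f", "\u0491", "\u20ac"]:  # A, я, ґ, €
        print(f"U+{ord(ch):04X}", in_ascii_area(ch), in_listed_cyrillic_ranges(ch))
```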
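History
By the late 1980s, 8-bit characters had become the standard. However, many different 8-bit encodings existed, and new ones kept appearing. Two forces drove this. The range of supported languages kept growing, and developers wanted encodings that were partly compatible with some other encoding. A typical example is the alternative encoding for Russian, which arose because Western programs designed for the CP437 encoding were being used. As a result, several problems arose.
A single «wide» universal encoding, «Unicode», was proposed as the solution. Variable-width encodings, widely used in East Asia, were considered too complicated to use, so fixed-width characters were chosen. 32-bit characters seemed too wasteful, so 16-bit characters were adopted.
Thus, the first version of Unicode was an encoding with a fixed character size of 16 bits, giving 2¹⁶ (65,536) codes in total. This is the origin of the practice of writing characters as four hexadecimal digits (for example, U+04F0). At first, Unicode was meant to encode only the characters needed in everyday use, not all existing characters. Rarely used characters were to be placed in the «private use area», which originally occupied the codes U+D800…U+F8FF. Later, however, it was decided to encode all characters, which required a substantial expansion of the code space.
Some computer systems (for example, Windows NT[10]) already used fixed 16-bit characters as their default encoding. It was therefore decided to encode all the most important characters within the first 65,536 positions only (the so-called basic multilingual plane, BMP). The rest of the space is used for «supplementary characters» (supplementary characters): scripts of extinct languages, very rarely used Chinese characters, and mathematical and musical symbols.
To keep compatibility with older 16-bit systems, UTF-16 was devised. The first 65,536 positions are stored directly as 16-bit numbers, except those in the range U+D800…U+DFFF. All other characters are represented as «surrogate pairs»: the first element of the pair comes from U+D800…U+DBFF, the second from U+DC00…U+DFFF. The surrogate ranges take up 2048 positions of the code space that had previously been allocated to the «private use area».
UTF-16 can represent only 2²⁰ + 2¹⁶ − 2048 = 1,112,064 characters, so this number was chosen as the final size of the Unicode code space.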
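The surrogate-pair scheme and the capacity figure can be checked with a short Python sketch. The function names are illustrative only. The results can be compared against Python's built-in UTF-16 codec.

```python
def to_utf16_units(cp: int) -> list:
    """Split a code point into UTF-16 code units (one unit, or a surrogate pair)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"cannot be encoded in UTF-16: {cp:#x}")
    if cp < 0x10000:
        return [cp]                   # BMP: stored directly as one 16-bit unit
    v = cp - 0x10000                  # the remaining 20 bits
    high = 0xD800 + (v >> 10)         # first element: U+D800..U+DBFF
    low = 0xDC00 + (v & 0x3FF)        # second element: U+DC00..U+DFFF
    return [high, low]


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# 2**20 code points reachable through pairs, plus the BMP minus the 2048 surrogates
CAPACITY = 2**20 + 2**16 - 2048

if __name__ == "__main__":
    units = to_utf16_units(0x1F600)
    print([hex(u) for u in units])               # ['0xd83d', '0xde00']
    print(hex(from_surrogates(*units)))          # 0x1f600
    print(CAPACITY)                              # 1112064
```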
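The Unicode code space was already extended beyond 2¹⁶ in version 2.0. Even so, the first characters in the «upper» area were only placed in version 3.1.
The role of this encoding on the web keeps growing: at the beginning of 2010, about 50% of websites used Unicode.[11]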
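Unicode versions
The Unicode character table kept changing and growing, and new versions of the system kept being released. This work continues, since Unicode originally covered only Plane 0, the two-byte codes. New ISO documents were released alongside these versions. Unicode exists in the following versions:
- 1.1 (corresponds to ISO/IEC 10646-1:1993), the standard of 1991–1995.
- 2.0, 2.1 (the same ISO/IEC 10646-1:1993 plus «Amendments» 1 to 7 and «Technical Corrigenda» 1 and 2), 1996.
- 3.0 (ISO/IEC 10646-1:2000), 2000.
- 3.2, 2002.
- 4.0, 2003.
- 4.0.1, 2004.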
- 4.1, 2005.
- 5.0, 2006.
- 5.1, 2008.
- 5.2, 2009.
- 6.0, 2010.
- 6.1, 2012.
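Code space
The UTF-8 and UTF-32 representation forms can encode up to 2³¹ (2,147,483,648) code points. Even so, only 1,112,064 are used, for compatibility with UTF-16. That is still more than enough: version 6.0 uses a little under 110,000 code points (109,242 graphic characters and 273 others).
The code space is divided into 17 planes of 2¹⁶ (65,536) characters each. Plane 0 is called the basic plane and contains the characters of the most widely used writing systems. Plane 1 is used mainly for historic scripts, plane 2 for rarely used CJK ideographs, and plane 3 is reserved for archaic Chinese characters[12]. Planes 15 and 16 are set aside for private use.[7]
Unicode characters are written in one of three forms, where xxx are hexadecimal digits:
- «U+xxxx» for codes 0…FFFF;
- «U+xxxxx» for codes 10000…FFFFF;
- «U+xxxxxx» for codes 100000…10FFFF.
For example, the character «я» (U+044F) has the code 044F₁₆ = 1103₁₀.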
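The U+ notation and the division into 17 planes of 2¹⁶ code points reduce to simple arithmetic. Here is a minimal Python sketch; the function names are illustrative.

```python
def u_notation(cp: int) -> str:
    """Write a code point as U+ followed by at least four uppercase hex digits."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return f"U+{cp:04X}"


def plane(cp: int) -> int:
    """Each plane holds 2**16 code points, so the plane number is cp >> 16."""
    return cp >> 16


if __name__ == "__main__":
    for cp in (ord("\u044f"), 0x1F600, 0x10FFFF):
        print(u_notation(cp), "decimal", cp, "plane", plane(cp))
    print(17 * 2**16)  # size of the whole code space: 1114112
```

17 planes of 65,536 give 1,114,112 code points. Subtracting the 2048 surrogate positions gives the 1,112,064 figure from the history section.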
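Encoding system
Graphic characters are characters that have a visible image. They are contrasted with control characters and formatting characters.
Graphic characters include letters, digits, punctuation marks, special signs and separators.
Unicode is a system for the linear representation of text. A character with additional marks above or below the line can be represented in two ways. It can be a sequence of codes built according to certain rules (composite character), or a single character (precomposed character).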
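Combining characters
Graphic characters in Unicode are divided into spacing and non-spacing (zero-width) characters. Non-spacing characters take up no room in the line when displayed; they include, in particular, accent marks and other diacritics. Both kinds have their own codes. Spacing characters are also called base characters, and non-spacing ones are called combining characters. Combining characters are not meant to be used on their own. For example, the character «á» can be represented in two ways:
- as the sequence of the base character «a» (U+0061) and the combining character « ́» (U+0301);
- as the precomposed character «á» (U+00E1).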
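Python's standard `unicodedata` module can show the two representations of «á» described above. The sketch only inspects character data; it is not part of the standard itself.

```python
import unicodedata

precomposed = "\u00e1"   # á as a single precomposed character
composite = "a\u0301"    # base character 'a' + COMBINING ACUTE ACCENT

print(len(precomposed), len(composite))        # 1 2
print(precomposed == composite)                # False: different code sequences
print(unicodedata.combining("\u0301"))         # 230: a nonzero class marks a combining character
print(unicodedata.decomposition(precomposed))  # 0061 0301
print(unicodedata.normalize("NFC", composite) == precomposed)  # True
```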
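A special kind of combining character is the variation selector (variation selectors). Variation selectors act only on characters for which such variants are defined. In version 5.0, glyph variants are defined for three groups: a number of mathematical symbols, characters of the traditional Mongolian script, and characters of the Mongolian square script.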
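Normalization forms
The same characters can be represented by different codes, which sometimes complicates processing. Normalization procedures exist to bring text to a standard form.
The Unicode standard defines four normalization forms:
- Normalization Form D (NFD): canonical decomposition. All composite characters are recursively decomposed into their components according to the decomposition tables.
- Normalization Form C (NFC): canonical decomposition followed by canonical composition. The text is first brought to form D. Canonical composition then recombines base characters and combining marks into precomposed characters where the standard permits.
- Normalization Form KD (NFKD): compatibility decomposition. Besides canonical decomposition, compatibility characters (such as the Roman numeral «Ⅳ») are replaced by their ordinary equivalents, so some formatting distinctions are lost.
- Normalization Form KC (NFKC): compatibility decomposition followed by canonical composition.
The terms «composition» and «decomposition» mean, respectively, joining characters together and splitting them into their component parts.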
Examples
| Original | NFD | NFC | NFKD | NFKC |
|---|---|---|---|---|
| Français | Franc\u0327ais | Fran\xe7ais | Franc\u0327ais | Fran\xe7ais |
| А, Ё, Й | \u0410, \u0415\u0308, \u0418\u0306 | \u0410, \u0401, \u0419 | \u0410, \u0415\u0308, \u0418\u0306 | \u0410, \u0401, \u0419 |
| が | \u304b\u3099 | \u304c | \u304b\u3099 | \u304c |
| Henry IV | Henry IV | Henry IV | Henry IV | Henry IV |
| Henry Ⅳ | Henry \u2163 | Henry \u2163 | Henry IV | Henry IV |
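The table can be regenerated with `unicodedata.normalize`, which implements the four forms in Python's standard library. This is a minimal sketch; `normalize_all` is an illustrative helper. `ascii()` is used only to print combining marks as escapes like those in the table.

```python
import unicodedata

FORMS = ("NFD", "NFC", "NFKD", "NFKC")


def normalize_all(text: str) -> dict:
    """Return the text in each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


if __name__ == "__main__":
    samples = ["Fran\u00e7ais", "\u0410, \u0401, \u0419", "\u304c",
               "Henry IV", "Henry \u2163"]
    for s in samples:
        row = normalize_all(s)
        print(ascii(s), *(ascii(row[f]) for f in FORMS))
```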
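Bidirectional writing
The Unicode standard supports scripts written left to right (left-to-right, LTR) as well as scripts written right to left (right-to-left, RTL), such as Arabic and Hebrew. In both cases characters are stored in their «natural» order, and displaying them in the proper direction is the job of the application.
Unicode also supports mixed texts that combine fragments with different writing directions. This feature is called bidirectionality (bidirectional text, BiDi). Some simplified text processors (for example, in mobile phones) may support Unicode but not bidirectionality. All Unicode characters fall into several categories:
- characters written left to right;
- characters written right to left;
- characters that can be written in either direction.
Characters of the last category (mainly punctuation marks) take on the direction of the surrounding text when displayed.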
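Python's `unicodedata.bidirectional` returns the bidirectional class recorded in the Unicode Character Database. The database uses finer classes than the three groups above. There are strong classes L, R and AL, weak classes such as EN, and neutrals such as ON and WS. This sketch is therefore only an illustration of the idea.

```python
import unicodedata

SAMPLES = {
    "A": "Latin letter",
    "\u05d0": "Hebrew alef",
    "\u0627": "Arabic alef",
    "1": "digit",
    "!": "punctuation",
    " ": "space",
}

for ch, label in SAMPLES.items():
    # L = strong left-to-right; R, AL = strong right-to-left;
    # EN = European number (weak); ON, WS = neutral, resolved from the surrounding text
    print(f"{label:12} U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")
```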
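Supported scripts
- Arabic,
- Armenian,
- Bengali,
- Burmese,
- Glagolitic,
- Greek,
- Georgian,
- Devanagari,
- Hebrew,
- Cyrillic,
- Chinese (Chinese characters are also widely used in Japanese and, quite rarely, in Korean),
- Coptic,
- Khmer,
- Latin,
- Tamil,
- Korean (Hangul),
- Cherokee,
- Ethiopic,
- Japanese (besides Chinese characters, also the kana syllabaries),
and others.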
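Some characters are not included in Unicode, for example logos found in certain vendor encodings: the Apple logo in Apple MacRoman (0xF0) and the Windows logo in Windows Wingdings (0xFF). Such characters can be placed only in the private use areas.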
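ISO/IEC 10646
The Unicode Consortium works closely with the working group ISO/IEC/JTC1/SC2/WG2, which develops International Standard 10646 (ISO/IEC 10646). Unicode and ISO/IEC 10646 are kept synchronized, although each standard uses its own terminology and documentation system.
Cooperation with the International Organization for Standardization (ISO) began in 1991. In 1993 ISO issued DIS 10646.1. To synchronize with it, the Consortium approved Unicode version 1.1, which added the characters from DIS 10646.1. As a result, the values of the encoded characters in Unicode 1.1 and DIS 10646.1 became identical.
The cooperation continued afterwards. In 2000, Unicode 3.0 was synchronized with ISO/IEC 10646-1:2000. The third version of ISO/IEC 10646, then in preparation, was to be synchronized with Unicode 4.0. These specifications may eventually be published as a single standard.
Like the UTF-16 and UTF-32 formats of Unicode, ISO/IEC 10646 has two main character encoding forms:
- UCS-2: 2 bytes per character, similar to UTF-16;
- UCS-4: 4 bytes per character, similar to UTF-32.
UCS stands for universal multiple-octet coded character set. UCS-2 can be regarded as a subset of UTF-16 (UTF-16 without surrogate pairs), and UCS-4 is a synonym for UTF-32.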
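Representation forms
Unicode has several representation forms (Unicode transformation format, UTF): UTF-8, UTF-16 (UTF-16BE, UTF-16LE) and UTF-32 (UTF-32BE, UTF-32LE). A UTF-7 form was also developed for transmission over seven-bit channels. Because it is incompatible with ASCII, it did not become widespread and was not included in the standard. On 1 April 2005, two joke representation forms were proposed: UTF-9 and UTF-18 (RFC 4042).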
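To compare the main representation forms, a short Python sketch can count how many bytes each character takes. It uses the `-le` codec names so that Python does not prepend a byte order mark. The helper name is illustrative.

```python
def encoded_sizes(ch: str) -> tuple:
    """Bytes needed for one character in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))


if __name__ == "__main__":
    for ch in ["A", "\u044f", "\u20ac", "\U0001F600"]:   # A, я, €, 😀
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-32 is fixed-width, while UTF-8 and UTF-16 are variable-width. UTF-16 needs a surrogate pair only for characters outside the basic multilingual plane.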
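Microsoft Windows NT and the systems based on it, Windows 2000 and Windows XP, mainly use UTF-16LE. UNIX-like operating systems such as GNU/Linux, BSD and Mac OS X use UTF-8 for files, and UTF-32 or UTF-8 for processing characters in memory.
Punycode is another way of encoding sequences of Unicode characters. It converts them into so-called ACE sequences, which consist only of the letters, digits and hyphens allowed in domain names.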
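Python ships a `punycode` codec and an `idna` codec; the latter implements the older IDNA 2003 rules. Together they give a quick illustration of how a non-ASCII label becomes an ASCII-only sequence.

```python
label = "b\u00fccher"                          # "bücher"
print(label.encode("punycode"))                # b'bcher-kva'
print(b"bcher-kva".decode("punycode"))         # bücher
print("b\u00fccher.example".encode("idna"))    # b'xn--bcher-kva.example'
```

In domain names, the `xn--` prefix marks a label as Punycode-encoded.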
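UTF-8
UTF-8 is the Unicode representation most compatible with older systems that used 8-bit characters. Text consisting only of characters numbered below 128 is plain ASCII text when written in UTF-8. Conversely, in UTF-8 text any byte with a value below 128 represents the ASCII character with the same code. The remaining Unicode characters are represented by sequences 2 to 6 bytes long. In practice the limit is 4 bytes, since codes above 10FFFF are not planned. In these sequences the first byte always has the form 11xxxxxx and the rest 10xxxxxx.
The UTF-8 format was invented on 2 September 1992 by Ken Thompson and Rob Pike and implemented in Plan 9[13]. The UTF-8 standard is now officially defined in RFC 3629 and in ISO/IEC 10646 Annex D.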
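Unicode characters are encoded in UTF-8 as follows:
| Code range | UTF-8 byte sequence |
|---|---|
| 0x00000000 – 0x0000007F | 0xxxxxxx |
| 0x00000080 – 0x000007FF | 110xxxxx 10xxxxxx |
| 0x00000800 – 0x0000FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
| 0x00010000 – 0x001FFFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
The following forms are also theoretically possible but not included in the standard:
| Code range | UTF-8 byte sequence |
|---|---|
| 0x00200000 – 0x03FFFFFF | 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
| 0x04000000 – 0x7FFFFFFF | 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |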
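The bit patterns in the table translate directly into code. This is a minimal Python sketch of the encoder, limited to the four-byte forms that the standard actually uses (RFC 3629). It is checked against Python's own UTF-8 codec, and `utf8_encode` is an illustrative name.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point with the 1- to 4-byte UTF-8 patterns."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:                     # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                    # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for cp in (0x41, 0x044F, 0x20AC, 0x1F600):
        encoded = utf8_encode(cp)
        assert encoded == chr(cp).encode("utf-8")
        print(f"U+{cp:04X}", encoded.hex(" "))
```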
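UTF-8 allows the same character to be written in several ways, but only the shortest form is correct. The other (overlong) forms must be rejected for security reasons.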
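Python's UTF-8 decoder follows this rule and refuses overlong forms. In the sketch below, `b"\xc0\xaf"` is the classic overlong two-byte spelling of «/» (U+002F). `is_valid_utf8` is an illustrative helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")          # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8(b"/"))             # True: the shortest form of U+002F
    print(is_valid_utf8(b"\xc0\xaf"))      # False: overlong 2-byte form of U+002F
    print(is_valid_utf8(b"\xe0\x80\xaf"))  # False: overlong 3-byte form of U+002F
```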
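Byte order
In a UTF-16 data stream, the high byte can be written either before the low byte (UTF-16 big-endian) or after it (UTF-16 little-endian). Likewise, there are two variants of the four-byte encoding: UTF-32BE and UTF-32LE.
To identify the representation form, a text file may begin with the character U+FEFF (zero-width no-break space), called the byte order mark (BOM). It distinguishes UTF-16LE from UTF-16BE, because U+FFFE is a noncharacter and does not occur in text. The BOM is sometimes also used to mark UTF-8, although byte order does not apply to that format. Files that begin with one of the following byte sequences are assumed to be in Unicode:
- UTF-8: EF BB BF
- UTF-16BE: FE FF
- UTF-16LE: FF FE
- UTF-32BE: 00 00 FE FF
- UTF-32LE: FF FE 00 00
Unfortunately, this method does not reliably distinguish UTF-16LE from UTF-32LE. Unicode allows the character U+0000, so a UTF-16LE file that begins with it also starts with FF FE 00 00 (although real texts rarely begin with U+0000).
UTF-16 and UTF-32 files without a BOM should use big-endian byte order (unicode.org).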
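The BOM check described above can be sketched in Python with the byte order marks predefined in the `codecs` module. The sketch tests the four-byte UTF-32LE mark before the two-byte UTF-16LE mark, because FF FE 00 00 also starts with FF FE. As the text notes, a UTF-16LE file whose first character is U+0000 is still misclassified. `sniff_bom` is an illustrative name.

```python
import codecs

# Order matters: the UTF-32LE mark begins with the UTF-16LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),   # FF FE 00 00
    (codecs.BOM_UTF32_BE, "utf-32-be"),   # 00 00 FE FF
    (codecs.BOM_UTF8, "utf-8"),           # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),   # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),   # FE FF
]


def sniff_bom(data: bytes):
    """Return (codec name, BOM length) for a leading byte order mark, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


if __name__ == "__main__":
    raw = codecs.BOM_UTF16_LE + "\u044f".encode("utf-16-le")
    found = sniff_bom(raw)
    if found:
        name, skip = found
        print(name, raw[skip:].decode(name))   # utf-16-le я
```

Data without a BOM returns None here. Under the rule above, UTF-16 or UTF-32 data without a BOM would then be read as big-endian. The sketch does not try to guess whether BOM-less data is UTF-16 at all.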
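Unicode and traditional encodings
The introduction of Unicode changed the approach to traditional 8-bit encodings. An encoding used to be defined by a font; it is now defined by a table mapping that encoding to Unicode. In effect, each 8-bit encoding has become a way of representing a subset of Unicode. This made it much easier to write programs that must work with many different encodings. To support one more encoding, it is now enough to add one more conversion table to Unicode.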
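With Unicode as the pivot, converting between two 8-bit Cyrillic encodings is just a decode through one table and an encode through another. This Python sketch uses the standard `cp1251`, `koi8_r` and `cp866` codecs; `transcode` is an illustrative helper.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes between two encodings by way of Unicode."""
    text = data.decode(src)   # source table: bytes -> Unicode code points
    return text.encode(dst)   # target table: Unicode code points -> bytes


if __name__ == "__main__":
    ya_cp1251 = b"\xff"                               # 'я' in Windows-1251
    print(hex(ord(ya_cp1251.decode("cp1251"))))       # 0x44f, i.e. U+044F
    print(transcode(ya_cp1251, "cp1251", "koi8_r"))   # b'\xd1'
    print(transcode(ya_cp1251, "cp1251", "cp866"))    # b'\xef'
```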
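Implementations
Most modern operating systems support Unicode to some extent.
Operating systems of the Windows NT family use the two-byte UTF-16LE encoding internally for file names and other system strings. System calls that take string parameters exist in single-byte and two-byte variants. For details, see the article Unicode in Microsoft operating systems.
UNIX-like operating systems, including GNU/Linux, BSD and Mac OS X, use UTF-8 to represent Unicode. Most programs can handle UTF-8 like a traditional single-byte encoding, even though a character may be represented by several consecutive bytes. To work with individual characters, strings are usually converted to UCS-4, so that each character corresponds to a machine word.
One of the first successful commercial implementations of Unicode was the Java programming environment. Java deliberately abandoned 8-bit character representation in favour of 16-bit. Today most programming languages support Unicode strings, although their internal representation may differ between implementations.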
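Input methods
Microsoft Windows
Starting with Windows 2000, the Character Map utility (charmap.exe) shows all the characters available in the operating system and lets you copy them to the clipboard. A similar table is available, for example, in Microsoft Word.
In some programs, such as WordPad and Microsoft Word, you can type a hexadecimal code and press Alt+X to replace the code with the corresponding character. In these editors, Alt+X also performs the reverse conversion.
In many MS Windows programs, you can also enter a character by holding Alt and typing its decimal code, with a leading zero, on the numeric keypad. For codes 0128–0255, the number is interpreted in the current Windows (ANSI) code page rather than as a Unicode code point. For example, Alt+0171 («) and Alt+0187 (») are handy when typing Cyrillic texts, and Alt+0133 (…) is also useful.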
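A small Python model shows why Alt+0133 produces «…» rather than the control character U+0085. The number is looked up in a Windows code page, assumed here to be Windows-1251 as on Russian-language systems. `alt_code_char` is hypothetical: it only models the code-page lookup and does not simulate keyboard input.

```python
def alt_code_char(number: int, codepage: str = "cp1251") -> str:
    """Model Alt+0NNN for NNN in 128..255: decode one byte in a Windows code page."""
    return bytes([number]).decode(codepage)


if __name__ == "__main__":
    for n in (171, 187, 133):
        ch = alt_code_char(n)
        print(f"Alt+0{n} -> U+{ord(ch):04X}")
    print(f"chr(133) is U+{133:04X}, a control character")
```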
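Macintosh
Mac OS 8.5 and later support an input method called «Unicode Hex Input». While holding the Option key, you type the four-digit hexadecimal code of the desired character. This method also allows entering characters with codes above U+FFFF by means of surrogate pairs, which the operating system automatically replaces with single characters. Before use, this input method must be enabled in the corresponding section of the system settings. It must then be selected as the current input method in the keyboard menu.
Starting with Mac OS X 10.2, there is also the «Character Palette» application. It lets you pick characters from a table showing either the characters of a particular block or those supported by a particular font.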
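Linux
GNOME also has a «Character Map» utility, which displays the characters of a particular block or writing system and can search by character name or description. If the code of the desired character is known, it can be entered according to ISO 14755: hold Ctrl and Shift and type the hexadecimal code. Starting with a certain version of GTK+, the code must be preceded by pressing «U». The hexadecimal code can be up to 32 bits long, so any Unicode character can be entered without surrogate pairs.
All X Window applications, including GNOME and KDE, support input with the Compose key. On keyboards without a dedicated Compose key, any key can be assigned to this role, for example Caps Lock.
The GNU/Linux console also lets you enter a Unicode character by its code: type the decimal code on the numeric keypad while holding Alt. Characters can also be entered by hexadecimal code: hold AltGr and, for the digits A–F, use the numeric keypad keys from NumLock to Enter (clockwise). Input according to ISO 14755 is supported as well. For these methods to work, enable Unicode mode in the console with unicode_start(1) and select a suitable font with setfont(8).
Mozilla Firefox for Linux supports character input according to ISO 14755.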
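Problems of Unicode
In Unicode, the English «a» and the Polish «a» are the same character. Likewise, the Russian «а» and the Serbian «а» are treated as one character, distinct from the Latin «a». This encoding principle is not universal, and apparently no solution that fits every case can exist.
- Chinese, Korean and Japanese texts are traditionally written from top to bottom, starting at the upper right corner. Unicode does not provide for switching between horizontal and vertical writing in these languages. This has to be done with markup languages or the internal mechanisms of word processors.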
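- The same character may be drawn differently in different languages. For example, Chinese characters can have different shapes in Chinese, Japanese (kanji) and Korean (hanja), yet Unicode represents them with the same character (the so-called CJK unification). Simplified and traditional characters, however, still have different codes. Correct display therefore depends on knowing the language of the text.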
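- Letter case depends on the language. For example, Turkish has two pairs of letters, İi and Iı (with and without a dot). By default, however, Unicode pairs the lowercase «i» with the uppercase «I». Case conversion that ignores the language therefore gives wrong results for Turkish text.[14]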
- : «» «», [15] . .
, .
- , , , ( UTF-8 , ASCII, , ASCII[16]). , , ́ [17]. ; , , , [18].
- , . , (BOM) . ( ).
- .
- UNIX ( - ) , - . UNIX- Unicode .
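Python's `str.upper()` and `str.lower()` apply Unicode's default, language-independent case mappings, so the Turkish problem described above is easy to reproduce. The `turkish_upper` helper is a hypothetical minimal workaround that covers uppercasing only. Real applications would rely on locale-aware libraries instead.

```python
def turkish_upper(text: str) -> str:
    """Uppercase with Turkish rules: i -> İ (U+0130), ı (U+0131) -> I."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


if __name__ == "__main__":
    print("i".upper())                 # I: the default mapping, wrong for Turkish
    print("\u0131".upper().lower())    # i: the dotless ı does not survive a round trip
    print(ascii("\u0130".lower()))     # 'i\u0307': İ lowercases to i + COMBINING DOT ABOVE
    print(turkish_upper("istanbul"))   # İSTANBUL
```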
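Fonts and programs do not always render letters built with combining diacritical marks (combining diacritics) correctly. For example, the Russian letters Ё (U+0401) and Й (U+0419) have codes of their own. Text converted from other systems, however, may contain them in decomposed form: Е + ̈ (U+0415 U+0308) and И + ̆ (U+0418 U+0306). Not every program displays such sequences correctly.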
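«Юникод» or «Уникод»?
«Unicode» is a proper name (or part of one, as in Unicode Consortium), so in Russian it is transcribed rather than translated.
The spelling «Юникод» follows the English pronunciation of the prefix «uni-» as «ю-». Russian, however, traditionally renders words with the Latin «uni-» as «уни-». On the other hand, the abbreviation UNICEF («United Nations International Children's Emergency Fund») is written in Russian as ЮНИСЕФ.
The spelling «Юникод» is the more frequent one: a web search returns about 11 times as many results for it as for «Уникод»[19].
The Unicode Consortium's website has a page with transcriptions of the name «Unicode» into various languages and scripts. For Russian, it gives «Юникод»[1].
In technical texts, the original spelling «Unicode» is also often used.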
Notes
- Unicode Transcriptions (in English). Retrieved 22 … 2011. Archived 10 … 2010.
- Paratype
- The Unicode® Standard: A Technical Introduction. Retrieved 22 … 2011. Archived 4 … 2010.
- History of Unicode Release and Publication Dates. Retrieved 22 … 2011. Archived 4 … 2010.
- The Unicode Consortium. Retrieved 22 … 2011. Archived 4 … 2010.
- Foreword. Retrieved 22 … 2011. Archived 4 … 2010.
- General Structure. Retrieved 22 … 2011. Archived 5 … 2010.
- European Alphabetic Scripts. Retrieved 22 … 2011. Archived 4 … 2010.
- Unicode 88. Retrieved 22 … 2011. Archived 8 … 2010.
- Unicode and Microsoft Windows NT (in English). Microsoft Support. Retrieved 22 … 2011.
- Unicode nearing 50% of the web (in English). Retrieved 22 … 2011.
- Roadmap to the TIP (Tertiary Ideographic Plane)
- http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt (in English)
- Unicode -
- «» () .
- , . , - , , , 67 ( ), . . 3,54 , UTF-8 2 .
- Arial Unicode 24 ; Times New Roman 120 , , 65536.
- 120 -. .
- 350 . «» 31 . «».