Unicode 6.0 Released

Two days ago Unicode 6.0.0 was released, and it brings important changes. Some of them are changes that Iranian developers should be aware of, or would at least do well to know about. I will try to summarize the changes based on what Roozbeh Pournader said in the Persian Computing Community, and I have also added some information from the Unicode Official Website. I will post two articles on this: the first one (this article) covers Unicode 6.0 in general, and the second one describes what is important for Iranian developers.

Note: Most of the text that you are about to read is copied from the Unicode Official Website and from Roozbeh Pournader's text, so they are the main authors. I have only summarized and categorized the text to give a complete picture, and revised it in some parts.

What is inside Unicode 6.0?

Version 6.0 of the Unicode Standard consists of the core specification, the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).

The core specification gives the general principles, requirements for conformance, and guidelines for implementers.

The code charts show representative glyphs for all the Unicode characters.

The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard.

The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

“The links for most Version 6.0.0 chapters, and the front and back matter of the core specification, are not yet active, because that text is still undergoing its last stage of editorial review,” the Unicode Official Website said. “These links will be activated over the next several months, once the editorial review is complete.”

For Unicode 6.0.0 in particular two additional sets of code chart pages are provided:

  •    A set of delta code charts showing only the new blocks for Unicode 6.0.0 and any existing blocks for which new characters were added in Unicode 6.0.0. All new characters are visually highlighted in those charts.
  •    A set of archival code charts that represent the entire set of characters, names and representative glyphs at the time of publication of Unicode 6.0.0.

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

What is new in Unicode 6.0?

1  – 2,088 new characters have been added, including

  •    Over 1,000 additional symbols, chief among them the additional Emoji symbols, which are especially important for mobile phones
  •    The new official Indian currency symbol: the Indian Rupee Sign
  •    222 additional CJK Unified Ideographs in common use in China, Taiwan, and Japan
  •    603 additional characters for African language support, including extensions to the Tifinagh, Ethiopic, and Bamum scripts
  •    Three additional scripts: Mandaic, Batak, and Brahmi

2  – Some new properties and data files have been added including

  •    A data file, EmojiSources.txt, which maps the Emoji symbols to their original Japanese telco source sets
  •    Two provisional properties for support of Indic scripts: IndicMatraCategory and IndicSyllabicCategory
  •    Provisional script extension data for use in segmentation, regular expressions, and spoof detection

3  – Some character properties for existing characters have been corrected including

  •    Property value updates to 36 non-CJK characters
  •    Numerous improvements to provisional properties for CJK Unified Ideographs
  •    Format updates for many normative IRG source tags, to better synchronize with ISO/IEC 10646 (see UAX #38, Unicode Han Database, for details)

4  – The text of the Standard has been amended, including

  •    Many changes to the core specification, listed in D. Textual Changes and Character Additions
  •    Small clarifications of the conformance clauses in UAX #9, The Unicode Bidirectional Algorithm, but no significant changes to conformance requirements
  •    Major editorial revisions of UAX #44, Unicode Character Database, and UAX #15, Unicode Normalization Forms, but no significant changes to conformance requirements

5  – Format improvements have been made, including

  •    Charts for CJK Compatibility Ideographs are now laid out in a multicolumn format showing sources, comparable to the structure of the charts for the CJK Unified Ideographs

UTS #10, Unicode Collation Algorithm, and UTS #46, Unicode IDNA Compatibility Processing, are maintained in synchrony with the Unicode Standard and have been updated for Version 6.0.

The repertoire for Unicode Version 6.0 includes all the characters of the Second Edition of ISO/IEC 10646, plus one additional character, U+20B9 INDIAN RUPEE SIGN, which is still in the process of being added to 10646.

Character Assignment Overview

230 characters have been added to the BMP, while 1,858 characters have been added in the supplementary planes. For the first time in the history of the Unicode Standard, the majority of the regular encoded characters (graphic and format) are not in the BMP.

Most character additions are in new blocks, but there are also character additions to a number of existing blocks.
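To make the BMP/supplementary distinction concrete, here is a minimal Python sketch. The helper names (`plane_of`, `is_bmp`, `utf16_code_units`) are made up for illustration. It relies only on the fact that each plane holds 65,536 code points, so the BMP is U+0000..U+FFFF, and it shows that a supplementary character takes two 16-bit code units (a surrogate pair) in UTF-16.

```python
def plane_of(code_point: int) -> int:
    """Return the plane number: 0 is the BMP, 1..16 are supplementary."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Each plane holds 0x10000 code points, so the plane is the high bits.
    return code_point >> 16


def is_bmp(code_point: int) -> bool:
    """True if the code point lies in the Basic Multilingual Plane."""
    return plane_of(code_point) == 0


def utf16_code_units(code_point: int) -> int:
    """Number of 16-bit code units needed to encode the character in UTF-16."""
    # UTF-16BE without a BOM uses exactly 2 bytes per code unit.
    return len(chr(code_point).encode("utf-16-be")) // 2


if __name__ == "__main__":
    # U+20B9 is the new rupee sign; U+1F600 is the start of the Emoticons block.
    for cp in (0x20B9, 0x1F600):
        print(f"U+{cp:04X}: plane {plane_of(cp)}, "
              f"BMP={is_bmp(cp)}, UTF-16 code units={utf16_code_units(cp)}")
```

Code that assumes one UTF-16 code unit per character therefore works for U+20B9 but not for the new Emoji in the 1F300..1F6FF range.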

The following table shows the allocation of code points in Unicode 6.0, by character type. It highlights the numbers for the BMP and the supplementary planes separately. For more information on the specific characters newly assigned in Unicode 6.0, see the file DerivedAge.txt in the Unicode Character Database. For more details regarding character counts, see Appendix D, Changes from Previous Versions.

Type           BMP      Supplementary   Total
Graphic        54,495   54,747          109,242
Format         37       105             142
Control        65       0               65
Private Use    6,400    131,068         137,468
Surrogate      2,048    0               2,048
Noncharacter   34       32              66
Reserved       2,457    862,624         865,081
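The figures can be sanity-checked against the size of the codespace. The sketch below is not part of the Unicode data. It copies the BMP and Total columns from the table, derives each supplementary figure as Total minus BMP, and checks that the columns add up to one plane (65,536 code points), sixteen planes, and seventeen planes respectively. All names in it are invented for this example.

```python
# (BMP, Total) counts per character type, copied from the table above.
COUNTS = {
    "Graphic": (54_495, 109_242),
    "Format": (37, 142),
    "Control": (65, 65),
    "Private Use": (6_400, 137_468),
    "Surrogate": (2_048, 2_048),
    "Noncharacter": (34, 66),
    "Reserved": (2_457, 865_081),
}

PLANE_SIZE = 0x10000        # 65,536 code points per plane
SUPPLEMENTARY_PLANES = 16   # planes 1..16


def supplementary(kind: str) -> int:
    """Supplementary-plane count, derived as Total minus BMP."""
    bmp, total = COUNTS[kind]
    return total - bmp


def column_sums() -> tuple:
    """Return the sums of the (BMP, supplementary, total) columns."""
    bmp = sum(b for b, _ in COUNTS.values())
    total = sum(t for _, t in COUNTS.values())
    return bmp, total - bmp, total


def regular_characters() -> tuple:
    """Graphic plus format characters as (BMP, supplementary)."""
    bmp = COUNTS["Graphic"][0] + COUNTS["Format"][0]
    supp = supplementary("Graphic") + supplementary("Format")
    return bmp, supp


if __name__ == "__main__":
    bmp, supp, total = column_sums()
    assert bmp == PLANE_SIZE
    assert supp == SUPPLEMENTARY_PLANES * PLANE_SIZE
    assert total == 17 * PLANE_SIZE
    print("regular characters (BMP, supplementary):", regular_characters())
```

Derived this way, the supplementary planes hold 54,747 graphic and 105 format characters, 54,852 together, against 54,532 in the BMP. That matches the statement that most regular encoded characters now lie outside the BMP.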

New Blocks

The newly defined blocks in Version 6.0 are:

0840..085F Mandaic
1BC0..1BFF Batak
AB00..AB2F Ethiopic Extended-A
11000..1107F Brahmi
16800..16A3F Bamum Supplement
1B000..1B0FF Kana Supplement
1F0A0..1F0FF Playing Cards
1F300..1F5FF Miscellaneous Symbols And Pictographs
1F600..1F64F Emoticons
1F680..1F6FF Transport And Map Symbols
1F700..1F77F Alchemical Symbols
2B740..2B81F CJK Unified Ideographs Extension D
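As an illustration of how such a range list can be used, the following Python sketch looks up which of the new 6.0 blocks (if any) a code point belongs to. The table is copied from the list above, and the function name `new_block_of` is hypothetical. A full implementation would load every block from the UCD file Blocks.txt rather than hard-coding twelve ranges.

```python
from bisect import bisect_right
from typing import Optional

# (first, last, name) for the blocks newly defined in Unicode 6.0,
# sorted by first code point.
NEW_BLOCKS = [
    (0x0840, 0x085F, "Mandaic"),
    (0x1BC0, 0x1BFF, "Batak"),
    (0xAB00, 0xAB2F, "Ethiopic Extended-A"),
    (0x11000, 0x1107F, "Brahmi"),
    (0x16800, 0x16A3F, "Bamum Supplement"),
    (0x1B000, 0x1B0FF, "Kana Supplement"),
    (0x1F0A0, 0x1F0FF, "Playing Cards"),
    (0x1F300, 0x1F5FF, "Miscellaneous Symbols And Pictographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
    (0x1F680, 0x1F6FF, "Transport And Map Symbols"),
    (0x1F700, 0x1F77F, "Alchemical Symbols"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
]
_STARTS = [first for first, _, _ in NEW_BLOCKS]


def new_block_of(code_point: int) -> Optional[str]:
    """Return the name of the new 6.0 block containing code_point, or None."""
    # Find the last block whose first code point is <= code_point.
    index = bisect_right(_STARTS, code_point) - 1
    if index < 0:
        return None
    _, last, name = NEW_BLOCKS[index]
    return name if code_point <= last else None


if __name__ == "__main__":
    for cp in (0x0840, 0x1F600, 0x20B9):
        print(f"U+{cp:04X}: {new_block_of(cp)}")
```

U+20B9 INDIAN RUPEE SIGN yields `None` here because it was added to an existing block rather than to one of the new ones.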

Text Changes and Additions

Numbers indicate the chapter or section in the Unicode 6.0 core specification where there are some significant changes or additions. This list is not exhaustive. Select changes for Chapter 3, Conformance, are listed separately under E. Conformance Changes. Many figures have been updated or added throughout.

  •    Preface: Rewrote extensively
  •    5.17: Updated shift/rotate in UTF-8/UTF-16 binary order algorithm
  •    6.2: Documented dandas
  •    7.1: Added new text on Latvian (and Sorbian) letters in Latin Extended-D
  •    8.2: Updates to Arabic, including Arabic pedagogical symbols (nuktas) and Kashmiri additions for Arabic
  •    9: Various updates to Indic, including additions to tables of vowel letters
  •    9.1: Updates to Devanagari, including Kashmiri additions for Devanagari
  •    9.5: Added text on Oriya fraction signs
  •    9.6: Improvements to Tamil
  •    9.9: Added text on new Malayalam characters, including Dot Reph
  •    10.2: Various updates to Tibetan
  •    11.13: Various updates to Balinese
  •    11.14: Various updates to Javanese
  •    12.1: Various updates to Han; added new section on CJK Extension D
  •    12.4: Added new subsection on Kana Supplement in Hiragana and Katakana
  •    12.6: Various updates to Hangul
  •    13.1: New text on Ethiopic additions in Ethiopic Extended-A
  •    13.4: Added new text on Tifinagh bi-consonants
  •    13.7: Added new text for Bamum Supplement
  •    15: New text on Emoji affecting the description of the following code ranges:
          o    2300-23FF Miscellaneous Technical
          o    2700-27BF Dingbats
          o    1F0A0-1F0FF Playing Cards
          o    1F100-1F1FF Enclosed Alphanumeric Supplement
          o    1F300-1F5FF Miscellaneous Symbols And Pictographs
          o    1F600-1F64F Emoticons
          o    1F680-1F6FF Transport And Map Symbols
  •    15.1: New text on U+20B9 INDIAN RUPEE SIGN
  •    15.8: New subsection on Alchemical Symbols
  •    16.8: Updates regarding annotation characters and bidi
  •    17.2: Updated to note the new presentation format for compatibility ideographs
  •    Appendices and Back Matter: various updates
  •    Han Radical-Stroke Index now online only; introductory material moved to Chapter 12

Unicode Character Database changes

  •    A general category change to two Kannada characters (U+0CF1, U+0CF2), which has the effect of making them newly eligible for inclusion in identifiers
  •    A general category change to one New Tai Lue numeric character (U+19DA), which would have the effect of disqualifying it from inclusion in identifiers unless grandfathering measures are in place for the defining identifier syntax
  •    Changes to ten characters affecting the determination of script runs
  •    The formal deprecation of one Arabic character
  •    Reversion of the default grapheme cluster boundary determination for Thai and Lao to the behavior specified in Unicode 5.0
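One way to observe a general category change such as the Kannada one is Python's standard `unicodedata` module. This is only a sketch. The values returned depend on the Unicode version bundled with the interpreter (exposed as `unicodedata.unidata_version`), so they show the post-6.0 category only on an interpreter built against Unicode 6.0 or later. The helper name `describe` is invented for this example.

```python
import unicodedata


def describe(code_point: int) -> dict:
    """Collect a few UCD-derived facts about one code point."""
    char = chr(code_point)
    return {
        "code_point": f"U+{code_point:04X}",
        "name": unicodedata.name(char, "<unnamed>"),
        # Two-letter general category, e.g. "Lo" (Letter, other).
        "category": unicodedata.category(char),
        # Whether Python itself accepts the character as an identifier.
        "python_identifier": char.isidentifier(),
    }


if __name__ == "__main__":
    print("UCD version in this interpreter:", unicodedata.unidata_version)
    # The two Kannada characters whose general category changed in 6.0.
    for cp in (0x0CF1, 0x0CF2):
        print(describe(cp))
```

A letter category such as `Lo` is what makes a character eligible to appear in identifiers under UAX #31, which is why the change matters for language and parser implementers.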

Other significant changes include:

  •    Addition of the EmojiSources.txt data file, detailing source mapping information for the Emoji characters
  •    Addition of the provisional ScriptExtensions.txt data file, providing information about use of certain characters with multiple scripts
  •    Addition of new provisional properties related to the structure of syllables in Indic scripts
  •    Deprecation of several derived properties related to Unicode normalization
  •    Improvement of the LineBreakTest.txt and BidiTest.txt files

Unicode 6.0.0 Standard Annexes

UAX #9: The Bidirectional Algorithm
UAX #11: East Asian Width
UAX #14: Line Breaking Properties
UAX #15: Unicode Normalization Forms
UAX #24: Script Names
UAX #29: Text Boundaries
UAX #31: Identifier and Pattern Syntax
UAX #34: Unicode Named Character Sequences
UAX #38: Unicode Han Database (Unihan)
UAX #41: Common References for Unicode Standard Annexes
UAX #42: An XML Representation of the UCD
UAX #44: Unicode Character Database

Related Links

Unicode Official Website

Unicode 6.0.0 Information

Latest Code Chart

Archive Code Chart

Delta Code Charts (Unicode 6.0 additions are highlighted)

Unicode 5.2.0 Core Specifications
