
Demystifying Unicode Text Display: From Unicode Code Points to Positioned Glyphs

Text display & Fonts

  • Characters
  • Code points
  • String
  • Glyphs
  • Glyph run
  • Font family
  • Font style

Basic glyph layout:

Common case: a single line containing only basic Latin characters

Drawing glyphs on a single line: each glyph's starting point (origin) aligns with the previous glyph's ending point (its origin plus its advance width).

Map the glyphs back to the original characters, and get the dimensions of the glyph run.
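For the simple case above, layout reduces to walking a pen along the baseline. The sketch below assumes a toy font given as two hypothetical dictionaries (character to glyph ID, and glyph ID to advance width in font units); the IDs and widths in the usage line are made up. Keeping the source character index on each glyph is what lets the run be mapped back to the text, and the final pen position is the run's width.

```python
from dataclasses import dataclass

@dataclass
class PositionedGlyph:
    glyph_id: int
    x: int        # origin of the glyph on the baseline, in font units
    cluster: int  # index of the character this glyph came from

def layout_simple(text: str, cmap: dict, advances: dict) -> tuple:
    """One glyph per character; each origin is the previous origin plus its advance."""
    glyphs = []
    pen_x = 0
    for i, ch in enumerate(text):
        gid = cmap.get(ch, 0)  # glyph ID 0 is .notdef, used when the font has no mapping
        glyphs.append(PositionedGlyph(gid, pen_x, i))
        pen_x += advances.get(gid, 0)  # the next glyph starts where this one ends
    return glyphs, pen_x  # final pen position = advance width of the whole run
```

With `layout_simple("Hi", {"H": 43, "i": 76}, {43: 1400, 76: 550})`, "H" is placed at 0 and "i" at 1400, and the run is 1950 units wide.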

Font data:
  • Font file (contains strings for name, vendor, ...; glyph data; metric data; ...)
  • Data tables
  • Glyph data
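A font file is a collection of such tables. As a sketch of how they are located, the function below reads the table directory at the start of an sfnt-based (TrueType or OpenType) font: a 12-byte header followed by one 16-byte record per table (tag, checksum, offset, length), all big-endian. It is a minimal reader under stated limits: it does not handle font collections (.ttc) and does not verify checksums.

```python
import struct

# 0x00010000 = TrueType outlines, 'OTTO' = CFF outlines, 'true' = older Apple TrueType
SFNT_VERSIONS = {0x00010000, 0x4F54544F, 0x74727565}

def read_table_directory(data: bytes) -> dict:
    """Return {tag: (offset, length)} for every table listed in an sfnt table directory."""
    sfnt_version, num_tables = struct.unpack_from(">IH", data, 0)
    if sfnt_version not in SFNT_VERSIONS:
        raise ValueError(f"not a single sfnt font: version 0x{sfnt_version:08X}")
    tables = {}
    # The header is 12 bytes (searchRange, entrySelector, rangeShift are skipped here);
    # each following table record is 16 bytes: tag, checksum, offset, length.
    for i in range(num_tables):
        tag, _checksum, offset, length = struct.unpack_from(">4sIII", data, 12 + 16 * i)
        tables[tag.decode("ascii")] = (offset, length)
    return tables
```

For a real font, pass the file's bytes (for example read with `open(path, "rb")`); the returned offsets point at tables such as `cmap`, `head`, `GSUB` and `GPOS`.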

Mapping code points to glyphs: map a single Unicode code point to a glyph ID.
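In OpenType fonts this mapping lives in the `cmap` table. One widely used subtable format, format 4, stores Basic Multilingual Plane code points as segments sorted by end code. The sketch below covers only segments whose idRangeOffset is 0, where the glyph ID is the code point plus a per-segment delta, modulo 65536; the segment values in the usage example are hypothetical.

```python
from bisect import bisect_left

def cmap_format4_lookup(code_point, end_codes, start_codes, id_deltas):
    """Glyph ID for code_point from format-4 style segments (idRangeOffset assumed 0)."""
    if code_point > 0xFFFF:
        return 0  # format 4 only covers the BMP
    i = bisect_left(end_codes, code_point)  # first segment whose endCode >= code_point
    if i == len(end_codes) or start_codes[i] > code_point:
        return 0  # falls between segments: .notdef
    return (code_point + id_deltas[i]) & 0xFFFF  # idDelta arithmetic is modulo 65536
```

With segments `end_codes=[0x5A, 0x7A, 0xFFFF]`, `start_codes=[0x41, 0x61, 0xFFFF]`, `id_deltas=[-29, -29, 1]`, "A" maps to glyph 36, "z" to 93, and "!" (outside every segment) to 0. The final 0xFFFF segment is the terminator that format 4 requires.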

Advanced layout requirements
  • Kerning
  • Word-position context (Arabic)
  • Cluster-sequence context
  • Glyph-glyph context
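Kerning is the simplest of these: an adjustment that depends on a pair of adjacent glyphs. A minimal sketch, assuming a hypothetical pair table and made-up widths (in a real OpenType font this data would typically come from GPOS pair-adjustment lookups or the older `kern` table):

```python
def apply_pair_kerning(glyphs, advances, kern_pairs):
    """Return each glyph's advance plus the kerning value for (this glyph, next glyph)."""
    adjusted = []
    for i, glyph in enumerate(glyphs):
        advance = advances[glyph]
        if i + 1 < len(glyphs):
            advance += kern_pairs.get((glyph, glyphs[i + 1]), 0)  # 0 if the pair is not kerned
        adjusted.append(advance)
    return adjusted
```

For "AVA" with both glyphs 1366 units wide and -150 for the pairs (A, V) and (V, A), the adjusted advances are 1216, 1216 and 1366: the default metrics alone would have placed the letters too far apart.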

Alternate glyph substitution

  • Typographic ligatures
  • Typographic small caps (synthetic small caps vs. alternate small-cap glyphs)
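Ligature substitution is where character-to-glyph associations stop being one-to-one. The sketch below, assuming a hypothetical ligature table keyed by tuples of glyph names, greedily replaces the longest matching sequence and keeps a cluster value (the index of the first source character) for each output glyph, so the run can still be mapped back to the text. Real GSUB ligature lookups follow the order of ligatures in the font rather than a global longest-match rule.

```python
def substitute_ligatures(glyphs, clusters, ligatures):
    """Greedy longest-match ligature substitution that keeps a cluster per output glyph."""
    max_len = max((len(key) for key in ligatures), default=1)
    out_glyphs, out_clusters = [], []
    i = 0
    while i < len(glyphs):
        # Try the longest possible sequence first, down to pairs.
        for n in range(min(max_len, len(glyphs) - i), 1, -1):
            key = tuple(glyphs[i:i + n])
            if key in ligatures:
                out_glyphs.append(ligatures[key])
                out_clusters.append(clusters[i])  # ligature maps back to its first character
                i += n
                break
        else:  # no ligature starts here: copy the glyph through unchanged
            out_glyphs.append(glyphs[i])
            out_clusters.append(clusters[i])
            i += 1
    return out_glyphs, out_clusters
```

For "office" with ligatures f_f_i, f_i and f_f, the result is `["o", "f_f_i", "c", "e"]` with clusters `[0, 1, 4, 5]`: one glyph now stands for characters 1 to 3.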

BIDI


Implications

  • Advanced line layout is required for high-quality typography and for many scripts
  • Complex character-to-glyph associations
  • Default glyph metrics alone don't determine final positions
  • Additional software logic is required
  • Font-specific details

OpenType Layout

Requires:
  • General advanced layout logic
  • Script-specific behaviour logic: details that can be determined from the Unicode character sequence, independent of the font and of the text-shaping engine used

Advanced layout font tables:
  • Glyph Definition table (GDEF)
  • Glyph Substitution table (GSUB)
  • Glyph Positioning table (GPOS)
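One thing GDEF provides is a class for each glyph (1 = base, 2 = ligature, 3 = mark, 4 = component). GSUB and GPOS lookups can set a flag (IgnoreMarks) that makes them skip mark glyphs while matching, which is how, for example, a kerning pair can still apply across an intervening accent. A sketch of that skipping step, with hypothetical glyph names; glyphs missing from the class table are treated as class 0 (unclassified):

```python
# GDEF GlyphClassDef values
BASE, LIGATURE, MARK, COMPONENT = 1, 2, 3, 4

def next_visible_glyph(glyphs, index, glyph_class, ignore_marks=True):
    """Index of the next glyph after `index` that a lookup would consider, or None."""
    for j in range(index + 1, len(glyphs)):
        if ignore_marks and glyph_class.get(glyphs[j], 0) == MARK:
            continue  # skip marks, as a lookup with the IgnoreMarks flag would
        return j
    return None
```

For the sequence `["T", "acutecomb", "o"]` with the accent classed as a mark, the glyph after "T" is "o" when marks are ignored and the accent otherwise.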

Substitution/positioning actions

Core Text / DirectWrite / HarfBuzz (Apple / Windows / Linux)


Run Segmentation

Segment the string into separate runs:

Script runs ("itemization")

Bidi level runs, from the bidi algorithm
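Each segmentation produces its own set of boundaries, and shaping then works on runs within which nothing changes, i.e. the intersection of the segmentations. A minimal sketch, assuming each segmentation is given as half-open (start, end) character ranges that together cover the whole string:

```python
def merge_runs(*run_lists):
    """Combine several segmentations of one string into runs where none of them changes."""
    boundaries = set()
    for runs in run_lists:
        for start, end in runs:
            boundaries.update((start, end))  # every run edge is a boundary of the result
    points = sorted(boundaries)
    return list(zip(points, points[1:]))
```

For script runs `[(0, 6), (6, 12)]` and bidi runs `[(0, 4), (4, 12)]`, the combined runs are `[(0, 4), (4, 6), (6, 12)]`.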


Script Itemization

...
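Python's standard `unicodedata` module does not expose the Unicode Script property, so the sketch below relies on a heuristic that is only an approximation: the first word of a letter's character name (e.g. "LATIN", "HEBREW") stands in for its script. Non-letters are treated as Common and combining marks as Inherited; both take the script of the surrounding run. Real itemizers use the Script and Script_Extensions properties (UAX #24) and handle cases such as paired brackets that this ignores.

```python
import unicodedata

NEUTRAL = {"Common", "Inherited"}

def rough_script(ch):
    """HEURISTIC stand-in for the Unicode Script property (not UAX #24)."""
    category = unicodedata.category(ch)
    if category.startswith("M"):
        return "Inherited"  # combining marks take the script of their base
    if not category.startswith("L"):
        return "Common"     # spaces, digits, punctuation, symbols
    return unicodedata.name(ch, "UNKNOWN").split()[0]  # e.g. "LATIN", "HEBREW"

def itemize_scripts(text):
    """Split text into (start, end, script) runs, absorbing Common/Inherited characters."""
    scripts = [rough_script(ch) for ch in text]
    # Leading neutrals take the first real script; later ones take the preceding script.
    current = next((s for s in scripts if s not in NEUTRAL), "Common")
    runs = []
    for i, script in enumerate(scripts):
        if script not in NEUTRAL:
            current = script
        if runs and runs[-1][2] == current:
            runs[-1] = (runs[-1][0], i + 1, current)
        else:
            runs.append((i, i + 1, current))
    return runs
```

For "abc " followed by the Hebrew word שלום, this yields a LATIN run covering the Latin letters and the space, then a HEBREW run.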

Bidi level run segmentation

...

The Unicode Bidirectional Algorithm (UAX #9) uses the Bidi_Class character property.
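In Python, `unicodedata.bidirectional()` returns a character's Bidi_Class value (e.g. "L", "R", "AL", "WS"). The sketch below uses it for a deliberately tiny subset of the algorithm, assuming a left-to-right paragraph with no explicit embeddings or isolates: L gets level 0, R and AL get level 1, and any sequence of other characters takes its neighbours' level when both agree and the paragraph level otherwise. It ignores numbers, weak types and bracket pairs, so it is a toy for illustration, not an implementation of UAX #9.

```python
import unicodedata

STRONG_LEVEL = {"L": 0, "R": 1, "AL": 1}  # levels for strong types in an LTR paragraph

def toy_bidi_levels(text):
    """Per-character levels for an LTR paragraph (toy subset, NOT full UAX #9)."""
    levels = [STRONG_LEVEL.get(unicodedata.bidirectional(ch)) for ch in text]
    i = 0
    while i < len(levels):
        if levels[i] is not None:
            i += 1
            continue
        j = i
        while j < len(levels) and levels[j] is None:
            j += 1  # find the end of this run of non-strong characters
        before = levels[i - 1] if i > 0 else 0        # paragraph start acts as level 0
        after = levels[j] if j < len(levels) else 0  # paragraph end acts as level 0
        fill = before if before == after else 0       # neighbours agree: take it; else paragraph level
        levels[i:j] = [fill] * (j - i)
        i = j
    return levels

def level_runs(levels):
    """Maximal (start, end, level) runs of equal level."""
    runs = []
    for i, level in enumerate(levels):
        if runs and runs[-1][2] == level:
            runs[-1] = (runs[-1][0], i + 1, level)
        else:
            runs.append((i, i + 1, level))
    return runs
```

For "abc " followed by two Hebrew words and "!", the space between the Hebrew words joins the right-to-left run, while the space after "abc" and the final "!" stay at the paragraph level.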

Shaping

....

  1. Canonical Decomposition [UAX #15]
  2. Cluster Analysis [UAX #29]
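
Both steps can be tried with Python's `unicodedata` module. Canonical decomposition is `unicodedata.normalize("NFD", ...)`. For clustering, the sketch below uses a much-simplified rule (a base character plus any following combining marks, general category M*) in place of full UAX #29 grapheme clusters, which also cover Hangul syllable sequences, emoji sequences and more.

```python
import unicodedata

def decompose(text):
    """Canonical decomposition (NFD), e.g. precomposed U+00E9 becomes 'e' + U+0301."""
    return unicodedata.normalize("NFD", text)

def simple_clusters(text):
    """Group each base character with the combining marks that follow it (simplified)."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the current cluster
        else:
            clusters.append(ch)  # start a new cluster (a leading mark stands alone)
    return clusters
```

For "café" written with precomposed é, decomposition gives five code points, and clustering groups them back into four units: "c", "a", "f" and "e" + U+0301.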

Positioning

...
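As one positioning example, OpenType mark-to-base attachment (GPOS lookup type 4) places a combining mark so that an anchor point on the mark coincides with an anchor point on the base glyph. The sketch below computes that offset under stated assumptions: the mark has zero advance, it immediately follows its base, and the pen has therefore already moved past the base by the base's advance width. The anchor coordinates in the usage example are hypothetical.

```python
def attach_mark(base_anchor, mark_anchor, base_advance):
    """(dx, dy) offset for a zero-advance mark so its anchor lands on the base's anchor.

    All values are in font units; the mark's pen position is base origin + base_advance.
    """
    dx = base_anchor[0] - mark_anchor[0] - base_advance  # undo the pen advance past the base
    dy = base_anchor[1] - mark_anchor[1]
    return dx, dy
```

With a base anchor at (500, 1400), a mark anchor at (-300, 1000) and a base advance of 1000, the offset is (-200, 400): the mark's origin lands at x = 800, so its anchor lands at x = 500, y = 1400, exactly on the base anchor.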

Conclusion

We can map from...


Font Fallback