Demystifying Unicode Text Display: From Unicode Code Points to Positioned Glyphs
Text display & Fonts
- Characters
- Code points
- String
- Glyphs
- Glyph run
- Font family
- Font style
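To make the character / code point distinction concrete, here is a minimal sketch using only the Python standard library. The sample strings and the helper name `code_points` are illustrative choices, not part of the original notes.

```python
import unicodedata

# "é" written two ways: one precomposed code point, or "e" plus a combining acute accent.
precomposed = "\u00e9"
decomposed = "e\u0301"

def code_points(text):
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(precomposed))  # ['U+00E9']
print(code_points(decomposed))   # ['U+0065', 'U+0301']

# Both strings represent the same user-perceived character; NFC normalization
# composes the two-code-point form into the single-code-point form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

One user-perceived character can therefore be one or several code points in the string, which is why later stages track which glyphs came from which characters.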
Basic glyph layout:
Common case: single line, Basic Latin characters only
How to draw a glyph run on a single line: each glyph's starting point aligns with the previous glyph's ending point
Map from glyphs back to the original characters and get the dimensions of a glyph run.
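The basic layout described above can be sketched as a pen that moves along the baseline by each glyph's advance width. The glyph IDs, the `ADVANCES` table, and the `PositionedGlyph` type are hypothetical values for illustration; a real font supplies advances in its metric data.

```python
from dataclasses import dataclass

# Hypothetical advance widths in font units, keyed by glyph ID.
ADVANCES = {1: 600, 2: 550, 3: 300}

@dataclass
class PositionedGlyph:
    glyph_id: int
    x: int        # pen position where this glyph's origin is placed
    cluster: int  # index of the source character, for glyph-to-character mapping

def layout_run(glyph_ids):
    """Place glyphs left to right; return the positioned glyphs and the run width."""
    pen_x = 0
    placed = []
    for index, glyph_id in enumerate(glyph_ids):
        placed.append(PositionedGlyph(glyph_id, pen_x, index))
        pen_x += ADVANCES[glyph_id]  # next glyph starts where this one ends
    return placed, pen_x

glyphs, width = layout_run([1, 2, 3, 1])
print([g.x for g in glyphs])  # [0, 600, 1150, 1450]
print(width)                  # 2050
```

This one-glyph-per-character, left-to-right model is only the simple case; the `cluster` field is the piece that survives once the mapping stops being one-to-one.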
Font data:
- Font file (contains strings for name, vendor...; glyph data, metric data...)
- Data tables
- Glyph data
Mapping code points to glyphs: map a single Unicode code point to a glyph ID
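As a rough illustration of this mapping, the sketch below uses a plain dictionary as a stand-in for the font's character-to-glyph map. The glyph IDs in `CMAP` are made up; the only real convention assumed is that glyph ID 0 is the "missing glyph" (`.notdef`) in OpenType fonts.

```python
# Hypothetical character-to-glyph map: code point -> glyph ID.
CMAP = {0x0041: 36, 0x0042: 37, 0x0043: 38}

NOTDEF = 0  # glyph ID 0 is the missing-glyph placeholder

def map_to_glyphs(text):
    """Map each code point to a glyph ID, falling back to .notdef."""
    return [CMAP.get(ord(ch), NOTDEF) for ch in text]

print(map_to_glyphs("ABZ"))  # [36, 37, 0]
```

A code point the font does not cover maps to `.notdef`, which is typically when a text stack falls back to another font.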
Advanced layout requirements
- Kerning
- Word position context (Arabic)
- cluster sequence context
- glyph-glyph context
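Kerning is the simplest of these context-dependent adjustments: the position of a glyph depends on the glyph before it. The sketch below assumes a hypothetical pair table (`KERNING`) and hypothetical advances; glyphs are named by letter for readability.

```python
# Hypothetical advance widths and kerning pairs, in font units.
ADVANCES = {"A": 650, "V": 640, "T": 600, "o": 550}
KERNING = {("A", "V"): -80, ("T", "o"): -60}

def positions(glyphs, use_kerning=True):
    """Return the x position of each glyph and the total run width."""
    pen_x = 0
    xs = []
    previous = None
    for glyph in glyphs:
        if use_kerning and previous is not None:
            # Pull this glyph toward (or push it away from) its left neighbour.
            pen_x += KERNING.get((previous, glyph), 0)
        xs.append(pen_x)
        pen_x += ADVANCES[glyph]
        previous = glyph
    return xs, pen_x

print(positions(["A", "V"], use_kerning=False))  # ([0, 650], 1290)
print(positions(["A", "V"]))                     # ([0, 570], 1210)
```

The default advances alone give the first result; the second shows that final positions depend on glyph context.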
Alternate glyph substitution (cont'd)
- Typographic ligatures
- Typographic small caps (synthetic small caps vs Alternate small caps)
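Ligature substitution replaces a sequence of glyphs with a single glyph. The following is a simplified sketch, assuming a hypothetical ligature table and a greedy longest-match rule; a real font expresses this through ordered lookups in its substitution table rather than this loop.

```python
# Hypothetical ligature table: glyph sequence -> ligature glyph name.
LIGATURES = {
    ("f", "f", "i"): "f_f_i",
    ("f", "i"): "f_i",
    ("f", "l"): "f_l",
}

def apply_ligatures(glyphs):
    """Replace glyph sequences with ligatures, preferring the longest match."""
    longest = max(len(sequence) for sequence in LIGATURES)
    out = []
    i = 0
    while i < len(glyphs):
        for n in range(longest, 1, -1):
            key = tuple(glyphs[i:i + n])
            if key in LIGATURES:
                out.append(LIGATURES[key])
                i += n
                break
        else:
            # No ligature starts here; keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out

print(apply_ligatures(list("office")))  # ['o', 'f_f_i', 'c', 'e']
print(apply_ligatures(list("file")))    # ['f_i', 'l', 'e']
```

After substitution, one glyph corresponds to several characters, so the glyph-to-character mapping is no longer one-to-one.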
BIDI
Implications
- Advanced line layout is required for high-quality typography and for many scripts
- Complex character-to-glyph associations
- Default glyph metrics alone don't determine final positions
- Additional software logic is required
- Font-specific details
OpenType Layout
Requires...
- General advanced layout logic
- Script-specific behaviour logic: details that can be determined from the Unicode character sequence, independent of the font used, and handled by the text shaping engine.
Advanced layout font tables
- Glyph Definition Table GDEF
- Glyph substitution table GSUB
- Glyph positioning table GPOS
Substitution/positioning actions
CoreText/DirectWrite/HarfBuzz (Apple/Windows/Linux)
Run Segmentation
Segment the string into separate runs
Script runs. "itemization"
BIDI algorithm to get BIDI level runs
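Splitting a string into script runs can be sketched as follows. Python's standard library does not expose the Unicode Script property, so `rough_script` below is a crude stand-in that takes the first word of each letter's character name; this heuristic is an assumption for illustration only and is not how a real itemizer determines scripts. Non-letters (spaces, digits, punctuation) are simply attached to the neighbouring run.

```python
import unicodedata

def rough_script(ch):
    """Crude stand-in for the Script property: first word of a letter's name."""
    if not ch.isalpha():
        return None  # neutral: spaces, digits, punctuation
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def script_runs(text):
    """Return a list of (script, start, end) runs covering the whole string."""
    runs = []
    for i, ch in enumerate(text):
        script = rough_script(ch)
        if runs and (script is None or script == runs[-1][0]):
            # Same script, or a neutral character: extend the current run.
            runs[-1] = (runs[-1][0], runs[-1][1], i + 1)
        elif runs and runs[-1][0] is None:
            # A leading neutral run adopts the first real script it meets.
            runs[-1] = (script, runs[-1][1], i + 1)
        else:
            runs.append((script, i, i + 1))
    return runs

sample = "abc \u0645\u0631\u062d\u0628\u0627 xyz"  # Latin, Arabic, Latin
print(script_runs(sample))
# [('LATIN', 0, 4), ('ARABIC', 4, 10), ('LATIN', 10, 13)]
```

Each resulting run can then be shaped with the rules for a single script.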
Script Itemization
...
Bidi level run segmentation
...
The Unicode Bidirectional Algorithm uses the Bidi_Class character property
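The Bidi_Class values that feed the algorithm can be inspected with Python's standard `unicodedata.bidirectional` function. This sketch only looks up the per-character classes; it does not resolve embedding levels, which is the job of the full algorithm. The sample string is an arbitrary choice.

```python
import unicodedata

def bidi_classes(text):
    """Return the Bidi_Class abbreviation for each character."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Latin letter, space, Hebrew letter, space, Arabic letter, space, digit.
sample = "a \u05d0 \u0645 1"
print(bidi_classes(sample))
# ['L', 'WS', 'R', 'WS', 'AL', 'WS', 'EN']
```

Strong classes (`L`, `R`, `AL`) set direction, while weak and neutral classes such as `EN` and `WS` take their direction from context.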
Shaping
....
Positioning
...
Conclusion
We can map from...