Language & Region Identifiers
Learning Objectives
- Understand locale identifiers and their structure
- Master BCP 47 language tags and IETF standards
- Distinguish between language, region, and script
- Implement proper locale detection and fallback
- Handle special cases and edge scenarios
What is a Locale?
A locale is a set of parameters that defines the user's language, region, and cultural preferences. It determines how your application formats numbers, dates, currency, and displays text.
Locale Components
Language
The primary language being used (e.g., English, Spanish, Japanese)
en, es, ja, zh
Region/Territory
The country or region affecting formats and conventions
US, GB, CN, BR
Script (Optional)
The writing system used for the language
Latn, Cyrl, Arab, Hans
Variant (Rare)
Specific dialectal or orthographic variations
valencia, pinyin
BCP 47 Language Tags
BCP 47 (Best Current Practice 47) is the IETF standard for language tags. It defines how to construct identifiers that specify language, region, script, and variants.
BCP 47 Tag Structure
language-Script-REGION-variant
Example: zh-Hans-CN = Chinese (Simplified script) as used in China
| Component | Format | Standard | Example |
|---|---|---|---|
| Language | 2-3 lowercase letters | ISO 639 | en, zh |
| Script | 4 letters, title case | ISO 15924 | Hans, Latn |
| Region | 2 uppercase letters or 3 digits | ISO 3166-1 (letters), UN M.49 (digits) | US, CN |
| Variant | 5-8 alphanumeric | IANA registry | valencia, pinyin |
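The structure above can be sketched as a small parser. This is a simplified illustration, not a full RFC 5646 implementation: the names `parse_tag` and `LanguageTag` are hypothetical, and extensions, private-use subtags, and grandfathered tags are deliberately ignored.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Simplified pattern: language[-Script][-REGION][-variant...]
# Assumption: extensions, private-use and grandfathered tags are out of scope.
_TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: Tuple[str, ...]


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple BCP 47 tag into components, normalizing case."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    variants = tuple(v.lower() for v in match.group("variants").split("-") if v)
    return LanguageTag(
        language=match.group("language").lower(),
        script=script.title() if script else None,   # e.g. hans -> Hans
        region=region.upper() if region else None,   # e.g. cn -> CN
        variants=variants,
    )


print(parse_tag("zh-hans-cn"))
print(parse_tag("ca-ES-valencia"))
```

Subtag length is what distinguishes the components here: four letters can only be a script, and two letters or three digits after the language can only be a region. In production code a maintained library is usually a safer choice than a hand-written pattern.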
Common BCP 47 Examples
| BCP 47 Tag | Description | Use Case |
|---|---|---|
| en-US | English (United States) | MM/DD/YYYY, $ before amount |
| en-GB | English (United Kingdom) | DD/MM/YYYY, £ before amount |
| es-ES | Spanish (Spain) | Uses € and European Spanish |
| es-MX | Spanish (Mexico) | Uses $ and Mexican Spanish |
| zh-Hans-CN | Chinese (Simplified, China) | Simplified characters, Mainland |
| zh-Hant-TW | Chinese (Traditional, Taiwan) | Traditional characters, Taiwan |
| ar-SA | Arabic (Saudi Arabia) | RTL, Arabic numerals, Saudi riyal |
| pt-BR | Portuguese (Brazil) | Brazilian Portuguese, R$ currency |
| pt-PT | Portuguese (Portugal) | European Portuguese, € currency |
| fr-CA | French (Canada) | Canadian French, $ currency |
Why Both Language AND Region Matter
Same language, different formats:
- en-US vs en-GB: "color" vs "colour", $ vs £, MM/DD vs DD/MM
- es-ES vs es-MX: € vs $, "ordenador" vs "computadora"
- fr-FR vs fr-CA: € vs $, some vocabulary differences
- pt-BR vs pt-PT: Significant spelling and vocabulary differences
Locale Detection & Fallback
How do you determine a user's locale? There are multiple strategies, and you should use them in order of priority.
Locale Detection Priority
- User Preference (Explicit Setting): Highest priority; the user has explicitly selected their locale in settings
- URL Parameter: ?lang=en-GB or /en-gb/page, useful for switching without login
- Cookie/Session: Previously saved preference from this device
- Browser Accept-Language Header: Accept-Language: en-US,en;q=0.9,es;q=0.8
- IP Geolocation: Infer from user's location (least reliable, privacy concerns)
- Default Fallback: Your application's default locale (usually en-US)
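To make the Accept-Language step concrete, here is a minimal sketch of how the header's quality values (`q=`) could be turned into an ordered preference list. The function name `parse_accept_language` is hypothetical, and the sketch assumes a well-behaved header; web frameworks typically ship their own, more thorough parser.

```python
from typing import List


def parse_accept_language(header: str) -> List[str]:
    """Return the language ranges from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        quality = 1.0  # q defaults to 1 when it is omitted
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # assumption: treat malformed weights as unacceptable
        if quality > 0:  # q=0 means "explicitly not acceptable"
            weighted.append((-quality, position, tag))
    # Sort by descending quality; header order breaks ties.
    return [tag for _, _, tag in sorted(weighted)]


print(parse_accept_language("en-US,en;q=0.9,es;q=0.8"))
# → ['en-US', 'en', 'es']
```

The resulting list can be fed into whatever matching step the application uses. Note that a wildcard entry (`*`) is passed through unchanged and would need separate handling.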
Locale Detection Code Examples
JavaScript (Browser)
// Get the browser's locale preferences, most preferred first
const userLocales = navigator.languages || [navigator.language];
console.log(userLocales);
// → e.g. ["en-US", "en", "es"]

// Get the primary locale
const primaryLocale = navigator.language;
console.log(primaryLocale);
// → e.g. "en-US"

// Detect and use the best available locale
function getBestLocale(supportedLocales, userPreferences) {
  // Try an exact match first
  for (const userLocale of userPreferences) {
    if (supportedLocales.includes(userLocale)) {
      return userLocale;
    }
  }
  // Then try a language-only match (en-GB → en-US)
  for (const userLocale of userPreferences) {
    const language = userLocale.split('-')[0];
    // Compare whole subtags so that "fr" does not match "frr-DE"
    const match = supportedLocales.find(
      l => l === language || l.startsWith(language + '-')
    );
    if (match) return match;
  }
  // Fall back to the default (the first supported locale)
  return supportedLocales[0];
}

const supported = ['en-US', 'es-ES', 'fr-FR', 'de-DE'];
const userPrefs = ['en-GB', 'en', 'es'];
const bestLocale = getBestLocale(supported, userPrefs);
console.log(bestLocale); // → "en-US" (language match)
Python (Server-side)
from flask import request, session
from babel import negotiate_locale

# Supported locales in your application
SUPPORTED_LOCALES = ['en_US', 'es_ES', 'fr_FR', 'de_DE', 'ja_JP']
DEFAULT_LOCALE = 'en_US'

def get_user_locale():
    # 1. Check the user's explicit preference (from database/session)
    user_pref = session.get('locale')
    if user_pref and user_pref in SUPPORTED_LOCALES:
        return user_pref

    # 2. Check the URL parameter (accept both en-GB and en_GB)
    url_locale = request.args.get('lang', '').replace('-', '_')
    if url_locale in SUPPORTED_LOCALES:
        return url_locale

    # 3. Negotiate from the Accept-Language header.
    #    The header uses hyphens (en-US), so convert to underscores first.
    header_locales = [
        value.replace('-', '_')
        for value in request.accept_languages.values()
    ]
    best_match = negotiate_locale(header_locales, SUPPORTED_LOCALES, sep='_')
    if best_match:
        return best_match

    # 4. Fall back to the default
    return DEFAULT_LOCALE

# Usage (must run inside a Flask request context, e.g. in a view function)
locale = get_user_locale()
print(f"Using locale: {locale}")
Locale Fallback Chain
Always implement a fallback chain. If you don't have zh-Hant-HK (Traditional Chinese, Hong Kong), try falling back to:
- zh-Hant-HK (exact match) → not available
- zh-Hant (language + script) → try this
- zh (language only) → then this
- en (default language) → final fallback
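The chain above can be generated by dropping one subtag at a time from the right. The following is a minimal sketch of that idea; `fallback_chain` and `resolve` are hypothetical helper names, and the default language `en` is taken from the example above.

```python
from typing import Iterable, List


def fallback_chain(tag: str, default: str = "en") -> List[str]:
    """Build a most-specific-first list of candidates for a hyphenated tag."""
    parts = tag.split("-")
    # zh-Hant-HK -> ["zh-Hant-HK", "zh-Hant", "zh"]
    chain = ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)  # final fallback
    return chain


def resolve(tag: str, available: Iterable[str], default: str = "en") -> str:
    """Return the first candidate in the chain that the application supports."""
    supported = set(available)
    for candidate in fallback_chain(tag, default):
        if candidate in supported:
            return candidate
    return default


print(fallback_chain("zh-Hant-HK"))
# → ['zh-Hant-HK', 'zh-Hant', 'zh', 'en']
print(resolve("zh-Hant-HK", ["zh-Hant", "en"]))
# → zh-Hant
```

Plain truncation works best when the requested tag already carries a script subtag. A tag such as zh-HK truncates straight to zh, which may resolve to content in a different script than the user expects.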
Working with Locales in Code
JavaScript - Parsing and Validating Locales
// Check whether a locale tag is structurally valid (well-formed BCP 47).
// Note: this does not tell you whether the runtime has data for that locale.
function isValidLocale(locale) {
  try {
    new Intl.NumberFormat(locale);
    return true;
  } catch (e) {
    return false; // RangeError for malformed tags
  }
}
console.log(isValidLocale('en-US')); // → true
console.log(isValidLocale('en_US')); // → false (underscore is not allowed)

// Get the canonical form of a locale
const canonical = Intl.getCanonicalLocales('EN-us')[0];
console.log(canonical); // → "en-US" (normalized)

// Parse locale components (simplified: language, optional script, optional region)
function parseLocale(tag) {
  const [language, ...rest] = tag.split('-');
  return {
    language: language.toLowerCase(),
    script: rest[0]?.length === 4 ? rest[0] : undefined,
    // Skip the language subtag so that "zh" is not mistaken for a region
    region: rest.find(p => /^([a-z]{2}|\d{3})$/i.test(p))?.toUpperCase(),
  };
}
console.log(parseLocale('zh-Hans-CN'));
// → { language: 'zh', script: 'Hans', region: 'CN' }

// Using Intl.Locale (modern browsers)
const locale = new Intl.Locale('zh-Hans-CN');
console.log(locale.language); // → "zh"
console.log(locale.script);   // → "Hans"
console.log(locale.region);   // → "CN"
console.log(locale.baseName); // → "zh-Hans-CN"
Python - Working with Babel Locales
from babel import Locale, UnknownLocaleError, negotiate_locale
from babel.localedata import locale_identifiers

# Parse a locale
try:
    locale = Locale.parse('zh_Hans_CN', sep='_')
    print(f"Language: {locale.language}")    # → zh
    print(f"Script: {locale.script}")        # → Hans
    print(f"Territory: {locale.territory}")  # → CN
    # english_name is in English; display_name is in the locale's own language
    print(f"English name: {locale.english_name}")  # → Chinese (Simplified, China)
except (UnknownLocaleError, ValueError):
    # UnknownLocaleError: no data for the locale; ValueError: malformed identifier
    print("Invalid locale")

# Get the display name of a locale in a given language
locale = Locale.parse('fr_CA')
print(locale.get_display_name('en'))  # → "French (Canada)"
print(locale.get_display_name('fr'))  # → "français (Canada)"

# List all available locales
all_locales = locale_identifiers()
print(f"Available locales: {len(all_locales)}")
# → e.g. 700+; the exact count depends on the Babel/CLDR version

# Locale negotiation
supported = ['en_US', 'es_ES', 'fr_FR']
user_prefs = ['de_DE', 'en_GB', 'en']
best = negotiate_locale(user_prefs, supported, sep='_')
print(best)  # → "en_US" (matched through the bare "en" preference)
Special Cases & Edge Scenarios
Script Matters for Some Languages
Chinese has two writing systems:
- zh-Hans → Simplified Chinese (Mainland China, Singapore)
- zh-Hant → Traditional Chinese (Taiwan, Hong Kong, Macau)
Using just zh is ambiguous and can lead to displaying the wrong script!
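One way to avoid the ambiguity is to add the script subtag whenever a Chinese tag arrives without one. The sketch below uses a small hand-written lookup table built from the regions listed above; the helper name `add_chinese_script` and the choice of `Hans` as the default for a bare `zh` are assumptions for illustration. Real applications usually take this mapping from CLDR "likely subtags" data rather than hard-coding it.

```python
# Hypothetical lookup table for illustration only.
_ZH_REGION_TO_SCRIPT = {
    "CN": "Hans", "SG": "Hans",
    "TW": "Hant", "HK": "Hant", "MO": "Hant",
}


def add_chinese_script(tag: str, default_script: str = "Hans") -> str:
    """Insert Hans/Hant into a zh tag that has no script subtag."""
    parts = tag.split("-")
    if parts[0].lower() != "zh":
        return tag  # not Chinese: leave untouched
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        return tag  # script already present
    # The region, if any, is a two-letter subtag after the language.
    region = next(
        (p.upper() for p in parts[1:] if len(p) == 2 and p.isalpha()), None
    )
    script = _ZH_REGION_TO_SCRIPT.get(region, default_script)
    return "-".join(["zh", script] + parts[1:])


print(add_chinese_script("zh-TW"))   # → zh-Hant-TW
print(add_chinese_script("zh-CN"))   # → zh-Hans-CN
print(add_chinese_script("zh"))      # → zh-Hans
```

Normalizing incoming tags this way before matching means a request for zh-TW is compared against zh-Hant content instead of falling through to a bare zh entry.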
Language Without Region
Sometimes you have only en without a region. What should you do?
- Option 1: Use a sensible default (e.g., en → en-US)
- Option 2: Use language-only formatting (may not be culturally appropriate)
- Option 3: Detect region from IP/browser and complete the locale
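Options 1 and 3 can be combined in one helper: prefer a detected region when one is available, otherwise use a per-language default. This is a rough sketch; `complete_locale`, `region_hint`, and the `DEFAULT_REGION` table are hypothetical, and the code assumes tags made of a language and an optional script only (no variants).

```python
from typing import Optional

# Hypothetical defaults for illustration; choose values that fit your user base.
DEFAULT_REGION = {"en": "US", "es": "ES", "fr": "FR", "pt": "BR"}


def complete_locale(tag: str, region_hint: Optional[str] = None) -> str:
    """Append a region to a tag that lacks one, if a sensible one is known."""
    parts = tag.split("-")
    has_region = any(
        (len(p) == 2 and p.isalpha()) or (len(p) == 3 and p.isdigit())
        for p in parts[1:]
    )
    if has_region:
        return tag
    # Option 3: a region detected elsewhere (IP, browser) wins over the default.
    region = region_hint or DEFAULT_REGION.get(parts[0].lower())
    return f"{tag}-{region.upper()}" if region else tag


print(complete_locale("en"))                    # → en-US
print(complete_locale("en", region_hint="gb"))  # → en-GB
print(complete_locale("ja"))                    # → ja
```

When neither a hint nor a default exists, the tag is returned unchanged, which corresponds to Option 2 (language-only formatting).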
Format Separators: Underscore vs Hyphen
Different systems use different separators:
- BCP 47 / IETF / JavaScript: en-US (hyphen)
- POSIX / Python / Java: en_US (underscore)
Be prepared to convert between formats: en-US ↔ en_US
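A conversion helper might look like the following sketch. The function names `to_bcp47` and `to_posix` are hypothetical. The code assumes simple identifiers (language, optional script, optional region) and silently drops POSIX suffixes such as `.UTF-8` or `@euro` instead of translating them.

```python
def to_bcp47(identifier: str) -> str:
    """Convert e.g. 'en_us' or 'zh_hans_cn' to 'en-US' / 'zh-Hans-CN'."""
    # Drop POSIX codeset and modifier suffixes (".UTF-8", "@euro").
    base = identifier.split(".")[0].split("@")[0]
    parts = base.replace("_", "-").split("-")
    out = [parts[0].lower()]               # language: lowercase
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            out.append(part.title())       # script: Title Case
        elif len(part) == 2 and part.isalpha():
            out.append(part.upper())       # region: UPPERCASE
        else:
            out.append(part.lower())       # numeric region or variant
    return "-".join(out)


def to_posix(tag: str) -> str:
    """Convert e.g. 'pt-br' to 'pt_BR' (underscore form, same casing rules)."""
    return to_bcp47(tag).replace("-", "_")


print(to_bcp47("zh_hans_cn"))   # → zh-Hans-CN
print(to_bcp47("de_DE.UTF-8"))  # → de-DE
print(to_posix("pt-br"))        # → pt_BR
```

Converting at the boundary (for example, hyphens in HTTP headers and URLs, underscores when calling an underscore-based library) keeps the rest of the application on a single format.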
Don't Use Locale for Authorization
Never assume that locale = "de-DE" means the user is in Germany or should see Germany-specific content. Users can set any locale regardless of location. Use separate mechanisms for:
- Locale: Formatting preferences (how to display data)
- Location/Region: What content/features to show (geo-restrictions, pricing)
Best Practices Checklist
| Practice | Priority |
|---|---|
| Use BCP 47 format for language tags (en-US, not en_US in APIs) | CRITICAL |
| Always include both language AND region (en-US, not just en) | HIGH |
| Implement locale fallback chain (zh-Hant-HK → zh-Hant → zh → en) | CRITICAL |
| Let users explicitly choose their locale (don't just auto-detect) | HIGH |
| Validate locale codes before using them | HIGH |
| Store user's locale preference in profile/session | MEDIUM |
| Use script subtag for Chinese (zh-Hans vs zh-Hant) | CRITICAL |
| Don't confuse locale with user location/authorization | CRITICAL |
| Test locale detection with various browser/header configurations | HIGH |
Additional Resources
- BCP 47: RFC 5646 - Tags for Identifying Languages
- IANA Language Subtag Registry: Official registry of language codes
- ISO 639: Language codes standard
- ISO 3166-1: Country/region codes standard
- ISO 15924: Script codes standard
- Unicode CLDR: Common Locale Data Repository
- Intl.Locale (JavaScript): MDN documentation
- Babel (Python): Locale handling library