# 🏷️ | Language & Region Identifiers

## 🎯 Learning Objectives

- Understand locale identifiers and their structure
- Master BCP 47 language tags and IETF standards
- Distinguish between language, region, and script
- Implement proper locale detection and fallback
- Handle special cases and edge scenarios

## 📖 What is a Locale?

<span style="white-space: pre-wrap;">A </span>**locale**<span style="white-space: pre-wrap;"> is a set of parameters that defines the user's language, region, and cultural preferences. It determines how your application formats numbers, dates, and currency, and how it displays text.</span>

### Locale Components

#### Language

The primary language being used (e.g., English, Spanish, Japanese)

`<span class="editor-theme-code">en, es, ja, zh</span>`

#### Region/Territory

The country or region affecting formats and conventions

`<span class="editor-theme-code">US, GB, CN, BR</span>`

#### Script (Optional)

The writing system used for the language

`<span class="editor-theme-code">Latn, Cyrl, Arab, Hans</span>`

#### Variant (Rare)

Specific dialectal or orthographic variations

`<span class="editor-theme-code">valencia, pinyin</span>`

## 🔤 BCP 47 Language Tags

**BCP 47**<span style="white-space: pre-wrap;"> (Best Current Practice 47) is the IETF standard for language tags. It defines how to construct identifiers that specify language, region, script, and variants.</span>

### BCP 47 Tag Structure

**language**-**Script**-**REGION**-**variant**

<span style="white-space: pre-wrap;">Example: </span>`<span class="editor-theme-code">zh-Hans-CN</span>`<span style="white-space: pre-wrap;"> = Chinese (Simplified script) as used in China</span>

<table id="bkmrk-componentformatstand" style="width: 100%; border-collapse: collapse; margin-top: 1rem;"><colgroup><col></col><col></col><col></col><col></col></colgroup><tbody><tr style="background: rgb(243, 156, 18); color: white;"><th class="align-left" style="padding: 0.75rem; text-align: left; border: 1px solid rgb(221, 221, 221);">Component

</th><th class="align-left" style="padding: 0.75rem; text-align: left; border: 1px solid rgb(221, 221, 221);">Format

</th><th class="align-left" style="padding: 0.75rem; text-align: left; border: 1px solid rgb(221, 221, 221);">Standard

</th><th class="align-left" style="padding: 0.75rem; text-align: left; border: 1px solid rgb(221, 221, 221);">Example

</th></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">**Language**

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">2-3 lowercase letters

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">ISO 639

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">en, es, zh, ar</span>`

</td></tr><tr style="background: rgb(248, 249, 250);"><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">**Script**

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">4 letters, title case

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">ISO 15924

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">Latn, Cyrl, Arab, Hans</span>`

</td></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">**Region**

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">2 uppercase letters or 3 digits

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">ISO 3166-1 (letters) / UN M.49 (digits)

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">US, GB, CN, 001</span>`

</td></tr><tr style="background: rgb(248, 249, 250);"><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">**Variant**

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">5-8 alphanumerics, or 4 starting with a digit

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">IANA registry

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">valencia, 1996</span>`

</td></tr></tbody></table>
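
The format rules in the table can be turned into a small parser. The following is a minimal sketch using only the Python standard library; the names `_TAG_RE` and `parse_tag` are illustrative, and the pattern deliberately ignores extensions, private-use subtags, and grandfathered tags, so treat it as a teaching aid rather than a full BCP 47 implementation.

```python
import re

# Simplified shape: language[-Script][-REGION][-variant ...]
# Assumption: extensions (-u-...), private use (-x-...), and
# grandfathered tags are out of scope for this sketch.
_TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|\d{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|\d[A-Za-z0-9]{3}))*)$"
)


def parse_tag(tag):
    """Split a language tag into normalized components, or return None."""
    match = _TAG_RE.match(tag)
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    return {
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        "variants": [v.lower() for v in match.group("variants").split("-") if v],
    }


print(parse_tag("zh-hans-cn"))
print(parse_tag("ca-ES-valencia"))
print(parse_tag("en_US"))  # underscore is not a valid separator here
```

The function also normalizes case to the conventional form (lowercase language, title-case script, uppercase region), which makes later comparisons simpler.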

### Common BCP 47 Examples

<table id="bkmrk-bcp-47-tagdescriptio" style="width: 100%; border-collapse: collapse;"><colgroup><col></col><col></col><col></col></colgroup><tbody><tr style="background: rgb(248, 249, 250);"><th class="align-left" style="padding: 0.75rem; text-align: left; border: 1px solid rgb(221, 221, 221);">BCP 47 Tag

</th><th class="align-left" style="padding: 0.75rem; text-align: left; border: 1px solid rgb(221, 221, 221);">Description

</th><th class="align-left" style="padding: 0.75rem; text-align: left; border: 1px solid rgb(221, 221, 221);">Use Case

</th></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">en-US</span>`

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">English (United States)

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">MM/DD/YYYY, $ before amount

</td></tr><tr style="background: rgb(248, 249, 250);"><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">en-GB</span>`

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">English (United Kingdom)

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">DD/MM/YYYY, £ before amount

</td></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">es-ES</span>`

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Spanish (Spain)

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Uses € and European Spanish

</td></tr><tr style="background: rgb(248, 249, 250);"><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">es-MX</span>`

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Spanish (Mexico)

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Uses $ and Mexican Spanish

</td></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">zh-Hans-CN</span>`

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Chinese (Simplified, China)

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Simplified characters, Mainland

</td></tr><tr style="background: rgb(248, 249, 250);"><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">zh-Hant-TW</span>`

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Chinese (Traditional, Taiwan)

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Traditional characters, Taiwan

</td></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">ar-SA</span>`

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Arabic (Saudi Arabia)

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">RTL layout, often Arabic-Indic digits (٠١٢٣), Saudi riyal

</td></tr><tr style="background: rgb(248, 249, 250);"><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">pt-BR</span>`

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Portuguese (Brazil)

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Brazilian Portuguese, R$ currency

</td></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">pt-PT</span>`

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Portuguese (Portugal)

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">European Portuguese, € currency

</td></tr><tr style="background: rgb(248, 249, 250);"><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">`<span class="editor-theme-code">fr-CA</span>`

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">French (Canada)

</td><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">Canadian French, $ currency

</td></tr></tbody></table>

#### ✅ Why Both Language AND Region Matter

**Same language, different formats:**

- `<span class="editor-theme-code">en-US</span>`<span style="white-space: pre-wrap;"> vs </span>`<span class="editor-theme-code">en-GB</span>`: "color" vs "colour", $ vs £, MM/DD vs DD/MM
- `<span class="editor-theme-code">es-ES</span>`<span style="white-space: pre-wrap;"> vs </span>`<span class="editor-theme-code">es-MX</span>`: € vs $, "ordenador" vs "computadora"
- `<span class="editor-theme-code">fr-FR</span>`<span style="white-space: pre-wrap;"> vs </span>`<span class="editor-theme-code">fr-CA</span>`: € vs $, some vocabulary differences
- `<span class="editor-theme-code">pt-BR</span>`<span style="white-space: pre-wrap;"> vs </span>`<span class="editor-theme-code">pt-PT</span>`: Significant spelling and vocabulary differences

## 🔍 Locale Detection &amp; Fallback

Several signals can indicate a user's locale. Check them in priority order and stop at the first one that yields a locale your application supports.

### Locale Detection Priority

1. **User Preference (Explicit Setting):**<span style="white-space: pre-wrap;"> Highest priority — user has explicitly selected their locale in settings</span>
2. **URL Parameter:**<span style="white-space: pre-wrap;"> </span>`<span class="editor-theme-code">?lang=en-GB</span>`<span style="white-space: pre-wrap;"> or </span>`<span class="editor-theme-code">/en-gb/page</span>`<span style="white-space: pre-wrap;"> — useful for switching without login</span>
3. **Cookie/Session:**<span style="white-space: pre-wrap;"> Previously saved preference from this device</span>
4. **Browser Accept-Language Header:**<span style="white-space: pre-wrap;"> </span>`<span class="editor-theme-code">Accept-Language: en-US,en;q=0.9,es;q=0.8</span>`
5. **IP Geolocation:**<span style="white-space: pre-wrap;"> Infer from user's location (least reliable, privacy concerns)</span>
6. **Default Fallback:**<span style="white-space: pre-wrap;"> Your application's default locale (usually </span>`<span class="editor-theme-code">en-US</span>`)
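
Step 4 in the list above depends on reading the quality weights (`q=`) in the `Accept-Language` header. As a rough sketch of that step, the function below orders the header's language ranges from most to least preferred. `parse_accept_language` is an illustrative name, and the handling of malformed weights is an assumption; web frameworks usually ship their own parser, which is preferable in production.

```python
def parse_accept_language(header):
    """Return the language ranges of an Accept-Language value, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        quality = 1.0  # a range without a q parameter has weight 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # assumption: drop ranges with a malformed weight
        if quality > 0:  # q=0 means "not acceptable"
            # Sort by descending weight, then by original position
            ranked.append((-quality, position, tag))
    ranked.sort()
    return [tag for _, _, tag in ranked]


print(parse_accept_language("en-US,en;q=0.9,es;q=0.8"))
# → ['en-US', 'en', 'es']
```

The resulting list can be passed directly as the user-preference argument of a matching function.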

### Locale Detection Code Examples

#### JavaScript (Browser)

```
// Get browser's locale preferences
const userLocales = navigator.languages || [navigator.language];
console.log(userLocales);
// → ["en-US", "en", "es"]

// Get primary locale
const primaryLocale = navigator.language;
console.log(primaryLocale);
// → "en-US"

// Detect and use best available locale
function getBestLocale(supportedLocales, userPreferences) {
  // Try exact match first
  for (const userLocale of userPreferences) {
    if (supportedLocales.includes(userLocale)) {
      return userLocale;
    }
  }
  
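  // Try language-only match (en-GB → en-US). Compare the whole language
  // subtag so that "fi" does not accidentally match "fil-PH".
  for (const userLocale of userPreferences) {
    const language = userLocale.split('-')[0].toLowerCase();
    const match = supportedLocales.find(
      l => l.split('-')[0].toLowerCase() === language
    );
    if (match) return match;
  }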
  
  // Fallback to default
  return supportedLocales[0];
}

const supported = ['en-US', 'es-ES', 'fr-FR', 'de-DE'];
const userPrefs = ['en-GB', 'en', 'es'];
const bestLocale = getBestLocale(supported, userPrefs);
console.log(bestLocale);  // → "en-US" (language match)
```

#### Python (Server-side)

```
from babel import negotiate_locale
from flask import request, session

# Supported locales in your application (Babel-style, underscore-separated)
SUPPORTED_LOCALES = ['en_US', 'es_ES', 'fr_FR', 'de_DE', 'ja_JP']
DEFAULT_LOCALE = 'en_US'

def get_user_locale():
    # 1. Check user's explicit preference (from database/session)
    user_pref = session.get('locale')
    if user_pref in SUPPORTED_LOCALES:
        return user_pref

    # 2. Check URL parameter (?lang=en-US); convert to the underscore form
    url_locale = request.args.get('lang', '').replace('-', '_')
    if url_locale in SUPPORTED_LOCALES:
        return url_locale

    # 3. Negotiate from Accept-Language header. values() yields the tags
    #    best first; they use hyphens, so convert them as well.
    header_locales = [
        tag.replace('-', '_') for tag in request.accept_languages.values()
    ]
    best_match = negotiate_locale(header_locales, SUPPORTED_LOCALES, sep='_')
    if best_match:
        return best_match

    # 4. Fallback to default
    return DEFAULT_LOCALE

# Usage: call inside a request context, e.g. from a view function
# locale = get_user_locale()
# print(f"Using locale: {locale}")
```

#### ⚠️ Locale Fallback Chain

<span style="white-space: pre-wrap;">Always implement a </span>**fallback chain**<span style="white-space: pre-wrap;">. If you don't have </span>`<span class="editor-theme-code">zh-Hant-HK</span>`<span style="white-space: pre-wrap;"> (Traditional Chinese, Hong Kong), try falling back to:</span>

1. `<span class="editor-theme-code">zh-Hant-HK</span>`<span style="white-space: pre-wrap;"> (exact match) → not available</span>
2. `<span class="editor-theme-code">zh-Hant</span>`<span style="white-space: pre-wrap;"> (language + script) → try this</span>
3. `<span class="editor-theme-code">zh</span>`<span style="white-space: pre-wrap;"> (language only) → then this</span>
4. `<span class="editor-theme-code">en</span>`<span style="white-space: pre-wrap;"> (default language) → final fallback</span>
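
The chain above can be generated by dropping subtags from the right, one at a time. The sketch below shows that idea with two illustrative helpers, `fallback_chain` and `resolve_locale`; it assumes plain language-script-region tags and skips the special handling that RFC 4647 lookup applies to extension and private-use subtags.

```python
def fallback_chain(tag, default="en"):
    """List candidate tags from most to least specific, ending with default."""
    parts = tag.split("-")
    chain = ["-".join(parts[:end]) for end in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def resolve_locale(tag, available, default="en"):
    """Return the first candidate in the chain that the application supports."""
    for candidate in fallback_chain(tag, default):
        if candidate in available:
            return candidate
    return default


print(fallback_chain("zh-Hant-HK"))
# → ['zh-Hant-HK', 'zh-Hant', 'zh', 'en']
print(resolve_locale("zh-Hant-HK", {"zh-Hant", "en"}))
# → zh-Hant
```

Keeping chain generation separate from resolution makes it easy to log which candidates were tried when a user ends up on the default.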

## 💻 Working with Locales in Code

#### JavaScript - Parsing and Validating Locales

```
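// Check if a tag is structurally valid (well-formed BCP 47).
// Note: this does not tell you whether locale data exists for it;
// any well-formed tag passes, including made-up ones.
function isValidLocale(locale) {
  try {
    Intl.getCanonicalLocales(locale);
    return true;
  } catch (e) {
    return false;  // RangeError for malformed tags
  }
}

console.log(isValidLocale('en-US'));     // → true
console.log(isValidLocale('en_US'));     // → false (underscore is not allowed)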

// Get canonical locale
const canonical = Intl.getCanonicalLocales('EN-us')[0];
console.log(canonical);  // → "en-US" (normalized)

// Parse locale components
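function parseLocale(tag) {
  // Look for script and region only after the language subtag;
  // otherwise a two-letter language such as "zh" is mistaken for a region.
  const [language, ...rest] = tag.split('-');
  return {
    language: language.toLowerCase(),
    // Script: four letters (e.g. Hans); region: two letters or three digits
    script: rest.find(p => /^[a-z]{4}$/i.test(p)),
    region: rest.find(p => /^([a-z]{2}|\d{3})$/i.test(p))?.toUpperCase(),
  };
}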

console.log(parseLocale('zh-Hans-CN'));
// → { language: 'zh', script: 'Hans', region: 'CN' }

// Using Intl.Locale (modern browsers)
const locale = new Intl.Locale('zh-Hans-CN');
console.log(locale.language);   // → "zh"
console.log(locale.script);     // → "Hans"
console.log(locale.region);     // → "CN"
console.log(locale.baseName);   // → "zh-Hans-CN"
```

#### Python - Working with Babel Locales

```
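from babel import Locale, UnknownLocaleError

# Parse locale
try:
    locale = Locale.parse('zh_Hans_CN', sep='_')
    print(f"Language: {locale.language}")      # → zh
    print(f"Script: {locale.script}")          # → Hans
    print(f"Territory: {locale.territory}")    # → CN
    # english_name is the English display name; display_name would return
    # the name in the locale's own language instead
    print(f"English name: {locale.english_name}")  # → Chinese (Simplified, China)
except (ValueError, UnknownLocaleError):
    # ValueError: malformed identifier; UnknownLocaleError: no data for it
    print("Invalid locale")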

# Get English name for locale
locale = Locale.parse('fr_CA')
print(locale.get_display_name('en'))  # → "French (Canada)"
print(locale.get_display_name('fr'))  # → "français (Canada)"

# List all available locales
from babel.localedata import locale_identifiers
all_locales = locale_identifiers()
print(f"Available locales: {len(all_locales)}")
# → several hundred; the exact count depends on the installed Babel/CLDR version

# Locale negotiation
from babel import negotiate_locale

supported = ['en_US', 'es_ES', 'fr_FR']
user_prefs = ['de_DE', 'en_GB', 'en']
best = negotiate_locale(user_prefs, supported, sep='_')
print(best)  # → "en_US" (Babel's built-in alias maps en → en_US)
```

## 🌐 Special Cases &amp; Edge Scenarios

#### 🔤 Script Matters for Some Languages

**Chinese**<span style="white-space: pre-wrap;"> is commonly written in two script variants:</span>

- `<span class="editor-theme-code">zh-Hans</span>`<span style="white-space: pre-wrap;"> — Simplified Chinese (Mainland China, Singapore)</span>
- `<span class="editor-theme-code">zh-Hant</span>`<span style="white-space: pre-wrap;"> — Traditional Chinese (Taiwan, Hong Kong, Macau)</span>

<span style="white-space: pre-wrap;">Using just </span>`<span class="editor-theme-code">zh</span>`<span style="white-space: pre-wrap;"> is ambiguous and can lead to displaying the wrong script!</span>
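
When a tag arrives without a script but with a region, the script can often be inferred from the two bullets above. The following sketch does exactly that; the mapping table and the name `add_chinese_script` are assumptions for illustration, and a bare `zh` is intentionally left unchanged because it remains ambiguous.

```python
# Assumption: region-to-script defaults taken from the two bullets above.
_ZH_SCRIPT_BY_REGION = {
    "CN": "Hans", "SG": "Hans",
    "TW": "Hant", "HK": "Hant", "MO": "Hant",
}


def add_chinese_script(tag):
    """Insert Hans/Hant into a zh tag when the region makes the script clear."""
    parts = tag.split("-")
    if parts[0].lower() != "zh":
        return tag
    # A four-letter second subtag means the script is already present
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        return tag
    region = next(
        (p.upper() for p in parts[1:] if len(p) == 2 and p.isalpha()), None
    )
    script = _ZH_SCRIPT_BY_REGION.get(region)
    if script is None:
        return tag  # still ambiguous; let the caller decide
    return "-".join([parts[0], script] + parts[1:])


print(add_chinese_script("zh-TW"))  # → zh-Hant-TW
print(add_chinese_script("zh-CN"))  # → zh-Hans-CN
print(add_chinese_script("zh"))     # → zh
```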

#### 🌍 Language Without Region

<span style="white-space: pre-wrap;">Sometimes you have only </span>`<span class="editor-theme-code">en</span>`<span style="white-space: pre-wrap;"> without a region. What should you do?</span>

- **Option 1:**<span style="white-space: pre-wrap;"> Use a sensible default (e.g., </span>`<span class="editor-theme-code">en</span>`<span style="white-space: pre-wrap;"> → </span>`<span class="editor-theme-code">en-US</span>`)
- **Option 2:**<span style="white-space: pre-wrap;"> Use language-only formatting (may not be culturally appropriate)</span>
- **Option 3:**<span style="white-space: pre-wrap;"> Detect region from IP/browser and complete the locale</span>
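
Options 1 and 3 can be combined: prefer a detected region when one is available, otherwise use a per-language default. The sketch below assumes a hypothetical `DEFAULT_REGION` table that you would tailor to your own audience, and it only handles inputs of the form `language` or `language-Script`.

```python
# Hypothetical defaults; choose values that match your user base.
DEFAULT_REGION = {"en": "US", "es": "ES", "pt": "BR", "fr": "FR"}


def complete_locale(tag, detected_region=None):
    """Append a region to a tag that lacks one, if a sensible one is known."""
    parts = tag.split("-")
    has_region = any(
        (len(p) == 2 and p.isalpha()) or (len(p) == 3 and p.isdigit())
        for p in parts[1:]
    )
    if has_region:
        return tag
    # Option 3 (detected region) takes precedence over Option 1 (default)
    region = detected_region or DEFAULT_REGION.get(parts[0].lower())
    return f"{tag}-{region.upper()}" if region else tag


print(complete_locale("en"))                        # → en-US
print(complete_locale("en", detected_region="gb"))  # → en-GB
print(complete_locale("ja"))                        # → ja (no default configured)
```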

#### 🔄 Format Separators: Underscore vs Hyphen

Different systems use different separators:

- **BCP 47 / IETF / JavaScript:**<span style="white-space: pre-wrap;"> </span>`<span class="editor-theme-code">en-US</span>`<span style="white-space: pre-wrap;"> (hyphen)</span>
- **POSIX / Python / Java:**<span style="white-space: pre-wrap;"> </span>`<span class="editor-theme-code">en_US</span>`<span style="white-space: pre-wrap;"> (underscore)</span>

<span style="white-space: pre-wrap;">Be prepared to convert between formats: </span>`<span class="editor-theme-code">en-US</span>`<span style="white-space: pre-wrap;"> ↔ </span>`<span class="editor-theme-code">en_US</span>`
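
A conversion in both directions can be as small as the sketch below. The function names are illustrative. Note one assumption: POSIX identifiers may carry an encoding (`.UTF-8`) or a modifier (`@latin`), and this sketch simply drops them, which loses information in cases where the modifier encodes a script.

```python
def to_bcp47(identifier):
    """Convert 'en_US' or 'en_US.UTF-8' to the hyphenated form 'en-US'."""
    # Drop the optional encoding and modifier parts of a POSIX identifier
    base = identifier.split(".", 1)[0].split("@", 1)[0]
    return base.replace("_", "-")


def to_underscore(tag):
    """Convert 'en-US' to the underscore form 'en_US'."""
    return tag.replace("-", "_")


print(to_bcp47("en_US.UTF-8"))      # → en-US
print(to_underscore("zh-Hans-CN"))  # → zh_Hans_CN
```

Doing the conversion once at the boundary (request parsing, database access) keeps a single format inside the application.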

#### ⚠️ Don't Use Locale for Authorization

**Never assume**<span style="white-space: pre-wrap;"> that </span>`<span class="editor-theme-code">locale = "de-DE"</span>`<span style="white-space: pre-wrap;"> means the user is in Germany or should see Germany-specific content. Users can set any locale regardless of location. Use separate mechanisms for:</span>

- **Locale:**<span style="white-space: pre-wrap;"> Formatting preferences (how to display data)</span>
- **Location/Region:**<span style="white-space: pre-wrap;"> What content/features to show (geo-restrictions, pricing)</span>

## 🎯 Best Practices Checklist

<table id="bkmrk-practicepriority%E2%9C%85-us" style="width: 100%; border-collapse: collapse;"><colgroup><col style="width: 60%;"></col><col></col></colgroup><tbody><tr style="background: rgb(243, 156, 18); color: white;"><th class="align-left" style="padding: 0.75rem; text-align: left; border: 1px solid rgb(221, 221, 221); width: 60%;">Practice

</th><th class="align-center" style="padding: 0.75rem; text-align: center; border: 1px solid rgb(221, 221, 221);">Priority

</th></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">✅ Use BCP 47 format for language tags (en-US, not en\_US in APIs)

</td><td class="align-center" style="background: rgb(231, 76, 60); padding: 0.75rem; border: 1px solid rgb(221, 221, 221); text-align: center; color: white;">**CRITICAL**

</td></tr><tr style="background: rgb(248, 249, 250);"><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">✅ Always include both language AND region (en-US, not just en)

</td><td class="align-center" style="background: rgb(243, 156, 18); padding: 0.75rem; border: 1px solid rgb(221, 221, 221); text-align: center; color: white;">**HIGH**

</td></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">✅ Implement locale fallback chain (zh-Hant-HK → zh-Hant → zh → en)

</td><td class="align-center" style="background: rgb(231, 76, 60); padding: 0.75rem; border: 1px solid rgb(221, 221, 221); text-align: center; color: white;">**CRITICAL**

</td></tr><tr style="background: rgb(248, 249, 250);"><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">✅ Let users explicitly choose their locale (don't just auto-detect)

</td><td class="align-center" style="background: rgb(243, 156, 18); padding: 0.75rem; border: 1px solid rgb(221, 221, 221); text-align: center; color: white;">**HIGH**

</td></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">✅ Validate locale codes before using them

</td><td class="align-center" style="background: rgb(243, 156, 18); padding: 0.75rem; border: 1px solid rgb(221, 221, 221); text-align: center; color: white;">**HIGH**

</td></tr><tr style="background: rgb(248, 249, 250);"><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">✅ Store user's locale preference in profile/session

</td><td class="align-center" style="background: rgb(52, 152, 219); padding: 0.75rem; border: 1px solid rgb(221, 221, 221); text-align: center; color: white;">**MEDIUM**

</td></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">✅ Use script subtag for Chinese (zh-Hans vs zh-Hant)

</td><td class="align-center" style="background: rgb(231, 76, 60); padding: 0.75rem; border: 1px solid rgb(221, 221, 221); text-align: center; color: white;">**CRITICAL**

</td></tr><tr style="background: rgb(248, 249, 250);"><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">✅ Don't confuse locale with user location/authorization

</td><td class="align-center" style="background: rgb(231, 76, 60); padding: 0.75rem; border: 1px solid rgb(221, 221, 221); text-align: center; color: white;">**CRITICAL**

</td></tr><tr><td style="padding: 0.75rem; border: 1px solid rgb(221, 221, 221);">✅ Test locale detection with various browser/header configurations

</td><td class="align-center" style="background: rgb(243, 156, 18); padding: 0.75rem; border: 1px solid rgb(221, 221, 221); text-align: center; color: white;">**HIGH**

</td></tr></tbody></table>

## 📚 Additional Resources

- **BCP 47:**<span style="white-space: pre-wrap;"> </span>[RFC 5646 (Tags for Identifying Languages) and RFC 4647 (Matching of Language Tags)](https://www.rfc-editor.org/rfc/bcp/bcp47.txt)
- **IANA Language Subtag Registry:**<span style="white-space: pre-wrap;"> Official registry of language codes</span>
- **ISO 639:**<span style="white-space: pre-wrap;"> Language codes standard</span>
- **ISO 3166-1:**<span style="white-space: pre-wrap;"> Country/region codes standard</span>
- **ISO 15924:**<span style="white-space: pre-wrap;"> Script codes standard</span>
- **Unicode CLDR:**<span style="white-space: pre-wrap;"> Common Locale Data Repository</span>
- **Intl.Locale (JavaScript):**<span style="white-space: pre-wrap;"> MDN documentation</span>
- **Babel (Python):**<span style="white-space: pre-wrap;"> Locale handling library</span>
