UTF-8 vs CP949: Why Korean Text Gets Garbled
5 min read · Last updated: 2026-05-08
What is character encoding?
Computers store characters as numbers (bytes). A character encoding is the mapping that defines which number corresponds to which character. The same bytes will display as completely different characters if read with the wrong encoding.
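As a small illustration of that last point, the sketch below decodes one fixed byte sequence under two different encodings. The byte values and the choice of Latin-1 as the "wrong" encoding are assumptions made for the example, not something taken from a particular file.

```python
# Four bytes, chosen for illustration: the CP949 encoding of "한글".
raw = bytes([0xC7, 0xD1, 0xB1, 0xDB])

# Same bytes, two different mappings from numbers to characters.
as_cp949 = raw.decode("cp949")     # Korean code page
as_latin1 = raw.decode("latin-1")  # Western European code page

print(as_cp949)   # 한글
print(as_latin1)  # ÇÑ±Û
```

Nothing about the bytes changes between the two calls; only the mapping used to interpret them does.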
UTF-8 vs CP949
| Attribute | UTF-8 | CP949 (extended EUC-KR) |
|---|---|---|
| Coverage | All Unicode characters worldwide | Korean (Hangul, Hanja) plus ASCII and a limited set of other letters and symbols |
| Korean bytes/char | 3 bytes | 2 bytes |
| Standard | Unicode (international) | Microsoft extension |
| Typical environment | Web, Linux, macOS | Legacy Windows apps |
| BOM | Optional | None |
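A few rows of the table can be checked directly with Python's built-in codecs. This is a minimal sketch; the sample strings and variable names are chosen here for illustration.

```python
text = "한글"  # two Hangul syllables

utf8_bytes = text.encode("utf-8")
cp949_bytes = text.encode("cp949")
print(len(utf8_bytes), len(cp949_bytes))  # 6 4 (3 vs 2 bytes per syllable)

# Python's "utf-8-sig" codec writes the optional UTF-8 BOM (EF BB BF).
has_bom = text.encode("utf-8-sig").startswith(b"\xef\xbb\xbf")

# Coverage: an emoji is valid Unicode but has no CP949 code point.
try:
    "😀".encode("cp949")
    emoji_fits_cp949 = True
except UnicodeEncodeError:
    emoji_fits_cp949 = False

print(has_bom, emoji_fits_cp949)  # True False
```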
Why does Mojibake (garbled text) occur?
When a CP949-encoded file is opened as UTF-8, the 2-byte Korean sequences are usually not valid UTF-8, so the decoder substitutes replacement characters (typically shown as � or ?). This kind of garbling is called mojibake (文字化け).
Example: "가" in CP949 is 0xB0 0xA1. Neither byte can start a UTF-8 sequence, so a UTF-8 decoder replaces each one with a replacement character.
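The same example can be reproduced in Python. This sketch assumes the standard `errors="replace"` behaviour of the built-in UTF-8 decoder, which substitutes U+FFFD for each invalid byte; editors and terminals may render the failure differently.

```python
raw = "가".encode("cp949")
print(raw.hex(" "))  # b0 a1

# Lenient decoding: each invalid byte becomes U+FFFD (the replacement character).
garbled = raw.decode("utf-8", errors="replace")
print(garbled)  # two replacement characters

# Strict decoding (the default) refuses the bytes instead of guessing.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(strict_ok)  # False
```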
Detecting and converting encodings
- Check encoding: Most text editors show the current encoding in the status bar. On Linux/macOS, the `file` command or Python's `chardet` library can detect it programmatically.
- Convert: Editors typically offer "Save with encoding" options. On the command line: `iconv -f CP949 -t UTF-8 input.txt > output.txt`.
- Web development: The `<meta charset="UTF-8">` declaration must match the actual file encoding, otherwise browsers will misinterpret the file.
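The check-then-convert workflow above can also be sketched with only the Python standard library. The helper names `decode_korean` and `convert_to_utf8` are hypothetical, and the detection is a simple heuristic that assumes the file is either UTF-8 or CP949: it tries strict UTF-8 first and falls back to CP949. It is not a replacement for a statistical detector such as `chardet`.

```python
from pathlib import Path
from typing import Tuple


def decode_korean(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used), trying strict UTF-8 before CP949.

    Heuristic: CP949 Korean text is rarely valid UTF-8, so a successful
    strict UTF-8 decode is treated as proof that the bytes are UTF-8.
    "utf-8-sig" also accepts (and strips) an optional BOM.
    """
    for encoding in ("utf-8-sig", "cp949"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("bytes are neither valid UTF-8 nor valid CP949")


def convert_to_utf8(src: Path, dst: Path) -> str:
    """Rewrite src as BOM-less UTF-8 at dst; return the encoding detected."""
    text, detected = decode_korean(src.read_bytes())
    # Write bytes directly so line endings are preserved exactly.
    dst.write_bytes(text.encode("utf-8"))
    return detected
```

Calling `convert_to_utf8(Path("input.txt"), Path("output.txt"))` would play roughly the same role as the `iconv` command above, with the difference that a file that is already UTF-8 passes through unchanged.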
Key takeaways
- Encoding mismatch is the root cause of garbled Korean text.
- Always use UTF-8 for new projects — it is the international standard with the broadest compatibility.
- Develop a habit of checking the encoding before opening files.
- Convert CP949 ↔ UTF-8 with `iconv` or your editor's encoding conversion feature.