
UTF-8 vs CP949: Why Korean Text Gets Garbled

5 min read · Last updated: 2026-05-08

What is character encoding?

Computers store characters as numbers (bytes). A character encoding is the mapping that defines which number corresponds to which character. The same bytes will display as completely different characters if read with the wrong encoding.
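As a small illustration of that last point, the sketch below decodes one pair of bytes under two encodings. The byte values and the choice of Latin-1 as the "wrong" encoding are assumptions made for the example; any mismatched pair of encodings would show the same effect.

```python
# The same two bytes, interpreted under two different encodings.
raw = bytes([0xB0, 0xA1])

as_cp949 = raw.decode("cp949")      # Korean Windows code page
as_latin1 = raw.decode("latin-1")   # Western European single-byte encoding

print(repr(as_cp949))   # '가'
print(repr(as_latin1))  # '°¡'
```

Nothing in the bytes themselves says which reading is correct; the encoding has to be known or guessed from outside.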

UTF-8 vs CP949

Attribute	UTF-8	CP949 (extended EUC-KR)
Coverage	All Unicode characters worldwide	ASCII, Hangul (all 11,172 modern syllables), Hanja, and the symbols and foreign letters of KS X 1001
Korean bytes/char	3 bytes	2 bytes
Standard	Unicode (international)	Microsoft extension
Typical environment	Web, Linux, macOS	Legacy Windows apps
BOM	Optional	None
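The byte counts and the BOM row in the table can be checked with a few lines of Python. This is only a sketch: the sample string is an arbitrary choice, and utf-8-sig is the name Python's standard codecs use for "UTF-8 with a BOM".

```python
text = "한글"  # two Hangul syllables

utf8_bytes = text.encode("utf-8")
cp949_bytes = text.encode("cp949")

# Each syllable takes 3 bytes in UTF-8 and 2 bytes in CP949.
print(len(utf8_bytes), utf8_bytes.hex(" "))
print(len(cp949_bytes), cp949_bytes.hex(" "))

# The utf-8-sig codec prepends the optional 3-byte BOM (EF BB BF).
with_bom = text.encode("utf-8-sig")
print(with_bom[:3].hex(" "))
```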

Why does Mojibake (garbled text) occur?

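When a CP949-encoded file is opened as UTF-8, the 2-byte Korean sequences are not valid UTF-8, so the program substitutes a replacement character for the bytes it cannot decode (shown as � (U+FFFD) or ?, depending on the program). The reverse case, a UTF-8 file opened as CP949, usually produces unrelated Hangul and Hanja instead. This phenomenon is called mojibake (文字化け).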

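Example: "가" in CP949 = 0xB0 0xA1. In UTF-8, both of these values are continuation bytes (bit pattern 10xxxxxx) that may only follow a lead byte, so a UTF-8 decoder rejects each one and shows a replacement character in its place.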
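The misread can be reproduced directly. The sketch below assumes Python's built-in cp949 and utf-8 codecs; other decoders may report the error differently or substitute a different placeholder.

```python
cp949_bytes = "가".encode("cp949")
assert cp949_bytes == b"\xb0\xa1"

# A strict UTF-8 decode refuses the bytes outright.
try:
    cp949_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict decode failed:", exc.reason)

# A lenient decode substitutes U+FFFD for each invalid byte,
# which is what most editors and browsers display.
lenient = cp949_bytes.decode("utf-8", errors="replace")
print(repr(lenient))
```

Once the replacement characters have been saved back to disk, the original bytes are gone, which is why it is safer to reopen the file with the right encoding than to save the garbled view.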

Detecting and converting encodings

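Check encoding: Most text editors show the current encoding in the status bar. On the command line, file -i (Linux) or file -I (macOS) reports a guessed charset, and Python's chardet library can do the same programmatically. Both are heuristics, so confirm the result by checking that the Korean text reads correctly.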
Convert: Editors typically offer "Save with encoding" options. On the command line: iconv -f CP949 -t UTF-8 input.txt > output.txt.
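Web development: The <meta charset="UTF-8"> declaration must match the actual file encoding; otherwise browsers will misinterpret the file. If the server also sends a charset in the HTTP Content-Type header, that value takes precedence over the meta tag, so it must match as well.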
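When iconv is not available, the check-then-convert steps above can be sketched in Python. The helper names decode_korean_bytes and convert_to_utf8 are hypothetical, and the sketch assumes the input is either UTF-8 or CP949: it tries a strict UTF-8 decode first and falls back to CP949 only if that fails. That works because CP949 Korean text is very rarely valid UTF-8, but it is a heuristic, not real detection.

```python
from pathlib import Path


def decode_korean_bytes(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding_used), assuming raw is UTF-8 or CP949."""
    try:
        # utf-8-sig also accepts plain UTF-8 and strips a BOM if present.
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the legacy Korean code page.
        # This raises UnicodeDecodeError if the file is neither.
        return raw.decode("cp949"), "cp949"


def convert_to_utf8(src: str, dst: str) -> str:
    """Rewrite src as BOM-less UTF-8 at dst; return the encoding that was read."""
    raw = Path(src).read_bytes()
    text, used = decode_korean_bytes(raw)
    # Write bytes directly so line endings are left untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
    return used
```

For example, convert_to_utf8("input.txt", "output.txt") would mirror the iconv command, with the difference that a file already in UTF-8 passes through unchanged. Writing to a separate output path keeps the original available if the guess turns out to be wrong.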

Key takeaways

Encoding mismatch is the root cause of garbled Korean text.
Always use UTF-8 for new projects: it is the international standard with the broadest compatibility.
Make a habit of checking a file's encoding before you open or edit it.
Convert CP949 ↔ UTF-8 with iconv or your editor's encoding conversion feature.