Korean Encoding Repair Tool

Auto-detect and repair broken Korean text across the EUC-KR, CP949, and UTF-8 encodings.

About this tool

Korean Encoding Repair Tool (hangulfix) auto-detects text that was mis-decoded between EUC-KR, CP949, UTF-8, and Latin-1, and restores garbled Korean (e.g. ¾È³çÇϼ¼¿ä → "안녕하세요", ì•ˆë…• → "안녕"). It tries every major mojibake pattern (EUC-KR→UTF-8, CP949→UTF-8, UTF-8→Latin-1, double-encoded UTF-8, and URL percent-encoding) and surfaces the top 5 candidates ranked by Hangul-validity score. Drag and drop ZIP filenames, DB dumps, email subjects, or CSV/log files for batch repair. Everything runs in your browser, so corporate data never leaves the device.

Use cases

Scenario 1

Windows ZIP filenames on macOS

A ZIP created on Korean Windows shows garbled filenames when extracted on macOS; the tool repairs the whole folder listing back to clean Hangul.
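For readers who want to see the idea outside the browser, here is a minimal Python sketch of the filename case. It relies on the fact that Python's `zipfile` decodes names as CP437 when the archive does not set the UTF-8 flag, so the original bytes can be recovered by re-encoding as CP437. The helper names `repair_zip_name` and `list_repaired_names`, and the default of CP949 as the legacy codec, are assumptions for illustration, not the tool's implementation.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the filename is stored as UTF-8


def repair_zip_name(name, legacy="cp949"):
    """Undo zipfile's CP437 decoding and re-read the same bytes as `legacy`."""
    try:
        return name.encode("cp437").decode(legacy)
    except UnicodeError:
        return name  # not recoverable this way; leave the name untouched


def list_repaired_names(zip_bytes):
    """List archive member names, repairing only those not flagged as UTF-8."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [
            info.filename
            if info.flag_bits & UTF8_FLAG
            else repair_zip_name(info.filename)
            for info in zf.infolist()
        ]
```

Names already flagged as UTF-8 are left alone, since re-decoding them would introduce new damage.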

Scenario 2

DB migration column corruption

After a MySQL latin1 → utf8mb4 migration, some Korean rows come out as mojibake. Paste the dump, pick the correct candidate from the five offered, and re-import cleanly.
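One common form of this corruption is UTF-8 bytes that were stored in a latin1 column and then converted character by character. The Python sketch below reverses that single step. It assumes MySQL's latin1 behaves like Windows-1252 with the five undefined bytes kept as C1 control characters; `mysql_latin1_bytes` and `repair_latin1_column` are hypothetical helper names, and other corruption paths will need a different pairing.

```python
def mysql_latin1_bytes(text):
    """Encode as Windows-1252, falling back to Latin-1 for the C1 controls."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Raises UnicodeEncodeError again for anything above U+00FF.
            out += ch.encode("latin-1")
    return bytes(out)


def repair_latin1_column(value):
    """Return the repaired string, or the input if it is not UTF-8 underneath."""
    try:
        return mysql_latin1_bytes(value).decode("utf-8")
    except UnicodeError:
        return value


garbled = "안녕하세요".encode("utf-8").decode("cp1252")
print(repair_latin1_column(garbled))
```

Values that are already correct Hangul fail the re-encoding step and are returned unchanged, so the function can be mapped over a whole column.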

Scenario 3

Email subject headers

A vendor email subject arrives as "ì•ˆë…•" instead of "안녕". The tool detects double-encoded UTF-8 and recovers the original Hangul.
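A rough Python sketch of this scenario: first decode the RFC 2047 encoded words in the raw Subject header with the standard library, then peel off any "UTF-8 read as Windows-1252" layers that remain. The function names and the `max_layers` limit are assumptions for illustration; an unknown charset label in the header would raise `LookupError`, which this sketch does not handle.

```python
from email.header import decode_header


def decode_subject(raw_subject):
    """Decode RFC 2047 encoded words using their declared charsets."""
    parts = []
    for chunk, charset in decode_header(raw_subject):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(chunk)  # header had no encoded words at all
    return "".join(parts)


def peel_double_utf8(text, max_layers=3):
    """Strip 'UTF-8 read as Windows-1252' layers until none is left."""
    for _ in range(max_layers):
        try:
            peeled = text.encode("cp1252").decode("utf-8")
        except UnicodeError:
            break  # no further layer to remove
        if peeled == text:
            break  # pure ASCII: nothing changed
        text = peeled
    return text


# The header below carries the UTF-8 bytes of the already-garbled text.
header = "=?UTF-8?Q?=C3=AC=E2=80=A2=CB=86=C3=AB=E2=80=A6=E2=80=A2?="
print(peel_double_utf8(decode_subject(header)))
```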

Scenario 4

Bulk CSV / log repair

Drop in CSV or log files from legacy systems; the tool guesses the encoding line by line, repairs each line, and returns a clean ZIP.
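Line-by-line guessing can be sketched in Python as follows. The order of the codecs (strict UTF-8 first, then CP949, then a Latin-1 fallback that never fails) is an assumption made for this illustration, as are the names `decode_line` and `repair_lines`; the tool's own heuristics may differ.

```python
def decode_line(raw):
    """Return (codec, text) for one raw line, trying strict decoders in order."""
    for codec in ("utf-8", "cp949"):
        try:
            return codec, raw.decode(codec)
        except UnicodeDecodeError:
            continue
    # Latin-1 accepts any byte, so this marks the line for manual review.
    return "latin-1", raw.decode("latin-1")


def repair_lines(data):
    """Decode a mixed-encoding file given as bytes, one line at a time."""
    return [decode_line(line) for line in data.splitlines()]


mixed = "이름,값\n".encode("utf-8") + "서울,1\n".encode("cp949")
for codec, text in repair_lines(mixed):
    print(codec, text)
```

Working on raw bytes rather than already-decoded text matters here: once a line has been decoded with the wrong codec and replacement characters inserted, the original bytes are gone.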

Features

  • Auto-detection of EUC-KR / CP949 / UTF-8 / Latin-1 mis-encoding
  • Top-5 candidates with a Hangul validity score (precomposed syllables and combining jamo)
  • 5 dedicated modes: ZIP, DB, email, CSV, URL
  • Drag-and-drop batch repair with result ZIP export
  • 100% in-browser; corporate data never leaves the device
  • Heuristics inspired by Python’s ftfy library
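
A validity score of the kind listed above could look roughly like the Python sketch below. It normalizes to NFC so that combining jamo sequences count the same as precomposed syllables, rewards Hangul and printable ASCII, and penalizes replacement and control characters. The weights and the function name `hangul_validity` are illustrative assumptions, not the tool's actual scoring.

```python
import unicodedata


def hangul_validity(text):
    """Score in [0, 1]; higher means the text looks more like real Korean."""
    composed = unicodedata.normalize("NFC", text)  # fold combining jamo
    total = 0.0
    counted = 0
    for ch in composed:
        if ch.isspace():
            continue
        counted += 1
        cp = ord(ch)
        if 0xAC00 <= cp <= 0xD7A3:  # precomposed Hangul syllable
            total += 1.0
        elif 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:  # stray jamo
            total += 0.5
        elif ch.isascii() and ch.isprintable():  # digits, punctuation, Latin
            total += 0.8
        elif ch == "\ufffd" or unicodedata.category(ch) == "Cc":
            total -= 1.0  # replacement or control character
    if counted == 0:
        return 0.0
    return max(0.0, total / counted)


print(hangul_validity("안녕하세요"), hangul_validity("¾È³çÇϼ¼¿ä"))
```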

Frequently asked

Q. Why does Korean text break in the first place?
A. It happens when the source program’s encoding (e.g., Korean Windows = CP949) differs from the reader’s encoding (macOS / Linux = UTF-8). The tool tries every plausible pairing to find the right one.
Q. What pattern is ¾È³çÇϼ¼¿ä?
A. It’s the classic case of "안녕하세요" stored as EUC-KR/CP949 then misread as Latin-1 (ISO-8859-1). The tool ranks "EUC-KR ↔ Latin-1" first for that signature.
Q. Is repair always 100% accurate?
A. Single-step mis-encoding is recovered at 95%+ accuracy. Multi-step or lossy cases may not have a perfect candidate among the top 5; in that case, go back to the original raw bytes rather than the already-garbled text and try again.
Q. Is it safe to upload company data?
A. Inputs and files are processed entirely in browser memory; nothing is sent to a server. You can verify with the DevTools Network tab.

How we run it / disclaimer

This tool is advisory and does not constitute legal, tax, medical, or financial advice. All calculations and document generation run in your browser; inputs are never sent to a server. Ads follow Google AdSense policy and are kept separate from tool accuracy.