HTML encoding (also called HTML escaping) replaces characters that have special meaning in HTML - the angle brackets that mark up tags, the ampersand that begins entities, and the quote characters that delimit attribute values - with named or numeric entity references. The browser then displays them as literal characters instead of treating them as syntax.
This encoder uses a small in-house mapping (ENTITY_MAP in the source above) for the most common named entities. Named mode escapes the five mandatory characters plus that curated symbol set; numeric mode walks the input by code point and escapes the five mandatory characters plus every non-ASCII or control character as a decimal reference. The five mandatory escapes for a safe HTML encoder are &amp; for & (which must be replaced first when the replacements run in sequence, because escaping it last would re-escape the ampersands introduced by the other four), &lt; for <, &gt; for >, &quot; for ", and a reference such as &#39; for the apostrophe. Without the apostrophe escape, a single-quoted attribute value can be closed early by an injected single quote - a real XSS vector.
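The ordering rule and the numeric mode can be sketched in a few lines. This is a minimal illustration, not the tool's actual source: the function names are hypothetical, the apostrophe is assumed to be written as &#39;, and numeric mode is assumed to keep the same five replacements while turning everything outside printable ASCII into a decimal reference.

```javascript
// Sequential replacement of the five mandatory characters.
// The ampersand goes first so that the "&" inside "&lt;", "&gt;" and the
// other replacements is never escaped a second time.
function escapeMandatory(input) {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;"); // assumed spelling for the apostrophe
}

// The same replacements with the ampersand handled last, to show the failure.
function escapeAmpersandLast(input) {
  return input
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/&/g, "&amp;");
}

// Numeric mode sketch: for...of iterates by code point, so an astral
// character arrives as one unit instead of two surrogate halves.
function encodeNumeric(input) {
  let out = "";
  for (const ch of input) {
    const cp = ch.codePointAt(0);
    if ("&<>\"'".includes(ch)) out += escapeMandatory(ch);
    else if (cp < 0x20 || cp >= 0x7f) out += `&#${cp};`; // control or non-ASCII
    else out += ch;
  }
  return out;
}

console.log(escapeMandatory("Tom & 'Jerry' <b>")); // Tom &amp; &#39;Jerry&#39; &lt;b&gt;
console.log(escapeAmpersandLast("<b>"));           // &amp;lt;b&amp;gt;
console.log(encodeNumeric("\u00A9 \u{1F600}"));    // &#169; &#128512;
```

A single-pass replacement (one regular expression with a lookup function) avoids the ordering problem altogether, because each source character is visited only once.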
Two output modes are exposed. Named entities like &copy; are human-readable; the HTML5 spec defines more than 2,000 of them, though this tool emits only the curated set listed above. Numeric entities like &#169; (decimal) or &#xA9; (hex) work for any Unicode code point and are the only safe choice for characters outside the named set; numeric mode emits the decimal form here. The decoder understands all three styles: this tool's named entities plus any decimal or hex numeric reference. Anything you encode with this tool therefore round-trips losslessly, while named entities outside the curated set (pasted from elsewhere) are left untouched rather than decoded.
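A decoder with this behaviour might look roughly like the sketch below. The reverse map is a hypothetical three-symbol stand-in for the tool's curated set, and the handling of out-of-range numeric values (left untouched) is an assumption rather than something the text above specifies.

```javascript
// Hypothetical stand-in for the reverse of the tool's curated map.
const SAMPLE_NAMED = new Map([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["copy", "\u00A9"],
]);

function decodeEntities(input) {
  // One pass over the input: text produced by a replacement is never
  // rescanned, so "&amp;lt;" decodes to "&lt;" and stops there.
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  return input.replace(pattern, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const cp = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Assumption: values that are not valid scalar values stay as typed.
      if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) return match;
      return String.fromCodePoint(cp);
    }
    // Named references outside the sample map are left untouched.
    return SAMPLE_NAMED.has(body) ? SAMPLE_NAMED.get(body) : match;
  });
}

console.log(decodeEntities("&copy; &#169; &#xA9;")); // © © ©
console.log(decodeEntities("&hellip;"));             // &hellip; (not in the sample map)
console.log(decodeEntities("&amp;lt;"));             // &lt;
```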
Where this matters most: rendering user-controlled data into markup, whether on the server or in the browser. Frameworks like React (auto-escapes children), Vue (auto-escapes interpolations), and Django (auto-escapes via {{ }} unless you mark a string as |safe) handle this for you. Old-school string-concatenation rendering (PHP echo, raw template literals assigned to innerHTML, dangerouslySetInnerHTML in React) does not - and that's where most XSS vulnerabilities live. Run any user-controlled string through HTML encoding before it touches innerHTML or its equivalent.
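For hand-rolled rendering, one way to get framework-style auto-escaping is a tagged template that escapes every interpolated value before it is joined into the markup. The sketch below is illustrative only: the names escapeHtml and html are hypothetical, and this is not how React, Vue or Django implement escaping internally.

```javascript
const REFS = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

// Single-pass escape of the five mandatory characters.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => REFS[ch]);
}

// Tagged template: the literal chunks are trusted (the developer wrote them),
// every interpolated value is treated as untrusted text and escaped.
function html(strings, ...values) {
  return strings.reduce((out, chunk, i) => out + escapeHtml(values[i - 1]) + chunk);
}

const comment = "<img src=x onerror=alert(1)>";
const markup = html`<p class="comment">${comment}</p>`;
console.log(markup);
// <p class="comment">&lt;img src=x onerror=alert(1)&gt;</p>
```

The resulting string is safe to assign to innerHTML only because the interpolation sits in element-body or quoted-attribute position; other contexts need different encoders.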
HTML encoding is context-specific. Inside an HTML element body, encoding the five characters above is sufficient. Inside an HTML attribute, it is sufficient only when the value is quoted; an unquoted value can be ended by a space and several other characters, so always quote attribute values. Inside a <script> block, HTML encoding does nothing useful - you need JavaScript escaping. Inside a URL attribute (href, src), you need URL encoding for the parts you insert, plus a check on the scheme, because a javascript: URL contains none of the five characters and passes through HTML encoding unchanged. Inside a CSS context (<style>, style=), you need CSS escaping. Treating HTML encoding as a universal "sanitizer" is the most common security mistake - it isn't, it's only one of four context-specific encodings.
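To make the difference between contexts concrete, here is a rough sketch of three of the four encodings using only built-ins. The helper names and the /search path are hypothetical, the script-context helper covers only the string-literal case, and CSS escaping is left out because the standard CSS.escape helper is a browser API rather than part of the core language.

```javascript
// HTML context (element body or quoted attribute value).
function escapeHtml(value) {
  const refs = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => refs[ch]);
}

// URL context: percent-encode the inserted component, then HTML-escape the
// finished URL, because it still sits inside a quoted attribute.
function searchLink(query) {
  const href = "/search?q=" + encodeURIComponent(query);
  return `<a href="${escapeHtml(href)}">results</a>`;
}

// Script context: serialize as a JavaScript string literal and keep "<" out
// of the output so the value cannot close the surrounding script element.
function scriptString(value) {
  return JSON.stringify(String(value)).replace(/</g, "\\u003c");
}

console.log(escapeHtml("</script>"));   // &lt;/script&gt;  (right for HTML text)
console.log(searchLink('a&b "c"'));     // <a href="/search?q=a%26b%20%22c%22">results</a>
console.log(scriptString("</script>")); // "\u003c/script>"  (right inside a script block)
```

Each helper is only correct in its own position: the output of escapeHtml placed inside a script block would be treated as literal text by the JavaScript parser, not decoded back.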
The decoder also handles an edge case that is easy to get wrong: numeric references like &#x1F600; can encode characters outside the BMP (the grinning face emoji is at U+1F600). Because the decoder resolves each reference with String.fromCodePoint rather than the 16-bit String.fromCharCode, a code point above 0xFFFF is reconstructed as a proper surrogate pair that reads as a single character, instead of being truncated to its low 16 bits and turned into an unrelated character - so emoji and astral-plane scripts round-trip cleanly.
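The difference between the two built-ins is easy to check directly. The snippet below is a standalone demonstration with illustrative variable names, not an excerpt from the tool.

```javascript
const codePoint = 0x1f600; // the value carried by the reference &#x1F600;

// fromCodePoint produces the full character as a surrogate pair.
const viaCodePoint = String.fromCodePoint(codePoint);

// fromCharCode keeps only the low 16 bits (0xF600), which is a different,
// unrelated code point in the Private Use Area.
const viaCharCode = String.fromCharCode(codePoint);

console.log(viaCodePoint === "\uD83D\uDE00");  // true
console.log(viaCodePoint.length);              // 2 UTF-16 code units
console.log([...viaCodePoint].length);         // 1 code point
console.log(viaCharCode === "\uF600");         // true
console.log(viaCodePoint.codePointAt(0) === codePoint); // true, round-trips
```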
What this tool deliberately does not do: it does not parse or sanitize HTML. If you paste <script>evil()</script>, the tool encodes the angle brackets but does not reach in and remove the script element. Encoding makes it safe to display as text; if you want to remove dangerous tags entirely while keeping safe ones (a paste-from-Word workflow, say), use a real HTML sanitizer like DOMPurify, which runs in-browser and handles attribute filtering, URL scheme allowlists, and namespace coercion.