HTML encoding (also called HTML escaping) replaces characters that have special meaning in HTML - the angle brackets that mark up tags, the ampersand that begins entities, and the quote characters that delimit attribute values - with named or numeric entity references. The browser then displays them as literal characters instead of treating them as syntax.
This encoder uses a small in-house mapping (ENTITY_MAP in the source above) for the most common entities and falls back to numeric references via charCodeAt for anything outside the named set. The five mandatory escapes for a safe HTML encoder are & to &amp; (this must come first when the replacements run one after another - escaping it last would double-escape the ampersands introduced by the other four), < to &lt;, > to &gt;, " to &quot;, and ' to &#39;. Without the apostrophe escape, an attribute value delimited by single quotes can be closed early by an injected single quote - a real XSS vector.
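The ordering point above can be checked with a small sketch. The names BASIC_ESCAPES, escapeHtml and escapeHtmlWrongOrder are illustrative assumptions, not the tool's actual ENTITY_MAP or functions.

```javascript
// Hypothetical table for illustration; the tool's real ENTITY_MAP is not reproduced here.
const BASIC_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Single-pass replacement: every input character is visited exactly once,
// so ampersands produced by a substitution are never escaped a second time.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => BASIC_ESCAPES[ch]);
}

// Deliberately wrong: "<" is escaped first, then the "&" it introduced
// gets escaped again by the second pass.
function escapeHtmlWrongOrder(text) {
  return String(text).replace(/</g, "&lt;").replace(/&/g, "&amp;");
}

console.log(escapeHtml(`<a href='x'>Tom & "Jerry"</a>`));
console.log(escapeHtmlWrongOrder("<")); // double-escaped result
```

A single regular-expression pass sidesteps the ordering question entirely. A chain of separate replace calls works just as well, provided the ampersand replacement runs first.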
Two output modes are exposed. Named entities like &copy; are human-readable and standardised in the HTML5 spec - it defines more than 2,000 of them, covering most symbols you'd realistically need. Numeric entities like &#169; (decimal) or &#xA9; (hex) work for any Unicode code point and are the only option for characters outside the named set. The decoder accepts all three styles - named, decimal numeric, and hex numeric - so encoded text round-trips regardless of which style produced it.
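A rough sketch of how named and numeric output could be produced for the same character. The NAMED table, the encodeNonAscii name and the "named" / "decimal" / "hex" mode strings are assumptions for illustration. The sketch also uses codePointAt rather than the charCodeAt mentioned earlier, so a character above U+FFFF produces one reference instead of two surrogate halves.

```javascript
// Hypothetical three-entry subset; the full HTML named set is far larger.
const NAMED = { "©": "copy", "®": "reg", "€": "euro" };

// mode: "named" | "decimal" | "hex" (mode names are assumptions for this sketch)
function encodeNonAscii(text, mode = "named") {
  let out = "";
  // for...of iterates by code point, so astral characters stay whole.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      // ASCII passes through; the five HTML-special characters
      // would be handled by a separate escaping step.
      out += ch;
    } else if (mode === "named" && NAMED[ch]) {
      out += `&${NAMED[ch]};`;
    } else if (mode === "hex") {
      out += `&#x${cp.toString(16).toUpperCase()};`;
    } else {
      // "decimal" mode, and "named" mode when no name is known.
      out += `&#${cp};`;
    }
  }
  return out;
}

console.log(encodeNonAscii("© 2024"));        // named
console.log(encodeNonAscii("©", "decimal"));  // decimal numeric
console.log(encodeNonAscii("©", "hex"));      // hex numeric
```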
Where this matters most: server-side rendering. Frameworks like React (auto-escapes children), Vue (auto-escapes interpolations), and Django (auto-escapes via {{ }} unless you mark a string as |safe) handle this for you. Old-school string-concatenation rendering (PHP echo, raw template literals into innerHTML, dangerouslySetInnerHTML in React) does not - and that's where most XSS vulnerabilities live. Run any user-controlled string through HTML encoding before it touches innerHTML or its equivalent.
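To make the contrast concrete, here is a minimal sketch of what auto-escaping buys you, written as a tagged template. The html tag, escapeHtml and the sample userComment are hypothetical names for illustration and do not come from any of the frameworks mentioned.

```javascript
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // ampersand first, so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Tagged template: the literal parts are trusted markup written by the developer,
// every interpolated value is treated as untrusted text and encoded.
function html(strings, ...values) {
  return strings.reduce(
    (out, part, i) => out + part + (i < values.length ? escapeHtml(values[i]) : ""),
    ""
  );
}

const userComment = '<img src=x onerror="alert(1)">';

// Unsafe: raw concatenation leaves the injected markup live.
const unsafe = "<p>" + userComment + "</p>";

// Safer: the interpolated value is encoded before it joins the markup.
const safe = html`<p>${userComment}</p>`;

console.log(unsafe);
console.log(safe);
```

In a browser, assigning unsafe to innerHTML would create the img element and run its handler, while safe would render the comment as visible text. When no markup is needed at all, setting textContent avoids HTML parsing altogether.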
HTML encoding is context-specific. Inside an HTML element body, encoding the five characters above is sufficient. Inside an HTML attribute, the same escapes are sufficient only when the value is quoted; an unquoted attribute value can be ended by a space and several other characters, so always quote attributes. Inside a <script> block, HTML encoding does nothing useful - you need JavaScript escaping. Inside a URL attribute (href, src), URL-encode the untrusted parts, HTML-encode the finished attribute value, and check the scheme, because neither encoding stops a javascript: URL. Inside a CSS context (<style>, style=), you need CSS escaping. Treating HTML encoding as a universal "sanitizer" is a common security mistake - it isn't, it's only one of four context-specific encodings.
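The sketch below runs one untrusted string through three of these contexts to show how different the correct output is in each. The function names forHtml, forUrlComponent and forScriptString are assumptions for illustration; encodeURIComponent and JSON.stringify are standard built-ins.

```javascript
// HTML element body or quoted attribute value.
function forHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// A single URL component such as a query value or path segment.
function forUrlComponent(value) {
  return encodeURIComponent(String(value));
}

// Inline <script> context: emit a JavaScript string literal, then escape "<"
// so the value can never close the script element or open an HTML comment.
function forScriptString(value) {
  return JSON.stringify(String(value)).replace(/</g, "\\u003c");
}

const payload = `</script><b>"hi" & bye`;

const inBody = `<p>${forHtml(payload)}</p>`;
// URL-encode the component, then HTML-encode the whole attribute value.
const inHref = `<a href="${forHtml("/search?q=" + forUrlComponent(payload))}">search</a>`;
const inScript = `<script>const q = ${forScriptString(payload)};</script>`;

console.log(inBody);
console.log(inHref);
console.log(inScript);
```

Note that encodeURIComponent leaves the apostrophe untouched, which is one reason the URL still gets HTML-encoded before it goes into the attribute. CSS is left out of the sketch; in browsers, CSS.escape covers the identifier case.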
The decoder handles a more interesting edge case: numeric references like &#x1F600; can encode characters outside the BMP (the grinning face emoji is at U+1F600). String.fromCharCode works in 16-bit code units and silently truncates larger values, so String.fromCharCode(0x1F600) yields U+F600 instead of the emoji; for code points above 0xFFFF you need String.fromCodePoint, which returns the correct surrogate pair. The current implementation handles standard cases well; full emoji round-trip safety depends on decoding with String.fromCodePoint, and a Unicode normalization step would not repair a truncated code point.
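A minimal decoder sketch for the three reference styles, using String.fromCodePoint so characters above U+FFFF survive. NAMED_TO_CHAR and decodeHtml are hypothetical names, and the six-entry table stands in for a real named-entity list; this is not the tool's own decoder.

```javascript
// Hypothetical subset of named entities for the sketch.
const NAMED_TO_CHAR = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", copy: "©" };

function decodeHtml(text) {
  // One pass over the input: hex numeric, decimal numeric, or named reference.
  return String(text).replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const cp = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range values and lone surrogates as written instead of throwing.
      if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) return match;
      return String.fromCodePoint(cp);
    }
    // Unknown names are left untouched.
    return Object.prototype.hasOwnProperty.call(NAMED_TO_CHAR, body)
      ? NAMED_TO_CHAR[body]
      : match;
  });
}

// The truncation described above: fromCharCode keeps only the low 16 bits.
const truncated = String.fromCharCode(0x1f600);
const correct = String.fromCodePoint(0x1f600);

console.log(decodeHtml("&copy; &#169; &#xA9;"));
console.log(decodeHtml("&#x1F600;") === correct);
console.log(truncated === "\uF600");
```

Because the replacement runs in a single pass, a doubly encoded input such as &amp;lt; decodes one level only, to &lt;, which is usually the behaviour you want from a decoder.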
What this tool deliberately does not do: it does not parse or sanitize HTML. If you paste <script>evil()</script>, the tool encodes the angle brackets but does not reach in and remove the script element. Encoding makes it safe to display as text; if you want to remove dangerous tags entirely while keeping safe ones (a paste-from-Word workflow, say), use a real HTML sanitizer like DOMPurify, which runs in-browser and handles attribute filtering, URL scheme allowlists, and namespace coercion.