Remove Duplicates splits the pasted text on newline characters, then walks the array once, building a JavaScript Set of canonicalized keys. The first time a key is seen, the original line is pushed to the output and the key is recorded. Every later line that produces the same key is treated as a duplicate. This single-pass O(n) approach handles tens of thousands of lines in milliseconds because Set.has and Set.add are amortized O(1).
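The single pass described above can be sketched as follows. This is a minimal illustration rather than the tool's actual source: the function name removeDuplicates and the keyOf parameter are hypothetical, with keyOf standing in for whichever canonicalization options are enabled.

```javascript
// Hypothetical sketch: keyOf maps a line to its comparison key.
// The default (identity) gives plain exact-match deduplication.
function removeDuplicates(text, keyOf = (line) => line) {
  const seen = new Set();   // keys already encountered
  const output = [];        // original lines, first occurrence only

  for (const line of text.split("\n")) {
    const key = keyOf(line);
    if (!seen.has(key)) {
      seen.add(key);
      output.push(line);    // push the original line, not the key
    }
  }
  return output;
}

const result = removeDuplicates("apple\nbanana\napple\ncherry\nbanana");
console.log(result); // [ 'apple', 'banana', 'cherry' ]
```

Each line is visited once and costs one Set lookup plus at most one insertion, which is where the linear running time comes from.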
Canonicalization is configurable. Case-insensitive matching applies toLowerCase before hashing. Whitespace folding collapses runs of whitespace into a single space and trims the ends, so that 'hello   world' and ' hello world ' produce the same key. Punctuation stripping removes every character matching the regex /[^\w\s]/ before comparison; because \w covers only ASCII letters, digits, and the underscore in JavaScript, this also removes non-ASCII letters such as é. Each option affects only the comparison key, not the line written to the output, so the original formatting is preserved.
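A comparison key built from these options might look like the sketch below. The function name comparisonKey and the option names are hypothetical, and the order in which the steps run is an assumption: punctuation is stripped before whitespace is folded so that the gaps left by removed characters also collapse.

```javascript
// Hypothetical sketch of key construction; option names are illustrative only.
function comparisonKey(line, options = {}) {
  let key = line;
  if (options.ignoreCase) {
    key = key.toLowerCase();
  }
  if (options.stripPunctuation) {
    // The g flag removes every match, not just the first one.
    key = key.replace(/[^\w\s]/g, "");
  }
  if (options.foldWhitespace) {
    // Collapse whitespace runs to one space, then trim both ends.
    key = key.replace(/\s+/g, " ").trim();
  }
  return key;
}

const all = { ignoreCase: true, stripPunctuation: true, foldWhitespace: true };
console.log(comparisonKey(" Hello,   World! ", all)); // "hello world"
console.log(comparisonKey("hello world", all));       // "hello world"
```

Only the key changes; the caller still writes the untouched original line to the output.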
Unicode normalization uses String.prototype.normalize. Many visually identical strings have multiple binary representations — the letter e-acute can be a single code point (U+00E9) or two code points (e plus combining acute, U+0065 U+0301). The 'normalize Unicode' switch runs both forms through NFC so they hash identically. The 'ignore accents' switch goes further: it decomposes to NFD, strips the combining-mark range U+0300 to U+036F, and matches accent-insensitively.
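The two switches can be illustrated with String.prototype.normalize directly. The function names nfcKey and accentInsensitiveKey are hypothetical; the escapes below spell out the two encodings of e-acute so the difference is visible in the source.

```javascript
// Hypothetical helpers mirroring the two switches described above.
function nfcKey(line) {
  // Compose to NFC so both encodings of the same character agree.
  return line.normalize("NFC");
}

function accentInsensitiveKey(line) {
  // Decompose to NFD, then drop combining marks in U+0300..U+036F.
  return line.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

const composed = "caf\u00e9";     // e-acute as one code point
const decomposed = "cafe\u0301";  // e followed by combining acute

console.log(composed === decomposed);                 // false
console.log(nfcKey(composed) === nfcKey(decomposed)); // true
console.log(accentInsensitiveKey(composed));          // "cafe"
```

Note that the U+0300 to U+036F range covers the Combining Diacritical Marks block only, so marks from other blocks would survive this particular filter.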
The tool always keeps the first occurrence of each key. There is no 'keep last' setting because it is rarely useful on real data and is confusing when the duplicates differ in case or whitespace. If you specifically need the last occurrence, reverse the input, dedupe, and reverse the output again: each key then keeps its final occurrence, and the surviving lines stay in their original relative order.
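The reverse, dedupe, reverse workaround can be checked with a short sketch. Both function names are hypothetical, and dedupeKeepFirst is a stand-in for the tool's behavior rather than its actual code.

```javascript
// Hypothetical stand-in for the tool's keep-first behavior.
function dedupeKeepFirst(lines, keyOf = (line) => line) {
  const seen = new Set();
  return lines.filter((line) => {
    const key = keyOf(line);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Keep-last expressed as reverse -> keep-first -> reverse.
function dedupeKeepLast(lines, keyOf) {
  // Copy before reversing so the caller's array is not mutated.
  return dedupeKeepFirst([...lines].reverse(), keyOf).reverse();
}

const lines = ["Apple", "pear", "apple", "Pear", "fig"];
const lower = (line) => line.toLowerCase();

console.log(dedupeKeepFirst(lines, lower)); // [ 'Apple', 'pear', 'fig' ]
console.log(dedupeKeepLast(lines, lower));  // [ 'apple', 'Pear', 'fig' ]
```

With case-insensitive matching the two variants return differently cased lines for the same key, which is the source of confusion mentioned above.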
Three output modes exist. 'Unique' returns the deduplicated list (default). 'Duplicates only' returns just the lines that appeared more than once, useful for audit trails. 'With counts' annotates each unique line with the number of occurrences, which is what you usually want when analyzing log files or survey responses.
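All three modes can be derived from one occurrence count per line, as in the sketch below. The names countLines and render, the mode strings, and the tab-separated count format are hypothetical. The sketch also assumes that 'Duplicates only' lists each repeated line once and that matching is exact; the text above does not specify either detail.

```javascript
// Count occurrences; a Map iterates in insertion order, so first-seen order is kept.
function countLines(lines) {
  const counts = new Map();
  for (const line of lines) {
    counts.set(line, (counts.get(line) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical mode names: "unique", "duplicates", "counts".
function render(lines, mode = "unique") {
  const entries = [...countLines(lines)];
  switch (mode) {
    case "duplicates":
      return entries.filter(([, n]) => n > 1).map(([line]) => line);
    case "counts":
      return entries.map(([line, n]) => `${n}\t${line}`);
    default:
      return entries.map(([line]) => line);
  }
}

const sample = ["a", "b", "a", "c", "b", "a"];
console.log(render(sample));               // [ 'a', 'b', 'c' ]
console.log(render(sample, "duplicates")); // [ 'a', 'b' ]
console.log(render(sample, "counts"));     // [ '3\ta', '2\tb', '1\tc' ]
```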
Performance scales linearly with input size, but the comparison key calculation can dominate on very large inputs with all options enabled. For a million lines with case-folding, whitespace collapsing, punctuation stripping, and Unicode normalization, expect roughly five to fifteen seconds on a modern laptop. Plain exact-match dedupe of the same input runs in well under a second.
Output is exposed via Copy, Download (as a Blob with MIME type text/plain), and Replace Input (which feeds the result back into the input box for chaining with another tool such as Sort Lines or Case Converter). Nothing leaves the browser tab: there is no upload endpoint behind this page.
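A client-side download of this kind is commonly built from a Blob and an object URL, as sketched below. The function names and the default filename are hypothetical, and this is not necessarily how the page wires its Download button. triggerDownload needs a browser DOM, so it is defined but not called here.

```javascript
// Build a text/plain Blob from the output lines.
function toTextBlob(lines) {
  return new Blob([lines.join("\n")], { type: "text/plain" });
}

// Browser only: save the Blob through a temporary link.
// The filename default is a hypothetical placeholder.
function triggerDownload(blob, filename = "deduplicated.txt") {
  const url = URL.createObjectURL(blob); // local object URL, no network request
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  link.click();
  URL.revokeObjectURL(url);              // release the object URL afterwards
}

const blob = toTextBlob(["apple", "banana"]);
console.log(blob.type); // "text/plain"
console.log(blob.size); // 12
```

Because the object URL points at memory inside the tab, saving the file involves no request to a server.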