Remove Duplicates

Remove duplicate lines with advanced matching options and duplicate reporting.

Paste a list and strip out duplicate lines, preserving the original order of first occurrences. Case-sensitive and case-insensitive modes both work; trim-whitespace mode catches the kind of dupes that differ only by trailing spaces. Useful for cleaning email lists scraped from multiple sources, deduplicating SKU exports before importing, or merging team rosters across spreadsheets. The deduplication is a one-pass set operation in your browser — no list ever gets sent over the network.

Runs right inside your browser tab. No uploads. Your files stay private.

How Remove Duplicates Detects Repeated Lines

Remove Duplicates splits the pasted text on newline characters, then scans the lines using a hash map of canonicalized keys. The first time a key is seen, the original line is preserved in the output and the key is recorded; every later line that produces the same key is treated as a duplicate. This linear-time approach handles tens of thousands of lines in milliseconds because map lookups and insertions are amortized O(1).
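The scan described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual source: `dedupeLines` and `makeKey` are hypothetical names, and a `Set` stands in for the hash map because only key membership is needed here.

```javascript
// Minimal sketch of first-occurrence dedupe (hypothetical names, not the tool's code).
// makeKey turns a line into its comparison key; by default the key is the line itself.
function dedupeLines(text, makeKey = (line) => line) {
  const seen = new Set(); // comparison keys already encountered
  const unique = [];      // original lines, in first-seen order
  for (const line of text.split("\n")) {
    const key = makeKey(line);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(line);  // keep the original text, not the key
    }
  }
  return unique;
}

console.log(dedupeLines("a\nb\na\nc\nb"));
```

Each line costs one key computation plus one set lookup, which is where the linear running time comes from.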
Canonicalization is configurable. Case-insensitive matching applies toLowerCase before hashing. Whitespace folding collapses runs of whitespace into a single space and trims the ends so that 'hello world' and ' hello world ' collapse to the same key. Punctuation stripping removes anything matching the regex /[^\w\s]/ before comparison. Each option only affects the comparison key, not the line that gets written to the output, so the original formatting is preserved.
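A possible shape for the key builder is sketched here. The option names (`ignoreCase`, `stripPunctuation`, `foldWhitespace`) and the order in which the steps run are assumptions made for illustration; only the individual transformations come from the description above.

```javascript
// Sketch of a comparison-key builder. Option names and step order are assumptions.
function makeKey(line, opts = {}) {
  let key = line;
  if (opts.ignoreCase) key = key.toLowerCase();
  // Punctuation is stripped before whitespace folding so that any gaps it leaves
  // behind are collapsed by the next step.
  if (opts.stripPunctuation) key = key.replace(/[^\w\s]/g, "");
  if (opts.foldWhitespace) key = key.replace(/\s+/g, " ").trim();
  return key;
}

console.log(makeKey("  Hello,   World! ", {
  ignoreCase: true,
  stripPunctuation: true,
  foldWhitespace: true,
}));
```

One thing to keep in mind: in JavaScript, `\w` covers only ASCII letters, digits, and the underscore, so the pattern as quoted also removes non-ASCII letters such as é from the key.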
Unicode normalization uses String.prototype.normalize. Many visually identical strings have multiple binary representations — the letter e-acute can be a single code point (U+00E9) or two code points (e plus combining acute, U+0065 U+0301). The 'normalize Unicode' switch runs both forms through NFC so they hash identically. The 'ignore accents' switch goes further: it decomposes to NFD, strips the combining-mark range U+0300 to U+036F, and matches accent-insensitively.
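The two switches can be illustrated with a short sketch. The function name `unicodeKey` and its option names are hypothetical; `String.prototype.normalize` and the U+0300 to U+036F range are taken from the description above.

```javascript
// Sketch of the two Unicode-related key transforms (hypothetical names).
function unicodeKey(line, { nfc = false, ignoreAccents = false } = {}) {
  if (ignoreAccents) {
    // Decompose, then drop combining marks in U+0300..U+036F.
    return line.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
  }
  return nfc ? line.normalize("NFC") : line;
}

const composed = "\u00e9";    // single code point
const decomposed = "e\u0301"; // e plus combining acute

console.log(composed === decomposed);
console.log(unicodeKey(composed, { nfc: true }) === unicodeKey(decomposed, { nfc: true }));
console.log(unicodeKey("caf\u00e9", { ignoreAccents: true }));
```

Without normalization the two spellings compare as different strings, which is why they would otherwise survive as separate lines.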
The tool keeps the first occurrence by default. Flip the Keep first / Keep last switch in the Options panel to keep the last occurrence of each duplicate instead — either way the original line text is preserved verbatim, only which copy survives changes.
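One way to support both settings is to record, per key, the index of the copy that should survive. The sketch below uses hypothetical names (`dedupeKeep`, `keep`), and it assumes the surviving copy stays at its own position in the list; the description above does not say where the tool places a kept last occurrence.

```javascript
// Sketch of keep-first / keep-last selection (hypothetical names; output
// position of a kept last occurrence is an assumption).
function dedupeKeep(lines, keep = "first", makeKey = (line) => line) {
  const chosen = new Map(); // key -> index of the surviving line
  lines.forEach((line, i) => {
    const key = makeKey(line);
    // "last": always overwrite; "first": only record the first index seen.
    if (keep === "last" || !chosen.has(key)) chosen.set(key, i);
  });
  const survivors = new Set(chosen.values());
  return lines.filter((_, i) => survivors.has(i)); // original text, original order
}

const lines = ["Apple", "pear", "apple"];
const lower = (line) => line.toLowerCase();
console.log(dedupeKeep(lines, "first", lower));
console.log(dedupeKeep(lines, "last", lower));
```

With case-insensitive keys, the first call keeps "Apple" and the second keeps "apple"; in both cases the surviving text is untouched.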
Three output modes exist. 'Unique' returns the deduplicated list (default). 'Duplicates only' returns just the lines that appeared more than once, useful for audit trails. 'With counts' lists those same repeated lines, each prefixed with its occurrence count in the format 3x: line. Lines that appear only once are omitted, which is usually what you want when surfacing repeats in log files or survey responses.
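All three modes can be derived from a single counting pass. The sketch below is an illustration under assumptions: the function name `report` and the mode identifiers `"unique"`, `"duplicates"`, and `"counts"` are hypothetical, and each group is represented by its first occurrence.

```javascript
// Sketch of the three output modes built from one counting pass (hypothetical names).
function report(lines, mode = "unique", makeKey = (line) => line) {
  const groups = new Map(); // key -> { line: first original line, count }
  for (const line of lines) {
    const key = makeKey(line);
    const group = groups.get(key);
    if (group) group.count += 1;
    else groups.set(key, { line, count: 1 });
  }
  const all = [...groups.values()]; // Map iteration preserves insertion order
  const repeated = all.filter((g) => g.count > 1);
  switch (mode) {
    case "unique":
      return all.map((g) => g.line);
    case "duplicates":
      return repeated.map((g) => g.line);
    case "counts":
      return repeated.map((g) => `${g.count}x: ${g.line}`);
    default:
      throw new Error(`unknown mode: ${mode}`);
  }
}

const sample = ["apple", "pear", "apple", "fig", "apple", "fig"];
console.log(report(sample, "unique"));
console.log(report(sample, "duplicates"));
console.log(report(sample, "counts"));
```

For the sample input, "pear" appears in the 'unique' output only, since it occurs once.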
Performance scales linearly with input size, but the comparison key calculation can dominate on very large inputs with all options enabled. For a million lines with case-folding, whitespace collapsing, punctuation stripping, and Unicode normalization, expect roughly five to fifteen seconds on a modern laptop. Plain exact-match dedupe of the same input runs in well under a second.
Output is exposed via Copy, Download (as a Blob in text/plain), a JSON export of the full duplicate report, and Swap, which feeds the result back into the input box so you can run a second pass with different options. Nothing leaves the browser tab — there is no upload endpoint behind this page.

Common Use Cases

01

Mailing list deduplication

Strip duplicate email addresses from a list, optionally lowercasing first so 'Bob@example.com' and 'bob@example.com' collapse.

02

Log file cleanup

Remove repeated INFO and WARN lines from server logs to surface unique events for review.

03

CSV row dedupe

Paste a column extracted from a spreadsheet and produce a clean unique list ready to paste back into Sheets or Excel.

04

SEO keyword consolidation

Merge keyword exports from multiple tools and dedupe across the union, ignoring case and surrounding whitespace.

Frequently Asked Questions

The first occurrence is kept by default. The deduplication walks the input top to bottom, and the first time a normalized key is seen, that line is preserved verbatim. Flip the Keep first / Keep last switch in the Options panel to keep the last occurrence of each duplicate instead.
Two lines are duplicates if their normalized keys match. The key is the line itself by default, modified by whichever options you enable: case-insensitive, whitespace-folded, punctuation-stripped, NFC-normalized, or accent-stripped. The original line text is never modified — only the comparison key is normalized.
Unicode normalization uses String.prototype.normalize('NFC') to collapse equivalent code-point sequences. Many accented characters can be represented as either a single composed code point or a base letter plus a combining mark. NFC picks the composed form, which makes 'é' (U+00E9) and 'é' (U+0065 plus U+0301) compare equal. The accent-insensitive option goes further by stripping all combining marks.
By default the original order is preserved (the first occurrence keeps its original position). For sorted output, enable 'Sort alphabetically' in the Options panel; 'Reverse order' is also available. For more advanced sorting you can hand the result to the Sort Lines tool with the Swap button, which moves the output back into the input box.
'With counts' mode lists the lines that appeared more than once, each prefixed with how many times it occurred. The format is the count, then 'x: ', then the line, for example '3x: apple'. Lines that appear only once are not included. This is useful for log analysis and survey response tabulation where you only care about the repeats.
Input size is limited only by available memory. A million ASCII lines with no normalization runs in well under a second and uses about 50 to 100 MB of browser memory. Enabling every normalization option on the same input takes five to fifteen seconds because each line builds a normalized key. For multi-million-line inputs, consider doing the work in a script with a streaming approach.
By default the 'Remove empty lines' option is ON, so every blank line is stripped before deduping. Turn it off if you want blank lines treated as ordinary values — then the first blank line is kept and only repeated blanks are removed. Leaving 'Remove empty lines' on (optionally with 'Trim whitespace') produces the cleanest list when input came from a copy-paste.
No input is sent to a server. The dedupe walk runs synchronously inside the page using a JavaScript hash map. There is no fetch call and no analytics on the input. You can disconnect from the network after the page loads and the tool keeps working.
CSV rows that look identical can still fail to match: trailing commas, quoting differences ("foo, bar" vs foo, bar), and stray whitespace produce different keys. Enable whitespace-folding and punctuation-stripping if you only care about the underlying field values, or pre-process the CSV through a parser that normalizes the row format before pasting.
To see only the duplicates, switch to 'duplicates only' mode; the output becomes just the lines that appeared more than once. Each duplicate appears once in that view; switch to 'with counts' if you also need the multiplicities.
Maintained by the WebToolVerse team
