Question 1

Does the tool keep the first or last occurrence?

Accepted Answer

First by default. The deduplication walks the input top-to-bottom and the first time a normalized key is seen, that line is preserved verbatim. Flip the Keep first / Keep last switch in the Options panel to keep the last occurrence of each duplicate instead.
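The two modes can be sketched as follows. This is a minimal illustration, not the tool's actual source: the function name `dedupe`, its parameters, and the choice to leave a kept-last line at the position of its last occurrence are all assumptions.

```javascript
// Hypothetical helper, not the tool's real code.
// keep:  'first' (default) or 'last'
// keyFn: maps a line to its comparison key (identity by default)
function dedupe(lines, keep = 'first', keyFn = (line) => line) {
  if (keep === 'first') {
    const seen = new Set();
    const out = [];
    for (const line of lines) {
      const key = keyFn(line);
      if (!seen.has(key)) {
        seen.add(key);
        out.push(line); // the original text is kept verbatim
      }
    }
    return out;
  }
  // keep === 'last': record the index of the final occurrence of each key,
  // then keep only the lines sitting at those indexes (an assumed placement).
  const lastIndex = new Map();
  lines.forEach((line, i) => lastIndex.set(keyFn(line), i));
  return lines.filter((line, i) => lastIndex.get(keyFn(line)) === i);
}

const lower = (s) => s.toLowerCase();
console.log(dedupe(['Apple', 'pear', 'apple'], 'first', lower)); // [ 'Apple', 'pear' ]
console.log(dedupe(['Apple', 'pear', 'apple'], 'last', lower));  // [ 'pear', 'apple' ]
```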

Question 2

What counts as a duplicate?

Accepted Answer

Two lines are duplicates if their normalized keys match. The key is the line itself by default, modified by whichever options you enable: case-insensitive, whitespace-folded, punctuation-stripped, NFC-normalized, or accent-stripped. The original line text is never modified — only the comparison key is normalized.
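A rough sketch of how such a key could be built. The option names (`nfc`, `stripAccents`, and so on) and the order in which the steps run are assumptions made for illustration; the tool's real identifiers and ordering are not documented here.

```javascript
// Hypothetical key builder; option names are illustrative only.
function makeKey(line, opts = {}) {
  let key = line;
  if (opts.nfc) key = key.normalize('NFC');
  if (opts.stripAccents) {
    // Decompose, drop combining marks (\p{M}), then recompose what is left.
    key = key.normalize('NFD').replace(/\p{M}/gu, '').normalize('NFC');
  }
  if (opts.caseInsensitive) key = key.toLowerCase();
  if (opts.stripPunctuation) key = key.replace(/\p{P}/gu, '');
  // Fold whitespace last so gaps left by removed punctuation collapse too.
  if (opts.foldWhitespace) key = key.trim().replace(/\s+/g, ' ');
  return key;
}

const opts = { caseInsensitive: true, stripPunctuation: true, foldWhitespace: true };
console.log(makeKey('  Hello,   World! ', opts)); // 'hello world'
console.log(makeKey('hello world', opts));        // 'hello world'
```

Only the key is transformed. The line that ends up in the output is still the untouched original.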

Question 3

How does Unicode normalization work?

Accepted Answer

It uses String.prototype.normalize('NFC') to collapse equivalent code-point sequences. Many accented characters can be represented as either a single composed code point or a base letter plus a combining mark. NFC picks the composed form, which makes 'é' (U+00E9) and 'é' (U+0065 plus U+0301) compare equal. The accent-insensitive option goes further by stripping all combining marks.
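A short demonstration of the behavior described above. The `stripAccents` helper is an assumption showing one common way to remove combining marks (decompose with NFD, then delete them); the tool may do it differently.

```javascript
const composed = '\u00E9';    // é as a single code point
const decomposed = 'e\u0301'; // e followed by a combining acute accent

console.log(composed === decomposed);                                   // false
console.log(composed.normalize('NFC') === decomposed.normalize('NFC')); // true

// Assumed approach to accent-insensitive keys: decompose, then drop all marks.
const stripAccents = (s) => s.normalize('NFD').replace(/\p{M}/gu, '');
console.log(stripAccents('r\u00E9sum\u00E9')); // 'resume'
```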

Question 4

Can I keep the original order or sort alphabetically?

Accepted Answer

By default the original order is preserved (the first occurrence keeps its original position). For sorted output, enable 'Sort alphabetically' in the Options panel; 'Reverse order' is also available. For more advanced sorting you can hand the result to the Sort Lines tool with the Swap button, which moves the output back into the input box.

Question 5

What does 'with counts' mode do?

Accepted Answer

It lists the lines that appeared more than once, each prefixed with how many times it occurred. The format is the count, then 'x: ', then the line — for example '3x: apple'. Lines that appear only once are not included. Useful for log analysis and survey response tabulation where you only care about the repeats.
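The output format can be reproduced with a small counting pass. This is a sketch only: the function name `withCounts` is hypothetical, and listing the results in first-seen order is an assumption.

```javascript
// Hypothetical helper that mimics the 'with counts' output format.
function withCounts(lines) {
  const counts = new Map(); // a Map iterates in insertion (first-seen) order
  for (const line of lines) {
    counts.set(line, (counts.get(line) ?? 0) + 1);
  }
  return [...counts]
    .filter(([, n]) => n > 1)              // singletons are left out
    .map(([line, n]) => `${n}x: ${line}`);
}

console.log(withCounts(['apple', 'pear', 'apple', 'fig', 'apple', 'fig']));
// [ '3x: apple', '2x: fig' ]
```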

Question 6

How big a list can I dedupe?

Accepted Answer

Memory is the only ceiling. Deduplicating a million ASCII lines with no normalization takes well under a second and uses about 50 to 100 MB of browser memory. Enabling every normalization option on the same input takes five to fifteen seconds because a normalized key has to be built for every line. For multi-million-line inputs, consider a script that streams the input instead.

Question 7

Does it remove blank lines?

Accepted Answer

By default the 'Remove empty lines' option is ON, so every blank line is stripped before deduping. Turn it off if you want blank lines treated as ordinary values — then the first blank line is kept and only repeated blanks are removed. Leaving 'Remove empty lines' on (optionally with 'Trim whitespace') produces the cleanest list when input came from a copy-paste.
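The effect of the switch can be sketched like this. The option names are hypothetical, and treating only strictly empty lines as blank (unless trimming is on) is an assumption.

```javascript
// Hypothetical options mirroring the two switches described above.
function dedupeLines(text, { removeEmpty = true, trim = false } = {}) {
  let lines = text.split(/\r?\n/);
  if (trim) lines = lines.map((l) => l.trim());
  if (removeEmpty) lines = lines.filter((l) => l !== '');
  return [...new Set(lines)]; // a Set keeps first-seen order
}

console.log(dedupeLines('a\n\nb\n\na'));                         // [ 'a', 'b' ]
console.log(dedupeLines('a\n\nb\n\na', { removeEmpty: false })); // [ 'a', '', 'b' ]
```

With the option off, the first blank line survives as an ordinary value and the later blanks are dropped as duplicates of it.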

Question 8

Is my text uploaded?

Accepted Answer

No. The dedupe walk runs synchronously inside the page using a JavaScript hash map. There is no fetch call and no analytics on the input. You can disconnect from the network after the page loads and the tool keeps working.

Question 9

Why are my CSV rows not collapsing?

Accepted Answer

Trailing commas, quoting differences ("foo, bar" vs foo, bar), and stray whitespace produce different keys. Enable whitespace-folding and punctuation-stripping if you only care about the underlying field values, or pre-process the CSV through a parser that normalizes the row format before pasting.
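If you go the pre-processing route, the idea is to compare rows by their parsed field values rather than by their raw text. The sketch below uses a deliberately minimal hand-written splitter (double-quoted fields and `""` escapes only, no embedded newlines); both function names are hypothetical, and a real CSV library is the better choice for production data.

```javascript
// Minimal single-line CSV splitter, for illustration only.
function parseCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;
      else field += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ',') { fields.push(field); field = ''; }
    else field += ch;
  }
  fields.push(field);
  return fields;
}

// Canonical key: trimmed fields, trailing empty fields dropped,
// joined with a separator (U+001F) that is unlikely to occur in data.
function csvRowKey(line) {
  const fields = parseCsvLine(line).map((f) => f.trim());
  while (fields.length > 1 && fields[fields.length - 1] === '') fields.pop();
  return fields.join('\u001F');
}

console.log(csvRowKey('"foo",bar,') === csvRowKey('foo, bar')); // true
console.log(parseCsvLine('"foo, bar",baz'));                    // [ 'foo, bar', 'baz' ]
```

A quoted comma stays inside its field, so `"foo, bar"` remains one value and is not confused with the two-field row `foo, bar`.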

Question 10

Can I export the duplicates I found?

Accepted Answer

Yes. Switch to 'duplicates only' mode and the output becomes just the lines that appeared more than once. Each duplicate appears once in that view; switch to 'with counts' if you also need the multiplicities.
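The 'duplicates only' view can be approximated with two sets. This is a sketch, not the tool's code: the function name `duplicatesOnly` is hypothetical, and ordering the result by first appearance is an assumption.

```javascript
// Hypothetical helper: returns each repeated line once, in first-seen order.
function duplicatesOnly(lines) {
  const seen = new Set();
  const repeated = new Set();
  for (const line of lines) {
    if (seen.has(line)) repeated.add(line);
    else seen.add(line);
  }
  // Walk `seen` (first-seen order) and keep only the lines that repeated.
  return [...seen].filter((line) => repeated.has(line));
}

console.log(duplicatesOnly(['a', 'b', 'b', 'c', 'a', 'a']).join('\n'));
// a
// b
```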
