Question 1

What invisible characters does it remove?

Accepted Answer

Control characters in U+0000 to U+001F (except tab, newline, and carriage return) and U+007F to U+009F; the byte-order mark U+FEFF, but only when it is the first character of the input; and the zero-width family: U+200B (zero-width space), U+200C (zero-width non-joiner), U+200D (zero-width joiner), and U+2060 (word joiner). These characters commonly come along when you copy from rich-text sources.
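The removal rules above can be sketched in TypeScript as two steps: drop a leading BOM, then delete every other listed code point with one character-class regex. This is a minimal sketch that assumes a regex-based implementation; the function name `removeInvisible` is hypothetical, not the tool's API.

```typescript
// Hypothetical sketch of the invisible-character filter; the tool's real code may differ.
// Character class covers:
//   U+0000-U+0008, U+000B-U+000C, U+000E-U+001F  C0 controls minus TAB, LF, CR
//   U+007F-U+009F                                DEL plus the C1 controls
//   U+200B-U+200D, U+2060                        zero-width space/non-joiner/joiner, word joiner
const INVISIBLE = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F\u200B-\u200D\u2060]/g;

function removeInvisible(input: string): string {
  // U+FEFF is treated as a byte-order mark only when it is the very first character.
  const withoutBom = input.charCodeAt(0) === 0xfeff ? input.slice(1) : input;
  return withoutBom.replace(INVISIBLE, "");
}
```

For example, `removeInvisible("\uFEFFa\u200Bb")` returns "ab", while a U+FEFF in the middle of the text is kept, matching the start-of-input rule.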

Question 2

How are smart quotes converted?

Accepted Answer

Smart-to-straight is mechanical: U+201C and U+201D both become U+0022, and U+2018 and U+2019 both become U+0027. Straight-to-smart works on balanced pairs: a regex matches each "..." or '...' span and rewrites it as an opening/closing curly pair, so well-formed quotations open and close in the right direction. An unpaired straight quote is left as-is.
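A minimal TypeScript sketch of both directions, assuming simple global regexes. The names `toStraight` and `toSmart` are hypothetical, and the tool's exact pairing pattern is not documented.

```typescript
// Smart -> straight: a plain character substitution.
function toStraight(text: string): string {
  return text
    .replace(/[\u201C\u201D]/g, '"')  // curly double quotes -> U+0022
    .replace(/[\u2018\u2019]/g, "'"); // curly single quotes -> U+0027
}

// Straight -> smart: only balanced spans are rewritten (assumed pattern).
function toSmart(text: string): string {
  return text
    // Each "..." span becomes U+201C ... U+201D.
    .replace(/"([^"]*)"/g, "\u201C$1\u201D")
    // Each '...' span becomes U+2018 ... U+2019.
    .replace(/'([^']*)'/g, "\u2018$1\u2019");
}
```

With this naive pairing, `toSmart('a "b')` leaves the lone quote alone, but two apostrophes on one line, as in "don't won't", are matched as a '...' span and become U+2018 and U+2019. This sketch does not special-case apostrophes.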

Question 3

What happens to my line endings?

Accepted Answer

When 'fix line endings' is on, every CRLF, CR, and LF in the input is rewritten to the single style you select (LF by default, the Unix and macOS convention). This matters most when you paste text copied on Windows into a shell script: the carriage return left at the end of each line makes bash fail with errors such as `$'\r': command not found`.
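Normalizing line endings takes a single regex pass; the important detail is matching CRLF before a lone CR or LF, so a Windows break is counted once rather than twice. A sketch in TypeScript, with hypothetical names (`fixLineEndings`, and "lf"/"crlf"/"cr" for the three styles):

```typescript
type LineEnding = "lf" | "crlf" | "cr";

const EOL: Record<LineEnding, string> = { lf: "\n", crlf: "\r\n", cr: "\r" };

// Alternation order matters: "\r\n" is tried first, so "\r\n" never becomes two breaks.
function fixLineEndings(text: string, style: LineEnding = "lf"): string {
  return text.replace(/\r\n|\r|\n/g, EOL[style]);
}
```

For example, "a\r\nb\rc" becomes "a\nb\nc" with the default LF style.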

Question 4

Does HTML stripping understand nested tags?

Accepted Answer

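No. It uses the regex /<[^>]*>/g, which deletes each tag from '<' to the next '>' independently, so nesting does not matter. Self-closing tags, comments, and CDATA blocks are removed too, as long as they contain no '>' inside. Malformed input breaks this: a '>' inside an attribute value or comment leaves a fragment behind, and a literal '<' in prose (as in 'a < b and c > d') swallows everything up to the next '>'. For such input, use a real HTML parser. For typical pasted web content, the regex approach is fast and good enough.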
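The regex quoted above, applied with `String.prototype.replace`, shows both the normal case and the failure modes. The wrapper name `stripTags` is hypothetical:

```typescript
// Removes every "<...>" run: each '<' up to the first following '>'.
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, "");
}
```

Here `stripTags("<p>Hello <b>world</b></p>")` returns "Hello world", but `stripTags("a < b and c > d")` returns "a  d", and `stripTags('<a title="x>y">link</a>')` returns 'y">link'. Text inside script and style elements also survives, because only the tags themselves are matched.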

Question 5

Why did my apostrophes get changed?

Accepted Answer

The smart-quotes toggle works in both directions. If you converted straight to smart and your text had ASCII apostrophes, they were rewritten to U+2019 (right single quotation mark). To reverse, run the same input through with the opposite direction selected, or paste a fresh copy and disable the option.

Question 6

Does it preserve my Unicode characters?

Accepted Answer

Yes, by default. CJK, Cyrillic, Arabic, emoji, and accented Latin all pass through unchanged. The 'remove accents' toggle is an explicit opt-in: it decomposes the text with NFD and strips the combining marks, turning 'café' into 'cafe'. Unless you enable it, all Unicode is preserved.
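The accent-stripping step can be sketched with the built-in `String.prototype.normalize`. Assumptions: the name `removeAccents` is hypothetical; "combining marks" is approximated by the Combining Diacritical Marks block (U+0300 to U+036F), which covers accented Latin; and a final NFC pass is added so that scripts NFD also splits, such as Korean Hangul, come back in composed form.

```typescript
// Hypothetical sketch of the 'remove accents' option.
function removeAccents(text: string): string {
  return text
    .normalize("NFD")                // split 'é' into 'e' + U+0301
    .replace(/[\u0300-\u036f]/g, "") // drop combining diacritics (assumed range)
    .normalize("NFC");               // recompose anything else NFD split apart
}
```

With these choices, 'café' becomes 'cafe' while CJK, Hangul, and emoji pass through unchanged. Letters with no canonical decomposition, such as 'ø' or 'ł', are not split by NFD and stay as they are.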

Question 7

How big a document can it handle?

Accepted Answer

There is no fixed size limit; the practical ceiling is the memory available to the browser tab. The full pipeline is O(n) in the input length. Live preview recomputes on every keystroke, so very large inputs may lag slightly while you type; switch live preview off and click 'Clean' to run the pipeline once on demand instead.

Question 8

Are my settings synced across devices?

Accepted Answer

No. Settings are stored in localStorage under the key 'text-cleaner-settings', so they exist only in the current browser. There is no account system and no cloud sync. To reset to defaults, delete that key via DevTools > Application > Local Storage (the Storage tab in Firefox).

Question 9

What does 'fix punctuation' do?

Accepted Answer

It corrects spacing around common punctuation: it removes spaces before commas, periods, and other sentence punctuation, ensures exactly one space after them (without splitting runs like '...'), and trims spaces just inside brackets and parentheses. It also normalizes ellipses according to the selected quote style: three or more periods collapse to U+2026 when 'smart' is chosen, and U+2026 expands back to three periods when 'straight' is chosen.
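One way to express these rules is a chain of regex replacements. This is a sketch under stated assumptions: `fixPunctuation` and its `quoteStyle` parameter are hypothetical, and to stay safe it only inserts a missing space after , ; : ! ? when an ASCII letter follows, so decimals like '3.14', domains, and abbreviations such as 'e.g.' are left alone. The tool's exact rules may differ.

```typescript
type QuoteStyle = "smart" | "straight";

// Hypothetical sketch of 'fix punctuation'; rule details are assumptions.
function fixPunctuation(text: string, quoteStyle: QuoteStyle): string {
  const spaced = text
    // 1. No space before sentence punctuation: "word ," -> "word,"
    .replace(/[ \t]+([,.;:!?])/g, "$1")
    // 2a. Collapse several spaces after punctuation to one.
    .replace(/([,.;:!?])[ \t]{2,}/g, "$1 ")
    // 2b. Insert a missing space after , ; : ! ? when an ASCII letter follows.
    //     Periods are skipped, and runs such as "..." or "?!" are never split.
    .replace(/([,;:!?])(?=[A-Za-z])/g, "$1 ")
    // 3. Trim spaces just inside parentheses and square brackets.
    .replace(/([(\[])[ \t]+/g, "$1")
    .replace(/[ \t]+([)\]])/g, "$1");

  // 4. Ellipsis follows the selected quote style.
  return quoteStyle === "smart"
    ? spaced.replace(/\.{3,}/g, "\u2026")
    : spaced.replace(/\u2026/g, "...");
}
```

For example, `fixPunctuation("Hello ,world !", "straight")` returns "Hello, world!", and "Wait... what" becomes "Wait… what" with the smart style.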

Question 10

Is my text uploaded?

Accepted Answer

No. The cleaner runs synchronously inside the browser tab. There is no fetch call and no analytics on the input. You can disconnect from the network after the page loads and the tool keeps working.

