Question 1

Is my file uploaded anywhere?

Accepted Answer

No. OCR runs entirely client-side, so nothing is uploaded. pdfjs-dist renders each page to a canvas, Tesseract.js performs the recognition in a Web Worker, and the only network requests are the initial downloads of the WASM core and language-model files from a CDN. Your document stays in the tab's memory.
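The flow described above can be sketched as a simple loop. This is a minimal sketch, not the tool's actual code: `renderPage` and `recognize` are hypothetical callbacks standing in for the pdfjs-dist rendering step and the Tesseract.js worker, and `ocrDocument` is an illustrative name.

```typescript
// Hypothetical callbacks standing in for the two libraries:
//   renderPage would wrap pdfjs-dist (page number -> rendered image)
//   recognize  would wrap a Tesseract.js worker (image -> text)
type RenderPage<TImage> = (pageNumber: number) => Promise<TImage>;
type Recognize<TImage> = (image: TImage) => Promise<string>;

async function ocrDocument<TImage>(
  pageCount: number,
  renderPage: RenderPage<TImage>,
  recognize: Recognize<TImage>,
): Promise<string> {
  const pageTexts: string[] = [];
  // Pages are handled one at a time, so only one rendered page
  // needs to be held in memory at any moment.
  for (let pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
    const image = await renderPage(pageNumber);
    pageTexts.push(await recognize(image));
  }
  // Separate pages with a blank line in the final text.
  return pageTexts.join("\n\n");
}
```

Both steps take and return in-memory values only, which is the property that lets the document stay inside the tab.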

Question 2

Which languages are supported?

Accepted Answer

Thirteen out of the box: English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Simplified Chinese, Japanese, Korean, Arabic, and Hindi. Each language uses a Tesseract LSTM model that downloads on first use and caches in the browser.
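For reference, the thirteen languages correspond to Tesseract's standard traineddata codes. The lookup below is a minimal sketch that assumes the tool passes these codes to Tesseract.js; `LANGUAGE_CODES` and `toTesseractCode` are hypothetical names, not taken from the tool.

```typescript
// Display name -> Tesseract traineddata code.
// The codes are Tesseract's standard ones; the map itself is illustrative.
const LANGUAGE_CODES: ReadonlyMap<string, string> = new Map([
  ["English", "eng"],
  ["Spanish", "spa"],
  ["French", "fra"],
  ["German", "deu"],
  ["Italian", "ita"],
  ["Portuguese", "por"],
  ["Dutch", "nld"],
  ["Russian", "rus"],
  ["Simplified Chinese", "chi_sim"],
  ["Japanese", "jpn"],
  ["Korean", "kor"],
  ["Arabic", "ara"],
  ["Hindi", "hin"],
]);

// Returns the traineddata code, or undefined for an unsupported language.
function toTesseractCode(displayName: string): string | undefined {
  return LANGUAGE_CODES.get(displayName);
}
```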

Question 3

Why does the first run take so long?

Accepted Answer

The Tesseract WASM core (~2 MB) and the chosen language model (2–4 MB) are downloaded from a CDN and cached on first use. Subsequent runs in the same browser skip the download and start in under a second.

Question 4

What's the difference between Fast and Best?

Accepted Answer

Fast mode renders PDF pages at ~192 DPI; Best mode renders them at ~288 DPI. The same Tesseract engine reads both, but Best gives it more pixels per glyph, which noticeably improves accuracy on small or faded text at the cost of about 2× the processing time.
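To see where the roughly 2× cost comes from, it helps to convert DPI to pixels. PDF page sizes are measured in points (72 per inch), so a render scale of `dpi / 72` yields the requested resolution. The sketch below assumes the tool feeds such a scale to the PDF renderer; the function names are hypothetical.

```typescript
// PDF user space is measured in points: 72 points per inch.
const POINTS_PER_INCH = 72;

interface PixelSize {
  width: number;
  height: number;
}

// Scale factor that renders a page at the requested DPI.
function dpiToScale(dpi: number): number {
  return dpi / POINTS_PER_INCH;
}

// Canvas size in pixels for a page whose size is given in points.
function canvasSizeAtDpi(widthPt: number, heightPt: number, dpi: number): PixelSize {
  return {
    width: Math.round((widthPt * dpi) / POINTS_PER_INCH),
    height: Math.round((heightPt * dpi) / POINTS_PER_INCH),
  };
}

// US Letter is 612 x 792 points.
const fastSize = canvasSizeAtDpi(612, 792, 192); // 1632 x 2112 pixels
const bestSize = canvasSizeAtDpi(612, 792, 288); // 2448 x 3168 pixels

// Best mode has 2.25x as many pixels to recognize as Fast mode.
const pixelRatio =
  (bestSize.width * bestSize.height) / (fastSize.width * fastSize.height);
```

Going from 192 to 288 DPI multiplies each dimension by 1.5, so the pixel count grows by 1.5² = 2.25, which lines up with the "about 2×" processing time.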

Question 5

Can OCR read handwriting?

Accepted Answer

No, not reliably. Tesseract's standard LSTM models are trained on printed text. Cursive and casual handwriting come out as gibberish; carefully hand-printed block letters sometimes work but should not be relied on.

Question 6

What accuracy can I expect?

Accepted Answer

Clean printed text from a 300 DPI scan typically reaches 95–99% accuracy. Photographs of pages, faded photocopies, and small fonts drop to 80–90%. A confidence score under 60 (on Tesseract's 0–100 scale) usually means the source is too noisy; re-scan it or switch to Best mode.
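The threshold of 60 can be applied mechanically. The sketch below assumes each page's result carries a 0–100 confidence value, as Tesseract.js reports on its recognition result; the `PageResult` shape and `pagesToRescan` helper are hypothetical.

```typescript
interface PageResult {
  page: number;       // 1-based page number
  confidence: number; // 0-100, higher is better
}

// Threshold taken from the guidance above.
const LOW_CONFIDENCE_THRESHOLD = 60;

// Returns the page numbers whose confidence falls below the threshold.
function pagesToRescan(
  results: PageResult[],
  threshold: number = LOW_CONFIDENCE_THRESHOLD,
): number[] {
  return results
    .filter((result) => result.confidence < threshold)
    .map((result) => result.page);
}
```

For example, given confidences of 92, 41, and 60 for pages 1 to 3, only page 2 is flagged, because the comparison is strictly "under 60".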

Question 7

Why is column order wrong on multi-column PDFs?

Accepted Answer

Tesseract reads the page top to bottom and does not reconstruct its layout. On multi-column scans, lines from adjacent columns tend to be read straight across and interleaved, which scrambles paragraphs. For column-aware OCR, a commercial engine is required.

Question 8

Can I OCR password-protected PDFs?

Accepted Answer

No. Unlock the PDF first using the PDF Password tool, then run OCR on the unlocked file.

Question 9

How long does OCR take?

Accepted Answer

On a typical laptop, Fast mode processes 1–2 pages per second; Best mode runs at roughly half that rate. A 100-page PDF in Best mode takes around 2–4 minutes. The progress bar updates after each page.
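The throughput figures translate into a quick estimate. This is a back-of-the-envelope sketch using only the rates quoted above; `estimateSeconds` and the rate table are hypothetical names, and real timings depend on the machine and the document.

```typescript
type OcrMode = "fast" | "best";

// Pages per second from the figures above: Fast 1-2, Best about half that.
const PAGES_PER_SECOND: Record<OcrMode, { low: number; high: number }> = {
  fast: { low: 1, high: 2 },
  best: { low: 0.5, high: 1 },
};

// Returns the shortest and longest expected run time in seconds.
function estimateSeconds(pageCount: number, mode: OcrMode): { min: number; max: number } {
  const rate = PAGES_PER_SECOND[mode];
  return {
    min: pageCount / rate.high, // best case: highest throughput
    max: pageCount / rate.low,  // worst case: lowest throughput
  };
}
```

For 100 pages in Best mode this gives 100 to 200 seconds, or roughly 1.7 to 3.3 minutes, in the same ballpark as the 2–4 minutes quoted above.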

Question 10

Does the tool produce a searchable PDF or just text?

Accepted Answer

Just text for now. The output is plain UTF-8 text that you can copy or save as a .txt file. To embed the OCR text back into the PDF as a searchable layer, a desktop tool such as OCRmyPDF is currently the best option.

How Browser-Based OCR Works — Tesseract.js, DPI, and Language Models

Common Use Cases

Digitize scanned contracts

Extract text from photos

Make image-only PDFs searchable

Multi-language document processing

Frequently Asked Questions