Notices by Dan Jones (danjones000@fedi.absturztau.be)

Dan Jones (danjones000@fedi.absturztau.be)'s status on Monday, 14-Feb-2022 08:28:13 UTC Dan Jones
in reply to
- Amolith
@amolith
I used to do this for a living.
I built a cloud-based document management system that would take scanned pages, OCR them, and store them as PDFs.
We used Google Cloud Vision, which was overkill, but my CEO had a hard-on for Google.
Tesseract should be all you need: https://guides.library.illinois.edu/c.php?g=347520&p=4121426
Although, may I suggest using DjVu instead of PDF? DjVu is a better archival format. It’s much simpler and usually results in smaller file sizes. Many PDF viewers already support it. But I don’t know exactly what your use case is, so that may not be an option.
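For the Tesseract route, here is a rough sketch of the scan-to-searchable-PDF step, assuming the `tesseract` command-line tool is installed and on your PATH. The function names and the file names (`page1.png`, `page1`) are made up for illustration, and this is not the system described above, just one way to call Tesseract's `pdf` output mode from Python.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image: Path, output_base: Path, language: str = "eng") -> list[str]:
    """Return the argv for a Tesseract run that emits a searchable PDF.

    Hypothetical helper: `tesseract IMAGE OUTPUTBASE -l LANG pdf`
    writes OUTPUTBASE.pdf with a text layer over the scanned image.
    """
    return ["tesseract", str(image), str(output_base), "-l", language, "pdf"]


def ocr_to_pdf(image: Path, output_base: Path, language: str = "eng") -> Path:
    """Run Tesseract on one scanned page and return the path of the PDF it wrote."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image, output_base, language), check=True)
    # Tesseract appends ".pdf" to the output base name itself.
    return output_base.with_name(output_base.name + ".pdf")


if __name__ == "__main__":
    # Assumed input file; replace with a real scan.
    print(ocr_to_pdf(Path("page1.png"), Path("page1")))
```

Scan quality and the language setting make a big difference to the results, so treat `eng` as a placeholder for whatever your documents need.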
Attachments
1. LibGuides: Introduction to OCR and Searchable PDFs: Using Tesseract
  
  from Scholarly Commons
  
  Learn OCR best practices and how to begin an OCR project using ABBYY FineReader, Adobe Acrobat Pro, or Tesseract with this guide.
Dan Jones (danjones000@fedi.absturztau.be)'s status on Tuesday, 05-Oct-2021 18:31:09 UTC Dan Jones
in reply to
- muesli
- Mr. Teatime
@fribbledom @Mr_Teatime
I like the term “willfully ignorant”

Husband, father, Christian, Mormon, PHP Developer, etc.

Statistics

User ID: 25491

Member since: 5 Oct 2021

Notices: 2

Daily average: 0
