Ask HN: OCR framework for extracting formatted text?
146 points by crocodiletears 14 days ago | 42 comments
I’m a serial information hoarder and often use screenshots to store comments, passages, and fragments of conversations I find useful or insightful. This works well if I want to reference something recent, but obviously doesn’t scale. I’d like to integrate these into my personal archive, but I don’t know of any frameworks (preferably for Go, Node, or Python) that could automatically extract the text from the images while retaining its formatting. I’m not against doing some image preprocessing myself, but I don’t feel comfortable passing the images to a third-party service, since a portion of them contain private or sensitive information that I can’t readily sort out of my collection.
