First, watch this video for the full meal deal.
However, for a quick-and-dirty summary of how to convert an image or photo (pretty bad or pretty good) into a .txt file, here we go:
1. Install Tesseract
sudo apt install tesseract-ocr
2. Convert the image (e.g. .jpg, .pdf, etc.) into a .tiff file with ImageMagick to make it ready for Tesseract
The ‘convert’ command belongs to ImageMagick, which Ubuntu (I think) comes with. If the command isn’t found, sudo apt install imagemagick should get it. Anyways, here is what I did:
- in a terminal, cd to the directory where the source images are
- convert the image into a TIFF file with this command from the video (adjust the file names accordingly)
convert -density 300 IMG_input_image_1234.jpg -depth 8 -strip -background white -alpha off IMG_output_image_1234.tiff
Now you should have IMG_output_image_1234.tiff in your directory.
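If you have a whole folder of photos, the same convert command can go in a loop. This is just a rough sketch, assuming the sources are .jpg files in the current directory; tiff_name_for is a made-up helper name of mine, not something from the video or from ImageMagick.

```shell
#!/bin/sh
# Hypothetical helper: strip the extension and append .tiff
# (IMG_1234.jpg -> IMG_1234.tiff).
tiff_name_for() {
    printf '%s.tiff\n' "${1%.*}"
}

# Convert every .jpg in the current directory with the same flags as above.
for f in *.jpg; do
    [ -e "$f" ] || continue    # no .jpg files here: the pattern stays literal, so skip it
    convert -density 300 "$f" -depth 8 -strip -background white -alpha off "$(tiff_name_for "$f")"
done
```

Each TIFF lands next to its source image under the same base name, so the originals are left alone.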
3. Convert the TIFF to TEXT
tesseract IMG_output_image_1234.tiff IMG_output_text_1234
Now you should have IMG_output_text_1234.txt in your directory.
Note that I didn’t add .txt to the output name in the command. Tesseract appends it automatically.
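Steps 2 and 3 can also be rolled into one small shell function so you only type the image name once. Again a sketch, not the one true way: image_to_text is a hypothetical name, and it assumes you want the TIFF and the text file to share the source image's base name.

```shell
#!/bin/sh
# Hypothetical helper: run both steps on one image and print the name of the text file.
image_to_text() {
    src=$1
    base=${src%.*}    # e.g. IMG_1234.jpg -> IMG_1234
    # Step 2: make a clean TIFF for Tesseract
    convert -density 300 "$src" -depth 8 -strip -background white -alpha off "$base.tiff" || return 1
    # Step 3: OCR it; Tesseract adds the .txt extension to the output name itself
    tesseract "$base.tiff" "$base" || return 1
    printf '%s\n' "$base.txt"
}
```

After pasting that into the terminal, image_to_text IMG_input_image_1234.jpg should leave IMG_input_image_1234.tiff and IMG_input_image_1234.txt in the directory.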
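Note also that the default language is English. For another language, install its language pack (on Ubuntu, e.g. sudo apt install tesseract-ocr-deu for German) and pass its code with the -l flag, e.g. -l deu. Several languages can be combined in one run by joining the codes with a plus sign, e.g. -l eng+deu.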
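For the language part, here is a small sketch of how I would wrap it. As far as I know, Tesseract picks the language with its -l option and accepts several codes joined with a plus sign, provided the matching language packs are installed (tesseract --list-langs shows what you have). The ocr_langs function name and its argument order are my own invention.

```shell
#!/bin/sh
# Hypothetical helper: ocr_langs IMAGE OUTPUT_BASE LANG [LANG...]
ocr_langs() {
    img=$1
    out=$2
    shift 2
    # Join the remaining arguments with "+", e.g. eng deu -> eng+deu
    langs=$(IFS=+; printf '%s' "$*")
    tesseract "$img" "$out" -l "$langs"
}
```

So ocr_langs IMG_output_image_1234.tiff IMG_output_text_1234 eng deu would run the same command as in step 3 with -l eng+deu tacked on the end.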
Hope this helps