Tesseract – an OCR engine

Tesseract is a very interesting open source project. Not only is it one of the few open source engines for OCR (optical character recognition), but the algorithms it uses are also far more accurate than those in most typical commercial products. Linux Journal has a howto on installing, configuring and running this command line engine.
Obviously, the higher the dpi of the scan, the better the quality of the read. Even so, this open source engine seems to do great against ocrad, and can definitely compete with the best of them. International characters don’t seem to be supported, and there’s a definite learning curve if you’re not a command line whiz, but hey… that’s what learning is all about, eh?
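
If the command line feels like a hurdle, a small wrapper script can take some of the sting out. Here’s a minimal Python sketch, assuming the `tesseract` binary is installed and on your PATH, and using its basic `tesseract IMAGE OUTPUTBASE` form. The function names and the file names are made up for illustration; they aren’t part of Tesseract itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base):
    """Return the argument list for a basic `tesseract IMAGE OUTPUTBASE` call."""
    return ["tesseract", str(image_path), str(output_base)]


def run_ocr(image_path, output_base):
    """Run tesseract on one image and return the recognized text.

    Returns None if the tesseract binary is not found on PATH.
    """
    if shutil.which("tesseract") is None:
        return None
    # check=True raises CalledProcessError if tesseract exits with an error
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # tesseract writes its text output to OUTPUTBASE.txt
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling `run_ocr("scan.tif", "scan")` would run the engine on a (hypothetical) scanned page and hand back whatever text it recognized, so you can try scans at different dpi settings and compare the results.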