[Hidden-tech] Recommendation for OCR software for digitizing magazines

Noah Smith noah at born-digital.com
Thu May 6 12:57:39 UTC 2021


I'll second the vote for Tesseract - all the work we do (e.g.
https://compass.fivecolleges.edu/) uses Tesseract to generate OCR, and it
works very well. It's not perfect, especially for handwritten material, but
nothing will be.
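
For anyone who wants to try it on a folder of scans, here is a minimal
sketch, assuming the `tesseract` command-line tool is installed and on your
PATH and that each page has been scanned to its own TIFF. The function names
and folder layout are made up for illustration; this is not our production
pipeline. Passing `pdf` as the last argument asks Tesseract to write a PDF
with a searchable text layer behind the page image.

```python
import subprocess
from pathlib import Path


def build_command(image, output_base, lang="eng"):
    """Return the argv for: tesseract <image> <output_base> -l <lang> pdf.

    Tesseract appends ".pdf" to output_base itself.
    """
    return ["tesseract", str(image), str(output_base), "-l", lang, "pdf"]


def ocr_folder(scan_dir, out_dir, lang="eng"):
    """Run Tesseract on every *.tif in scan_dir (hypothetical layout).

    Returns the list of PDF paths that Tesseract was asked to write.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(Path(scan_dir).glob("*.tif")):
        base = out_dir / image.stem
        # check=True raises CalledProcessError if Tesseract fails on a page.
        subprocess.run(build_command(image, base, lang), check=True)
        written.append(out_dir / (image.stem + ".pdf"))
    return written
```

Calling `ocr_folder("scans", "ocr_out")` would then produce one searchable
PDF per page, which you can merge into a single issue with whatever PDF tool
you prefer.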

--

Noah Smith

Founder + CEO

Pronouns: he/him/his. Scheduling a meeting with me? View my availability
<https://calendar.google.com/calendar/u/0/embed?src=noah@born-digital.com&ctz=America/New_York&mode=WEEK>
Born-Digital | 84 Russell St, Hadley MA | (413) 259-6777 | born-digital.com


On Wed, May 5, 2021 at 4:37 PM Rich at tnr via Hidden-discuss <
hidden-discuss at lists.hidden-tech.net> wrote:

> Funny you should ask - I am just finishing digitizing 40 years of
> journals from Watervliet (NY) Shaker Village and have digitized a large
> number of resources as part of Shakerpedia and other projects.
>
> So this is not just a pick-the-software question - it's more about the
> overall project design.
>    Just a few parts:
>         Most higher-level scanners include good OCR systems - ABBYY is
> included with many PC/Mac systems.
>         Are the journals sheet-feedable, or can they be made so, even if
> that means cutting the bindings?
>         Or even better, has someone else already digitized them, or will
> someone help digitize - such as Archive.org, which has both a major archive
> and infrastructure for digitizing?
>         Is there the staff to handle this, and what costs have to be
> covered?
>         What is the most effective platform? There are reasons to get into
> Linux systems - such as Tesseract.
>         Once it's digitized, how will it be searched? The most common
> system for such online use is Elasticsearch, which you can run on AWS or
> almost any cloud platform.
>
> As you can tell, there is a lot more to that question than just software.
> Those are a few comments above - if you want to discuss this more, email
> me off-list.
>
> Stay well - Rich
> On 5/5/2021 3:53 PM, Joanna Campe via Hidden-discuss wrote:
>
> Hi everyone,
>
> I hope you are all safe and well.
>
> We would like to digitize our archival hardcopy magazines, and we are
> looking for the best option. Does anyone have experience with this and can
> make a recommendation for OCR software?
>
> We have tried Adobe Acrobat Pro and a couple others, but are having some
> difficulty recognizing text that is printed over images.
>
> Important features are searchable PDF creation in a magazine format. We
> are using an Epson Perfection V500 Plus scanner, if that matters.
>
> Your recommendations are much appreciated!
>
> My best,
>
> Joanna
>
> Joanna Campe
> Executive Director
> Remineralize the Earth
> 152 South Street
> Northampton, MA 01060 USA
>
> Tel: 413-563-9938
> Email: jcampe at remineralize.org
> http://www.remineralize.org
>
>
> *Book*
> Geotherapy: Innovative Methods of Soil Fertility Restoration, Carbon
> Sequestration, and Reversing CO2 Increase
> http://www.crcpress.com/product/isbn/9781466595392
>
> Please join and support us on Patreon: https://www.patreon.com/RTE
>
> _______________________________________________
> Hidden-discuss mailing list - home page: http://www.hidden-tech.net
> Hidden-discuss at lists.hidden-tech.net
>
> You are receiving this because you are on the Hidden-Tech Discussion list.
> If you would like to change your list preferences, go to the Members
> page on the Hidden Tech Web site.
> http://www.hidden-tech.net/members
>
> --
> Rich Roth
> CEO TnR Global
>
> Bio and personal blog: http://rizbang.com
> Building the really big sites:      http://www.tnrglobal.com
> Small/Soho business in the PV:        http://www.hidden-tech.net
> Places to meet for business:        http://www.meetmewhere.com
> And for Arts and relaxation:        http://TarotMuertos.com - Artistic Tarot Deck
>    http://www.welovemuseums.com
>    http://www.artonmytv.com/
> Helping move the world:             http://www.earththrives.com
>