<div dir="ltr">I'll second the vote for Tesseract - all the work we do (e.g. <a href="https://compass.fivecolleges.edu/">https://compass.fivecolleges.edu/</a>) uses Tesseract to generate OCR and it works very well. Not perfect especially for handwritten stuff, but nothing will be.<br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div dir="ltr"></div><div dir="ltr"><br>--<br><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:8pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Noah Smith</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:8pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Founder + CEO</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:8pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Pronouns: he/him/his. Scheduling a meeting with me? 
<a href="https://calendar.google.com/calendar/u/0/embed?src=noah@born-digital.com&ctz=America/New_York&mode=WEEK" target="_blank">View my availability</a></span></p><span style="font-size:8pt;font-family:Arial;font-weight:700;vertical-align:baseline;white-space:pre-wrap"><font color="#0b5394">Born-Digital</font></span><span style="font-size:8pt;font-family:Arial;color:rgb(255,102,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap"> </span><span style="font-size:8pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">| 84 Russell St, Hadley MA | (413) 259-6777 | </span><span style="font-size:8pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap"> </span><span style="font-size:8pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap"><font color="#0b5394"><a href="http://born-digital.com/" target="_blank">born-digital.com</a></font></span></div></div></div></div></div><div dir="ltr"><div style="font-family:arial"><font size="1"></font></div></div></div></div></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 5, 2021 at 4:37 PM Rich@tnr via Hidden-discuss <<a href="mailto:hidden-discuss@lists.hidden-tech.net">hidden-discuss@lists.hidden-tech.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Funny you should ask: I am just finishing digitizing 40 years
of journals from Watervliet (NY) Shaker Village, and I have digitized
a large number of resources
as part of Shakerpedia and other projects.<br>
<br>
So this is not just a "pick the software" question; it's more about
the overall project design.<br>
Just a few parts:<br>
Most higher-end scanners include good OCR systems;
ABBYY is included with many PC/Mac systems.<br>
Are the journals sheet-feedable, or can they be made so, even if
that means cutting the bindings?<br>
Or, even better, has someone else already digitized them, or will
someone help digitize them? One example is Archive.org, which has both a major archive and
infrastructure for digitizing.<br>
Is there staff to handle this, and what costs have to be
covered?<br>
What is the most effective platform? There are reasons to
get into Linux systems, for tools such as Tesseract.<br>
Once it's digitized, how will it be searched? The most
common system for such online use is Elasticsearch, which you can
run on AWS or almost any cloud platform.<br>
</p>
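<p>As a rough sketch of the Tesseract route mentioned above: the Tesseract command line takes an input image, an output base name, and the <code>pdf</code> config to write a searchable PDF. The helper names, the TIFF file pattern, and the folder layout below are assumptions for illustration, not anyone's actual workflow.</p>
<pre>
```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_dir, language="eng"):
    """Return the argument list for one Tesseract run that writes a searchable PDF.

    Tesseract takes an output *base* name and appends ".pdf" itself.
    """
    image = Path(image_path)
    output_base = Path(output_dir) / image.stem
    return ["tesseract", str(image), str(output_base), "-l", language, "pdf"]


def ocr_scans(scan_dir, output_dir, language="eng"):
    """OCR every *.tif file in scan_dir (hypothetical layout: one page per file).

    Returns the commands that were run, or that would be run if Tesseract
    is not installed on this machine.
    """
    commands = [
        build_tesseract_command(image, output_dir, language)
        for image in sorted(Path(scan_dir).glob("*.tif"))
    ]
    # Only execute when the tesseract binary is actually on the PATH.
    if shutil.which("tesseract"):
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```
</pre>
<p>Calling <code>ocr_scans("scans", "ocr_output")</code> (both folder names are placeholders) would write one PDF per page scan; combining pages into a single issue is a separate step not covered here.</p>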
<p>As you can tell, there is a lot more to that question than just
software. Those are just a few comments; if you want to discuss
this more, email me off-list.</p>
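<p>For the search step mentioned above, here is a minimal sketch of preparing OCR text for Elasticsearch's bulk indexing endpoint, which expects newline-delimited JSON: one action line, then one document line. The index name, the field names (<code>issue</code>, <code>page</code>, <code>text</code>), and the sample records are all made up for illustration.</p>
<pre>
```python
import json


def build_bulk_body(index_name, pages):
    """Build the newline-delimited JSON body for Elasticsearch's _bulk endpoint.

    pages is an iterable of dicts with hypothetical keys: issue, page, text.
    """
    lines = []
    for page in pages:
        # Assumed ID scheme: issue label plus page number.
        doc_id = f"{page['issue']}-p{page['page']}"
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(page))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Placeholder records standing in for real OCR output.
sample_pages = [
    {"issue": "1986-03", "page": 1, "text": "placeholder text for page 1"},
    {"issue": "1986-03", "page": 2, "text": "placeholder text for page 2"},
]
body = build_bulk_body("magazine-pages", sample_pages)
print(body, end="")
```
</pre>
<p>The resulting body would be sent as a POST to the cluster's <code>/_bulk</code> endpoint with the <code>Content-Type: application/x-ndjson</code> header.</p>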
<p>Stay well - Rich<br>
</p>
<div>On 5/5/2021 3:53 PM, Joanna Campe via
Hidden-discuss wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_default" style="font-family:garamond,serif;font-size:large">Hi
everyone,<br>
<br>
I hope you are all safe and well.<br>
<br>
We would like to digitize our archival hardcopy magazines, and
we are looking for the best option. Does anyone have
experience with this and can make a recommendation for OCR
software? </div>
<div class="gmail_default" style="font-family:garamond,serif;font-size:large"><br>
</div>
<div class="gmail_default" style="font-family:garamond,serif;font-size:large">We have
tried Adobe Acrobat Pro and a couple others, but are having
some difficulty recognizing text that is printed over images.</div>
<div class="gmail_default" style="font-family:garamond,serif;font-size:large"><br>
</div>
<div class="gmail_default" style="font-family:garamond,serif;font-size:large">Important
features are searchable PDF creation in a magazine format. We
are using an Epson Perfection V500 Plus scanner, if that
matters.</div>
<div class="gmail_default" style="font-family:garamond,serif;font-size:large"><br>
</div>
<div class="gmail_default" style="font-family:garamond,serif;font-size:large">Your
recommendations are much appreciated!<br>
<br>
My best,<br>
<br>
Joanna</div>
<div class="gmail_default" style="font-family:garamond,serif;font-size:large"><br>
</div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr"><font size="4" face="garamond,
serif">Joanna Campe<br>
Executive Director<br>
Remineralize the Earth<br>
152 South Street<br>
Northampton, MA 01060 USA </font></div>
<div dir="ltr"><br>
</div>
<div dir="ltr"><font size="4" face="garamond,
serif">Tel: 413-563-9938</font>
<div><font size="4" face="garamond, serif">Email: </font><a href="mailto:jcampe@remineralize.org" style="font-family:Times;font-size:18px" target="_blank">jcampe@remineralize.org</a><font size="4" face="garamond, serif"><br>
</font><font size="4" face="garamond,
serif"><a href="http://www.remineralize.org/" target="_blank">http://www.remineralize.org</a>
</font></div>
<div><font size="4" face="garamond, serif" color="#000000"><b>Book</b></font></div>
<div><font size="4" face="garamond, serif" color="#000000">Geotherapy: Innovative
Methods of Soil Fertility Restoration,
Carbon Sequestration, and Reversing CO2
Increase</font></div>
<div><font size="4" face="garamond, serif"><a href="http://www.crcpress.com/product/isbn/9781466595392" style="color:rgb(17,85,204)" target="_blank">http://www.crcpress.com/product/isbn/9781466595392</a> </font></div>
<div><font size="4" face="garamond, serif"><br>
</font></div>
<div><font size="4" face="garamond, serif">Please
join and support us on <b><a href="https://www.patreon.com/RTE" target="_blank">Patreon</a></b></font></div>
<div><font size="4" face="garamond, times
new roman, serif"><a href="https://www.patreon.com/RTE" target="_blank">https://www.patreon.com/RTE</a></font><font size="4" face="garamond, serif"><br>
</font></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
<pre>_______________________________________________
Hidden-discuss mailing list - home page: <a href="http://www.hidden-tech.net" target="_blank">http://www.hidden-tech.net</a>
<a href="mailto:Hidden-discuss@lists.hidden-tech.net" target="_blank">Hidden-discuss@lists.hidden-tech.net</a>
You are receiving this because you are on the Hidden-Tech Discussion list.
If you would like to change your list preferences, Go to the Members
page on the Hidden Tech Web site.
<a href="http://www.hidden-tech.net/members" target="_blank">http://www.hidden-tech.net/members</a>
</pre>
</blockquote>
<pre cols="72">--
Rich Roth
CEO TnR Global
Bio and personal blog: <a href="http://rizbang.com" target="_blank">http://rizbang.com</a>
Building the really big sites: <a href="http://www.tnrglobal.com" target="_blank">http://www.tnrglobal.com</a>
Small/Soho business in the PV: <a href="http://www.hidden-tech.net" target="_blank">http://www.hidden-tech.net</a>
Places to meet for business: <a href="http://www.meetmewhere.com" target="_blank">http://www.meetmewhere.com</a>
And for Arts and relaxation:
<a href="http://TarotMuertos.com" target="_blank">http://TarotMuertos.com</a> - Artistic Tarot Deck
<a href="http://www.welovemuseums.com" target="_blank">http://www.welovemuseums.com</a>
<a href="http://www.artonmytv.com/" target="_blank">http://www.artonmytv.com/</a>
Helping move the world: <a href="http://www.earththrives.com" target="_blank">http://www.earththrives.com</a></pre>
</div>
</blockquote></div>