[Hidden-tech] Looking for GOOD pdf-to-text converter

Jan Werner jwerner at jwdp.com
Sat Dec 23 16:11:53 EST 2006


There are numerous products on the market that will extract text from 
pdf files, including Adobe Acrobat itself, which allows you to save a 
text-based pdf file as a MS Word or plain text file.
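
Once the text is out of the PDF (for instance, saved as plain text from Acrobat), the original question's step of breaking it "down into fields" and spitting "out tab-delimited text" is ordinary text processing. The Python sketch below is a hypothetical illustration and not a feature of any product mentioned here. It assumes each record sits on one line and that fields are separated by runs of two or more spaces, as in column-preserving text output. The function name `lines_to_tsv` and the splitting rule are assumptions; adapt them to your own files.

```python
import csv
import io
import re

# ASSUMPTION (hypothetical rule): fields are separated by two or more
# spaces; single spaces are treated as part of a field's value.
FIELD_SEP = re.compile(r" {2,}")


def lines_to_tsv(lines, out):
    """Write each non-blank line of `lines` to `out` as a tab-delimited row.

    `lines` can be any iterable of strings, such as an open text file,
    so input is processed one line at a time. Returns the number of rows
    written.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    rows = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between records
        writer.writerow(FIELD_SEP.split(stripped))
        rows += 1
    return rows


# Small demonstration on in-memory sample text.
sample = [
    "Name          City       Year\n",
    "\n",
    "Ada Lovelace  London     1843\n",
]
buf = io.StringIO()
lines_to_tsv(sample, buf)
print(buf.getvalue(), end="")
```

Because the function reads from any iterable of lines, you can pass it an open file (e.g. a hypothetical `extracted.txt` opened with `open(..., encoding="utf-8")`). Large files are then handled one line at a time instead of being loaded into memory all at once.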

Both Nuance (formerly Scansoft) and ABBYY, the two leading purveyors of 
OCR software, sell software that converts to and from pdf files and also 
uses OCR to extract text from pdf files that consist of scanned images. 
Both claim to be able to output to Excel files directly. Both cost $100 
and both use activation over the Internet to restrict their usage to one 
computer, which makes them unacceptable to me (since I work on more than 
one computer), but others may not object.  Nuance also sells a $50 
version that can only extract from, but not create, pdf files.

A friend who has published prolifically in academic publications over 
many decades has successfully used ABBYY's Transformer to extract papers 
he had written from JSTOR's scanned-image pdf files into Microsoft Word.

Jan Werner
__________

Bill Bither wrote:
>> I'm looking for software that can convert PDF to text, and ideally that
>> has some options for reformatting or global replacement, since what I
>> most often need to do is to break a PDF file down into fields and spit
>> out tab-delimited text.  I've been using a freeware product that just
>> isn't reliable when big files are involved.  Does anyone have something
>> they like for this?
> 
> I noticed that some of the responses have directed you to an OCR
> product.  Most PDF files already have the text stored in them, so what
> you really need is a product that will extract that text from the PDF.
> This is much more reliable than OCR.  There is actually a local software
> company (www.snowtide.com) that does this, but its product is a developer
> toolkit aimed more at the enterprise market.  Ask for Chas; he might give
> a local software company a deal.
> 
> OCR would be required only if the PDF contained an image of the text
> rather than the text itself.  In that case, I'm unaware of an
> off-the-shelf product that would accomplish this.  We offer OCR and PDF
> rasterization technology that can do this for the developer.
> 
> While we're on the topic of OCR and PDFs, we will be releasing a beta
> of an application that will generate searchable PDFs from scanned image
> documents.  Send me an email if you're interested in testing it out.
> 
> Best Regards,
> 
> Bill Bither
> Atalasoft, Inc.
> www.atalasoft.com
> www.billbither.com
> 
> 
> 
> _______________________________________________
> Hidden-discuss mailing list - home page: http://www.hidden-tech.net
> Hidden-discuss at lists.hidden-tech.net


