[Hidden-tech] [ANN] DocuHarvest, turning documents into data

Chas Emerick cemerick at snowtide.com
Mon Jun 7 14:34:39 EDT 2010


If memory serves, new product/service announcements are welcome on
the list, assuming they're from local companies.  If so, then I
presume this would qualify. :-)

DocuHarvest is a data extraction service that enables non-technical  
users to get useful data out of their documents into familiar tools en  
masse (a developer-friendly API is under the covers, and will be  
unwrapped in due course). It draws on our experience building and  
selling PDFTextStream[1] over the past 6 years, and solving a lot of  
sticky data extraction problems along the way.

DocuHarvest is available here:

http://docuharvest.com

At the moment, DocuHarvest provides three data extraction/conversion
jobs: document metadata extraction, conversion to text, and PDF form
data extraction. A raft of additional job types is waiting in the
wings, and will be made available over the coming weeks.

I invite you to check it out, tinker, and poke to your heart's  
content. I welcome any suggestions, ideas, or problems of any stripe.  
You can reply on-list, send a message to me directly, or use the  
feedback boxes that are available on the site.

Thanks!

Chas Emerick
Founder, Snowtide Informatics Systems

cemerick at snowtide.com
http://snowtide.com | +1 413.519.6365


[1] PDFTextStream is Snowtide's PDF text extraction library for Java  
and .NET: http://snowtide.com


