[Hidden-tech] Converting PDFs to Word?

Thomas Lore thomas.lore at gmail.com
Wed Dec 17 07:55:25 EST 2008


For what Shawn is suggesting, there is always the full version of Adobe
Acrobat, which I hear can do this exact task, although I've never done it
myself.  I'd imagine Adobe products (other than the free Reader) can get pricey.

On Tue, Dec 16, 2008 at 9:25 PM, Shawn Fumo <programming at shawnfumo.com> wrote:

>
> As an FYI, from my understanding the problem with pdf conversion is
> that it isn't a normal document format. Unlike HTML, it isn't bunches
> of text with some formatting applied. Instead, each character is
> treated like a separate entity. So it'll store a letter "a" at a
> particular position on the page, an "x" at another position, etc.
> They may not even be stored in the same order they're displayed in
> the resulting document. So any program that extracts text from a PDF
> basically has to act like an OCR program to try to reconstruct the
> document.
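
To make Shawn's point concrete, here is a rough Python sketch of the kind
of reconstruction an extractor has to do. It is a toy, not how any
particular product works: the (char, x, y) tuples, the function name
rebuild_lines, and both thresholds are made up for illustration, and it
assumes smaller y means higher on the page (real PDF coordinates normally
run the other way, from the bottom of the page up).

```python
def rebuild_lines(glyphs, line_tolerance=2.0, space_gap=8.0):
    """Rebuild text lines from positioned characters.

    glyphs: iterable of (char, x, y) tuples in any order (hypothetical input).
    line_tolerance: max y difference for two glyphs to share a line (assumed).
    space_gap: x distance above which a word space is inserted (assumed).
    """
    # Sort top to bottom, then start a new line whenever y jumps.
    lines = []  # each entry: (y of first glyph on the line, [(x, char), ...])
    for char, x, y in sorted(glyphs, key=lambda g: g[2]):
        if lines and abs(y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((x, char))
        else:
            lines.append((y, [(x, char)]))

    rebuilt = []
    for _, items in lines:
        # Within a line, reading order is left to right.
        items.sort(key=lambda item: item[0])
        text = items[0][1]
        for (prev_x, _), (x, char) in zip(items, items[1:]):
            # The file stores no space character here; guess one from the gap.
            if x - prev_x > space_gap:
                text += " "
            text += char
        rebuilt.append(text)
    return rebuilt


# Glyphs deliberately stored out of display order, as a PDF may do.
SAMPLE_GLYPHS = [
    ("F", 20.0, 30.0), ("i", 15.0, 10.0), ("k", 35.0, 30.4),
    ("P", 10.0, 30.0), ("H", 10.0, 10.0), ("o", 30.0, 30.4),
    ("D", 15.0, 30.0),
]

if __name__ == "__main__":
    print(rebuild_lines(SAMPLE_GLYPHS))
```

Even this toy has to guess where lines break and where the spaces go,
which is why extracted text so often comes out with odd spacing and
line breaks.
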
>
> There's a variety of programs out there for extracting text/converting
> from pdfs, both open source and commercial. One of the commercial ones
> is from a local company Snowtide Informatics.
>
> I'd agree with many of the responders that if there's some way of
> getting at the text and images before it's actually made into a pdf,
> that'd probably be the easiest way to go.
>
> Shawn
>
>
>
> > Hi H-Ters,
> > I'm the editor of two quarterly business magazines. We publish in
> > print and online.
> > (Yes, I'm an H-T member, since the beginning, and live here in the
> > Happy Valley.)
> > We're looking for the simplest, most automated (if possible) way to
> > convert the final PDF files we send our printer into MS-Word so our
> > webmaster can post the upcoming issue online ASAP. Our art director is
> > doing it manually now, not the best use of her time. Here's what our
> > webmaster wrote:
> > "I need articles in MS Word, plus I need a PDF copy of the magazine so
> > that we can use it as a guide when posting the articles, as well as
> > extract the images from the PDFs for use in the online articles. We
> > will not pull content from the PDF copy as Acrobat does nasty things
> > to text when you pull it out of a PDF. It's a nightmare to work with
> > PDF-extracted text."
> > Questions:
> > 1) Any help with converting PDFs to Word?
> > 2) Is there a better way to do this?
> > Thanks for any ideas,
> > Eddy
> > Eddy Goldberg, Managing Editor
> > Franchise Update Media Group
> > 413-256-6616
> > eddyg at franchiseupdatemedia.com
> > www.franchiseupdatemedia.com