[Hidden-tech] MS Word to HTML Code

Chris Hoogendyk hoogendyk at bio.umass.edu
Tue Mar 2 14:21:36 EST 2010


I'm neither a Word advocate nor a Word user. However, it seems to me that
it wouldn't be too hard for someone who knew what they were doing to
create a Word macro that generates some basic markup and spits out the
results. If you google "word macro to generate markup text", the first
page of hits contains several interesting candidates for a starting
point. I didn't actually click on any of the links, but the excerpts on
the search page mention things like "automatically generate semantic
markup", "Word macro produces XML", "Word document content to MediaWiki
markup", and so on. Seems potentially promising.
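
To make the "basic markup plus paging and a TOC" idea concrete, here is a
rough sketch in Python rather than as a Word macro. It is not a tested
conversion tool. It assumes the book has already been reduced to plain
text (for example by something like antiword), that paragraphs are
separated by blank lines, and that each chapter heading sits in its own
paragraph starting with "Chapter <number>". The function names, the
heading convention, and the part01.html file naming are all made up for
illustration.

```python
import html
import re

# Assumed convention: a chapter heading is a paragraph like
# "Chapter 1" or "Chapter 1: Title", set off by blank lines.
CHAPTER_RE = re.compile(r"^Chapter\s+\d+\b.*$")


def split_chapters(text):
    """Return a list of (title, [paragraph, ...]) tuples."""
    chapters = []
    # Blank lines separate paragraphs in the plain-text export.
    for block in re.split(r"\n\s*\n", text.strip()):
        para = " ".join(line.strip() for line in block.splitlines())
        if not para:
            continue
        if CHAPTER_RE.match(para):
            chapters.append((para, []))
        elif chapters:
            chapters[-1][1].append(para)
        else:
            # Text before the first heading goes into a front-matter part.
            chapters.append(("Front matter", [para]))
    return chapters


def render_pages(chapters):
    """Return {filename: html_text}: one page per part plus index.html."""
    pages = {}
    toc = []
    for n, (title, paras) in enumerate(chapters, start=1):
        name = f"part{n:02d}.html"
        safe_title = html.escape(title)
        body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paras)
        pages[name] = (
            f"<html><head><title>{safe_title}</title></head>\n"
            f"<body>\n<h1>{safe_title}</h1>\n{body}\n</body></html>\n"
        )
        toc.append(f'<li><a href="{name}">{safe_title}</a></li>')
    # The index page is the table of contents linking to every part.
    pages["index.html"] = (
        "<html><body>\n<ul>\n" + "\n".join(toc) + "\n</ul>\n</body></html>\n"
    )
    return pages
```

Writing each entry of the returned dictionary to a file of the same name
would give one small page per chapter and an index page that links to
them. Italics, lists, and any other formatting would still have to be put
back by hand or by a smarter script.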


---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst 

<hoogendyk at bio.umass.edu>

--------------- 

Erdös 4



Robert Heller wrote:
>> Many people have responded to Claudia with the basically obvious answer
>> that MS Word can do it. WELL ...
>>
>> Not really, on 2 counts:
>>
>> 1) A 300-page book is not something that Word's 'Save as HTML' can handle
>> properly. It will produce one HTML document that exceeds the limits
>> of any browser, and clearly of any reader, so what is needed is something
>> that handles paging and a TOC.
>>
>> 2) As some have commented, the HTML is REALLY terrible. Word tries to
>> reproduce exactly what the page looks like in Word, which is not something
>> HTML is very good at, so the code is a nightmare and very browser display
>> intensive.
>>
>> So, does anyone have an idea of a practical solution?
>>
>> I am familiar with systems like DocBook (http://www.docbook.org/) that
>> have many output modes, but I am not sure whether it can start with a
>> Word doc, even in RTF form.
>>     
>
> Virtually *all* of the good solutions cannot start with a Word doc.  One
> of the best solutions is LaTeX => HTML via TeX4ht.  DocBook is another
> (again, a Word doc is NOT a supported input format).
>
> If you really want the book converted to *good* HTML, you may have to
> first convert the MS Word doc file to something else.  *Unfortunately* I
> don't know of *any* good automated tools to do that and I am uncertain
> that any exist.  You may end up using something like antiword to
> convert the doc file to basically plain text, with the formatting
> pretty much stripped out.  Then you'll have to re-insert markup (e.g.
> LaTeX or DocBook or SGML or XML) and then run the result through a
> formatting program that generates the HTML. TeX4ht can be set to
> generate a TOC and separate HTML files by chapter or section, for
> example. TeX4ht does a wonderful job -- I use it for my stuff (all of
> my documents start out as LaTeX). I use it for articles I post on my
> website and for internal documentation for applications I build. I
> suspect that DocBook can do that also (I think DocBook's starting format
> of choice is SGML or XML).  I don't know if MS Word's XML format is even
> remotely compatible with DocBook -- I would suspect NOT.
>
> This is probably not what Claudia wanted to hear.
>
>   
>> Any other ideas?
>>
>> Rich
>>
>> PS -- I hope those whose replies were blocked appreciate that too many
>> copies of the same answer have to be prevented, and since you don't know
>> who else answered, it's up to me (list moderator) to filter.
>>
>> On 3/2/2010 11:22 AM, Jeffrey Peck wrote:
>>     
>>>   
>>>
>>> If you are using Word 97 or newer,  there should be an option to "Save 
>>> as HTML".  The following link provides some detail on what to expect 
>>> and some issues that may be encountered:
>>> http://www.temple.edu/cs/web/wordconvert.html
>>>
>>> Perhaps some other Hidden-Tech users have some good/bad experience 
>>> with this feature?
>>>
>>> - Jeff
>>>
>>> On Mar 2, 2010, at 10:00 AM, Claudia Gere wrote:
>>>
>>>       
>>>> I'm looking for the easiest/cleanest way to turn a Microsoft Word 
>>>> document (a 300-page book, text with simple formatting, no photos) 
>>>> into HTML code. Does anyone have experience using an application 
>>>> (free or fee) for this purpose?
>>>> Thank you, Claudia
>>>> Claudia Gere & Co. LLC
>>>> Helping smart people become outstanding authors™
>>>> Produce, Publish, Promote
>>>> Follow me on Twitter: @claudiagere
>>>> Aspiring Authors Workshops
>>>> www.claudiagereco.com/Workshop.html 
>>>> <http://www.claudiagereco.com/Workshop.html>
>>>> Claudia at ClaudiaGereCo.com <mailto:Claudia at ClaudiaGereCo.com>
>>>> www.ClaudiaGereCo.com <http://www.claudiagereco.com/>
>>>> +1 413 259 1741

