To understand e-readers, you have to understand that you’re really dealing with a lightweight web browser. Ebooks are essentially a variation of web pages, with HTML at their core. That’s all they are. The cleaner the HTML you put into the ebook, the more predictably it will display across different e-reader devices.
In the past, I’ve tried a number of ways to convert Word documents to clean HTML. I’ve used regular expressions and tried manually stripping out all the excess code that Word adds. I even wrote my own program that attempted to return a clean HTML file. There are sites that claim to clean your HTML, but the results have been disappointing, and I’m a bit wary of uploading my book content to websites (call me paranoid).
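For anyone curious what the regular-expression route looks like, here is a minimal Python sketch. It assumes a handful of patterns that commonly show up in Word’s HTML export (conditional comments, `<o:p>` tags, `MsoNormal`-style classes, inline styles, and empty paragraphs); the function name `clean_word_html` and the pattern list are hypothetical, for illustration only, and real Word output varies by version.

```python
import re

# Illustrative patterns for common clutter in Word's HTML export.
# Word's markup varies by version, so this list is an assumption, not a full cleaner.
WORD_CLUTTER = [
    # Conditional comments such as <!--[if gte mso 9]>...<![endif]-->
    (re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.S), ""),
    # Office namespace paragraph markers: remove <o:p> and </o:p> tags, keep content
    (re.compile(r"</?o:p>"), ""),
    # Word's own style classes, e.g. class=MsoNormal or class="MsoListParagraph"
    (re.compile(r'\s+class="?Mso\w+"?'), ""),
    # Inline style attributes in either quote style
    (re.compile(r"""\s+style=(?:"[^"]*"|'[^']*')"""), ""),
    # Paragraphs left holding only whitespace or non-breaking spaces
    (re.compile(r"<p>(?:\s|&nbsp;)*</p>"), ""),
]


def clean_word_html(html: str) -> str:
    """Apply each cleanup pattern in order, then trim leftover whitespace."""
    for pattern, replacement in WORD_CLUTTER:
        html = pattern.sub(replacement, html)
    return html.strip()


sample = (
    "<p class=MsoNormal style='margin:0'>Hello<o:p></o:p></p>\n"
    "<p class=MsoNormal><o:p>&nbsp;</o:p></p>"
)
print(clean_word_html(sample))  # prints <p>Hello</p>
```

Regular expressions aren’t an HTML parser, so a list like this tends to grow with every new quirk Word produces, and each added pattern risks mangling markup it wasn’t written for.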
Word to Clean HTML: Getting the Foundation Right
Then I downloaded a 30-day trial version of DocToHTML. Not only did this little program do everything I wanted, it also performed tasks I never thought possible. Once you install the program and open Word, it appears on the Word ribbon. From there, I can configure all the settings I want: how each style is saved, how images are handled, how various fonts are converted, whether each chapter is split into its own file, and whether empty spaces are removed (very handy and perfect for ebooks).
The output HTML was as clean as I could have hoped. The smaller this file, the smaller the final ebook, and the lower the file-size-based delivery costs Amazon deducts.
Once the ebook version was done, I could switch the settings to web publication and upload sections to my blog. Sweet.
In short, this tool is the answer to my Word conversion problems. I’ll be using it to produce clean HTML for all my future books.
DocToHTML is available as a 30-day trial, and a license costs only $39.