Recently I had to publish some Word documents containing tables on a website. After saving a file as a filtered web page, Microsoft Word produced a 250 KB file! Looking closely at the source code, I noticed there were 5 lines of styles and unnecessary tags for every cell in the HTML table! A quick Google search turned up Tidy. I installed the command-line version, since it was already included in the Ubuntu 10.10 Maverick Meerkat repository, ran it, and wow: a 30 KB result with clean, formatted HTML.
Here are the options I used in Tidy to clean up the code:
bare: yes, clean: yes, drop-empty-paras: yes, drop-font-tags: yes, join-styles: yes, output-xhtml: yes, word-2000: yes
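
For anyone who wants to repeat this, here is a rough sketch of how those options could be written to a Tidy config file (one "name: value" pair per line) and passed to the command-line tool. The file names word-cleanup.conf, report.html and report-clean.html and the two helper functions are made up for illustration. It assumes the tidy binary is on your PATH and is a version that still accepts all of these options, since newer releases may have dropped or renamed some of them.

```python
import subprocess
import sys

# The options from this post, kept in the order listed above.
TIDY_OPTIONS = {
    "bare": "yes",
    "clean": "yes",
    "drop-empty-paras": "yes",
    "drop-font-tags": "yes",
    "join-styles": "yes",
    "output-xhtml": "yes",
    "word-2000": "yes",
}


def build_tidy_config(options):
    """Return the text of a Tidy config file: one 'name: value' per line."""
    return "".join(f"{name}: {value}\n" for name, value in options.items())


def tidy_command(config_path, input_path, output_path):
    """Build the argument list for running tidy with a config file."""
    return ["tidy", "-config", config_path, "-o", output_path, input_path]


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own.
    config_path = "word-cleanup.conf"
    with open(config_path, "w") as handle:
        handle.write(build_tidy_config(TIDY_OPTIONS))

    result = subprocess.run(
        tidy_command(config_path, "report.html", "report-clean.html")
    )
    # Tidy exits with 1 when it only had warnings and 2 when it hit errors,
    # so a non-zero code does not always mean the cleanup failed.
    sys.exit(result.returncode)
```

Keeping the options in a config file means the same cleanup can be reused for every exported document instead of retyping the flags each time.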