XML.com: Creating Web Utilities Using XML::XPath:
HTML Tidy was created specifically to help HTML authors clean their markup. Now it offers the ability to translate HTML to XHTML. Use caution, however. Some of the errors that Tidy will fix may have been introduced by HTML authors in order to achieve a certain visual effect. Also keep in mind that the contents of most <script> elements will cause an XML parser to reject the parent document as not well-formed since some of the client-side script operators (>, <, and &) are special characters in the XML world.
I was really surprised to read that most XML parsers will reject a document whose <script> elements contain these characters! This could be a problem. Almost all the markup I have seen has a ton of inline script, so an HTML -> XHTML -> XPath pipeline would often fail on it. A regex might be a better solution in that case.
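To see the behaviour the quote describes, here is a minimal sketch in Python (not the Perl XML::XPath module the article uses). It uses the standard library's xml.etree.ElementTree; the markup, the `is_well_formed` helper, and the variable names are made up for illustration. It checks an inline script containing a bare `<` and `&&`, then checks the same document with the script body wrapped in a CDATA section, which is one common workaround and not something the article itself prescribes.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment: the inline script uses bare "<" and "&&".
RAW = """<html><head><script type="text/javascript">
if (a < b && b > 0) { go(); }
</script></head><body><p>Hello</p></body></html>"""

# The same fragment with the script body wrapped in a CDATA section,
# so the parser treats the operators as plain character data.
WRAPPED = (
    RAW.replace('<script type="text/javascript">',
                '<script type="text/javascript"><![CDATA[')
       .replace("</script>", "]]></script>")
)


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


raw_ok = is_well_formed(RAW)          # bare operators inside <script>
wrapped_ok = is_well_formed(WRAPPED)  # operators hidden inside CDATA

# ElementTree supports a limited XPath subset; ".//p" finds every <p>.
paragraphs = [p.text for p in ET.fromstring(WRAPPED).findall(".//p")]

print(raw_ok, wrapped_ok, paragraphs)
```

With this input the raw document is rejected, the CDATA version parses, and the path query returns the single paragraph. So the XPath route can still work if the script content is escaped or wrapped first, though that means rewriting markup you may not control, which is where a regex can be the more practical choice.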
Tags: xpath, regex