TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML.
Please log in to add feedback.
This update has been submitted for testing by robert.
This update's test gating status has been changed to 'ignored'.
This update has been pushed to testing.
This update has been submitted for stable by bodhi.
This update has been pushed to stable.