The libhtmltok0 package
HTML tokenizer library
The library tokenizes HTML files into tokens of type opentag,
closetag, selfclose, comment, doctype, or text.
It is only a tokenizer, not a parser; a separate layer can be built
on top of it to make a parser. This version of the library does not
support entities, though support may be added in a future version.
Embedded scripts such as JavaScript are not supported either, but
another layer could handle them.
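
To illustrate what tokenizing into these six token types means, here is a minimal sketch in Python. It is not the libhtmltok API: the `tokenize` function, the `_TOKEN_RE` pattern, and the `(type, text)` tuple shape are all assumptions made for illustration. Like the library as described, it leaves entities untouched and does not treat script content specially. It also assumes that attribute values never contain a literal `>`.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token shape for this sketch: (token_type, raw_text).
Token = Tuple[str, str]

# Alternatives are tried in order, so comments and doctypes are checked
# before the generic tag patterns, and selfclose before opentag.
_TOKEN_RE = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<doctype><!DOCTYPE[^>]*>)"
    r"|(?P<closetag></[A-Za-z][^>]*>)"
    r"|(?P<selfclose><[A-Za-z][^>]*/>)"
    r"|(?P<opentag><[A-Za-z][^>]*>)"
    r"|(?P<text>[^<]+|<)",  # a stray '<' is emitted as its own text token
    re.IGNORECASE | re.DOTALL,
)


def tokenize(html: str) -> Iterator[Token]:
    """Yield (token_type, raw_text) pairs; builds no tree and decodes no entities."""
    for match in _TOKEN_RE.finditer(html):
        # lastgroup is the name of the alternative that matched.
        yield match.lastgroup, match.group()


if __name__ == "__main__":
    sample = '<!DOCTYPE html><p class="a">Hi &amp; bye<br/><!-- note --></p>'
    for token_type, raw in tokenize(sample):
        print(token_type, repr(raw))
```

The output is a flat stream of tokens with no nesting information. Matching each closetag to its opentag, decoding entities such as `&amp;`, and handling script content would be the job of the additional layer mentioned above.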