Version: 4.1.0.2
HTML: Parsing Library
The html library provides functions to read HTML documents and structures to represent them.
(read-xhtml port) → html?
  port : input-port?
(read-html port) → html?
  port : input-port?
Reads (X)HTML from a port, producing an html instance.
(read-html-as-xml port) → (listof content?)
  port : input-port?
Reads HTML from a port, producing a list of xml content values compatible with the xml library (which defines content?).
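As a minimal sketch of read-html-as-xml, the module below parses a small fragment and collects its text by walking the generic xml structures instead of the html structures. The helper xml-text is hypothetical (it is not part of either library), and the sketch relies only on the xml accessors pcdata?, pcdata-string, element? and element-content. It is written as a `#lang scheme` module to match the example in section 1.

```racket
#lang scheme
;; Sketch: parse HTML into generic xml content and collect its text.
;; xml-text is a hypothetical helper, not part of html or xml.
(require (prefix-in h: html)
         (prefix-in x: xml))

;; xml-text : content -> (listof string)
;; Returns the pcdata strings inside one piece of xml content, in order.
(define (xml-text c)
  (cond [(x:pcdata? c) (list (x:pcdata-string c))]
        [(x:element? c)
         (apply append (map xml-text (x:element-content c)))]
        ;; entities, comments, etc. contribute no text in this sketch
        [else '()]))

(define contents
  (h:read-html-as-xml
   (open-input-string "<p>Hello <b>world</b></p>")))

(define strings (apply append (map xml-text contents)))
(printf "~s~n" strings)
```

Whitespace handling is left to the reader, so the exact strings printed may differ slightly from the literal source text.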
1 Example
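(module html-example scheme

  ; Both libraries are imported with prefixes (h: and x:) to avoid
  ; name clashes between them and with the scheme language.
  (require (prefix-in h: html)
           (prefix-in x: xml))

  (define an-html
    (h:read-xhtml
     (open-input-string
      (string-append
       "<html><head><title>My title</title></head><body>"
       "<p>Hello world</p><p><b>Testing</b>!</p>"
       "</body></html>"))))

  ; extract-pcdata: html-content -> (listof string)
  ; Pulls out the pcdata strings from some-content.
  (define (extract-pcdata some-content)
    (cond [(x:pcdata? some-content)
           (list (x:pcdata-string some-content))]
          [(x:entity? some-content)
           (list)]
          [else
           (extract-pcdata-from-element some-content)]))

  ; extract-pcdata-from-element: html-element -> (listof string)
  ; Pulls out the pcdata strings from an-html-element.
  ; A struct pattern lists every field, inherited ones first, so
  ; html-full needs both attributes and content. The html-full clause
  ; must come first because html-full is a subtype of html-element.
  (define (extract-pcdata-from-element an-html-element)
    (match an-html-element
      [(struct h:html-full (attributes content))
       (apply append (map extract-pcdata content))]

      [(struct h:html-element (attributes))
       '()]))

  (printf "~s~n" (extract-pcdata an-html)))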
> (require 'html-example) |
("My title" "Hello world" "Testing" "!") |
2 HTML Structures
pcdata, entity, and attribute are defined in the xml documentation.
An html-content is either an html-element, a pcdata, or an entity.
(struct html-element (attributes))
  attributes : (listof attribute)
Every structure below inherits from html-element.

(struct html-full html-element (content))
  content : (listof html-content)
Any HTML tag that may include content also inherits from html-full, without adding any fields of its own.
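Because every tag structure inherits from html-element, and every tag with content from html-full, generic code can walk any parsed document using only html-full?, html-element? and html-full-content. The sketch below counts the tag nodes in a tree; count-elements is a hypothetical helper, and the input is the same document used in section 1.

```racket
#lang scheme
;; Sketch: generic traversal using only html-element / html-full.
(require (prefix-in h: html))

;; count-elements : html-content -> natural
;; Counts tag structures (the root included) in a parsed tree.
(define (count-elements c)
  (cond [(h:html-full? c)      ; tag with content: itself plus children
         (+ 1 (apply + (map count-elements (h:html-full-content c))))]
        [(h:html-element? c)   ; tag without content, e.g. br or img
         1]
        [else 0]))             ; pcdata and entities are not tags

(define doc
  (h:read-xhtml
   (open-input-string
    (string-append
     "<html><head><title>My title</title></head><body>"
     "<p>Hello world</p><p><b>Testing</b>!</p>"
     "</body></html>"))))

(printf "~a~n" (count-elements doc))
```

Testing html-full? before html-element? matters for the same reason as in the match example: every html-full is also an html-element.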
An html is
(make-html (listof attribute) (listof Contents-of-html))
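The constructors can also build a document directly, without parsing. The sketch below assembles html, head and title by hand and then walks back down with html-full-content. The helper text is hypothetical; it passes #f as the start and stop source locations of each pcdata, since the value was not read from a port.

```racket
#lang scheme
;; Sketch: build an html value directly with the make-... constructors.
(require (prefix-in h: html)
         (prefix-in x: xml))

;; text : string -> pcdata
;; Hypothetical helper: wraps a string as pcdata with no source
;; location (#f for both start and stop).
(define (text s) (x:make-pcdata #f #f s))

(define page
  (h:make-html
   '()                                   ; no attributes on <html>
   (list (h:make-head
          '()
          (list (h:make-title '() (list (text "Built by hand"))))))))

;; Walk back down html -> head -> title -> pcdata with the generic accessor.
(define the-head (car (h:html-full-content page)))
(define the-title (car (h:html-full-content the-head)))
(define title-text (x:pcdata-string (car (h:html-full-content the-title))))
(printf "~a~n" title-text)   ; prints Built by hand
```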
A Contents-of-html is either
A div is
(make-div (listof attribute) (listof G2))
A center is
(make-center (listof attribute) (listof G2))
A blockquote is
(make-blockquote (listof attribute) (listof G2))
An ins is
(make-ins (listof attribute) (listof G2))
A del is
(make-del (listof attribute) (listof G2))
A dd is
(make-dd (listof attribute) (listof G2))
An li is
(make-li (listof attribute) (listof G2))
A th is
(make-th (listof attribute) (listof G2))
A td is
(make-td (listof attribute) (listof G2))
An iframe is
(make-iframe (listof attribute) (listof G2))
A noframes is
(make-noframes (listof attribute) (listof G2))
A noscript is
(make-noscript (listof attribute) (listof G2))
A style is
(make-style (listof attribute) (listof pcdata))
A script is
(make-script (listof attribute) (listof pcdata))
A basefont is
(make-basefont (listof attribute))
A br is
(make-br (listof attribute))
An area is
(make-area (listof attribute))
An alink is
(make-alink (listof attribute))
An img is
(make-img (listof attribute))
A param is
(make-param (listof attribute))
An hr is
(make-hr (listof attribute))
An input is
(make-input (listof attribute))
A col is
(make-col (listof attribute))
An isindex is
(make-isindex (listof attribute))
A base is
(make-base (listof attribute))
A meta is
(make-meta (listof attribute))
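Tags such as img, br and meta take only an attribute list, so all of their information lives in html-element-attributes. The sketch below builds an img and looks up attribute values; attr and attribute-value are hypothetical helpers, and the attributes are built with the xml library's make-attribute using #f source locations.

```racket
#lang scheme
;; Sketch: tags without content take only an attribute list.
(require (prefix-in h: html)
         (prefix-in x: xml))

;; attr : symbol string -> attribute
;; Hypothetical helper: builds an xml attribute with no source location.
(define (attr name value) (x:make-attribute #f #f name value))

(define logo (h:make-img (list (attr 'src "logo.png") (attr 'alt "Logo"))))

;; attribute-value : html-element symbol -> (or/c string #f)
;; Hypothetical helper: the value of the first attribute called name,
;; or #f when the element has no such attribute.
(define (attribute-value elt name)
  (let ([hits (filter (lambda (a) (eq? (x:attribute-name a) name))
                      (h:html-element-attributes elt))])
    (if (null? hits) #f (x:attribute-value (car hits)))))

(printf "~s ~s ~s~n"
        (attribute-value logo 'src)   ; "logo.png"
        (attribute-value logo 'width) ; #f
        (h:html-full? logo))          ; #f: an img carries no content
```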
An option is
(make-option (listof attribute) (listof pcdata))
A textarea is
(make-textarea (listof attribute) (listof pcdata))
A title is
(make-title (listof attribute) (listof pcdata))
A head is
(make-head (listof attribute) (listof Contents-of-head))
A Contents-of-head is either
A tr is
(make-tr (listof attribute) (listof Contents-of-tr))
A Contents-of-tr is either
A colgroup is
(make-colgroup (listof attribute) (listof col))
A thead is
(make-thead (listof attribute) (listof tr))
A tfoot is
(make-tfoot (listof attribute) (listof tr))
A tbody is
(make-tbody (listof attribute) (listof tr))
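Table structures nest as tbody, then tr, then cells. The sketch below builds a two-by-two tbody from strings and reads it back as a grid. The helpers text and row are hypothetical, and the sketch assumes that Contents-of-tr includes td and that G2 admits pcdata, neither of which is spelled out in this section.

```racket
#lang scheme
;; Sketch: build a table body from rows of strings and read it back.
(require (prefix-in h: html)
         (prefix-in x: xml))

;; text : string -> pcdata   (hypothetical helper, no source location)
(define (text s) (x:make-pcdata #f #f s))

;; row : (listof string) -> tr
;; Hypothetical helper: one td per string, each holding a single pcdata.
(define (row cells)
  (h:make-tr '() (map (lambda (s) (h:make-td '() (list (text s)))) cells)))

(define rows-body (h:make-tbody '() (list (row '("a" "b")) (row '("c" "d")))))

;; grid : (listof (listof string)), recovered via tbody -> tr -> td -> pcdata
(define grid
  (map (lambda (tr)
         (map (lambda (td) (x:pcdata-string (car (h:html-full-content td))))
              (h:html-full-content tr)))
       (h:html-full-content rows-body)))

(printf "~s~n" grid)   ; prints (("a" "b") ("c" "d"))
```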
A tt is
(make-tt (listof attribute) (listof G5))
An i is
(make-i (listof attribute) (listof G5))
A b is
(make-b (listof attribute) (listof G5))
A u is
(make-u (listof attribute) (listof G5))
An s is
(make-s (listof attribute) (listof G5))
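The font-style tags (tt, i, b, u, s) all wrap a list of G5 content, so a common transformation is to drop the wrappers and keep what is inside. The sketch below does that for nested style wrappers. unwrap-styles and text are hypothetical helpers, and the sketch assumes the library exports the usual struct predicates (h:tt?, h:i?, h:b?, h:u?, h:s?) alongside the make-... constructors.

```racket
#lang scheme
;; Sketch: strip font-style wrappers (tt, i, b, u, s) from content.
;; Assumes the struct predicates h:tt?, h:i?, h:b?, h:u?, h:s? are
;; exported alongside the make-... constructors.
(require (prefix-in h: html)
         (prefix-in x: xml))

;; text : string -> pcdata   (hypothetical helper, no source location)
(define (text s) (x:make-pcdata #f #f s))

;; unwrap-styles : html-content -> (listof html-content)
;; Hypothetical helper: replaces a style wrapper by its children,
;; recursively; any other node is returned unchanged as a singleton.
(define (unwrap-styles c)
  (if (or (h:tt? c) (h:i? c) (h:b? c) (h:u? c) (h:s? c))
      (apply append (map unwrap-styles (h:html-full-content c)))
      (list c)))

(define styled
  (h:make-b '() (list (text "bold ")
                      (h:make-i '() (list (text "and italic"))))))

(define plain (unwrap-styles styled))
(printf "~s~n" (map x:pcdata-string plain))   ; prints ("bold " "and italic")
```

The sketch does not descend into other tags, so a b inside a td, for example, is left alone unless the caller also walks the td's content.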