Parsing HTML Forms

Sometimes in functional tests, information from a generated form must be extracted in order to re-submit it as part of a subsequent request. The zope.testing.formparser module can be used for this purpose.

The scanner is implemented using the FormParser class. The constructor arguments are the page data containing the form and (optionally) the URL from which the page was retrieved:

>>> import zope.testing.formparser
>>> page_text = '''\
... <html><body>
...   <form name="form1" action="/cgi-bin/foobar.py" method="POST">
...     <input type="hidden" name="f1" value="today" />
...     <input type="submit" name="do-it-now" value="Go for it!" />
...     <input type="IMAGE" name="not-really" value="Don't."
...            src="dont.png" />
...     <select name="pick-two" size="3" multiple>
...       <option value="one" selected>First</option>
...       <option value="two" label="Second">Another</option>
...       <optgroup>
...         <option value="three">Third</option>
...         <option selected="selected">Fourth</option>
...       </optgroup>
...     </select>
...   </form>
...
...   Just for fun, a second form, after specifying a base:
...   <base href="http://www.example.com/base/" />
...   <form action = 'sproing/sprung.html' enctype="multipart/form">
...     <textarea name="sometext" rows="5">Some text.</textarea>
...     <input type="Image" name="action" value="Do something."
...            src="else.png" />
...     <input type="text" value="" name="multi" size="2" />
...     <input type="text" value="" name="multi" size="3" />
...   </form>
... </body></html>
... '''
>>> parser = zope.testing.formparser.FormParser(page_text)
>>> forms = parser.parse()
>>> len(forms)
2
>>> forms.form1 is forms[0]
True
>>> forms.form1 is forms[1]
False

More often, the parse() convenience function is all that’s needed:

>>> forms = zope.testing.formparser.parse(
...     page_text, "http://cgi.example.com/somewhere/form.html")
>>> len(forms)
2
>>> forms.form1 is forms[0]
True
>>> forms.form1 is forms[1]
False

Once we have the form we’re interested in, we can check form attributes and individual field values:

>>> form = forms.form1
>>> form.enctype
'application/x-www-form-urlencoded'
>>> form.method
'post'
>>> keys = sorted(form.keys())
>>> keys
['do-it-now', 'f1', 'not-really', 'pick-two']
>>> not_really = form["not-really"]
>>> not_really.type
'image'
>>> not_really.value
"Don't."
>>> not_really.readonly
False
>>> not_really.disabled
False

Note that relative URLs are converted to absolute URLs based on the <base> element (if present) or using the base passed in to the constructor.

>>> form.action
'http://cgi.example.com/cgi-bin/foobar.py'
>>> not_really.src
'http://cgi.example.com/somewhere/dont.png'
>>> forms[1].action
'http://www.example.com/base/sproing/sprung.html'
>>> forms[1]["action"].src
'http://www.example.com/base/else.png'

Fields which are repeated are reported as lists of objects that represent each instance of the field:

>>> field = forms[1]["multi"]
>>> isinstance(field, list)
True
>>> [o.value for o in field]
['', '']
>>> [o.size for o in field]
[2, 3]

The <textarea> element provides some additional attributes:

>>> ta = forms[1]["sometext"]
>>> print_(ta.rows)
5
>>> print_(ta.cols)
None
>>> ta.value
'Some text.'

The <select> element provides access to the options as well:

>>> select = form["pick-two"]
>>> select.multiple
True
>>> select.size
3
>>> select.type
'select'
>>> select.value
['one', 'Fourth']
>>> options = select.options
>>> len(options)
4
>>> [opt.label for opt in options]
['First', 'Second', 'Third', 'Fourth']
>>> [opt.value for opt in options]
['one', 'two', 'three', 'Fourth']