Regular expression pattern normalizing output checker

The pattern-normalizing output checker from zope.testing.renormalizing extends the default output checker with an option to normalize expected and actual output.

You specify a sequence of patterns and replacements. The replacements are applied to both the expected and the actual output before the default output checker is called. Let’s look at an example. In this example, we have some times and addresses:

>>> want = '''\
... <object object at 0xb7f14438>
... completed in 1.234 seconds.
... <BLANKLINE>
... <object object at 0xb7f14440>
... completed in 123.234 seconds.
... <BLANKLINE>
... <object object at 0xb7f14448>
... completed in .234 seconds.
... <BLANKLINE>
... <object object at 0xb7f14450>
... completed in 1.234 seconds.
... <BLANKLINE>
... '''
>>> got = '''\
... <object object at 0xb7f14458>
... completed in 1.235 seconds.
...
... <object object at 0xb7f14460>
... completed in 123.233 seconds.
...
... <object object at 0xb7f14468>
... completed in .231 seconds.
...
... <object object at 0xb7f14470>
... completed in 1.23 seconds.
...
... '''

We may wish to consider these two strings to match, even though they differ in actual addresses and times. The default output checker will consider them different:

>>> import doctest
>>> doctest.OutputChecker().check_output(want, got, 0)
False

We’ll use zope.testing.renormalizing.OutputChecker to normalize both the expected and the actual strings, so that differences in times and addresses are ignored:

>>> import re
>>> from zope.testing.renormalizing import OutputChecker
>>> checker = OutputChecker([
...    (re.compile('[0-9]*[.][0-9]* seconds'), '<SOME NUMBER OF> seconds'),
...    (re.compile('at 0x[0-9a-f]+'), 'at <SOME ADDRESS>'),
...    ])
>>> checker.check_output(want, got, 0)
True
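To make the mechanism concrete, here is a minimal sketch of the idea using only the standard library. SketchNormalizingChecker is a hypothetical name, and the code is an assumption about how such a checker could be written, not the zope.testing implementation: it applies each substitution to both strings and then defers to doctest.OutputChecker.

```python
import doctest
import re


class SketchNormalizingChecker(doctest.OutputChecker):
    """Hypothetical stand-in for illustration; not the zope.testing class."""

    def __init__(self, patterns):
        # patterns: sequence of (compiled regex, replacement string) pairs
        self.patterns = list(patterns)

    def _normalize(self, text):
        # Apply every substitution in the order given.
        for pattern, replacement in self.patterns:
            text = pattern.sub(replacement, text)
        return text

    def check_output(self, want, got, optionflags):
        # Normalize both sides, then let the default checker compare them.
        return super().check_output(
            self._normalize(want), self._normalize(got), optionflags)


sketch = SketchNormalizingChecker([
    (re.compile(r'[0-9]*[.][0-9]* seconds'), '<SOME NUMBER OF> seconds'),
    (re.compile(r'at 0x[0-9a-f]+'), 'at <SOME ADDRESS>'),
])

want = '<object object at 0xb7f14438>\ncompleted in 1.234 seconds.\n'
got = '<object object at 0xb7f14458>\ncompleted in 1.235 seconds.\n'

plain_result = doctest.OutputChecker().check_output(want, got, 0)
sketch_result = sketch.check_output(want, got, 0)
```

Here plain_result is False and sketch_result is True, because both strings reduce to the same normalized text before the comparison.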

The usual doctest.OutputChecker option flags work as expected:

>>> want_ellided = '''\
... <object object at 0xb7f14438>
... completed in 1.234 seconds.
... ...
... <object object at 0xb7f14450>
... completed in 1.234 seconds.
... <BLANKLINE>
... '''
>>> checker.check_output(want_ellided, got, 0)
False
>>> checker.check_output(want_ellided, got, doctest.ELLIPSIS)
True
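For comparison, the following sketch shows the same flag with the plain standard-library checker. It uses short made-up strings and only illustrates what doctest.ELLIPSIS does on its own: the flag is what lets a "..." in the expected text stand for any run of actual output.

```python
import doctest

default_checker = doctest.OutputChecker()

# Made-up sample strings, chosen only to illustrate the flag.
want = 'first line\n...\nlast line\n'
got = 'first line\nmiddle one\nmiddle two\nlast line\n'

# Without the flag, "..." is compared literally.
without_flag = default_checker.check_output(want, got, 0)

# With the flag, "..." matches the two middle lines.
with_flag = default_checker.check_output(want, got, doctest.ELLIPSIS)
```

The normalizing checker passes its option flags on to this default comparison, which is why the flag behaves the same way in the example above.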

When we get differences, we output them with normalized text:

>>> source = '''\
... >>> do_something()
... <object object at 0xb7f14438>
... completed in 1.234 seconds.
... ...
... <object object at 0xb7f14450>
... completed in 1.234 seconds.
... <BLANKLINE>
... '''
>>> example = doctest.Example(source, want_ellided)
>>> print_(checker.output_difference(example, got, 0))
Expected:
    <object object at <SOME ADDRESS>>
    completed in <SOME NUMBER OF> seconds.
    ...
    <object object at <SOME ADDRESS>>
    completed in <SOME NUMBER OF> seconds.

Got:
    <object object at <SOME ADDRESS>>
    completed in <SOME NUMBER OF> seconds.

    <object object at <SOME ADDRESS>>
    completed in <SOME NUMBER OF> seconds.

    <object object at <SOME ADDRESS>>
    completed in <SOME NUMBER OF> seconds.

    <object object at <SOME ADDRESS>>
    completed in <SOME NUMBER OF> seconds.

>>> print_(checker.output_difference(example, got,
...                                 doctest.REPORT_NDIFF))
Differences (ndiff with -expected +actual):
    - <object object at <SOME ADDRESS>>
    - completed in <SOME NUMBER OF> seconds.
    - ...
      <object object at <SOME ADDRESS>>
      completed in <SOME NUMBER OF> seconds.

    + <object object at <SOME ADDRESS>>
    + completed in <SOME NUMBER OF> seconds.
    + <BLANKLINE>
    + <object object at <SOME ADDRESS>>
    + completed in <SOME NUMBER OF> seconds.
    + <BLANKLINE>
    + <object object at <SOME ADDRESS>>
    + completed in <SOME NUMBER OF> seconds.
    + <BLANKLINE>

If the wanted text is empty, however, we don’t transform the actual output. This is useful when writing tests: we leave the expected output empty, run the test, and, after reviewing the actual output, use it as the expected output.

>>> source = '''\
... >>> do_something()
... '''
>>> example = doctest.Example(source, '\n')
>>> print_(checker.output_difference(example, got, 0))
Expected:

Got:
    <object object at 0xb7f14458>
    completed in 1.235 seconds.

    <object object at 0xb7f14460>
    completed in 123.233 seconds.

    <object object at 0xb7f14468>
    completed in .231 seconds.

    <object object at 0xb7f14470>
    completed in 1.23 seconds.
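The two reporting behaviours shown above (normalized text in the report, raw text when nothing was expected) can be sketched with the standard library alone. SketchDiffChecker is a hypothetical name, and the approach is an assumption for illustration, not the library's code.

```python
import doctest
import re


class SketchDiffChecker(doctest.OutputChecker):
    """Hypothetical stand-in for illustration; not the zope.testing class."""

    def __init__(self, patterns):
        self.patterns = list(patterns)

    def _normalize(self, text):
        for pattern, replacement in self.patterns:
            text = pattern.sub(replacement, text)
        return text

    def output_difference(self, example, got, optionflags):
        want = example.want
        if not want.strip():
            # Nothing was expected: report the raw output so it can be
            # reviewed and copied into the test.
            return super().output_difference(example, got, optionflags)
        # Report against a normalized copy rather than mutating the
        # caller's example (the copy keeps only source and want).
        normalized = doctest.Example(example.source, self._normalize(want))
        return super().output_difference(
            normalized, self._normalize(got), optionflags)


diff_checker = SketchDiffChecker([
    (re.compile(r'at 0x[0-9a-f]+'), 'at <SOME ADDRESS>'),
])
got = '<object object at 0xb7f14458>\n'

empty = doctest.Example('do_something()\n', '\n')
filled = doctest.Example('do_something()\n', 'something else at 0xb7f14438\n')

raw_report = diff_checker.output_difference(empty, got, 0)
normalized_report = diff_checker.output_difference(filled, got, 0)
```

In this sketch raw_report still contains the literal address 0xb7f14458, while normalized_report shows only the placeholder text.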

If regular expressions aren’t expressive enough, you can use arbitrary Python callables to transform the text. For example, suppose you want to ignore case during comparison:

>>> checker = OutputChecker([
...    lambda s: s.lower(),
...    lambda s: s.replace('<blankline>', '<BLANKLINE>'),
...    ])
>>> want = '''\
... Usage: thundermonkey [options] [url]
... <BLANKLINE>
... Options:
...     -h    display this help message
... '''
>>> got = '''\
... usage: thundermonkey [options] [URL]
...
... options:
...     -h    Display this help message
... '''
>>> checker.check_output(want, got, 0)
True

Suppose we forgot that <BLANKLINE> must be in upper case:

>>> checker = OutputChecker([
...    lambda s: s.lower(),
...    ])
>>> checker.check_output(want, got, 0)
False

The difference would show us that:

>>> source = '''\
... >>> print_help_message()
... ''' + want
>>> example = doctest.Example(source, want)
>>> print_(checker.output_difference(example, got,
...                                 doctest.REPORT_NDIFF))
Differences (ndiff with -expected +actual):
      usage: thundermonkey [options] [url]
    - <blankline>
    + <BLANKLINE>
      options:
          -h    display this help message
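Because transformers run in the order given, the lower-casing step has to come before the step that restores <BLANKLINE>. The sketch below shows one way a mixed list of callables and (pattern, replacement) pairs could be applied in sequence. The helper names as_transformer and normalize are hypothetical and are not part of zope.testing.

```python
import re


def as_transformer(item):
    """Return a str -> str callable for a callable or a (regex, repl) pair."""
    if callable(item):
        return item
    pattern, replacement = item
    return lambda text: pattern.sub(replacement, text)


def normalize(text, items):
    # Apply each transformer in order; later ones see earlier results.
    for transform in map(as_transformer, items):
        text = transform(text)
    return text


items = [
    lambda s: s.lower(),
    lambda s: s.replace('<blankline>', '<BLANKLINE>'),
    (re.compile(r'at 0x[0-9a-f]+'), 'at <SOME ADDRESS>'),
]

result = normalize('Usage: Thing\n<BLANKLINE>\nat 0xB7F1\n', items)
```

Here result is 'usage: thing\n<BLANKLINE>\nat <SOME ADDRESS>\n'. Dropping the second transformer would leave '<blankline>' in lower case, which is the mistake shown in the diff above.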

OutputChecker instances can be combined with + for easy reuse; the result holds the transformers of both:

>>> address_and_time_checker = OutputChecker([
...    (re.compile('[0-9]*[.][0-9]* seconds'), '<SOME NUMBER OF> seconds'),
...    (re.compile('at 0x[0-9a-f]+'), 'at <SOME ADDRESS>'),
...    ])
>>> lowercase_checker = OutputChecker([
...    lambda s: s.lower(),
...    ])
>>> combined_checker = address_and_time_checker + lowercase_checker
>>> len(combined_checker.transformers)
3

Combining a checker with anything other than another checker raises a TypeError:

>>> lowercase_checker + 5
Traceback (most recent call last):
    ...
TypeError: unsupported operand type(s) for +: ...
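The combining behaviour can be modelled with Python's __add__ protocol. The sketch below is an assumption about how such behaviour can be obtained, using a hypothetical SketchChecker class that only stores transformers. Whether the library does it this way is not stated here.

```python
class SketchChecker:
    """Hypothetical container; it only models the combining behaviour."""

    def __init__(self, transformers=()):
        self.transformers = list(transformers)

    def __add__(self, other):
        if not isinstance(other, SketchChecker):
            # Returning NotImplemented makes Python raise TypeError
            # once the other operand also declines the operation.
            return NotImplemented
        # A new object; neither operand is modified.
        return SketchChecker(self.transformers + other.transformers)


first = SketchChecker([str.strip, str.lower])
second = SketchChecker([str.upper])
combined = first + second

error = None
try:
    first + 5
except TypeError as exc:
    error = str(exc)
```

In this sketch combined.transformers has three entries in left-to-right order, and error holds Python's standard "unsupported operand type(s) for +" message.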