Practical Property Testing

Hypothesis is a property-based testing framework. You tell it what kind of data to generate, and it generates the test cases for you, provided you write tests that hold for data you haven’t seen in advance. But how do you get started with a tool like this?

Coming up with your own properties is a challenge, but a few patterns are immediately useful in the software that people work on in the real world. I learned about these patterns from Matt Bachmann’s PyCon 2016 talk.

The term “property” comes from math. For example, we say that a list-sorting function has the property of idempotence: running the function many times gives the same result as running it once.

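def test_my_sort():
    # random_list() and my_sort() stand in for your own helpers
    a = random_list()
    # applying the sort again and again never changes the result
    assert my_sort(a) == my_sort(my_sort(a))
    assert my_sort(a) == my_sort(my_sort(my_sort(a)))
    assert my_sort(a) == my_sort(my_sort(my_sort(my_sort(a))))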

This is an awfully academic way to think about your code, and you shouldn’t be surprised by that. Hypothesis is based on another project: QuickCheck, from Haskell.

Just because their premise is based on abstract mathematical properties doesn’t mean you need to know how to write a formal proof about your code. Coming up with your own tests is easy. Instead of writing specific test cases with hard-coded input and output data, you write a test that describes the general shape of your code’s behavior. When you do that, the property-test framework can generate data for you and use your test to verify that your code does what it should. When you describe how the output should behave, rather than what it should be, your test will work on data you haven’t thought of before!
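As a rough sketch of what that looks like, the idempotence check from earlier can be handed to Hypothesis instead of a hand-rolled random_list(). Here my_sort is only a stand-in that wraps Python’s built-in sorted, so swap in your own function.

```python
from hypothesis import given
import hypothesis.strategies as st


def my_sort(items):
    # Stand-in for the function under test (an assumption for this sketch).
    return sorted(items)


@given(st.lists(st.integers()))
def test_my_sort_is_idempotent(a):
    once = my_sort(a)
    # Describes how the output behaves, not what it is:
    # sorting an already-sorted list must change nothing.
    assert my_sort(once) == once
```

Nothing in the test mentions a concrete list. Hypothesis picks the lists, and when it finds a failing one it shrinks it to a small example before reporting it.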

Still, property tests are more challenging to write than unit tests. The properties you implement during testing are not the same properties that come up in feature requests from users. Nobody is asking you to make sure your implementation of quicksort is idempotent! But a few properties are useful enough that you can start using Hypothesis in your codebase right away.

Oracle-Based Property

Something all property tests have in common is that they assert some kind of behavior about your code. Often a property test describes only one aspect of the behavior being tested, but this pattern asserts everything about the returned data at once.

Say you have a function that’s too slow, or written poorly, or you want to write it in a different language. Before embarking on a complete rewrite, it’s important to have a bunch of tests around that function so that you don’t accidentally change its behavior. Writing out all those tests is time-consuming, so why not have Hypothesis generate them for you?

from hypothesis import given
import hypothesis.strategies as st

from project.legacy import fix_data as old_fix_data
from project import fix_data

@given(st.integers())
def test_new_data_fixer(x):
    assert fix_data(x) == old_fix_data(x)

Hypothesis will generate random integers and check that the new implementation returns the same result as the legacy one. This is a cheap and easy way to verify your implementation across many test cases. While it helps to have a few pointed edge cases that make sense in your problem domain, having a computer generate thousands of test cases for you may uncover differences in the rewrite that turn out to be bugs in the legacy code to begin with! It’s like having a user who always reports concrete ways to reproduce a bug.
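Here is a self-contained sketch of the same pattern that you can run without a project.legacy module. Both functions are hypothetical stand-ins: old_count_set_bits plays the slow, trusted legacy code and count_set_bits plays the rewrite. By default Hypothesis runs 100 generated examples per test, so the sketch raises that with settings and pins two edge cases with example.

```python
from hypothesis import example, given, settings
import hypothesis.strategies as st


def old_count_set_bits(n):
    # Hypothetical legacy implementation: slow but trusted.
    count = 0
    while n:
        count += n & 1
        n >>= 1
    return count


def count_set_bits(n):
    # Hypothetical rewrite that should behave identically.
    return bin(n).count("1")


@settings(max_examples=500)        # generate more cases than the default
@given(st.integers(min_value=0))   # the legacy loop only handles n >= 0
@example(0)                        # pinned edge cases run on every invocation
@example(2**64)
def test_new_matches_old(n):
    # The legacy function is the oracle: whatever it returns is "correct".
    assert count_set_bits(n) == old_count_set_bits(n)
```

Restricting the strategy to non-negative integers is a design choice: the oracle only defines behavior for inputs the legacy code actually accepted.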

Reversible Operations

You get a property test for free whenever you have two functions that are supposed to undo each other. Serialization and deserialization of data is a common case:

import json

from hypothesis import given
import hypothesis.strategies as st

json_data = st.recursive(
    st.integers() | st.booleans() | st.text() | st.none(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children)
)

@given(json_data)
def test_json_serialization(data):
    assert data == json.loads(json.dumps(data))

This asserts that round-tripping data through JSON works correctly. Notice that we aren’t specifying any particular test data, because we shouldn’t have to. json.loads and json.dumps are supposed to reverse each other by definition, so we should be able to use any data that’s allowed to be stored as JSON.

This generator uses the recursive strategy from Hypothesis to generate nested data. I pulled the example from the Hypothesis documentation.

Note that the documentation uses st.floats() where I’m using st.integers(). This is because the float nan serializes to the token NaN and deserializes back to nan, which never compares equal to itself, so Hypothesis reports the round trip as a failure. Maybe this highlights a problem with the JSON spec, or with Python’s standard library json module. Either way, it’s outside the scope of what I want to talk about right now.
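If you do want floats in the round-trip test, one possible workaround is to ask Hypothesis for finite floats only. The sketch below first shows the nan behavior on its own, then reuses the recursive strategy with st.floats(allow_nan=False, allow_infinity=False) added as a leaf. The name json_data_with_floats is mine, not something from the documentation.

```python
import json
import math

from hypothesis import given
import hypothesis.strategies as st

# nan survives the round trip as a float, but nan != nan, so `==` fails.
nan_round_trip = json.loads(json.dumps(float("nan")))
assert math.isnan(nan_round_trip)
assert nan_round_trip != nan_round_trip

# Same recursive shape as before, with finite floats added as leaves.
json_data_with_floats = st.recursive(
    st.none()
    | st.booleans()
    | st.integers()
    | st.floats(allow_nan=False, allow_infinity=False)
    | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
)


@given(json_data_with_floats)
def test_json_round_trip_with_floats(data):
    assert data == json.loads(json.dumps(data))
```

Excluding infinity as well keeps the generated data inside what strict JSON allows, since Infinity is no more a valid JSON token than NaN is.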

Fuzzing for Failure

One property you almost always want your software to have is that it doesn’t crash. It’s surprising how rare this property is.

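import datetime

import hypothesis as h
import hypothesis.strategies as st


@h.given(st.integers())
def test_epoch_to_datetime(ts):
    # no assertion: the test passes as long as this call does not raise
    datetime.datetime.fromtimestamp(ts)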

This is testing a function that takes the number of seconds since January 1st, 1970 and returns a datetime object for that point in time. The test isn’t asserting anything about the behavior of this function at all. It’s not even looking at the datetime that it returns. Why even bother? Well, look at what happens when we run our test:

============================ FAILURES =============================
_____________________ test_epoch_to_datetime ______________________

    @h.given(st.integers())
>   def test_epoch_to_datetime(ts):

tests/test_thing.py:104:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox/py3/lib/python2.7/site-packages/hypothesis/core.py:520: in wrapped_test
    print_example=True, is_final=True
.tox/py3/lib/python2.7/site-packages/hypothesis/executors.py:58: in default_new_style_executor
    return function(data)
.tox/py3/lib/python2.7/site-packages/hypothesis/core.py:110: in run
    return test(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ts = 253402318800

    @h.given(st.integers())
    def test_epoch_to_datetime(ts):
>       datetime.datetime.fromtimestamp(ts)
E       ValueError: year is out of range

tests/test_thing.py:105: ValueError
--------------------------- Hypothesis ----------------------------
Falsifying example: test_epoch_to_datetime(ts=253402318800)

fromtimestamp is crashing! Why is there a number that this would crash on? Looking at the documentation for this function:

… This may raise ValueError, if the timestamp is out of the range of values supported by the platform C localtime() function. It’s common for this to be restricted to years from 1970 through 2038. …

So what year does this integer represent?

Python 3.4.3 (default, Aug 11 2015, 08:53:29)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ts = 253402318800
>>> (ts
... / 60 # seconds in a minute
... / 60 # minutes in an hour
... / 24 # hours in a day
... / 365 # days a year
... ) + 1970 # since epoch year
10005.334817351599

Ah. Yeah that’s a bit over 2038. But the documentation says “it’s common for this to be restricted…” meaning it will be different depending on what the platform supports. Depending on the kind of software you’re writing and deploying, it would make sense to test for this on every platform you support. Do you know all instances where your codebase calls datetime.fromtimestamp? Do you know all instances where the libraries you use call datetime.fromtimestamp?
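The division above ignores leap years, so it overshoots a little. A more careful check, sketched here with only the standard library, suggests the falsifying example sits just past the last second of year 9999, which is the largest year a datetime object can hold (datetime.MAXYEAR). The comparison is done in UTC. My guess is that the few hours left over come from the local time zone of the machine that ran the test, but the traceback doesn’t say so.

```python
import datetime

FAILING_TS = 253402318800  # the falsifying example reported by Hypothesis

epoch = datetime.datetime(1970, 1, 1)
last_moment = datetime.datetime(datetime.MAXYEAR, 12, 31, 23, 59, 59)

# Integer arithmetic on the timedelta avoids float rounding.
delta = last_moment - epoch
last_ts = delta.days * 86400 + delta.seconds

print(last_ts)               # last whole-second timestamp in year 9999 (UTC)
print(FAILING_TS - last_ts)  # how many seconds past that limit the input is
assert FAILING_TS > last_ts
```

So on the machine that produced this traceback, the limit was not 2038 but the range of the datetime type itself. Other platforms may draw the line elsewhere.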

Fuzzing your components for failure can uncover all sorts of unexpected errors like this without much effort on your part. You don’t have to know where you use functions that have arbitrary limits like this. Just let the computer search for them and save you the trouble.
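Once a fuzz test has found a limit like this, one option is to make the limit explicit and keep fuzzing. The sketch below wraps the call in a hypothetical helper, safe_from_timestamp, that returns None for timestamps the platform rejects. The exceptions it catches (ValueError, OverflowError, OSError) are my reading of what fromtimestamp can raise across Python versions and platforms, so adjust the list to what you actually see.

```python
import datetime

from hypothesis import given
import hypothesis.strategies as st


def safe_from_timestamp(ts):
    # Hypothetical helper: convert epoch seconds to a datetime, or return
    # None when the platform cannot represent the timestamp.
    try:
        return datetime.datetime.fromtimestamp(ts)
    except (ValueError, OverflowError, OSError):
        return None


@given(st.integers())
def test_safe_from_timestamp_never_crashes(ts):
    result = safe_from_timestamp(ts)
    # Any exception outside the expected set propagates and fails the test.
    assert result is None or isinstance(result, datetime.datetime)
```

The property is still “it doesn’t crash”, but the known limit is now handled in one place and the fuzz test guards it against regressions.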

Start Using Property Tests Now

Hypothesis supports pytest and unittest already, and with the patterns above it’s easy to begin incorporating property tests into your project right away. The more familiar you get with applying these basic patterns, the sooner you’ll learn to come up with novel patterns for your own code.
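The earlier examples are plain functions, which is the shape pytest collects. For unittest, a minimal sketch looks like the following. The class and method names are made up for illustration, and the built-in sorted stands in for your own code.

```python
import unittest

from hypothesis import given
import hypothesis.strategies as st


class TestSorting(unittest.TestCase):
    @given(st.lists(st.integers()))
    def test_sorted_is_idempotent(self, items):
        once = sorted(items)
        self.assertEqual(sorted(once), once)

    @given(st.lists(st.integers()))
    def test_sorted_keeps_length(self, items):
        # Sorting may reorder elements but must not add or drop any.
        self.assertEqual(len(sorted(items)), len(items))
```

Save it in a test module and run it with python -m unittest, or point pytest at the same file. The given decorator supplies the extra argument after self in both cases.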