Write Set Coverage in Hypothesis
Coming up with properties is hard.
For example, say you want to implement set coverage. Write a function that takes a list of sets, and returns the smallest list of these sets, where every element in the large set is now in the small set.
large_set_of_sets = [{1}, {1, 2}, {2}]
reduce_sets(large_set_of_sets) # [{1, 2}]
We’re going to walk through how to implement properties that test set coverage. Once we’ve written properties that give boundaried so it’s behavior, hypothesis will discover bugs in how we’ve written our code and give us concrete examples of bugs to fix.
What are our properties?
What are the properties of reduce_sets
? If you want to steal from Discrete
Math or CS, here are a couple:
- The number of sets in the output is the same or less than the number of sets from the input.
- Every number in the list of input sets is in the list of output sets.
- Every set of numbers in the output list was also in the input list.
- There are no duplicate sets in the output.
- No output set is a subset of any other output set
- No output set is a subset of the other sets combined.
- The output set is the smallest possible number of sets with the previous properties still satisfied.
(I got most of these from Jessica Kerr)
The last property is particularly hard, for example the following two solutions are valid on all the properties but the last property:
large_set_of_sets = [{1}, {1, 2}, {2}]
reduce_sets(large_set_of_sets) # [{1}, {2}] -- covers all the elements
reduce_sets(large_set_of_sets) # [{1, 2}] -- but this is smaller
But we’ll get to why it’s hard at the end.
Let’s work through how we’d implement these properties.
A Unit Test
We’ll be using hypothesis with pytest.
Let’s turn our first example of how set coverage works into a unit test on its own.
from setcoverage import reduce_sets
def test_example():
s = [{1}, {1, 2}, {2}]
assert reduce_sets(s) == [{1, 2}]
And here’s the minimum amount of code it takes to make the test pass:
def reduce_sets(sets):
return [{1, 2}]
Look! Set Coverage in one line! Running the tests proves our success and genius:
$ tox -e py27
GLOB sdist-make: /Users/conrad/dev/set-coverage-example/setup.py
py27 inst-nodeps: /Users/conrad/dev/set-coverage-example/.tox/dist/setcoverage-0.0.1.zip
py27 installed: enum34==1.1.6,hypothesis==3.4.1,py==1.4.31,pytest==2.9.2,setcoverage==0.0.1,wheel==0.24.0
py27 runtests: PYTHONHASHSEED='4047888479'
py27 runtests: commands[0] | py.test
======================= test session starts =======================
platform darwin -- Python 2.7.10, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/conrad/dev/set-coverage-example, inifile:
plugins: hypothesis-3.4.1
collected 1 items
tests/test_thing.py .
==================== 1 passed in 0.01 seconds =====================
_____________________________ summary _____________________________
py27: commands succeeded
congratulations :)
Our First Property: The Cardinality Constraint
The first property we’re going to implement on our set coverage code is this: The output list of sets should have the same number of sets or fewer a number of sets than the input list of sets.
We know this has to be a property of a function that solves set coverage because if it didn’t have this property, that would mean it sometimes returns more sets than it was given. That goes against the premise of what it does – finding the smallest number of sets.
Here’s what that property looks like in python:
import hypothesis as h
from hypothesis import strategies as st
# ...
@h.given(st.lists(st.sets(st.integers())))
def test_that_output_has_fewer_sets_than_input(sets):
output = reduce_sets(sets)
assert len(output) <= len(sets)
Now we’ll let hypothesis find us a failing test case by just re-running our test suite.
======================= test session starts =======================
platform darwin -- Python 2.7.10, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/conrad/dev/set-coverage-example, inifile:
plugins: hypothesis-3.4.1
collected 2 items
tests/test_thing.py .F
============================ FAILURES =============================
___________ test_that_output_has_fewer_sets_than_input ____________
@h.given(st.lists(st.sets(st.integers())))
> def test_that_output_has_fewer_sets_than_input(sets):
tests/test_thing.py:13:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox/py27/lib/python2.7/site-packages/hypothesis/core.py:520: in wrapped_test
print_example=True, is_final=True
.tox/py27/lib/python2.7/site-packages/hypothesis/executors.py:58: in default_new_style_executor
return function(data)
.tox/py27/lib/python2.7/site-packages/hypothesis/core.py:110: in run
return test(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sets = []
@h.given(st.lists(st.sets(st.integers())))
def test_that_output_has_fewer_sets_than_input(sets):
output = reduce_sets(sets)
> assert len(output) <= len(sets)
E assert 1 <= 0
E + where 1 = len([set([1, 2])])
E + and 0 = len([])
tests/test_thing.py:15: AssertionError
--------------------------- Hypothesis ----------------------------
Falsifying example: test_that_output_has_fewer_sets_than_input(sets=[])
=============== 1 failed, 1 passed in 0.10 seconds ================
ERROR: InvocationError: '/Users/conrad/dev/set-coverage-example/.tox/py27/bin/py.test'
_____________________________ summary _____________________________
ERROR: py27: commands failed
Hypothesis found data which will show our implementation doesn’t satisfy that property. Let’s zoom in on the error itself to see what Hypothesis tried.
sets = []
@h.given(st.lists(st.sets(st.integers())))
def test_that_output_has_fewer_sets_than_input(sets):
output = reduce_sets(sets)
> assert len(output) <= len(sets)
E assert 1 <= 0
E + where 1 = len([set([1, 2])])
E + and 0 = len([])
tests/test_thing.py:15: AssertionError
--------------------------- Hypothesis ----------------------------
Falsifying example: test_that_output_has_fewer_sets_than_input(sets=[])
At the beginning of the failure test case, we can see the input data clearly
labeled: sets = []
. This means that Hypothesis tried to run the test with an
empty list. If we look a bit closer at the assertion that failed we can see
how the property we define interacts with the input data:
> assert len(output) <= len(sets)
E assert 1 <= 0
E + where 1 = len([set([1, 2])])
E + and 0 = len([])
This is a feature of pytest that extends Python’s built-in assertion statement to provide useful debug information. It’s showing us how the empty list has zero elements, where our function is still returning data to solve its own use-case.
Let’s return to our implementation and write the minimum amount of code to pass both the test case and this property:
def reduce_sets(sets):
# remove first and last item
return sets[1:][:-1]
It just so happens that slice syntax will return empty lists if you over-slice and on our test case the input only needs the first and last items pruned to pass.
======================= test session starts =======================
platform darwin -- Python 2.7.10, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/conrad/dev/set-coverage-example, inifile:
plugins: hypothesis-3.4.1
collected 2 items
tests/test_thing.py ..
==================== 2 passed in 0.31 seconds =====================
_____________________________ summary _____________________________
py27: commands succeeded
congratulations :)
Property Two: All elements in the Sets are in the Output from the Input
The next property of a function that does set coverage is that no items are lost in the output list. To implement this property, we’ll just say that the set of all items coming in are the same as the set of all items coming out.
from functools import reduce
# ...
@h.given(st.lists(st.sets(st.integers())))
def test_no_items_lost(sets):
union = lambda a, b: a.union(b)
all_input = reduce(union, sets, set())
all_output = reduce(union, reduce_sets(sets), set())
assert all_input == all_output
Easy enough, let’s see what Hypothesis finds:
======================= test session starts =======================
platform darwin -- Python 2.7.10, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/conrad/dev/set-coverage-example, inifile:
plugins: hypothesis-3.4.1
collected 3 items
tests/test_thing.py ..F
============================ FAILURES =============================
_______________________ test_no_items_lost ________________________
@h.given(st.lists(st.sets(st.integers())))
> def test_no_items_lost(sets):
tests/test_thing.py:20:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox/py27/lib/python2.7/site-packages/hypothesis/core.py:520: in wrapped_test
print_example=True, is_final=True
.tox/py27/lib/python2.7/site-packages/hypothesis/executors.py:58: in default_new_style_executor
return function(data)
.tox/py27/lib/python2.7/site-packages/hypothesis/core.py:110: in run
return test(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sets = [set([0])]
@h.given(st.lists(st.sets(st.integers())))
def test_no_items_lost(sets):
union = lambda a, b: a.union(b)
all_input = reduce(union, sets, set())
all_output = reduce(union, reduce_sets(sets), set())
> assert all_input == all_output
E assert set([0]) == set([])
E Extra items in the left set:
E 0
E Use -v to get the full diff
tests/test_thing.py:24: AssertionError
--------------------------- Hypothesis ----------------------------
Falsifying example: test_no_items_lost(sets=[{0}])
Looks like our pruning strategy from before is removing valuable sets. Let’s come up with a way to satisfy all the properties we’ve written so far:
def reduce_sets(sets):
if sets == []:
return sets
# create a set of the items found
o = set()
for s in sets:
o = o.union(s)
return [o]
Property Three: Every output set can be found from the list of input sets
This property basically is making sure no novel sets are created from our function. We can implement this property like this:
@h.given(st.lists(st.sets(st.integers())))
def test_all_sets_come_from_input(sets):
output = reduce_sets(sets)
forbidden_sets = [s for s in output if s not in sets]
assert len(forbidden_sets) == 0
By running our tests, we can find a case that this fails on:
sets = [set([0]), set([1])]
@h.given(st.lists(st.sets(st.integers())))
def test_all_sets_come_from_input(sets):
output = reduce_sets(sets)
forbidden_sets = [s for s in output if s not in sets]
> assert len(forbidden_sets) == 0
E assert 1 == 0
E + where 1 = len([set([0, 1])])
tests/test_thing.py:31: AssertionError
--------------------------- Hypothesis ----------------------------
Falsifying example: test_all_sets_come_from_input(sets=[{0}, {1}])
It’s clear why this is happening since our implementation is strictly creating a new superset from the sets its given.
Let’s come up with a new way to satisfy all the tests we’ve seen so far:
def _without(objects, i):
"""
get a list of objects without
the element at i
"""
return objects[:i] + objects[i+1:]
def _prune_one(sets):
""" attempt to remove a single item """
for i, e in enumerate(sets):
for s in _without(sets, i):
if s == e:
return _without(sets, i)
return sets
def reduce_sets(sets):
one_removed = _prune_one(sets)
two_removed = _prune_one(one_removed)
return two_removed
Our implementation just looks for items to safely remove, and removes them. I’m only doing this twice because I want only want to solve for the properties we’ve seen so far.
Note: This implementation will break that first unit test we wrote. To prevent this unit test from distracting us, let’s mark it for skipping so that pytest won’t run it.
import pytest
# ...
@pytest.mark.skip(reason='focusing on prop tests for now')
def test_example():
s = [{1}, {1, 2}, {2}]
assert reduce_sets(s) == [{1, 2}]
When you’re skipping a test, it shows up in the suite progress as the letter ‘s’:
======================= test session starts =======================
platform darwin -- Python 2.7.10, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/conrad/dev/set-coverage-example, inifile:
plugins: hypothesis-3.4.1
collected 5 items
tests/test_thing.py s....
=============== 4 passed, 1 skipped in 1.12 seconds ===============
_____________________________ summary _____________________________
py27: commands succeeded
congratulations :)
Property Four: There are no duplicate sets in the output.
The previous solution removes duplicates, but now let’s write a property to make sure all duplicates are removed.
@h.given(st.lists(st.sets(st.integers())))
def test_no_duplicate_sets(sets):
output = reduce_sets(sets)
for e_ix, e in enumerate(output):
for s_ix, s in enumerate(output):
if e_ix == s_ix:
continue
assert e != s, "duplicate in {}".format(output)
Gives us:
sets = [set([]), set([]), set([]), set([])]
@h.given(st.lists(st.sets(st.integers())))
def test_no_duplicate_sets(sets):
output = reduce_sets(sets)
for e_ix, e in enumerate(output):
for s_ix, s in enumerate(output):
if e_ix == s_ix:
continue
> assert e != s, "duplicate in {}".format(output)
E AssertionError: duplicate in [set([]), set([])]
E assert set([]) != set([])
tests/test_thing.py:41: AssertionError
--------------------------- Hypothesis ----------------------------
Falsifying example: test_no_duplicate_sets(sets=[set(), set(), set(), set()])
Looks like our code fails for inputs with many duplicate sets, which we can cover by just running the prune helper we wrote for as many times as there are sets passed in.
def reduce_sets(sets):
o = sets
for n in range(len(sets)):
o = _prune_one(o)
return o
Property Five: No set in the output list is a subset of any other set in that list.
We ultimately want to remove anything we can from the list of sets passed in. Removing duplicates is a good, but we can do better than that because anything that is a subset of another set can also be tossed out. By removing a subset, we will always have the same number of elements covered in our output.
Let’s write a property to look for sets that are subsets of others in the output.
@h.given(st.lists(st.sets(st.integers())))
def test_no_subsets(sets):
output = reduce_sets(sets)
for e_ix, e in enumerate(output):
for s_ix, s in enumerate(output):
if e_ix == s_ix:
continue
assert not e.issubset(s), "subset found in {}".format(output)
And Hypothesis tells us:
sets = [set([0]), set([])]
@h.given(st.lists(st.sets(st.integers())))
def test_no_subsets(sets):
output = reduce_sets(sets)
for e_ix, e in enumerate(output):
for s_ix, s in enumerate(output):
if e_ix == s_ix:
continue
> assert not e.issubset(s), "subset found in {}".format(output)
E AssertionError: subset found in [set([0]), set([])]
E assert not True
E + where True = <built-in method issubset of set object at 0x10cbab3f0>(set([0]))
E + where <built-in method issubset of set object at 0x10cbab3f0> = set([]).issubset
tests/test_thing.py:53: AssertionError
--------------------------- Hypothesis ----------------------------
Falsifying example: test_no_subsets(sets=[{0}, set()])
We could solve this immediately with a one-line change, but I want to play with this failure case a bit first. The empty set are subsets of all other sets, so let’s write a property especially for that. Our function should never return anything with an empty set in it.
@h.given(st.lists(st.sets(st.integers())))
def test_no_empty_sets(sets):
output = reduce_sets(sets)
assert set([]) not in output
And to solve for it:
def reduce_sets(sets):
o = [s for s in sets if s != set([])]
for n in range(len(sets)):
o = _prune_one(o)
return o
Now we have a more interesting failure case on our original property:
sets = [set([0]), set([0, 1])]
@h.given(st.lists(st.sets(st.integers())))
def test_no_subsets(sets):
output = reduce_sets(sets)
for e_ix, e in enumerate(output):
for s_ix, s in enumerate(output):
if e_ix == s_ix:
continue
> assert not e.issubset(s), "subset found in {}".format(output)
E AssertionError: subset found in [set([0]), set([0, 1])]
E assert not True
E + where True = <built-in method issubset of set object at 0x1077de4d8>(set([0, 1]))
E + where <built-in method issubset of set object at 0x1077de4d8> = set([0]).issubset
tests/test_thing.py:59: AssertionError
--------------------------- Hypothesis ----------------------------
Falsifying example: test_no_subsets(sets=[{0}, {0, 1}])
So we can see that {0}
is sneaking into our output even though element 0
is
covered by the set {0, 1}
. Let’s tweak our pruning function to prune for
elements that can find their superset. This still satisfies the previous
property because duplicate sets are both supersets and subsets of each other.
def _prune_one(sets):
""" attempt to remove a single item """
for i, e in enumerate(sets):
for s in _without(sets, i):
if s.issuperset(e): # changed from 's == e'
return _without(sets, i)
return sets
def reduce_sets(sets):
o = [s for s in sets if s != set([])]
for n in range(len(sets)):
o = _prune_one(o)
return o
And now running our test suite again:
======================= test session starts =======================
platform darwin -- Python 2.7.10, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/conrad/dev/set-coverage-example, inifile:
plugins: hypothesis-3.4.1
collected 7 items
tests/test_thing.py s......
=============== 6 passed, 1 skipped in 1.75 seconds ===============
_____________________________ summary _____________________________
Success! Everything passes! We can even lift that skip
mark we added before.
Property Six: No output set is a subset of the other sets combined.
This property is actually just a small extension of the previous one. Checking
to see if a set has a superset isn’t going far enough. If our function had
some sort of input data and it returned [{1, 2}, {3, 4}, {2, 3}]
you can tell
that it’s incomplete because the set {2, 3}
can be covered from elements
found in the other two sets. Even though it isn’t a strict subset of either,
it is redundant and safe-to-prune because all of its contents are found in
other sets.
The implementation of this property looks like this:
@h.given(st.lists(st.sets(st.integers(min_value=-1, max_value=5), max_size=3), max_size=8))
def test_no_sibling_superset_cover_a_set_in_output(sets):
union = lambda a, b: a.union(b)
without = lambda o, i: o[:i] + o[i+1:]
output = reduce_sets(sets)
for i, e in enumerate(output):
siblings_superset = reduce(union, without(output, i), set())
assert not siblings_superset.issuperset(e)
And after multiple trials I finally got it to crash:
sets = [set([0, 2]), set([0, 1]), set([1, 2])]
@h.given(st.lists(st.sets(st.integers(min_value=-1, max_value=5), max_size=3), max_size=8))
def test_no_sibling_superset_cover_a_set_in_output(sets):
union = lambda a, b: a.union(b)
without = lambda o, i: o[:i] + o[i+1:]
output = reduce_sets(sets)
for i, e in enumerate(output):
siblings_superset = reduce(union, without(output, i), set())
> assert not siblings_superset.issuperset(e)
E assert not True
E + where True = <built-in method issuperset of set object at 0x1042cd878>(set([0, 2]))
E + where <built-in method issuperset of set object at 0x1042cd878> = set([0, 1, 2]).issuperset
tests/test_thing.py:68: AssertionError
--------------------------- Hypothesis ----------------------------
Falsifying example: test_no_sibling_superset_cover_a_set_in_output(sets=[{0, 2}, {0, 1}, {1, 2}])
The fix to this error is very similar to the last property we implemented. We actually just have to check to make sure a given set is not a subset of all the other sets combined and it will have the same effect as if we were just checking the sibling sets individually. Because math.
def _union_all(sets):
return reduce(lambda a, b: a.union(b), sets, set())
def _prune_one(sets):
""" attempt to remove a single item """
for i, e in enumerate(sets):
s = _union_all(_without(sets, i))
if s.issuperset(e):
return _without(sets, i)
return sets
def reduce_sets(sets):
o = [s for s in sets if s != set([])]
for n in range(len(sets)):
o = _prune_one(o)
return o
And there we have it, one step closer to our set coverage.
Property Seven: The output list has the smallest possible number of sets
This property is tricky, or at least tedious. How do we prove that a list of sets is the smallest possible number of sets? To show how our solution is incomplete, we have to come up with a specific failing test case:
def test_hard_coverage():
s = [
{1, 2}, {3, 4},
{1}, {2, 3}, {4}
]
assert reduce_sets(s) == [{1, 2}, {3, 4}]
Another possible solution could be [{1}, {2, 3}, {4}]
, among others, but no
solution is as small in length as the one in the test.
Running pytest with the -v
flag, we can see a completely different solution
was found.
$ tox -e py27 -- -v
...
============================ FAILURES =============================
_______________________ test_hard_coverage ________________________
def test_hard_coverage():
s = [
{1, 2}, {3, 4},
{1}, {2, 3}, {4}
]
> assert reduce_sets(s) == [{1, 2}, {3, 4}]
E assert [set([1]), se...3]), set([4])] == [set([1, 2]), set([3, 4])]
E At index 0 diff: set([1]) != set([1, 2])
E Left contains more items, first extra item: set([4])
E Full diff:
E - [set([1]), set([2, 3]), set([4])]
E + [set([1, 2]), set([3, 4])]
tests/test_thing.py:75: AssertionError
=============== 1 failed, 8 passed in 2.35 seconds ================
ERROR: InvocationError: '/Users/conrad/dev/set-coverage-example/.tox/py27/bin/py.test -v'
Our code greedily converges on the wrong solution. It finds something that
satisfies all the other properties, but it isn’t the smallest solution
possible. The reason it fails is that it first considers the two sets {1, 2}
and {3, 4}
for removal. Because the last 3 elements have all the elements of
the first two, the two ideal sets get removed.
The only way to find an ideal solution would be to generate all possible solutions that satisfy the above properties, and then grab the smallest one. In other words, to implement the checker of this property, you have to implement set coverage yourself, which is the heart of what defines an NP-Complete problem. If you could implement the solution to this in polynomial time, you win Computer Science. Every problem as hard as this one could be solved by your solution too.
So this is the one property I will not implement with Hypothesis. It would have to be a strict reimplementation of the solution, maybe if there were more than one way to implement it we could validate that the two solutions are equivalent with enough data, but for now a few pointed unit tests will have to do.
Here’s a pretty slow implementation of set coverage:
from itertools import combinations, chain
# ...
def brute_force_set_coverage(sets):
"""
most naiive approach to set coverage.
Try all solutions
Filter non-covering solutions
Choose smallest solution
"""
o = [s for s in sets if s != set([])]
if len(o) == 0:
return o
goal_set = _union_all(o)
solution_generators = [combinations(o, i + 1) for i in range(len(o))]
all_possible_solutions = chain.from_iterable(solution_generators)
valid_solutions = [i for i in all_possible_solutions if _union_all(i) == goal_set]
return list(sorted((i for i in valid_solutions), key=len)[0])
But before we can swap it out for tests, we have to adjust how our generators work. Right now, this version will take a much longer time to find a solution as the list of sets grows in size, so lets put a hard limit on how big sets get.
First let’s refactor our hypothesis generators to source from the same function:
def list_of_sets():
return st.lists(st.sets(st.integers()))
@h.given(list_of_sets())
def test_that_output_has_fewer_sets_than_input(sets):
output = brute_force_set_coverage(sets)
assert len(output) <= len(sets)
@h.given(list_of_sets())
def test_no_items_lost(sets):
union = lambda a, b: a.union(b)
all_input = reduce(union, sets, set())
all_output = reduce(union, brute_force_set_coverage(sets), set())
assert all_input == all_output
And then put some limitations on the size of the lists and sets that get generated by this generator.
def list_of_sets():
return st.lists(
st.sets(
st.integers(max_value=8),
max_size=8
),
max_size=16
)
Even by limiting it to these small numbers, the entire test suite can take several seconds longer than it was before:
======================= test session starts =======================
platform darwin -- Python 2.7.10, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/conrad/dev/set-coverage-example, inifile:
plugins: hypothesis-3.4.1
collected 9 items
tests/test_thing.py .........
==================== 9 passed in 8.99 seconds =====================
_____________________________ summary _____________________________
py27: commands succeeded
congratulations :)
To view all the code from this post, you can find it on Github.