Python doesn’t have compile-time static type checking. There’s nothing in the interpreter that will ensure that your program will work correctly when you run it. Python does support optional type annotations that can be used in static analysis to detect many kinds of bugs (see Item 90: “Consider Static Analysis via typing
to Obviate Bugs” for details). However, it’s still fundamentally a dynamic language, and anything is possible. With Python, you ultimately don’t know if the functions your program calls will be defined at runtime, even when their existence is evident in the source code. This dynamic behavior is both a blessing and a curse.
The large numbers of Python programmers out there say it’s worth going without compile-time static type checking because of the productivity gained from the resulting brevity and simplicity. But most people using Python have at least one horror story about a program encountering a boneheaded error at runtime. One of the worst examples I’ve heard of involved a SyntaxError
being raised in production as a side effect of a dynamic import (see Item 88: “Know How to Break Circular Dependencies”), resulting in a crashed server process. The programmer I know who was hit by this surprising occurrence has since ruled out using Python ever again.
But I have to wonder, why wasn’t the code more well tested before the program was deployed to production? Compile-time static type safety isn’t everything. You should always test your code, regardless of what language it’s written in. However, I’ll admit that in Python it may be more important to write tests to verify correctness than in other languages. Luckily, the same dynamic features that create risks also make it extremely easy to write tests for your code and to debug malfunctioning programs. You can use Python’s dynamic nature and easily overridable behaviors to implement tests and ensure that your programs work as expected.
You should think of tests as an insurance policy on your code. Good tests give you confidence that your code is correct. If you refactor or expand your code, tests that verify behavior—not implementation—make it easy to identify what’s changed. It sounds counterintuitive, but having good tests actually makes it easier to modify Python code, not harder.
repr
Strings for Debugging OutputWhen debugging a Python program, the print
function and format strings (see Item 4: “Prefer Interpolated F-Strings Over C-style Format Strings and str.format
”), or output via the logging
built-in module, will get you surprisingly far. Python internals are often easy to access via plain attributes (see Item 42: “Prefer Public Attributes Over Private Ones”). All you need to do is call print
to see how the state of your program changes while it runs and understand where it goes wrong.
The print
function outputs a human-readable string version of whatever you supply it. For example, printing a basic string prints the contents of the string without the surrounding quote characters:
print('foo bar') >>> foo bar
This is equivalent to all of these alternatives:
Calling the str
function before passing the value to print
Using the '%s'
format string with the %
operator
Default formatting of the value with an f-string
Calling the format
built-in function
Explicitly calling the __format__
special method
Explicitly calling the __str__
special method
Here, I verify this behavior:
my_value = 'foo bar' print(str(my_value)) print('%s' % my_value) print(f'{my_value}') print(format(my_value)) print(my_value.__format__('s')) print(my_value.__str__()) >>> foo bar foo bar foo bar foo bar foo bar foo bar
The problem is that the human-readable string for a value doesn’t make it clear what the actual type and its specific composition are. For example, notice how in the default output of print
, you can’t distinguish between the types of the number 5
and the string '5'
:
print(5) print('5') int_value = 5 str_value = '5' print(f'{int_value} == {str_value} ?') >>> 5 5 5 == 5 ?
If you’re debugging a program with print
, these type differences matter. What you almost always want while debugging is to see the repr
version of an object. The repr
built-in function returns the printable representation of an object, which should be its most clearly understandable string representation. For most built-in types, the string returned by repr
is a valid Python expression:
a = '\x07' print(repr(a)) >>> '\x07'
Passing the value from repr
to the eval
built-in function should result in the same Python object that you started with (and, of course, in practice you should only use eval
with extreme caution):
b = eval(repr(a)) assert a == b
When you’re debugging with print
, you should call repr
on a value before printing to ensure that any difference in types is clear:
print(repr(5)) print(repr('5')) >>> 5 '5'
This is equivalent to using the '%r'
format string with the %
operator or an f-string with the !r
type conversion:
print('%r' % 5) print('%r' % '5') int_value = 5 str_value = '5' print(f'{int_value!r} != {str_value!r}') >>> 5 '5' 5 != '5'
For instances of Python classes, the default human-readable string value is the same as the repr
value. This means that passing an instance to print
will do the right thing, and you don’t need to explicitly call repr
on it. Unfortunately, the default implementation of repr
for object
subclasses isn’t especially helpful. For example, here I define a simple class and then print one of its instances:
class OpaqueClass: def __init__(self, x, y): self.x = x self.y = y obj = OpaqueClass(1, 'foo') print(obj) >>> <__main__.OpaqueClass object at 0x10963d6d0>
This output can’t be passed to the eval
function, and it says nothing about the instance fields of the object.
There are two solutions to this problem. If you have control of the class, you can define your own __repr__
special method that returns a string containing the Python expression that re-creates the object. Here, I define that function for the class above:
class BetterClass: def __init__(self, x, y): self.x = x self.y = y def __repr__(self): return f'BetterClass({self.x!r}, {self.y!r})'
Now the repr
value is much more useful:
obj = BetterClass(2, 'bar') print(obj) >>> BetterClass(2, 'bar')
When you don’t have control over the class definition, you can reach into the object’s instance dictionary, which is stored in the __dict__
attribute. Here, I print out the contents of an OpaqueClass
instance:
obj = OpaqueClass(4, 'baz') print(obj.__dict__) >>> {'x': 4, 'y': 'baz'}
Things to Remember
✦ Calling print
on built-in Python types produces the humanreadable string version of a value, which hides type information.
✦ Calling repr
on built-in Python types produces the printable string version of a value. These repr
strings can often be passed to the eval
built-in function to get back the original value.
✦ %s
in format strings produces human-readable strings like str
. %r
produces printable strings like repr
. F-strings produce humanreadable strings for replacement text expressions unless you specify the !r
suffix.
✦ You can define the __repr__
special method on a class to customize the printable representation of instances and provide more detailed debugging information.
TestCase
SubclassesThe canonical way to write tests in Python is to use the unittest
built-in module. For example, say I have the following utility function defined in utils.py
that I would like to verify works correctly across a variety of inputs:
# utils.py def to_str(data): if isinstance(data, str): return data elif isinstance(data, bytes): return data.decode('utf-8') else: raise TypeError('Must supply str or bytes, ' 'found: %r' % data)
To define tests, I create a second file named test_utils.py
or utils_test.py
—the naming scheme you prefer is a style choice—that contains tests for each behavior that I expect:
# utils_test.py from unittest import TestCase, main from utils import to_str class UtilsTestCase(TestCase): def test_to_str_bytes(self): self.assertEqual('hello', to_str(b'hello')) def test_to_str_str(self): self.assertEqual('hello', to_str('hello')) def test_failing(self): self.assertEqual('incorrect', to_str('hello')) if __name__ == '__main__': main()
Then, I run the test file using the Python command line. In this case, two of the test methods pass and one fails, with a helpful error message about what went wrong:
$ python3 utils_test.py F.. =============================================================== FAIL: test_failing (__main__.UtilsTestCase) --------------------------------------------------------------- Traceback (most recent call last): File "utils_test.py", line 15, in test_failing self.assertEqual('incorrect', to_str('hello')) AssertionError: 'incorrect' != 'hello' - incorrect + hello --------------------------------------------------------------- Ran 3 tests in 0.002s FAILED (failures=1)
Tests are organized into TestCase
subclasses. Each test case is a method beginning with the word test
. If a test method runs without raising any kind of Exception
(including AssertionError
from assert
statements), the test is considered to have passed successfully. If one test fails, the TestCase
subclass continues running the other test methods so you can get a full picture of how all your tests are doing instead of stopping at the first sign of trouble.
If you want to iterate quickly to fix or improve a specific test, you can run only that test method by specifying its path within the test module on the command line:
$ python3 utils_test.py UtilsTestCase.test_to_str_bytes . --------------------------------------------------------------- Ran 1 test in 0.000s OK
You can also invoke the debugger from directly within test methods at specific breakpoints in order to dig more deeply into the cause of failures (see Item 80: “Consider Interactive Debugging with pdb
” for how to do that).
The TestCase
class provides helper methods for making assertions in your tests, such as assertEqual
for verifying equality, assertTrue
for verifying Boolean expressions, and many more (see help(TestCase)
for the full list). These are better than the built-in assert
statement because they print out all of the inputs and outputs to help you understand the exact reason the test is failing. For example, here I have the same test case written with and without using a helper assertion method:
# assert_test.py from unittest import TestCase, main from utils import to_str class AssertTestCase(TestCase): def test_assert_helper(self): expected = 12 found = 2 * 5 self.assertEqual(expected, found) def test_assert_statement(self): expected = 12 found = 2 * 5 assert expected == found if __name__ == '__main__': main()
Which of these failure messages seems more helpful to you?
$ python3 assert_test.py FF =============================================================== FAIL: test_assert_helper (__main__.AssertTestCase) --------------------------------------------------------------- Traceback (most recent call last): File "assert_test.py", line 16, in test_assert_helper self.assertEqual(expected, found) AssertionError: 12 != 10 =============================================================== FAIL: test_assert_statement (__main__.AssertTestCase) --------------------------------------------------------------- Traceback (most recent call last): File "assert_test.py", line 11, in test_assert_statement assert expected == found AssertionError --------------------------------------------------------------- Ran 2 tests in 0.001s FAILED (failures=2)
There’s also an assertRaises
helper method for verifying exceptions that can be used as a context manager in with
statements (see Item 66: “Consider contextlib
and with
Statements for Reusable try
/finally
Behavior” for how that works). This appears similar to a try
/except
statement and makes it abundantly clear where the exception is expected to be raised:
# utils_error_test.py from unittest import TestCase, main from utils import to_str class UtilsErrorTestCase(TestCase): def test_to_str_bad(self): with self.assertRaises(TypeError): to_str(object()) def test_to_str_bad_encoding(self): with self.assertRaises(UnicodeDecodeError): to_str(b'\xfa\xfa') if __name__ == '__main__': main()
You can define your own helper methods with complex logic in TestCase
subclasses to make your tests more readable. Just ensure that your method names don’t begin with the word test
, or they’ll be run as if they’re test cases. In addition to calling TestCase
assertion methods, these custom test helpers often use the fail
method to clarify which assumption or invariant wasn’t met. For example, here I define a custom test helper method for verifying the behavior of a generator:
# helper_test.py from unittest import TestCase, main def sum_squares(values): cumulative = 0 for value in values: cumulative += value ** 2 yield cumulative class HelperTestCase(TestCase): def verify_complex_case(self, values, expected): expect_it = iter(expected) found_it = iter(sum_squares(values)) test_it = zip(expect_it, found_it) for i, (expect, found) in enumerate(test_it): self.assertEqual( expect, found, f'Index {i} is wrong') # Verify both generators are exhausted try: next(expect_it) except StopIteration: pass else: self.fail('Expected longer than found') try: next(found_it) except StopIteration: pass else: self.fail('Found longer than expected') def test_wrong_lengths(self): values = [1.1, 2.2, 3.3] expected = [ 1.1**2, ] self.verify_complex_case(values, expected) def test_wrong_results(self): values = [1.1, 2.2, 3.3] expected = [ 1.1**2, 1.1**2 + 2.2**2, 1.1**2 + 2.2**2 + 3.3**2 + 4.4**2, ] self.verify_complex_case(values, expected) if __name__ == '__main__': main()
The helper method makes the test cases short and readable, and the outputted error messages are easy to understand:
$ python3 helper_test.py FF =============================================================== FAIL: test_wrong_lengths (__main__.HelperTestCase) --------------------------------------------------------------- Traceback (most recent call last): File "helper_test.py", line 43, in test_wrong_lengths self.verify_complex_case(values, expected) File "helper_test.py", line 34, in verify_complex_case self.fail('Found longer than expected') AssertionError: Found longer than expected =============================================================== FAIL: test_wrong_results (__main__.HelperTestCase) --------------------------------------------------------------- Traceback (most recent call last): File "helper_test.py", line 52, in test_wrong_results self.verify_complex_case(values, expected) File "helper_test.py", line 24, in verify_complex_case f'Index {i} is wrong') AssertionError: 36.3 != 16.939999999999998 : Index 2 is wrong --------------------------------------------------------------- Ran 2 tests in 0.002s FAILED (failures=2)
I usually define one TestCase
subclass for each set of related tests. Sometimes, I have one TestCase
subclass for each function that has many edge cases. Other times, a TestCase
subclass spans all functions in a single module. I often create one TestCase
subclass for testing each basic class and all of its methods.
The TestCase
class also provides a subTest
helper method that enables you to avoid boilerplate by defining multiple tests within a single test method. This is especially helpful for writing data-driven tests, and it allows the test method to continue testing other cases even after one of them fails (similar to the behavior of TestCase
with its contained test methods). To show this, here I define an example data-driven test:
# data_driven_test.py from unittest import TestCase, main from utils import to_str class DataDrivenTestCase(TestCase): def test_good(self): good_cases = [ (b'my bytes', 'my bytes'), ('no error', b'no error'), # This one will fail ('other str', 'other str'), ... ] for value, expected in good_cases: with self.subTest(value): self.assertEqual(expected, to_str(value)) def test_bad(self): bad_cases = [ (object(), TypeError), (b'\xfa\xfa', UnicodeDecodeError), ... ] for value, exception in bad_cases: with self.subTest(value): with self.assertRaises(exception): to_str(value) if __name__ == '__main__': main()
The 'no error'
test case fails, printing a helpful error message, but all of the other cases are still tested and confirmed to pass:
$ python3 data_driven_test.py . =============================================================== FAIL: test_good (__main__.DataDrivenTestCase) [no error] --------------------------------------------------------------- Traceback (most recent call last): File "testing/data_driven_test.py", line 18, in test_good self.assertEqual(expected, to_str(value)) AssertionError: b'no error' != 'no error' --------------------------------------------------------------- Ran 2 tests in 0.001s FAILED (failures=1)
Note
Depending on your project’s complexity and testing requirements, the pytest (https://pytest.org) open source package and its large number of community plug-ins can be especially useful.
Things to Remember
✦ You can create tests by subclassing the TestCase
class from the unittest
built-in module and defining one method per behavior you’d like to test. Test methods on TestCase
classes must start with the word test
.
✦ Use the various helper methods defined by the TestCase
class, such as assertEqual
, to confirm expected behaviors in your tests instead of using the built-in assert
statement.
✦ Consider writing data-driven tests using the subTest
helper method in order to reduce boilerplate.
setUp
, tearDown
, setUpModule
, and tearDownModule
TestCase
classes (see Item 76: “Verify Related Behaviors in TestCase
Subclasses”) often need to have the test environment set up before test methods can be run; this is sometimes called the test harness. To do this, you can override the setUp
and tearDown
methods of a TestCase
subclass. These methods are called before and after each test method, respectively, so you can ensure that each test runs in isolation, which is an important best practice of proper testing.
For example, here I define a TestCase
that creates a temporary directory before each test and deletes its contents after each test finishes:
# environment_test.py from pathlib import Path from tempfile import TemporaryDirectory from unittest import TestCase, main class EnvironmentTest(TestCase): def setUp(self): self.test_dir = TemporaryDirectory() self.test_path = Path(self.test_dir.name) def tearDown(self): self.test_dir.cleanup() def test_modify_file(self): with open(self.test_path / 'data.bin', 'w') as f: ... if __name__ == '__main__': main()
When programs get complicated, you’ll want additional tests to verify the end-to-end interactions between your modules instead of only testing code in isolation (using tools like mocks; see Item 78: “Use Mocks to Test Code with Complex Dependencies”). This is the difference between unit tests and integration tests. In Python, it’s important to write both types of tests for exactly the same reason: You have no guarantee that your modules will actually work together unless you prove it.
One common problem is that setting up your test environment for integration tests can be computationally expensive and may require a lot of wall-clock time. For example, you might need to start a database process and wait for it to finish loading indexes before you can run your integration tests. This type of latency makes it impractical to do test preparation and cleanup for every test in the TestCase
class’s setUp
and tearDown
methods.
To handle this situation, the unittest
module also supports module-level test harness initialization. You can configure expensive resources a single time, and then have all TestCase
classes and their test methods run without repeating that initialization. Later, when all tests in the module are finished, the test harness can be torn down a single time. Here, I take advantage of this behavior by defining setUpModule
and tearDownModule
functions within the module containing the TestCase
classes:
# integration_test.py from unittest import TestCase, main def setUpModule(): print('* Module setup') def tearDownModule(): print('* Module clean-up') class IntegrationTest(TestCase): def setUp(self): print('* Test setup') def tearDown(self): print('* Test clean-up') def test_end_to_end1(self): print('* Test 1') def test_end_to_end2(self): print('* Test 2') if __name__ == '__main__': main() $ python3 integration_test.py * Module setup * Test setup * Test 1 * Test clean-up .* Test setup * Test 2 * Test clean-up .* Module clean-up --------------------------------------------------------------- Ran 2 tests in 0.000s OK
I can clearly see that setUpModule
is run by unittest
only once, and it happens before any setUp
methods are called. Similarly, tearDownModule
happens after the tearDown
method is called.
Things to Remember
✦ It’s important to write both unit tests (for isolated functionality) and integration tests (for modules that interact with each other).
✦ Use the setUp
and tearDown
methods to make sure your tests are isolated from each other and have a clean test environment.
✦ For integration tests, use the setUpModule
and tearDownModule
module-level functions to manage any test harnesses you need for the entire lifetime of a test module and all of the TestCase
classes that it contains.
Another common need when writing tests (see Item 76: “Verify Related Behaviors in TestCase
Subclasses”) is to use mocked functions and classes to simulate behaviors when it’s too difficult or slow to use the real thing. For example, say that I need a program to maintain the feeding schedule for animals at the zoo. Here, I define a function to query a database for all of the animals of a certain species and return when they most recently ate:
class DatabaseConnection: ... def get_animals(database, species): # Query the database ... # Return a list of (name, last_mealtime) tuples
How do I get a DatabaseConnection
instance to use for testing this function? Here, I try to create one and pass it into the function being tested:
database = DatabaseConnection('localhost', '4444') get_animals(database, 'Meerkat') >>> Traceback ... DatabaseConnectionError: Not connected
There’s no database running, so of course this fails. One solution is to actually stand up a database server and connect to it in the test. However, it’s a lot of work to fully automate starting up a database, configuring its schema, populating it with data, and so on in order to just run a simple unit test. Further, it will probably take a lot of wallclock time to set up a database server, which would slow down these unit tests and make them harder to maintain.
A better approach is to mock out the database. A mock lets you provide expected responses for dependent functions, given a set of expected calls. It’s important not to confuse mocks with fakes. A fake would provide most of the behavior of the DatabaseConnection
class but with a simpler implementation, such as a basic in-memory, single-threaded database with no persistence.
Python has the unittest.mock
built-in module for creating mocks and using them in tests. Here, I define a Mock
instance that simulates the get_animals
function without actually connecting to the database:
from datetime import datetime from unittest.mock import Mock mock = Mock(spec=get_animals) expected = [ ('Spot', datetime(2019, 6, 5, 11, 15)), ('Fluffy', datetime(2019, 6, 5, 12, 30)), ('Jojo', datetime(2019, 6, 5, 12, 45)), ] mock.return_value = expected
The Mock
class creates a mock function. The return_value
attribute of the mock is the value to return when it is called. The spec
argument indicates that the mock should act like the given object, which is a function in this case, and error if it’s used in the wrong way.
For example, here I try to treat the mock function as if it were a mock object with attributes:
mock.does_not_exist >>> Traceback ... AttributeError: Mock object has no attribute 'does_not_exist'
Once it’s created, I can call the mock, get its return value, and verify that what it returns matches expectations. I use a unique object
value as the database
argument because it won’t actually be used by the mock to do anything; all I care about is that the database
parameter was correctly plumbed through to any dependent functions that needed a DatabaseConnection
instance in order to work (see Item 55: “Use Queue
to Coordinate Work Between Threads” for another example of using sentinel object
instances):
database = object() result = mock(database, 'Meerkat') assert result == expected
This verifies that the mock responded correctly, but how do I know if the code that called the mock provided the correct arguments? For this, the Mock
class provides the assert_called_once_with
method, which verifies that a single call with exactly the given parameters was made:
mock.assert_called_once_with(database, 'Meerkat')
If I supply the wrong parameters, an exception is raised, and any TestCase
that the assertion is used in fails:
mock.assert_called_once_with(database, 'Giraffe')
>>>
Traceback ...
AssertionError: expected call not found.
Expected: mock(<object object at 0x109038790>, 'Giraffe')
Actual: mock(<object object at 0x109038790>, 'Meerkat')
If I actually don’t care about some of the individual parameters, such as exactly which database
object was used, then I can indicate that any value is okay for an argument by using the unittest.mock.ANY
constant. I can also use the assert_called_with
method of Mock
to verify that the most recent call to the mock—and there may have been multiple calls in this case—matches my expectations:
from unittest.mock import ANY mock = Mock(spec=get_animals) mock('database 1', 'Rabbit') mock('database 2', 'Bison') mock('database 3', 'Meerkat') mock.assert_called_with(ANY, 'Meerkat')
ANY
is useful in tests when a parameter is not core to the behavior that’s being tested. It’s often worth erring on the side of under-specifying tests by using ANY
more liberally instead of over-specifying tests and having to plumb through various test parameter expectations.
The Mock
class also makes it easy to mock exceptions being raised:
class MyError(Exception): pass mock = Mock(spec=get_animals) mock.side_effect = MyError('Whoops! Big problem') result = mock(database, 'Meerkat') >>> Traceback ... MyError: Whoops! Big problem
There are many more features available, so be sure to see help(unittest.mock.Mock)
for the full range of options.
Now that I’ve shown the mechanics of how a Mock
works, I can apply it to an actual testing situation to show how to use it effectively in writing unit tests. Here, I define a function to do the rounds of feeding animals at the zoo, given a set of database-interacting functions:
def get_food_period(database, species): # Query the database ... # Return a time delta def feed_animal(database, name, when): # Write to the database ... def do_rounds(database, species): now = datetime.datetime.utcnow() feeding_timedelta = get_food_period(database, species) animals = get_animals(database, species) fed = 0 for name, last_mealtime in animals: if (now - last_mealtime) > feeding_timedelta: feed_animal(database, name, now) fed += 1 return fed
The goal of my test is to verify that when do_rounds
is run, the right animals got fed, the latest feeding time was recorded to the database, and the total number of animals fed returned by the function matches the correct total. In order to do all this, I need to mock out datetime.utcnow
so my tests have a stable time that isn’t affected by daylight saving time and other ephemeral changes. I need to mock out get_food_period
and get_animals
to return values that would have come from the database. And I need to mock out feed_animal
to accept data that would have been written back to the database.
The question is: Even if I know how to create these mock functions and set expectations, how do I get the do_rounds
function that’s being tested to use the mock dependent functions instead of the real versions? One approach is to inject everything as keyword-only arguments (see Item 25: “Enforce Clarity with Keyword-Only and Positional-Only Arguments”):
def do_rounds(database, species, *, now_func=datetime.utcnow, food_func=get_food_period, animals_func=get_animals, feed_func=feed_animal): now = now_func() feeding_timedelta = food_func(database, species) animals = animals_func(database, species) fed = 0 for name, last_mealtime in animals: if (now - last_mealtime) > feeding_timedelta: feed_func(database, name, now) fed += 1 return fed
To test this function, I need to create all of the Mock
instances upfront and set their expectations:
from datetime import timedelta now_func = Mock(spec=datetime.utcnow) now_func.return_value = datetime(2019, 6, 5, 15, 45) food_func = Mock(spec=get_food_period) food_func.return_value = timedelta(hours=3) animals_func = Mock(spec=get_animals) animals_func.return_value = [ ('Spot', datetime(2019, 6, 5, 11, 15)), ('Fluffy', datetime(2019, 6, 5, 12, 30)), ('Jojo', datetime(2019, 6, 5, 12, 45)), ] feed_func = Mock(spec=feed_animal)
Then, I can run the test by passing the mocks into the do_rounds
function to override the defaults:
result = do_rounds( database, 'Meerkat', now_func=now_func, food_func=food_func, animals_func=animals_func, feed_func=feed_func) assert result == 2
Finally, I can verify that all of the calls to dependent functions matched my expectations:
from unittest.mock import call food_func.assert_called_once_with(database, 'Meerkat') animals_func.assert_called_once_with(database, 'Meerkat') feed_func.assert_has_calls( [ call(database, 'Spot', now_func.return_value), call(database, 'Fluffy', now_func.return_value), ], any_order=True)
I don’t verify the parameters to the datetime.utcnow
mock or how many times it was called because that’s indirectly verified by the return value of the function. For get_food_period
and get_animals
, I verify a single call with the specified parameters by using assert_called_once_with
. For the feed_animal
function, I verify that two calls were made—and their order didn’t matter—to write to the database using the unittest.mock.call
helper and the assert_has_calls
method.
This approach of using keyword-only arguments for injecting mocks works, but it’s quite verbose and requires changing every function you want to test. The unittest.mock.patch
family of functions makes injecting mocks easier. It temporarily reassigns an attribute of a module or class, such as the database-accessing functions that I defined above. For example, here I override get_animals
to be a mock using patch
:
from unittest.mock import patch print('Outside patch:', get_animals) with patch('__main__.get_animals'): print('Inside patch: ', get_animals) print('Outside again:', get_animals) >>> Outside patch: <function get_animals at 0x109217040> Inside patch: <MagicMock name='get_animals' id='4454622832'> Outside again: <function get_animals at 0x109217040>
patch
works for many modules, classes, and attributes. It can be used in with
statements (see Item 66: “Consider contextlib
and with
Statements for Reusable try
/finally
Behavior”), as a function decorator (see Item 26: “Define Function Decorators with functools.wraps
”), or in the setUp
and tearDown
methods of TestCase
classes (see Item 76: “Verify Related Behaviors in TestCase
Subclasses”). For the full range of options, see help(unittest.mock.patch)
.
However, patch
doesn’t work in all cases. For example, to test do_rounds
I need to mock out the current time returned by the datetime.utcnow
class method. Python won’t let me do that because the datetime
class is defined in a C-extension module, which can’t be modified in this way:
fake_now = datetime(2019, 6, 5, 15, 45) with patch('datetime.datetime.utcnow'): datetime.utcnow.return_value = fake_now >>> Traceback ... TypeError: can't set attributes of built-in/extension type ➥'datetime.datetime'
To work around this, I can create another helper function to fetch time that can be patched:
def get_do_rounds_time(): return datetime.datetime.utcnow() def do_rounds(database, species): now = get_do_rounds_time() ... with patch('__main__.get_do_rounds_time'): ...
Alternatively, I can use a keyword-only argument for the datetime.utcnow
mock and use patch
for all of the other mocks:
def do_rounds(database, species, *, utcnow=datetime.utcnow): now = utcnow() feeding_timedelta = get_food_period(database, species) animals = get_animals(database, species) fed = 0 for name, last_mealtime in animals: if (now - last_mealtime) > feeding_timedelta: feed_func(database, name, now) fed += 1 return fed
I’m going to go with the latter approach. Now, I can use the patch.multiple
function to create many mocks and set their expectations:
from unittest.mock import DEFAULT with patch.multiple('__main__', autospec=True, get_food_period=DEFAULT, get_animals=DEFAULT, feed_animal=DEFAULT): now_func = Mock(spec=datetime.utcnow) now_func.return_value = datetime(2019, 6, 5, 15, 45) get_food_period.return_value = timedelta(hours=3) get_animals.return_value = [ ('Spot', datetime(2019, 6, 5, 11, 15)), ('Fluffy', datetime(2019, 6, 5, 12, 30)), ('Jojo', datetime(2019, 6, 5, 12, 45)) ]
With the setup ready, I can run the test and verify that the calls were correct inside the with
statement that used patch.multiple
:
result = do_rounds(database, 'Meerkat', utcnow=now_func) assert result == 2 food_func.assert_called_once_with(database, 'Meerkat') animals_func.assert_called_once_with(database, 'Meerkat') feed_func.assert_has_calls( [ call(database, 'Spot', now_func.return_value), call(database, 'Fluffy', now_func.return_value), ], any_order=True)
The keyword arguments to patch.multiple
correspond to names in the __main__
module that I want to override during the test. The DEFAULT
value indicates that I want a standard Mock
instance to be created for each name. All of the generated mocks will adhere to the specification of the objects they are meant to simulate, thanks to the autospec=True
parameter.
These mocks work as expected, but it’s important to realize that it’s possible to further improve the readability of these tests and reduce boilerplate by refactoring your code to be more testable (see Item 79: “Encapsulate Dependencies to Facilitate Mocking and Testing”).
Things to Remember
✦ The unittest.mock
module provides a way to simulate the behavior of interfaces using the Mock
class. Mocks are useful in tests when it’s difficult to set up the dependencies that are required by the code that’s being tested.
✦ When using mocks, it’s important to verify both the behavior of the code being tested and how dependent functions were called by that code, using the Mock.assert_called_once_with
family of methods.
✦ Keyword-only arguments and the unittest.mock.patch
family of functions can be used to inject mocks into the code being tested.
In the previous item (see Item 78: “Use Mocks to Test Code with Complex Dependencies”), I showed how to use the facilities of the unittest.mock
built-in module—including the Mock
class and patch
family of functions—to write tests that have complex dependencies, such as a database. However, the resulting test code requires a lot of boilerplate, which could make it more difficult for new readers of the code to understand what the tests are trying to verify.
One way to improve these tests is to use a wrapper object to encapsulate the database’s interface instead of passing a DatabaseConnection
object to functions as an argument. It’s often worth refactoring your code (see Item 89: “Consider warnings
to Refactor and Migrate Usage” for one approach) to use better abstractions because it facilitates creating mocks and writing tests. Here, I redefine the various database helper functions from the previous item as methods on a class instead of as independent functions:
class ZooDatabase: ... def get_animals(self, species): ... def get_food_period(self, species): ... def feed_animal(self, name, when): ...
Now, I can redefine the do_rounds
function to call methods on a ZooDatabase
object:
from datetime import datetime def do_rounds(database, species, *, utcnow=datetime.utcnow): now = utcnow() feeding_timedelta = database.get_food_period(species) animals = database.get_animals(species) fed = 0 for name, last_mealtime in animals: if (now - last_mealtime) >= feeding_timedelta: database.feed_animal(name, now) fed += 1 return fed
Writing a test for do_rounds
is now a lot easier because I no longer need to use unittest.mock.patch
to inject the mock into the code being tested. Instead, I can create a Mock
instance to represent a ZooDatabase
and pass that in as the database
parameter. The Mock
class returns a mock object for any attribute name that is accessed. Those attributes can be called like methods, which I can then use to set expectations and verify calls. This makes it easy to mock out all of the methods of a class:
from unittest.mock import Mock database = Mock(spec=ZooDatabase) print(database.feed_animal) database.feed_animal() database.feed_animal.assert_any_call() >>> <Mock name='mock.feed_animal' id='4384773408'>
I can rewrite the Mock
setup code by using the ZooDatabase
encapsulation:
from datetime import timedelta from unittest.mock import call now_func = Mock(spec=datetime.utcnow) now_func.return_value = datetime(2019, 6, 5, 15, 45) database = Mock(spec=ZooDatabase) database.get_food_period.return_value = timedelta(hours=3) database.get_animals.return_value = [ ('Spot', datetime(2019, 6, 5, 11, 15)), ('Fluffy', datetime(2019, 6, 5, 12, 30)), ('Jojo', datetime(2019, 6, 5, 12, 55)) ]
Then I can run the function being tested and verify that all dependent methods were called as expected:
result = do_rounds(database, 'Meerkat', utcnow=now_func) assert result == 2 database.get_food_period.assert_called_once_with('Meerkat') database.get_animals.assert_called_once_with('Meerkat') database.feed_animal.assert_has_calls( [ call('Spot', now_func.return_value), call('Fluffy', now_func.return_value), ], any_order=True)
Using the spec
parameter to Mock
is especially useful when mocking classes because it ensures that the code under test doesn’t call a misspelled method name by accident. This allows you to avoid a common pitfall where the same bug is present in both the code and the unit test, masking a real error that will later reveal itself in production:
database.bad_method_name() >>> Traceback ... AttributeError: Mock object has no attribute 'bad_method_name'
If I want to test this program end-to-end with a mid-level integration test (see Item 77: “Isolate Tests from Each Other with setUp
, tearDown
, setUpModule
, and tearDownModule
”), I still need a way to inject a mock ZooDatabase
into the program. I can do this by creating a helper function that acts as a seam for dependency injection. Here, I define such a helper function that caches a ZooDatabase
in module scope (see Item 86: “Consider Module-Scoped Code to Configure Deployment Environments”) by using a global
statement:
DATABASE = None def get_database(): global DATABASE if DATABASE is None: DATABASE = ZooDatabase() return DATABASE def main(argv): database = get_database() species = argv[1] count = do_rounds(database, species) print(f'Fed {count} {species}(s)') return 0
Now, I can inject the mock ZooDatabase
using patch
, run the test, and verify the program’s output. I’m not using a mock datetime.utcnow
here; instead, I’m relying on the database records returned by the mock to be relative to the current time in order to produce similar behavior to the unit test. This approach is more flaky than mocking everything, but it also tests more surface area:
import contextlib import io from unittest.mock import patch with patch('__main__.DATABASE', spec=ZooDatabase): now = datetime.utcnow() DATABASE.get_food_period.return_value = timedelta(hours=3) DATABASE.get_animals.return_value = [ ('Spot', now - timedelta(minutes=4.5)), ('Fluffy', now - timedelta(hours=3.25)), ('Jojo', now - timedelta(hours=3)), ] fake_stdout = io.StringIO() with contextlib.redirect_stdout(fake_stdout): main(['program name', 'Meerkat']) found = fake_stdout.getvalue() expected = 'Fed 2 Meerkat(s)\n' assert found == expected
The results match my expectations. Creating this integration test was straightforward because I designed the implementation to make it easier to test.
Things to Remember
✦ When unit tests require a lot of repeated boilerplate to set up mocks, one solution may be to encapsulate the functionality of dependencies into classes that are more easily mocked.
✦ The Mock
class of the unittest.mock
built-in module simulates classes by returning a new mock, which can act as a mock method, for each attribute that is accessed.
✦ For end-to-end tests, it’s valuable to refactor your code to have more helper functions that can act as explicit seams to use for injecting mock dependencies in tests.
Everyone encounters bugs in code while developing programs. Using the print
function can help you track down the sources of many issues (see Item 75: “Use repr
Strings for Debugging Output”). Writing tests for specific cases that cause trouble is another great way to isolate problems (see Item 76: “Verify Related Behaviors in TestCase
Subclasses”).
But these tools aren’t enough to find every root cause. When you need something more powerful, it’s time to try Python’s built-in interactive debugger. The debugger lets you inspect program state, print local variables, and step through a Python program one statement at a time.
In most other programming languages, you use a debugger by specifying what line of a source file you’d like to stop on, and then execute the program. In contrast, with Python, the easiest way to use the debugger is by modifying your program to directly initiate the debugger just before you think you’ll have an issue worth investigating. This means that there is no difference between starting a Python program in order to run the debugger and starting it normally.
To initiate the debugger, all you have to do is call the breakpoint
built-in function. This is equivalent to importing the pdb
built-in module and running its set_trace
function:
# always_breakpoint.py import math def compute_rmse(observed, ideal): total_err_2 = 0 count = 0 for got, wanted in zip(observed, ideal): err_2 = (got - wanted) ** 2 breakpoint() # Start the debugger here total_err_2 += err_2 count += 1 mean_err = total_err_2 / count rmse = math.sqrt(mean_err) return rmse result = compute_rmse( [1.8, 1.7, 3.2, 6], [2, 1.5, 3, 5]) print(result)
As soon as the breakpoint
function runs, the program pauses its execution before the line of code immediately following the breakpoint
call. The terminal that started the program turns into an interactive Python shell:
$ python3 always_breakpoint.py > always_breakpoint.py(12)compute_rmse() -> total_err_2 += err_2 (Pdb)
At the (Pdb)
prompt, you can type in the names of local variables to see their values printed out (or use p <name>
). You can see a list of all local variables by calling the locals
built-in function. You can import modules, inspect global state, construct new objects, run the help
built-in function, and even modify parts of the running program—whatever you need to do to aid in your debugging.
In addition, the debugger has a variety of special commands to control and understand program execution; type help
to see the full list.
Three very useful commands make inspecting the running program easier:
where
: Print the current execution call stack. This lets you figure out where you are in your program and how you arrived at the breakpoint
trigger.
up
: Move your scope up the execution call stack to the caller of the current function. This allows you to inspect the local variables in higher levels of the program that led to the breakpoint.
down
: Move your scope back down the execution call stack one level.
When you’re done inspecting the current state, you can use these five debugger commands to control the program’s execution in different ways:
step
: Run the program until the next line of execution in the program, and then return control back to the debugger prompt. If the next line of execution includes calling a function, the debugger stops within the function that was called.
next
: Run the program until the next line of execution in the current function, and then return control back to the debugger prompt. If the next line of execution includes calling a function, the debugger will not stop until the called function has returned.
return
: Run the program until the current function returns, and then return control back to the debugger prompt.
continue
: Continue running the program until the next breakpoint is hit (either through the breakpoint
call or one added by a debugger command).
quit
: Exit the debugger and end the program. Run this command if you’ve found the problem, gone too far, or need to make program modifications and try again.
The breakpoint
function can be called anywhere in a program. If you know that the problem you’re trying to debug happens only under special circumstances, then you can just write plain old Python code to call breakpoint
after a specific condition is met. For example, here I start the debugger only if the squared error for a datapoint is more than 1
:
# conditional_breakpoint.py def compute_rmse(observed, ideal): ... for got, wanted in zip(observed, ideal): err_2 = (got - wanted) ** 2 if err_2 >= 1: # Start the debugger if True breakpoint() total_err_2 += err_2 count += 1 ... result = compute_rmse( [1.8, 1.7, 3.2, 7], [2, 1.5, 3, 5]) print(result)
When I run the program and it enters the debugger, I can confirm that the condition was true by inspecting local variables:
$ python3 conditional_breakpoint.py > conditional_breakpoint.py(14)compute_rmse() -> total_err_2 += err_2 (Pdb) wanted 5 (Pdb) got 7 (Pdb) err_2 4
Another useful way to reach the debugger prompt is by using post-mortem debugging. This enables you to debug a program after it’s already raised an exception and crashed. This is especially helpful when you’re not quite sure where to put the breakpoint
function call.
Here, I have a script that will crash due to the 7j
complex number being present in one of the function’s arguments:
# postmortem_breakpoint.py import math def compute_rmse(observed, ideal): ... result = compute_rmse( [1.8, 1.7, 3.2, 7j], # Bad input [2, 1.5, 3, 5]) print(result)
I use the command line python3 -m pdb -c continue <program path>
to run the program under control of the pdb
module. The continue
command tells pdb
to get the program started immediately. Once it’s running, the program hits a problem and automatically enters the interactive debugger, at which point I can inspect the program state:
$ python3 -m pdb -c continue postmortem_breakpoint.py Traceback (most recent call last): File ".../pdb.py", line 1697, in main pdb._runscript(mainpyfile) File ".../pdb.py", line 1566, in _runscript self.run(statement) File ".../bdb.py", line 585, in run exec(cmd, globals, locals) File "<string>", line 1, in <module> File "postmortem_breakpoint.py", line 4, in <module> import math File "postmortem_breakpoint.py", line 16, in compute_rmse rmse = math.sqrt(mean_err) TypeError: can't convert complex to float Uncaught exception. Entering post mortem debugging Running 'cont' or 'step' will restart the program > postmortem_breakpoint.py(16)compute_rmse() -> rmse = math.sqrt(mean_err) (Pdb) mean_err (-5.97-17.5j)
You can also use post-mortem debugging after hitting an uncaught exception in the interactive Python interpreter by calling the pm
function of the pdb
module (which is often done in a single line as import pdb; pdb.pm()
):
$ python3 >>> import my_module >>> my_module.compute_stddev([5]) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "my_module.py", line 17, in compute_stddev variance = compute_variance(data) File "my_module.py", line 13, in compute_variance variance = err_2_sum / (len(data) - 1) ZeroDivisionError: float division by zero >>> import pdb; pdb.pm() > my_module.py(13)compute_variance() -> variance = err_2_sum / (len(data) - 1) (Pdb) err_2_sum 0.0 (Pdb) len(data) 1
Things to Remember
✦ You can initiate the Python interactive debugger at a point of interest directly in your program by calling the breakpoint
built-in function.
✦ The Python debugger prompt is a full Python shell that lets you inspect and modify the state of a running program.
✦ pdb
shell commands let you precisely control program execution and allow you to alternate between inspecting program state and progressing program execution.
✦ The pdb
module can be used for debug exceptions after they happen in independent Python programs (using python -m pdb -c continue <program path>
) or the interactive Python interpreter (using import pdb; pdb.pm()
).
tracemalloc
to Understand Memory Usage and LeaksMemory management in the default implementation of Python, CPython, uses reference counting. This ensures that as soon as all references to an object have expired, the referenced object is also cleared from memory, freeing up that space for other data. CPython also has a built-in cycle detector to ensure that self-referencing objects are eventually garbage collected.
In theory, this means that most Python programmers don’t have to worry about allocating or deallocating memory in their programs. It’s taken care of automatically by the language and the CPython runtime. However, in practice, programs eventually do run out of memory due to no longer useful references still being held. Figuring out where a Python program is using or leaking memory proves to be a challenge.
The first way to debug memory usage is to ask the gc
built-in module to list every object currently known by the garbage collector. Although it’s quite a blunt tool, this approach lets you quickly get a sense of where your program’s memory is being used.
Here, I define a module that fills up memory by keeping references:
# waste_memory.py import os class MyObject: def __init__(self): self.data = os.urandom(100) def get_data(): values = [] for _ in range(100): obj = MyObject() values.append(obj) return values def run(): deep_values = [] for _ in range(100): deep_values.append(get_data()) return deep_values
Then, I run a program that uses the gc
built-in module to print out how many objects were created during execution, along with a small sample of allocated objects:
# using_gc.py import gc found_objects = gc.get_objects() print('Before:', len(found_objects)) import waste_memory hold_reference = waste_memory.run() found_objects = gc.get_objects() print('After: ', len(found_objects)) for obj in found_objects[:3]: print(repr(obj)[:100]) >>> Before: 6207 After: 16801 <waste_memory.MyObject object at 0x10390aeb8> <waste_memory.MyObject object at 0x10390aef0> <waste_memory.MyObject object at 0x10390af28> ...
The problem with gc.get_objects
is that it doesn’t tell you anything about how the objects were allocated. In complicated programs, objects of a specific class could be allocated many different ways. Knowing the overall number of objects isn’t nearly as important as identifying the code responsible for allocating the objects that are leaking memory.
Python 3.4 introduced a new tracemalloc
built-in module for solving this problem. tracemalloc
makes it possible to connect an object back to where it was allocated. You use it by taking before and after snapshots of memory usage and comparing them to see what’s changed. Here, I use this approach to print out the top three memory usage offenders in a program:
# top_n.py import tracemalloc tracemalloc.start(10) # Set stack depth time1 = tracemalloc.take_snapshot() # Before snapshot import waste_memory x = waste_memory.run() # Usage to debug time2 = tracemalloc.take_snapshot() # After snapshot stats = time2.compare_to(time1, 'lineno') # Compare snapshots for stat in stats[:3]: print(stat) >>> waste_memory.py:5: size=2392 KiB (+2392 KiB), count=29994 ➥(+29994), average=82 B waste_memory.py:10: size=547 KiB (+547 KiB), count=10001 ➥(+10001), average=56 B waste_memory.py:11: size=82.8 KiB (+82.8 KiB), count=100 ➥(+100), average=848 B
The size and count labels in the output make it immediately clear which objects are dominating my program’s memory usage and where in the source code they were allocated.
The tracemalloc
module can also print out the full stack trace of each allocation (up to the number of frames passed to the tracemalloc.start
function). Here, I print out the stack trace of the biggest source of memory usage in the program:
# with_trace.py import tracemalloc tracemalloc.start(10) time1 = tracemalloc.take_snapshot() import waste_memory x = waste_memory.run() time2 = tracemalloc.take_snapshot() stats = time2.compare_to(time1, 'traceback') top = stats[0] print('Biggest offender is:') print('\n'.join(top.traceback.format())) >>> Biggest offender is: File "with_trace.py", line 9 x = waste_memory.run() File "waste_memory.py", line 17 deep_values.append(get_data()) File "waste_memory.py", line 10 obj = MyObject() File "waste_memory.py", line 5 self.data = os.urandom(100)
A stack trace like this is most valuable for figuring out which particular usage of a common function or class is responsible for memory consumption in a program.
Things to Remember
✦ It can be difficult to understand how Python programs use and leak memory.
✦ The gc
module can help you understand which objects exist, but it has no information about how they were allocated.
✦ The tracemalloc
built-in module provides powerful tools for understanding the sources of memory usage.