7. Concurrency and Parallelism

Concurrency enables a computer to do many different things seemingly at the same time. For example, on a computer with one CPU core, the operating system rapidly changes which program is running on the single processor. In doing so, it interleaves execution of the programs, providing the illusion that the programs are running simultaneously.

Parallelism, in contrast, involves actually doing many different things at the same time. A computer with multiple CPU cores can execute multiple programs simultaneously. Each CPU core runs the instructions of a separate program, allowing each program to make forward progress during the same instant.

Within a single program, concurrency is a tool that makes it easier for programmers to solve certain types of problems. Concurrent programs enable many distinct paths of execution, including separate streams of I/O, to make forward progress in a way that seems to be both simultaneous and independent.

The key difference between parallelism and concurrency is speedup. When two distinct paths of execution in a program make forward progress in parallel, the time it takes to do the total work can be cut in half, making execution faster by up to a factor of two. In contrast, concurrent programs may run thousands of separate paths of execution seemingly in parallel but provide no speedup for the total work.

Python makes it easy to write concurrent programs in a variety of styles. Threads support a relatively small amount of concurrency, while coroutines enable vast numbers of concurrent functions. Python can also be used to do parallel work through system calls, subprocesses, and C extensions. But it can be very difficult to make concurrent Python code truly run in parallel. It’s important to understand how to best utilize Python in these different situations.
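As a rough illustration of how many concurrent functions coroutines can support, the sketch below runs a thousand of them on a single thread with `asyncio`. The names `tick` and `run_many` are hypothetical, chosen only for this example, and the work each coroutine does is a placeholder:

```python
import asyncio


async def tick(index):
    # Yield control to the event loop once, then hand back the index.
    await asyncio.sleep(0)
    return index


async def run_many(count):
    # Schedule every coroutine at once; they interleave on one thread.
    return await asyncio.gather(*(tick(i) for i in range(count)))


results = asyncio.run(run_many(1000))
print(len(results))
```

All of the coroutines make progress concurrently, but only one runs at any instant, so this arrangement by itself provides no parallel speedup.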

Item 52: Use `subprocess` to Manage Child Processes

Python has battle-hardened libraries for running and managing child processes. This makes it a great language for gluing together other tools, such as command-line utilities. When existing shell scripts get complicated, as they often do over time, graduating them to a rewrite in Python for the sake of readability and maintainability is a natural choice.

Child processes started by Python are able to run in parallel, enabling you to use Python to consume all of the CPU cores of a machine and maximize the throughput of programs. Although Python itself may be CPU bound (see Item 53: “Use Threads for Blocking I/O, Avoid for Parallelism”), it’s easy to use Python to drive and coordinate CPU-intensive workloads.
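One way this might look is sketched below with `subprocess.Popen`, which starts a child without waiting for it to finish. The child command here is a stand-in (a short sleep run by the current interpreter through `sys.executable`) rather than a real CPU-intensive workload, and the count of four children is arbitrary:

```python
import subprocess
import sys

# Hypothetical workload: each child sleeps briefly and then exits.
child_code = 'import time; time.sleep(0.1)'

# Popen returns immediately, so all children are running at the same time.
procs = [
    subprocess.Popen([sys.executable, '-c', child_code])
    for _ in range(4)
]

# Wait for every child to finish and collect the exit codes.
exit_codes = [proc.wait() for proc in procs]
print(exit_codes)
```

Because the children are separate operating system processes, the parent Python program only coordinates them; the operating system is free to schedule each one on its own CPU core.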

Python has many ways to run subprocesses (e.g., `os.popen`, `os.exec*`), but the best choice for managing child processes is to use the `subprocess` built-in module. Running a child process with `subprocess` is simple. Here, I use the module’s `run` convenience function to start a process, read its output, and verify that it terminated cleanly:
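A minimal sketch of such a call follows. It assumes Python 3.7 or later (for the `capture_output` parameter) and uses the current interpreter, through `sys.executable`, as a stand-in child command so that it runs on any platform:

```python
import subprocess
import sys

# Run a child Python interpreter that prints one line (stand-in command).
result = subprocess.run(
    [sys.executable, '-c', 'print("Hello from the child!")'],
    capture_output=True,
    encoding='utf-8')

result.check_returncode()  # Raises CalledProcessError on a nonzero exit
print(result.stdout)
```

If the child exits with a nonzero status, `check_returncode` raises `subprocess.CalledProcessError`; otherwise the captured output is available as a string in `result.stdout`.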

