Befriending the itertools module for efficient iteration in Python
Prerequisites
You will need a working knowledge of Python to get the best out of this article.
Each code sample provided is a complete Python program that you can run by pasting its contents into a Python file and running python name_of_your_file.py in your terminal.
For each of the provided code examples, there is an accompanying link to a repl on replit where you can run the code. However, you will need a replit account. You can sign up here
Exploring itertools
The itertools module has various utilities that we can use to perform efficient operations on iterables. Some allow you to combine elements from different iterables, some allow you to group items in an iterable using different keys, some allow you to create iterators that run infinitely but efficiently. Getting familiar with these utilities will save you from reimplementing similar functionality. If you find yourself working with large lists, datasets, or just want to indulge your curiosity, the itertools module is definitely worth a look.
I call them utilities because although they quack and walk like functions, they are actually implemented as classes, except tee, which is a built-in function.
Let's use the inspect module to get a high-level overview of the itertools module.
We ignore names beginning with _ since these are generally not meant for public use.
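Here is a minimal sketch of such an overview, assuming all we want is each public name and whether it is a class or a function. The helper name public_names is made up for this illustration.

```python
import inspect
import itertools


def public_names(module):
    """Return the module's member names that do not start with an underscore."""
    return [name for name, _ in inspect.getmembers(module) if not name.startswith("_")]


names = public_names(itertools)
for name in names:
    member = getattr(itertools, name)
    kind = "class" if inspect.isclass(member) else "function"
    print(f"{name}: {kind}")
```

The exact list of names depends on your Python version, since newer releases add utilities such as pairwise and batched.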
You can run this example on replit.
You should see the following:
itertools.accumulate
accumulate(iterable, func=None, *, initial=None)
Makes an iterator that yields the results of applying the provided function cumulatively to the items of the iterable, from left to right. The function accepts two arguments: the accumulated total and the next value from the iterable. If func is not provided, accumulate returns an iterator that yields cumulative sums of the elements of the iterable. For example, accumulate([1, 2, 3]) would return an iterator that yields 1, 3, 6, the results of (1), (1 + 2), (1 + 2 + 3).
If initial is provided, it becomes the first value in the output iterator and provides a starting point for the accumulation.
Let's look at this example that tracks daily temperatures and records the maximum and minimum temperatures encountered so far.
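A minimal sketch of the idea, using made-up temperature readings. The built-in max and min both accept two arguments, so they can be passed to accumulate directly.

```python
from itertools import accumulate

# Hypothetical daily temperatures in degrees Celsius.
temperatures = [22, 25, 19, 28, 24]

# Each output item is the max (or min) of everything seen so far.
highest_so_far = list(accumulate(temperatures, max))
lowest_so_far = list(accumulate(temperatures, min))

for day, (temp, high, low) in enumerate(
    zip(temperatures, highest_so_far, lowest_so_far), start=1
):
    print(f"Day {day}: temp={temp}, max so far={high}, min so far={low}")

# initial becomes the first value of the output.
print(list(accumulate([1, 2, 3], initial=10)))  # [10, 11, 13, 16]
```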
You can run this example on replit.
You should see this output:
itertools.count
count(start=0, step=1)
Makes an iterator that yields numbers indefinitely, starting at start and incrementing by step. If start is not provided, counting begins at 0. If step is not provided, the increment is 1.
SQL databases such as MySQL use the AUTO_INCREMENT keyword to create auto-incrementing ids, starting at 1 by default, although you can provide a different starting id. Here is how we can leverage itertools.count to implement SQL-like auto-increment functionality.
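One way this could look, as a rough sketch: a toy in-memory table whose insert method pulls the next id from a count iterator. The Table class and its fields are invented for this illustration.

```python
from itertools import count


class Table:
    """A toy in-memory table whose rows get an auto-incrementing id."""

    def __init__(self, start=1):
        self._ids = count(start)  # endless id generator
        self.rows = []

    def insert(self, **fields):
        row = {"id": next(self._ids), **fields}
        self.rows.append(row)
        return row


users = Table()
users.insert(name="Ada")
users.insert(name="Grace")

orders = Table(start=1000)  # custom starting id
orders.insert(item="keyboard")

print(users.rows)
print(orders.rows)
```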
You can run the code above here
We can see that the auto-increment is working.
itertools.cycle
cycle(iterable)
Makes an iterator that yields elements from an iterable, one by one, until they are exhausted, then starts over from the first element. This happens indefinitely. It is useful for situations where you want to "cycle" through the same elements over and over again.
Consider the traffic light simulator below. Notice how color goes from Red to Yellow to Green and back to Red over and over.
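A minimal sketch of such a simulator. Since cycle never stops on its own, this version uses islice to cap the demo at seven light changes; that cap is an assumption made so the program terminates.

```python
from itertools import cycle, islice

colors = cycle(["Red", "Yellow", "Green"])

# Take only the first seven changes from the endless cycle.
sequence = list(islice(colors, 7))
for color in sequence:
    print(color)
```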
You can run this example on replit.
itertools.repeat
repeat(object, times)
Makes an iterator that returns object repeatedly, up to times times. If times is not provided, it yields object indefinitely.
We can use itertools.repeat together with the csv module to initialize a CSV template with 1000 rows:
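A small sketch of the idea. The column names are placeholders, and the template is written to an in-memory buffer rather than a file so the example has no side effects; swap in open("template.csv", "w", newline="") to write to disk.

```python
import csv
import io
from itertools import repeat

header = ["name", "email", "phone"]  # hypothetical columns
empty_row = [""] * len(header)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
# repeat yields the same empty row 1000 times without building a list of rows.
writer.writerows(repeat(empty_row, 1000))

lines = buffer.getvalue().splitlines()
print(len(lines))  # header + 1000 empty rows
```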
You can run this example on replit.
itertools.islice
islice(iterable, stop)
islice(iterable, start, stop)
islice(iterable, start, stop, step)
Works similarly to good ol' slicing but returns an iterator, whereas slicing returns a list, a tuple, or whatever type you are slicing.
Let's create a list of numbers from 0 to 100:
To get multiples of 10 between 10 and 50 (including 10 & 50), we can create the following slice:
This is the same as writing
Notice how slicing a list returns a list. This can be expensive memory-wise if we are creating large slices. How can we create a slice without putting the whole thing in memory? Enter itertools.islice. We can create an iterator that yields the same elements, but lazily, one at a time.
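The steps above can be sketched as follows. The names numbers, multiples and multiples_iter are assumptions for this illustration, and the sizes printed by sys.getsizeof depend on your Python build, so no specific numbers are claimed here.

```python
import sys
from itertools import islice

numbers = list(range(101))  # 0 to 100

# Regular slicing builds a new list.
multiples = numbers[10:51:10]

# islice builds a lazy iterator over the same positions.
multiples_iter = islice(numbers, 10, 51, 10)

print(multiples)
print("list slice size:", sys.getsizeof(multiples))
print("islice size:", sys.getsizeof(multiples_iter))

# Consuming the iterator gives the same elements as the slice.
from_iter = list(multiples_iter)
print(from_iter)
```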
Both multiples and multiples_iter can be iterated over to get the same elements. However, multiples_iter takes up less memory than multiples. Let's see:
Full code for reference:
I get the output below when I run the comparison:
You can run this example on replit.
The difference here might not be life-changing, but over larger slices it can be quite significant: the size of the list grows with the number of elements, while the islice object stays the same size.
A few differences between slicing and islice:
- islice does not support negative values for start, stop, or step.
- Iterators returned by islice are not subscriptable. For example, we cannot do multiples_iter[0].
itertools.filterfalse
filterfalse(predicate, iterable)
Makes an iterator that "picks" those items for which predicate returns False or a falsy value. predicate can be None or a function that takes an element from the iterable and returns a boolean value. If predicate is None, filterfalse returns the items that are themselves False or falsy.
Consider this example where we want to filter odd numbers from a list of integers between 1 and 10.
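A minimal sketch, assuming the goal is to keep the odd numbers: we pass a predicate that is True for even numbers, and filterfalse yields everything the predicate rejects. The helper is_even is made up for this example.

```python
from itertools import filterfalse


def is_even(number):
    return number % 2 == 0


numbers = range(1, 11)

# filterfalse keeps the items for which is_even returns False.
odds = list(filterfalse(is_even, numbers))
print(odds)  # [1, 3, 5, 7, 9]

# With predicate=None, the falsy items themselves are returned.
falsy = list(filterfalse(None, [0, 1, "", "a", None]))
print(falsy)  # [0, '', None]
```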
You can run this example on replit. Running this gives the following output:
itertools.chain
chain(*iterables)
Creates an iterator that returns elements from the first iterable, then elements from the next iterable, and so on. Useful for iterating over multiple iterables as one iterable without creating a new list.
Let's see an example where we combine multiple shopping lists into a cart.
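A short sketch with made-up shopping lists. The list names and items are placeholders.

```python
from itertools import chain

groceries = ["milk", "eggs"]
electronics = ["usb cable"]
stationery = ["notebook", "pens"]

# chain walks through each list in turn without concatenating them first.
cart = chain(groceries, electronics, stationery)

cart_items = list(cart)
for item in cart_items:
    print(item)
```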
You can run this example on replit. You should see the output below:
itertools.batched
batched(iterable, n)
Introduced in Python 3.12, batched splits the iterable into "batches" (tuples) of size n, without loading the entire iterable into memory. The last batch may be shorter than n.
Imagine you have a service where people upload image files and you want to compress them in batches of size n.
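A rough sketch of that scenario. The file names are invented and compress_batch is only a placeholder that pretends to compress files. The fallback definition of batched is an assumption added so the example also runs on Python versions older than 3.12.

```python
try:
    from itertools import batched  # Python 3.12+
except ImportError:
    from itertools import islice

    def batched(iterable, n):
        """Rough stand-in for itertools.batched on older Pythons."""
        iterator = iter(iterable)
        while batch := tuple(islice(iterator, n)):
            yield batch


def compress_batch(filenames):
    """Placeholder for real compression work; it only renames here."""
    return [name + ".gz" for name in filenames]


uploads = [f"image_{i}.png" for i in range(1, 8)]  # 7 hypothetical uploads

batches = list(batched(uploads, 3))
for batch in batches:
    print(compress_batch(batch))
```

With seven uploads and n set to 3, the last batch holds a single file.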
replit does not explicitly support Python 3.12 so you will have to try this example locally.
batched can also be used to batch requests to external APIs that implement rate limiting.
itertools.chain.from_iterable
chain.from_iterable(iterable)
This is useful for flattening nested sequences by one level. The iterable here can be a list of lists, a list of tuples, or any other iterable of iterables.
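A minimal sketch with made-up data. Note that only one level of nesting is removed.

```python
from itertools import chain

nested = [[1, 2], (3, 4), [5]]

flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]

# Deeper nesting is left untouched.
deeper = list(chain.from_iterable([[1, [2, 3]], [4]]))
print(deeper)  # [1, [2, 3], 4]
```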
You can run this code here. The output will be as below:
itertools.compress
compress(data, selectors)
Makes an iterator that returns the items in data whose corresponding element in selectors is True or truthy.
compress stops as soon as either data or selectors is exhausted. If selectors is shorter than data, compress only processes the items in data up to the length of selectors.
Let's say we have two lists. We can iterate over and print the names of registered voters as follows:
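A small sketch, assuming one list of names and a parallel list of flags that mark who is registered. Both lists are made up.

```python
from itertools import compress

names = ["Amina", "Brian", "Chen", "Dede"]
is_registered = [True, False, True, False]

registered_voters = list(compress(names, is_registered))
for name in registered_voters:
    print(name)

# A shorter selectors list cuts processing short.
only_first = list(compress(names, [1]))
print(only_first)  # ['Amina']
```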
You can test this here. The expected output is as follows:
itertools.dropwhile
dropwhile(predicate, iterable)
Works by skipping items at the beginning of an iterable for as long as the predicate function returns True. Once it encounters an element for which the predicate returns False, it yields that item and all subsequent items, regardless of what the predicate returns for them.
Imagine a mixed weather Formula 1 race; starts out raining but rain stops at some point and the rest of the race is dry. The drivers pit for slicks (dry weather tires) and their lap times start improving significantly. Let's assume that a dry lap time is expected to be 105 seconds (1 min 45 seconds) or less.
We can use dropwhile to skip lap times higher than 105, effectively finding the start of the dry-weather laps. Once a lap time of 105 or lower is found, that lap and all subsequent laps are included, even if a driver makes a mistake afterwards and sets a lap time higher than 105.
The code below makes use of dropwhile in the detect_transition function to find the transition point and filter for dry laps from the provided lap times. As examples, I used my top three favorite drivers on the 2024 grid 😉.
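A minimal sketch of what detect_transition could look like. The lap times are invented, and a single anonymous driver stands in for the three mentioned above.

```python
from itertools import dropwhile

DRY_LAP_THRESHOLD = 105  # seconds


def detect_transition(lap_times):
    """Drop the leading wet laps and return everything from the first dry lap on."""
    return list(dropwhile(lambda lap: lap > DRY_LAP_THRESHOLD, lap_times))


# Hypothetical lap times in seconds: wet laps first, then dry ones.
lap_times = [118.4, 116.9, 112.3, 104.8, 103.9, 107.2, 103.1]

dry_laps = detect_transition(lap_times)
print(dry_laps)  # [104.8, 103.9, 107.2, 103.1]
```

The 107.2 lap stays in the result because dropwhile stops checking the predicate after the first lap that fails it.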
Here is the associated repl for this example. Fork it, run it and you should see the output below:
itertools.groupby
groupby(iterable, key=None)
Accepts an iterable and a key function and makes an iterator that groups consecutive elements based on the key function. It's like creating subgroups within the iterable where each subgroup contains elements that are considered equal according to the key function. If a key function is not provided, consecutive identical elements are grouped together. If the same element appears later but not consecutively, it starts a new group. This is why it is important to sort the data first, using the same key, if you want all equal elements grouped together.
Scenario 1: Grouping when key is None
See this repl. This code creates 5 groups as seen in the output below:
Notice how we have two groups of A. However, if we sort the data first, similar letters will be next to each other and thus grouped together.
If we run the code again, we can see that all similar characters are now grouped together. Test it out on this repl.
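A small sketch of both runs, using a made-up string of letters chosen so that the unsorted run yields five groups, two of them for A.

```python
from itertools import groupby

letters = "AABBCAD"

# Without sorting, the second run of "A" forms its own group.
unsorted_groups = [(key, len(list(group))) for key, group in groupby(letters)]
print(unsorted_groups)

# Sorting first puts equal letters next to each other.
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(letters))]
print(sorted_groups)
```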
Scenario 2: Grouping using a key function
Let's say you have a list of cars and would like to group them by the make.
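A minimal sketch, assuming each car is a (year, make, model) tuple so that itemgetter(1) picks the make. The cars themselves are made up.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical cars as (year, make, model) tuples.
cars = [
    (2019, "Toyota", "Corolla"),
    (2021, "Honda", "Civic"),
    (2020, "Toyota", "Camry"),
    (2019, "Honda", "Accord"),
]

by_make_key = itemgetter(1)

# Sort by the same key we group by, so each make forms a single group.
cars_by_make = {
    make: [model for _, _, model in group]
    for make, group in groupby(sorted(cars, key=by_make_key), key=by_make_key)
}

for make, models in cars_by_make.items():
    print(make, models)
```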
Running the code above, we see that our cars have indeed been grouped by make. You can confirm by forking and running the code in this repl.
If you would like to group by year, just change the sort key and groupby key from itemgetter(1) to itemgetter(0).
itertools.pairwise
pairwise(iterable)
Makes an iterator that returns successive overlapping pairs from an iterable. Can be useful for time series data analysis.
Consider a scenario where you have the stock price for a hypothetical company over a 5-day period and you would like to analyze the day-to-day price movements.
You can try out this example by forking this repl. Run the code and you should see the output below:
Note that pairwise is only available in Python 3.10 and up.
itertools.starmap
starmap(function, iterable)
Makes an iterator that calls function using arguments obtained from the iterable. starmap unpacks each item of the iterable and passes its elements to the provided function as separate arguments. So if you provide an iterable that looks like [("Alice", 28, "New York")], the function you provide will be called with the arguments "Alice", 28, "New York".
Imagine you run a newsletter service and you have a list of names and email addresses represented as a list of tuples. Each tuple contains the name and email address of one subscriber. We can generate emails for each like so:
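A minimal sketch with made-up subscribers. The function compose_email and the message text are placeholders.

```python
from itertools import starmap

# Hypothetical subscribers as (name, email address) tuples.
subscribers = [
    ("Alice", "alice@example.com"),
    ("Bob", "bob@example.com"),
]


def compose_email(name, address):
    return f"To: {address}\nHi {name}, here is this week's newsletter."


# Each tuple is unpacked into compose_email(name, address).
emails = starmap(compose_email, subscribers)

first_email = next(emails)
print(first_email)
```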
We call next to get the next item from emails. You can also loop over emails using a for loop.
You can view, fork and run this example here.
Running the code above will produce the output below:
You might notice that starmap's signature is somewhat similar to that of the built-in map function. Both accept a function as the first argument, followed by an iterable. However, there are two major differences:
- map accepts one or more iterables, whereas starmap accepts exactly one.
- map does not unpack arguments, whereas starmap unpacks each item into separate arguments.
itertools.takewhile
takewhile(predicate, iterable)
Accepts a predicate function and an iterable and makes an iterator that yields elements from the iterable as long as the predicate function returns True. It stops as soon as the predicate function returns False for an element.
I picked some logs from the ruff linter I use in my editor. The example below uses takewhile to print logs before 30 August 20:51:49.
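A rough sketch of the idea. The log lines below are invented stand-ins, not real ruff output, and the year and timestamp format are assumptions.

```python
from datetime import datetime
from itertools import takewhile

# Made-up log lines; the year and messages are placeholders.
logs = [
    "2024-08-30 20:51:42 INFO server started",
    "2024-08-30 20:51:45 INFO workspace loaded",
    "2024-08-30 20:51:49 WARN config changed",
    "2024-08-30 20:51:47 INFO out-of-order entry",
]

cutoff = datetime(2024, 8, 30, 20, 51, 49)


def is_before_cutoff(line):
    # The first 19 characters hold the timestamp.
    timestamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    return timestamp < cutoff


early_logs = list(takewhile(is_before_cutoff, logs))
for line in early_logs:
    print(line)
```

The last entry is earlier than the cutoff but is still left out, because takewhile stops for good at the first line that fails the predicate.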
Running the code above will lead to the output below. You can run this example here.
itertools.tee
tee(iterable, n=2)
Accepts an iterable and creates n independent iterators out of it. tee allows you to "split" an iterable into several "copies" that can be consumed independently. Once a tee is created, the original iterable should not be used anywhere else, otherwise it could advance without the tee iterators being informed. Also note that tee iterators are not thread-safe, and that tee may use significant memory when one iterator runs far ahead of the others, since the items in between have to be buffered.
Here is an example where we split an iterable and use one iterator for computing squares and another for computing logarithms to base 10. The .2f in the f-string rounds the logs to 2 decimal places.
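A minimal sketch with made-up input numbers.

```python
import math
from itertools import tee

numbers = iter([1, 10, 100, 1000])

# After this point we only use the two tee iterators, never numbers itself.
for_squares, for_logs = tee(numbers)

squares = [n * n for n in for_squares]
logs = [f"{math.log10(n):.2f}" for n in for_logs]

print(squares)
print(logs)
```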
You can test this example by forking this repl here.
The expected output is as follows:
itertools.zip_longest
zip_longest(*iterables, fillvalue=None)
Makes an iterator that combines elements from multiple lists or other iterables. However, unlike the regular zip function, zip_longest doesn't stop when the shortest iterable is exhausted; it keeps going until the longest one is.
fillvalue is used to "fill in" the missing values once a shorter iterable is out of items. Think of it like a default value. If fillvalue is not provided, it is set to None.
Let's have a look at an example that uses zip_longest to combine incomplete sales data from different branches of a supermarket.
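A small sketch with invented sales figures, assuming a missing day should count as 0. Note that the keyword argument is spelled fillvalue.

```python
from itertools import zip_longest

# Hypothetical daily sales per branch; some branches reported fewer days.
branch_a = [120, 95, 130]
branch_b = [80, 110]
branch_c = [60]

daily_sales = list(zip_longest(branch_a, branch_b, branch_c, fillvalue=0))

for day, sales in enumerate(daily_sales, start=1):
    print(f"Day {day}: {sales} total={sum(sales)}")
```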
You can test this example by forking and running this repl. The expected output will be:
itertools.product
product(*iterables, repeat=1)
Makes an iterator that returns the Cartesian product of the input iterables. Imagine you have two or more lists, and you want to combine every item from the first list with every item from the second list (and so on if you have more lists).
For example, product([1, 2], [3, 4, 5]) will create an iterator that yields the items (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5).
Consider the code below that generates all possible combinations of T-shirt sizes and colors.
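A minimal sketch with placeholder sizes and colors.

```python
from itertools import product

sizes = ["S", "M", "L"]
colors = ["red", "blue"]

variants = list(product(sizes, colors))
for size, color in variants:
    print(f"{size} - {color}")
```

Three sizes and two colors give six variants, with the last iterable changing fastest.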
You can fork this repl and run the example to get the output below:
itertools.combinations
combinations(iterable, r)
Makes an iterator that generates all possible combinations of length r from an iterable.
Order doesn't matter, and a combination cannot have repeated elements.
itertools.combinations_with_replacement
combinations_with_replacement(iterable, r)
Makes an iterator that yields combinations of length r from iterable, but a combination can have repeated elements.
Order doesn't matter.
itertools.permutations
permutations(iterable, r=None)
Makes an iterator that yields all possible arrangements of length r from an iterable. If r is not provided, it defaults to the length of the iterable. Order matters, and elements are not repeated.
Let's look at an example that compares combinations, combinations_with_replacement and permutations.
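A short sketch of such a comparison, using three placeholder letters and r set to 2.

```python
from itertools import combinations, combinations_with_replacement, permutations

letters = ["A", "B", "C"]

combos = ["".join(pair) for pair in combinations(letters, 2)]
combos_with_repeats = ["".join(pair) for pair in combinations_with_replacement(letters, 2)]
arrangements = ["".join(pair) for pair in permutations(letters, 2)]

print("combinations:", combos)
print("combinations_with_replacement:", combos_with_repeats)
print("permutations:", arrangements)
```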
We should get the output below:
Apologies, I ran out of repls. You'll have to run this one locally.
Conclusion
itertools is an amazing module. I hope you have learnt something.
P.S. If you find any typos, just pretend they're artisanal imperfections. Thanks for reading. You can share feedback with me at kiuraalex@gmail.com