Programming with futures: patterns and anti-patterns
Twitter’s Future library is a beautiful abstraction for dealing with concurrency. However, there are code patterns that seem natural or innocuous but can cause real trouble in production systems. This short article outlines a few of the easiest traps to fall into.
An example
Below is a method from a fictional web application that registers a user by calling the Foursquare API to get the user’s profile info, their friend graph and their recent check-ins.
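In outline, it looks something like the sketch below (the names here, such as api, apiFriendsF and createDBUser, are illustrative):

```scala
import com.twitter.util.Future

// A sketch of such a registration method (names are illustrative).
def registerUser(token: String): Future[DBUser] =
  for {
    self       <- api.getSelfF(token)         // profile info
    categories <- api.getCategoriesF()        // venue categories
    friendIds  <- api.getFriendIdsF(self.id)  // friend graph
    friends    <- apiFriendsF(friendIds)      // one call per friend
    checkins   <- apiCheckinsF(self.id)       // recent check-ins
  } yield createDBUser(self, categories, friends, checkins)  // blocking DB write!
```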
There are some problems with this code.
Anti-pattern #1: Blocking in a yield or a map
The last part of the for-comprehension desugars to:
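Roughly, using the illustrative names from the sketch above:

```scala
apiCheckinsF(self.id).map { checkins =>
  createDBUser(self, categories, friends, checkins)  // blocking work inside map
}
```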
The problem here is that createDBUser makes a blocking call to the database. You should never do blocking work in a map on a Future.
Every Future runs in a thread pool that is (hopefully) tuned for a particular purpose.
Code inside the map (generally) runs on the thread that completes the Future. So you're putting work on a thread pool that wasn't designed to handle that work.
Furthermore, when you're dealing with Futures composed from other Futures, it's often hard to tell by inspection which Future will be the last to complete (and whose thread pool will run the map code). It's frequently not the "outermost" Future. For example:
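Here is a hypothetical illustration (db.findUserF and enrich are illustrative names): the map at the end runs on whichever thread satisfies the composed Future, which here is a database-client thread, not the API client's, even though the API call is the "outermost" one.

```scala
val userF: Future[EnrichedUser] =
  api.getSelfF(token)                        // satisfied by the API client's pool
    .flatMap(self => db.findUserF(self.id))  // satisfied by the DB client's pool
    .map(user => enrich(user))               // so enrich() runs on a DB thread
```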
It's also possible that the Future completes before you call map, in which case the work inside the map happens in the main thread. This is bad if your callers expect you to return instantly with a Future.
It's also possible to cause a deadlock (and yes, we've seen this in production) if the code inside the map calls Await on another Future that needs a thread from the same thread pool, but again, it's hard to know which thread pool that is.
So instead, set up your own thread pool for blocking work:
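With Twitter's util library, one way to do this is a FuturePool backed by a dedicated executor (the pool size here is just a placeholder to tune):

```scala
import java.util.concurrent.Executors
import com.twitter.util.FuturePool

object ThreadPools {
  // Dedicated pool for blocking work, kept separate from the pools
  // that satisfy API and database Futures.
  val blockingPool: FuturePool = FuturePool(Executors.newFixedThreadPool(16))
}
```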
And use it like this:
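For example (again with the illustrative names from earlier), bind the wrapped Future with <- instead of doing the blocking work in the yield:

```scala
def registerUser(token: String): Future[DBUser] =
  for {
    self       <- api.getSelfF(token)
    categories <- api.getCategoriesF()
    friendIds  <- api.getFriendIdsF(self.id)
    friends    <- apiFriendsF(friendIds)
    checkins   <- apiCheckinsF(self.id)
    dbUser     <- ThreadPools.blockingPool {
                    createDBUser(self, categories, friends, checkins)
                  }
  } yield dbUser
```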
This now desugars to
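roughly:

```scala
apiCheckinsF(self.id).flatMap { checkins =>
  ThreadPools.blockingPool {
    createDBUser(self, categories, friends, checkins)  // blocking, but on its own pool
  }.map(dbUser => dbUser)                              // the yield is now trivial
}
```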
which is safe.
So ALWAYS yield a plain value or a simple computation. If you have blocking work, wrap it in a ThreadPool-backed Future and flatMap it.
It's worth noting that in the Scala-native Future library (scala.concurrent.Future), you must supply an implicit or explicit execution context when you create a Future. That way, you do have control over where your code executes, so the above warnings about map do not apply.
Anti-pattern #2: Too much parallelism
The method apiFriendsF creates a Future for each item in a list of user IDs and collects the results into a single Future:
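A sketch of what that looks like (getFriendF is an illustrative per-friend API call):

```scala
def apiFriendsF(friendIds: Seq[FriendId]): Future[Seq[Friend]] =
  Future.collect(friendIds.map(id => api.getFriendF(id)))  // all in flight at once
```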
But this is too much parallelism! You’ll flood the thread pool with a ton of simultaneous work. Some network or database drivers don’t even allow more than a certain number of concurrent connections, and you’ll get a bunch of exceptions, and you will not have a good day. A better way to do it is to limit how much you are doing in parallel.
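For example, a bounded version might look like this, using a groupedCollect helper (defined below) with an arbitrary bound of 10:

```scala
def apiFriendsF(friendIds: Seq[FriendId]): Future[Seq[Friend]] =
  groupedCollect(friendIds, par = 10)(id => api.getFriendF(id))
```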
The groupedCollect helper method can be implemented as follows:
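One possible implementation:

```scala
import com.twitter.util.Future

// Process items in batches of `par`: each batch runs in parallel, and the
// next batch starts only after the previous one has completed.
def groupedCollect[A, B](items: Seq[A], par: Int)(f: A => Future[B]): Future[Seq[B]] =
  items.grouped(par).foldLeft(Future.value(Seq.empty[B])) { (accF, batch) =>
    accF.flatMap { acc =>
      Future.collect(batch.map(f)).map(acc ++ _)
    }
  }
```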
The par parameter lets you specify how much work you want done in parallel. For example, if you specify 5, it will take 5 items from the list, do them all in parallel, and wait for them to complete before moving on to the next 5 items.
This can be mitigated a different way, by configuring a thread pool with a maximum number of threads, and making sure that all database or network calls go through this pool. This has the advantage of limiting parallelism application-wide, rather than just at a given call site. It still might be a good idea to limit parallelism at an individual call site to prevent it from crowding out other work.
Anti-pattern #3: Not enough parallelism
This code invokes api.getSelfF() and api.getCategoriesF() sequentially when they could be run in parallel:
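A sketch of the sequential version:

```scala
val selfAndCategoriesF: Future[(Self, Seq[Category])] =
  for {
    self       <- api.getSelfF(token)
    categories <- api.getCategoriesF()  // doesn't even start until getSelfF completes
  } yield (self, categories)
```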
It desugars to
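roughly:

```scala
api.getSelfF(token).flatMap { self =>
  api.getCategoriesF().map { categories =>  // only invoked once `self` is available
    (self, categories)
  }
}
```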
So one waits for the other even though it doesn't need to. The fix is to invoke the methods outside of the for-comprehension:
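For example:

```scala
// Kick off both calls up front so they run concurrently.
val selfF       = api.getSelfF(token)
val categoriesF = api.getCategoriesF()

val selfAndCategoriesF: Future[(Self, Seq[Category])] =
  for {
    self       <- selfF
    categories <- categoriesF
  } yield (self, categories)
```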
Likewise, we have:
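presumably something along these lines, with the check-ins fetch waiting on the friends fetch even though the two are independent:

```scala
val friendsAndCheckinsF =
  for {
    friends  <- apiFriendsF(friendIds)
    checkins <- apiCheckinsF(self.id)  // waits for the friends call to finish first
  } yield (friends, checkins)
```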
These two can also be done in parallel. Write it this way instead:
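A sketch using Future.join from Twitter's util library:

```scala
val friendsAndCheckinsF: Future[(Seq[Friend], Seq[Checkin])] =
  Future.join(apiFriendsF(friendIds), apiCheckinsF(self.id))
```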
The join method runs multiple Futures in parallel and collects their results in a tuple. It also explicitly documents that the two calls will happen in parallel.
Conclusion
Here’s what we ended up with:
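In sketch form (the names are illustrative, as before):

```scala
import java.util.concurrent.Executors
import com.twitter.util.{Future, FuturePool}

object ThreadPools {
  // Dedicated pool for blocking work (size is a placeholder to tune).
  val blockingPool: FuturePool = FuturePool(Executors.newFixedThreadPool(16))
}

def registerUser(token: String): Future[DBUser] = {
  // Nested methods take plain values and return Futures.
  def friendsF(friendIds: Seq[FriendId]): Future[Seq[Friend]] =
    groupedCollect(friendIds, par = 10)(id => api.getFriendF(id))

  def checkinsF(self: Self): Future[Seq[Checkin]] =
    api.getCheckinsF(self.id)

  def createDBUserF(self: Self, categories: Seq[Category],
                    friends: Seq[Friend], checkins: Seq[Checkin]): Future[DBUser] =
    ThreadPools.blockingPool {
      createDBUser(self, categories, friends, checkins)  // blocking, but isolated
    }

  // Independent calls are kicked off up front via vals.
  val selfF       = api.getSelfF(token)
  val categoriesF = api.getCategoriesF()

  // The for-comprehension just glues everything together; parallelism and
  // dependencies are explicit.
  for {
    self       <- selfF
    categories <- categoriesF
    friendIds  <- api.getFriendIdsF(self.id)
    fc         <- Future.join(friendsF(friendIds), checkinsF(self))
    dbUser     <- createDBUserF(self, categories, fc._1, fc._2)
  } yield dbUser
}
```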
Things to note here:
- Nested methods take plain values and return Futures, for great flatMapping.
- All the work is set up ahead of time via vals and nested methods.
- Everything is "glued" together with a for-comprehension at the end.
- Parallelism and dependencies are made explicit in the for-comprehension.
- Blocking work is explicitly wrapped in a thread pool.