HomeSealhub
Avoiding race conditions when caching async stuff

Let's assume you have a function that's expensive to compute:

function expensiveFunction(input: string): string {...}

It takes one string and does something cpu-expensive with it that turns it into another string. It's a synchronous function, so it blocks the main thread for the entirety of its execution. We want to avoid executing it twice for the same value, so we devise a simple cache:

const cache: Record<string, string> = {}

We can now wrap the function to create a "cached" version of it:

invalid async cache
function cachedExpensiveFunction(input:string) : string{
   if(cache[input]){
       return cache[input]
   }
   const result = expensiveFunction(input);
   cache[input] = result;
   return result;
}

The way it works is simple - if the value is already in the cache, return that value. Otherwise, calculate that value using the expensive function, save it in the cache and return it.

This way, the cachedExpensiveFunction always returns the correct value - but it's going to be considerably faster when executed with inputs that have already been "seen" before.

console.log(cachedExpensiveFunction("somestring"))        // returns the value slowly
console.log(cachedExpensiveFunction("some-other-string")) // returns the value slowly
console.log(cachedExpensiveFunction("somestring"))        // returns the value fast!

But what happens when the expensiveFunction is asynchronous?

async function asyncExpensiveFunction(input: string): Promise<string> {...}

Things are a little bit more complex now, and the naive aproach won't work

The naive approach that doesn't work

One might think that just turning the cache function into async/await might resolve the issue like so:

async function asyncCachedExpensiveFunction(input:string) : Promise<string> {  // let's add `async` here...
   if(cache[input]){
       return cache[input]
   }
   const result = await expensiveFunction(input); // and `await` here!
   cache[input] = result;
   return result;
}

Unfortunately, this is not enough. The above implementation does not prevent calling the asyncExpensiveFunction multiple times with the same input!

await Promise.all([
  asyncCachedExpensiveFunction("somestring"),
  asyncCachedExpensiveFunction("somestring"),
]);

With the above setup, the asyncExpensiveFunction will be called twice. So our cache doesn't work. But why?

Both of the calls for asyncExpensiveFunction are executed "in parallel". This means that when each of the calls gets to the if(cache[input]) part, nothing is there in the cache, so they decide to calculate the value by running the expensive function. The first function call that finishes writes the result to cache, which is promptly overwriten by the second function call.

This is the most problematic part:

const result = await expensiveFunction(input); // and `await` here!
cache[input] = result;

Look at that await here. When we run await, we tell JS to stop current execution and only resume it after the promise following the await keyword is resolved or rejected. While the current execution is stopped, JS is free to execute any other function that's queued up - in this case that would be the second call to asyncCachedexpensivefunction. We didn't write to cache[input] yet - it's empty, so the second call will not use the cache.

The root of the problem is the await between running the expensiveFunction and writing to cache.

Slightly less intuitive solution that *does* work

To avoid the problem described above, we need to write to the cache *immediately*. Here's where Promises come in handy. We can avoid the issue described above by writing to the cache not *after* the expensive function has been called, but *before*. But what do we write to the cache when we don't yet have the result calculated?

The trick is to write a Promise to the cache. We can store a yet-unresolved Promise in a variable and write *that* to the cache.

proper async cache

const cache: Record<string, Promise<string>> = {}

async function asyncCachedExpensiveFunction(input:string) : Promise<string> { 
   if(cache[input]){
       return cache[input]
   }
   const resultPromise = expensiveFunction(input); // no `await here!`
   cache[input] = resultPromise;
   return resultPromise;
}

This way the first function call to execute stores the promise, and the second one to execute (remember, JS can *really* only one function at a time) already sees that there's a promise stored in the cache and returns that promise instead of creating a new one.

The clue intution to gather here is that a Promise can be moved around as any regular variable, and it can be returned, and awaited multiple times, with many time breaks in between.

Written by kuba-orlik on Aug 24 2023, 20:13.
CEO
Projects
None
Subscribers
None

Event Timeline