Dec. 9, 2020
Kyle Kaniecki
Early into my refactor of Skirmish's tournament backend, I realized that I needed to make fetching and updating the bracket as quick and efficient as I could on the frontend. I ended up doing a lot of research into how I could speed up my queries to the database, but I was also reading a lot about caching and using in-memory storage to store and serve data instead of querying the database every time. However, there was one problem I never really liked about Django's built-in caching system -- it lacks support for clearing a view's cache and forcing a recalculation of the data.
So, to give a clear example in tournament terminology, let's say that a player had reported the results for their match, and the match advanced the winner and the loser to their next respective matches. After this, if the bracket were requested again, Django's built-in @cache_page decorator would create a cache key for the request, look for it in our cache before performing any calculations, grab the old data, and serve it to the user. To show specifically why this is the case, let's look at @cache_page's key creation implementation:
# https://github.com/django/django/blob/302caa40e4caab7d95ef7d0a88a90f935039ab09/django/utils/cache.py#L326
def _generate_cache_key(request, method, headerlist, key_prefix):
    """Return a cache key from the headers given in the header list."""
    ctx = hashlib.md5()
    for header in headerlist:
        value = request.META.get(header)
        if value is not None:
            ctx.update(value.encode())
    url = hashlib.md5(request.build_absolute_uri().encode('ascii'))
    cache_key = 'views.decorators.cache.cache_page.%s.%s.%s.%s' % (
        key_prefix, method, url.hexdigest(), ctx.hexdigest()
    )
    return _i18n_cache_key_suffix(request, cache_key)
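To make the result concrete, here is a sketch of the kind of key this produces for a hypothetical bracket request that varies on the Accept-Language header. The URL and header value are made up for illustration:

```python
import hashlib

# Hash the full request URI, the way _generate_cache_key does
url_hash = hashlib.md5(
    "https://skirmish.example/v1/brackets/42/".encode("ascii")
).hexdigest()

# Hash the value of the single "vary" header for this request
header_hash = hashlib.md5("en-US".encode()).hexdigest()

# Assemble the final key (empty key_prefix, GET request)
cache_key = "views.decorators.cache.cache_page.%s.%s.%s.%s" % (
    "", "GET", url_hash, header_hash
)
print(cache_key)
```

Both hashes are a fixed 32 hex characters, but neither is human-readable -- you cannot look at the key and tell which view or URL it belongs to.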
Ok... there is a lot going on here, so let's boil it down into some bullet points:

- An md5 hash is built up from the values of the request headers named in headerlist (the view's Vary headers)
- The full absolute URI of the request is md5-hashed separately
- The final key joins a hard-coded views.decorators.cache.cache_page prefix with the key_prefix, the HTTP method, the URL hash, and the header hash
- Lastly, an i18n suffix is appended so that responses are cached per language
Ok, so what are the benefits of this approach?

- Cached responses automatically vary with the request headers, so, for example, authenticated and anonymous users never share a cache entry
- Hashing keeps every key a fixed, predictable length, no matter how long the URL or header values get
- Collisions are practically impossible, since the key encodes the method, the full URI, and the headers
However, what are some drawbacks?

- The keys are completely opaque -- you can't look at one and tell which view or URL it belongs to
- Since both the URL and the headers are hashed, cached responses for the same view share no readable prefix, so there is no way to clear them as a group
- The only way to invalidate an entry is to wait for its timeout or rebuild the exact same key
Great, so now that we have evaluated Django's version of generating the cache key for a view response, I can determine that this will not work for my use case. I want to cache data about a tournament view (which won't change with request headers), and be able to clear a whole category of response caches if any of the data changes. Using our example from before, if a tournament game is reported and advanced, I want to make sure all of the bracket view responses that contain that tournament game are cleared from the cache, and the values recalculated when the next person requests information about the bracket. Since tournament data changes tend to happen in waves as rounds complete -- say every 10-15 minutes -- the data remains largely static and then changes for everyone at the same time. With Django's built-in cache, I wouldn't have a great way to clear all of that data at once (or even know which keys are which) due to the key obfuscation from hashing.
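To make that drawback concrete, here is a toy sketch contrasting hashed keys with plaintext-prefixed keys. A plain dict stands in for Redis, and the URLs are made up; with the real Redis backend, the same idea applies using SCAN with a MATCH pattern instead of iterating a dict:

```python
import hashlib

# A toy in-memory "cache" standing in for Redis
cache = {}

# Django-style: the URL is hashed, so two pages of the same bracket
# share no readable prefix
for page in (1, 2):
    url = "https://skirmish.example/v1/brackets/42/?page=%d" % page
    cache["cache_page.%s" % hashlib.md5(url.encode()).hexdigest()] = "..."

# Plaintext-path style: both keys share the "/v1/brackets/42/" prefix
for page in (1, 2):
    cache["response_cache./v1/brackets/42/.page=%d" % page] = "..."

def clear_prefix(cache, prefix):
    """Delete every cache entry whose key starts with `prefix`."""
    for key in [k for k in cache if k.startswith(prefix)]:
        del cache[key]

clear_prefix(cache, "response_cache./v1/brackets/42/")
```

After the call, the two plaintext keys are gone, while the two hashed keys are untouched -- there is no prefix we could have used to find them.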
All of the proposals I considered still used Django's view decorator paradigm to cache a response's content. So, to be clear, all of the code in the proposals would run inside the cache key function that our cache decorator uses to store values in Redis. In the decorator, we would only cache the response if the view was successful. I used this blog article to create a basic view decorator for caching response content; I suggest you give it a look over as well.
Ok, on to the actual proposals. I thought of a few cache key formats that would let me group content together by prefix.
The first proposal centered around storing the entire URI in plain text in the cache key. This would allow us to use redis' key prefixing to clear a group of cache entries all at once. It would look something like this:
def get_cache_key(request, key_prefix='response_cache', method='get'):
    return '%s.%s.%s' % (key_prefix, request.build_absolute_uri(), method)
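A quick usage sketch shows the key this produces. The request class below is a minimal stand-in for Django's HttpRequest (which provides build_absolute_uri), and the URL is made up:

```python
class FakeRequest:
    """A minimal stand-in for Django's HttpRequest, for illustration only."""

    def build_absolute_uri(self):
        return "https://skirmish.example/v1/brackets/42/"

def get_cache_key(request, key_prefix='response_cache', method='get'):
    # Proposal 1: store the whole URI in plain text inside the key
    return '%s.%s.%s' % (key_prefix, request.build_absolute_uri(), method)

key = get_cache_key(FakeRequest())
print(key)  # response_cache.https://skirmish.example/v1/brackets/42/.get
```

Every cached response for this endpoint starts with a readable prefix, which is exactly what lets Redis clear them as a group.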
For the second proposal, I really wanted to see if I could get the length of the cache keys down a bit. I decided I could do that by including only the request path in plain text and appending a hash of the complete URI to the end. This gives us a fixed-length hash that is still unique to the entire request URI.
import hashlib

def get_cache_key(request, key_prefix='response_cache', method='get'):
    url_hash = hashlib.md5(request.build_absolute_uri().encode('ascii')).hexdigest()
    return '%s.%s.%s.%s' % (key_prefix, request.path, method, url_hash)
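Here is the shape of key this yields, again using a made-up URL and a minimal stand-in for Django's HttpRequest:

```python
import hashlib

class FakeRequest:
    """A minimal stand-in for Django's HttpRequest, for illustration only."""

    path = "/v1/brackets/42/"

    def build_absolute_uri(self):
        # A long, query-string-laden URI; only its hash ends up in the key
        return "https://skirmish.example" + self.path + "?page=2&expand=games"

def get_cache_key(request, key_prefix='response_cache', method='get'):
    # Proposal 2: readable path prefix + fixed-length hash of the full URI
    url_hash = hashlib.md5(request.build_absolute_uri().encode('ascii')).hexdigest()
    return '%s.%s.%s.%s' % (key_prefix, request.path, method, url_hash)

key = get_cache_key(FakeRequest())
```

The key keeps a readable, clearable prefix (response_cache./v1/brackets/42/) while the trailing hash stays a fixed 32 characters no matter how long the query string gets.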
Proposal 3 built on the previous proposal, with the inclusion of Django's header system in order to incorporate a hash of varying request headers. This allowed me to save key space on views that don't change based on authentication, but get unique hash values for views that do change based on request headers, such as the Authorization header. Also, proposal 3 moves away from tying our cache key building function to request objects. Instead, I created a more generic cache key function that builds our keys from a string input. This way, functions that are not views can perform CRUD operations in the cache from anywhere the cache is available, like inside a Celery worker.
import hashlib
from typing import List, Optional

CACHE_SEPARATOR = "."


def build_cache_key(
    key: str, prefix: str = "", version: int = 1, separator: str = CACHE_SEPARATOR
) -> str:
    """
    Builds a Skirmish cache key from the given values

    Args:
        key (str): The raw key to be formatted and Skirmish-ified
        prefix (:obj:`str`, optional): A key prefix that will be prepended to the cache key
        version (:obj:`int`, optional): The version of cache building that this is for
        separator (:obj:`str`, optional): A separator for the key, defaults to '.'

    Returns:
        str: A new formatted key that will be inserted into the cache
    """
    return separator.join(filter(None, [prefix, str(version), key]))


def get_cache_key_for_path(
    path: str,
    query_string: Optional[str] = None,
    additional_context: Optional[List[str]] = None,
    **kwargs,
) -> str:
    """
    Builds a cache key for a url path. This is used in Skirmish's endpoint caching so that
    we can also _get_ keys to clear the cache when things change in the endpoint

    Note:
        The url path should not include query strings. Use the `query_string` kwarg
        to include query params

    Args:
        path (str): The url path to get a key for
        query_string (:obj:`str`, optional): A urlencoded query string that will be
            hashed and appended to the cache key
        additional_context (:obj:`list`, optional): Additional context that will be
            hashed and appended to the cache key
    """
    # If we actually have a query string, hash it
    if query_string:
        query_string = hashlib.md5(query_string.encode("ascii")).hexdigest()

    # If we have additional context, let's hash it
    additional_hash = None
    if additional_context:
        ctx = hashlib.md5()
        for value in additional_context:
            ctx.update(value.encode("ascii"))
        additional_hash = ctx.hexdigest()

    # https://stackoverflow.com/questions/26566663/does-filter-preserve-list-ordering
    # Just to make sure
    return build_cache_key(
        CACHE_SEPARATOR.join(filter(None, [path, query_string, additional_hash])),
        **kwargs,
    )
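A quick usage sketch shows the shape of key this produces. Simplified copies of the helpers are restated here so the snippet stands alone; the path and prefix values are illustrative:

```python
import hashlib

CACHE_SEPARATOR = "."

def build_cache_key(key, prefix="", version=1, separator=CACHE_SEPARATOR):
    # Simplified copy of the helper, so this example is self-contained
    return separator.join(filter(None, [prefix, str(version), key]))

def get_cache_key_for_path(path, query_string=None, additional_context=None, **kwargs):
    # Simplified copy: hash the query string and extra context, keep the path readable
    if query_string:
        query_string = hashlib.md5(query_string.encode("ascii")).hexdigest()
    additional_hash = None
    if additional_context:
        ctx = hashlib.md5()
        for value in additional_context:
            ctx.update(value.encode("ascii"))
        additional_hash = ctx.hexdigest()
    return build_cache_key(
        CACHE_SEPARATOR.join(filter(None, [path, query_string, additional_hash])),
        **kwargs,
    )

# No query string or context: the key stays fully readable
print(get_cache_key_for_path("/v1/brackets/42/", prefix="response_cache"))
# response_cache.1./v1/brackets/42/

# With a query string, a fixed-length hash is appended, but the key still
# shares the readable "response_cache.1./v1/brackets/42/" prefix
key = get_cache_key_for_path(
    "/v1/brackets/42/", query_string="page=2", prefix="response_cache"
)
```

Because every key for a path shares that readable prefix, the whole group can be found and cleared at once, which is exactly what the Celery tasks below rely on.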
from functools import partial, wraps
from typing import List, Optional

from django.core.cache import cache
from django.http import HttpRequest
from rest_framework.response import Response


def cache_response(seconds: int, vary_headers: Optional[List[str]] = None):
    def decorator(func: partial):
        """
        This is a partial since we *should* be using django's method_decorator in CBVs
        """

        @wraps(func)
        def inner(request: HttpRequest, *args, **kwargs):
            # Pull this request's values for the vary headers into a new local
            # list. Note that we deliberately do NOT reassign `vary_headers`
            # with `nonlocal` here -- doing so would overwrite the header
            # *names* with one request's header *values*, corrupting the cache
            # keys of every later request through this decorator
            header_values = None
            if vary_headers is not None:
                header_values = list(
                    filter(
                        None,
                        [request.headers.get(header, None) for header in vary_headers],
                    )
                )

            # Get the cache key
            cache_key = get_cache_key_for_path(
                request.path,
                query_string=request.GET.urlencode(),
                additional_context=header_values,
            )

            # Get the cached value, if there is one
            content = cache.get(cache_key)

            # If we have a cached value, return it
            if content is not None:
                return Response(content)

            # Get the normal return value from the view by running it
            # This will run the view completely normally with no caching
            response = func(request, *args, **kwargs)

            # If the response is a valid one, put it in the cache
            if isinstance(response, Response) and response.status_code in (200, 304):
                cache.set(cache_key, response.data, seconds)

            # Return the response
            return response

        return inner

    return decorator
And then we can use the cache key functions like this inside Celery tasks:

for phase_id in tournament.phases.values_list("id", flat=True):
    clear_cache_by_key(
        prefix=get_cache_key_for_path(
            reverse("TournamentPhaseDetail", kwargs={"pk": phase_id})
        )
    )
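The clear_cache_by_key helper isn't shown above, so here is a minimal sketch of how it could work, again with a plain dict standing in for Redis and made-up keys. A real version would iterate matching keys with Redis SCAN and a MATCH pattern (or django-redis's delete_pattern) instead of a dict comprehension:

```python
# Made-up cache contents: two entries for phase 7 (one plain, one with a
# query-string hash appended) and one entry for phase 8
cache = {
    "response_cache.1./v1/phases/7/": {"round": 3},
    "response_cache.1./v1/phases/7/.0cc175b9c0f1b6a831c399e269772661": {"round": 3},
    "response_cache.1./v1/phases/8/": {"round": 1},
}

def clear_cache_by_key(prefix):
    """Delete every cached response whose key starts with `prefix`.

    Sketch only: with the real Redis backend this would use
    `SCAN ... MATCH <prefix>*` rather than iterating a dict.
    """
    for key in [k for k in cache if k.startswith(prefix)]:
        del cache[key]

clear_cache_by_key("response_cache.1./v1/phases/7/")
```

Both phase-7 entries (with and without a query-string hash) are cleared in one call, while phase 8's cache entry is untouched.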
At the end of the day, I ended up choosing proposal #3 for my needs with Skirmish. While varying headers does add quite a bit of complexity to the cache decorator and more keyword arguments to our cache key function, I think it really adds a lot of extensibility to the view cache decorator. It will allow us to cache endpoints like "v1/account/wallet/", which need to be unique for every user. Also, this proposal decouples our cache keys from request objects and instead couples them to specific endpoints on our site. This allowed me to clear the view cache for endpoints in Celery tasks that generate tournament brackets or advance tournament games.
However, developers do need to be careful with this. With great power comes great responsibility. If we blindly use caching to improve performance on every endpoint, we are probably putting a band-aid on a bullet wound. Instead, small endpoints should use efficient database queries to serve content, so we are not redundantly storing large amounts of data in the Redis store. Just because our cache keys change based on very specific request parameters doesn't mean we should use the cache on every endpoint.
If you liked this article, or would like to suggest ways to improve it, drop a comment below! I'd love to hear other options and solutions to grow and learn.