Recently one of our Redis production servers started a slow descent in available memory. It was most likely caused by a code change in a previously released version, but searching through the code would only lead to guessing. I needed to quickly clean out whatever keys were taking up the majority of the space before we concentrated on a more permanent fix.
It’s common practice to namespace keys in Redis with a colon (for example, collection:person:1234), so what I really needed to do was reduce all the keys down to groups and count the number of keys and the size of each namespace.
I needed a tool that:
- Has to be non-blocking. This Redis server is running in production and has 110 million keys, so blocking commands like KEYS would not be possible.
- Should be able to scan a subset of the keys. This is so that we don’t need to go through the entire dataset to get an idea of what was going on.
- Grouping the keys into their namespace is pretty easy (for example, collection:person:1234 would be reduced to collection:person:*), but the number of keys in a namespace doesn’t reflect the size of the memory usage (which is what we really care about), so I needed some way to estimate the amount of memory for a namespace within that sample of keys taken.
The first two requirements were pretty easy to solve by using SCAN (greatly sped up with a large COUNT option), but the third was a bit more tricky because Redis v3.2 doesn’t provide a reliable way to ask “how much memory would this key (string, set, etc.) use?”
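The non-blocking scan is really just a cursor loop. Here is a rough Python sketch of that loop, with a key-limit for sampling a subset; the names are illustrative rather than taken from the actual tool, and an in-memory stand-in replaces redis-py’s scan so the sketch runs without a server.

```python
def scan_sample(scan, match="*", count=10000, limit=None):
    """Walk the SCAN cursor page by page, stopping early after `limit` keys.

    `scan` follows redis-py's shape: scan(cursor, match=..., count=...)
    returns (next_cursor, keys), and a next_cursor of 0 means the scan is done.
    """
    cursor, seen = 0, []
    while True:
        cursor, keys = scan(cursor, match=match, count=count)
        seen.extend(keys)
        if limit is not None and len(seen) >= limit:
            return seen[:limit]
        if cursor == 0:
            return seen

# In-memory stand-in for a Redis server so the sketch runs anywhere:
DATA = ["key:%d" % i for i in range(25)]

def fake_scan(cursor, match="*", count=10000):
    page = DATA[cursor:cursor + count]
    nxt = cursor + count
    return (0 if nxt >= len(DATA) else nxt), page

keys = scan_sample(fake_scan, count=10, limit=15)  # only 15 of the 25 keys
```

A large COUNT makes each round trip return more keys per call, which is where the speed-up comes from.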
I decided to use DUMP, which produces a serialized binary representation of a key’s value, and measure the length of that. This is crude and does not directly relate to memory usage, but it does a good enough job of highlighting keys that contain sets with thousands of elements.
Unfortunately, DUMP is very slow (compared to the other scanning operations) and some of these keys are very large (which translates to network throughput), so it’s not practical (or really needed) to measure the length of all of the keys in each namespace. Instead, I needed to be able to configure how many items for a given namespace will be tested, and that average is applied to the rest of the items.
This worked really well, as a namespace that contains thousands of similarly sized items only needs, say, 10 items to be measured to get an idea of the complete namespace.
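The sampling idea can be sketched like this (a hypothetical helper, not the tool’s actual code; dump_len(key) stands in for taking the length of the string DUMP returns): measure only the first few keys of each namespace and extrapolate the average to the whole group.

```python
def estimate_namespace_sizes(keys_by_ns, dump_len, dump_limit=10):
    """Estimate bytes per namespace: DUMP only `dump_limit` keys and
    extrapolate their average length to every key in the namespace."""
    estimates = {}
    for ns, keys in keys_by_ns.items():
        sample = keys[:dump_limit]
        avg = sum(dump_len(k) for k in sample) / len(sample)
        estimates[ns] = avg * len(keys)
    return estimates

# A namespace with 1000 similar keys only needs a handful of DUMPs:
sizes = estimate_namespace_sizes(
    {"collection:person:*": ["collection:person:%d" % i for i in range(1000)]},
    dump_len=lambda key: 64,  # pretend every DUMP payload is 64 bytes
    dump_limit=10,
)
# sizes["collection:person:*"] == 64000.0
```

The estimate is only as good as the assumption that keys within a namespace are similar in size, which held for this dataset.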
Ultimately I had to create a CLI tool. The command, redis-usage, has a bunch of flexible options:
Usage of redis-usage:
- SCAN COUNT option. (default 10)
- Redis server database.
- Use DUMP to get key sizes (much slower). If this is zero then DUMP will not be used; otherwise it will take N sizes for each prefix to calculate an average number of bytes for that key prefix. If you want to measure the sizes for all keys, set this to a very large number.
- Redis server host. (default "localhost")
- Limit the number of keys scanned.
- SCAN MATCH option.
- Redis server port number. (default 6379)
- You may specify custom prefixes (comma-separated).
- Separator for grouping. (default ":")
- Number of milliseconds to wait between reading keys.
- Milliseconds for timeout. (default 3000)
- Only show the top number of prefixes.
It even has a pretty progress bar (thanks https://github.com/cheggaaa/pb!). It looks something like this:
$ ./redis-usage -limit 1000 -dump-limit 3
1002 / 1002 [=====================================================] 100.00% 15s
orderkey:* -> 930 keys, ~14.5 KB estimated size
live___KountaCacheDependency_mysql:* -> 68 keys, ~3.79 KB estimated size
sqsworker:job:* -> 4 keys, ~52 bytes estimated size
With a larger sample size we were able to find the keys that were causing the problem and apply the appropriate fix to the application. Yay!
Originally published at http://elliot.land on June 25, 2018.