Okay, maybe they don't qualify as actual memory bugs, but they were annoying and had memory as a common theme. Neither one merits a blog post on its own, so I bundled them together.

Two?

Yeah. Here's the list:

Migrating a Rust application from CentOS 7 to Ubuntu 22.04 significantly increased memory consumption

A while back I was testing a Rust application I had written that principally consumed data from a high-throughput Kafka topic (~300 TiB/day uncompressed) in production, and I decided to try using Ubuntu as the base for the Docker containers I was deploying to Kubernetes. My pods started OOM-churning over and over, and it was pretty obvious the base image change was the problem. I redeployed them on CentOS 7 and there were no issues at all. I decided to leave the issue alone since I was under the gun for a deadline and needed to move on to the things actually blocking the service rollout.

While I was on parental leave, my coworkers were forced to migrate off CentOS 7, fair enough. They did have to significantly increase the memory requests for the application pods, though. Previously, every single pod sat inside a ~300 MiB memory utilization band, ranging from 2.6 to 2.9 GiB. On Ubuntu they were using 3.3 to 5.3 GiB depending on throughput and partition balance. At best the max minus min was 900 MiB, but it was often higher.

This had bothered me for quite a while, so I decided to lock in and figure it out once and for all.

Facts gathered:

In the course of kicking these facts around with a coworker, a realization struck me: tikv_jemallocator probably isn't linking malloc and free by default and is instead using a prefix. I checked the crate documentation and, yep, it prefixes the symbols by default because a number of platforms don't tolerate unprefixed overrides well.
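To make that concrete, here's a minimal sketch of the kind of allocator setup involved (illustrative, not the exact production code): with the default prefixed symbols, only allocations made through Rust's global allocator go to jemalloc, while C dependencies linked into the process keep calling libc malloc/free.

```rust
// Minimal sketch, assuming the usual tikv_jemallocator global-allocator setup.
// With the default prefixed symbols (e.g. _rjem_malloc), only Rust-side
// allocations go through jemalloc; C libraries in the same process
// (librdkafka, libzstd, ...) still call libc malloc/free, so the process
// effectively runs two allocators side by side.
use tikv_jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // ... consume from the Kafka topic, decompress, process, etc.
}
```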

To test this hypothesis, I disabled jemalloc and deployed a version that used libc malloc only. That used less RAM, but it still wasn't where it had been on CentOS: the average across all instances was ~3.2 GiB instead of ~3.6 GiB, compared to ~2.7 GiB before, and the variance was still higher.

So then I used the crate feature in tikv_jemallocator to disable prefixing and, boom, back to the way it was before. Digging into the nine years of changes between CentOS 7's libc malloc and Ubuntu 22.04's, it looks like glibc has been trying to eat into tcmalloc's and jemalloc's wheelhouse, and that introduced additional overhead and volatility for my use case. That, and we were running two mallocs side by side unnecessarily.
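For reference, the feature is (if I'm remembering the name right) unprefixed_malloc_on_supported_platforms, passed through from tikv-jemalloc-sys. A rough sketch of the Cargo.toml change, with an illustrative version number:

```toml
# Rough sketch of the dependency change (version number is illustrative).
# With this feature, jemalloc exports unprefixed malloc/free on supported
# platforms, so librdkafka, libzstd, and the Rust allocator all share one
# jemalloc instead of splitting work between jemalloc and libc malloc.
[dependencies]
tikv-jemallocator = { version = "0.6", features = ["unprefixed_malloc_on_supported_platforms"] }
```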

OK, done and dusted. Separately, I should probably look into setting libzstd tuning parameters to calibrate for the level of throughput we're dealing with. There's an even bigger application than mine that could benefit from that anyway.

Some context for work I did to improve Kafka consumer throughput with librdkafka:

Rust Leptos app had steadily increasing memory consumption on the backend and would OOM every so often

Yeah, this one is really simple: don't use leptos-query unless someone takes the project on and fixes it up. I couldn't find any way to disable it in SSR mode, and it wasn't freeing/deallocating anything properly. Heaptrack seemed to think it might actually have been leaking, but heaptrack has given me false positives before, so who knows.

GitHub issue for context: https://github.com/gaucho-labs/leptos-query/issues/36

This impacted ShotCreator, which I'm working on with a few others.