High memory usage that doesn't seem to slow down

Hello! We’re starting to hit an issue where memory slowly rises, then once high enough, user connections start to timeout. Some details:

  • Directus Version: Docker Image directus/directus:11
  • Using an S3-compatible bucket (Railway, alongside the Directus instance)
  • PM2 is configured to restart instances as shown in our config (below), but after a while this mechanism seems to stop working and lets instance memory run away.
  • We reproduced this by running the image locally with a script that requests many large files (about 512M each), and saw the same behavior.

Here are the relevant config options we’ve set over the course of our troubleshooting. Is there something we’re not doing correctly with our setup?

```
PM2_INSTANCES="1" # we had >1 before and still hit the issue, so we set it to 1 to check whether the restart mechanism was working at all; it seems it isn't.
PM2_MAX_MEMORY_RESTART="512M"
SYNCHRONIZATION_STORE="redis"
REDIS_ENABLED="true"
STORAGE_S3_CONNECTION_TIMEOUT="30000"
ASSETS_CACHE_TTL="1h"
CACHE_ENABLED="true"
CACHE_TTL="5m"
CACHE_VALUE_MAX_SIZE="1000000"
CACHE_STORE="redis"
PRESSURE_LIMITER_ENABLED="true"
DB_CLIENT="pg"
CACHE_AUTO_PURGE="true"
DB_POOL__MIN="0"
```

Screenshot of the memory graph. The sudden drops are manual restarts once we notice it’s affecting user connections.

Heya!

A few things that might help here:

While Directus streams files rather than buffering them in memory, the pressure limiter’s memory-based limits are actually disabled by default. The event loop delay/utilization checks are active, but those won’t catch gradual memory creep. Try enabling the memory limits explicitly:

```
PRESSURE_LIMITER_MAX_MEMORY_RSS="536870912"
PRESSURE_LIMITER_MAX_MEMORY_HEAP_USED="268435456"
PRESSURE_LIMITER_RETRY_AFTER="30"
```

(That’s 512MB RSS / 256MB heap – adjust to taste based on your container limits.)
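If you want to derive different limits for your own container sizes, the values are just mebibytes expressed as raw bytes. A quick sanity check (the `mib` helper here is just for illustration):

```python
def mib(n: int) -> int:
    """Convert mebibytes to raw bytes, as the limiter settings expect."""
    return n * 1024 * 1024

print(mib(512))  # 536870912 -> PRESSURE_LIMITER_MAX_MEMORY_RSS
print(mib(256))  # 268435456 -> PRESSURE_LIMITER_MAX_MEMORY_HEAP_USED
```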

This way Directus will start returning 503s before memory gets out of hand, rather than relying solely on PM2 to catch it after the fact.

Also worth checking: are those 512MB files being served as-is, or are they going through any image transformations? Transforms load the image into memory via Sharp, and a few concurrent large-image transforms can eat through memory fast. If that’s a factor, you could lower ASSETS_TRANSFORM_MAX_CONCURRENT (default is 25) and tighten ASSETS_TRANSFORM_IMAGE_MAX_DIMENSION.
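If transforms do turn out to be a factor, a tighter starting point could look something like the below. These particular numbers are illustrative, not recommendations; tune them against your actual traffic and container size:

```
ASSETS_TRANSFORM_MAX_CONCURRENT="4"
ASSETS_TRANSFORM_IMAGE_MAX_DIMENSION="4000"
```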

Re PM2: its memory check runs on an interval, so it can miss rapid spikes. With PM2_INSTANCES="1" there’s also no graceful handoff during restarts. Bumping to 2 instances in cluster mode means one can restart while the other keeps serving.
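Concretely, keeping your existing restart threshold but allowing a graceful handoff would just be:

```
PM2_INSTANCES="2"
PM2_MAX_MEMORY_RESTART="512M"
```

With two instances, PM2 can cycle one while the other keeps accepting connections, so a memory-triggered restart is less likely to surface as user-facing timeouts.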