Measuring Performance
For each optimization attempt, we compared the performance of:
- the `main` branch of the platform without the new optimization
- the feature branch with the optimization implemented
- the existing REST-ish API as a baseline
We used multiple tools to measure performance and help decide what changes to try.
Load Testing Tools
We used an internal tool for blasting thousands of requests at our API servers in a dedicated testing environment. Following the test, the tool reported various response time measurements, including average, median, and percentiles. This gave us a high-level view of how our changes performed over many requests.
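Our load tester is an internal tool, but the sketch below uses the open-source autocannon package to show the kind of percentile summary we relied on; the URL, query, and settings are placeholders rather than our real configuration.

```js
// Hypothetical load-test sketch using autocannon; the endpoint, query, and
// settings are made up for illustration.
const autocannon = require('autocannon');

const query = `{ reports(first: 10) { id title } }`; // placeholder GraphQL query

autocannon(
  {
    url: 'https://graphql.test.example.com/graphql', // stand-in for the dedicated test environment
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ query }),
    connections: 50, // concurrent connections
    duration: 60,    // seconds
  },
  (err, result) => {
    if (err) throw err;
    // Response-time summary, similar to what our internal tool reports
    console.log('average:', result.latency.average, 'ms');
    console.log('median: ', result.latency.p50, 'ms');
    console.log('p99:    ', result.latency.p99, 'ms');
  }
);
```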
Application Performance Monitoring Tools
Our APM tool let us dig into individual requests and visualize a single GraphQL query as a series of database queries, async operations, and other steps on a timeline. This gave us insights into which parts of the resolvers took the most time and where bottlenecks were.

An example of an Elastic APM trace
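For the resolver-level timing described above, the Elastic APM Node.js agent can wrap individual steps in custom spans so they show up on the trace timeline. A minimal sketch, assuming a hypothetical resolver and data-layer call:

```js
// Minimal sketch of custom spans with elastic-apm-node; the agent must be
// started before anything else is required. The resolver and db call are hypothetical.
const apm = require('elastic-apm-node').start({ serviceName: 'graphql-api' });

async function reportsResolver(parent, args, context) {
  // This span appears as one step on the request's APM timeline
  const span = apm.startSpan('search reports in Elasticsearch', 'db');
  try {
    return await context.db.searchReports(args); // hypothetical data-layer call
  } finally {
    if (span) span.end(); // startSpan returns null when there is no active transaction
  }
}
```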
ElasticSearch's Reporting
Our primary database was ElasticSearch, which includes timing information with each response in the `took` field.
{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    ...
  },
  "hits" : {
    ...
  }
}
`took` shows how long a query runs inside the database itself (typically 7–20ms). Comparing that with the round-trip time we observed in our APM tools (usually 30–60ms in our case) told us whether a particular DB query was genuinely slow or whether the problem was in our infrastructure or code.
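A rough sketch of that comparison using the official @elastic/elasticsearch client; the index, query, and node URL are illustrative.

```js
// Compare Elasticsearch's own timing (`took`) with the round trip observed by
// the application; the index, query, and node URL are placeholders.
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function timedSearch() {
  const start = process.hrtime.bigint();
  const response = await client.search({
    index: 'reports',
    query: { match_all: {} },
  });
  const roundTripMs = Number(process.hrtime.bigint() - start) / 1e6;

  // A small `took` (7–20ms) with a much larger round trip (30–60ms) points at
  // our code or infrastructure rather than the database itself.
  console.log(`took: ${response.took}ms, round trip: ${roundTripMs.toFixed(1)}ms`);
}
```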
Ideas that didn't work
Not every optimization attempt yielded significant improvements. It's important to document these to avoid repeating unsuccessful approaches.
JS Runtime Replacements
We experimented with replacing our Node.js runtime with Bun, which yielded less than a 5% performance improvement and introduced some small incompatibilities with older NPM packages. Similarly, upgrading to Node.js v22 also resulted in less than 5% improvement. We skipped testing Deno based on the minimal gains from other runtime changes.
Web Server Replacements
We tried replacing Express with uWebSockets, but saw no meaningful improvement. This confirmed that Express wasn't the bottleneck in our system.
Other Minor Optimizations
Several smaller optimizations showed no or minimal improvement in response times:
- enabling Automatic Persisted Queries (APQ) surprisingly resulted in either the same or slightly worse performance
- various micro-optimizations – reducing promises, minimizing object/array creation, and streamlining loops showed no noticeable improvement
- being more selective about which fields we retrieved from the database did reduce the amount of JSON being processed and improved response times, but the gains usually weren't worth the added complexity. Our DB field names did not map 1:1 to GraphQL field names, and resolving a single GraphQL field sometimes required fetching multiple seemingly unrelated DB fields, which made this challenging to maintain (a sketch of the mapping problem follows this list)
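To illustrate that mapping problem, here is a hypothetical lookup table; the field names are invented, but the shape of the issue is accurate: one GraphQL field can depend on several seemingly unrelated DB fields.

```js
// Hypothetical mapping from GraphQL fields to the Elasticsearch _source fields
// needed to resolve them; all names are illustrative only.
const GRAPHQL_TO_DB_FIELDS = {
  riskScore:      ['risk.base_score', 'risk.adjustments', 'jurisdiction.code'],
  displayName:    ['entity.legal_name', 'entity.trade_names'],
  lastReviewedAt: ['audit.reviewed_at'],
};

// Translate the requested GraphQL fields into a _source includes list
function sourceFieldsFor(requestedGraphqlFields) {
  const dbFields = new Set();
  for (const field of requestedGraphqlFields) {
    for (const dbField of GRAPHQL_TO_DB_FIELDS[field] ?? []) {
      dbFields.add(dbField);
    }
  }
  return [...dbFields];
}

// sourceFieldsFor(['riskScore']) -> three DB fields for a single GraphQL field
```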
Rejected Ideas
Some promising ideas were turned down before we implemented them.
Rewriting in Go
A team member prototyped a single GraphQL resolver in Go and achieved 40% faster response times with significantly lower CPU and memory usage than the Node.js implementation. However, we didn't have a team ready to start writing Go code, so this approach was shelved.
Replacing Apollo GraphQL with Mercurius GraphQL
While Mercurius offered potential performance benefits, it lacked features that were built into Apollo. The team was reluctant to rewrite the code that depended on these features, so we decided against this change.
Replacing GraphQL Shield
Turning off GraphQL Shield improved performance, but we could not ship this change. We considered replacing GraphQL Shield with a custom solution but determined it wasn't worth the effort early in the optimization process and would likely be too error-prone.
Long-lived Caches for Third-Party Information
A large part of the GraphQL API's purpose is fetching information gathered from heavily regulated third-party sources. Organizations providing these information sources can take legal action if we present cached information that isn't up-to-date with their compliance rules. The team was concerned that invalidating caches on rule changes would make the system too complicated, so we left it alone.
Ideas that worked
Fortunately, several optimization approaches did yield significant improvements.
Swapping Apollo Server for Yoga
Replacing Apollo Server with Yoga resulted in a 15% (13ms) improvement in benchmarks and a 2-14% (7-40ms) gain in production. This change also gave us access to many useful plugins in the Yoga ecosystem.
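For reference, a minimal Yoga setup looks roughly like this; the schema below is a stand-in, not our real one.

```js
// Minimal graphql-yoga server sketch; the schema and port are placeholders.
const { createServer } = require('node:http');
const { createYoga, createSchema } = require('graphql-yoga');

const schema = createSchema({
  typeDefs: /* GraphQL */ `
    type Query {
      hello: String!
    }
  `,
  resolvers: {
    Query: {
      hello: () => 'world',
    },
  },
});

// Ecosystem plugins can be passed via the `plugins` option here
const yoga = createYoga({ schema });
createServer(yoga).listen(4000);
```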
Replacing Apollo Gateway with Apollo Router
Switching from Apollo Gateway to Apollo Router provided a 6% (5ms) improvement in benchmarks and a 9% (27ms) gain in production. This change also decreased memory usage, allowing us to reduce our server count and save on costs. There was less code to maintain, although writing plugins in the Rhai scripting language proved challenging. It's worth noting that some features are behind paywalls since this is a commercial product. We’re ok with this trade-off for now and will go without the paid features.
Increasing the # of vCPUs/threads
A remnant of the project's early experimental days was that the containers for the GraphQL servers only had 1 thread or vCPU. Adding a second thread allows Node.js I/O, garbage collection, and container-level processing to run in a separate thread from JavaScript execution. With only one thread, JavaScript is blocked by all other CPU duties. This change resulted in a huge 27% performance boost on one benchmarked query.
GraphQL Resolver Improvements
We made several improvements to our GraphQL resolvers:
- removing redundant database queries within resolvers yielded a 20% improvement in one query
- parallelizing database and other requests that were previously executed serially reduced wait times (see the sketch after this list)
- in some cases, extracting the GraphQL fields requested by our clients and using that to limit the fields selected by the DB yielded significant performance improvements. Reducing this overfetching reduced the amount of JSON being serialized, deserialized, and garbage collected, resulting in a 40% performance boost on one particularly overfetch-y query
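The parallelization change from the second bullet looks roughly like this inside a resolver; the data-layer functions are hypothetical stand-ins.

```js
// Sketch: independent requests that used to run serially now start together.
// The context.db functions are hypothetical stand-ins for our data layer.
async function reportResolver(parent, args, context) {
  // Before: each await blocked the next, so latencies added up
  // const report   = await context.db.getReport(args.id);
  // const comments = await context.db.getComments(args.id);

  // After: independent requests run concurrently
  const [report, comments] = await Promise.all([
    context.db.getReport(args.id),
    context.db.getComments(args.id),
  ]);

  // Still serial, because it depends on the first result
  const owner = await context.db.getOwner(report.ownerId);

  return { ...report, owner, comments };
}
```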
Long-lived Caches for First-Party Data
While regulations prevented us from caching 3rd party data (as described above), we could cache reports we generated ourselves from 3rd party data. In some benchmarks, retrieving these reports from Redis by key was 10 times faster than querying ElasticSearch. We still need to benchmark this approach to see how it improves overall GraphQL query performance, and we need to determine if the cache hit rate will be high enough to justify the complexity and storage.
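A cache-aside sketch of what we're evaluating, using ioredis; the key format, TTL, and data-layer call are assumptions rather than production values.

```js
// Cache-aside sketch for first-party reports; key, TTL, and db call are assumptions.
const Redis = require('ioredis');
const redis = new Redis(); // defaults to localhost:6379

async function getGeneratedReport(reportId, db) {
  const key = `report:${reportId}`;

  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached); // cache hit: skip ElasticSearch entirely

  const report = await db.fetchGeneratedReport(reportId); // hypothetical data-layer call
  await redis.set(key, JSON.stringify(report), 'EX', 60 * 60 * 24); // 24h TTL, illustrative
  return report;
}
```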
Key Learnings from this Experience
Our experiments taught us several valuable lessons:
Generalized Benchmarks can be Misleading
Benchmarks for runtimes like Bun and Deno, and for web servers like Fastify or uWebSockets, tout how little overhead the runtime or library adds to a request compared to alternatives. However, in our case, the overhead from web servers and runtime environments represents only a tiny fraction of the time our GraphQL API needs to fulfill a request. These benchmarks might be more relevant for APIs handling much higher request volumes or much simpler queries.
Real-World Workloads Are Essential for Accurate Benchmarking
We often test changes by repeatedly sending the same HTTP requests to our servers thousands of times. This is easy to set up, but it can lead to misleading results:
- it makes caching appear more effective than it actually is due to artificially high hit rates
- databases behave differently under real-world loads with a variety of queries rather than repeated queries
- a wider variety of requests & responses might trigger Node.js garbage collection differently
The more improvements we ship to production, the more we learn about weaknesses in our testing.
Layered Architecture Facilitates Experimentation
Testing different optimization strategies is easy due to our layered architecture:
- the web server is separate from the GraphQL server
- the GraphQL server is separate from the GraphQL resolvers
- the GraphQL resolvers are separate from the business logic
- the business logic is separate from the data manipulation layer
- the data manipulation layer is separate from the database query layer
This separation allows us to experiment with changes at different levels without disrupting the entire system.
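A simplified, hypothetical sketch of those layers; the names are all illustrative, and the point is that each layer only calls the one below it, so any single layer can be swapped out during an experiment.

```js
// Hypothetical layering sketch; every name here is illustrative.
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' }); // used only by the query layer

// GraphQL resolver layer: maps GraphQL arguments onto business-logic calls
const resolvers = {
  Query: {
    report: (parent, args, context) => getReportForUser(args.id, context.user),
  },
};

// Business-logic layer: domain rules, with no GraphQL or database details
async function getReportForUser(reportId, user) {
  const report = await fetchReportDocument(reportId);
  if (report.ownerId !== user.id) throw new Error('Forbidden');
  return report;
}

// Data-manipulation layer: shapes raw database documents into domain objects
async function fetchReportDocument(reportId) {
  const hit = await queryReportById(reportId);
  return { id: hit._id, ...hit._source };
}

// Database query layer: the only code that knows about ElasticSearch
async function queryReportById(reportId) {
  const response = await client.search({
    index: 'reports',
    query: { term: { _id: reportId } },
  });
  return response.hits.hits[0];
}
```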
Looking Forward
Our experience optimizing this GraphQL API has shown that while some common optimization strategies might not yield the expected results, a methodical approach to testing and measuring performance can lead to substantial improvements. By focusing on the areas that truly impact performance and being willing to challenge assumptions, we've significantly improved our API's response times and resource utilization.