Efficient GraphQL queries in Ruby on Rails & Postgres
In this article, we will discuss what N+1 queries are, how they are easily produced in GraphQL, and how to solve them using the graphql-batch gem along with a few custom batch loaders.
What are N+1 queries?
N+1 queries can occur when you have one-to-many relationships in your models. Each Event
belongs to a Category
. Let's say that you find the last five events and you want to get the category name for each of them.
Event.last(5).each { |event| puts event.category.name }
Seems simple enough! We unfortunately just produced six queries. The first query to find the events, and another query to find each category's name. This is an easy problem to solve in Rails by using eager-loading:
Event.includes(:category).last(5).each { |event| puts event.category.name }
By using the includes
method we've been able to knock our queries down from six to two: The first to find the events, and the second to find the categories for those events.
N+1 queries in GraphQL
The query below will produce N+1 SQL queries as it finds the category for each event:
{
events {
id
name
category {
id
name
}
}
}
Optimizing GraphQL queries
Yes, we could solve the N+1 query in the previous example by eager-loading the category relationship, but if the user didn't actually want the category, why load it? We don't know what the user will ask for. There just so happens to be a better way, by lazy-loading data only as its needed using the graphql-batch gem.
Batch loading single records
The simplest case for batch loading data is the example of each event belonging to a category. Inside of our EventType
class, there is a field called category which allows the user to access the category of an event
class Types::EventType < Types::BaseObject
field :category, Types::CategoryType, null: false
def category
# avoid `object.category`
RecordLoader.for(Category).load(object.category_id)
end
end
By using the RecordLoader
class to load the category, we actually avoid loading the category right away, and instead load all of the required categories with a single query. The query it ends up producing may end up looking like:
SELECT "categories".* FROM "categories" WHERE "categories"."id" IN ($1, $2, $3, $4, $5)
Looking at the RecordLoader
class we can see how it works. The perform method will receive all of the ids for a single model (Category
in this case), load the records in a single SQL query, and then call the fulfill method for each of them. The fulfill method resolves the promise, which is basically like putting a face to a name... you gave me an ID, and I've fulfilled my promise to provide you with the corresponding record.
class RecordLoader < GraphQL::Batch::Loader
def initialize(model)
@model = model
end
def perform(ids)
# Find all ids for this model and fulfill their promises
@model.where(id: ids).each { |record| fulfill(record.id, record) }
# Handle cases where a record was not found and fulfill the value as nil
ids.each { |id| fulfill(id, nil) unless fulfilled?(id) }
end
end
We can write a test for this class to ensure that it finds the records correctly, keeping in mind that in order for the lazy/promise based code to function correctly it needs to be wrapped inside something called an executor.
describe RecordLoader do
it 'loads' do
event = create(:event)
result = GraphQL::Batch.batch do
RecordLoader.for(Event).load(event.id)
end
expect(result).to eq(event)
end
end
Batch loading many records
We've covered the case where we are batch loading a single record at a time, but how do we handle the reverse scenario? We are displaying categories along with the first five events for each category, which would also produce an N+1 query, so let's see how we can solve it using a batch loader. The query we're discussing would look something like this:
{
categories {
id
name
events(first: 5) {
id
name
}
}
}
I have created a custom loader called ForeignKeyLoader
for this purpose. It will load the events using the foreign key category_id
. I also added the ability to pass a lambda to merge in additional scopes into the query that will be run.
class Types::CategoryType < Types::BaseObject
field :events, [Types::EventType], null: false do
argument :first, Int, required: false, default_value: 5
end
def events(first:)
ForeignKeyLoader.for(Event, :category_id, merge: -> { order(id: :asc) }).
load(object.id).then do |records|
records.first(first)
end
end
end
The query that gets produced looks something like:
SELECT "events".*
FROM "events"
WHERE "events"."category_id" IN ($1, $2, $3, $4, $5)
ORDER BY "events"."id" ASC
Notice in this case that we call the then method to execute some code after the promise has been resolved. Here we see the first issue with this method... we only wanted five events for each category, but our query will load ALL events for each category, and then, using the first method on the resulting Array, narrow it down to only the first five events. If there are thousands of events, we could run into some serious issues.
class ForeignKeyLoader < GraphQL::Batch::Loader
attr_reader :model, :foreign_key, :merge
def self.loader_key_for(*group_args)
# avoiding including the `merge` lambda in loader key
# each lambda is unique which defeats the purpose of
# grouping queries together
[self].concat(group_args.slice(0,2))
end
def initialize(model, foreign_key, merge: nil)
@model = model
@foreign_key = foreign_key
@merge = merge
end
def perform(foreign_ids)
# find all the records
scope = model.where(foreign_key => foreign_ids)
scope = scope.merge(merge) if merge.present?
records = scope.to_a
foreign_ids.each do |foreign_id|
# find the records required to fulfill each promise
matching_records = records.select do |r|
foreign_id == r.send(foreign_key)
end
fulfill(foreign_id, matching_records)
end
end
end
Batch loading many records more efficiently
It turns out that there is a way to perform a query that says "find me the first N records for each X" (find me the first 5 records for each category), and that involves using Postgres Window Functions. While researching this concept, this article about window functions was useful along with this article about bringing window functions into Rails.
The following query produces the data that we want... we just need to figure out how to write a batch loader that generates the same result.
SELECT "events".*
FROM (
SELECT
*,
row_number() OVER (
PARTITION BY category_id ORDER BY start_time desc
) as rank
FROM "events"
WHERE "events"."category_id" IN (1, 2, 3, 4, 5)
) as events
WHERE rank <= 5
For this we'll create a batch loader called WindowKeyLoader
which is used like:
class Types::CategoryType < Types::BaseObject
field :events, [Types::EventType], null: false do
argument :first, Int, required: false, default_value: 5
end
def events(first:)
WindowKeyLoader.for(Event, :category_id,
limit: first,
order_col: :start_time,
order_dir: :desc
).load(object.id)
end
end
You can see the difference already. I am no longer required to slice the first N array elements in the then
block of the resolved promise. The actual batch loader class looks like:
class WindowKeyLoader < GraphQL::Batch::Loader
attr_reader :model, :foreign_key, :limit, :order_col, :order_dir
def initialize(model, foreign_key, limit:, order_col: :id, order_dir: :asc)
@model = model
@foreign_key = foreign_key
@limit = limit
@order_col = order_col
@order_dir = order_dir
end
def perform(foreign_ids)
# build the sub-query, limiting results by foreign key at this point
# we don't want to execute this query but get its SQL to be used later
ranked_from = model.
select("*,
row_number() OVER (
PARTITION BY #{foreign_key} ORDER BY #{order_col} #{order_dir}
) as rank").
where(foreign_key => foreign_ids).
to_sql
# use the sub-query from above to query records which have a rank
# value less than or equal to our limit
records = model.
from("(#{ranked_from}) as #{model.table_name}").
where("rank <= #{limit}").
to_a
# match records and fulfill promises
foreign_ids.each do |foreign_id|
matching_records = records.select do |r|
foreign_id == r.send(foreign_key)
end
fulfill(foreign_id, matching_records)
end
end
end
We're able to test the WindowKeyLoader
by creating three events for a category but only asking for the first two of them:
describe WindowKeyLoader do
it 'loads' do
category = create(:category)
events = (1..3).to_a.map do |n|
create(:event, name: "Event #{n}", category: category)
end
result = GraphQL::Batch.batch do
WindowKeyLoader.for(
Event,
:category_id,
limit: 2, order_col: :id, order_dir: :asc
).load(category.id)
end
expect(result).to eq(events.first(2))
end
end
Batch loading active storage attachments
You may run into situations where you're loading polymorphic data, or other types of relationships which don't exactly fit into the mold of your standard has-many or belongs-to relationships. One case is with ActiveStorage. In the code below we'll load an image URL for an event:
class Types::EventType < Types::BaseObject
field :image, String, null: true
def image
# produces 2N + 1 queries... yikes!
# url_for(object.image.variant({ quality: 75 }))
AttachmentLoader.for(:Event, :image).load(object.id).then do |image|
url_for(image.variant({ quality: 75 }))
end
end
end
This data is stored using a polymorphic relationship that loads an ActiveStorage::Attachment
record, which then needs to load an ActiveStorage::Blob
record in order to produce the image url. It ends up producing a 2N + 1 query! Our AttachmentLoader is able to completely optimize this field by cutting it down to just two queries to load as many images as you'd like.
class AttachmentLoader < GraphQL::Batch::Loader
attr_reader :record_type, :attachment_name
def initialize(record_type, attachment_name)
@record_type = record_type
@attachment_name = attachment_name
end
def perform(record_ids)
# find records and fulfill promises
ActiveStorage::Attachment.
includes(:blob).
where(record_type: record_type, record_id: record_ids, name: attachment_name).
each { |record| fulfill(record.record_id, record) }
# fulfill unfound records
record_ids.each { |id| fulfill(id, nil) unless fulfilled?(id) }
end
end
In this case we are taking advantage of eager-loading, because for each attachment we will need its corresponding blob record.
Conclusion
GraphQL can be as efficient as REST, but requires approaching optimizations from a different angle. Instead of upfront optimizations, we lazy-load data only when required, loading it in batches to avoid excess trips to the database. In this article, we covered techniques to load single records, multiple records, and records with different types of relationships, as is the case with Active Storage which has a polymorphic relationship.