Posted on Oct 27, 2019 [ 10 min read ]

Efficient GraphQL queries in Ruby on Rails & Postgres

In this article, we will discuss what N+1 queries are, how they are easily produced in GraphQL, and how to solve them using the graphql-batch gem along with a few custom batch loaders.

What are N+1 queries?

N+1 queries can occur when you have one-to-many relationships in your models. Each Event belongs to a Category. Let's say that you find the last five events and you want to get the category name for each of them.

Event.last(5).each { |event| puts event.category.name }

Seems simple enough! We unfortunately just produced six queries. The first query to find the events, and another query to find each category's name. This is an easy problem to solve in Rails by using eager-loading:

Event.includes(:category).last(5).each { |event| puts event.category.name }

By using the includes method we've been able to knock our queries down from six to two: The first to find the events, and the second to find the categories for those events.

N+1 queries in GraphQL

The query below will produce N+1 SQL queries as it finds the category for each event:

{
  events {
    id
    name
    category {
      id
      name
    }
  }
}

Optimizing GraphQL queries

Yes, we could solve the N+1 query in the previous example by eager-loading the category relationship, but if the user didn't actually want the category, why load it? We don't know what the user will ask for. There just so happens to be a better way, by lazy-loading data only as its needed using the graphql-batch gem.

Batch loading single records

The simplest case for batch loading data is the example of each event belonging to a category. Inside of our EventType class, there is a field called category which allows the user to access the category of an event

class Types::EventType < Types::BaseObject
  field :category, Types::CategoryType, null: false

  def category
    # avoid `object.category`
    RecordLoader.for(Category).load(object.category_id)
  end
end

By using the RecordLoader class to load the category, we actually avoid loading the category right away, and instead load all of the required categories with a single query. The query it ends up producing may end up looking like:

SELECT "categories".* FROM "categories" WHERE "categories"."id" IN ($1, $2, $3, $4, $5)

Looking at the RecordLoader class we can see how it works. The perform method will receive all of the ids for a single model (Category in this case), load the records in a single SQL query, and then call the fulfill method for each of them. The fulfill method resolves the promise, which is basically like putting a face to a name... you gave me an ID, and I've fulfilled my promise to provide you with the corresponding record.

class RecordLoader < GraphQL::Batch::Loader
  def initialize(model)
    @model = model
  end

  def perform(ids)
    # Find all ids for this model and fulfill their promises
    @model.where(id: ids).each { |record| fulfill(record.id, record) }
    # Handle cases where a record was not found and fulfill the value as nil
    ids.each { |id| fulfill(id, nil) unless fulfilled?(id) }
  end
end

We can write a test for this class to ensure that it finds the records correctly, keeping in mind that in order for the lazy/promise based code to function correctly it needs to be wrapped inside something called an executor.

describe RecordLoader do
  it 'loads' do
    event = create(:event)
    result = GraphQL::Batch.batch do
      RecordLoader.for(Event).load(event.id)
    end
    expect(result).to eq(event)
  end
end

Batch loading many records

We've covered the case where we are batch loading a single record at a time, but how do we handle the reverse scenario? We are displaying categories along with the first five events for each category, which would also produce an N+1 query, so let's see how we can solve it using a batch loader. The query we're discussing would look something like this:

{
  categories {
    id
    name
    events(first: 5) {
      id
      name
    }
  }
}

I have created a custom loader called ForeignKeyLoader for this purpose. It will load the events using the foreign key category_id. I also added the ability to pass a lambda to merge in additional scopes into the query that will be run.

class Types::CategoryType < Types::BaseObject
  field :events, [Types::EventType], null: false do
    argument :first, Int, required: false, default_value: 5
  end

  def events(first:)
    ForeignKeyLoader.for(Event, :category_id, merge: -> { order(id: :asc) }).
      load(object.id).then do |records|
        records.first(first)
      end
  end
end

The query that gets produced looks something like:

SELECT "events".*
FROM "events"
WHERE "events"."category_id" IN ($1, $2, $3, $4, $5)
ORDER BY "events"."id" ASC

Notice in this case that we call the then method to execute some code after the promise has been resolved. Here we see the first issue with this method... we only wanted five events for each category, but our query will load ALL events for each category, and then, using the first method on the resulting Array, narrow it down to only the first five events. If there are thousands of events, we could run into some serious issues.

class ForeignKeyLoader < GraphQL::Batch::Loader
  attr_reader :model, :foreign_key, :merge

  def self.loader_key_for(*group_args)
    # avoiding including the `merge` lambda in loader key
    # each lambda is unique which defeats the purpose of
    # grouping queries together
    [self].concat(group_args.slice(0,2))
  end

  def initialize(model, foreign_key, merge: nil)
    @model = model
    @foreign_key = foreign_key
    @merge = merge
  end

  def perform(foreign_ids)
    # find all the records
    scope = model.where(foreign_key => foreign_ids)
    scope = scope.merge(merge) if merge.present?
    records = scope.to_a

    foreign_ids.each do |foreign_id|
      # find the records required to fulfill each promise
      matching_records = records.select do |r|
        foreign_id == r.send(foreign_key)
      end
      fulfill(foreign_id, matching_records)
    end
  end
end

Batch loading many records more efficiently

It turns out that there is a way to perform a query that says "find me the first N records for each X" (find me the first 5 records for each category), and that involves using Postgres Window Functions. While researching this concept, this article about window functions was useful along with this article about bringing window functions into Rails.

The following query produces the data that we want... we just need to figure out how to write a batch loader that generates the same result.

SELECT "events".*
FROM (
  SELECT
    *,
    row_number() OVER (
      PARTITION BY category_id ORDER BY start_time desc
    ) as rank
  FROM "events"
  WHERE "events"."category_id" IN (1, 2, 3, 4, 5)
) as events
WHERE rank <= 5

For this we'll create a batch loader called WindowKeyLoader which is used like:

class Types::CategoryType < Types::BaseObject
  field :events, [Types::EventType], null: false do
    argument :first, Int, required: false, default_value: 5
  end

  def events(first:)
    WindowKeyLoader.for(Event, :category_id,
      limit: first,
      order_col: :start_time,
      order_dir: :desc
    ).load(object.id)
  end
end

You can see the difference already. I am no longer required to slice the first N array elements in the then block of the resolved promise. The actual batch loader class looks like:

class WindowKeyLoader < GraphQL::Batch::Loader
  attr_reader :model, :foreign_key, :limit, :order_col, :order_dir

  def initialize(model, foreign_key, limit:, order_col: :id, order_dir: :asc)
    @model = model
    @foreign_key = foreign_key
    @limit = limit
    @order_col = order_col
    @order_dir = order_dir
  end

  def perform(foreign_ids)
    # build the sub-query, limiting results by foreign key at this point
    # we don't want to execute this query but get its SQL to be used later
    ranked_from = model.
      select("*,
        row_number() OVER (
          PARTITION BY #{foreign_key} ORDER BY #{order_col} #{order_dir}
        ) as rank").
      where(foreign_key => foreign_ids).
      to_sql

    # use the sub-query from above to query records which have a rank
    # value less than or equal to our limit
    records = model.
      from("(#{ranked_from}) as #{model.table_name}").
      where("rank <= #{limit}").
      to_a

    # match records and fulfill promises
    foreign_ids.each do |foreign_id|
      matching_records = records.select do |r|
        foreign_id == r.send(foreign_key)
      end
      fulfill(foreign_id, matching_records)
    end
  end
end

We're able to test the WindowKeyLoader by creating three events for a category but only asking for the first two of them:

describe WindowKeyLoader do
  it 'loads' do
    category = create(:category)
    events = (1..3).to_a.map do |n|
      create(:event, name: "Event #{n}", category: category)
    end

    result = GraphQL::Batch.batch do
      WindowKeyLoader.for(
        Event,
        :category_id,
        limit: 2, order_col: :id, order_dir: :asc
      ).load(category.id)
    end

    expect(result).to eq(events.first(2))
  end
end

Batch loading active storage attachments

You may run into situations where you're loading polymorphic data, or other types of relationships which don't exactly fit into the mold of your standard has-many or belongs-to relationships. One case is with ActiveStorage. In the code below we'll load an image URL for an event:

class Types::EventType < Types::BaseObject
  field :image, String, null: true

  def image
    # produces 2N + 1 queries... yikes!
    # url_for(object.image.variant({ quality: 75 }))

    AttachmentLoader.for(:Event, :image).load(object.id).then do |image|
      url_for(image.variant({ quality: 75 }))
    end
  end
end

This data is stored using a polymorphic relationship that loads an ActiveStorage::Attachment record, which then needs to load an ActiveStorage::Blob record in order to produce the image url. It ends up producing a 2N + 1 query! Our AttachmentLoader is able to completely optimize this field by cutting it down to just two queries to load as many images as you'd like.

class AttachmentLoader < GraphQL::Batch::Loader
  attr_reader :record_type, :attachment_name

  def initialize(record_type, attachment_name)
    @record_type = record_type
    @attachment_name = attachment_name
  end

  def perform(record_ids)
    # find records and fulfill promises
    ActiveStorage::Attachment.
      includes(:blob).
      where(record_type: record_type, record_id: record_ids, name: attachment_name).
      each { |record| fulfill(record.record_id, record) }

    # fulfill unfound records
    record_ids.each { |id| fulfill(id, nil) unless fulfilled?(id) }
  end
end

In this case we are taking advantage of eager-loading, because for each attachment we will need its corresponding blob record.

Conclusion

GraphQL can be as efficient as REST, but requires approaching optimizations from a different angle. Instead of upfront optimizations, we lazy-load data only when required, loading it in batches to avoid excess trips to the database. In this article, we covered techniques to load single records, multiple records, and records with different types of relationships, as is the case with Active Storage which has a polymorphic relationship.

X
- +
B
A - Z
Copyleft 2024 Gritwerkz.

Twitter Email

Back to Top