Faster Rails: How to Check if a Record Exists
Ruby and Rails are slow — this argument is often used to downplay the worth of the language and the framework. This statement in itself is not false. Generally speaking, Ruby is slower than its direct competitors such as Node.js and Python. Yet, many businesses from small startups to platforms with millions of users use it as the backbone of their operations. How can we explain these contradictions?
What makes your application slow?
While there can be many reasons behind making your application slow, database queries usually play the biggest role in an application's performance footprint. Loading too much data into memory, N+1 queries, lack of cached values, and the lack of proper databases indexes are the biggest culprits that cause slow requests.
There are some legitimate domains where Ruby is simply too slow. However, most of the slow responses in our applications usually boil down to unoptimized database calls and the lack of proper caching.
Even if your application is blazing fast today, it can become much slower in only several months. API calls that worked just fine can suddenly start killing your service with a dreaded HTTP 502 response. After all, working with a database table with several hundred records is very different from working with a table that has millions of records.
Existence checks in Rails
Existence checks are probably the most common calls that you send to your database. Every request handler in your application probably starts with a lookup, followed by a policy check that uses multiple dependent lookups in the database.
However, there are multiple ways to check the existence of a database record in Rails. We have present?
, empty?
, any?
, exists?
, and various other counting-based approaches, and they all have vastly different performance implications.
Let's run a simple test to get Build
records of past 7 days. Let's observe the database calls.
Build.where(:created_at => 7.days.ago..1.day.ago).passed.present?
# SELECT "builds".* FROM "builds" WHERE ("builds"."created_at" BETWEEN
# '2017-02-22 21:22:27.133402' AND '2017-02-28 21:22:27.133529') AND
# "builds"."result" = $1 [["result", "passed"]]
Build.where(:created_at => 7.days.ago..1.day.ago).passed.any?
# SELECT COUNT(*) FROM "builds" WHERE ("builds"."created_at" BETWEEN
# '2017-02-22 21:22:16.885942' AND '2017-02-28 21:22:16.886077') AND
# "builds"."result" = $1 [["result", "passed"]]
Build.where(:created_at => 7.days.ago..1.day.ago).passed.empty?
# SELECT COUNT(*) FROM "builds" WHERE ("builds"."created_at" BETWEEN
# '2017-02-22 21:22:16.885942' AND '2017-02-28 21:22:16.886077') AND
# "builds"."result" = $1 [["result", "passed"]]
Build.where(:created_at => 7.days.ago..1.day.ago).passed.exists?
# SELECT 1 AS one FROM "builds" WHERE ("builds"."created_at" BETWEEN
# '2017-02-22 21:23:04.066301' AND '2017-02-28 21:23:04.066443') AND
# "builds"."result" = $1 LIMIT 1 [["result", "passed"]]
The first call that uses .present?
is very inefficient. It loads all the records from the database into memory, constructs the Active Record objects, and then finds out if the array is empty or not. In a huge database table, this can cause havoc and potentially load millions of records, that can even lead to downtimes in your service.
The second and third approaches, any?
and empty?
, are optimized in Rails and load only COUNT(*)
into the memory. COUNT(*)
queries are usually efficient, and you can use them even on semi-large tables without any dangerous side effects.
The third approach, exists?
, is even more optimized, and it should be your first choice when checking the existence of a record. It uses the SELECT 1 ... LIMIT 1
approach, which is very fast.
Here are some numbers from our production database for the above queries:
present? => 2892.7 ms
any? => 400.9 ms
empty? => 403.9 ms
exists => 1.1 ms
This small tweak can make your code up to 400 times faster in some cases.
If you take into account that 200ms is considered the upper limit for an acceptable response time, you will realize that this tweak can spell the difference between a good, sluggish, and bad user experience.
Should I always use exists?
I consider exists?
a good sane default that usually has the best performance footprint. However, there are some exceptions.
For example, if we are checking for the existence of an association record without any scope, any?
and empty?
will also produce a very optimized query that uses SELECT 1 FROM ... LIMIT 1
form, but any? fill not hit the database again if the records are already loaded into memory.
This makes any?
faster by one whole database call when the records are already loaded into memory:
project = Project.find_by_name("gritwerkz")
project.builds.load # eager loads all the builds into the association cache
project.builds.any? # no database hit
project.builds.exists? # hits the database
# if we bust the association cache
project.builds(true).any? # hits the database
project.builds(true).exists? # hits the database
As a conclusion, my general advice is to always use exists? and improve the code based on metrics.