Mastering Timestamping in Rails
UPD: 31/03/2024 I've opened a PR to add a touch option to the #update_columns and #update_column methods.
In modern web development, precise and efficient data management underpins informed business decisions.
Data consistency, particularly in how database records are timestamped, plays a key role. Data engineers often rely on these timestamps not only for record-keeping but also to download only the data that has changed, avoiding the need to process entire tables.
This guide addresses the inconsistencies in how Rails handles timestamps. It explores strategies for keeping timestamps accurate and dependable. Whether you're new to the field or looking for a fix to a specific timestamping problem, this article offers practical tips for improving your data management practices.
TLDR
Not all ActiveRecord persistence methods affect timestamps or have a touch option. For methods like update_columns that don't automatically update timestamps, you can create a custom RuboCop cop, modify ActiveRecord directly, or use database triggers to keep updated_at always up-to-date.
Each approach has its advantages and disadvantages, from how easy it is to manage to potential compatibility issues with future Rails updates or the risk of raw SQL operations bypassing ActiveRecord entirely.
If you're interested in this topic, join the Rails Discussion thread https://discuss.rubyonrails.org/t/proposal-add-touch-option-for-update-columns-update-column/85388
ActiveRecord timestamps configuration
ActiveRecord automatically timestamps create and update operations if the table has fields named created_at/created_on or updated_at/updated_on.
For turning off timestamping, add:
config.active_record.record_timestamps = false # true is default
Timestamps are in UTC by default, but you can use the local timezone by setting:
config.active_record.default_timezone = :local # is :utc by default
ActiveRecord keeps all the datetime and time columns timezone aware. By default, these values are stored in the database as UTC and converted back to the current Time.zone when pulled from the database.
This feature can be turned off completely by setting:
config.active_record.time_zone_aware_attributes = false # is true by default
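To illustrate the store-as-UTC, present-in-zone round trip described above, here is a minimal plain-Ruby sketch. It does not use ActiveRecord or Time.zone; present_in_zone is a hypothetical helper, and the fixed +02:00 offset merely stands in for whatever zone an application might configure.

```ruby
require "time"

# Hypothetical helper: converts a value stored as UTC into a fixed-offset zone.
# This only imitates the idea; ActiveRecord itself uses Time.zone.
def present_in_zone(stored_utc, offset)
  stored_utc.getlocal(offset)
end

# The value as it would sit in the database: always UTC.
stored = Time.utc(2024, 3, 31, 12, 0, 0)

# The value as the application would see it two hours ahead of UTC.
presented = present_in_zone(stored, "+02:00")

puts stored.iso8601      # 2024-03-31T12:00:00Z
puts presented.iso8601   # 2024-03-31T14:00:00+02:00

# Both objects denote the same instant; only the presentation differs.
puts stored == presented # true
```

The point of the sketch is that the conversion never changes the instant itself, which is why timestamps stored in UTC stay comparable across zones.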
ActiveRecord persistence methods and touching timestamps
There are a lot of persistence methods in ActiveRecord, but not all of them touch timestamps or have a touch option. You can find most of them in the table below.
Module#method | updates timestamps if record_timestamps == true | has touch option |
Persistence#save | yes | yes |
Persistence#save! | yes | yes |
Persistence#create | yes | no |
Persistence#create! | yes | no |
Persistence#update | yes | no |
Persistence#update! | yes | no |
Persistence#update_attribute | yes | no |
Persistence#touch | yes | no |
Persistence#increment! | no | yes |
Persistence#update_column | no | no |
Persistence#update_columns | no | no |
Persistence#toggle! | yes | no |
Persistence#insert | yes | yes, via record_timestamps keyword argument |
Persistence#insert! | yes | yes, via record_timestamps keyword |
Persistence#insert_all | yes | yes, via record_timestamps keyword |
Persistence#upsert_all | yes | yes, via record_timestamps keyword |
Relation#update_all | no | no |
Relation#touch_all | yes | yes, via positional arguments |
Relation#update_counters | no | yes |
As you can see, three methods neither update timestamps by default nor provide a touch option: update_column, update_columns, and update_all. This can be a problem, e.g., when an ETL process looks at updated_at timestamps instead of copying the whole table. If someone uses update_columns for performance reasons, the ETL process may miss those updates. However, there are a couple of ways to solve this problem.
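To make the lost-update risk concrete, here is a minimal plain-Ruby sketch of an incremental extract that relies only on updated_at. Row and changed_since are hypothetical names invented for this illustration; no database or ActiveRecord is involved.

```ruby
# Hypothetical in-memory stand-in for a table row.
Row = Struct.new(:id, :last_ip, :updated_at)

# Returns the rows an incremental ETL job would pick up:
# everything whose updated_at is newer than the last sync.
def changed_since(rows, watermark)
  rows.select { |row| row.updated_at > watermark }
end

t0 = Time.utc(2024, 3, 31, 12, 0, 0)
rows = [Row.new(1, "10.0.0.1", t0), Row.new(2, "10.0.0.2", t0)]
watermark = t0 + 60 # time of the last successful sync

# Row 1 changes and its timestamp is bumped (as #update would do).
rows[0].last_ip = "10.0.0.9"
rows[0].updated_at = t0 + 120

# Row 2 changes but keeps its old timestamp (as #update_columns would do).
rows[1].last_ip = "10.0.0.8"

p changed_since(rows, watermark).map(&:id) # => [1]
```

Row 2 has changed, yet the extract never sees it, because nothing moved its updated_at past the watermark.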
Let's consider a pretty basic example.
class ApplicationController < ActionController::Base
  before_action :update_last_user_ip

  def update_last_user_ip
    ip = request.remote_ip
    # nothing to do if the IP hasn't changed
    return if current_user.last_ip == ip

    # we don't perform any callbacks or validations here,
    # so use #update_columns
    current_user.update_columns(
      last_ip: ip,
      # we still want to track the last change, so provide the timestamp explicitly
      updated_at: Time.current
    )
  end
end
Here, we aim to update the user's last seen IP in their record without triggering any validations or callbacks. The simplest method for this is using the #update_columns method. However, to ensure the timestamp remains current, we must explicitly include updated_at. What issues might arise from this approach?
Several, including:
- Remembering that #update_columns does not update timestamps, a behavior that is documented but might still catch you off guard.
- The need to explicitly set updated_at/updated_on.
- The absence of a touch option, unlike what you find in methods like #increment!.
- The record_timestamps setting does not affect its timestamp behavior.
So, what can we do if we want to consistently update timestamps across the application? There are a few solutions.
RuboCop Cop
RuboCop lets you write your own custom cops. Create a new file for the custom cop, which we'll name UpdateColumnsCop, and put it in a folder where you keep custom cops. A usual spot for this is lib/rubocop/cop/; the example below uses rubocop/cop/rails/ in the project root.
Here's a simple setup for your custom cop:
# rubocop/cop/rails/update_columns_timestamps.rb
module RuboCop
  module Cop
    module Rails
      class UpdateColumnsCop < RuboCop::Cop::Base
        extend RuboCop::Cop::AutoCorrector

        MSG = "Ensure `updated_at` or `updated_on` is updated when using `update_columns`"
        TIMESTAMP_KEYS = %i[updated_at updated_on].freeze

        def_node_matcher :update_columns?, <<~PATTERN
          (send _ {:update_columns} ...)
        PATTERN

        def on_send(node)
          return unless update_columns?(node)

          # Check whether `updated_at` or `updated_on` is among the literal hash keys
          timestamp_given = node.arguments.any? do |arg|
            arg.hash_type? && arg.pairs.any? do |pair|
              key = pair.key
              (key.sym_type? || key.str_type?) && TIMESTAMP_KEYS.include?(key.value.to_sym)
            end
          end
          return if timestamp_given

          add_offense(node, message: MSG) do |corrector|
            # Append after the last literal pair; inserting right after the
            # method name would produce invalid Ruby
            last_pair = node.last_argument.pairs.last if node.last_argument&.hash_type?
            corrector.insert_after(last_pair, ", updated_at: Time.current") if last_pair
          end
        end
      end
    end
  end
end
To make RuboCop aware of your custom cop, you need to register it. Create a .rubocop.yml file in your project root if you don't already have one, and add the following configuration:
require:
  - ./rubocop/cop/rails/update_columns_timestamps.rb

Rails/UpdateColumnsCop:
  Enabled: true
It reports a lint offense when update_columns is used without an updated_at or updated_on attribute:
app/controllers/application_controller.rb:10:5: C: [Correctable] Rails/UpdateColumnsCop: Ensure updated_at or updated_on is updated when using update_columns
current_user.update_columns( ...
Pros:
- Identifies violations without changing behavior
- Can be modified or ignored like a standard RuboCop cop
Cons:
- It can't handle update_column because it doesn't offer an option for timestamps. This requires an additional rule that completely discourages the use of update_column in favor of update_columns.
- It doesn't address every situation. For instance, using raw SQL might still bypass updating the updated_at field.
Monkey patch ActiveRecord update_column, update_columns
This method, inspired by Tim McCarthy's gist (with Unathi Chonco as an original author), includes a few modifications for safer patching and extra features.
Add an initializer for patches
# config/initializers/core_ext_require.rb
# NOTE: Require all patches in lib/core_ext
Dir[Rails.root.join("lib/core_ext/**/*.rb")].each { |f| require f }
Add a patch for the Persistence module
# lib/core_ext/active_record/persistence/update_columns_patch.rb
module CoreExt
  module ActiveRecord
    module Persistence
      module UpdateColumnsPatch
        # https://github.com/rails/rails/blob/36c1591bcb5e0ee3084759c7f42a706fe5bb7ca7/activerecord/lib/active_record/persistence.rb#L931-L954
        def update_columns(attributes)
          touch = attributes.delete(:touch) { self.class.record_timestamps }

          if touch
            names = touch if touch != true
            names = Array.wrap(names)
            options = names.extract_options!
            touch_updates = self.class.touch_attributes_with_time(*names, **options)
            attributes.merge!(touch_updates) unless touch_updates.empty?
          end

          super(attributes)
        end

        # https://github.com/rails/rails/blob/36c1591bcb5e0ee3084759c7f42a706fe5bb7ca7/activerecord/lib/active_record/persistence.rb#L910-L913
        def update_column(name, value, touch: true)
          update_columns(name => value, :touch => touch)
        end
      end
    end
  end
end

ActiveRecord::Persistence.prepend(CoreExt::ActiveRecord::Persistence::UpdateColumnsPatch)
This patch mimics the behavior of the #save method: it updates timestamps by default and introduces a touch: option to choose whether to skip the update. It also respects the record_timestamps setting, both globally and at the model level. With this change, we can simplify our example as follows:
class ApplicationController < ActionController::Base
  before_action :update_last_user_ip

  def update_last_user_ip
    ip = request.remote_ip
    # nothing to do if the IP hasn't changed
    return if current_user.last_ip == ip

    # we don't perform any callbacks or validations here,
    # so use #update_columns
    current_user.update_columns(last_ip: ip)
  end
end
#update_columns now automatically updates the updated_at field. However, if you need to avoid updating it for some reason, you can explicitly use the touch option:
current_user.update_columns(last_ip: request.remote_ip, touch: false)
Also, if attribute names are provided, they are updated together with the updated_at/updated_on attributes, similar to how #update_counters works.
current_user.update_columns(
  last_ip: request.remote_ip,
  touch: :last_ip_updated_at
)
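The mechanics behind the three touch variants above (default, touch: false, and touch with attribute names) can be tried without Rails. The following is a minimal plain-Ruby sketch of the same prepend-and-super technique. FakeRecord and TouchPatch are hypothetical names, and the sketch simplifies the real patch: it always defaults to touching and does not consult record_timestamps or touch_attributes_with_time.

```ruby
# Hypothetical stand-in for a model; stores attributes in a plain hash.
class FakeRecord
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  # Mimics the unpatched behavior: writes the given columns and nothing else.
  def update_columns(attrs)
    @attributes.merge!(attrs)
    true
  end
end

# Hypothetical patch module, prepended so that `super` reaches the original.
module TouchPatch
  def update_columns(attrs)
    attrs = attrs.dup                     # leave the caller's hash untouched
    touch = attrs.delete(:touch) { true } # the block supplies the default

    if touch
      now = Time.now.utc
      extra = touch == true ? [] : Array(touch)
      ([:updated_at] + extra).each { |name| attrs[name] = now }
    end

    super(attrs)
  end
end

FakeRecord.prepend(TouchPatch)

record = FakeRecord.new(last_ip: "10.0.0.1")
record.update_columns(last_ip: "10.0.0.2", touch: :last_ip_updated_at)
p record.attributes.keys # => [:last_ip, :updated_at, :last_ip_updated_at]
```

Unlike the real patch, this sketch duplicates the incoming hash before deleting :touch, so the caller's hash is not modified; that is a design choice of the sketch, not something the original code does.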
Pros:
- An ad-hoc solution that's easy to manage and adjust.
- Behaves similarly to what we're used to with most methods in the Persistence module.
Cons:
- Involves monkey patching, which might break in future Rails updates.
- Doesn't address all scenarios, for example, using raw SQL might still bypass updating the updated_at field.
If you think these changes are worth including in Rails, please join the Rails Discussion and leave a comment: https://discuss.rubyonrails.org/t/proposal-add-touch-option-for-update-columns-update-column/85388
Database triggers
If you need to update timestamps on each insert or update, even with raw SQL, you should use database triggers. Database triggers are pieces of procedural code that run in response to specific events in a database. For updating timestamps, this could be an UPDATE SQL statement.
First, we'll create the trigger function. This function is triggered whenever an update operation happens on a table. It will automatically update the updated_at column to the current timestamp.
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
This function, update_updated_at_column, is a simple PL/pgSQL function that sets the updated_at column of the new row (NEW) to the current timestamp (NOW()).
Next, you need to create a trigger for each table you want to track updates on. Here's how you can create a trigger for a specific table, let's say your_table_name:
CREATE TRIGGER update_your_table_name_trigger
BEFORE UPDATE ON your_table_name
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();
This trigger, update_your_table_name_trigger, is set to execute before any update operation on your_table_name. It calls the update_updated_at_column function, which updates the updated_at column.
To handle both insert and update events for setting created_at and updated_at timestamps, you'll need two separate triggers, one per event type, which can share a single function. The insert trigger sets both created_at and updated_at to the current timestamp. The update trigger sets only the updated_at column to the current timestamp:
CREATE OR REPLACE FUNCTION update_created_updated_at_columns()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    NEW.created_at = NOW();
    NEW.updated_at = NOW();
  ELSIF TG_OP = 'UPDATE' THEN
    NEW.updated_at = NOW();
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
This function checks the operation type (TG_OP) to determine if it's an insert or update operation. For inserts, it sets both created_at and updated_at to the current timestamp. For updates, it only updates the updated_at column.
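Now, create the triggers for both insert and update events. If you already created update_your_table_name_trigger in the previous step, drop it first, since a trigger name must be unique within a table: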
-- Trigger for INSERT
CREATE TRIGGER insert_your_table_name_trigger
BEFORE INSERT ON your_table_name
FOR EACH ROW
EXECUTE FUNCTION update_created_updated_at_columns();
-- Trigger for UPDATE
CREATE TRIGGER update_your_table_name_trigger
BEFORE UPDATE ON your_table_name
FOR EACH ROW
EXECUTE FUNCTION update_created_updated_at_columns();
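Since every table needs its own pair of triggers, the statements above can be generated from the table name. Below is a minimal plain-Ruby sketch, assuming the update_created_updated_at_columns function already exists in the database. timestamp_trigger_sql is a hypothetical helper that only builds SQL strings; it does not talk to a database.

```ruby
# Hypothetical helper: builds the two CREATE TRIGGER statements for one table.
# Assumes update_created_updated_at_columns() was created beforehand.
def timestamp_trigger_sql(table_name)
  # Accept only plain lowercase identifiers so the name is safe to interpolate.
  unless table_name.match?(/\A[a-z_][a-z0-9_]*\z/)
    raise ArgumentError, "unsupported table name: #{table_name.inspect}"
  end

  %w[INSERT UPDATE].map do |event|
    <<~SQL
      CREATE TRIGGER #{event.downcase}_#{table_name}_trigger
      BEFORE #{event} ON #{table_name}
      FOR EACH ROW
      EXECUTE FUNCTION update_created_updated_at_columns();
    SQL
  end
end

puts timestamp_trigger_sql("users")
```

In a Rails app, strings like these could be passed to execute inside a migration, although a dedicated tool is likely the more maintainable route.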
To make triggers easier to manage within Rails, consider tools like fx or hair_trigger.
Pros:
- Always up-to-date created_at/updated_at timestamps
Cons:
- Triggers are difficult to manage
- You need to add triggers for each new table where you want to keep the timestamps current
- Triggers make the app's behavior less obvious, and they take control away from the app in cases where you don't want to update timestamps