We’ve been working on integrating many different warehouse systems with the Shopify platform. All data exchange between them happens in Sidekiq background jobs. Generally, we want to be notified about the first occurrence of an error, so most exceptions are caught by Raven and sent to Sentry. However, some exceptions originated in the remote systems, for example connection issues, and after a few worker retries they resolved themselves without any additional action. In such cases we wanted Sidekiq workers to retry silently, without spamming our Slack channel with Sentry messages.
The bad news is that Sidekiq doesn’t expose the retry_count parameter inside a worker. Fortunately, Sidekiq provides a middleware mechanism that gives us access to job attributes, including retry_count. Raven, in turn, lets us set should_capture in its configuration: a Proc in which we can exclude our custom error from being reported.
Let’s start by registering our retry middleware.
Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add Middleware::Sidekiq::RetryMonitoring
  end
end
We want to delay notifications for some specific network/API errors, so let’s define an array that lists them.
SILENT_RETRY_ERRORS = [
  EOFError,
  Errno::ECONNRESET,
  Errno::EINVAL,
  Errno::ECONNREFUSED,
  Net::HTTPBadResponse,
  Net::HTTPHeaderSyntaxError,
  Net::ProtocolError,
  Net::SSH::Exception,
  Timeout::Error,
  SocketError,
  ActiveResource::ServerError,
  ActiveResource::TimeoutError
]
In the next step we define a module to be included in our workers. It helps us identify monitored workers and also lets us define retry_count_for_sentry there.
module Middleware::Sidekiq::RetryMonitoring::MonitoredWorker
  extend ActiveSupport::Concern

  included do
    def retry_count_for_sentry
      10
    end
  end
end
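To make the opt-in concrete, here is a minimal sketch of a monitored worker; the worker name and perform body (Warehouse::SyncProductWorker, sync!) are hypothetical placeholders, not part of our actual codebase:

# Hypothetical example: a Sidekiq worker becomes "monitored" simply by
# including the concern. Overriding retry_count_for_sentry is optional;
# without it, the default of 10 from the included block applies.
class Warehouse::SyncProductWorker
  include Sidekiq::Worker
  include Middleware::Sidekiq::RetryMonitoring::MonitoredWorker

  # Report to Sentry earlier than the default for this worker.
  def retry_count_for_sentry
    5
  end

  def perform(product_id)
    Warehouse::Product.find(product_id).sync!
  end
end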
Our custom error class:
module Sidekiq
  class SilentRetryError < StandardError; end
end
In our middleware, we rescue the errors for which we want to delay Sentry notifications. If the worker’s retry_count is lower than its retry_count_for_sentry, we replace the original exception with our custom one and raise Sidekiq::SilentRetryError; otherwise we re-raise the original. Either way an exception has to be raised, because without one Sidekiq would treat the job as completed and never retry it.
class Middleware::Sidekiq::RetryMonitoring
  def call(worker, job, queue)
    yield
  rescue *SILENT_RETRY_ERRORS => e
    if silent_error?(worker, job)
      # Keep the original class and message so nothing is lost in the wrapper.
      raise Sidekiq::SilentRetryError, [e.class, e.message].join(" ")
    else
      raise
    end
  end

  private

  def silent_error?(worker, job)
    # "retry_count" is absent on the first attempt and afterwards holds the
    # number of already performed retries, so we normalize it to the number
    # of the current retry.
    retry_count = job["retry_count"].present? ? job["retry_count"].to_i + 1 : 0
    worker.is_a?(Middleware::Sidekiq::RetryMonitoring::MonitoredWorker) &&
      retry_count < worker.retry_count_for_sentry
  end
end
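A minimal RSpec sketch (assuming RSpec and ActiveSupport are loaded alongside the classes above; the bare test double is hypothetical) illustrates the threshold behavior:

RSpec.describe Middleware::Sidekiq::RetryMonitoring do
  # A bare object that opts in via the concern; its retry_count_for_sentry
  # defaults to 10, as defined in MonitoredWorker's included block.
  let(:worker) do
    Class.new { include Middleware::Sidekiq::RetryMonitoring::MonitoredWorker }.new
  end

  it "wraps failures below the threshold in Sidekiq::SilentRetryError" do
    expect {
      subject.call(worker, { "retry_count" => 3 }, "default") { raise Timeout::Error }
    }.to raise_error(Sidekiq::SilentRetryError)
  end

  it "re-raises the original error once the threshold is reached" do
    expect {
      subject.call(worker, { "retry_count" => 9 }, "default") { raise Timeout::Error }
    }.to raise_error(Timeout::Error)
  end
end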
The last thing to do is to define should_capture for Raven: a Proc that skips reporting when the given exception (or message) is our custom Sidekiq::SilentRetryError.
Raven.configure do |config|
  config.should_capture = Proc.new do |message_or_exc|
    # Report everything except our wrapper error.
    !message_or_exc.is_a?(Sidekiq::SilentRetryError) &&
      message_or_exc.to_s.exclude?("Sidekiq::SilentRetryError")
  end
end
With this in place, Sentry is notified about one of these errors only after nine silent retries. That is exactly what we wanted: many jobs succeed after a few retries, so there is no need to flood Sentry/Slack with notifications from the very first failure.