Ruby: Count Articles by Category [Elastic Search]
Have you ever wanted to count how many articles will be left in a certain category after search? To give you an idea of what I mean, as an example let’s use the application that will be the outcome of this tutorial. Our database contains music albums which are categorized by the genre and subgenre.
Have you ever wanted to count how many articles will be left in a certain category after search? To give you an idea of what I mean, as an example let’s use the application that will be the outcome of this tutorial. Our database contains music albums which are categorized by the genre and subgenre.
After searching for, let’s say ‘Miles’ the album count in a category will change according to how many articles (albums in this case) meet the search requirement.
My solution utilizes Ruby on Rails elastic aggregations and Drapper. The best thing about it is that it doesn’t make additional requests to the database. Elastic makes the searching process much easier and faster. A solution on the basis of a database is much more time-consuming to develop and heavier on the server than mine.
The only downside I can think of is that you need to create an additional service.
Steps
I'll skip steps taken to create the app and to add some layout. I'll use controller articles with action index for listing and search.
Install prerequisites:
Elastic search or use docker image
Add these gems to the Gemfile:
gem 'elasticsearch-model'
gem 'elasticsearch-rails'
gem 'draper'
Github: 'drapergem/draper'
Bundle install
Add elastic initializer to point elastic host
config = {
host: 'http://elasticsearch:9200',
transport_options: {
request: { timeout: 5 }
}
}
Elasticsearch::Model.client = Elasticsearch::Client.new(config)
Create models:
rails g model category name:string
rails g model author name:string
rails g model article name:string authors:references
rails g model article_category article:references category:references
rake db:migrate
rails generate draper:install
Create a decorator for the Category:
rails generate decorator Category
Add attr_accessor :article_count
to the decorator of the class.
Create collection decorator
Add an apply_counts
method with a buckets_counts
argument.
class CategoriesDecorator < Draper::CollectionDecorator
def apply_counts(buckets_counts)
each { |obj| obj.article_count = buckets_counts[obj.id] }
end
end
Later this will allow to set the article_count
based on aggregations.
Add module Searchable
to concerns
module Searchable
extend ActiveSupport::Concern
included do
include Elasticsearch::Model
include Elasticsearch::Model::Callbacks
def index_document
__elasticsearch__.index_document
end
end
module ClassMethods
def recreate_index!
__elasticsearch__.create_index! force: true
__elasticsearch__.refresh_index!
end
end
end
The Article model
Include Searchable
:
Add associations:
has_many :article_categories
has_many :categories, through: :article_categories
Delegate author_name
:
delegate :name, to: :author, prefix: true
In the models
folder create Articles::Index
module to define the elastic index:
module Articles
module Index
extend ActiveSupport::Concern
included do
index_name "article-#{Rails.env}"
settings index: { number_of_shards: 1 } do
mappings dynamic: 'false' do
indexes :name, type: :text, analyzer: 'english'
indexes :author_name, type: :text, analyzer: 'english'
indexes :category_names, type: :text, analyzer: 'english'
indexes :category_ids, type: :integer
end
end
end
def as_indexed_json(*)
{
name: name,
author_name: author_name,
category_names: category_names,
category_ids: category_ids
}
end
private
def category_names
categories.pluck(:name).compact.uniq
end
def category_ids
categories.pluck(:id).compact.uniq
end
end
end
type: :text
is for thetext search.
indexes :category_ids
and type: :integer
will allow to aggregate the results by category.
Include the Articles::Index
in Article model.
Seed data
I've prepared the seed with some jazz albums with assigned jazz sub-genres.
jazz = Category.create(name: 'jazz')
fusion = Category.create(name: 'jazz fusion')
bebop = Category.create(name: 'bebop')
cool = Category.create(name: 'cool jazz')
author = Author.create(name: 'Miles Davis')
['Bitches Brew', 'A Tribute to Jack Johnson', 'Miles In The Sky', 'Pangaea'].each do |title|
article = Article.create(name: title, authors: author )
ArticleCategory.create(category: jazz, article: article)
ArticleCategory.create(category: fusion, article: article)
end
['Kind of Blue', 'Sketches Of Spain', 'Birth of the Cool', 'Porgy And Bess'].each do |title|
article = Article.create(name: title, authors: author )
ArticleCategory.create(category: jazz, article: article)
ArticleCategory.create(category: cool, article: article)
ArticleCategory.create(category: bebop, article: article)
end
author = Author.create(name: 'Sonny Rollins')
['Sonny Rollins With The Modern Jazz Quartet'].each do |title|
article = Article.create(name: title, authors: author )
ArticleCategory.create(category: jazz, article: article)
ArticleCategory.create(category: cool, article: article)
end
['Next Album', 'Easy Living', 'The Way I Feel ', "Don't Stop the Carnival"].each do |title|
article = Article.create(name: title, authors: author )
ArticleCategory.create(category: jazz, article: article)
ArticleCategory.create(category: fusion, article: article)
end
['Saxophone Colossus', 'Plus Three'].each do |title|
article = Article.create(name: title, authors: author )
ArticleCategory.create(category: jazz, article: article)
ArticleCategory.create(category: bebop, article: article)
end
author = Author.create(name: 'Chet Baker')
['Chet', 'My Funny Valentine'].each do |title|
article = Article.create(name: title, authors: author )
ArticleCategory.create(category: jazz, article: article)
ArticleCategory.create(category: cool, article: article)
end
author = Author.create(name: 'Paul Desmond')
['Feeling Blue', 'Bossa Antigua', "We're all together again"].each do |title|
article = Article.create(name: title, authors: author )
ArticleCategory.create(category: jazz, article: article)
ArticleCategory.create(category: cool, article: article)
end
author = Author.create(name: 'Dave Brubeck')
['Concord on a Summer Night', 'Time Further Out', "Time Out"].each do |title|
article = Article.create(name: title, authors: author )
ArticleCategory.create(category: jazz, article: article)
ArticleCategory.create(category: cool, article: article)
end
author = Author.create(name: 'The Mahavishnu Orchestra')
['Birds Of Fire', 'Between Nothingness & Eternity', 'The Inner Mounting Flame'].each do |title|
article = Article.create(name: title, authors: author )
ArticleCategory.create(category: jazz, article: article)
ArticleCategory.create(category: fusion, article: article)
end
Article.recreate_index!
Article.import
The last two lines in the seed create the index in elastic, so:
rake db:seed
Now we can take care of searching and aggregates.
Oh look! We're halfway through the post! Here's a picture of a cute kitten:
Add a simple class for the search form:
class SearchForm
include ActiveModel::Model
attr_reader :search_text
def initialize(search_text)
@search_text = search_text
end
end
Add search query object:
class SearchQuery
def initialize(search_form)
@search_form = search_form
end
def call
Article.search(search_text).records.to_a
end
private
attr_reader :search_form
delegate :search_text, to: :search_form
end
Search with elastic could be as simple as Article.search(search_text).records.to_a
but we can ask elastic to count something for us in one go.
To do this we will need a little bit more complex query which we'll prepare using elastic DSL and put as an argument to the search method.
All methods beneath are private.
Search definition object will do almost the same thing as the search above.
def query
{
size: 100,
from: 0,
query: simple_query
}
end
def match_all
{ match_all: {} }
end
def simple_query
return match_all if search_text.blank?
{
query_string: {
query: add_wildcards(search_text)
}
}
end
def add_wildcards(text)
text.split(' ').map { |el| "*#{el}*" }.join(' ')
end
Attributes :size
, :from
are for paging, default elastic page size is 10.
How to add aggregations? Using elastic DSL allows us to define aggs criteria:
def aggs_categories
{
by_categories:{
terms:{
field: :category_ids
}
}
}
end
And add them to search definition object:
def query
{
size: 100,
from: 0,
query: simple_query,
aggs: aggs_categories
}
end
Result of our query object at the end should look like a.e.
{
:size => 100,
:from => 0,
:query => {
:query_string => {
:query => "*miles*"
}
},
:aggs => {
:by_categories => {
:terms=> { :field =>: category_ids }
}
}
}
When we assign a search object to a search variable as search = Article.search(query)
then we check search.response
on search object which should look like this:
{
"took"=>62,
"timed_out"=>false,
"_shards"=>{"total"=>1, "successful"=>1, "skipped"=>0, "failed"=>0},
"hits"=> {
"total"=> 8,
"max_score" => 1.0,
"hits" => [
{"_index"=>"article-development", "_type"=>"article", "_id"=>"1", "_score"=>1.0, "_source"=>{"name"=>"Bitches Brew", "author_name"=>"Miles Davis", "category_names"=>["jazz", "jazz fusion"], "category_ids"=>[1, 2]}},
{"_index"=>"article-development", "_type"=>"article", "_id"=>"2", "_score"=>1.0, "_source"=>{"name"=>"A Tribute to Jack Johnson", "author_name"=>"Miles Davis", "category_names"=>["jazz", "jazz fusion"], "category_ids"=>[1, 2]}},
{"_index"=>"article-development", "_type"=>"article", "_id"=>"3", "_score"=>1.0, "_source"=>{"name"=>"Miles In The Sky", "author_name"=>"Miles Davis", "category_names"=>["jazz", "jazz fusion"], "category_ids"=>[1, 2]}},
{"_index"=>"article-development", "_type"=>"article", "_id"=>"4", "_score"=>1.0, "_source"=>{"name"=>"Pangaea", "author_name"=>"Miles Davis", "category_names"=>["jazz", "jazz fusion"], "category_ids"=>[1, 2]}},
{"_index"=>"article-development", "_type"=>"article", "_id"=>"5", "_score"=>1.0, "_source"=>{"name"=>"Kind of Blue", "author_name"=>"Miles Davis", "category_names"=>["jazz", "bebop", "cool jazz"], "category_ids"=>[1, 3, 4]}},
{"_index"=>"article-development", "_type"=>"article", "_id"=>"6", "_score"=>1.0, "_source"=>{"name"=>"Sketches Of Spain", "author_name"=>"Miles Davis", "category_names"=>["jazz", "bebop", "cool jazz"], "category_ids"=>[1, 3, 4]}},
{"_index"=>"article-development", "_type"=>"article", "_id"=>"7", "_score"=>1.0, "_source"=>{"name"=>"Birth of the Cool", "author_name"=>"Miles Davis", "category_names"=>["jazz", "bebop", "cool jazz"], "category_ids"=>[1, 3, 4]}},
{"_index"=>"article-development", "_type"=>"article", "_id"=>"8", "_score"=>1.0, "_source"=>{"name"=>"Porgy And Bess", "author_name"=>"Miles Davis", "category_names"=>["jazz", "bebop", "cool jazz"], "category_ids"=>[1, 3, 4]}}
]
},
"aggregations" => {
"by_categories" => {
"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>[
{"key"=>1, "doc_count"=>8},
{"key"=>2, "doc_count"=>4},
{"key"=>3, "doc_count"=>4},
{"key"=>4, "doc_count"=>4}
]
}
}
}
In aggregations
=> by_categories
we can find buckets
and that's what we're interested in! Key buckets
contain counts for category_ids
.
Extract them:
def buckets_categories_counts
@categories_counts ||= search.response
.deep_symbolize_keys[:aggregations][:by_categories][:buckets]
.map{ |bucket| OpenStruct.new(bucket) }
end
Map categories_ids
:
def buckets_categories_ids
buckets_categories_counts.map(&:key)
end
Prepare the bucket hash:
def buckets_hash
buckets_categories_counts.each_with_object({}) do |bucket, obj|
obj[bucket.key]= bucket.doc_count
end
end
Find categories, decorate the collection and apply counts:
def categories
::CategoriesDecorator.decorate(
Category.where(id: buckets_categories_ids)
).apply_counts(buckets_hash)
end
Update public method call:
def call
OpenStruct.new(
categories: categories,
articles: search.records.order('articles.name asc').includes(:author).to_a
)
end
We will return object with categories and articles.
Last step is to add some logic to ArticlesController
class ArticlesController < ApplicationController
def index
@search_form = SearchForm.new(search_text)
result = SearchQuery.new(@search_form).call
@articles = result.articles
@categories = result.categories
end
private
def search_text
params.dig(:search_form, :search_text)
end
end
Working app
That's all, you can check working example downloading repo:
Clone or download repo
Install docker if needed
Run
docker-compose build
docker-compose run web bundle install
docker-compose run web rake db:create db:migrate db:seed
Let's talk about Jamstack and headless e-commerce!
Contact us and we'll warmly introduce you to the vast world of Jamstack & headless development!