blog.sojoodi.com

December 2, 2007

Simple Search Feature in Rails/MySQL (Part 2)

Filed under: MySQL, Rails — Sahand @ 9:42 pm

I meant to make this post much earlier, but all the start-up fun kept me from doing it. The first part of the post was published on Oct 17 (Simple Search Feature) and covered my Do It Yourself solution for the catalog search feature in Giftify.

The problem with the original indexing scheme was that it created separate “lookup word” entry for words that were inclusive of one other. For example, it would be possible that the word “rose” appeared in the descriptions of gifts 1 and 3 and “rosemary” in 2 and 3. Note that the correct indexing scheme would dictate that all items of “rosemary” are a subset of those of “rose”.

Incomprehensive Indexing

 

One way of fixing this problem is using a query with the LIKE statement in SQL to get all LookupWords that are similar to the actual search term and then perform a complex query, which will have to be created dynamically based on the results of the LIKE query. The complexity of this solution, the shakiness of LIKE/DISTINCT constructs, and the fact that a lot of stuff needs to happen for every search led me to implement a better indexing scheme.

I have mentioned the solution above: when indexing, make sure that the items in “roses” are all included in the set of items associated with “rose”. This way when looking up the word “rose” in the catalog, all three items are returned.

The Ruby implementation of the improved indexing scheme follows. Note that the trivial case of plural words containing the singular ones is handled via pluralize and singularize functions in Rails.

$lookup_word_to_item_map = {}

Item.find(:all).each do |item|
  composite_search = item.name+" "+item.description+" "+item.categories_string

  # take all the words (alpha) and array-ize
  composite_search_array = composite_search.scan(/[a-z]+/).compact.collect {|w| w.singularize}

  # remove all words that are less than 3 letters long
  composite_search_array.collect! {|w| w unless w.size<3}
  composite_search_array.compact!
  composite_search_array.uniq!

  # add data to hash
  composite_search_array.each do |word|
    $lookup_word_to_item_map[word] << item.id unless $lookup_word_to_item_map[word].include? item.id
  end
end

# in ruby str1[str2] returns nil if str1 doesn't contain str2
$lookup_word_to_item_map.keys.each do |baseword|
  extra_items = $lookup_word_to_item_map.keys.collect do |biggerword|
    $lookup_word_to_item_map[biggerword] if biggerword[baseword]
  end
  $lookup_word_to_item_map[baseword] << extra_items
  $lookup_word_to_item_map[baseword].flatten!
  $lookup_word_to_item_map[baseword].compact!
  $lookup_word_to_item_map[baseword].uniq!
end

A quick note on efficiency. The indexing process becomes slower and the index table becomes larger as a result of this scheme, but the trade-off against making on-the-fly search faster is definitely worth it.

Hope this was useful and thanks for dropping by.

October 23, 2007

One-hot coded statuses in MySQL

Filed under: MySQL, Rails — Sahand @ 10:27 pm

So, I don’t remember if I mentioned my hardware engineering background before moving to the softer field. Anyhow, the low-level person inside me came up with this data model, inspired by the one-hot coding scheme of a state machine.

Assuming that we are trying to keep track of the moods of people in a database and do actions based on combinations of these moods. Here’s an alternative to using long logical statements such as “mood = 0 and mood = 1 and mood = 4 …”.

First, let’s define the model, Person.

# == Schema Information
#
# Table name: persons
#
#  id                  :integer(11)   not null, primary key
#  mood                :integer(64)
# ...

class Person < ActiveRecord::Base
  MOOD = [
      [:happy,         0x0001],
      [:sad,           0x0002],
      [:angry,         0x0004],
      [:excited,       0x0008],
      [:mellow,        0x0010],
      [:lost,          0x0020],
      [:refreshed,     0x0040],
      [:hopeful,       0x0080]
  ]
  # create an array
  MOOD_HASH = Hash[*MOOD.flatten]

  # complex lookup constants
  # ...
end

In order to find all the people who are happy, excited, mellow, refreshed, and hopeful, all you need to do is to define a constant once (capturing your business logic):


  # this needs to be
  MOOD_POSITIVE = [:happy, :excited, :mellow, :refreshed, :hopeful].inject(0) do |mask, stat|
    mask |= MOOD_HASH[stat]
  end

and perform the following query:


Person.find(:all, :condition => ['mood & ?', Person::MOOD_POSITIVE])

Compare the cleanness and maintainability of the above against the following.


MOOD_INV_HASH = Person::MOOD_HASH.invert
Person.find(:all, :condition => [
    'mood = ? and mood = ? and mood = ? and mood = ? and mood = ?',
    MOOD_INV_HASH[:happy], MOOD_INV_HASH[:excited], MOOD_INV_HASH[:mellow],
    MOOD_INV_HASH[:refreshed], MOOD_INV_HASH[:hopeful]
  ])

If there were only one query like that in the whole system, we would be able to get around this awkward query in other ways. However, when you’re dealing with many complex queries I think this way helps you write more readable and maintainable code. Please, let me know your opinion if you disagree.

One last note, it turns out that SQL has native support for this kind of status coding via SETs. However, I couldn’t find a nice Rails port for this data type.

Feedback is more than welcome. Thanks.

October 17, 2007

Simple Search Feature in Rails/MySQL

Filed under: MySQL, Rails, Ruby — Sahand @ 9:19 pm

Today, we decided that searching was a desirable feature for Giftify. So, I promised that we will have a search engine by the end of the day. I was able to get it done much faster, which is why I have time to make a post now.

One thing to note is that this is a very preliminary form of Search. The next iteration will definitely be acquiring Google! On a more realistic note, we will most likely use Ferret (Ruby implementation of Lucene) at some point. Take a look at here, here, and here.

But here’s short version of what I did today:

First, the table and objects to be searched. The items in our catalog have these characteristics which I would like to make searchable: “name”, “description”, “categories”. Note that the objective of this exercise is to search for an item which has the search word somewhere in its name, description, or list of categories which it’s associated with.

We already have the table/model Item. Add another model for the lookup words and a migration for a join table for the has_and_belongs_to_many relationship between Item and LookupWord.

class Item < ActiveRecord::Base
  has_and_belongs_to_many :lookup_words
end

class LookupWord < ActiveRecord::Base
  has_and_belongs_to_many :items
end

At this point you should be able to populate your lookup table of words. Note, that I put an index on the lookup_words table right off the bat which was probably a bad idea since it slowed down insertions into the table. I suggest building the lookup_words table first and then index it on the actual words for faster lookups.

The following Ruby script (within the Rails environment) does the job:

require 'rubygems'
require File.dirname(__FILE__) + '/config/environment'
item_to_lookup_word_map = {}

Item.find(:all).each do |item|
  composite_search = item.name+" "+item.description+" "+item.categories

  # take all the words (alpha) and array-ize
  composite_search_array = composite_search.downcase.scan(/[a-z]+/).compact

  # remove all words that are less than 3 letters long
  composite_search_array.collect! {|w| w unless w.size<3}
  composite_search_array.compact!
  composite_search_array.uniq!

  # add data to hash
  item_to_lookup_word_map[item.id] = composite_search_array
end

total = item_to_lookup_word_map.values.inject(0) {|t, val| t += val.size}
print "total number of search index items: #{total}"

LookupWord.destroy_all

# now do the deed
Item.find(:all).each do |item|
  item_to_lookup_word_map[item.id].each do |word|
    lw = LookupWord.find_or_create_by_name(word)
    lw.items << item
  end
end

© 2007 Sahand Sojoodi
Powered by WordPress