blog.sojoodi.com

December 2, 2007

Simple Search Feature in Rails/MySQL (Part 2)

Filed under: MySQL, Rails — Sahand @ 9:42 pm

I meant to make this post much earlier, but all the start-up fun kept me from doing it. The first part of the post was published on Oct 17 (Simple Search Feature) and covered my Do It Yourself solution for the catalog search feature in Giftify.

The problem with the original indexing scheme was that it created separate “lookup word” entry for words that were inclusive of one other. For example, it would be possible that the word “rose” appeared in the descriptions of gifts 1 and 3 and “rosemary” in 2 and 3. Note that the correct indexing scheme would dictate that all items of “rosemary” are a subset of those of “rose”.

Incomprehensive Indexing

 

One way of fixing this problem is using a query with the LIKE statement in SQL to get all LookupWords that are similar to the actual search term and then perform a complex query, which will have to be created dynamically based on the results of the LIKE query. The complexity of this solution, the shakiness of LIKE/DISTINCT constructs, and the fact that a lot of stuff needs to happen for every search led me to implement a better indexing scheme.

I have mentioned the solution above: when indexing, make sure that the items in “roses” are all included in the set of items associated with “rose”. This way when looking up the word “rose” in the catalog, all three items are returned.

The Ruby implementation of the improved indexing scheme follows. Note that the trivial case of plural words containing the singular ones is handled via pluralize and singularize functions in Rails.

$lookup_word_to_item_map = {}

Item.find(:all).each do |item|
  composite_search = item.name+" "+item.description+" "+item.categories_string

  # take all the words (alpha) and array-ize
  composite_search_array = composite_search.scan(/[a-z]+/).compact.collect {|w| w.singularize}

  # remove all words that are less than 3 letters long
  composite_search_array.collect! {|w| w unless w.size<3}
  composite_search_array.compact!
  composite_search_array.uniq!

  # add data to hash
  composite_search_array.each do |word|
    $lookup_word_to_item_map[word] << item.id unless $lookup_word_to_item_map[word].include? item.id
  end
end

# in ruby str1[str2] returns nil if str1 doesn't contain str2
$lookup_word_to_item_map.keys.each do |baseword|
  extra_items = $lookup_word_to_item_map.keys.collect do |biggerword|
    $lookup_word_to_item_map[biggerword] if biggerword[baseword]
  end
  $lookup_word_to_item_map[baseword] << extra_items
  $lookup_word_to_item_map[baseword].flatten!
  $lookup_word_to_item_map[baseword].compact!
  $lookup_word_to_item_map[baseword].uniq!
end

A quick note on efficiency. The indexing process becomes slower and the index table becomes larger as a result of this scheme, but the trade-off against making on-the-fly search faster is definitely worth it.

Hope this was useful and thanks for dropping by.

1 Comment »

  1. Just to show you how ugly the original query using LIKE would be:
    (note that I don’t claim to be a SQL guru, so there may be a better way.. please let me know)
    SELECT * FROM items AS i INNER JOIN lookup_words AS luw INNER JOIN items_lookup_words AS iluw ON luw.name LIKE ‘%rose%’ and iluw.item_id=i.id and iluw.lookup_word_id=luw.id;

    Comment by Sahand — December 3, 2007 @ 12:03 am

RSS feed for comments on this post. TrackBack URL

Leave a comment

© 2007 Sahand Sojoodi
Powered by WordPress