Simple Search Feature in Rails/MySQL (Part 2)
I meant to make this post much earlier, but all the start-up fun kept me from doing it. The first part of the post was published on Oct 17 (Simple Search Feature) and covered my Do It Yourself solution for the catalog search feature in Giftify.
The problem with the original indexing scheme was that it created separate “lookup word” entry for words that were inclusive of one other. For example, it would be possible that the word “rose” appeared in the descriptions of gifts 1 and 3 and “rosemary” in 2 and 3. Note that the correct indexing scheme would dictate that all items of “rosemary” are a subset of those of “rose”.

One way of fixing this problem is using a query with the LIKE statement in SQL to get all LookupWords that are similar to the actual search term and then perform a complex query, which will have to be created dynamically based on the results of the LIKE query. The complexity of this solution, the shakiness of LIKE/DISTINCT constructs, and the fact that a lot of stuff needs to happen for every search led me to implement a better indexing scheme.
I have mentioned the solution above: when indexing, make sure that the items in “roses” are all included in the set of items associated with “rose”. This way when looking up the word “rose” in the catalog, all three items are returned.
The Ruby implementation of the improved indexing scheme follows. Note that the trivial case of plural words containing the singular ones is handled via pluralize and singularize functions in Rails.
$lookup_word_to_item_map = {}
Item.find(:all).each do |item|
composite_search = item.name+" "+item.description+" "+item.categories_string
# take all the words (alpha) and array-ize
composite_search_array = composite_search.scan(/[a-z]+/).compact.collect {|w| w.singularize}
# remove all words that are less than 3 letters long
composite_search_array.collect! {|w| w unless w.size<3}
composite_search_array.compact!
composite_search_array.uniq!
# add data to hash
composite_search_array.each do |word|
$lookup_word_to_item_map[word] << item.id unless $lookup_word_to_item_map[word].include? item.id
end
end
# in ruby str1[str2] returns nil if str1 doesn't contain str2
$lookup_word_to_item_map.keys.each do |baseword|
extra_items = $lookup_word_to_item_map.keys.collect do |biggerword|
$lookup_word_to_item_map[biggerword] if biggerword[baseword]
end
$lookup_word_to_item_map[baseword] << extra_items
$lookup_word_to_item_map[baseword].flatten!
$lookup_word_to_item_map[baseword].compact!
$lookup_word_to_item_map[baseword].uniq!
end
A quick note on efficiency. The indexing process becomes slower and the index table becomes larger as a result of this scheme, but the trade-off against making on-the-fly search faster is definitely worth it.
Hope this was useful and thanks for dropping by.