blog.sojoodi.com

October 31, 2007

Giftify Launch!

Filed under: Blogroll — Sahand @ 9:42 pm

Hello,

After many days of last-minute code clean-ups and gift description touch-ups (don’t even get me started on how horrible it was, but at least I automated the spell-checking and price-checking), we are launching Giftify tonight.

http://apps.facebook.com/giftify/

It’s a really exciting and really scary moment. The fun begins now.

Sahand

October 23, 2007

One-hot coded statuses in MySQL

Filed under: MySQL, Rails — Sahand @ 10:27 pm

So, I don’t remember if I mentioned my hardware engineering background before moving to the softer field. Anyhow, the low-level person inside me came up with this data model, inspired by the one-hot coding scheme of a state machine.

Suppose we are trying to keep track of the moods of people in a database and take actions based on combinations of these moods. Each mood gets its own bit, so a single integer can hold any combination of them. Here’s an alternative to using long logical statements such as “mood = 0 or mood = 1 or mood = 4 …”.

First, let’s define the model, Person.

# == Schema Information
#
# Table name: persons
#
#  id                  :integer(11)   not null, primary key
#  mood                :integer(64)
# ...

class Person < ActiveRecord::Base
  MOOD = [
      [:happy,         0x0001],
      [:sad,           0x0002],
      [:angry,         0x0004],
      [:excited,       0x0008],
      [:mellow,        0x0010],
      [:lost,          0x0020],
      [:refreshed,     0x0040],
      [:hopeful,       0x0080]
  ]
  # build a hash that maps each mood symbol to its bit value
  MOOD_HASH = Hash[*MOOD.flatten]

  # complex lookup constants
  # ...
end
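
To show how the encoding behaves outside the database, here's a minimal plain-Ruby sketch (no Rails needed). The helpers `encode_moods` and `decode_moods` are names I made up for this example; they are not part of the model above.

```ruby
# Stand-alone copy of the mood table, so this runs without Rails.
MOOD = [
  [:happy,     0x0001],
  [:sad,       0x0002],
  [:angry,     0x0004],
  [:excited,   0x0008],
  [:mellow,    0x0010],
  [:lost,      0x0020],
  [:refreshed, 0x0040],
  [:hopeful,   0x0080]
]
MOOD_HASH = Hash[*MOOD.flatten]

# Hypothetical helper: OR the bits of the given moods into one integer.
def encode_moods(*moods)
  moods.inject(0) { |mask, mood| mask | MOOD_HASH.fetch(mood) }
end

# Hypothetical helper: list the moods whose bit is set in the integer.
def decode_moods(value)
  MOOD.select { |_, bit| (value & bit) != 0 }.map { |name, _| name }
end

value = encode_moods(:happy, :hopeful)
puts value                        # 129 (0x0001 | 0x0080)
puts decode_moods(value).inspect  # [:happy, :hopeful]
```

Because every mood owns exactly one bit, combining moods is a bitwise OR and reading them back is a bitwise AND per bit.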

In order to find all the people who are in any of the moods happy, excited, mellow, refreshed, or hopeful, all you need to do is define a constant once (capturing your business logic):


  # this needs to be defined inside the Person class
  MOOD_POSITIVE = [:happy, :excited, :mellow, :refreshed, :hopeful].inject(0) do |mask, stat|
    mask | MOOD_HASH[stat]
  end

and perform the following query:


Person.find(:all, :conditions => ['mood & ?', Person::MOOD_POSITIVE])

Compare the cleanness and maintainability of the above against the following.


# MOOD_HASH already maps mood => bit value, so no inverted hash is needed
Person.find(:all, :conditions => [
    'mood = ? or mood = ? or mood = ? or mood = ? or mood = ?',
    Person::MOOD_HASH[:happy], Person::MOOD_HASH[:excited], Person::MOOD_HASH[:mellow],
    Person::MOOD_HASH[:refreshed], Person::MOOD_HASH[:hopeful]
  ])

If there were only one query like that in the whole system, we would be able to get around this awkward query in other ways. However, when you’re dealing with many complex queries I think this way helps you write more readable and maintainable code. Please, let me know your opinion if you disagree.
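
One subtlety worth spelling out: a bare `mood & mask` test is true when any of the masked bits is set, while requiring all of them means comparing the result against the mask itself. Here's a plain-Ruby sketch of the two tests. The sample people are made up, and `any_of?` and `all_of?` are names I picked for illustration.

```ruby
HAPPY, SAD, ANGRY, EXCITED = 0x0001, 0x0002, 0x0004, 0x0008

# Mask for the moods we are interested in.
MASK = HAPPY | EXCITED

# True when at least one masked bit is set (what `mood & ?` tests in SQL).
def any_of?(mood, mask)
  (mood & mask) != 0
end

# True only when every masked bit is set (`(mood & ?) = ?` in SQL).
def all_of?(mood, mask)
  (mood & mask) == mask
end

# Made-up sample data: name => mood bits.
people = {
  'alice' => HAPPY | EXCITED,
  'bob'   => HAPPY | SAD,
  'carol' => ANGRY
}

any = people.select { |_, mood| any_of?(mood, MASK) }.map { |name, _| name }
all = people.select { |_, mood| all_of?(mood, MASK) }.map { |name, _| name }

puts any.inspect  # ["alice", "bob"]
puts all.inspect  # ["alice"]
```

Either way the business rule lives in one mask constant, and only the comparison changes.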

One last note: it turns out that MySQL has native support for this kind of status coding via the SET column type. However, I couldn’t find a nice Rails port for this data type.

Feedback is more than welcome. Thanks.

October 21, 2007

Ruby Spell Checker

Filed under: Ruby — Sahand @ 9:04 pm

Inspired by Peter Norvig’s genius article, while learning Ruby back in August, I wrote this piece of code. Writing it made me realize how powerful (at least for prototyping) and intuitive Ruby is. I hope you enjoy it.

#this is a script that reads a text file, makes a histogram of all the
#words and then tells you the frequency of a random word of your choice
#In other words, it could be used as a spellchecker/suggestor

#http://snippets.dzone.com/posts/show/280
class String
  def swap!(a,b)
    self[a], self[b] = self[b], self[a]
    self
  end
  def swap(a,b)
    newword = self.dup
    newword[a], newword[b] = newword[b], newword[a]
    newword
  end
end

class Novel
  def initialize
    @number_of_words = 0
    @dictionary = Hash.new(0)
  end

  def add_word_to_dictionary(word)
    @number_of_words += 1
    @dictionary[word.downcase] += 1
  end

  def english_word?(word)
    @dictionary[word.downcase] != 0
  end

  def get_word_frequency(word)
    Float(@dictionary[word.downcase]) / Float(@number_of_words)
  end

  def read_novel(novel)
    IO.read(novel).scan(/\w+/).each {|word| add_word_to_dictionary word}
  end

  def correct_word(word)
    if english_word?(word)
      return word
    else
      perms = self.single_letter_insert(word)
      perms += self.swap_distance_one(word)
      perms += self.swap_distance_two(word)
      perms += self.single_letter_delete(word)

      unique_permutations = perms.uniq
      probabilities = unique_permutations.collect {|perm| get_word_frequency(perm)}
      unique_permutations.find_all {|perm| get_word_frequency(perm) > probabilities.max * 0.2}
    end
  end

  #these are the different permutations on a word (i.e. when misspelled)
  def single_letter_insert(word)
    perms = Array.new
    for i in 0..word.length
      perms += ('a'..'z').collect {|letter| word[0...i] + letter + word[i...word.length] }
    end
    perms
  end
  def single_letter_delete(word)
    (0...word.length).collect {|i| word[0...i]+word[(i+1)...word.length] }
  end
  def swap_distance_one(word)
    (0...(word.length - 1)).collect {|i| word.swap(i,i+1)}
  end
  def swap_distance_two(word)
    self.swap_distance_one(word).collect {|perm1| swap_distance_one(perm1)}.flatten
  end
end

thisnov = Novel.new
thisnov.read_novel('MarkTwain_AdventuresOfHuckleberryFinn.txt')
puts thisnov.english_word?("Michel")
puts thisnov.correct_word("te")
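
If you don't have the novel's text file handy, the same idea can be tried on a string held in memory. This is a trimmed-down sketch rather than the class above: it only generates single deletions, single insertions, and adjacent swaps, and it returns one best guess instead of a list. The sample text and the names `WORD_COUNTS`, `candidates`, and `suggest` are made up for the example.

```ruby
# Tiny made-up corpus standing in for the novel.
TEXT = 'the cat sat on the mat and the tea went cold'

# Histogram of lower-cased words; unknown words count as 0.
WORD_COUNTS = Hash.new(0)
TEXT.downcase.scan(/[a-z]+/).each { |word| WORD_COUNTS[word] += 1 }

# All strings one edit away: delete a letter, insert a letter,
# or swap two neighbouring letters.
def candidates(word)
  deletes = (0...word.length).map { |i| word[0...i] + word[(i + 1)..-1] }
  inserts = (0..word.length).map do |i|
    ('a'..'z').map { |c| word[0...i] + c + word[i..-1] }
  end.flatten
  swaps = (0...(word.length - 1)).map do |i|
    word[0...i] + word[i + 1, 1] + word[i, 1] + word[(i + 2)..-1]
  end
  (deletes + inserts + swaps).uniq
end

# Known words are returned as-is; otherwise pick the most frequent
# known candidate (nil if none is known).
def suggest(word)
  word = word.downcase
  return word if WORD_COUNTS[word] > 0
  known = candidates(word).select { |c| WORD_COUNTS[c] > 0 }
  known.max_by { |c| WORD_COUNTS[c] }
end

puts suggest('te')   # the
puts suggest('cta')  # cat
```

With a corpus this small the frequencies mean very little; the point is only to show the generate-then-rank shape of the approach.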

October 17, 2007

Simple Search Feature in Rails/MySQL

Filed under: MySQL, Rails, Ruby — Sahand @ 9:19 pm

Today, we decided that searching was a desirable feature for Giftify. So, I promised that we would have a search engine by the end of the day. I was able to get it done much faster than that, which is why I have time to write a post now.

One thing to note is that this is a very preliminary form of Search. The next iteration will definitely be acquiring Google! On a more realistic note, we will most likely use Ferret (a Ruby port of Lucene) at some point. Take a look here, here, and here.

But here’s the short version of what I did today:

First, the table and objects to be searched. The items in our catalog have these characteristics which I would like to make searchable: “name”, “description”, “categories”. Note that the objective of this exercise is to search for an item which has the search word somewhere in its name, description, or list of categories which it’s associated with.

We already have the table/model Item. Add another model for the lookup words and a migration for a join table for the has_and_belongs_to_many relationship between Item and LookupWord.

class Item < ActiveRecord::Base
  has_and_belongs_to_many :lookup_words
end

class LookupWord < ActiveRecord::Base
  has_and_belongs_to_many :items
end

At this point you should be able to populate your lookup table of words. Note that I put an index on the lookup_words table right off the bat, which was probably a bad idea since it slowed down insertions into the table. I suggest building the lookup_words table first and then indexing it on the actual words for faster lookups.

The following Ruby script (within the Rails environment) does the job:

require 'rubygems'
require File.dirname(__FILE__) + '/config/environment'
item_to_lookup_word_map = {}

Item.find(:all).each do |item|
  composite_search = item.name+" "+item.description+" "+item.categories

  # take all the words (alpha) and array-ize
  composite_search_array = composite_search.downcase.scan(/[a-z]+/)

  # remove all words that are less than 3 letters long, then de-duplicate
  composite_search_array.reject! {|w| w.size<3}
  composite_search_array.uniq!

  # add data to hash
  item_to_lookup_word_map[item.id] = composite_search_array
end

total = item_to_lookup_word_map.values.inject(0) {|t, val| t + val.size}
print "total number of search index items: #{total}"

LookupWord.destroy_all

# now do the deed
Item.find(:all).each do |item|
  item_to_lookup_word_map[item.id].each do |word|
    lw = LookupWord.find_or_create_by_name(word)
    lw.items << item
  end
end
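
I haven't shown the lookup side here, so below is a rough, Rails-free sketch of how it works in principle: the same tokenizing rules feed a hash from word to item ids, and a search is just a hash lookup. The sample items and the names `tokenize`, `build_index`, and `search` are invented for the illustration; in the real app I'd expect the lookup to go through `LookupWord` and its `items` association instead.

```ruby
# Made-up catalog: item id => searchable text (name, description, categories).
ITEMS = {
  1 => 'Red Scarf. A warm wool scarf. Clothing Winter',
  2 => 'Tea Set. Porcelain set for two. Kitchen',
  3 => 'Wool Socks. Warm socks. Clothing'
}

# Same rules as the script above: lower-case, letters only,
# drop words shorter than 3 letters, no duplicates.
def tokenize(text)
  text.downcase.scan(/[a-z]+/).reject { |w| w.size < 3 }.uniq
end

# Build word => [item ids], the in-memory analogue of the join table.
def build_index(items)
  index = Hash.new { |hash, word| hash[word] = [] }
  items.each do |id, text|
    tokenize(text).each { |word| index[word] << id }
  end
  index
end

# Return the ids of items containing the word (empty array if none).
def search(index, word)
  index.fetch(word.downcase, [])
end

INDEX = build_index(ITEMS)
puts search(INDEX, 'wool').inspect      # [1, 3]
puts search(INDEX, 'Clothing').inspect  # [1, 3]
puts search(INDEX, 'a').inspect         # []
```

This only matches whole words exactly, which is the same limitation the lookup table has.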

© 2007 Sahand Sojoodi