blog.sojoodi.com

October 21, 2007

Ruby Spell Checker

Filed under: Ruby — Sahand @ 9:04 pm

Inspired by Peter Norvig’s brilliant article on spelling correction, I wrote this piece of code while learning Ruby back in August. Writing it made me realize how powerful (at least for prototyping) and intuitive Ruby is. I hope you enjoy it.

# This script reads a text file, builds a histogram of all the words,
# and then reports the frequency of any word you choose.
# In other words, it can be used as a spellchecker/suggester.

#http://snippets.dzone.com/posts/show/280
class String
  def swap!(a,b)
    self[a], self[b] = self[b], self[a]
    self
  end
  def swap(a,b)
    newword = self.dup
    newword[a], newword[b] = newword[b], newword[a]
    newword
  end
end

class Novel
  def initialize
    @number_of_words = 0
    @dictionary = Hash.new(0)
  end

  def add_word_to_dictionary(word)
    @number_of_words += 1
    @dictionary[word.downcase] += 1
  end

  def english_word?(word)
    @dictionary[word.downcase] != 0
  end

  def get_word_frequency(word)
    Float(@dictionary[word.downcase]) / Float(@number_of_words)
  end

  def read_novel(novel)
    IO.read(novel).scan(/\w+/).each {|word| add_word_to_dictionary word}
  end

  def correct_word(word)
    if english_word?(word)
      return word
    else
      perms = self.single_letter_insert(word)
      perms += self.swap_distance_one(word)
      perms += self.swap_distance_two(word)
      perms += self.single_letter_delete(word)

      # keep the candidates whose frequency is within 20% of the best candidate's
      unique_permutations = perms.uniq
      best = unique_permutations.collect {|perm| get_word_frequency(perm)}.max
      unique_permutations.find_all {|perm| get_word_frequency(perm) > best * 0.2}
    end
  end

  #these are the different permutations on a word (i.e. when misspelled)
  def single_letter_insert(word)
    perms = Array.new
    for i in 0..word.length
      perms += ('a'..'z').collect {|letter| word[0...i] + letter + word[i...word.length] }
    end
    perms
  end
  def single_letter_delete(word)
    (0...word.length).collect {|i| word[0...i]+word[(i+1)...word.length] }
  end
  def swap_distance_one(word)
    (0...(word.length - 1)).collect {|i| word.swap(i,i+1)}
  end
  def swap_distance_two(word)
    self.swap_distance_one(word).collect {|perm1| swap_distance_one(perm1)}.flatten
  end
end

thisnov = Novel.new
thisnov.read_novel('MarkTwain_AdventuresOfHuckleberryFinn.txt')
puts thisnov.english_word?("Michel")
puts thisnov.correct_word("te")

October 17, 2007

Simple Search Feature in Rails/MySQL

Filed under: MySQL, Rails, Ruby — Sahand @ 9:19 pm

Today, we decided that searching was a desirable feature for Giftify. So, I promised that we would have a search engine by the end of the day. I got it done much sooner than that, which is why I have time to write a post now.

One thing to note is that this is a very preliminary form of search. The next iteration will definitely be acquiring Google! On a more realistic note, we will most likely use Ferret (Ruby implementation of Lucene) at some point. Take a look here, here, and here.

But here’s the short version of what I did today:

First, the table and objects to be searched. The items in our catalog have three characteristics that I would like to make searchable: “name”, “description”, and “categories”. The objective of this exercise is to find every item that has the search word somewhere in its name, its description, or the list of categories it is associated with.

We already have the table/model Item. Add another model for the lookup words and a migration for a join table for the has_and_belongs_to_many relationship between Item and LookupWord.

class Item < ActiveRecord::Base
  has_and_belongs_to_many :lookup_words
end

class LookupWord < ActiveRecord::Base
  has_and_belongs_to_many :items
end

At this point you should be able to populate your lookup table of words. Note that I put an index on the lookup_words table right off the bat, which was probably a bad idea since it slowed down insertions into the table. I suggest building the lookup_words table first and then indexing it on the actual words for faster lookups.

The following Ruby script (within the Rails environment) does the job:

require 'rubygems'
require File.dirname(__FILE__) + '/config/environment'
item_to_lookup_word_map = {}

Item.find(:all).each do |item|
  composite_search = item.name+" "+item.description+" "+item.categories

  # take all the words (alpha) and array-ize
  composite_search_array = composite_search.downcase.scan(/[a-z]+/).compact

  # remove all words that are less than 3 letters long, then drop duplicates
  composite_search_array.reject! {|w| w.size < 3}
  composite_search_array.uniq!

  # add data to hash
  item_to_lookup_word_map[item.id] = composite_search_array
end

total = item_to_lookup_word_map.values.inject(0) {|t, val| t + val.size}
puts "total number of search index items: #{total}"

LookupWord.destroy_all

# now do the deed
Item.find(:all).each do |item|
  item_to_lookup_word_map[item.id].each do |word|
    lw = LookupWord.find_or_create_by_name(word)
    lw.items << item
  end
end
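
The script above only builds the index, so here is a rough sketch of the lookup side in plain Ruby, with no Rails or MySQL involved. Everything in it is hypothetical: lookup_words, build_index, and search are names invented for this example, the catalog is fake, and requiring every query word to match is my own assumption (the goal stated above only covers a single search word). In the app itself the equivalent step would be a query through the LookupWord model.

```ruby
require 'set'

# Tokenize the same way as the indexing script: lowercase alphabetic runs,
# dropping words shorter than three letters and duplicates.
def lookup_words(text)
  text.downcase.scan(/[a-z]+/).reject { |w| w.size < 3 }.uniq
end

# Build word => set of item ids from a hash of id => searchable text.
def build_index(items)
  index = Hash.new { |hash, word| hash[word] = Set.new }
  items.each do |id, text|
    lookup_words(text).each { |word| index[word] << id }
  end
  index
end

# Return the sorted ids of items that contain every word in the query.
def search(index, query)
  words = lookup_words(query)
  return [] if words.empty?
  words.map { |w| index.key?(w) ? index[w] : Set.new }.reduce(:&).to_a.sort
end

# Fake catalog: id => name + description + categories joined into one string.
catalog = {
  1 => "Red wool scarf, winter accessories",
  2 => "Blue wool hat, winter accessories",
  3 => "Red ceramic mug, kitchen"
}
index = build_index(catalog)
puts search(index, "wool").inspect
puts search(index, "red wool").inspect
```

Because the query goes through the same tokenizer as the indexed text, words shorter than three letters can never match, which is worth remembering before someone searches for "TV".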

September 19, 2007

Secure PayPal buttons with OpenSSL

Filed under: Crypto, Ruby — Sahand @ 10:44 pm

Today, while integrating PayPal payments with our website, I was introduced to the world of OpenSSL. Actually, this admission is a little embarrassing, given that I have worked at a cryptography company before (Certicom)! But it was a long time ago and I was working on the really low-level optimizations, not the user interface.

In any case, this post contains all the useful links I came across as well as cool little tricks I learned along the way.

First off, you can use OpenSSL to generate your own private key and public certificate. The following is an example with PayPal parameters (RSA 1024 and X.509):


openssl genrsa -out my-prvkey.pem 1024
openssl req -new -key my-prvkey.pem -x509 -days 365 -out my-pubcert.pem

Secondly, in order to generate encrypted buttons for PayPal, and so hide all the information you are sending them, you need a simple public-key scheme: sign the button options with your own private key, then encrypt the result with PayPal's public certificate. For more details on how to submit your public certificate to PayPal and how to download theirs, go here. Also, for more information on PayPal button HTML options, refer to their website.

Assuming we have everything in place, I used the following lines of Ruby code to generate the encrypted button (fictitious data):

button_options_hash = {
  :cmd => "_xclick",
  :business => "sahand_blahblah@gmail.com",
  :item_name => "blahblah_item",
  :amount => "10",
  :item_number => "123456789",
  :shipping => "0.00",
  :no_note => "1",
  :return => "http://sojoodi.com/accepted",
  :cancel_return => "http://sojoodi.com/cancelled",
  :currency_code => "USD",
  :cert_id => "ABCDEFGHIJKLM"
}

ssl_command = "openssl smime -sign -signer my-pubcert.pem -inkey my-prvkey.pem " +
              "-outform der -nodetach -binary | openssl smime -encrypt -des3 -binary " +
              "-outform pem paypal_sandbox_cert.pem"
encryptor = IO.popen(ssl_command, "w+b")
button_options_hash.each { |i,j| encryptor.puts i.to_s+"="+j.to_s }
encryptor.close_write
@pp_button_encrypted_options = encryptor.readlines.join
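
The same sign-then-encrypt step can probably be done without shelling out, using Ruby's bundled OpenSSL bindings. What follows is a sketch under assumptions, not what I ran against PayPal: encrypt_button and self_signed_pair are hypothetical helpers, the certificates are throwaway ones generated on the spot (2048-bit, standing in for my-pubcert.pem, my-prvkey.pem, and paypal_sandbox_cert.pem), and the options are joined with newlines without a trailing one.

```ruby
require 'openssl'

# Hypothetical helper: a throwaway key and self-signed certificate, so the
# sketch runs without the PEM files used in the post.
def self_signed_pair(common_name)
  key = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = 1
  cert.subject = OpenSSL::X509::Name.parse("/CN=#{common_name}")
  cert.issuer = cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 365 * 24 * 60 * 60
  cert.sign(key, OpenSSL::Digest.new('SHA256'))
  [key, cert]
end

# Sign the name=value lines with our key, then encrypt the signed blob
# for the recipient certificate using triple DES, and return PEM text.
def encrypt_button(options, my_cert, my_key, paypal_cert)
  payload = options.map { |name, value| "#{name}=#{value}" }.join("\n")
  signed = OpenSSL::PKCS7.sign(my_cert, my_key, payload, [], OpenSSL::PKCS7::BINARY)
  cipher = OpenSSL::Cipher.new('des-ede3-cbc')
  OpenSSL::PKCS7.encrypt([paypal_cert], signed.to_der, cipher, OpenSSL::PKCS7::BINARY).to_pem
end

my_key, my_cert = self_signed_pair('merchant.example')
paypal_key, paypal_cert = self_signed_pair('paypal.example')
options = { :cmd => '_xclick', :amount => '10', :currency_code => 'USD' }
encrypted = encrypt_button(options, my_cert, my_key, paypal_cert)
puts encrypted.lines.first
```

In real use the two certificates and the private key would be loaded from the PEM files with OpenSSL::X509::Certificate.new and OpenSSL::PKey::RSA.new, and only PayPal would hold the key that can decrypt the result.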

There were two other very useful links that I used to get PayPal working with my app:
This was a short and sweet page on the Perl implementation. And this was a similar one using a Bash script. The examples provided on the official PayPal site were scary, so take a look at these two first for a morale boost.

Cheers!

September 14, 2007

Luhn algorithm for credit card validation

Filed under: Ruby — Sahand @ 3:58 pm

I came across an interesting article on algorithms in, of all places, The Economist. It briefly describes the Luhn algorithm for credit card validation. So I hacked together the following piece of Ruby code which does just that.

print "Enter card number: "
cc_number = gets.chomp.tr(' -','')
checksum = 0
double = false
cc_number.reverse.each_byte do |digit|
  dig = digit.chr.to_i
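  if double
    # doubling can give two digits; subtracting 9 adds them up (18 -> 9, where % 9 would give 0)
    dig *= 2
    dig -= 9 if dig > 9
  end
  checksum += dig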
  double = !double
end

if checksum%10 == 0
  puts "valid CC number"
else
  puts "Invalid"
end
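
For reuse outside an interactive prompt, the same check can be wrapped in a method. This is a minimal sketch: luhn_valid? is a name I made up, and rejecting anything that is not purely digits after stripping spaces and dashes is my own choice. The test numbers are widely published examples, not real cards.

```ruby
# Returns true when the digits in number pass the Luhn check.
# Spaces and dashes are ignored; any other non-digit makes it invalid.
def luhn_valid?(number)
  digits = number.delete(' -')
  return false unless digits =~ /\A\d+\z/

  sum = 0
  digits.reverse.each_char.with_index do |char, index|
    digit = char.to_i
    if index.odd?
      # every second digit from the right is doubled;
      # a two-digit result is reduced by 9 (e.g. 9 -> 18 -> 9)
      digit *= 2
      digit -= 9 if digit > 9
    end
    sum += digit
  end
  sum % 10 == 0
end

puts luhn_valid?("79927398713")
puts luhn_valid?("4111 1111 1111 1111")
puts luhn_valid?("79927398710")
```

The first number is a handy test case because it has a 9 in a doubled position, which is exactly where a shortcut like taking the doubled value modulo 9 goes wrong.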

For more interesting articles on this, see the following:

http://www.darkcoding.net/index.php/credit-card-numbers/
http://www.merriampark.com/anatomycc.htm

© 2007 Sahand Sojoodi