Sunday, January 29, 2012

Excel Automation With Ruby

Recently, I was asked to develop a tool that kind of processes a data in ASCII file and generates a report in Microsoft Excel while using another excel file as a reference database (for lookup tables etc). I had worked with Apache POI libraries for excel and such before and didn't quite like the level of abstraction. So I decided to give ruby a shot, as it happens to be my second favorite programming language (and first favorite scripting language).
Writing to excel files is no big deal, but I thought it would be cool to be able to send sql like queries to excel sheet and get the result set in return. Apparently, there's a way to do it in Ruby and the database driver that lets us do that is ADO. This is an excellent blog that hosts a treasure of lessons about how to do cool stuff with Ruby. Pretty much everything you need can be found there and so, I am not going to re-post it here. However, here are a few very basic things to get you started and help you to avoid pitfalls:

1. Install Ruby interpreter for windows from here. Ruby windows installer comes with 'win32ole', so you don't need to download it somewhere else. At this point, I would suggest you to avoid the temptation of downloading 1.9x ruby release because there's no easy way to get the ruby debugger to work on this version yet ( as of 01/29/2011). I'll talk about this in a little more detail in the debugger installation section.

2. Install Dev-Kit from here (the same place as above). Dev-Kit installation is really simple, do not succumb to the temptation of not installing it. At some point, you're going to need the ruby debugger which needs native compilation and for which you need Dev-Kit.

3. Install gem called spreadsheet using the usual gem install command as follows:
gem install spreadheet

This is all you need as far as the spreadsheet part is concerned. Some useful code lines on how to use this gem is given on the blog is mentioned above.

4. Ruby Debugger: If you are using ruby 1.8x, ruby debugger can be installed using the same gem install command as above...

gem install ruby-debug

This gem has several dependencies (like ruby-debug-base, linecache, ruby_sources something etc) but the good news is,  once you issue this command, all the dependent gems will be automatically downloaded and installed.
Anyone who does not like to work with plain text editors and prefers IDEs instead, would obviously want to be able to run and debug scripts using the IDE. Now, this is where going with ruby1.9 is not a good idea (as yet). ruby-debug gem is NOT meant for ruby1.9x, they have a different name for that gem instead which is ruby-debug19 (and the prefix 19 goes with all the debug related gems for ruby1.9x). First of all, I have had problems with installing ruby-debug19 in itself but there's a workaround for that (installing pre-release versions of linecache etc. However, even after successfully installing it, I was unable to get it working with IDEs. The reason for that is, IDEs look for ruby-debug for debugger instead of ruby-debug19. I tried this with Eclipse DLTK as well as other editors such as Aptana's RadRails and also their eclipse plugin. After wasting a lot of time trying to make everything work smoothly, I decided to give up my pride and simply go with something that works. And I highly recommend doing so for a non expert person like me. You also need ruby-debug-ide gem to get the debugger working with the IDE.

5. What IDE/Editor to use?
At this point, you're all set. But before starting to write the application, one might wonder about what editor or IDE to use. I personally prefer Eclipse for several reasons. For one that it's free and I've used it a lot for development in java. For two, installing the plugin is no pain at all and the DLTK plugin has been around for a while, so it's reliable. And finally, there's a nice code completion assistance with this plugin. If these sound like good enough reasons to go with Eclipse DLTK, here are the installation instructions (you have to add a new update site under Help >> New software download). There's also Aptana's plugin for eclipse. Find it here. I used to like NetBeans, but apparently, Oracle yanked the ruby support from future releases... bummer!
Before using the DLTK for the first time, you will need to specify the ruby interpreter to the IDE and eclipse can search that for you. It should be under Project >> Properties >> ruby... something like that.
In rare situations, your eclipse or even the ruby interpreter won't find the gems. The fix is easy. Add an environmental variable called RUBYOPT  to windows and set its value to rubygems. Eclipse also has an option to add environmental variable and that is under Run >> Run Configurations... (and same for debug). I observed that, I did not have to define/add RUBYOPT, but someone else might..

6. Few tips and tricks and pitfalls:
- Make sure that the data format of columns is the same as the sql query is expecting. E.g. If you type in a number into a cell in excel, it will store it as a text (you also might get a comment-like warning saying that). If you're planning onto write queries that need numerical comparison, change the format of these cells to number.
- Make sure that the database excel sheet you're querying is not opened by any other application, this will keep your ruby script waiting for that excel sheet to be available without any warning (according to my observation)
- Ruby 1.8x does not handle special characters all that well because there's no encoding support in ruby 1.8x. The easiest option is to avoid special characters of course. But then, you also have an option to do the development on 1.8x and later upgrade your ruby to 1.9x. 
- I have also seen that in some cases, SQL query would return nil values for the cells whose format is number. I could not figure out why that happened, but the problem went away on its own. If something of this sort happens to you, try making a fresh excel sheet for database.

Hope this was helpful! Good luck :)