Monday, August 04, 2008

Adding ruby thread scheduling to curb


Ruby threads require care when writing an extension, especially when the extension does anything over a network. Curb is a good example, since it's often used as a replacement for the built-in ruby net/http library.



The basic scenario goes like this: you use a web service, like amazon or google, within your web application. Let's say you've just started porting some parts of your rails application to merb, or even just to a simple mongrel handler. You're hoping to increase the throughput of the application by taking advantage of the extra concurrency you gain from less locking. This works great when testing locally.

Only we forgot one thing: ruby threads are scheduled by the interpreter, and as far as a native extension is concerned they are cooperative. Any one thread that blocks inside C code blocks all other threads from running. So, while we spent extra time and energy rewriting specific parts of our application to increase its concurrency, we forgot that ruby threads don't always mix well with native extensions. Occasionally one of these service calls will take a bit longer than normal, and when that happens all our threads are blocked. To make matters worse, we're not actually seeing much of a throughput improvement after making our changes.



The few tests we did show a speed improvement for a single request, but throughput under multiple concurrent requests has stayed about the same, because the Curl::Easy.perform call blocks all other ruby threads while it runs. Meaning, we might as well just leave our calls in the rails controller and move on, right?



Well, by now you've probably realized I'm not that kind of programmer... Giving up like this isn't why I like to write code... So, after a little more digging, I realized the answer is rb_thread_select.



Using rb_thread_select in place of select and making calls to libcurl's multi interface instead of curl_easy_perform, I've made it possible to run multiple Curl::Easy.perform calls concurrently. You'll still probably see much higher throughput using the Curl::Multi interface, but this should help those using the more direct Curl::Easy interface. I also updated the Curl::Multi interface to use ruby's rb_thread_select, meaning it can be used within multiple ruby threads as well.
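rb_thread_select is a C-level call, so it can't be shown directly from a script. As a rough illustration only, IO.select is its nearest Ruby-level counterpart: the waiting thread sleeps until a descriptor is ready while other ruby threads keep running. The pipe and the ticker thread below are stand-ins invented for this sketch and are not part of curb.

```ruby
# Sketch: wait on a file descriptor without starving other ruby threads.
reader, writer = IO.pipe
ticks = 0

# Background thread that should keep making progress while the main thread waits.
ticker = Thread.new do
  5.times do
    sleep 0.02
    ticks += 1
  end
  writer.write("done")
  writer.close
end

# Block until the pipe is readable, or give up after 2 seconds.
ready, = IO.select([reader], nil, nil, 2)
message = ready ? reader.read : nil
ticker.join

puts "ticks while waiting: #{ticks}, message: #{message.inspect}"
```

The point of the sketch is that the ticker gets all of its work done while the main thread is parked in select, which is the behaviour the curb change is after.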


Remember, this does not mean I made Curl::Easy or Curl::Multi thread safe. It means separate instances of those objects can each be used in their own thread, running in parallel.
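A minimal sketch of that usage pattern, one handle per thread and nothing shared between them. The helper name fetch_concurrently is made up for this example, and a sleep stands in for the network call so the sketch runs without curb installed.

```ruby
# Hypothetical helper: runs the block once per URL, each in its own thread,
# and returns the block results in the same order as the URLs.
def fetch_concurrently(urls, &block)
  threads = urls.map do |url|
    Thread.new(url) { |u| block.call(u) }
  end
  threads.map(&:value)
end

started = Time.now
results = fetch_concurrently(%w[a b c]) do |url|
  sleep 0.2 # stand-in for a slow service call
  "body of #{url}"
end
elapsed = Time.now - started

puts results.inspect
puts "three 0.2s calls finished in under 0.5s: #{elapsed < 0.5}"

# With curb, each thread would build and use its own handle, for example:
#   fetch_concurrently(urls) { |url| c = Curl::Easy.new(url); c.perform; c.body_str }
```

Each thread creates its own object inside the block, so no handle is ever touched by two threads.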



You can get my curb changes on github.

Tuesday, July 15, 2008

curb to github

I've made a few more improvements to curb and decided keeping track of all these patches is too much. I'm now tracking my changes on github, and hopefully the curb author will reappear to review the changes soon, so we can get them released in gem form.

Thursday, July 10, 2008

New curb patch, on_success, and on_failure


I added two new callback hooks to curb: on_success and on_failure. They work as advertised and are especially useful when using the Curl::Multi interface from my last patch.



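require 'curb'

# One Easy handle per request, each with its own success callback.
gc = Curl::Easy.new("http://www.google.com/")
gc.on_success{|curl| puts curl.body_str }
yc = Curl::Easy.new("http://www.yahoo.com/")
yc.on_success{|curl| puts curl.body_str }

# Add both handles to a Multi handle and run them together.
mc = Curl::Multi.new
mc.add(gc)
mc.add(yc)

mc.perform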


What's nice about this: imagine you have a long-running process and want to add requests to it. When you add a request, you also want to register a callback to handle the success or failure case. Before this patch you'd have to check the content length and listen to the on_body handler. The problem with that is the content length header is not always correct. These callbacks are guaranteed to fire, because they happen during the cleanup process for each handle.
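To make the content length problem concrete, here is a toy model in plain ruby. FakeTransfer is entirely hypothetical and is not how curb is implemented; it only mimics a transfer whose header claims more bytes than are actually sent.

```ruby
# Toy model only: FakeTransfer is hypothetical and not part of curb.
class FakeTransfer
  attr_reader :declared_length

  def initialize(declared_length, chunks)
    @declared_length = declared_length
    @chunks = chunks
  end

  def on_body(&block)
    @on_body = block
  end

  def on_success(&block)
    @on_success = block
  end

  def perform
    @chunks.each { |chunk| @on_body.call(chunk) if @on_body }
    # Completion is signalled at teardown, whatever the header claimed.
    @on_success.call(self) if @on_success
  end
end

# The header claims 100 bytes, but only 11 are sent.
transfer = FakeTransfer.new(100, ["hello ", "world"])

received = 0
done_by_length = false
done_by_callback = false

transfer.on_body do |chunk|
  received += chunk.bytesize
  done_by_length = true if received >= transfer.declared_length
end
transfer.on_success { |_t| done_by_callback = true }

transfer.perform
puts "length check fired: #{done_by_length}"
puts "on_success fired: #{done_by_callback}"
```

The length-based check never fires because the declared length is never reached, while the completion callback does.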


As an aside, they'll also work when using the Easy handle directly.


Here's the patch, enjoy!

Thursday, July 03, 2008

Freezing compiled gems in merb

Vendoring gems definitely makes deployment easier for both rails and merb. For merb, the solution is to create a local gem repository within your project.


gem install -i gems/ --no-ri --no-rdoc #{gem_to_install}


This works great, once you update your merb/config/init.rb to include the new gem repository.


Gem.clear_paths
Gem.path.unshift(Merb.root / "gems")


I also recommend saving space by deleting the gem cache.

rm -rf gems/cache/*


This creates one problem: if the gems you have installed include binaries (e.g. C extensions), you'll have an issue when you develop and deploy on different architectures. I think the most common case is developing on Mac OS X and deploying to Linux. There are many different ways to solve this problem. One might use gem pristine or, as I do below, maintain a list of the gems that need compiling and rebuild them on the deployment target.

GEM_BUILDS = ['rbtagger', 'hpricot', 'mongrel', 'fastthread']
GEM_ROOT = File.join(File.dirname(__FILE__), 'gems')

desc 'Refresh gem builds for the current system'
task :build_refresh do
  gem_dir_list = File.join(GEM_ROOT, 'gems')
  # Work on a copy so the GEM_BUILDS constant is never mutated.
  pending = GEM_BUILDS.dup
  Dir["#{gem_dir_list}/*"].each do |gem_path|
    # Pick the first listed gem, not yet rebuilt, whose name appears in this path.
    gem = pending.find { |name| gem_path.match(name) }
    next unless gem
    if File.exist?("#{gem_path}/Rakefile")
      puts "building #{gem_path}"
      system("cd #{gem_path} && rake compile")
    else
      # No Rakefile: reinstall the same version into the local repository.
      version = File.basename(gem_path).split('-').last
      cmd = "gem install -i gems --no-ri --no-rdoc #{gem} --version #{version}"
      puts cmd.inspect
      system(cmd)
    end
    # Each listed gem is rebuilt at most once.
    pending.delete(gem)
  end
end



Now to run the task:

rake build_refresh


To change the list of compiled gems just modify the GEM_BUILDS constant.
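One detail of the task worth seeing in isolation is how the version is read from the installed gem's directory name. The sketch below repeats that split and adds a stricter variant. Both helper names and the example directory names are made up for illustration, and the stricter pattern is still only a heuristic.

```ruby
# Same approach as the rake task: take whatever follows the last dash.
def version_from_gem_dir(gem_path)
  File.basename(gem_path).split('-').last
end

# Hypothetical stricter variant: only accept a trailing field that looks
# like a version number, otherwise return nil.
def strict_version_from_gem_dir(gem_path)
  match = File.basename(gem_path).match(/-(\d+(?:\.\w+)*)\z/)
  match && match[1]
end

plain = 'gems/gems/hpricot-0.6'
with_platform = 'gems/gems/example-1.2.3-x86-linux' # made-up directory name

puts version_from_gem_dir(plain).inspect
puts version_from_gem_dir(with_platform).inspect
puts strict_version_from_gem_dir(plain).inspect
puts strict_version_from_gem_dir(with_platform).inspect
```

A directory name that carries a platform suffix gives the simple split a non-version string, so returning nil and skipping the reinstall may be the safer behaviour there.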
