Tuesday, September 25, 2007

Scalable File Uploads with Merb


For a new application I'm working on, we need to upload some large files to our web application. Because the files are large and processing them triggers many SQL inserts and updates, the whole process is pretty intensive, taking roughly 1 to 2 minutes per file. I've been reading about merb for a few months and have created small projects with it off and on, hoping to one day put it to use. Tonight I was pleased to find that it really does make file uploading easy and appears fast. Here's a quick overview of what I did to get it set up.



sudo gem install merb -y

Create the basic merb project



merb -g uploader

Set up the upload controller and view



cd uploader/dist
mkdir app/views/uploader
touch app/controllers/uploader.rb
touch app/views/uploader/index.rhtml

We'll have two actions defined in the Uploader controller:


  • index, to display the form

  • upload, to handle the file upload post


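require 'fileutils'

class Uploader < Application
  # Background on HTML file inputs: http://www.cs.tut.fi/~jkorpela/forms/file.html

  # Display the upload form
  def index
    render
  end

  # Handle the multipart POST from the form
  def upload
    puts params[:file].inspect
    # Use only the base name so a client-supplied path cannot escape uploads/
    filename = File.basename(params[:file][:filename])
    # The uploads directory must already exist under MERB_ROOT
    FileUtils.mv params[:file][:tempfile].path, File.join(MERB_ROOT, "uploads", filename)
    redirect "/uploader"
  end
end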
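The move step in the upload action can be tried outside merb by faking the params hash. The following is a minimal sketch, not part of the original controller: save_upload is a hypothetical helper, and the hash only mimics the :tempfile and :filename keys used above.

```ruby
require 'fileutils'
require 'tempfile'
require 'tmpdir'

# Hypothetical helper (not part of merb): moves an uploaded tempfile into upload_dir.
# file_param mimics params[:file] and needs the :tempfile and :filename keys.
def save_upload(file_param, upload_dir)
  # Keep only the last path component so "../x" cannot escape upload_dir.
  safe_name = File.basename(file_param[:filename].to_s)
  raise ArgumentError, 'unusable filename' if ['', '.', '..'].include?(safe_name)

  FileUtils.mkdir_p(upload_dir)
  destination = File.join(upload_dir, safe_name)
  FileUtils.mv(file_param[:tempfile].path, destination)
  destination
end

# Usage with a faked upload in a throwaway directory
Dir.mktmpdir do |dir|
  tempfile = Tempfile.new('upload')
  tempfile.write('<doc/>')
  tempfile.close
  puts save_upload({ :tempfile => tempfile, :filename => '../report.xml' }, dir)
end
```

Stripping the directory part matters because the filename comes from the client and may include a path.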

Define the form view



<p>Upload a new file</p>
<form action="/uploader/upload" method="post" enctype="multipart/form-data">
  <fieldset>
    <input type="file" name="file" size="80"/>
    <input type="submit" value="Upload"/>
  </fieldset>
</form>

Finally, start your merb application (from the uploader folder, not from inside the dist directory).



merb


You might need to disable sql_session in your dist/conf/merb.yml


change:

:sql_session: true

to:

:sql_session: false

The server runs on port 4000 and the request path is /uploader, e.g. http://localhost:4000/uploader


Next I'd like to work on integrating this more tightly with my rails application. A few things I want to focus on:



  • how the traffic is routed in my nginx setup.

  • whether or not the merb process should do the long processing needed after the file is uploaded.

  • how to get the results of the processed xml files back into my database for my rails app to read from.

  • how to indicate to the users that the file has been uploaded but may still be being processed.


Here's a pretty simple architecture I drew up using Graffle



Merb will handle reading files from the client and writing them to disk, giving each one an identifiable name and then redirecting back to our rails application or possibly to a status page.
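One way to build such an identifiable name is sketched below. The scheme (UTC timestamp, random token, sanitized original name) and the identifiable_name helper are assumptions for illustration, not something settled in this post.

```ruby
require 'securerandom'

# Hypothetical naming scheme: <UTC timestamp>-<random token>-<sanitized original name>.
# The timestamp makes files sortable by arrival; the token avoids collisions
# when two clients upload files with the same name in the same second.
def identifiable_name(original_filename, now = Time.now.utc, token = SecureRandom.hex(4))
  # Drop any directory part, then replace anything that is not a word
  # character, dot, or hyphen with an underscore.
  base = File.basename(original_filename).gsub(/[^\w.\-]/, '_')
  "#{now.strftime('%Y%m%d%H%M%S')}-#{token}-#{base}"
end

puts identifiable_name('quarterly report.xml')
```

Because the original name is kept at the end, a status page could still show users the name they uploaded.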


Next, I'm thinking another process will monitor the folder merb writes files into. When new files appear, it will pick them up, process them, and parse the results into our SQL database. To detect when the filesystem has changed, inotify looks like the best approach, and conveniently it already has a ruby binding, so I can do my CRUD operations with ActiveRecord. At least this is my initial thinking on how to handle file uploads. Any suggestions or feedback would be great!
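As a rough sketch of the monitoring process, the loop below uses plain polling rather than inotify, since the inotify binding's API is not shown here. The names new_uploads and watch are hypothetical, and the block passed to watch stands in for the real parse-and-insert work.

```ruby
require 'set'
require 'tmpdir'
require 'fileutils'

# Returns the regular files in dir that have not been seen before,
# sorted by path, and records them in the seen set.
def new_uploads(dir, seen)
  current = Dir.entries(dir).map { |name| File.join(dir, name) }
  fresh = current.select { |path| File.file?(path) && !seen.include?(path) }.sort
  seen.merge(fresh)
  fresh
end

# Hypothetical polling loop: yields each new file once.
# max_polls exists only so the demo and tests can stop; nil means run forever.
def watch(dir, interval = 2, max_polls = nil)
  seen = Set.new
  polls = 0
  loop do
    new_uploads(dir, seen).each { |path| yield path }
    polls += 1
    break if max_polls && polls >= max_polls
    sleep interval
  end
end

# Demo against a throwaway directory
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'a.xml'))
  watch(dir, 0, 1) { |path| puts "processing #{File.basename(path)}" }
end
```

Polling is simpler but slower to react than inotify. One caveat for either approach: a file should only be picked up once it is completely written, which a move within the same filesystem generally guarantees because it is a rename.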

2 comments:

MyHep said...

Great article. I will be doing something like this soon, and this has helped me get a feel for the overall design.

Anonymous said...

Wondering what are the major benefits of directing file-uploads to merb instead of just Rails? Is Rails significantly slower at handling file uploads (and if so, do you have any idea why?) Or are you just trying to separate out the upload traffic so it doesn't take up mongrels, and wanted to try out merb? Thanks.
-Jon
