YAML.load considered harmful

The title is very much a rip-off of the famous letter by Edsger Dijkstra titled “Go To Statement Considered Harmful”.

Actually, YAML.load is not all that harmful. The problem is that MRI 1.9.2 ships with version 1.0.0 of psych (the default engine for parsing and emitting YAML in MRI 1.9.2), and psych 1.0.0 has a memory leak somewhere in the parser.

What this means is that every time YAML.load is called, a little more memory is leaked.
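A rough way to see whether your own setup is affected is to parse the same document many times and compare the process's resident memory before and after. The sketch below is only that, a sketch: `rss_kb` and `rss_growth_after_loads` are names I made up for illustration, it assumes a Unix-like system where `ps` is available, and RSS is a noisy measure, so a small growth figure proves nothing either way.

```ruby
require 'yaml'

# Resident set size of this process in kilobytes, or nil when it cannot
# be read (assumes a Unix-like system with `ps` on the PATH).
def rss_kb
  Integer(`ps -o rss= -p #{Process.pid}`.strip)
rescue StandardError
  nil
end

# Hypothetical helper: parse the same document `iterations` times and
# return the change in RSS in kilobytes, or nil when RSS is unavailable.
def rss_growth_after_loads(document, iterations)
  before = rss_kb
  iterations.times { YAML.load(document) }
  GC.start # collect Ruby-level garbage so it does not mask or mimic a leak
  after = rss_kb
  return nil if before.nil? || after.nil?

  after - before
end

doc = { 'name' => 'example', 'tags' => %w[a b c] }.to_yaml
growth = rss_growth_after_loads(doc, 10_000)
puts "RSS growth after 10,000 loads: #{growth.inspect} kB"
```

If the number keeps climbing as you raise the iteration count, the parser is a likely suspect.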

Normally this isn’t too big a problem - we don’t load that much YAML and we don’t do it too often. And when we’re talking web hosting, where worker processes are shut down regularly, we usually don’t get hit severely enough by the memory leak to notice it.

But what happens when we need to iterate over a lot of ActiveRecord model objects with a serialized attribute (and we need to reference that serialized attribute), say in a Rake task? Then the leak suddenly matters.

What can I do to avoid the leak?

Fortunately, Aaron Patterson (who I have the utmost respect for!) has released psych as a gem, and the gem has been patched. But it’s not as simple as just grabbing the gem and requiring it - a plain require of psych will still load the bundled 1.0.0 version.

What is needed is a bit of RubyGems magic (seriously - I have no idea what happens). After installing the gem, this is what it takes to use psych 1.2.0 for YAML parsing:

    require 'rubygems'
    gem 'psych'       # activate the installed psych gem instead of the bundled 1.0.0
    require 'psych'   # now picks up the gem's version

    Psych::VERSION
    => "1.2.0"

And with this, you’re ready to run processes that load a shitload of YAML without worrying it will gobble up the memory (at least because of a memory leak).
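If you want the script to keep working on machines where the gem isn't installed, the activation can be wrapped in a small guard. This is a minimal sketch under a few assumptions: `activate_psych` is a helper name I invented, and '1.2.0' is simply the patched version mentioned above. As far as I can tell, `gem` raises `Gem::LoadError` when no matching version is installed (or a conflicting one is already active), so rescuing that lets the script fall back to whatever Psych ships with the interpreter.

```ruby
require 'rubygems'

# Hypothetical helper: try to activate a psych gem at or above min_version.
# Returns true when activation succeeds, false when no matching gem is
# installed or a conflicting version has already been activated.
def activate_psych(min_version)
  gem 'psych', ">= #{min_version}"
  true
rescue Gem::LoadError
  false
end

activated = activate_psych('1.2.0')
require 'psych'
require 'yaml'

# Without the gem we are back on the bundled parser, so say so loudly.
warn "psych gem not activated; using bundled Psych #{Psych::VERSION}" unless activated
puts "Parsing YAML with Psych #{Psych::VERSION}"
```

Checking `Psych::VERSION` at startup like this is cheap, and it tells you right away whether a long-running task is on the patched parser.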

Update: I’ve written a bit about how to use the Psych gem in Phusion Passenger and what to do when testing Rails apps.