Archive for March, 2010

Weekly Links #1


This is the first post in what might become a weekly installment. Like all IT geeks, I read a serious number of articles online. A lot of it is not that impressive, but each week there’s usually a handful of good ones that I’d like to record for myself as well as share with others. So I’m planning to have a Weekly Links post here. To give it a clear structure and to keep it from becoming tedious, I’m imposing some rules on myself: a maximum of three links a week, with a maximum of three sentences of commentary per link. Enough talk, here we go.

Daniel Tenner: How to nap

Napping is not the same as sleeping, and 20 minutes of napping can do a lot of good without making you feel groggy.

Scott Berkun: The cult of busy

Time is the singular measure of life. The person who gets a job done in one hour will seem less busy than the guy who can only do it in five. Being in demand can have good and bad causes. The phrase “I don’t have time for” should never be said.

yield thought: On The Fear of Reading Code

The best analysis I’ve read so far of why programmers don’t want to read others’ code, and why you should do it anyway: it will make you better and your life easier.

A better less, playing nice with git


On the command line, I used to look at files with less or cat, depending on how long I expected the file to be. Of course I would constantly guess wrong, either blocking my screen with less showing only a handful of lines and making me press “q” needlessly, or having cat’s output scrolling past.

But recently I discovered how to make less do everything I want. -F or --quit-if-one-screen is self-explanatory and avoids needless “q”-presses, and -X or --no-init makes less not clear the screen when invoked. With these two options, less behaves like cat when it makes sense.

I enjoyed my new enhanced less-ing… until I invoked git diff. It looked like this, the line endings seemingly garbled:

ESC[1mdeleted file mode 100644ESC[m
ESC[1mindex bc7f954..0000000ESC[m
ESC[1m--- a/src/test/org/expasy/cvrelational/keywords/KeywordRoundtripTest.javaESC[m
ESC[1m+++ /dev/nullESC[m

Oops! The solution, thanks to those guys, is to add -R or --RAW-CONTROL-CHARS. I had git color its output automatically (git config --global color.ui auto), and less was choking on the control characters git added to its output.

Now, export LESS="-F -X -R" works like a charm.
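If you would rather not change less’s behavior everywhere, the same flags can also be scoped to git alone through its core.pager setting:

```shell
# Have git invoke less with the three flags itself, leaving
# other uses of less untouched:
git config --global core.pager 'less -F -X -R'
```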

Allison Randal: Exploring Dynamism


Here are some brief notes on Allison Randal’s talk Exploring Dynamism, seen at InfoQ.

In this presentation from the JVM Language Summit 2009, Allison Randal discusses what it means for a language to be dynamic, the spectrum between static and dynamic languages, dynamic typing, dynamic dispatch, introspection, dynamic compilation, dynamic loading, and a summary of the main differences between static and dynamic.

Different ways of looking at things can yield very different, complementary insights, for instance regarding light as waves or particles.

Most languages actually cluster around the middle of the spectrum from very static to very dynamic.

CS has been focusing on the static perspective. (But Smalltalk has been extremely dynamic all along.)

Dynamic vs. static typing is really only about when type constraints are checked. Own note: that’s not completely true – e.g. compiler optimization. Also, later she mentions computing dynamic dispatch at compile time, which is only possible with a static (explicit) type system.

Dynamic dispatch

Introspection is not necessarily dynamic, just more common in such systems. Information comes from different sources: asking the VM/interpreter, compile-time annotations, execute-time annotations (e.g. annotated stacktraces). Meta object models are essentially introspection information for your object systems, provided by making the classes first-class objects.

Dynamic compilation: lots of options such as eval, JIT, file-based etc. She mentions the REPL without naming it, calling it “interactive compilation” and giving Python as an example – odd.

Dynamic loading is surprisingly varied, ranging from linking to name binding to mixins, traits and roles (which often mean the same thing).

Her conclusion is that we’ll have both dynamic and static systems for a long time to come and that’s a good thing. In the end it’s about tighter control vs greater abstraction and productivity vs performance.

There’s an interesting-looking paper on the Further Reading slide: “Static typing where possible, dynamic typing when needed: the end of the cold war between programming languages” by E. Meijer and P. Drayton.

Hey, it’s Ada Lovelace Day!

Today I get to kill two birds with one stone. It is Ada Lovelace Day, “an international day of blogging to draw attention to the achievements of women in technology and science.” Seeing that I’m just about to publish notes on a talk by Allison, why not write a little about her? And there sure is a lot to write.

I became aware of Allison’s work when I got interested in the Perls, both 5 and 6. When I start something new, I like to dig deeply into the web community around it; it tells you something about the culture surrounding a product. If you do that for Perl, it’s impossible to miss Allison.

The blurb on her website says:

“Her first geek career was as a research linguist in eastern Africa. But eventually her love of coding seduced her away from natural languages to artificial ones. A C and dynamic language (Perl/Python/Ruby/etc) programmer by trade, Allison is the architect of Parrot, chairman of the Parrot Foundation, on the board of directors of The Perl Foundation, and founder and president of Onyx Neon Press. She also works for O’Reilly Media, planning the program for their Open Source Convention (OSCON).”

Naturally, for Perl people writing this up means repeating the obvious. But given the unfortunate perception of Perl these days, it’s worth showing that cool things are still happening in that community. Like Perl 6 and Parrot. A virtual machine that can run 30+ languages? Check.

Allison is one of the lead developers as well as a manager for both Perl 6 and Parrot. She also wrote the current version of the Artistic License that Perl uses. You can easily find lots of talks she’s been giving over the years at many conferences. So for Ada Lovelace day, thank you Allison for all your hard work for Perl!

Biohackathon 2010


In February I attended the third Biohackathon in Tokyo, sponsored by the Japanese Database Center for Life Science (DBCLS) and the Computational Biology Research Center (CBRC). As I’ve been travelling some more since then, I only now got around to writing up my personal summary of the week. Here we go.

The Biohackathon is an annual meeting of bioinformatics developers. Toshiaki Katayama of the University of Tokyo, and founder of BioRuby, brought the hackathon idea to Japan and led the organization of the hackathon in the most perfect way. From the locations and the hotel to the network and the catering (and the fact that there was catering!), it was all top notch. Not to mention the generosity of the sponsoring institutions in actually inviting us all!

Now, where to start? It was such a packed and amazing week, and I feel very lucky to have gotten the chance to attend. Plus, it was my first trip to Japan, so the country itself was exciting enough! The schedule of the hackathon was simple: the first day was a symposium with lots of talks and the chance to learn about the other attendees and their projects. Days two to five were dedicated solely to hacking and discussion as people saw fit. It was my first meeting of that kind, and it was exciting to have that much freedom to turn the week into an interesting and useful time.

Arriving on Sunday morning, we first got our toes wet in Japan by placing an order in a noodle kitchen by randomly picking something on the menu. We wandered around the neighborhood of Tokyo University, or Todai, a charming part of town with small, old houses and narrow lanes I didn’t expect in Tokyo, and ended up in a quite amazing whisky bar and made some new friends. Good start.

The first actual hackathon day took us to the CBRC in Odaiba, a new and shiny stretch of the city along the bay, dedicated to science and technology. But before enjoying the view from the cafeteria, we settled down to listen to talks and to introduce ourselves to each other in the breaks. With about 60 attendees, the hackathon had a good size, allowing diversity while staying manageable. The idea of posting a mini-bio for each attendee along the walls was fantastic, as you could stroll around and get a good idea of who was there and from what backgrounds they came.

A few of the participants presented the projects they’re working on, and they were all very interesting. You can find the list of speakers and their slides on the wiki. My colleague Jerven Bolleman presented our RDF efforts at UniProt. The day ended with a very nice buffet and some more socializing, and left everyone energized and motivated for a week of hacking.

The rest of the week took place at DBCLS on Todai campus, where people could form groups to their liking and pick among several rooms for quiet hacking. Inspired by the BioRuby and BioPython folks who were present, I started exploring the RDF support in Perl. We do all our RDF work in Java, as do most Semantic Web people, but I feel that puts off many people. Perl hits a sweet spot with its conciseness and pragmatism, and its position in bioinformatics is traditionally strong. I believe that good Perl support would be a major step towards making biologists and bioinformaticians warm up to RDF & co – I recently wrote a somewhat passionate mail about this on the hackathon mailing list, which I will post here, too.

Anyway, there are quite a few RDF-related modules on CPAN, most of them gathered at [], and I set out to try and compare them and to write some example code, possibly something to explore the UniProt RDF. While I didn’t get that far, due to participating in lots of other discussions, it was very interesting to try this out, and I put a State of RDF in Perl page on the wiki and some example code on github. I also exchanged a lot of mails with Greg Williams of RDF::Trine, which was great. I’ll blog about this subject later.

While there were many different groups hacking away, on text mining and RDF generation and all kinds of things, one subject struck me as the theme of this Biohackathon: URIs. How do you publish your own data with stable, sensible, and dereferenceable URIs, and what do you use in your RDF when linking to others who don’t have such nice URIs? This question was discussed many times during the whole week.

Francois Belleau of bio2rdf led many of the discussions (thanks!), which focused mostly on central naming schemes/services for URIs. There seems to be a conflict between keeping content dereferenceable and keeping URLs very stable for use as resource identifiers. For the latter goal you don’t need URLs; any string will do, as long as it’s unique and stable. So this goal would benefit from a central registry like the one advocated by Francois, because it would provide a predictable way of naming things uniquely. But it adds a single point of failure to the dereferencing of content. Andrea Splendiani remarked that he never followed a single URL from RDF anyway, while I argued that linking content is the point of the web and keeps the Semantic Web hackable – that will have to be yet another future blog post, I guess! Using providers’ actual URLs is often crappy, because they don’t provide a predictable scheme (a=x&b=y vs. b=y&a=x), and you only get HTML anyway.

Opinions differed, and they still do. We arrived at an agreement on “Polite URIs” towards the end, but the discussion has been re-started on the mailing list.

And we haven’t even mentioned the dismal state of versioned URIs (like UniProt’s non-existing ones…), which I also discussed with Andrea. He proposed including the entry version in the URI. Whole releases could be done via named graphs, although that sounds complicated. I was concerned about people who don’t care and just want to say “this protein” – for them (i.e., their reasoners), uniprot/P12345/v1 is not the same as uniprot/P12345/v2, but it should be. This seems impossible to resolve; it’s one or the other. Uh, ideas anyone?

I guess you got the idea by now – there was so much more happening this week that I can’t summarize it all. Fortunately, others also wrote about it. Brad Chapman wrote about his SPARQL and Python hacking, and the #biohackathon2010 Twitter tag has lots of interesting tidbits.

Let’s end with paraphrasing Toshiaki’s closing notes: a “clique of the world-top-level developers in bioinformatics” met, some great coding and discussion took place, and now that data providers understand the Semantic Web a lot better, services will come.

Thanks to all organizers, the people at DBCLS and CBRC who made this possible, to the participants who brought so much enthusiasm and knowledge to the event, and to Toshiaki in particular for tirelessly working throughout the week to keep everything running smoothly. And for taking us out for great dinners and giving us a tour of the Human Genome Center super computer in the week after the hackathon!


Parameter hash patterns in Perl 5


It’s a common pattern in Perl 5 to use a hash for a subroutine’s arguments, or some of them. Damian Conway explains this pattern in his excellent Perl Best Practices. I’ll first briefly recap the standard forms, then show how you can support both standard arguments and a hash for extra arguments.

The basic form looks like this:

pad({ text=>$line, cols=>20 }) 

You can actually leave out the curly hash-braces and just pass a list of key-value pairs:

pad( text=>$line, cols=>20 ) 

That’s what you often see in practice, but Conway argues against it: it allows mismatches such as cols=>20..21 (two values on the right-hand side) to slip through compilation.

Most of the time that won’t be a problem in practice, as the values of the pairs will be simple enough. But it’s better to do things in a uniform way that works in all situations, and the sub’s implementation depends on the way of passing the hash.
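To make the failure mode concrete, here is a small sketch (the sub names are made up for illustration) showing what each calling style actually receives when cols=>20..21 sneaks in:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'some text';

# Bare list: 20..21 flattens to (20, 21), so the sub silently
# receives five scalars instead of the expected four.
sub pad_list { return scalar @_ }
print pad_list( text => $line, cols => 20..21 ), "\n";     # prints 5

# Explicit braces: building the anonymous hash immediately warns
# "Odd number of elements in anonymous hash", and the stray 21
# ends up as a key with an undef value.
sub pad_href { my ($href) = @_; return scalar keys %$href }
print pad_href( { text => $line, cols => 20..21 } ), "\n"; # prints 3
```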

When passing an explicit hash enclosed in {}, you get it as a reference:

my ($hashref) = @_;
my $foo = $hashref->{foo};

Using raw key-value pairs, you directly get a hash:

my %hash = @_;
my $foo = $hash{foo};

Obviously, the latter form does not allow you to pass any arguments other than the hash. One more argument against doing that. I often write subs that take the necessary arguments directly, and optional ones, or “configuration” parameters, in a hash that may or may not be passed:

$uniprot->retrieve(@ids, {format=>'rdf', include=>1});

You can implement once and re-use a routine, say _get_args_and_conf, that handles this distinction between arguments and configuration so that your subs don’t have to. It looks at the arguments, checks whether the last one is a hash reference, and if so, merges it with the default configuration and returns the arguments and the configuration separately. You would use it like this in your code:

my %RETRIEVE_DEFAULTS = (
    format => 'fasta',
    debug  => 0 );

sub retrieve {
    my ($ids_ref, $conf_ref) =
        _get_args_and_conf(\%RETRIEVE_DEFAULTS, @_);
    # $ids_ref now contains the arguments, here some ids to
    # retrieve, and $conf_ref contains the configuration hash
    # with the user's values if given, and the default ones
    # otherwise.
}

My implementation looks like this. The meat of the routine, the hash handling, is straight from Conway’s Best Practices. (croak comes from the Carp module.)

use Carp;

sub _get_args_and_conf {
    my $default_conf_ref = shift;
    my @args = @_;
    croak "I need at least one argument!" if @args < 1;

    # if the last arg is a hash ref, it's additional configuration
    my %defaults = %{$default_conf_ref};
    my %conf = ref $args[-1] eq 'HASH' ?
        (%defaults, %{pop @args}) : %defaults;
    if (@args < 1) {
        croak "I need at least one argument in addition to the hash!";
    }

    # TODO Deal with the case that the argument list is given as a
    # reference.

    return (\@args, \%conf);
}
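To see the two calling styles in action, here is a minimal, self-contained sketch; the routine is repeated so the snippet runs on its own, and the retrieve body is a made-up stand-in, not the real UniProt code:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Carp;

my %RETRIEVE_DEFAULTS = ( format => 'fasta', debug => 0 );

# Split plain arguments from an optional trailing config hash ref,
# merging the latter with the defaults.
sub _get_args_and_conf {
    my $default_conf_ref = shift;
    my @args = @_;
    croak "I need at least one argument!" if @args < 1;

    my %defaults = %{$default_conf_ref};
    my %conf = ref $args[-1] eq 'HASH'
        ? ( %defaults, %{ pop @args } )
        : %defaults;
    croak "I need at least one argument in addition to the hash!"
        if @args < 1;

    return ( \@args, \%conf );
}

# Dummy retrieve: just reports what it would do.
sub retrieve {
    my ( $ids_ref, $conf_ref ) =
        _get_args_and_conf( \%RETRIEVE_DEFAULTS, @_ );
    return "@$ids_ref as $conf_ref->{format}";
}

print retrieve('P12345'), "\n";                        # P12345 as fasta
print retrieve( 'P12345', { format => 'rdf' } ), "\n"; # P12345 as rdf
```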