Juggling Bits

Ack is awesome

2015-09-09 (updated 2020-06-14) thomas11

The grep replacement ack is an invaluable part of my toolbox. I’ll explain why and show some examples and tips.

GMail mailing list fail

2010-12-10 (updated 2011-02-17) thomas11

Google has the image of a hacker-friendly company, a company that despite its growth still (occasionally) listens to its advanced users. Google is also an active Open Source contributor.

Why oh why, then, does GMail still not have a “reply to list” function? It’s been requested many times, and if it were offered as a Google Mail Labs feature, it wouldn’t clutter the interface for non-technical people. How do the Google engineers themselves participate in mailing lists? By cutting and pasting between the Cc: and To: fields?

Seriously, it’s about time.

Rant off.

Java anonymous classes are too verbose

2010-11-11 thomas11

Java doesn’t have first-class functions or closures, but you can emulate some of that with anonymous classes. Alas, they are just too cumbersome and verbose: it’s not elegant anymore if you need more lines of code than the plain iterative loop.

Recently at work, I wanted to execute some code for each member of a List, and I needed to know which iteration step I was at. A straightforward solution is, obviously, the classic for loop:


for (int i = 0; i < keyword.getSynonyms().size(); i++) {
    String synonym = keyword.getSynonyms().get(i);
    // do something
}

That’s fine in many cases, but it has two problems. The extra line to get at the List element is annoying. More importantly, depending on the List implementation, get(i) may take O(n) time (it does for a LinkedList), requiring another scan of the list on each iteration.
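When the collection is a List, the JDK’s own ListIterator can supply the position through nextIndex(), which avoids the repeated get(i) without a hand-maintained counter. A minimal sketch, assuming a List<String> like the synonyms above; the class name ListIteratorIndex and the method indexed() are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class ListIteratorIndex {
    // Returns "index:element" strings, taking the index from
    // ListIterator.nextIndex() instead of calling get(i).
    static List<String> indexed(List<String> list) {
        List<String> result = new ArrayList<String>();
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            int order = it.nextIndex(); // index of the element next() is about to return
            String synonym = it.next();
            result.add(order + ":" + synonym);
        }
        return result;
    }

    public static void main(String[] args) {
        // A LinkedList, where get(i) would walk the list on every call.
        List<String> synonyms =
            new LinkedList<String>(Arrays.asList("car", "automobile", "motorcar"));
        System.out.println(indexed(synonyms)); // prints [0:car, 1:automobile, 2:motorcar]
    }
}
```

This only works for Lists, though; for an arbitrary Iterable you are back to counting yourself.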

So we could iterate normally and count ourselves:


int order = 0;
for (String synonym : keyword.getSynonyms())
{
    // do something
    order++;
}

It’s better, but I needed several such loops, and I wondered whether I could write all that plumbing just once. I came up with this:


public abstract class Counting<T, E extends Throwable>
{
    public void loop(Iterable<T> things) throws E
    {
        int step = 0;
        for (T t : things)
        {
            iteration(t, step);
            step++;
        }
    }

    public abstract void iteration(T thing, int step) throws E;
}

Because the loop body can throw checked exceptions, and we want to declare the specific kind, the exception has to be an additional type parameter. This breaks down if the body can throw more than one checked exception type.

The above loop then becomes


new Counting<String, SQLException>()
{
    @Override public void iteration(String synonym, int step)
        throws SQLException
    {
        // do something
    }
}.loop( keyword.getSynonyms() );
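To check that the exception type parameter behaves as described, here is a self-contained sketch that nests a copy of Counting and uses it with a checked exception. IOException stands in for SQLException so the example needs no database; the describe() helper and its empty-string rule are hypothetical:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CountingDemo {
    // Same shape as the Counting class above, nested so this file compiles on its own.
    public abstract static class Counting<T, E extends Throwable> {
        public void loop(Iterable<T> things) throws E {
            int step = 0;
            for (T t : things) {
                iteration(t, step);
                step++;
            }
        }

        public abstract void iteration(T thing, int step) throws E;
    }

    // Hypothetical helper: records "step:synonym" pairs and throws a checked
    // IOException (standing in for SQLException) when it meets an empty string.
    static List<String> describe(List<String> synonyms) throws IOException {
        final List<String> seen = new ArrayList<String>();
        new Counting<String, IOException>() {
            @Override
            public void iteration(String synonym, int step) throws IOException {
                if (synonym.isEmpty()) {
                    throw new IOException("empty synonym at step " + step);
                }
                seen.add(step + ":" + synonym);
            }
        }.loop(synonyms);
        return seen;
    }

    public static void main(String[] args) {
        try {
            System.out.println(describe(Arrays.asList("car", "automobile")));
            describe(Arrays.asList("car", ""));
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Since E is bound to IOException at the call site, loop() declares IOException there, and the compiler insists that describe() either catch it or declare it.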

Hmmm. Even after writing an abstract class to extract the repeated parts, and not counting my preference for opening-brace-on-new-line, I still haven’t saved a single line. Can I have map and first-class functions, please? Time for Scala?
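For comparison, Java 8 (released in 2014, after this post was written) added lambdas, which shrink the call site considerably. The standard java.util.function interfaces can’t throw checked exceptions, so this sketch assumes a small custom functional interface; Step, forEachCounted and numbered are hypothetical names, not a library API:

```java
import java.util.Arrays;
import java.util.List;

public class LambdaCounting {
    // Hypothetical functional interface playing the role of Counting.iteration();
    // the type parameter E lets a lambda body declare one checked exception type.
    @FunctionalInterface
    interface Step<T, E extends Exception> {
        void apply(T thing, int step) throws E;
    }

    // Hypothetical helper: the counting loop written once, with the body passed as a lambda.
    static <T, E extends Exception> void forEachCounted(Iterable<T> things, Step<T, E> body) throws E {
        int step = 0;
        for (T t : things) {
            body.apply(t, step);
            step++;
        }
    }

    static String numbered(List<String> synonyms) {
        StringBuilder out = new StringBuilder();
        // The lambda throws no checked exception, so E is inferred as
        // RuntimeException and no try/catch is needed here.
        forEachCounted(synonyms, (synonym, step) -> out.append(step).append(':').append(synonym).append(' '));
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(numbered(Arrays.asList("car", "automobile", "motorcar")));
    }
}
```

A lambda that throws SQLException would instead make the forEachCounted call declare SQLException, which is the same trick as the E parameter on Counting, minus the anonymous-class boilerplate.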

RDF or not in Gen2Phen – 6th Assembly Meeting

2010-10-04 thomas11

This is a personal account and not necessarily my employer’s view.

Until two weeks ago, I had never heard of Gen2Phen. Then my colleague Livia asked me to join her at their 6th general assembly meeting and present something about UniProt in RDF.

Gen2Phen is a big consortium, including SIB, working on genotype-to-phenotype information. They have two years left on their grant and are thinking about adopting SemWeb technologies to enhance data exchange, integration, and interpretation, and to impress funding agencies. So they invited someone from SIB (me, in the end) to speak about our experiences.

My presentation consisted of two parts: an introduction to RDF and why we provide it, and a tour of UniProt’s RDF. I had prepared for 15 minutes but got only five because of the packed schedule. So I explained the very gist of “why RDF”, showed some examples, and talked about the problems we are encountering.

The problems, predictably, got the most attention. There are plenty of Semantic Web “believers” spreading the vision; hands-on experience with complex data sets such as UniProt’s is rarer. I need to write about this in depth at some point. Suffice it to say, I think I dampened some enthusiasm, even though I repeatedly stressed that I see RDF and related technologies as valuable building blocks in the bigger picture, and as clear steps forward on some problems. But the Semantic Web seems to be an all-or-nothing affair for most people.

Tony Brooks is right in saying that, with only two years left for Gen2Phen, it might be late to start with SemWeb technology. A large modeling effort and uncertain scalability could delay the benefits until it’s too late. On the other hand, it’s not that much work to start experimenting. Install Virtuoso and D2R, fire up Protege, write some RDF using Jena, and get a feeling for the whole thing. Design an RDF schema that expresses the basics of the information at the heart of Gen2Phen, and see whether existing systems can add it as an input and output format. That would be my recommendation, though I may or may not have gotten it across: it was a packed event about an unfamiliar project, where the SemWeb was only one of many sessions, so communication was somewhat difficult.
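To get a first feeling for what such data looks like, you don’t even need a library: N-Triples is a line-based RDF syntax that can be printed by hand. The post suggests Jena for real work; this sketch deliberately avoids dependencies. The namespace http://example.org/g2p/ and the Variant/associatedWith vocabulary are hypothetical placeholders, not an existing Gen2Phen schema:

```java
import java.util.ArrayList;
import java.util.List;

public class TinyTriples {
    // Hypothetical namespace and vocabulary, for illustration only.
    static final String EX = "http://example.org/g2p/";
    // Standard RDF and RDFS property IRIs.
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

    static String iri(String s) {
        return "<" + s + ">";
    }

    // Minimal N-Triples literal escaping: backslash first, then quote, newline, carriage return.
    static String literal(String s) {
        String escaped = s.replace("\\", "\\\\").replace("\"", "\\\"")
                          .replace("\n", "\\n").replace("\r", "\\r");
        return "\"" + escaped + "\"";
    }

    static String triple(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }

    // Hypothetical record: a variant with a label, linked to one phenotype.
    static List<String> describeVariant(String id, String label, String phenotypeId) {
        List<String> lines = new ArrayList<String>();
        String variant = iri(EX + "variant/" + id);
        lines.add(triple(variant, iri(RDF_TYPE), iri(EX + "Variant")));
        lines.add(triple(variant, iri(RDFS_LABEL), literal(label)));
        lines.add(triple(variant, iri(EX + "associatedWith"), iri(EX + "phenotype/" + phenotypeId)));
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeVariant("v1", "example variant", "p1")) {
            System.out.println(line);
        }
    }
}
```

Each printed line is one subject-predicate-object statement, so the output can be fed to any tool that reads N-Triples, Jena included, once the vocabulary has been designed properly.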

The meeting as such was very nice. Good conversations and awesome food — La Maison de la Lozere in Montpellier was brilliant. So was the city itself; I enjoyed wandering around the beautiful old town.

One other presentation I found interesting was Gudmundur Thorisson’s about ORCID. This initiative aims to unambiguously identify researchers by an ID instead of their name, which may be shared by many people. When an article is submitted, ORCID will then map its DOI to the IDs of its authors. Also, and perhaps even more important, ORCID aims to do the same for data sets. Science really needs more, larger, better data sets in the open for people to analyze and train their algorithms on, but currently there is very little benefit for researchers to publish them. ORCID is not really functioning yet, but it is backed by more than 120 organizations, and so has a decent chance of becoming the de facto norm in academia.

FrOSCamp 2010 Zuerich

2010-10-01 thomas11

So, another one of those belated meeting/event reports: on 2010-09-17, I was in Zurich for the first-ever FrOSCamp. It was an Open Source/Free Software event with an exhibition floor, talks, and “a fancy party with creative commons licensed beer and music”—what’s not to like!

I presented my talk “Praktisches RDF in Perl” (“Practical RDF in Perl”), recycled from the German Perl Workshop, to spread the word some more. This time I had prepared an English version, but as I only had German speakers in the audience, I presented in German.

Unfortunately, my presentation drew only a handful of people this time. Note to self: work on the abstract some more. I had suspected that my FrOSCamp abstract was wordy and not catchy, but didn’t get around to rewriting it. At least the audience were pretty engaged and asked lots of questions, which I prefer to a larger crowd that’s half asleep.

The presentation was recorded and is now online as slides plus audio. This was a first for me. I managed to forget about the recording while presenting, but I was pretty nervous listening to it for the first time, not sure what mess of incoherent rambling and half-finished sentences to expect. Fortunately, it turned out OK. Of course, I found several things to improve, but I guess that’s expected for someone who doesn’t present often and is just getting started. My main points to improve are:

The introduction should be much shorter and more focussed. A bit like a sales pitch: not obnoxious and fake, but focussed on getting the audience’s attention and appreciation of the topic.
Too many sentences didn’t flow properly. Simply doing one or two more dry runs should fix that.
Add more visualizations, such as diagrams, to the slides.

On the other hand, I was pleased with a few things about my presentation: the style of having little text on the slides and more verbal explanation worked well, the code samples seemed to be the right size to digest during a talk, and the questions at the end showed that people had gotten the key points.

Before my presentation, I got to see Renee Baecker’s talk about Perl::Critic. I’m using it on my code and thus knew the basics, but I appreciated the advanced example towards the end, where Renee walked us through writing our own critic rules. This works via PPI, so you can find patterns in the parse tree that match the constructs you want to check. I also found it interesting to hear Renee’s personal experience with the severity levels: he typically uses 3, sometimes 2, but finds 1 too harsh.

Other than that, I was mainly hanging out at the Perl booth, a first for me! The booth was staffed by Renee and Roman from Winterthur (CH), two really nice guys whom I had a great time with, discussing everything from Perl modules to freelancing.

BTW, remember the blurb about creative commons licensed beer that I quoted from the FrOSCamp website at the top? That wasn’t a joke. FreeBeer is an organic beer produced by an independent brewery near Zurich, and its recipe is online under a CC license. And it tastes great! A cloudy, full blonde, just how I like it :-)