Examining invalid UTF

Recently we had this problem at work:
java.io.CharConversionException: Invalid UTF-8 start byte 0xb4 (at char #664428955, byte #664427999)
No one was exactly sure about the best way to debug this, so I set out to hack something together.

Here’s a Java program that reads a range of bytes from a file that is supposed to be UTF-8 and tries to decode them as a UTF-8 string. If decoding succeeds, the string is printed; if it throws an exception, the stack trace is printed instead. Usage:

$ java -cp "." ByteToUtfReader file.rdf 679609087 1000

reads byte 679609087 and the 1000 bytes preceding it from file.rdf.

Here’s the code in case it helps someone. Warning: quick and dirty!
Continue reading “Examining invalid UTF”
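As a rough illustration of the approach described above, here is a minimal sketch. It is not the original ByteToUtfReader: the class name Utf8WindowCheck and the methods readWindow and decodeStrict are made up for this example. It assumes a strict decoder is wanted, so invalid bytes raise an exception rather than being silently replaced.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the program from the post.
final class Utf8WindowCheck {

    /** Reads the byte at offset {@code end} plus up to {@code before} bytes preceding it. */
    static byte[] readWindow(String path, long end, int before) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long start = Math.max(0L, end - before);
            long stop = Math.min(file.length(), end + 1);
            byte[] window = new byte[(int) Math.max(0L, stop - start)];
            file.seek(start);          // jump straight to the window, no scanning
            file.readFully(window);
            return window;
        }
    }

    /** Strict decode: malformed input throws instead of becoming U+FFFD. */
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] window = readWindow(args[0], Long.parseLong(args[1]), Integer.parseInt(args[2]));
        try {
            System.out.println(decodeStrict(window));
        } catch (CharacterCodingException e) {
            e.printStackTrace();
        }
    }
}
```

One caveat with any fixed byte window: it can start or end in the middle of a multi-byte sequence, which produces an error that has nothing to do with the real problem. If the exception points at the very first or last bytes of the window, shifting the offsets by a byte or two is worth a try.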

How to quickly print certain lines of a huge file by number

If you work with huge text files, you know that most editors, and less too, are very slow at jumping to a line in the latter part of a large file. It turns out that GNU sed is very good at this, although it won’t replace an editor, of course. For most people, at least; those who write Tetris in sed will probably be fine.

$ wc -l uniprot_sprot.dat
31510440 uniprot_sprot.dat

$ time sed -n '30000000p' uniprot_sprot.dat
real    0m8.982s

$ time sed -n '29999900,30000000p' uniprot_sprot.dat
real    0m9.242s

Wicket CheapoModel

Sometimes you just want a very simple Model in Wicket, where you can put stuff in and get it out later via a simple property expression or a method call. For a sign-in form, for instance, you’d have two text fields for user name and password, and in onSubmit() you’d get their values and authenticate with them.

In Jonathan Locke’s simple sign-in example from the Wicket 1.3 examples, he creates a ValueMap and uses it in a PropertyModel with the simple “username” and “password” expressions:

// El-cheapo model for form
private final ValueMap properties = new ValueMap();

// Let Components set values via a PropertyModel.
add(new TextField("username", new PropertyModel(properties, "username")));

// Get those values, e.g. in onSubmit().
final String username = properties.getString("username");

That works fine and is simple enough. It does require a bit of boilerplate code, though, and the property expression Strings are repeated in the code. Factoring them out into constants would work, but would still be ugly.

The simple CheapoModel class below hides these internals and makes the code shorter and easier to read. It also provides a getter for the property expression itself, which is convenient to reuse as the component id and improves readability: one literal less. Use CheapoModel like this:

private final CheapoModel uri = new CheapoModel("uri");

// Use its PropertyModel and the property expression when building components.
add(new TextField(uri.id(), uri.model()));

// Retrieve the value after the user has interacted with the component.
final String newUri = uri.value();

You can also pass in the ValueMap if you want to share one.
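To make the shape of that API concrete, here is a dependency-free sketch of the idea. It is not the actual CheapoModel: a plain Map stands in for Wicket’s ValueMap, the nested SimpleModel interface is a made-up stand-in for PropertyModel, and only flat keys are handled, not nested property expressions.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: plain-Java stand-in, no Wicket classes involved.
final class CheapoModelSketch {

    /** Hypothetical minimal model, standing in for Wicket's PropertyModel. */
    interface SimpleModel {
        Object getObject();
        void setObject(Object value);
    }

    private final String expression;
    private final Map<String, Object> properties;

    /** Uses a private map, for the common single-field case. */
    CheapoModelSketch(String expression) {
        this(expression, new HashMap<>());
    }

    /** Shares a map with other instances, e.g. one per form. */
    CheapoModelSketch(String expression, Map<String, Object> sharedProperties) {
        this.expression = expression;
        this.properties = sharedProperties;
    }

    /** The property expression, reusable as the component id. */
    String id() {
        return expression;
    }

    /** A model that reads and writes the backing map under the expression key. */
    SimpleModel model() {
        return new SimpleModel() {
            @Override public Object getObject() { return properties.get(expression); }
            @Override public void setObject(Object value) { properties.put(expression, value); }
        };
    }

    /** The current value as a String, or null if nothing has been set. */
    String value() {
        Object v = properties.get(expression);
        return v == null ? null : v.toString();
    }
}
```

In this sketch, a component would write through model().setObject(...) when the form is submitted, and value() reads the same entry back afterwards.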

Here it is:
Continue reading “Wicket CheapoModel”