Do What I Mean: Moose Types and Type Coercions

APIs should be simple. I hate it when a module that solves a non-trivial problem requires the user to make non-trivial decisions about every single detail of the problem domain.

In my opinion, a class should be smart enough to make every reasonable assumption, so that it requires as little input from the user as possible.

This might seem a little dangerous, and it certainly can be if the approach is taken too far (for some people, IO::All is an example of too DWIMmy an API). But there's a healthy middle ground in which the user can not only rely on the module to solve the problem at hand, but is also spared most of the cognitive load that solving that problem requires.

Someone looking for a ready-made solution on CPAN is very probably not only unwilling to code it up himself, but also doesn't know enough about the problem to do so (at least not initially). This person is going to rely on the module's wisdom, and the less that is asked of him, the better.

One of the modules I'm working on deals with protein sequence optimization using genetic algorithms. The user has a collection of phylogenetically related protein sequences and wants to produce a sequence optimized for a custom trait (solubility, hydrophobicity, digestibility, etc.) that still belongs to the original protein family.

Under the hood, the algorithm I implemented requires a multiple protein alignment, or profile, as input. Naturally, the methods that do the heavy lifting expect a Bio::SimpleAlign object. The complication is that protein alignments come in many different formats, many of which are shared with plain protein, RNA and DNA sequence file formats. Moreover, the user shouldn't even need to know that the module requires a protein alignment. Of course he should be allowed to provide one, but if all he has is a bunch of sequences in a flat file, he shouldn't be bothered with opening (how?), parsing (what format? what is its specification?) and aligning (with what algorithm? Gap penalty who?) them just to cater to my particular implementation. All he should need to give is a filename as a string.

So, to maximize user convenience, I decided that the module should accept any of:

sequence files in as many formats as possible,
alignment files in as many formats as possible,
collections of sequence objects (subclasses of Bio::Seq, as well as Bio::SeqIO objects), or
alignment objects (Bio::SimpleAlign or Bio::AlignIO objects).
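To make the four cases concrete, here is a minimal plain-Perl sketch of how such inputs could be told apart before any conversion happens. It is not the module's code: `classify_input` and its return labels are hypothetical, a plain string is simply assumed to be a filename, and objects are matched with `isa` against the class names listed above, so the check itself does not need BioPerl loaded.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: label the kind of input a caller handed us.
# Class names follow the list above; an object matches if it isa() that class.
sub classify_input {
    my ($input) = @_;

    # A plain string is assumed to be a filename; whether it holds sequences
    # or an alignment is decided later, once the format has been guessed.
    return 'file' if defined $input && !ref $input;

    if ( blessed($input) ) {
        return 'alignment'        if $input->isa('Bio::SimpleAlign');
        return 'alignment_stream' if $input->isa('Bio::AlignIO');
        return 'sequence_stream'  if $input->isa('Bio::SeqIO');
    }

    # An array reference is accepted only if every element is a sequence object.
    if ( ref $input eq 'ARRAY' && @$input ) {
        my @non_seqs = grep { !( blessed($_) && $_->isa('Bio::Seq') ) } @$input;
        return 'sequence_list' unless @non_seqs;
    }

    die "Unsupported input: expected a filename, sequences, or an alignment\n";
}
```

With Moose, this hand-written branching is exactly what type coercions replace: each branch becomes a `from ... via ...` clause attached to a type.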

If it is ambiguous whether the user supplied an alignment file or a sequence file (e.g., FASTA is both an alignment and a sequence format), I make an educated guess and assume the file holds unaligned sequences. In the worst case, the module just realigns an alignment. There is also an extra layer of guessing to determine the format itself when the file has an unknown extension or none at all (this is done by Bio::Tools::GuessSeqFormat).
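The tie-breaking rule can be expressed as a lookup that checks sequence formats before alignment formats. In this sketch the format name is passed in as a string (in the module, that name would come from a guesser such as Bio::Tools::GuessSeqFormat); `input_role` and the two format tables are hypothetical and deliberately incomplete.

```perl
use strict;
use warnings;

# Illustrative, non-exhaustive format tables (hypothetical; a real module
# would derive them from the formats BioPerl actually supports).
my %SEQUENCE_FORMATS  = map { $_ => 1 } qw(fasta genbank embl swiss pir raw);
my %ALIGNMENT_FORMATS = map { $_ => 1 } qw(fasta clustalw stockholm phylip msf nexus);

# Decide how to treat a file whose format name has already been guessed.
# Sequence formats are checked first, so a format valid for both (fasta)
# is treated as unaligned: at worst, an alignment gets realigned.
sub input_role {
    my ($format) = @_;
    $format = lc $format;
    return 'sequences' if $SEQUENCE_FORMATS{$format};
    return 'alignment' if $ALIGNMENT_FORMATS{$format};
    die "Unknown format '$format'\n";
}
```

The order of the two checks is the whole decision: swapping them would make every FASTA file an alignment, which breaks as soon as the sequences have different lengths.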

All of this adds to the simplicity of the API at the expense of the simplicity of the underlying code. Luckily, Moose provides the tools to keep this as straightforward and clean as possible: types and type coercions. The coercion map looks like this:

And the code that implements this is simply the following:

Better still, all these types and type coercions are defined separately, in a type library built with MooseX::Types. The library handles both input sanity checking and the coercions. This way, not only is this complexity hidden from the user, it's also hidden from the main application. This really helps: most of the code in the module's main file now describes the class's behavior, and it is neither coupled to nor obscured by the juggling of all the possible input types and the validation code.
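A minimal sketch of that separation, assuming Moose is installed. It uses plain Moose::Util::TypeConstraints rather than MooseX::Types so it depends only on Moose, and every name in it is hypothetical: `Stub::Alignment` stands in for Bio::SimpleAlign, the "aligner" merely wraps the sequences, and the tiny FASTA reader stands in for Bio::SeqIO. It is not the module's actual type library.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::SimpleAlign, so the sketch runs without BioPerl.
package Stub::Alignment {
    sub new { my ( $class, @seqs ) = @_; return bless { seqs => [@seqs] }, $class }
    sub num_sequences { scalar @{ $_[0]{seqs} } }
}

# Type library: all input checking and conversion lives here.
package Hypothetical::Types {
    use Moose::Util::TypeConstraints;

    subtype 'Hypothetical::Profile',
        as 'Object',
        where { $_->isa('Stub::Alignment') },
        message { 'Expected an alignment object' };

    coerce 'Hypothetical::Profile',
        # List of raw sequences: a real library would align them here;
        # this stub only wraps them.
        from 'ArrayRef[Str]',
        via { Stub::Alignment->new(@$_) },
        # Filename: minimal FASTA reader, one sequence per '>' header.
        from 'Str',
        via {
            open my $fh, '<', $_ or die "Cannot open $_: $!";
            my @seqs;
            while ( my $line = <$fh> ) {
                chomp $line;
                if    ( $line =~ /^>/ ) { push @seqs, '' }
                elsif (@seqs)           { $seqs[-1] .= $line }
            }
            close $fh;
            Stub::Alignment->new(@seqs);
        };

    no Moose::Util::TypeConstraints;
}

# The main class only declares the attribute and asks for coercion.
package Hypothetical::Optimizer {
    use Moose;

    has profile => (
        is       => 'ro',
        isa      => 'Hypothetical::Profile',
        coerce   => 1,
        required => 1,
    );

    __PACKAGE__->meta->make_immutable;
    no Moose;
}
```

With this in place, `Hypothetical::Optimizer->new(profile => 'proteins.fasta')` and `Hypothetical::Optimizer->new(profile => $alignment)` both end up with an alignment object, while the class body never mentions filenames or formats. In a real type library, MooseX::Types would additionally export the type names as constants, avoiding clashes in Moose's global registry of string type names.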

Now future users of this module (most probably only myself) won't have to consult the API documentation as often: whatever representation of a collection of protein sequences they have will serve as valid input. I believe this is a nice design choice.