A BWT compressor from 1999 that demonstrated high compression and speed
by using a unique entropy coding algorithm to encode the BWT directly.

The first and only context-aware BWT compressor. Achieves very high
compression. Implementation actually took until 2009, and the current
release is a beta that I wrote before the birth of my first daughter (I knew
I would not have any time for development after that). I plan to
reimplement M03 in the future to include a technique I call 'context
skipping', which in early experiments doubled the speed of context modeling.
How much it will improve compression is still unclear. Only time will tell.

The last stable MSufSort release, dating back to 2007, introduced several
new concepts, such as the tandem repeat sort, a cache-friendly second-stage
ITS, and computing the BWT directly from the first-stage ITS. These ideas have
since been adopted by other leading suffix array construction algorithms.
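
As background for the suffix array and BWT terms above: once the suffix array SA of a text T is known, the BWT can be read off as T[SA[i] - 1] for each rank i, where the suffix starting at position 0 takes the end-of-text sentinel. The sketch below is a minimal illustration under that assumption. It uses a naive O(n² log n) sort, not MSufSort's tandem repeat sort or ITS, and the function names are hypothetical.

```python
from __future__ import annotations


def naive_suffix_array(text: str) -> list[int]:
    """Return suffix start positions in lexicographic order (naive, illustration only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sa: list[int]) -> str:
    """Read the BWT off a suffix array: emit the character preceding each sorted suffix.

    Assumes `text` ends with a unique, lexicographically smallest sentinel ("$").
    For the suffix starting at position 0, Python's text[-1] wraps around to that
    sentinel, which matches the cyclic definition of the BWT.
    """
    return "".join(text[i - 1] for i in sa)


text = "banana$"
sa = naive_suffix_array(text)
print(sa)                              # [6, 5, 3, 1, 0, 4, 2]
print(bwt_from_suffix_array(text, sa))  # annb$aa
```

Production SACAs exist precisely because the naive sort above degrades badly on repetitive inputs, where each suffix comparison can scan a long common prefix.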

Alpha demo of MSufSort 4, a fully parallel suffix array construction algorithm.

A corpus that I put together to test the robustness of suffix array
construction algorithms (SACAs). The collection is designed to stress SACAs
with difficult edge cases and with inputs known to be problematic for various
approaches to suffix array construction. This is an open corpus: anyone can
contribute, provided they can demonstrate that their contribution is
problematic for one or more modern SACAs.