M99 - A novel high throughput entropy coder that works well on BWT data

September 9, 2017

I was describing my old M99 algorithm to someone via email recently and decided to put together a working demo by a complete rewrite (the original code is lost on some broken drive in a shoebox somewhere and it's also over a decade old so it was time to re-write and document publicly - finally). The souce code is here on github.

M99 might be almost two decades old now but it's still very fast and efficient compared to more modern works. Plus it's a worthwhile alternative to Huffman in a lot of cases and achieves way better compression. I still believe that a specialized version designed for 8x8 DCT blocks would be very successful both in terms of encoding/decoding speed as well as overall compression. But that's a story for another day. Now I can get back to MSufsort 4 and also a clean re-write of M03 - if only there were enough time!

MSufSort 4 - update (fully parallel suffix array construction)

November 16, 2016

It took a lot longer than I wanted, although not long than I had anticipated, to produce a version of MSufSort which was fully parallel. There is still work to be done to make it worthy of 'Return of The King' status but I'm certainly happy that a lot of the road map has been achieved. There is a single component of MSufSort 4 remaining before the algorithm can clearly differentiate itself as best in class again. Specifically, induction sorting. This is a bit odd, really, considering that induced sorting is the single characteristic trait that every version of MSufSort has in common as a core algorithm. At it's heart the entire MSufSort family is a direct comparison, induced sort SACA. So, it is unexpected that it should be the only component of MSufSort 4 which remains on the road map.

The theoretical work has been done for a long time - how to do parallel induced sorting. But there are many variations to try out in practice and I'm not sure that I want to invest my time in that work at this point. I'm much more enthusiastic about Glimpse to be honest. It's not easy to decide that my personal project of a decade and a half is no longer a priority, but it just isn't. I guess it is true that with age comes wisdom. But I've come to realize that Glimpse has far more potential in the long term. It will become a product and not a project.

Anyhow, MSufSort 4/demo is posted on https://github.com/michaelmaniscalco/msufsort.

MSufSort 4 - multi-threaded performance analysis

November 14, 2016

The chart below graphs the performance of MSufSort 4 on enwik8 using an increasing number of CPUs. The test machine has 4 physical cores. Using virtual cores (hyper threading) has very little effect on performance and so are not presented in the graph. The red line line represents the 'theoretical' best performance assuming that increasing the number of cores from 1 to N would result in an overall decrease in time required to compute the suffix array by a factor of N. The blue line represents the 'actual' performance of MSufSort 4. The chart demonstrates that MSufSort scales almost exactly at the theoretical best as the number of cores increases. I'm reasonably pleased with this performance.

MSufSort 4

June 2, 2015

I've had MSufSort 4 algorithm improvements fleshed out for years now but
simply have not had any free time to work on the implementation in any way.
I was able to put together a prototype several years back which
demonstrated that it
clearly outperformed existing solutions.

Road map: *(* indicates new for version 4)*

- Three pivot multikey quicksort
- Insertion sort specialized for strings *
- Fully parallel quicksort/insertion sort
- Cache friendly improved two stage
- Tandem repeat sort
- Opportunistic induction sort *
- Fully parallel second stage for improved two stage *

Site Redo

June 1, 2015

Finally getting the site and its content back in order. Project work
will resume on MSufSort 4 as well as M03 with context skipping. I'll be
starting with MSufSort 4 and the content will be placed on github. Since I do not
have a lot of time available I'll be adding to the project feature by
feature until the project is complete. Updates will be documented here.