Michael-Maniscalco.com
Resume
email: michael@michael-maniscalco.com

I am an expert C++ programmer with extensive experience in software architecture, algorithms, generic programming and the STL, object-oriented design, code optimization, template metaprogramming, and ultra-low latency software development. My primary focus is algorithm development, with a particular emphasis on data compression, string manipulation, and sorting and searching algorithms.

I pride myself on authoring clean, scalable, maintainable, easily testable and defect-free software using the most appropriate modern C++ design patterns, principles and best practices. I hold my work to the highest standards and accept nothing less than the most efficient, innovative and elegant solutions I can achieve. I am an ardent believer that quality is just as important as deadlines: just as quality is meaningless when it cannot meet a deadline, so too is meeting a deadline meaningless if it is not done with quality.

I am the inventor of several novel generic compression algorithms, including M99 and M03 (a state-of-the-art "context aware" Burrows/Wheeler compression algorithm), as well as MSufSort (a state-of-the-art suffix array construction algorithm). I have also invented algorithms for massive-scale dynamic delta compression as well as neural network algorithms for use in HTTP prefetch prediction. Over the years I have had the pleasure of co-authoring and advising on numerous papers through international collaborations with some exceptionally smart individuals. See the "Publications & Citations" section below for details.

Lime Brokerage
Director of Engineering / Trade and Execution Services Division: Nov 2017 - present

  • Primary Roles/Responsibilities:
  • Director of the C++ engineering division: Market Data and Trading Server products

Lime Brokerage
Principal Software Engineer and Lead Architect: July 2015 - present

  • Primary Roles/Responsibilities:
  • Lead architect, lead developer and mentor. Designed and implemented the company's next generation ultra-low latency market data product, "Citrius Direct". This project represented an entirely new code base designed specifically for this product (and subsequently migrated into existing products as needed). I worked almost exclusively on this project for about two years and was responsible for all architecture, design, documentation, and nearly every line of code in the code base. The result is an ultra-low latency market data feed normalization product with an average market message latency of 750ns (from NIC to client API interface on fairly typical hardware). This project included:

  • A complete networking library including ef_vi kernel bypass:
  • Developed a highly scalable networking library for easily managing large numbers of TCP/UDP/multicast/custom IPC (shared memory) sockets with a configurable number of service threads (two, in practice) handling both kernel and kernel-bypass (ef_vi) I/O.
  • Custom memory allocators:
  • Developed a memory management library exploiting compile-time information in combination with lock-free programming and thread-local storage to produce a highly efficient, scalable memory allocator.
  • Exchange market feed protocol parsers:
  • Developed a technique for parsing and marshalling the various financial exchanges' market feed protocols with ultra-low latency (5-10ns / message).
  • Super simple, compile time messaging transport:
  • Invented an easy-to-use messaging transport system for inter-process/inter-machine communication which uses compile-time hashing to tag multi-part/multi-protocol messages carried across a single transport stream between ultra-low latency systems (a simplified sketch of the compile-time hashing idea appears after this list).
  • Non locking data structures and novel custom shared locking techniques:
  • Developed a shared locking technique which exploits single-thread-per-core architectures to heavily favor reader locks, providing roughly 10ns-per-lock performance for typically read-only containers (a seqlock-style sketch in the same spirit appears after this list). Also authored a large library of lock-free container classes.
  • Shared memory sockets for inter-process communications:
  • Standard IPC sockets were far too slow for sub-microsecond message processing across processes, so I wrote a complete inter-process communication solution which emulates UDP sockets over shared memory.
  • Ultra-low latency logging and software profiling library:
  • Employed my Glimpse library to profile and tune the entire project. Without Glimpse there would have been no way to test, measure, tune and iterate with such precision. Using Glimpse I was able to make small changes in the code and quickly measure nanosecond-level changes in performance over hundreds of millions of samples, allowing me to tune the product in a real-world environment with total confidence.
  • Extensive use of template meta-programming (when needed):
  • Made extensive use of TMP where needed to gain the efficiency required to achieve sub-microsecond per-message performance. Such techniques were used in memory allocation/deallocation, message transports, protocol parsing, inter-machine endian ordering management, etc. In every such instance, great care was taken to ensure that the code complexities introduced by TMP were contained behind an easy-to-use, easy-to-maintain facade.
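
The sketch below illustrates, in simplified form, the compile-time hashing idea behind the messaging transport mentioned above. It is a hedged illustration only: the hash function (FNV-1a), the message types and the framing layout are assumptions invented for this example, not the production protocol.

    #include <cstdint>
    #include <cstring>
    #include <iostream>

    // Sketch only: each message type is tagged with a hash of its name computed
    // entirely at compile time, so frames from multiple protocols can share one
    // transport stream and be dispatched with a plain integer comparison.
    // The hash (FNV-1a), message layouts and framing are illustrative assumptions.
    constexpr std::uint64_t fnv1a(char const * s, std::uint64_t h = 14695981039346656037ull)
    {
        return (*s == '\0') ? h : fnv1a(s + 1, (h ^ static_cast<std::uint8_t>(*s)) * 1099511628211ull);
    }

    struct trade_message { static constexpr std::uint64_t tag = fnv1a("trade_message"); double price; std::uint32_t size; };
    struct quote_message { static constexpr std::uint64_t tag = fnv1a("quote_message"); double bid; double ask; };

    // write one frame: [ tag ][ payload ]
    template <typename T>
    std::size_t write_frame(char * buffer, T const & message)
    {
        std::uint64_t tag = T::tag;
        std::memcpy(buffer, &tag, sizeof(tag));
        std::memcpy(buffer + sizeof(tag), &message, sizeof(message));
        return sizeof(tag) + sizeof(message);
    }

    // dispatch on the compile-time tag with a simple switch
    void dispatch(char const * buffer)
    {
        std::uint64_t tag;
        std::memcpy(&tag, buffer, sizeof(tag));
        switch (tag)
        {
            case trade_message::tag: std::cout << "trade frame\n"; break;
            case quote_message::tag: std::cout << "quote frame\n"; break;
            default:                 std::cout << "unknown frame\n"; break;
        }
    }

    int main()
    {
        char buffer[64];
        write_frame(buffer, quote_message{100.25, 100.27});
        dispatch(buffer);
    }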
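
The next sketch is in the spirit of the reader-favoring shared locking technique described above: readers never write any shared state, so reads of read-mostly data stay in the low-nanosecond range. This is a textbook seqlock illustration around a hypothetical quote board, not the proprietary locking technique itself.

    #include <atomic>
    #include <cstdint>
    #include <iostream>
    #include <utility>

    // Seqlock-style reader-favoring synchronization: the (single) writer bumps a
    // sequence counter around each update; readers take a snapshot and retry if
    // a writer intervened.  Readers never modify shared state, so there is no
    // cache-line ping-pong on the read-mostly path.  Illustration only.
    class quote_board
    {
    public:
        void update(double bid, double ask)                       // single writer
        {
            sequence_.fetch_add(1, std::memory_order_relaxed);    // odd: write in progress
            std::atomic_thread_fence(std::memory_order_release);
            bid_.store(bid, std::memory_order_relaxed);
            ask_.store(ask, std::memory_order_relaxed);
            sequence_.fetch_add(1, std::memory_order_release);    // even: write complete
        }

        std::pair<double, double> read() const
        {
            while (true)
            {
                std::uint64_t before = sequence_.load(std::memory_order_acquire);
                if (before & 1)
                    continue;                                     // writer active; retry
                double bid = bid_.load(std::memory_order_relaxed);
                double ask = ask_.load(std::memory_order_relaxed);
                std::atomic_thread_fence(std::memory_order_acquire);
                if (sequence_.load(std::memory_order_relaxed) == before)
                    return {bid, ask};                            // consistent snapshot
            }
        }

    private:
        std::atomic<std::uint64_t> sequence_{0};
        std::atomic<double>        bid_{0};
        std::atomic<double>        ask_{0};
    };

    int main()
    {
        quote_board board;
        board.update(100.25, 100.27);
        auto [bid, ask] = board.read();
        std::cout << bid << " x " << ask << '\n';
    }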

Viasat - Acceleration and Research Technologies Division
[Formerly "Intelligent Compression Technologies"]
Senior Software Engineer: January 2002 - July 2015

  • Primary Roles/Responsibilities:
  • Senior engineer responsible for algorithm development including all compression-related software, various proprietary hashes, string pattern matching algorithms, language parsers, etc. I was generally responsible for any aspect of the division's products with demanding time or space requirements and for any features requiring specific coding optimizations to meet those demands. I was also generally involved in most architectural decisions. I was with this company (and the original start-up) for well over a decade and was responsible for many features and design decisions during that time.
  • Compression Suite:
  • Authored the company's compression library including custom implementations of LZ77, LZP, PPMd/PPMII, Huffman, massive scale delta compression, block delta and predictive block delta algorithms.
  • Delta Compression Algorithm and Patent:
  • Invented and authored the company's massive scale delta compression algorithm: a system for identifying similar (not necessarily duplicate) data in a massive repository at high speed and applying it as reference data to achieve dynamic delta compression. This compression engine serves as a core piece of the company's WAN accelerator. The algorithm was granted patent #US8010705B1, "Methods and systems for utilizing delta coding in acceleration proxy servers," and is the underlying delta compression engine used in Cisco's software solution for WAN acceleration.
  • "Associative" HTTP Prefetching:
  • Invented a neural networking approach for predictive HTTP object prefetching for use in HTTP acceleration over high latency networks. This work is capable of identifying previously recorded HTTP experiences from a massive database which might contain similarities to the current experience and then blending these data to produce predictions of future HTTP requests and responses.
  • Video Predictive Block Compression Algorithm:
  • Authored the company's 'predictive block delta' compression algorithm, which is currently used for acceleration of video over HTTP on high latency connections. This project involved designing a storage solution for flash RAM which evenly distributes writes across the physical medium to maximize its useful lifespan; writing code to provide C++11-style threading, async, futures, etc. using only C++03; inventing algorithms to identify HTTP video streams in real time from any starting position; locating similar content in a local cache; and maintaining HD video download rates using less than 80KB of RAM to achieve near 100% compression rates over high latency networks.
  • HTTP Protocol Modeling and Compression:
  • Wrote the HTTP protocol parsing/modeling code at the core of the company's "Exede" HTTP web accelerator as well as the Australian Government's NBN satellite internet project. This project involved authoring an entire library for modeling streaming HTTP data such that the HTTP traffic could be altered directly by adjusting the model (add/remove headers, change encoding type, change content, etc.) and then converting the model back into valid HTTP traffic in a streaming fashion.
  • HTTP Prefetch and web acceleration:
  • Wrote the HTTP prefetching 'scanner' which is responsible for scanning HTTP response data and prefetching HTTP objects to achieve accelerated web browsing over high latency satellite networks.
  • Threading Library:
  • Authored a threading library providing C++11-style futures/promises/async using only C++03. The library includes additional features such as thread pools, a timer queue and timers to provide time-delayed async functionality.
  • Leak Tracker:
  • High performance, real-time C++ allocation tracking software. Tracks all C++ allocations by class, line, file, size, clock time, etc. Used for tracking memory leaks and for memory usage analysis (a minimal allocation-tracking sketch appears after this list). [screen shots]
  • Lock Tracker:
  • Real-time mutex/lock tracking software. Tracks C++ lock acquisition and gathers statistics for unique call stacks, average, minimum and maximum lock times, etc. Used primarily for deadlock analysis as well as application-wide performance analysis.
  • Outlook Acceleration:
  • Studied and reverse engineered Microsoft's Outlook/MAPI protocol (long before they released the specification) and built the company's Outlook accelerator for WANs.
  • Miscellaneous:
  • Over the decade-plus at ICT I developed all kinds of algorithms and helpful classes including shared/timed mutexes, compile-time endian wrapper classes, ports of Windows events and WaitForSingleObject/WaitForMultipleObjects to Linux, atomics, high speed pattern matching using hidden Markov models, efficient rolling hash algorithms (a minimal rolling hash sketch appears after this list), etc. Some of this functionality is now available through third party code such as Boost or the C++11 standard library; however, most of my contributions predated their availability by many years.
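
The sketch below shows the rolling hash idea mentioned in the 'Miscellaneous' entry: a polynomial (Rabin-Karp style) hash that slides a fixed-size window across data in constant time per byte, the kind of primitive used to fingerprint content when searching for similar data. The base and window size are arbitrary choices for illustration, not the production fingerprinting scheme.

    #include <cstdint>
    #include <iostream>
    #include <string>

    // Minimal polynomial rolling hash: slide a fixed-size window across a buffer
    // in O(1) per step by removing the outgoing byte's contribution and adding
    // the incoming byte.  The base, the implicit 2^64 modulus and the window
    // size are illustrative; a production fingerprinting scheme would differ.
    class rolling_hash
    {
    public:
        explicit rolling_hash(std::size_t windowSize):
            windowSize_(windowSize)
        {
            // highestPower_ = base ^ (windowSize - 1), used to remove the outgoing byte
            for (std::size_t i = 1; i < windowSize_; ++i)
                highestPower_ *= base_;
        }

        // hash of the first full window
        std::uint64_t initialize(std::uint8_t const * data)
        {
            hash_ = 0;
            for (std::size_t i = 0; i < windowSize_; ++i)
                hash_ = hash_ * base_ + data[i];
            return hash_;
        }

        // slide the window one byte: drop 'outgoing', append 'incoming'
        std::uint64_t roll(std::uint8_t outgoing, std::uint8_t incoming)
        {
            hash_ = (hash_ - outgoing * highestPower_) * base_ + incoming;
            return hash_;
        }

    private:
        std::size_t   windowSize_;
        std::uint64_t base_ = 257;
        std::uint64_t highestPower_ = 1;
        std::uint64_t hash_ = 0;
    };

    int main()
    {
        std::string text = "the quick brown fox jumps over the lazy dog";
        std::size_t const window = 8;

        rolling_hash rh(window);
        rh.initialize(reinterpret_cast<std::uint8_t const *>(text.data()));
        for (std::size_t i = window; i < text.size(); ++i)
        {
            auto h = rh.roll(text[i - window], text[i]);
            std::cout << text.substr(i - window + 1, window) << " -> " << h << '\n';
        }
    }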
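
And a minimal sketch of the allocation-interception idea behind the leak tracker entry above: replacing the global operator new/delete to maintain live allocation statistics. The real tool records class, file, line, size and clock time per allocation; this sketch keeps only process-wide counters and is an illustration of the hook point, not the product.

    #include <atomic>
    #include <cstddef>
    #include <cstdlib>
    #include <iostream>
    #include <new>

    // Sketch only: global operator new/delete replacements which keep live
    // allocation counts.  A header-sized prefix (aligned to max_align_t) stores
    // each allocation's size so it can be subtracted at deallocation time.
    namespace
    {
        std::atomic<std::size_t> liveAllocations{0};
        std::atomic<std::size_t> liveBytes{0};
        constexpr std::size_t headerSize = alignof(std::max_align_t);
    }

    void * operator new(std::size_t size)
    {
        void * block = std::malloc(size + headerSize);
        if (block == nullptr)
            throw std::bad_alloc();
        *static_cast<std::size_t *>(block) = size;                 // record the size
        liveAllocations.fetch_add(1, std::memory_order_relaxed);
        liveBytes.fetch_add(size, std::memory_order_relaxed);
        return static_cast<char *>(block) + headerSize;
    }

    void operator delete(void * address) noexcept
    {
        if (address == nullptr)
            return;
        void * block = static_cast<char *>(address) - headerSize;
        liveAllocations.fetch_sub(1, std::memory_order_relaxed);
        liveBytes.fetch_sub(*static_cast<std::size_t *>(block), std::memory_order_relaxed);
        std::free(block);
    }

    int main()
    {
        auto * numbers = new int[1000];
        std::cout << "live: " << liveAllocations.load() << " allocations, " << liveBytes.load() << " bytes\n";
        delete [] numbers;
        std::cout << "live: " << liveAllocations.load() << " allocations, " << liveBytes.load() << " bytes\n";
    }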

Lexon Technologies
[Formerly "Chicago Map Company"]
Software Engineer: 1999 - 2001

  • Primary Roles/Responsibilities:
  • Developed a library of dynamic, reusable DHTML/JavaScript GUI controls for browsers, along with back-end server development, ISAPIs, etc.

Algorithms, Research & Inventions
Contributions to computer science, open source projects, independent research, compression and sorting algorithms

  • M99 - High performance BWT compressor:
  • [github]
    I am the inventor of the M99 entropy encoding algorithm. Originally developed in 1999 as an entropy coder for the Burrows/Wheeler transform (a textbook BWT illustration appears after this list), this algorithm is a wavelet-based entropy encoder specialized for encoding data which contains locally skewed symbol probabilities. It is a very simple, extremely fast, low memory encoding scheme which is highly effective on the right types of data (such as BWT data).
  • MSufSort - State of the art suffix array construction:
  • [github]
    MSufSort represents a large amount of my non-professional programming time. Over the years I have invented many specialized algorithms which have kept MSufSort at the state of the art. When it was first introduced, MSufSort was 2-3x faster than the previous state of the art. The algorithm is described in the paper "An Efficient, Versatile Approach to Suffix Sorting" (ACM Journal of Experimental Algorithmics, Volume 12, Article 1.2) as well as in the paper "Faster Lightweight Suffix Array Construction", and is cited in numerous academic papers and journals. The most recent version of MSufSort (v4 alpha) achieves highly parallel suffix array construction which outperforms existing suffix array construction solutions by wide margins. The work, however, remains incomplete and in an alpha state.
  • Glimpse:
  • [github - private/on request]
    Ultra high performance, object oriented application instrumentation and graphical analysis tools. This software can instrument C++ applications with extremely low overhead (8ns-20ns per sample) and provide exceptionally rich, streaming, object oriented, real-time instrumentation data which can be sampled, visualized, and mined by powerful visualization tools. Glimpse is, without a doubt, the future of logging, software instrumentation and distributed systems monitoring. I can foresee a time when enterprise software will not be considered shippable unless it is properly outfitted with Glimpse instrumentation.
  • M03 - Context aware BWT compression:
  • [github - proof of concept/alpha]
    I am the inventor of the M03 context based BWT compression algorithm. M03 is a progressive encoding scheme that achieves the highest compression of any generic Burrows/Wheeler based compressor. It is the only algorithm to date which can encode the Burrows/Wheeler transform with respect to the contexts contained in the original, pre-transform data. It is a fast, low memory compressor and has appeared in the paper "Post BWT Stages of the Burrows/Wheeler Compression Algorithm" by Dr. Jürgen Abel.
  • RLE-EXP - Exponential run length encoding:
  • Inventor of the RLE-EXP exponential run length encoding algorithm. Since its first appearance in 2001, this simple enhancement of basic run length encoding has become a de facto standard encoding stage for many modern BWT compressors (an illustrative sketch of the general idea appears after this list). RLE-EXP also appears in Dr. Jürgen Abel's paper "Improvements to the Burrows-Wheeler Compression Algorithm: After BWT Stages".
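
For background on the compression entries above: M99, M03 and RLE-EXP all operate on the output of the Burrows/Wheeler transform, and MSufSort computes the suffix ordering that transform requires. The sketch below is the textbook rotation-sort construction of the transform, shown only as a reference point; it is none of the algorithms named above, and its O(n² log n) behavior is exactly what purpose-built suffix array construction avoids.

    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <string>
    #include <vector>

    // Textbook Burrows/Wheeler transform via sorting all rotations of the input.
    // This naive O(n^2 log n) construction is only a reference point; MSufSort
    // and other suffix array construction algorithms do the equivalent work in
    // (near) linear time.
    std::string naive_bwt(std::string input)
    {
        input += '\0';                              // unique sentinel terminator
        std::size_t const n = input.size();

        std::vector<std::size_t> rotation(n);       // rotation[i] = starting offset of the i'th rotation
        std::iota(rotation.begin(), rotation.end(), 0);

        std::sort(rotation.begin(), rotation.end(),
            [&](std::size_t a, std::size_t b)
            {
                for (std::size_t i = 0; i < n; ++i)
                {
                    char ca = input[(a + i) % n];
                    char cb = input[(b + i) % n];
                    if (ca != cb)
                        return ca < cb;
                }
                return false;
            });

        std::string output;                         // last column of the sorted rotation matrix
        for (auto offset : rotation)
            output += input[(offset + n - 1) % n];
        return output;
    }

    int main()
    {
        // runs of identical symbols tend to cluster in the transformed output,
        // which is what BWT entropy coders such as M99/M03 exploit
        std::cout << naive_bwt("mississippi") << '\n';
    }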
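
And a sketch of the general exponential run length idea referenced above: a run of length n can be recorded with only O(log n) symbols by writing n in a bijective base-2 code over two reserved run tokens, the approach used by bzip2-style coders. This illustrates the concept only; it is not necessarily the exact encoding published as RLE-EXP.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Illustration of exponential run length coding: a run of length n is
    // written with O(log n) symbols by emitting the bijective base-2 digits of n
    // over two reserved run tokens.  This shows the general idea only; the
    // published RLE-EXP encoding differs in its details.
    enum run_token : std::uint8_t { RUNA = 0, RUNB = 1 };

    std::vector<run_token> encode_run_length(std::uint64_t n)     // n >= 1
    {
        std::vector<run_token> tokens;
        while (n > 0)
        {
            if (n & 1) { tokens.push_back(RUNA); n = (n - 1) / 2; }
            else       { tokens.push_back(RUNB); n = (n - 2) / 2; }
        }
        return tokens;
    }

    std::uint64_t decode_run_length(std::vector<run_token> const & tokens)
    {
        std::uint64_t n = 0;
        std::uint64_t weight = 1;
        for (auto token : tokens)
        {
            n += (token == RUNA ? 1 : 2) * weight;
            weight *= 2;
        }
        return n;
    }

    int main()
    {
        for (std::uint64_t runLength : {1ull, 2ull, 7ull, 1000000ull})
        {
            auto tokens = encode_run_length(runLength);
            std::cout << "run of " << runLength << " -> " << tokens.size()
                      << " tokens -> decodes to " << decode_run_length(tokens) << '\n';
        }
    }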

Publications & Citations
Various collaborations, publications and citations of my work

Patents

  • Methods and systems for utilizing delta coding in acceleration proxy servers:
  • Patent #US8010705B1. This patent covers the delta compression algorithms used in the Viasat WAN accelerator product. The basic algorithm is capable of identifying sources which are similar (not necessarily identical) with exceptionally high speed and accuracy. Similar sources are then used as reference dictionaries to achieve extremely high compression ratios.
  • Selective prefetch scanning:
  • Patent #US9407717B1. This patent covers a method for scanning HTML and similar response data and predicting the subsequent HTTP requests produced by the browser that renders the data. These predictions are used to "prefetch" those HTTP requests and position the response data closer to the requester in order to reduce page load times over high latency networks.

External links: