Column-Orientation, Italian Style
I spent a few moments on a flight back from Buenos Aires this past week reading the comments of SAP’s co-founder Hasso Plattner regarding the bright future for column-oriented databases, and can only conclude that we’re of like minds. [Now if I could achieve being of “like portfolios” I’d be truly set, but that’ll have to wait.]
Mr. Plattner, remarking on the subject of column-orientation and the future of databases at the recent SAPPHIRE conference, likened column databases to spaghetti in the package. I couldn’t agree more. In fact, have a look at several of his slides.
Look familiar? Here’s a diagram from a Sybase slide describing column-oriented architecture that we have been using to describe Sybase IQ for years.
So we’re on nearly the same page. Mr. Plattner makes mention of advantages:
• No redundant data, so less data administration
• No redundant software code, so easier upgrades
• Data feeds directly into algorithms
• Greater flexibility
• Easy to add new fields in the customer database
What Mr. Plattner didn’t tell us is how his vision of a column-oriented database is really different from what we see in the market today, or frankly, from the original innovations dating to VSAM on IBM mainframes where indexes could be created to contain data, permitting deletion of the corresponding row-store. Sound familiar?
We believe there’s a good deal more to it, though. Column orientation itself, while a powerful platform, is merely an enabler for a series of subsequent innovations in the layers above. It is the combination of column-orientation and the series of innovations it enables that turns the trick. Once you use columns as the organizational principle, you can:
• Richly index all columns, yielding fast access to any data in the column
• Implement data-specific index types to speed operations whether low or high cardinality, Booleans or blobs
• Build in a cost-based query optimizer that exploits index metadata to speed queries
• Allow DBAs to create multiple indexes on a column to provide even more metadata, often satisfying some query operations directly from the metadata
• Build a query engine that conducts operations directly upon the bitmapped indexes, rather than the data itself, to save space and time
• Escape the modest page sizes needed by row databases, using instead large pages so as to capitalize on large-block transfer rates of even the lowest-cost disk drives (…think video…)
• Create a schema-flexible engine that delivers near real-time analytic performance, but avoids the cost, latency, inflexibility and extra work of layered OLAP engines
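To make a couple of those bullets concrete, here is a minimal sketch in Python of a bitmap index over a column and a query answered by combining bitmaps. It is a toy illustration only, not how Sybase IQ is implemented: the `BitmapIndex` class, the `rows_of` helper, and the sample `region`, `status` and `amount` columns are all hypothetical names invented for this example.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index over one column (hypothetical, for illustration).

    Keeps one Python int per distinct value; bit i is set when row i
    holds that value.
    """

    def __init__(self, values):
        self.n_rows = len(values)
        self.bitmaps = defaultdict(int)
        for row, value in enumerate(values):
            self.bitmaps[value] |= 1 << row

    def eq(self, value):
        """Return the bitmap of rows where column == value (0 if none)."""
        return self.bitmaps.get(value, 0)


def rows_of(bitmap):
    """Expand a bitmap into the list of row numbers whose bits are set."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# A made-up table, stored column by column rather than row by row.
region = ["EU", "US", "EU", "APAC", "US", "EU"]
status = ["open", "open", "closed", "open", "closed", "open"]
amount = [120, 80, 45, 200, 60, 75]

region_idx = BitmapIndex(region)
status_idx = BitmapIndex(status)

# WHERE region = 'EU' AND status = 'open' becomes a bitwise AND of two
# bitmaps; neither column's data is scanned to evaluate the predicate.
match = region_idx.eq("EU") & status_idx.eq("open")

# COUNT(*) is satisfied from the index alone by counting set bits.
count = bin(match).count("1")

# SUM(amount) touches only the amount column, and only the matching rows.
total = sum(amount[r] for r in rows_of(match))
```

In this sketch the count never reads the table at all, and the sum reads a single column for just the qualifying rows, which is the flavor of saving the bullets above describe. A real engine would use compressed bitmaps and an optimizer to choose among index types.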
Mr. Plattner, I do like your analogy of columns and spaghetti, but I’d like to take the analogy further, [while remaining on Italian soil, of course].
If columns work like spaghetti, and the invention of the wheel contributed to the automobile, it is surely the layers of innovation above the wheels that differentiate Ferrari from Fiat. [My apologies to Fiat owners, I chose Fiat for the alliteration with Ferrari...]
It works much the same with column-oriented databases: the columns are only the starting point. Intelligent indexing, index-informed query optimization, and a pile of additional innovations actually deliver the raw horsepower we see today in Sybase IQ.
To Mr. Plattner, I say “Welcome to the Column Club”. And if, perchance, the mood ever strikes to chat about how the layered innovations in Sybase IQ help its 1500+ user companies, I’d happily join you on one of your ocean races. Look for a guy on the pier with his foulies, sou’wester and boots atop the hood of a dusty, but gracefully-aging Fiat Spyder.