Benefit of Column-store Indexes

With the release of Microsoft SQL Server 2012, Microsoft has bought into the concept of column-store indexes with its VertiPaq technology.  This is similar to the approach taken by Vertica.  In a very simplistic example, below is a standard data set:

LastName FirstName BusinessUnit JobTitle
Johnson Ben Finance Director
Stevens George HR Recruiter
Evans Bryce Advancement Major Gift Officer

In a common row-store, this might be stored on the disk as follows:

  • Johnson, Ben, Finance, Director
  • Stevens, George, HR, Recruiter
  • Evans, Bryce, Advancement, Major Gift Officer

Column-store indexes store each column’s data together.  In a column-store, you may find the information stored on the disk in this format:

  • Johnson, Stevens, Evans
  • Ben, George, Bryce
  • Finance, HR, Advancement
  • Director, Recruiter, Major Gift Officer

The important thing to notice is that the columns are stored individually.  If your queries are just doing SELECT LastName, FirstName then you don’t need to read the business unit or job title.  Effectively, you read less off the disk.  This is fantastic for data warehouses where query performance is paramount.   In their white paper, Microsoft uses the illustration below to better explain how columnstore indexes work:

Microsoft's Columnstore Index Illustration

Microsoft’s Columnstore Index Illustration

The benefits of using a non-clustered columnstore index are:

  • Only the columns needed to solve a query are fetched from disk (this is often fewer than 15% of the columns in a typical fact table)
  • It is easier to compress the data due to the redundancy of data within a column
  • Buffer hit rates are improved because data is highly compressed, and frequently accessed parts of commonly used columns remain in memory, while infrequently used parts are paged out.

Microsoft recommends that columnstore indexes are used for fact tables in datawarehouses, for large dimensions (say with more than 10 millions of records), and any large tables designated to be used as read-only.  When introducing a columnstore index, tables do become read only.  Some may see this as a disadvantage.  To learn more about Microsoft’s columnstore indexes, read their article titled, “Columstore Indexes Described.”

So let’s take a quick look at how to do this in the 2012 SQL Server Management Studio.  If you need to add a new non-clustered column store index, you can do so as seen below:

Columnstore index in MS SQL Server 2012

Creating a new non-clustered columnstore index in Microsoft SQL Server 2012

To ensure that the non-clustered columnstore index is achieving the right result for you, test it.  Under the Query menu, ensure that you enable the “Display Estimated Execution Plan.”

Displaying the Query Execution Plan in SQL Server 2012

Displaying the Query Execution Plan in SQL Server 2012

Analyzing the Query Execution Plan in SQL Server 2012

Analyzing the Query Execution Plan in SQL Server 2012

You should now be in a position to analyze your old and new queries to determine which is optimal from a performance perspective.  You’ll likely see a great reduction in query time by using the non-clustered columnstore index.

What are your thoughts?  Are you using columnstore indexes in your environment?

Related Articles:

Gartner releases 2013 Business Intelligence & Analytics Magic Quadrant

Last month, Gartner released the 2013 version of their Business Intelligence & Analytics Platform Magic Quadrant.  I always look forward to the release of Gartner’s magic quadrants as they are tremendously helpful in understanding the landscape of specific technology tools.

Gartner Magic Quadrant for Business Intelligence & Analytics - Comparison of 2012 to 2013

This year, I was pleased to observe the following:

  • Microsoft has improved its overall ability to execute.  Overall, it seems that Microsoft is moving in the right direction with their SQL Server 2012 product.  I’m excited about the enhancements to Master Data Services and I like where they are headed with PowerPivot and Power Views.  A full list of new features can be found at Microsoft’s website.  I’m a big Microsoft fan and I’m excited about Office 2013 and the impact that it will have on BI.
  • IBM has maintained, and slightly increased, its market position.  IBM continues to expand upon the features of their key acquisitions (Cognos, SPSS).  They have done a nice job of migrating customers from the old Cognos 8 platform to IBM Cognos 10.x.  This has increased customer satisfaction.  I also really like their Analytic Answers offering.  In my opinion, BI will continue to become more service oriented – so a big applause for IBM’s analytics as a service offering.
  • Tableau has moved into the top right square.  Tableau deserves to be here and I’m excited to see this movement.  Tableau’s customer support and product quality has been consistently high.  They have also set a benchmark in terms of how straightforward it is to move to their platform and upgrade to the latest version release.
  • There is plenty of competition at the bottom of the market.  Niche players like Jaspersoft and Pentaho are at each others heels.  Competition is healthy!

The only thing that surprised me is that I didn’t see Pyramid Analytics on this list.  Microsoft acquired ProClarity back in 2006.  Extended support for ProClarity will soon end in 2017.  Given that Microsoft has not migrated all of the ProClarity features to PerformancePoint, I am speaking to many users that are jumping ship on the ProClarity front and moving toward Pyramid Analytics.  Pyramid Analytics has done a nice job to aid customers moving from ProClarity.  Keep an eye out.  We might see them on the list next year.

If you are interested in reading the full 2013 report, you may preview the online version here.