Benefit of Column-store Indexes

With the release of Microsoft SQL Server 2012, Microsoft has bought into the concept of column-store indexes with its VertiPaq technology. This is similar to the approach taken by Vertica. In a very simplistic example, below is a standard data set:

LastName	FirstName	BusinessUnit	JobTitle
Johnson	Ben	Finance	Director
Stevens	George	HR	Recruiter
Evans	Bryce	Advancement	Major Gift Officer

In a common row-store, this might be stored on the disk as follows:

Johnson, Ben, Finance, Director
Stevens, George, HR, Recruiter
Evans, Bryce, Advancement, Major Gift Officer

Column-store indexes store each column’s data together. In a column-store, you may find the information stored on the disk in this format:

Johnson, Stevens, Evans
Ben, George, Bryce
Finance, HR, Advancement
Director, Recruiter, Major Gift Officer

The important thing to notice is that the columns are stored individually. If your queries are just doing SELECT LastName, FirstName then you don’t need to read the business unit or job title. Effectively, you read less off the disk. This is fantastic for data warehouses where query performance is paramount. In their white paper, Microsoft uses the illustration below to better explain how columnstore indexes work:

Microsoft’s Columnstore Index Illustration

The benefits of using a non-clustered columnstore index are:

Only the columns needed to solve a query are fetched from disk (this is often fewer than 15% of the columns in a typical fact table)
It is easier to compress the data due to the redundancy of data within a column
Buffer hit rates are improved because data is highly compressed, and frequently accessed parts of commonly used columns remain in memory, while infrequently used parts are paged out.

Microsoft recommends that columnstore indexes are used for fact tables in datawarehouses, for large dimensions (say with more than 10 millions of records), and any large tables designated to be used as read-only. When introducing a columnstore index, tables do become read only. Some may see this as a disadvantage. To learn more about Microsoft’s columnstore indexes, read their article titled, “Columstore Indexes Described.”

So let’s take a quick look at how to do this in the 2012 SQL Server Management Studio. If you need to add a new non-clustered column store index, you can do so as seen below:

Creating a new non-clustered columnstore index in Microsoft SQL Server 2012

To ensure that the non-clustered columnstore index is achieving the right result for you, test it. Under the Query menu, ensure that you enable the “Display Estimated Execution Plan.”

Displaying the Query Execution Plan in SQL Server 2012

Analyzing the Query Execution Plan in SQL Server 2012

You should now be in a position to analyze your old and new queries to determine which is optimal from a performance perspective. You’ll likely see a great reduction in query time by using the non-clustered columnstore index.

What are your thoughts? Are you using columnstore indexes in your environment?

Related Articles:

Robert Goodman

Benefit of Column-store Indexes

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply