Benefit of Column-store Indexes


With the release of Microsoft SQL Server 2012, Microsoft has bought into the concept of column-store indexes with its VertiPaq technology.  This is similar to the approach taken by Vertica.  In a very simplistic example, below is a standard data set:

LastName FirstName BusinessUnit JobTitle
Johnson Ben Finance Director
Stevens George HR Recruiter
Evans Bryce Advancement Major Gift Officer

In a common row-store, this might be stored on the disk as follows:

  • Johnson, Ben, Finance, Director
  • Stevens, George, HR, Recruiter
  • Evans, Bryce, Advancement, Major Gift Officer

Column-store indexes store each column’s data together.  In a column-store, you may find the information stored on the disk in this format:

  • Johnson, Stevens, Evans
  • Ben, George, Bryce
  • Finance, HR, Advancement
  • Director, Recruiter, Major Gift Officer

The important thing to notice is that the columns are stored individually.  If your queries are just doing SELECT LastName, FirstName then you don’t need to read the business unit or job title.  Effectively, you read less off the disk.  This is fantastic for data warehouses where query performance is paramount.   In their white paper, Microsoft uses the illustration below to better explain how columnstore indexes work:

Microsoft's Columnstore Index Illustration

Microsoft’s Columnstore Index Illustration

The benefits of using a non-clustered columnstore index are:

  • Only the columns needed to solve a query are fetched from disk (this is often fewer than 15% of the columns in a typical fact table)
  • It is easier to compress the data due to the redundancy of data within a column
  • Buffer hit rates are improved because data is highly compressed, and frequently accessed parts of commonly used columns remain in memory, while infrequently used parts are paged out.

Microsoft recommends that columnstore indexes are used for fact tables in datawarehouses, for large dimensions (say with more than 10 millions of records), and any large tables designated to be used as read-only.  When introducing a columnstore index, tables do become read only.  Some may see this as a disadvantage.  To learn more about Microsoft’s columnstore indexes, read their article titled, “Columstore Indexes Described.”

So let’s take a quick look at how to do this in the 2012 SQL Server Management Studio.  If you need to add a new non-clustered column store index, you can do so as seen below:

Columnstore index in MS SQL Server 2012

Creating a new non-clustered columnstore index in Microsoft SQL Server 2012

To ensure that the non-clustered columnstore index is achieving the right result for you, test it.  Under the Query menu, ensure that you enable the “Display Estimated Execution Plan.”

Displaying the Query Execution Plan in SQL Server 2012

Displaying the Query Execution Plan in SQL Server 2012

Analyzing the Query Execution Plan in SQL Server 2012

Analyzing the Query Execution Plan in SQL Server 2012

You should now be in a position to analyze your old and new queries to determine which is optimal from a performance perspective.  You’ll likely see a great reduction in query time by using the non-clustered columnstore index.

What are your thoughts?  Are you using columnstore indexes in your environment?

Related Articles:

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s