With the release of Microsoft SQL Server 2012, Microsoft has bought into the concept of column-store indexes with its VertiPaq technology. This is similar to the approach taken by Vertica. In a very simplistic example, below is a standard data set:
|Evans||Bryce||Advancement||Major Gift Officer|
In a common row-store, this might be stored on the disk as follows:
- Johnson, Ben, Finance, Director
- Stevens, George, HR, Recruiter
- Evans, Bryce, Advancement, Major Gift Officer
Column-store indexes store each column’s data together. In a column-store, you may find the information stored on the disk in this format:
- Johnson, Stevens, Evans
- Ben, George, Bryce
- Finance, HR, Advancement
- Director, Recruiter, Major Gift Officer
The important thing to notice is that the columns are stored individually. If your queries are just doing SELECT LastName, FirstName then you don’t need to read the business unit or job title. Effectively, you read less off the disk. This is fantastic for data warehouses where query performance is paramount. In their white paper, Microsoft uses the illustration below to better explain how columnstore indexes work:
The benefits of using a non-clustered columnstore index are:
- Only the columns needed to solve a query are fetched from disk (this is often fewer than 15% of the columns in a typical fact table)
- It is easier to compress the data due to the redundancy of data within a column
- Buffer hit rates are improved because data is highly compressed, and frequently accessed parts of commonly used columns remain in memory, while infrequently used parts are paged out.
Microsoft recommends that columnstore indexes are used for fact tables in datawarehouses, for large dimensions (say with more than 10 millions of records), and any large tables designated to be used as read-only. When introducing a columnstore index, tables do become read only. Some may see this as a disadvantage. To learn more about Microsoft’s columnstore indexes, read their article titled, “Columstore Indexes Described.”
So let’s take a quick look at how to do this in the 2012 SQL Server Management Studio. If you need to add a new non-clustered column store index, you can do so as seen below:
To ensure that the non-clustered columnstore index is achieving the right result for you, test it. Under the Query menu, ensure that you enable the “Display Estimated Execution Plan.”
You should now be in a position to analyze your old and new queries to determine which is optimal from a performance perspective. You’ll likely see a great reduction in query time by using the non-clustered columnstore index.
What are your thoughts? Are you using columnstore indexes in your environment?