Business Intelligence with Workday

As part of my current project at Georgetown University, we have a requirement to extract large volumes of data for business intelligence purposes from Workday.  We’ve been running Workday HCM for about 2 years and plan to go-live with Workday Financials this July.  Historically, the university has been largely an advocate of on-premise solutions – and Workday represents one of the first large steps we’ve taken to move ERP data “into the cloud”.


What does this mean for business intelligence?  Let’s dive a bit deeper.

Provided that Workday is a SaaS (software-as-a-service) vendor, a direct on-premise database connection isn’t an option.  That being said, we had to explore options to get data of out of the Workday platform – and to do it quickly and efficiently.

Below are a few data extraction methods which we considered:

  1. Utilize Workday Studio and leverage the web services-based Workday API
  2. Join required Workday objects in a custom report and expose them as a RaaS (report-as-a-service)
  3. Utilize Informatica Cloud

We quickly ruled out option #3.  We enjoy using Informatica and thought this might be a quick win.  Unfortunately, when aligning the data options available in Informatica Cloud with our requirements – we had several gaps (i.e. missing data, data was not defined down to the appropriate grain, etc).  Informatica offered to partner with us to expand capabilities (much appreciated), but unfortunately our timelines did not permit this to occur for our current project.  We continue to use Informatica PowerCenter as our best-in-class data integration tool.


So – which option was better #1 (API) or #2 (RaaS)?  This question was kicked around for quite some time.  After much deliberation, we decided on option #2 (RaaS).  You’re probably already asking why.  Isn’t the API faster?  Below were the results of our analysis:

  1. The Workday API was hitting a governor limit for our tenant.  This may have been a specific configuration setting for our tenant but it was prohibitive to pull large amounts of data out of the API.  The API was timing out for certain data pulls at the 2 hour mark.  It also required an API developer to manage.  Workday has done a nice job of documenting their API.  We found this to be extremely helpful along the way.  I’m sure we’ll have a future use for the API.
  2. Custom Reports allowed us the flexibility to join multiple objects that met our requirements.  The API calls were limited to what is publicly available in those calls.  We have a bit more flexibility to combine objects which support our requirements in Custom Reports.
  3. Custom Reports are built using the Workday GUI.  This is an added benefit at it did not require a developer to maintain the API calls.  We can have a functional resource maintain the custom reports which are used for data extraction.
  4. Custom Reports (so far) have been able to support the large data extraction that we require – without timing out.  An example is that we have been able to extract about 200K bi-weekly payroll records in about 12 minutes.  We will use this Custom Report-delivered-by-EIB mechanism to support incremental loads to our Finance and HCM data warehouses.


Using Workday’s Enterprise Interface Builder (EIB), we have been able to schedule the Custom Reports and have them output to a text file format which is conducive to load into our business intelligence platform.  The Workday Enterprise Interface Builder (EIB) tool provides an easy-to-use graphical and guided interface to define inbound and outbound integrations without requiring any programming. Used by the majority of Workday customers, the EIB can be used by both business and IT users to address a variety of integration needs.


Our project is still underway.  I’ll write another post with more details as our project moves further toward completion.   If you have any thoughts or suggestions regarding Workday data extraction / integration, please feel free to comment.  Suggestions and comments are welcome!

Considering BI? Follow these 10 steps for success!

Business intelligence.  It consistently ranks as one of the top priorities in various CIO and IT surveys.  In January 2013, Gartner conducted an executive program survey of over 2,000 CIOs.  I’ve listed the top 10 business and technology priorities from the survey below.

Top 10 Business Priorities Top 10 Technology Priorities
  1. Increasing enterprise growth
  2. Delivering operational results
  3. Reducing enterprise costs
  4. Attracting and retaining new customers
  5. Improving IT applications and infrastructure
  6. Creating new products and services (innovation)
  7. Improving efficiency
  8. Attracting and retaining the workforce
  9. Implementing analytics and big data
  10. Improving business processes
  1. Analytics and business intelligence
  2. Mobile technologies
  3. Cloud computing (SaaS, IaaS, PaaS)
  4. Collaboration technologies (workflow)
  5. Legacy modernization
  6. IT management
  7. CRM
  8. Virtualization
  9. Security
  10. ERP Applications

SaaS = software as a service; IaaS = infrastructure as a service; PaaS = platform as a service
Source: Gartner Executive Programs (January 2013)

That said, it is a known fact that many business intelligence (BI) projects fail. Are you planning to implement a BI and analytics program? Maybe thinking of big data?

Here are a few lessons learned that will put you on the path to success:

  1. Ensure executive sponsorship from the start of your project.  This may seem like a simple thing, but ensuring that the executives stand behind your project is critically important.  Before the project starts, lay out the business plan, create budgets (operating/capital), document key stakeholders, and setup a governance structure.  Keep these executives apprised of progress throughout and ensure they see the business value that you are providing.
  2. Understand your current software agreements – and choose the right software/database platform for your organization.  Many people ask me – what is your favorite BI software/database platform?  My answer is that is always depends.  It depends on what contracts your company already has in place.  It depends on the skills and expertise of your staff.  It depends on the budget of your project.  The net is that there are a variety of really good BI tools on the market.  To name a few – Microsoft, Oracle, MicroStrategy, IBM, Teradata, etc.  For a small scale rapid BI implementation, consider cloud-based tools such as Intuit’s Quickbase.
  3. Be inclusive during requirements gathering – don’t design in a bubble.  IT departments often get frustrated with business users as they feel that they can get the technical solution in place much more quickly without the business users involvement.  While this is probably true, if you don’t get critical buy-in from these business users – your initiative will ultimately fail.  The business users need to understand that the system will support their needs and requirements.  This is also critical when you get to the decommissioning phase (item #9 below).
  4. Employ a professional services team to help you.  This is not necessary, but in my personal opinion, I feel that it is tremendously helpful.  Your staff may or may not have dedicated time for the project.  Bringing on a few technical consultants and a project manager can really help to drive the project forward.  In addition, they hold an objective opinion and can help facilitate communication and decisions among departments.
  5. Don’t overlook security.  Security is often over-engineered in BI projects.  Please remember that BI projects don’t need to have ERP-level security.  You may consider bringing the security up a notch in order to gain better performance.  During your design, you may also identify that the users of the BI platform may be power-users by nature.  The platform doesn’t have to be available to everyone.  You may consider allowing a greater level of access to a few number of “power-users”.  This will depend on your specific deployment, but make sure you plan to discuss security early in the project and don’t leave it as an afterthought. 
  6. Document data definitions and ensure buy-in of that definition across groups.  Data definitions can be the most challenging part of a successful BI project – particularly if you have multiple groups involved and they each have their own definition of one piece of data.  This is a tedious process, but be diligent in working through definitions for your data and particularly for any calculated fields.  You may consider software packages that help you to manage this workload (i.e. Informatica’s Business Glossary)
  7. Keep it simple for the end-user and consider a portal.  Presentation is also important.   In a recent implementation, I used Microsoft’s SharePoint 2013 and its Team Site capability to disseminate reports and data by department.  We used team sites for the various departments and a public site to bring all of the departments together.  Consider building a portal similar to this for your end-users.  This makes the report/analytics delivery seamless and easy.  In an ever-growing mobile world, ensure that the portal is mobile capable.  Your users will want to access data when on the go.
  8. Allow for collaboration, but in a controlled environment.  Control the data model and only expose elements that have been approved by your data stewards.  Allow for power-users to create their own reports, but do it in a controlled way.  Expose approved data models to them.   Keep the report creation controlled to departments so that you can keep the solution clean and tidy.  You may even consider a workspace for the custom reports/analytics so that it can be separated from the standard content that is developed and available with the platform.
  9. Decommission old reporting solutions.  During go-live, don’t forget to decommission the legacy reporting solutions.  If you do not, people will continue to use them.  Caution:  if you decommission them, you’ll need executive sponsorship (item #1) and also assurance that you’ve captured the requirements from this system (item #3).
  10. Constantly innovate and evolve your new platform.  Don’t let the platform go-live and become immediately stale.  Engage the end-users and have a constant feedback loop established with them.  Get them engaged to have them help you innovate and build additional content that will better serve their community.  Hold a user-group for key power users so that they can collaborate and share lessons learned.

TrustRadius – A great new Technology Crowdsourcing concept from Vinay Bhagat

Earlier in the year, I wrote an article about Technology Crowdsourcing.  There is a new player in this space named TrustRadius.


The talented Vinay Bhagat and his team have created a unique platform which compiles software reviews sourced from product experts.  Key differentiators in this new platform include:

  1. TrustRadius is really focused on trying to source in-depth insights from primary product users/ experts
  2. The reviews of TrustRadius are structured, and soon will be able to be added to, and amended over time.  Structure allows for curation into things like comparisons.

They were launched in May and were funded by the Mayfield Fund in July and are rapidly starting to scale.  So far, they have 23k monthly visitors and are growing at 30% per month.

Review of Microsoft Business Intelligence at TrustRadius

For interested higher education folks, I’ve posted a few reviews there that you may find interesting:

Warning Signs That You Need Better Business Intelligence

Many people ignore the warning signs.  My memory immediately flashes to warning labels on cigarette packaging or the famous 80’s commercial by Partnership for a Drug-free America (This is your brain…this is your brain on drugs).

Unfortunately, warning signs for outdated business intelligence systems may not be that clear…and are much more easily ignored.  Let’s start with a few basic questions:

  1. Do you often go to the ‘data guy’ to get reports or data from your ERP systems?
  2. Do you have to contact multiple people to get the data that you require to run a report?
  3. Is there a central portal where you visit for reporting needs?
  4. Are you able to answer the majority of your business questions without involving the IT department?
  5. Are you stuck with canned reports, or do you have the flexibility to run in-memory BI with tools such as Microsoft PowerPivot, Qlikview, or Tableau?
  6. Are you able to write your own reports, and more importantly, understand the data that is available?
  7. Do you have dashboards setup to govern key business areas?
  8. Are you able to share those dashboards in real-time to further collaborate?
  9. Does your business intelligence system allow you to look across functional areas to compare and contrast?
  10. Is there a process in place to request that additional data be added to the system?

Now that you’ve had a chance to think about the questions above, let’s look at what I would consider a positive and negative response to each question.

No. Question Response
1 Do you often go to the ‘data guy’ to get reports or data from your ERP systems? Yes No
2 Do you have to contact multiple people to get the data that you require to run a report? Yes No
3 Is there a central portal where you visit for reporting needs? Yes No
4 Are you able to answer the majority of your business questions without involving the IT department? Yes No
5 Are you stuck with canned reports, or do you have the flexibility to run in-memory BI with tools such as Microsoft PowerPivot, Qlikview, or Tableau? Yes No
6 Are you able to write your own reports, and more importantly, understand the data that is available? Yes No
7 Do you have dashboards setup to govern key business areas? Yes No
8 Are you able to share those dashboards in real-time to further collaborate? Yes No
9 Does your business intelligence system allow you to look across functional areas to compare and contrast? Yes No
10 Is there a process in place to request that additional data be added to the system? Yes No
 Green text Positive response
 Red text Negative response

Use these questions as a quick litmus test.  If you answered negatively to 4 or more of the questions above, you should revisit your business intelligence strategy.  If you answered positively to 6 or more of the questions above, you are likely headed in the right direction.  That being said, it is never a bad idea to reassess and evaluate strategies to improve your business intelligence environment.  Considering the constant change and fantastic software tools that are out there – you can always find something that will add to the value of your BI strategy.  Caution:  many of these tools are expensive, so evaluate carefully and find the right fit for your budget and your organization.

If you would like more explanation as to the responses to my questions above, I have provided them below:

Do you often go to the ‘data guy’ to get reports or data from your ERP systems?

I have worked in many organizations to find that only one specific person, or a small group of people, have access to organization-wide data.  If you find yourself constantly going to one person to get the data that you require…or to get ‘that report’ written, then the business intelligence system is failing.  A proper business intelligence system should allow you to get the types of data that you require as an end user.  It may impose appropriate security on the data, but it should allow you to get access to the data that is required to do your job.

Do you have to contact multiple people to get the data that you require to run a report?

If you find yourself contacting one person in Finance to get the finance data – and one person in Human Resources to get the HR data – and one person in Sales to get the sales data, then the business intelligence system is failing.  An appropriate BI solution should bring this data together and make it accessible cross-functionally.

Is there a central portal where you visit for reporting needs?

Provided the cross-functional nature of any business intelligence solution, it is important to have a BI portal in place.  This portal may address key information such as:

  • Report glossary
  • Data definitions
  • Training
  • Change requests
  • Project activity
  • Maintenance schedules

Without such a portal, this information is difficult to cobble together and it leads to confusion as end-users are visiting multiple locations to find this information.  This portal serves as a one-stop shop for business intelligence, reporting, and analytics needs.

Are you able to answer the majority of your business questions without involving the IT department?

This question is two-fold.  The first part is to ensure that the business intelligence system has the data that is required to fulfill your request.  The second part is to ensure that if the business intelligence system does have all of the data that is required for your request, that it is accessible to you.  Security is important, but it can also be over engineered.  In a recent project, I stepped into a BI project where each report was executing a security table with over 3.5 M rows of data.  This killed performance and made the experience so frustrating for end-users that they started to spawn their own shadow systems.  Not good.  Make sure that you have the appropriate security in place for your data, but remember that security for BI projects will likely not be the same as the transactional security that you have setup in your ERP systems.

Are you stuck with canned reports, or do you have the flexibility to run in-memory BI with tools such as Microsoft PowerPivot, Qlikview, or Tableau?

Business users will always find a report that isn’t written…or a permutation of an existing report that is required to fulfill a specific need.  For the canned or standard reports that you make available, try to make them as flexible as possible with report parameters.  These filters allow you to further refine the report for various users.  Once the user community becomes used to the environment, they will quickly outgrow canned reports.  This is where in-memory BI plays a big part.  Allow power-users the flexibility of further analyzing data using in-memory tools.  I’ve seen this done very well with Microsoft PowerPivot, Qlikview, and Tableau among others.  This collaborative analysis may lead to future canned reports or dashboards that will provide benefit to the larger community.  And, if not, it provides the flexibility to perform quick cross-functional analysis without the effort of creating a full report.  In the case of Qlikview, users can collaborate and share dashboard data in real-time.  This is very neat.

Are you able to write your own reports, and more importantly, understand the data that is available?

If your business intelligence environment is setup correctly, power-users should have the ability to write custom reports and save them either to a public or personal directory.  If the BI implementation has been done well, power-users will also understand the metadata which fuels the reports.  In a recent implementation, I’ve seen this done well with IBM Cognos and their Framework Manager.

Do you have dashboards setup to govern key business areas?

Dashboards are always the eye-candy of a BI implementation.  With the proliferation of good dashboard tools, this is an integral part of a modern BI solution and is a key aid in helping to drive key business decisions.

Are you able to share those dashboards in real-time to further collaborate?

Not only are dashboards great to help the c-level suite make key business decisions, but they are also becoming a fantastic way to collaborate and share information.  As mentioned above, some of the more advanced dashboard tools allow you to share dashboard information in real-time.  This could be as simple as selecting a view parameters and sending over a copy of a dashboard that you’ve modified.  In some of the nicer tools, it could mean sharing a session and collaborating on the dashboard in real-time.

Does your business intelligence system allow you to look across functional areas to compare and contrast?

This may seem like a silly question, but it is an important one.  Does your business intelligence system encompass all of your key ERP systems?  If not, you need to reevaluate and ensure that all key business systems are brought into the BI solution.  Once the data from the key business systems is standardized and brought into the BI solution, you can start to join these data together for insightful cross-functional analysis.

Is there a process in place to request that additional data be added to the system?

Through exploration, collaboration, and report writing, you may identify data that has not been loaded into the BI solution.  This could be as simple as a data field or as complex as another transactional system.  Either way, you should ensure that your BI team has a process in place to queue up work items.  The BI solution needs to evolve as business needs are identified.  Personally, I have gained a lot of benefit from using Intuit’s Quickbase software to manage this process through technology.  Quickbase has a nice blend of tools that facilitate collaboration and allow end-users to submit requests without a lot of effort.

As you evaluate your BI solution, also ensure that your ETLs are loading data at the appropriate interval.  I’ve also seen many implementation where the data is so dated that is becomes useless to the end-users and impacts adoption of the platform.  Try to run ETL processes as quickly as possible.  In many implementations that I’ve completed, we’ve set data up to run nightly.  This isn’t always possible given the volume, but fresh data always allows for better adoption of the platform.

Real-time Data Collaboration by Qlikview


Yesterday, I had a very nice meeting with the folks at Qlikview (thanks, @DHurd11 & Andy for making the trip).  They partnered with a local DC consulting firm by the name of Tandem Conglomerate.  I had the pleasure of working with Ben Nah from Tandem Conglomerate last year – and I can vouch that their talent is top-notch.  Qlikview is smart to find partners of this caliber – and utilize them to better serve their customers.

As a technologist, I always have my eyes open to exploring new technology.  This is always challenging with long term contracts, university politics, and an ever-changing IT landscape.  However, for me, vendors have to prove why they should remain at the top.  Competition is healthy for everyone as it makes us constantly improve.  I should also say that as long as we have a defined data model, the reporting tools that we use on top of that data model are fluid.  The point is not to constantly change and rip out what you’ve done just for the sake of redoing it, but it is important to keep an eye on the latest technology, experiment, and find what is best for your organization.  This can be done gradually with small pilot projects to prove value.  We’re actually in the process of doing one of these pilot projects with Hadoop.

Ok – so you’re probably wondering why I titled this post ‘real-time data collaboration’?  During the Qlikview presentation yesterday, I saw something that really resonated with me.  And, that was the ability to collaborate, in real-time, on the same Qlikview dashboard.

This capability is a market differentiator for Qlikview.  As many of you may have seen from Gartner and in my previous post regarding Gartner’s Magic Quadrant for BI, this is one of the reasons that Qlikview remains ahead of competitors such as Tableau.  Other dashboard vendors may provide the ability to ‘share’ with others, or embed the dashboard into a web page or part.  Don’t misunderstand.  This is NOT the same.

During their product demonstration, Qlikview demonstrated the ability in real-time to share dashboards.  This means that you can select the filters/parameters that you would like to analyze, hone in on the particular area of interest, and share it in real-time.  The recipient can then see that selection, modify it, and as they modify the shared dashboard, it will update your dashboard.  You can modify and send changes back to the recipient as well.  VERY COOL!  Kudos to Qlikview on this feature.  Below are a few screen shots to show how it works:

Click ‘Share Session’

Click ‘Start Sharing’

Choose if you want to cut/paste the link, email it, or even expire the shared session.  By the way, recipients don’t have to have any sort of Qlikview license to be able to provide feedback in real-time.

Try it out in the demo below:


Related Articles:

Higher Education Data Warehouse Conference (HEDW) @ Cornell University

I just returned from an excellent conference  (HEDW) which was administered by the IT staff at Cornell University. Kudos to the 2013 Conference Chair, Jeff Christen, and the staff of Cornell University for hosting this year! Below is a little bit more information about the conference:

The Higher Education Data Warehousing Forum (HEDW) is a network of higher education colleagues dedicated to promoting the sharing of knowledge and best practices regarding knowledge management in colleges and universities, including building data warehouses, developing institutional reporting strategies, and providing decision support.

The Forum meets once a year and sponsors a series of listservs to promote communication among technical developers and administrators of data access and reporting systems, data custodians, institutional researchers, and consumers of data representing a variety of internal university audiences.

There are more than 2000 active members in HEDW, representing professionals from 700+ institutions found in 38 different countries and 48 different states.

This conference has proven to be helpful to Georgetown University over the last 5 years.  It is a great opportunity to network with peers and share best practice around the latest technology tools.  And, sorry vendors, you are kept at bay.  This is important as the focus of the conference is less on technology sales – and more about relationships and sharing successes.

Cornell University Outside of Statler Conference Center

Cornell University Outside of Statler Conference Center

Personally, this was my first year in attendance.  I gained a lot of industry insight, but it was also helpful to find peer organizations that are using the same technology tools.  We are about to embark upon an Oracle Peoplesoft finance to Workday conversion.  It was helpful to connect with others that are going through similar projects.  And for me specifically, it was helpful to learn how folks are starting to extract data from Workday for business intelligence purposes.

Higher Education Data Warehouse Conference

Higher Education Data Warehouse Conference

2013 HEDW Attendee List

2013 HEDW Attendee List

My key take-aways from the conference were:

  • Business intelligence is happening with MANY tools.  We saw A LOT of technology.  Industry leaders in the higher education space still seem to be Oracle and MicrosoftOracle seemed to be embedded in more universities; however many are starting projects on the Microsoft stack – particularly with the Blackboard Analytics team standardizing on the Microsoft platform.  IBM Cognos still seemed to be the market leader in terms of operational reporting; however Microsoft’s SSRS is gaining momentum.  From an OLAP and dashboard perspective, it seemed like a mixed bag.  Some were using IBM BI Dashboards, while others were using tools such as OBIEE Dashboards, Microsoft Sharepoint’s Dashboard Designer, and an emerging product – Pyramid Analytics. Microsoft’s PowerPivot was also highly demonstrated and users like it!  PowerView was mentioned, but no one seemed to have it up and running…yet.  Tableau was also a very popular choice and highly recommended.  Several people mentioned how responsive both Microsoft and Tableau had been to their needs pre-sale.
  • Business intelligence requires a SIGNIFICANT amount of governance to be successful.  We saw presentation after presentation about the governance structures that should have been setup.  Or, projects that had to be restarted in order be governed in the appropriate way.  This includes changing business processes and ensuring that common data definitions are put in place across university silos.  A stove-piped approach does not work when you are trying to analyze data cross functionally.
  • Standardizing on one tool is difficult.  We spoke to many universities that had multiple tools in play.  This is due to the difficulty of change management and training.  It is worth making the investment for change management in order to standardize on the appropriate tool set.
  • Technology is expensive.  There is no one size fits all.  Depending on the licensing agreements that are in place at your university – there may be a clear technology choice.  Oracle is expensive, but it may already be in use to support critical ERP systems.  We also heard many universities discuss their use of Microsoft due to educational and statewide discounts available.
  • Predictive Analytics are still future state.  We had brief discussions about statistical tools like SAS and IBM’s SPSS; however, these tools were not the focus of many discussions.  It seems that most universities are trying to figure out simple ODS and EDW projects. Predictive analytics and sophisticated statistical tools are in use – but seem to be taking a back seat while IT departments get the more fundamental data models in place.  Most had an extreme amount of interest in these types of predictive analytics, but felt, “we just aren’t there yet.”  GIS data also came up in a small number of presentations, but also has interest.  In fact, one presentation displayed a dashboard with student enrollment by county.  People like to see data overlaid on a map.  I can see more universities taking advantage of this soon.
  • Business intelligence technologists are in high demand and hard to find.  It was apparent throughout the conference that many universities are challenged to find the right technology talent.  Many are in need of employees that possess business intelligence and reporting skills.
  • Hadoop remains on the shelf.  John Rome from Arizona State gave an excellent presentation about Hadoop and its functional use.  He clarified how Hadoop got its name.  The founder, Doug Cutting, named the company after his son’s stuffed yellow elephant!  John also presented a few experiments that ASU has been doing to evaluate the value that Hadoop may be able to bring the university.  In ASU’s experiments, they used Amazon’s EC2 service to quickly spin up supporting servers and configure the services necessary to support Hadoop.  This presentation was entertaining, but was almost the only mention of Hadoop during the entire conference.  It may have more use in research functions, but does not seem widely adopted in key university business intelligence efforts as of yet.  Wonder if this will change by next year?
g with Son's Stuffed Elephant

Doug Cutting with Son’s Stuffed Elephant


A Compilation of My Favorite DW Resources

Recently, I received an email as part of a listserv from a colleague at  HEDW, or Higher Education Data Warehousing Forum, is a network of higher education colleagues dedicated to promoting the sharing of knowledge and best practices regarding knowledge management in colleges and universities, including building data warehouses, developing institutional reporting strategies, and providing decision support.

In the email that I referenced above, my colleague sent a link to an IBM Redbooks publication titled, “Dimensional Modeling: In a Business Intelligence Environment.”  This is a good read for someone that wants the basics of data warehousing.  It also may be a good refresher for others.  Here’s a short description of the book:

In this IBM Redbooks publication we describe and demonstrate dimensional data modeling techniques and technology, specifically focused on business intelligence and data warehousing. It is to help the reader understand how to design, maintain, and use a dimensional model for data warehousing that can provide the data access and performance required for business intelligence.

Business intelligence is comprised of a data warehousing infrastructure, and a query, analysis, and reporting environment. Here we focus on the data warehousing infrastructure. But only a specific element of it, the data model – which we consider the base building block of the data warehouse. Or, more precisely, the topic of data modeling and its impact on the business and business applications. The objective is not to provide a treatise on dimensional modeling techniques, but to focus at a more practical level.

There is technical content for designing and maintaining such an environment, but also business content.

Dimensional Modeling: In a Business Intelligence Environment

Dimensional Modeling: In a Business Intelligence Environment

In reading through a few responses on the listserv, it compelled me to produce a list of some of my favorite BI books.  I’ll publish a part II to this post in the future, but here is an initial list that I would recommend to any BI professional.  It is also worth signing up for the Kimball Group’s Design Tips.  They are tremendously useful.

Related Articles:

10 Steps to Data Quality Delight!

Data quality is always an aspect of business intelligence (BI) projects that seems to be deprioritized.  It is easy to look at the beautiful visualizations and drill-through reports that are key selling features of a BI project.  However, this article is about the value of cleansing your data so that these tools will work seamlessly with the data model that you establish.  Everyone knows the IT saying, “Garbage in.  Garbage Out.”  That holds entirely true with BI projects.  If the incoming data is dirty, it is going to be very difficult to efficiently process the data and make it available for a reporting platform.  This isn’t an easy problem to solve either.  When working across multiple functional areas, you may also have different sets of users that are entering data into the system in DIFFERENT ways.  So, in this instance, you may not have a data quality issue, but a business process issue.

As I have worked through my BI projects, here are 10 steps that I have followed to work with teams to create a data-centric culture and to improve data integrity.   I hope that these are of use to you…and please feel free to share any additional best practice in the comments of this blog!  We can all learn from one another.

Data Quality Workflow

Data Quality Workflow

  • Step #1:  Build data profiling and inspection into the design of your project
    Don’t wait until you are about to go-live to start looking at the quality of your data.  From the very beginning of your project, you should start to profile the data that you are loading into your BI platform.  Depending on your technology stack, there are multiple tools that will aid you in data profiling and inspection.  You might consider tools such as Informatica Analyst, or Microsoft SSIS Data Profiler.  A quick Google search will provide many alternatives such as Talend.  Regardless of the tool, make sure that you incorporate this activity into your BI project as soon as possible.  You’ll want to do a fair amount of inspection on each system that you intend to load into your BI platform.

    Informatica Analyst

    Informatica Analyst

    Microsoft SSIS Data Profiler

    Microsoft SSIS Data Profiler

    Talend Data Quality Tool

    Talend Data Quality Tool

  • Step #2:  Don’t be afraid to discuss these data quality issues at an executive level (PRIOR TO THE EXECUTION PHASE OF THE PROJECT)
    Awareness is always a key factor for your executive team.  Executives and executive sponsors need to know about the data quality issues as soon as possible.  Why?  You will need their support not only to address the data quality issues, but sometimes these issues stem from poor business process and training.  Their support will be critical to address either issue.

  • Step #3:  Assign ownership and establish accountability
    Assign ownership for data as soon as possible.  This will assist you to not only to resolve the data quality issues, but these key data stewards may be able to help identify additional data quality issues as they may be more familiar with their data than you.  In most cases, they have inherited this bad data too, and will likely want to partner with you to fix it.  However, you must consider that it will also place a burden on them from a bandwidth perspective.  Unless dedicated to your project, they will also have a day job.  Keep this in mind during your planning and see if you can augment and support these data cleansing efforts with your team.
  • Step #4:  Define rules for the data
    One of the biggest challenges that I continue to see is when data stewards do not want to cleanse their data, they want the ETL scripts to handle the 1,001 permutations of how the data should be interpreted.  While the ETLs can handle some of this logic, the business owners need to ensure that the data is being entered into the transactional system via a single set of business processes and that it is being done consistently and completely.  Usually, the transactional systems can have business rules defined and field requirements put in place that can help to enforce these processes.  In some cases, the transaction systems are sophisticated enough to handle workflow too.  Utilize these features to your advantage and do not over-engineer the ETL processes.  Not only will this be time consuming to initially develop, but it will be a management nightmare moving forward.
  • Step #5:  Modify business process as needed
    If you are working cross-functionally, you may run into the need to revise business processes to support consistent data entry into the transactional systems.  Recently, I was working on a project across 6 HR departments.  The net of their hiring process was the same, but they had 6 different processes and unfortunately, they were utilizing the same transactional system.  We had to get their executives together and do some business process alignment work before we could proceed.  Once the business process is unified, you then have to consider the historical data.  Does it need to be cleansed or transformed?  In our case it did.  Don’t underestimate this effort!
  • Step #6:  Prioritize and make trade-offs.  Data will rarely be perfect.
    Once you have revised business process and defined data cleansing activities, you will need to prioritize them.  Rarely are you in a position where data is perfect or resources are unlimited.  If you have done your design work correctly, you will have a catalog of the most critical reports and key pieces of data.  Focus on these areas first and then expand.  Don’t try to boil the ocean.  Keep your data cleansing activities as condensed as possible and make an honest effort to try to support the business units as much as possible.  In my experience, the BI developers can generally augment the full time staff to get data cleansing and data corrections done more efficiently.  However, make sure that the business unit maintains responsibility and accountability.  You don’t want the data to become IT’s problem.  It is a shared problem and one that you will have to work very hard to maintain moving forward.
  • Step #7:  Test and make qualitative data updates
    As you prioritize and move through your data cleansing checklist, ensure that you have prioritized the efforts that will reap the largest reward.  You might be able to prioritize a few smaller wins at first to show the value of the cleansing activities.  You should then align your efforts with the primary requirements of your project.  You may be able to defer some of the data cleansing to later stages of the project, or handle it in a more gradual way.
  • Step #8:  Setup alerts and notifications for future discrepancies
    After data cleansing has occurred and you feel that you have the data in a good state, your job is not over!  Data quality is an ongoing activity.  You almost always run into future data quality issues and governance needs to be setup in order to address these.  Exception reports should be setup and made available “on-demand” to support data cleansing.  Also, one of my favorite tools is data-driven subscriptions, or report bursts.  Microsoft uses the “data-driven subscription” terminology.  IBM Cognos uses the term “report burst.”  Once you have defined the type of data integrity issues that are likely to occur (missing data, incomplete data, inaccurate data, etc.), you can setup data-driven subscriptions, or report bursts, that will prompt the data stewards when these issues occur.  Of course, at the end of the day, you still have the issue of accountability.  We’ll take a look at that in the next step.  Depending on the tool that you using, you may have the capability of sending the user an exception report with the data issue(s) listed.  In other systems, you may simply alert the user of a particular data issue and then they must take action.  These subscriptions should augment the exception reports that are available “on-demand” in your reporting portal.

    Microsoft SSRS Data-Driven Subscription

    Microsoft SSRS Data-Driven Subscription

    IBM Cognos Report Burst

    IBM Cognos Report Burst

  • Step #9:  Consider a workflow to keep data stewards accountable
    So, what now?  The user now has an inbox full of exception reports, or a portal inbox full of alerts, and they still haven’t run the manual, on-demand exception report.  Data integrity issues are causing reporting problems as the data is starting to slip in its quality.  You have a few options here.  In previous projects, I have setup a bit of workflow around the data-driven subscriptions.  The first port of call is the data steward.  They are alerted of an issue with the data and a standard SLA is set to allow them an adequate amount of time to address the issue.  After that SLA period expires, the data issue is then escalated to their line manager.  This can also be setup as a data-driven subscription.  If both steps fail (i.e. both the data steward and the line manager are ignoring the data issue), then it is time to re-engage with your executive committee.  Make the data issues visible and help the executives understand the impact of the data being inaccurate.  Paint a picture for the executive about why data is important.  To further illustrate your point, if you have an executive dashboard that is using this data…it may be worthwhile to point out how the data integrity issue may impact that dashboard.  Not many executives want to be in a position where they are making decisions on inaccurate data.
  • Step #10:  Wash, rinse, and repeat
    By the time that you have gotten to this point, it will likely be time to fold in another transactional system into your BI platform.  Remember this process and use it again!WashRinseRepeat

Squeezing Value From Data – Higher Education Institutions: What do you think?

SAS sponsored this report which draws on a survey of 752 executives, as well as in-depth interviews with 17 senior executives and experts who are regarded as data pioneers.  I would love to hear from other higher education institutions in regard to “the importance of data to company functions” section.  This report focuses on a more corporate audience, but I imagine that in a university setting you might see Finance (Audit & Regulatory Compliance) and Advancement added to the list below.  And, perhaps, more mission-driven data such as Learning data (student/teacher performance).  Blackboard Analytics has a neat product in that arena (see screen shot below):

Blackboard Learn Analytics

Blackboard Learn Analytics

Overall, I think this infographic represents a comprehensive list of some of the key challenges of big data and business intelligence.  What is the most compelling bit of information in this infographic for you?

Squeezing Value From Data Infographic (credit:  SAS, All Analytics)

Squeezing Value From Data Infographic (credit: SAS, All Analytics)

I found the following two questions of particular interest.  These questions provoke thought about having the right type of talent within your organization and ensure that you have achieved the right type of outcome for your end-user.

Economist Intelligence Unit Survey:  Skills Gap

A skills gap question from the Economist Intelligence Unit survey in March 2012

Economist Intelligence Unit Survey:  Data Processing Speed

A data processing speed increase question from the Economist Intelligence Unit survey conducted in March 2012.

Related Articles: