Home > Excel, Microsoft, PowerPivot > Using Power Pivot I discovered–Huston we’ve got a problem…

Using Power Pivot I discovered–Huston we’ve got a problem…


New Toy! A new toy is always the best.  Old toys are OK, and certainly fun but there is nothing like a new toy.  So, I started playing with Power Pivot and found that the tool is really great at visibly showing issues.  Being able to visualize the data makes it easier to determine if there is a problem.  Traditionally, you could be down days, weeks even, before an issue like this could come to light and really slow down your whole process…

This is the story…

Ok, it was time to see what this puppy could do with my 39 million row record set.  Now, frankly I expect it to totally choke.  And I thought it had… but instead, it told me that the whole report would not fit on the sheet and then went ahead and produced one…

image

It didn’t freeze and it returned a pretty large set of data.  My set up was really basic:

image

The data was really raw.  So the next step is to do a little processing on the data and then return to PowerPivot. 

So, to start with I have total records of:

image

While I’ve proved that PowerPivot can work with this size of a data set, the limitations are the physical limitations of my system.  I’m running on a fairly beefy workstation as it is and it looks like I’ll need to reboot to recover some of the memory.   Things to keep in mind when working with large datasets.  What do I mean?  Oh, things like images stopped showing and the system just didn’t feel stable.

Next step, it’s time to think about some of the dimensions.  Sadly, the source database I’m working with is a pretty flat structure in many ways.  It’s what happens when you create a database that is to be all things to all people when it really should have been split apart.

Going on…

This is the staging table…

image

Description is the first Slicer.  It contains:

image

The numbers are pretty much what was expected except for City and County.   I’ll have to go back to the source draw to see why that is.

  • So first I looked at County: 528 rows
  • Ran the same check with City: 528 rows
    I sorted by the the “value” column to see if the counts varied by the value.  From a high level, that will be a sufficient test to proceed.  Actually, as I build out the dimensions I’ll be able to validate if the right data is coming across.

Hmmm… this is not looking good.  I retract that this is good enough for now.  It looks like the process needs to be re-evaluated.

After evaluating the data, I discovered the County and City Columns were reversed with each other.  So, I made the change on the back end.  Checked the results…

Now the data lines up but is it correct?  Cities and Counties still match.  Back to the source…

  • The first Slicer value returns 1993
  • The next ‘County’ returns 1523
  • Then ‘City’ also returns 1523
  • The last ‘Zip’ returns 2112
  • This all agrees with the Total Records picture.  Area and Zip have more records with a a slight edge to Zip.  That certainly plays true with the aggregated totals.  City and County do as well.  Now for the $64 question.  Is this correct?  My brain continues to argue that it is not…

    And then the light came on…
    Because of the way the query is structured, the following rule was applied:

Even if the filter (slicer) should return fewer rows, the minimum number of records returned will be the minimum that is determined by the uniqueness of the row returned.

To roll it all up:

Was there a problem?  Yes, two columns were reversed.  Did using power pivot help?  Yes, by showing a segment of data that intuitively looked incorrect  I discovered the error.  Part of this was user error also.  By relying on simple row counts, the picture presented was true but it did not have enough data.  Here’s the true picture…

image

Summary

While investigating PowerPivot, the right questions came up.  Would these questions have been asked later before the system went live?  Yes, as part of the Q&A process vetting the data is normal.  Using PowerPivot allowed me to save a lot of time by exposing issues and allowing the data to be visible without a lot of processing normally associated with large data sets.

Advertisements
Categories: Excel, Microsoft, PowerPivot Tags:
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: