Monday, October 12, 2009

Empowering the Data Warehouse with External data

By Rick Durham

To build a powerful data warehouse you must include as much relevant data from internal and external sources as possible to optimize the decision processes that managers and “C level” executives are called to make each day.

As an example, retailers have current/historic sales data along with pricing information, but this will only provide partial insight into the determinants that are driving sales. Information such as weather, income tax distribution periods, regional or local population growth, household demography, may also play a key factor in driving sales and must be taken into consideration.

Where do you go to get data that will complement your internal data sources to provide a much richer data warehouse and BI experience?

The government and many other organizations capture and deliver this data and distribute it free or for a nominal fee. Here are some examples:

Weather: Yahoo offers an RSS fed that can be called using an http request as follows:
http://weather.yahooapis.com/forecastrss?p=48161

The parameters to the request are the following:
Parameter, Description, Examples

p, US zip code or Location ID, p=95089 or p=USCA1116
u, Units for temperature (case sensitive), f: Fahrenheit or c: Celsius


The RSS response from this request includes the following information:
· Geographic latitude/longitude
· Weather Conditions (48 distinct codes)
· Temperature (F,C)
· Forecast (Condition, High Temperature, Low Temperature)


Importing this data daily or even hourly using SSIS packages along with Sales data goes a long way in understanding if weather is a factor in why certain items were purchased at a particular time/date and determining longer term sales trends.

Using Demographic Data along with internally generated data can go a long way to enhance the data warehouse. The following are examples of where this data can be obtained:
http://www.geolytics.com/?gclid=CMeliJqrqJ0CFU1M5QodekqHkA

With limited data (address or lat/long information) you can get 60 demographic attributes for that address that include factors such as income, average number of people per home, average age, education…

Likewise another good site for demographic data and data validation is:
http://www.melissadata.com/dqt/index.htm

This site offers validation against address, phone numbers, email and perform name parsing via Web Services calls which can help accelerate the ETL development process, provided you do not have to develop the code and maintain large demographic databases onsite. Additionally, this site offers demographic data on income, media locations, reverse phone and mailing lists.

Finally, the Federal government maintains thousands of databases with data gathered from various agencies that contain information that can be coupled with internal data to make your data warehouse far more powerful. For example:
http://research.stlouisfed.org/fred2/
http://www.data.gov/catalog
http://www.census.gov/

These sites contain historic economic and demographic data the government has collected regarding income, population, interest rates, commodity prices, housing sales and the downloads are free.

The goal of data warehouse development should be to provide the tools and data for optimal decision making. To assure this goal is achieved, make sure external source are also included in the initial and ongoing data warehouse implementation.


No comments:

Post a Comment