What is ETL and Why do I Need It?
What is ETL and Why Do I Need It? Is that the same as a data warehouse? "Times, they are a-changin'…" to quote Bob Dylan. The world of data management is almost unrecognizable even from as little as five years ago.10 years ago if someone had said Data Warehouse, most everybody would have assumed it was a place where the wholesalers kept their hard drives and memory cards. Even now when you mention it, people look at you wondering if you're just using some buzzword to make it sound like your smarter. In reality, every modern business needs to understand the potential value of a Data Warehouse. We all recognize that better information leads to better decisions and a Data Warehouse makes this possible. Most businesses have no shortage of data, but organizing that data for easy access requires a data warehouse.
So what Data is Needed?
In an ideal state, all the information you've collected about your business. Anything you have got, that has ever been recorded in a computer record, forms part of your data warehouse. This can potentially differ from the scope of data required to for “Big Data” which is traditionally defined as massive amounts of data which are outside the capabilities of an ordinary filing system to manage.When combined, all your advertising campaigns in the last 20 years (and the results from them), the individual sales records of all the sales people you've had in the last 20 years, and everything else, might suddenly reveal something riveting.Pick a profession; it doesn't have to be computers; choose any profession at all...say Farm Equipment & Supplies, for example.Those facts you collected reveal that no salesman has ever sold a tractor in his first two years if they were under 30 years old. They can sell arc welders, harrows, seed planters, or irrigation systems, but never a tractor. The "why" of that might make you change your hiring practices to prefer those over 30; it might make you send out teams of salesman with at least one person over 30. A simple little change might result in you doubling your annual tractor sales.It works the same for any business model. Companies have blithely made the same mistake for years without even knowing that they were doing it. Having a data warehouse means you are no longer ignorant; mistakes like that stand out in the light of day.
How Do I Get A Data Warehouse?
You own a collection of invaluable information, but heretofore it's been virtually inaccessible. It has been like trying to watch a construction project through the little viewing holes provided in the surrounding fence. You can only see a little bit at a time, and not necessarily in relation to any other bit.ETL is relevant to the Warehouse discussion because it can often consume 70% of the time and effort to build a Data Warehouse (The Data Warehouse Lifecycle Toolkit, 2nd Edition). The ETL process can be viewed as a system goes through every scrap of data that you own, and orders it in a useful fashion; often it is comprised of a separate server running an isolated DBMS (Database Management System).The letters stand for Extract, Transform, and Load. Through them, information from many disparate sources is made into a homogeneous whole. Now, finally, your data becomes useful as a tool to forecast performance, isolate trends, and literally predict the future. Let's look at each one individually.
The most important step, extraction, involves accessing all your Storage Systems whether they're relational databases, Excel files, Extensible Markup Language (XML), or even completely unstructured Flat Files. It will search your Indexed Sequential Access Method (ISAM) files, hierarchical databases (IMS), or even visual information collected by a screen scraper. If you have spent some time using a web crawler (a "spider" program) it can even add that information.Part of the extract function leading to the next stage is to make sure that all the material is sortable by familiar parameters. Every item's parameters are clearly identified, no matter what its source.
Transformation then analyzes all of the individual records to make them conform to the expected output. Sorting, and a selection of algorithms, are customized so that (for example), entries with null fields aren't accepted; instances of "1=Male, 2=Female" are replaced with "M" and "F" respectively; there is automated merging of disparate sources of similar information and removal of duplicate entries; and it even includes validating data with lookup tables and reference files.
Finally the information is loaded into your brand new DBMS. It is now in a consistent format where you can grab any particular piece of data and compare it in relation to any other pieces of data.Data warehouses exist in two states: automatic update mode, or manually triggered updates. Automatic is generally better since it can be continuous, hourly, daily, weekly, etc. Some have the ability to overwrite earlier information to reflect changes, depending on the business's needs.
Additional Uses for ETL
If you're transitioning from a diverse collection of programs that have acted as your ERP (Enterprise Resource Planning) solution, to an all-in-one solution, you're going to be faced with some challenges. Moving your information from old systems to a new system is going to be a living nightmare…ETL to the rescue! Once again it can collect all your information from all your disparate systems and make it completely homogeneous. That means it can be fed straight into your brand new system with its integrity intact.
It's a complex process
Sorting through all your information, deciding what to keep, what to compile, what to consolidate, and what to throw away might be beyond your own programmers and staff. Often this process is outside of the scope of the traditional development skillset, thus it is often best to work with a custom software development company with a specialization in business intelligence. You have a much better chance of ending up with a usable, scalable end-product.
The future belongs to those who are in control of their data. Simply put:“Better Information + Better Decisions = Better Performance.”The right Business Intelligence toolset allows you to create never before imagined business insights by giving you full visibility into your the treasures hidden in your business data.