KG header
consulting
kimball university
course descriptions
class schedule
logistics
pricing & policies
registration
on-site classes
Linkedin
events
books
articles
design tips
about us
contact us
 


Kimball University: Course Descriptions

DWLD
DMD
MSDWD



ETL Architecture in Depth
This class, taught by Ralph Kimball and Bob Becker, makes sure that you understand all the factors necessary for effectively designing the back room of a data warehouse that can gracefully evolve over time as your needs mature and new technologies become available.

Why You Should Attend
This course is intended for the data warehouse designer who has identified the sources of data and the target end users and is ready to start implementing.

Above all, this course tries to guarantee that you don’t overlook a critical requirement. For example you dare not design your data warehouse while ignoring
• Compliance
• Integration of diverse sources
• Increasingly demanding real-time pressures
• The time variance of your major dimensions such as customer and product
• Being able to resume or back out a partially completed load
• Having a 100% certainty that you have captured all the changes in the source systems
• And a host of other requirements that you will learn about in this course

Even if you don’t have an immediate qualified need for every item on our list, over time it is likely that that you will. At the end of this course you will understand how your data warehouse ETL system can be built to anticipate all of the possible requirements.

This is not a microscopic code-oriented implementation class. Rather, it is an architecture class for the designer who must keep a broad perspective, and who needs to know what the latest technologies and techniques make possible. The course is organized around 34 necessary ETL subsystems which are developed in detail as the course progresses. See the course outline below for the names of the 34 subsystems.

In this course, you will circle around a series of design issues starting with the first steps of extraction, on through to the final steps of delivery of properly formatted data suitable for your BI tool. In this four day class, each student builds on paper a comprehensive ETL system based on a realistic complex example. All 34 subsystems are included.

Who should attend
This course is designed for data warehouse implementers, who are responsible for building the back room, or ETL portion, of a data warehouse environment. This would include ETL developers, ETL architects, data warehouse operational staff, compliance tracking data warehouse professionals and real time data warehouse designers.

Prerequisites
Familiarity with the basic principles of dimensional modeling is necessary since dimensional models are designed as the ultimate ETL deliverables. Students can gain this familiarity by reading the following articles written for Data Management Review:

Nov 2007 Resist the Urge to Start Coding
Dec 2007 Set Your Boundaries
Jan 2008 Data Wrangling
Feb 2008 Dimensional Perspectives

The rest of the DMR series is also recommended reading. The class will include brief reviews of dimensional modeling principles so that everyone has the same vocabulary.

COURSE OUTLINE
(Numbered items refer to the 34 subsystems taught in this course)
Day 1
Surrounding The Requirements
• Business needs
• Compliance
• Data profiling
• Security latency
• Archiving
• End user profiles
• Skills
• Licenses
• Coding vs. tool choice
• The restaurant analogy
• Data types used in ETL systems

• (1) Data Profiling
• Source to target map
• Access methods, source types
• Software, techniques
• (2) Change data capture
• (3) Extract window
• (3) Immediate transformations
• (3) Extract staging table designs, table types, retention, backup
• (22) Job scheduler
• (22) Exception handling architecture
• (23) Backup, (24) recovery, (24) restart
• Historical versus incremental load
• Team Responsibilities

Day 2
Cleaning
• (4) Data quality architecture
• (4) Data quality screens
• (5) Error event fact table
• (6) Audit dimension, compliance tracking
• (28) Sorting
• Module designs: (7) customer deduplication, address validation, ...
• Final clean data table designs

(8) Conforming
• Definition of conformed dimensions and facts
• Using the matrix
• Master data management
• Mapping incompatible structures into common structure
• (25) Version control
• (26) System and version migration, testing and regression
• (27) Workflow monitor
• (23) Job scheduler
• (29) Lineage and dependency analyzer
• (30) Problem escalation system

Modifying your ETL architecture for Real Time data warehousing
• The Hot Partition
• Streaming ETL vs. batch ETL
• Streaming extract
• Streaming cleaning and conforming
• Streaming delivery, query, reporting, dashboards, notifications
• EII architecture (Enterprise Information Integration)
• CTF architecture (Capture, Transform, and Flow)
• EAI architecture (Enterprise Application Integration)
• MBETL architecture (Micro Batch ETL)

Day 3
Delivering Dimension Tables
• Referential integrity
• (9) Time variance designs (Slowly Changing Dimensions)
• (10) Surrogate key generator
• (15) Multi-valued dimensions, bridge tables I
• Special cases (extreme dimensionality, extreme dimension width, many incompatible
   members)
• (11) Hierarchical dimensions (fixed, variable, ragged), bridge tables II
• (12) Special dimensions (mini, junk, outrigger, shrunken, step, text fact)

Delivering Fact Tables
• (13) Fact table builder (transaction, periodic, and accumulating grains)
• (14) Surrogate key pipeline
• Graceful extensibility (add attributes, add facts, add dimensions to existing schemas)
• Handling structure changes
• (16) Late arriving data design

Day 4
• (17) The dimension manager, responsibilities and procedures, real time complexities
• (18) the fact provider, responsibilities and procedures, real time complexities
• Distributed, federated data warehouses
• Delivering remote dimensions and attributes
• Delivering remote facts
• (19) Aggregations
• (20) Feeding OLAP cubes
• (21) DI (Data Integration) manager (feeding data mining, presentation layer extracts,
   3rd party flat files)

Development and Operations
• (31) Parallel processing and pipelining
• (32) Security
• (33) Compliance

(34) Metadata
• Process metadata
• Run results, exception handling, immediate schedule
• Technical metadata
• System inventory, data models, data definitions,
• Business rules, ETL jobs, transformations, batch parameters
• Business metadata
• Business definitions, source system info, DW data dictionary,

• Responsibilities
• Team roles
• Stepping back from the details
• The important decisions
• The important deliverables

 Home  |  Kimball University  |  Consulting  |  Events  |  Books  |  Articles & Papers  |  Design Tips  |  About Us  |  Contact Us  |  Site Map