🚀 Power BI Retail ETL Project Part 2: Combining Two Years of Retail Data Using Append Queries
🚀 Power BI Retail ETL Project Part 2
Combining Two Years of Retail Data Using Append Queries
From Separate Tables to a Unified Staging Layer
📖 What We Accomplished in Part 1
In Part 1, we imported the Online Retail II Dataset into Power BI and selected both worksheets:
- 📅 Year 2009–2010
- 📅 Year 2010–2011
We also learned why professional ETL workflows use Transform Data instead of directly loading data into the model.
Now it is time to move one step closer to a real Data Engineering workflow.
🎯 Today’s Goal
We currently have two separate retail datasets.
Our goal is to combine them into a single staging table that can be cleaned, validated, and transformed in future steps.
Before cleaning data, we must first create a reliable staging layer.
A staging layer becomes the foundation of all future transformations.
🏗 Our ETL Architecture So Far
At this stage, both datasets are independent.
To perform analytics across the full business history, we need a single table.
⚡ Step 1 – Rename Queries
To make our ETL process more professional, we renamed the original queries:
- ✅ retail_raw_2009_2010
- ✅ retail_raw_2010_2011
Using meaningful names improves maintainability and makes the project easier to understand.
⚡ Step 2 – Append Queries as New
Power Query provides an extremely useful feature called:
Append Queries as New
This combines rows from multiple tables into a brand-new query while preserving the original source tables.
Navigate to:
Home → Combine → Append Queries → Append Queries as New
Select:
- retail_raw_2009_2010
- retail_raw_2010_2011
Power BI will create a new combined query.
📸 Append Queries Configuration
📷 Insert Append Queries Dialog Screenshot Here
(Select both raw tables and create a new query)
📸 Result After Append
📷 Insert Combined retail_staging Screenshot Here
(New staging table successfully created)
🏗 Introducing the Staging Layer
After combining both years, we created a new query:
✅ retail_staging
This is one of the most important concepts in modern ETL and ELT pipelines.
Instead of transforming raw data directly, we first consolidate data into a staging area.
🔍 Why Use a Staging Layer?
- ✅ Centralized data preparation
- ✅ Easier troubleshooting
- ✅ Better data quality management
- ✅ Supports large-scale transformations
- ✅ Industry-standard architecture
Whether you work with Power BI, PostgreSQL, BigQuery, dbt, or Airflow, staging layers are everywhere.
🚀 Updated Architecture
This architecture is now aligned with the same layered approach we use in enterprise-grade data platforms.
📷 Insert Updated ETL Architecture Diagram Here
🎓 What Happens Next?
Now that the staging layer is ready, we can begin our first real Data Quality Assessment.
In the next part, we will investigate:
- 🔍 Total row count
- 🔍 Missing Customer IDs
- 🔍 Cancelled invoices
- 🔍 Invalid quantities
- 🔍 Invalid prices
- 🔍 Data quality issues
This is where ETL becomes exciting.
💙 DETLENG Learning Philosophy
Don’t rush to dashboards. Build strong foundations first.
A beautiful dashboard built on poor data is still a poor solution.
Professional Data Engineers focus on data quality before visualization.
🚀 One Dataset. Seven ETL Journeys. Endless Possibilities.
Build • Learn • Engineer • Grow
🌐 DETLENG.COM
