🚀 Power BI Retail ETL Project Part 2: Combining Two Years of Retail Data Using Append Queries

🚀 Power BI Retail ETL Project Part 2

Combining Two Years of Retail Data Using Append Queries

From Separate Tables to a Unified Staging Layer


📖 What We Accomplished in Part 1

In Part 1, we imported the Online Retail II Dataset into Power BI and selected both worksheets:

  • 📅 Year 2009–2010
  • 📅 Year 2010–2011

We also learned why professional ETL workflows use Transform Data instead of directly loading data into the model.

Now it is time to move one step closer to a real Data Engineering workflow.


🎯 Today’s Goal

We currently have two separate retail datasets.

Our goal is to combine them into a single staging table that can be cleaned, validated, and transformed in future steps.

💡 Think Like a Data Engineer

Before cleaning data, we must first create a reliable staging layer.
A staging layer becomes the foundation of all future transformations.

🏗 Our ETL Architecture So Far

At this stage, both datasets are independent.

To perform analytics across the full business history, we need a single table.


⚡ Step 1 – Rename Queries

To make our ETL process more professional, we renamed the original queries:

  • retail_raw_2009_2010
  • retail_raw_2010_2011

Using meaningful names improves maintainability and makes the project easier to understand.


⚡ Step 2 – Append Queries as New

Power Query provides an extremely useful feature called:

Append Queries as New

This combines rows from multiple tables into a brand-new query while preserving the original source tables.

Navigate to:

Home → Combine → Append Queries → Append Queries as New

Select:

  • retail_raw_2009_2010
  • retail_raw_2010_2011

Power BI will create a new combined query.


📸 Append Queries Configuration

📷 Insert Append Queries Dialog Screenshot Here
(Select both raw tables and create a new query)


📸 Result After Append

📷 Insert Combined retail_staging Screenshot Here
(New staging table successfully created)


🏗 Introducing the Staging Layer

After combining both years, we created a new query:

retail_staging

This is one of the most important concepts in modern ETL and ELT pipelines.

Instead of transforming raw data directly, we first consolidate data into a staging area.


🔍 Why Use a Staging Layer?

  • ✅ Centralized data preparation
  • ✅ Easier troubleshooting
  • ✅ Better data quality management
  • ✅ Supports large-scale transformations
  • ✅ Industry-standard architecture

Whether you work with Power BI, PostgreSQL, BigQuery, dbt, or Airflow, staging layers are everywhere.


🚀 Updated Architecture

This architecture is now aligned with the same layered approach we use in enterprise-grade data platforms.

📷 Insert Updated ETL Architecture Diagram Here


🎓 What Happens Next?

Now that the staging layer is ready, we can begin our first real Data Quality Assessment.

In the next part, we will investigate:

  • 🔍 Total row count
  • 🔍 Missing Customer IDs
  • 🔍 Cancelled invoices
  • 🔍 Invalid quantities
  • 🔍 Invalid prices
  • 🔍 Data quality issues

This is where ETL becomes exciting.


💙 DETLENG Learning Philosophy

Don’t rush to dashboards. Build strong foundations first.

A beautiful dashboard built on poor data is still a poor solution.

Professional Data Engineers focus on data quality before visualization.


🚀 One Dataset. Seven ETL Journeys. Endless Possibilities.

Build • Learn • Engineer • Grow

🌐 DETLENG.COM

Popular posts from this blog

Production ETL Pipeline Execution Overview

Choosing the Right Dataset for a Realistic ETL / ELT Pipeline Project

🔍 Power BI Retail ETL Part 4 | Investigating Data Quality Issues in Online Retail Data