Available for senior architecture engagements — Remote / Casablanca

Tarik Boulaajoul.
Data Architect.

I design systems that turn raw data into organizational memory — governed lakehouses, distributed pipelines, and decision-grade BI for telco, BPO, insurance and SAP-driven enterprises.

6+ years across 5 enterprises
ENSIAS State Engineer · e-Mgmt & BI
FR · EN fluent
Casablanca, Morocco
Architecture showcase

Three systems I designed end to end.

Real platforms, real stakeholders, real numbers. Each case below is a problem statement, the architectural decision I made, and what it cost — or saved.

Case 01 · Lakehouse modernization

Killing the Excel pipeline at a 1,000-seat BPO.

Assist Digital · 2025–present · Data Platform Lead

Problem

A call-center BPO ran daily reporting on Python scripts that emailed Excel files. Numbers diverged across teams, refreshes broke silently, and onboarding a new client could take weeks of manual ETL.

Decision

Replace the script-and-email loop with a governed medallion lakehouse: Airflow as the only scheduler, dbt as the only transformer, Power BI as the only consumption layer. Bronze captures raw vendor exports, Silver enforces conformed dimensions, Gold serves business marts.
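The layer split above can be illustrated with a minimal, standard-library sketch. It is not the production dbt code: the field names (`agent`, `call_date`, `handle_secs`) and the sample rows are hypothetical, and plain functions stand in for what dbt models would do in SQL.

```python
from collections import defaultdict

# Bronze: hypothetical raw vendor export rows, stored exactly as received.
bronze = [
    {"agent": " A-17 ", "call_date": "2025-03-01", "handle_secs": "310"},
    {"agent": "a-17", "call_date": "2025-03-01", "handle_secs": "290"},
    {"agent": "B-02", "call_date": "2025-03-01", "handle_secs": "n/a"},
]


def to_silver(rows):
    """Silver: conform keys and types; set aside rows that cannot be typed."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append({
                "agent_id": row["agent"].strip().upper(),  # one spelling per agent
                "call_date": row["call_date"],
                "handle_secs": int(row["handle_secs"]),
            })
        except ValueError:
            rejected.append(row)  # kept for inspection, never silently dropped
    return clean, rejected


def to_gold(silver_rows):
    """Gold: a business mart, here average handle time per agent per day."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of seconds, call count]
    for r in silver_rows:
        key = (r["agent_id"], r["call_date"])
        totals[key][0] += r["handle_secs"]
        totals[key][1] += 1
    return {key: secs / calls for key, (secs, calls) in totals.items()}


silver, rejected = to_silver(bronze)
gold = to_gold(silver)
```

Bronze is never mutated, Silver is the only place where keys are conformed, and Gold only aggregates what Silver already agreed on. That is the property that stops numbers from diverging across teams.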

Stack

Apache Airflow · dbt · Power BI · DAX · SQL · Python · Medallion (Bronze/Silver/Gold) · Git
↓ BI cost
Tableau Server retired after audit-driven catalog rationalization. Daily-refreshed dashboards replaced ad-hoc Excel emails for ops & execs.
Architecture diagram: Sources → Ingest → Lakehouse → Serve
Sources: CRM exports (CSV / API) · Telephony (Avaya logs) · Workforce mgmt (XLSX / SFTP) · Quality QA (REST)
Ingest: Airflow DAGs (extract & land, @daily)
Lakehouse: BRONZE (raw, immutable) · SILVER (conformed dims) · GOLD (business marts) · dbt (tests + lineage)
Serve: Power BI (ops dashboards) · Power BI (exec KPIs) · Self-serve (CSV upload UI)
Case 02 · Distributed compute · predictive ops

Industrializing Big Data for an international telecom operator.

Orange Business Services · 2022–2025 · Big Data Engineer

Problem

A strategic data transformation needed scalable Big Data flows feeding both real-time KPIs and predictive models that flag system incidents before they page someone. Existing batch jobs were brittle and ran on an unhealthy mix of cron and tribal knowledge.

Decision

Standardize on Spark over HDFS/Hive for distributed processing, with a hardened pipeline pattern: idempotent writes, partitioned storage, and resilience baked in. Predictive models live alongside the pipelines so feature freshness is the platform's responsibility, not a notebook's.
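As an illustration of "idempotent writes, partitioned storage", here is a small standard-library sketch. The text does not say how the Spark jobs achieved this, so everything below is an assumption: CSV stands in for Parquet, a local directory stands in for HDFS, and `write_partition` and the file name `part-0000.csv` are hypothetical. Only the `dt=<value>` directory naming follows the usual Hive partition convention.

```python
import csv
import os
import tempfile
from pathlib import Path


def write_partition(base: Path, dt: str, rows: list) -> Path:
    """Write one date partition so that a rerun replaces it instead of appending.

    Assumes `rows` is a non-empty list of dicts sharing the same keys.
    """
    part_dir = base / f"dt={dt}"  # Hive-style partition directory
    part_dir.mkdir(parents=True, exist_ok=True)
    target = part_dir / "part-0000.csv"

    # Write to a temp file in the same directory, then swap it into place.
    # os.replace overwrites the target, so a retried task leaves exactly one
    # complete file rather than a duplicate or a half-written one.
    fd, tmp_name = tempfile.mkstemp(dir=part_dir, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=sorted(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp_name, target)
    return target
```

Calling `write_partition(base, "2024-01-01", rows)` twice with the same input leaves the partition in the same state as calling it once, which is what makes retries safe.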

Stack

Apache Spark · Hive · HDFS · Python · Power BI · SQL · Bash · Unix · Git
−60%
processing time on critical batch flows after partition + shuffle tuning. Predictive layer surfaced incident precursors hours earlier.
Architecture diagram: Sources → Compute → Storage → Consumption
Sources: Network telemetry (streams) · Ticketing (incident logs) · Service mesh (probes) · Operator metrics (KPI feeds)
Compute: Spark (distributed ETL + feature eng.) · Predictive (incident scoring, Python · scikit)
Storage: HDFS (parquet, partitioned) · Hive metastore (SQL on big data) · Curated marts (KPI tables)
Consumption: Power BI (ops + exec) · Alerts (to NOC) · Sponsor (strategy KPIs)
Case 03 · Cloud-native BI

From shared drives to a governed AWS warehouse for an insurance SME.

NGIS · 2020–2021 · BI Analyst & Architect

Problem

An insurance file-management SME had years of historical data sitting in spreadsheets and operational databases, with no central place to ask "how are we doing this quarter?" — and no automation behind the reports they did produce.

Decision

Stand up a cloud-native BI environment on AWS: S3 as raw landing, RDS as analytical store, Airflow + Pandas to automate the prep, and persona-tailored Power BI dashboards so claims handlers, ops managers and execs each see what they need — not what someone else needs.

Stack

AWS S3 · AWS RDS · Apache Airflow · Pandas · Power BI · Python · SQL
3 personas
handlers, ops, execs — each on a tailored dashboard, all sourced from one governed warehouse. Manual report production replaced by scheduled refreshes.
Architecture diagram: Sources → Orchestrate → Cloud DWH → Personas
Sources: Claims system (SQL exports) · Ops sheets (XLSX) · Legacy DB (historical)
Orchestrate: Airflow + Pandas (prep + validate + governance)
Cloud DWH: AWS S3 (raw landing) · AWS RDS (analytical store) · Semantic model (conformed KPIs)
Personas: Handlers (case dash) · Ops (throughput) · Execs (portfolio KPIs)
Skills map

The stack, organized the way I think about it.

Five concerns every data platform has to solve.

Storage: Postgres · SQL Server · HANA · HDFS · S3 / RDS · Hive · Neo4j
Orchestration: Airflow · dbt · ADF · Spark · Docker · Bash
Modeling: dbt models · DAX · SQL · Dimensional · Medallion · Python
Governance: DQ tests · Lineage · PBI gov · Git · CI/CD · Docs · Audit
Visualization: Power BI · DAX · Tableau · Elastic · Self-serve · KPI design
Architecture decision records

Decisions worth writing down.

The interesting part of architecture isn't the diagram — it's the "and the alternative was…". Three calls I've defended in production.

ADR-001 · Accepted

Why we replaced Tableau Server with Power BI.

Context

An existing Tableau Server install with hundreds of reports — many unused, several duplicated, a few load-bearing. License cost was material; usage telemetry was thin.

Decision

Audit the catalog (who opens what, when, and why), keep the load-bearing reports, retire the rest, and rebuild the keepers in Power BI on top of the new dbt-curated marts. Single semantic layer, one BI license to renew.
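The "who opens what, when" part of that audit can be sketched as a small classification over view logs. This is an illustration under assumptions, not the audit that was actually run: the log shape `(report, user, day)`, the 90-day window, the three-viewer threshold and the `keep` / `review` / `retire` labels are all hypothetical. Telemetry cannot answer the "why"; that part still needs a conversation with the report's owners.

```python
from datetime import date, timedelta


def classify_reports(views, all_reports, today, window_days=90, min_viewers=3):
    """Label each report by how many distinct users opened it recently.

    views: iterable of (report_name, user, day) tuples from usage telemetry.
    all_reports: every report in the catalog, including never-opened ones.
    """
    cutoff = today - timedelta(days=window_days)
    recent_viewers = {name: set() for name in all_reports}
    for name, user, day in views:
        if day >= cutoff and name in recent_viewers:
            recent_viewers[name].add(user)

    labels = {}
    for name, users in recent_viewers.items():
        if len(users) >= min_viewers:
            labels[name] = "keep"      # likely load-bearing
        elif users:
            labels[name] = "review"    # niche: ask the few viewers why
        else:
            labels[name] = "retire"    # nobody opened it in the window
    return labels
```

Reports labeled `review` are the ones worth a short interview before deciding, since a single viewer can still be a load-bearing one.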

Consequences

Lower BI spend, stronger consistency (one model = one truth), and a forcing function to actually understand which reports drive decisions. Cost: a quarter of careful migration, not a weekend script.

Status: accepted | Domain: BI consolidation
ADR-002 · Accepted

Star Schema vs Data Vault: a real tradeoff.

Context

Bronze landing was easy. The fight was at Silver: do we model conformed dimensions (Kimball star) or capture every source change in hubs/links/satellites (Data Vault)?

Decision

Star schema at Silver, exposed to BI through Gold marts. Data Vault is correct when source systems shift constantly and full historical replay is non-negotiable; for a BPO with stable vendor schemas and a strong "query me right now" need from the business, the dimensional model wins on time-to-insight per analyst-hour.

Consequences

Faster delivery, simpler DAX, easier Power BI semantic layer. We accept that schema-changing source systems will force ETL rework — and we monitor for that explicitly.

Status: accepted | Domain: warehouse modeling
ADR-003 · Accepted

dbt for transforms. Airflow only for orchestration.

Context

We could write transformation logic inside Airflow PythonOperators and skip a tool. We could also stuff orchestration logic inside dbt and skip Airflow. Both work. Both rot.

Decision

One tool per concern. Airflow schedules, retries, captures SLAs and emits state. dbt models the data, runs tests, and produces lineage. They communicate through a thin contract: Airflow calls dbt run --select, and dbt fails loudly if a contract test breaks.
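A minimal sketch of that thin contract, as it might sit inside an Airflow task. The helper names (`build_dbt_command`, `run_dbt`, `DbtRunFailed`) and the `tag:gold` selector are hypothetical; only `dbt run --select` comes from the text. The `runner` parameter exists so the function can be exercised without dbt installed.

```python
from __future__ import annotations

import subprocess
from typing import Callable


class DbtRunFailed(RuntimeError):
    """Raised so the calling scheduler task is marked failed and can retry."""


def build_dbt_command(selector: str) -> list[str]:
    # The whole contract: the scheduler passes a selector, nothing else.
    return ["dbt", "run", "--select", selector]


def run_dbt(selector: str, runner: Callable = subprocess.run) -> None:
    """Invoke dbt and turn a non-zero exit code into a loud failure."""
    cmd = build_dbt_command(selector)
    result = runner(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface dbt's own output so the failure is readable in task logs.
        raise DbtRunFailed(
            f"{' '.join(cmd)} exited with {result.returncode}\n{result.stdout}"
        )
```

The orchestrator knows nothing about models or SQL, and dbt knows nothing about schedules or retries. A team that wants models and tests in one invocation could swap in `dbt build`, which runs both; the sketch keeps `dbt run --select` as stated above.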

Consequences

Each tool stays good at its one job. Onboarding new analysts takes hours instead of weeks because the mental model is small. The cost is two systems to operate — worth it at any non-trivial scale.

Status: accepted | Domain: pipeline architecture
Tech stack timeline

Six years, five companies, one trajectory.

Each stop sharpened a different concern, from raw ETL to distributed compute to governed lakehouses.

2019
TraInvestment
Data Science Intern
End-to-end fraud detection system — first taste of designing modular pipelines instead of one-off scripts.
Lang: Python · Flask | Pipe: Airflow | Infra: Unix · Git
2020 — 2021
NGIS
BI Analyst
Cloud-native BI for an insurance SME. First production AWS deployment; first time owning a semantic model end to end.
Cloud: AWS S3 / RDS | Pipe: Airflow · Pandas | BI: Power BI · DAX
2021 — 2022
Grupo Avalon
Data Engineer
Heterogeneous SAP + SQL Server integration. Where I learned that data quality is a system property, not a script.
Sources: SAP HANA · SQL Server | ETL: SAP Data Services · Python | Quality: validation pipelines
2022 — 2025
Orange Business Services
Big Data Analyst
Distributed Spark pipelines + predictive analytics for an international telecom operator. −60% processing time on critical batches.
Compute: Spark | Storage: HDFS · Hive | BI: Power BI
2025 — present
Assist Digital
Data & Platform Lead
Modernization of a BPO reporting stack into an Airflow + dbt + Power BI medallion lakehouse. Tableau Server retired. Self-serve CSV intake for business users.
Pipe: Airflow · dbt | Model: Bronze · Silver · Gold | BI: Power BI · DAX governance
Writing

Opinions I'll defend in a code review.

Short, opinionated essays from the field. No vendor takes, no "modern data stack" bingo.

modeling

The dashboard isn't the product — the model is.

Stakeholders point at the dashboard, but every interesting question they'll ask next year depends on the semantic layer underneath. Build the model, and the dashboards become cheap.

6 min read
platform

Stop calling it a data lake if you can't query it.

A bucket of parquet files isn't a lake — it's a graveyard. The line between "lake" and "lakehouse" is whether the next analyst can answer a question without summoning you.

4 min read
medallion

Why your medallion architecture leaks: the silver layer trap.

Most teams do bronze well and gold reasonably. Silver is where conformed dimensions live or die — and where well-meaning teams ship marts disguised as Silver tables.

7 min read
Contact

Designing your next data platform?

I take on a small number of senior architecture engagements per year — lakehouse builds, BI modernization, predictive ops platforms. Send a note with a one-paragraph problem statement and I'll reply within two business days.