Organisation & Culture
Commitment · Culture · Leadership · Enablement · Adoption
Strategy & Roadmap
Vision · Prioritisation · Trade-offs · Use Cases · Outcomes
Teams & Execution
Execution · Cadence · Knowledge Sharing · Skillsets · Training · Project Management
Activation
Governance & Discovery
Managing data as a discoverable, governed asset.
Catalog & Discovery
The directory of every dataset — searchable, owned, used.
Business Glossary
Shared definitions — "active customer" means one thing.
Access Control
Who can read what — decided once and enforced everywhere.
Classification & Tagging
Sensitivity, ownership, domain — tagged at scale.
Privacy Engineering
PII, consent, retention, deletion — engineered in.
Audit & Compliance
Logs and reports that satisfy regulators and auditors.
Activation
Analytics & BI
Where data meets decisions.
Dashboards & Reports
The board's view, the team's view, the operator's view.
Reporting & Distribution
Scheduled reports, operational reports, board packs.
Visualisation & Statistical Analysis
Beyond dashboards — exploratory viz, statistical analysis, custom notebooks.
Ad-hoc & Exploratory Analysis
Notebook deep-dives and exploratory queries.
Self-Service Analytics
Business users answering their own questions, governed.
Conversational Q&A
Asking questions in natural language, getting answers from the data.
Activation
AI & Machine Learning
Predictive, generative, evaluated.
Predictive Modelling
Classical ML: train classifiers, regressors, forecasters; serve predictions in production.
MLOps & Deployment
Versioning, deployment, monitoring — the production side of models.
LLMs & Generative
Foundation models, fine-tunes, prompts, applications.
Retrieval-Augmented Generation
Grounding LLM answers in your own knowledge.
Agents & Tooling
LLM-driven workflows with tools, memory, and decisions.
Evaluation & Testing
Testing what a model does — before and after deployment.
Activation
Data Products & Apps
Putting data to work.
Data Products
Data as a managed offering — owned, versioned, contracted.
Custom Data Apps
Full-stack apps where the data is the point.
Embedded Analytics
Charts inside your customers' product, not your dashboard.
Data APIs
Serving data to internal and external consumers.
Reverse Sync
Pushing modelled data back into Salesforce, HubSpot, and ops tools: the reverse-ETL outcome.
Data Marketplace
Internal data exchange — find, request, subscribe.
Engineering
Transformation & Modelling
Shaping data for the question being asked.
SQL Transformations
dbt, SQL, in-warehouse compute — declarative, the workhorse.
Code Transforms
Python, Spark, PySpark, Polars — procedural transforms when SQL isn't enough.
Modelling Patterns
Dimensional/star, snowflake, 3NF, OBT, data vault — chosen for fit.
Semantic Layer
Metrics, dimensions, contracts — defined once, used across BI and apps.
Marts & Cubes
Domain-shaped output tables for the way the business asks questions.
Materialisations
Views, tables, incremental builds, snapshots, SCDs — how transforms get persisted.
Engineering
Master Data Management
One customer, one product, everywhere.
Entity Resolution
Matching records across systems — same customer, different IDs.
Golden Records
The single canonical version of each entity.
Reference Data
Lookups, codes, classifications — managed centrally.
Knowledge Graphs & Hierarchies
Org charts, product trees, taxonomies, semantic networks — entities and how they relate.
Stewardship Workflows
Who approves changes; how exceptions get handled.
Cross-system Identity
ID mapping across CRM, ERP, marketing, support, warehouse.
Engineering
Quality & Observability
Keeping data trustworthy.
Quality Tests
Asserting what should be true — failing loud when it isn't.
Data Profiling
Looking at the data to understand it before modelling it.
Quality Rules
Codified expectations about what good data looks like.
Quality Monitoring
Continuous checks; alerts when something drifts.
Cost & FinOps
Cloud spend, query cost, FinOps practices — what your data work actually costs.
Data Contracts
What producers promise consumers, in writing.
Engineering
Orchestration & Automation
Running and shipping the work.
Workflow DAGs
Airflow, Prefect, Dagster — the graph of what runs when.
Scheduling & Triggers
Cron, intervals, file landings, webhooks — when work fires.
CI/CD for Data
dbt CI, automated tests, deploy on green — shipping changes safely.
Schema Migrations
Versioned schema changes, contract evolution, safe rollouts.
Environment Promotion
Dev → Staging → Prod — automated promotion with checks at each gate.
Backfills & Reruns
Rerunning history — for new logic, fixed bugs, missed days.
Foundation
Network & Identity
Who and what gets in — the access fabric.
Network & Connectivity
VPCs, subnets, private endpoints, peering, transit gateways — the network fabric.
Identity & Access Management
IAM, SSO, RBAC, federation — the human and role layer.
Workload Identity & Auth
Service accounts, mTLS, workload identity federation, service mesh.
Secrets & Credentials
Vault, AWS Secrets Manager, rotation, just-in-time access.
Encryption & Key Management
KMS, HSM, BYOK — at rest and in transit.
DNS & Service Discovery
How services find each other — the resolver every workload needs.
Foundation
Connectivity & Integration
How data moves — in any direction.
Batch Data Movement
Scheduled bulk data movement — the workhorse pattern.
Streaming
Real-time event flows over message brokers — Kafka, Kinesis, Pulsar.
API Integration
Programmatic pull and push to SaaS and partner systems.
Unstructured Intake
PDFs, documents, images, audio — the raw material for AI.
IoT & Edge Sources
Sensors, devices, telemetry, clickstreams — data captured at the edge.
CDC & Replication
Capturing every change from operational databases — log-based replication for analytics.
Foundation
Storage Architecture
Where data lives.
Operational Databases
Postgres, MySQL — where applications actually live.
Cloud Data Warehouse
Snowflake-class storage built for SQL analytics at scale.
Object & Lake Storage
S3-class storage for raw files, media, archives, and lakes.
Lakehouse
Delta, Iceberg — warehouse speed on lake storage.
Specialty Stores
Time-series, graph, document, search, vector — fit-for-purpose.
Backup & Disaster Recovery
Archival, retention, recovery — when the worst happens.
Foundation
Compute & Runtime
Where data work runs.
SQL Engines
Postgres, MySQL, SQL Server, Oracle — the relational workhorses. Most data work actually runs here.
Distributed Query
Trino, Spark, BigQuery, Snowflake compute: MPP and cluster-scale SQL, with DuckDB as the single-node, in-process counterpart.
Serverless Compute
Lambdas, Cloud Functions, Cloud Run — pay-per-invocation event compute.
Stream Compute
Flink, Kafka Streams, Spark Streaming — processing continuous data.
GPU & ML Compute
Accelerated and special-purpose hardware for training and serving models.
General Compute
VMs, containers, Kubernetes — the runtime substrate that hosts the rest.
The Data Capability Map
Technology & Standards
Platforms · Tools · Vendors · Best Practices · Templates · Build vs Buy