Phase 6: Data Engineering & Analytics

Modern systems are data-intensive. Learn how to process, transform, and analyze data at scale with stream processing, data warehousing, and event-driven architectures.

Stream Processing

Kafka Streams, Apache Flink, Spark Streaming. Real-time data processing at scale.

Coming Soon

Data Warehousing

Snowflake, BigQuery, Redshift. OLAP workloads and columnar storage for analytics.

Coming Soon

Data Lakes

S3 + Parquet/ORC, Delta Lake, Apache Iceberg. Raw data storage with schema-on-read.

Coming Soon

ETL/ELT Pipelines

Apache Airflow, dbt, Fivetran. Orchestrating data transformation workflows.

Coming Soon

Change Data Capture

Debezium, Maxwell. Real-time database replication and event streaming.

Coming Soon

Event Sourcing & CQRS

Event stores, command-query separation. Audit trails and temporal queries.

Coming Soon

Batch vs Stream

MapReduce, Spark batch mode. When to use batch processing vs real-time streaming.

Coming Soon

Data Validation

Great Expectations, schema enforcement. Ensuring data quality in pipelines.

Coming Soon