BILake

Self-hosted business intelligence. Connect your databases, build datasets with SQL, create charts, and publish dashboards — your data never leaves your infrastructure.

Why BILake?

One ingest, many readers

DuckLake caches your datasets in DuckDB backed by S3 Parquet. Hundreds of dashboard viewers hit the cache — your production databases receive only one query per TTL period.
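The load pattern described above can be sketched with a small TTL cache. This is an illustration only, not BILake's implementation: the `TTLCache` class, the `query_source` function, and the 300-second TTL are all hypothetical, and the real system refreshes through a Worker job rather than lazily on read. The point it shows is that 500 reads inside one TTL window cost the source a single query.

```python
import threading
import time
from typing import Any, Callable


class TTLCache:
    """Serve many readers from one cached result, reloading at most once per TTL."""

    def __init__(self, load: Callable[[], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load                  # hypothetical: runs the dataset SQL on the source
        self._ttl = ttl_seconds
        self._clock = clock                # injectable so the example needs no real waiting
        self._lock = threading.Lock()
        self._value: Any = None
        self._expires_at = float("-inf")   # forces a load on the first read

    def get(self) -> Any:
        with self._lock:                   # concurrent readers trigger only one reload
            now = self._clock()
            if now >= self._expires_at:
                self._value = self._load()
                self._expires_at = now + self._ttl
            return self._value


source_queries = 0


def query_source() -> list:
    """Stand-in for the production database; counts how often it is hit."""
    global source_queries
    source_queries += 1
    return [("2024-01", 120), ("2024-02", 135)]


fake_now = [0.0]
cache = TTLCache(query_source, ttl_seconds=300, clock=lambda: fake_now[0])

for _ in range(500):                       # 500 dashboard reads inside one TTL window
    cache.get()
queries_in_first_window = source_queries

fake_now[0] = 301.0                        # the TTL has elapsed
cache.get()
queries_after_expiry = source_queries
```

The clock is passed in so the expiry path can be exercised without sleeping; a real deployment would rely on the scheduler's own notion of time.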

Columnar queries in milliseconds

DuckDB’s in-process columnar engine runs aggregations without a network round trip. A chart query that takes 30 seconds on PostgreSQL can return in about 50 ms from the local DuckDB cache.

17+ data sources

PostgreSQL, MySQL, ClickHouse, Snowflake, BigQuery, Spanner, Trino, SQL Server, Databricks, and more. One consistent interface across all connectors.

Apache Iceberg V2

Iceberg V2 metadata is written alongside every Parquet file. Spark, Trino, and Flink can read BILake’s cached data as a first-class Iceberg table — no ETL pipeline.

Settings-driven

TTL, memory limits, S3 credentials, auth parameters — all in the metadata database, editable from the Settings page. No config files, no restarts for most changes.
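One way to picture settings that live in a database instead of a config file is a key/value table that is read at call time. The sketch below uses SQLite as a stand-in for the metadata database; the `settings` table, the key names, and the `get_setting` / `set_setting` helpers are assumptions made for illustration, not BILake's actual schema or API.

```python
import sqlite3

db = sqlite3.connect(":memory:")   # stand-in for the metadata database
db.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
db.executemany(
    "INSERT INTO settings (key, value) VALUES (?, ?)",
    [("cache.ttl_seconds", "300"), ("duckdb.memory_limit", "4GB")],  # hypothetical keys
)


def get_setting(conn: sqlite3.Connection, key: str, default: str) -> str:
    """Read a setting at call time, so an edit takes effect without a restart."""
    row = conn.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
    return row[0] if row else default


def set_setting(conn: sqlite3.Connection, key: str, value: str) -> None:
    """What saving a field on a settings page might do: insert or overwrite one row."""
    conn.execute("INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)", (key, value))
    conn.commit()


ttl_before = int(get_setting(db, "cache.ttl_seconds", "600"))
set_setting(db, "cache.ttl_seconds", "60")
ttl_after = int(get_setting(db, "cache.ttl_seconds", "600"))
region = get_setting(db, "s3.region", "us-east-1")   # missing key falls back to the default
```

Because every read goes to the table, the next ingest after an edit sees the new TTL. Values that are consumed only at process start would still need a restart, which fits the "most changes" qualifier above.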

Tableau-style workbooks

Drag-and-drop shelf interface: drop fields onto the rows, columns, and marks shelves. Multiple worksheets and dashboards sit side by side in one workbook tab strip.

Architecture in brief

Source databases (PostgreSQL / MySQL / ClickHouse / BigQuery / …)
        │ one ingest per TTL
        ▼
Worker process  ──→  DuckDB  ──→  S3 Parquet  ──→  Iceberg V2
                                       ↑
Flight server (reads via httpfs) ──────┘
        ↑
API server  ←→  SvelteKit Frontend

Three processes, each independently scalable:

Process	Role
API server	ConnectRPC, auth, CRUD, settings, River scheduler
Flight server	DuckDB in-process, read queries, Parquet views
Worker	River job consumer, dataset ingest, S3 writes
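The division of labor in the table can be sketched in a few lines. Everything here is a stand-in: an in-memory queue plays the role of the River job queue, a dictionary plays the role of the S3 Parquet files, and the function names are hypothetical. The sketch only shows which role touches what; it says nothing about how BILake wires the processes together.

```python
import queue

jobs = queue.Queue()      # stand-in for the River job queue
object_store = {}         # stand-in for Parquet files on S3

# Fake source database with one dataset.
SOURCE_ROWS = {"sales": [("north", 10), ("south", 5), ("north", 7)]}


def api_schedule_ingest(dataset: str) -> None:
    """API server role: enqueue an ingest job and never touch the data itself."""
    jobs.put(dataset)


def worker_run_once() -> str:
    """Worker role: take one job, query the source, write a snapshot."""
    dataset = jobs.get_nowait()
    object_store[dataset] = list(SOURCE_ROWS[dataset])
    jobs.task_done()
    return dataset


def flight_sum_by_key(dataset: str) -> dict:
    """Flight server role: answer a read query from the snapshot only."""
    totals = {}
    for key, amount in object_store[dataset]:
        totals[key] = totals.get(key, 0) + amount
    return totals


api_schedule_ingest("sales")
ingested = worker_run_once()
totals = flight_sum_by_key("sales")
```

Since the read path depends only on the snapshot, read capacity can grow without adding load on the source, which is the practical meaning of "independently scalable".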

Get started

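# Fetch the source and start the stack in the background.
git clone https://github.com/hakanuzum/bilake.git
cd bilake
docker compose up -d
# Open the UI (macOS; on Linux use xdg-open, or browse to the URL manually).
open http://localhost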

Default login: admin@example.com / admin123
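Containers started with `docker compose up -d` may need a few seconds before the UI answers. A small polling helper like the one below can wait for that. It is a sketch under assumptions: the `http_ok` and `wait_until_up` names, the retry count, and the delay are all hypothetical, and it treats any response below HTTP 500 as "up" because BILake's health endpoint, if it has one, is not documented here.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        return err.code < 500          # e.g. 401 or 404 still means the server is up
    except (urllib.error.URLError, OSError):
        return False                   # connection refused, DNS failure, or timeout


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0,
                  probe: Callable[[str], bool] = http_ok,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the URL until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:     # no point sleeping after the final attempt
            sleep(delay)
    return False


# Example call (not executed here because it needs the stack to be running):
# wait_until_up("http://localhost")
```

The probe and sleep functions are parameters so the retry logic can be checked without a network or real delays.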