Ibis: Build your SQL Queries via Python ‒ One API for nearly 20 Backends

What is Ibis and Why Would You Use It? Ibis is a backend agnostic query builder / dataframe API for Python. Think of it as an interface for dynamically generating & executing SQL queries via Python. Ibis provides a unified API for a variety of backends like Snowflake, BigQuery, DuckDB, polars or PySpark. But why would this be useful? I mainly use it for: Generating complex dynamic queries where (Jinja-)templating would becomes too messy Writing SQL generation as self-contained, easily testable Python functions Switching out a production OLAP database (like Snowflake) at test time with a local & version controlled DuckDB instance containing test data If you know SQL you already know Ibis....

March 30, 2025 · 7 min · 1394 words · Andreas Lay

Distributed locking using Google Cloud Storage (or S3)

The Need for Distributed Locks When you run into situations where you want to prevent two pieces of code ‒ possibly running on differen machines ‒ from running concurrently, you need a distributed lock. An easy solution to implement such a lock is to leverage a cloud storage service like Google Cloud Storage (GCS) or Amazon S3. Here you can find the full code example implementing this kind of lock....

March 27, 2025 · 4 min · 785 words · Andreas Lay

dbt: Programmatic Invocation via dbtRunner

Introduction dbt is a great tool for building & organising your ELT data pipelines. When deploying dbt yourself you can invoke dbt either through dbt core cli or through Python via dbtRunner. I will give you an example template on how to use the latter. You can find the full example in the Full Example section. Note: This example was build on dbt-core==1.8.3. dbtRunner may be subject to breaking changes so there’s no guarantee the provided code works as is with other dbt versions....

August 6, 2024 · 9 min · 1823 words · Andreas Lay

A Primer on SARIMAX

A while ago I created a notebook with an introduction to time series analysis. Here is this notebook as a Gist: Generate a synthetic time series with cycles, trend (random walk) and noise components Look at some descriptive statistics (e.g. autocorrelations) Model the synthetic data with a SARIMA model Working with synthetic data first forces you to be explicit about your assumptions and is great for debugging: Unlike real data, as you know the true process the synthetic data follows you can validate your estimates easily against the “true” values....

November 21, 2023 · 1 min · 109 words · Andreas Lay

Setting Up Poetry to Access GCP Artifact Registry

Introduction In a corporate setting sooner or later you will want to host your in-house Python packages on a private artifact store. We are using Poetry for our package management and using Google Cloud as our cloud provider, therefore Artifact Registry is our store of choice. However the combination of Poetry with Artifact Registry is not well documented, so I hope this post helps. Creating a private package repository on GCP itself is straightforward and I assume you already have created one....

November 20, 2023 · 2 min · 317 words · Andreas Lay