dbt: Programmatic Invocation via dbtRunner

dbt is a great tool for building and organising your ELT data pipelines. When deploying dbt yourself, you can invoke it either through the dbt Core CLI or from Python via dbtRunner. I will give you an example template for the latter; you can find the complete code in the Full Example section. Note: this example was built on dbt-core==1.8.3. dbtRunner may be subject to breaking changes, so there is no guarantee that the provided code works as-is with other dbt versions....

August 6, 2024 · 9 min · 1823 words · Andreas Lay
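
To illustrate what programmatic invocation with dbtRunner looks like, here is a minimal sketch. It is not the template from the post. The helper names `build_dbt_args` and `run_dbt` are hypothetical. The only dbt API it relies on is `dbtRunner().invoke(args)`, which takes the same argument tokens as the CLI and returns a `dbtRunnerResult` exposing `success`, `exception` and `result`. The dbt import is deferred so the argument builder works without dbt-core installed.

```python
from typing import List, Optional, Sequence


def build_dbt_args(
    command: str,
    select: Optional[str] = None,
    target: Optional[str] = None,
    project_dir: Optional[str] = None,
    extra: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: assemble the token list dbtRunner.invoke() expects,
    i.e. exactly what you would type after `dbt` on the command line."""
    args = [command]
    if select:
        args += ["--select", select]
    if target:
        args += ["--target", target]
    if project_dir:
        args += ["--project-dir", project_dir]
    args += list(extra)
    return args


def run_dbt(args: List[str]):
    """Hypothetical helper: run dbt in-process and raise if the invocation failed."""
    # Imported lazily so build_dbt_args() is usable without dbt-core installed.
    from dbt.cli.main import dbtRunner, dbtRunnerResult

    runner = dbtRunner()
    res: dbtRunnerResult = runner.invoke(args)
    if not res.success:
        # res.exception is set when dbt itself errored; it may be None when
        # the run completed but individual nodes failed.
        raise RuntimeError(f"dbt {' '.join(args)} failed") from res.exception
    return res
```

For example, `run_dbt(build_dbt_args("run", select="tag:daily", target="prod"))` behaves like `dbt run --select tag:daily --target prod`. For `run`, `build` and `test`, `res.result` can then be iterated to inspect per-node `r.node.name` and `r.status`. Because the API may change between versions, the sketch assumes dbt-core 1.8.x.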

A Primer on SARIMAX

A while ago I created a notebook with an introduction to time series analysis, which is available as a Gist. It shows how to generate a synthetic time series with cycle, trend (random walk) and noise components, look at some descriptive statistics (e.g. autocorrelations), and model the synthetic data with a SARIMA model. Working with synthetic data first forces you to be explicit about your assumptions and is great for debugging: unlike with real data, you know the true process the synthetic data follows, so you can easily validate your estimates against the "true" values....

November 21, 2023 · 1 min · 109 words · Andreas Lay
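
The three steps above (simulate, inspect autocorrelations, fit SARIMA) can be sketched roughly as follows. This is not the notebook's code. The period of 12, the amplitude, the noise levels and the SARIMA orders are arbitrary assumptions chosen for illustration. Generation and the autocorrelation estimate use only the standard library. The model fit uses statsmodels' `SARIMAX` class, which is imported lazily inside `fit_sarima`.

```python
import math
import random
from typing import List


def synthetic_series(n: int, period: int = 12, amplitude: float = 2.0,
                     trend_sd: float = 0.1, noise_sd: float = 0.5,
                     seed: int = 0) -> List[float]:
    """Sum of a sine cycle, a random-walk trend and Gaussian noise (known truth)."""
    rng = random.Random(seed)
    trend = 0.0
    values = []
    for t in range(n):
        trend += rng.gauss(0.0, trend_sd)  # random walk: cumulative Gaussian steps
        cycle = amplitude * math.sin(2 * math.pi * t / period)
        noise = rng.gauss(0.0, noise_sd)
        values.append(trend + cycle + noise)
    return values


def autocorrelation(x: List[float], lag: int) -> float:
    """Sample autocorrelation at `lag` (biased estimator, as used in typical ACF plots)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


def fit_sarima(values, order=(1, 1, 1), seasonal_order=(1, 0, 1, 12)):
    """Fit a SARIMA(p,d,q)(P,D,Q,s) model; the orders here are illustrative only."""
    # Requires statsmodels; imported lazily so the functions above run without it.
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    return SARIMAX(values, order=order, seasonal_order=seasonal_order).fit(disp=False)
```

With the trend switched off (`trend_sd=0.0`) and little noise, the autocorrelation should be strongly positive at the cycle length (lag 12) and strongly negative at half of it (lag 6). That kind of check against the known generating process is exactly what synthetic data makes possible.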

Setting Up Poetry to Access GCP Artifact Registry

In a corporate setting, sooner or later you will want to host your in-house Python packages in a private artifact store. We use Poetry for package management and Google Cloud as our cloud provider, so Artifact Registry is our store of choice. However, the combination of Poetry and Artifact Registry is not well documented, so I hope this post helps. Creating a private package repository on GCP itself is straightforward, and I assume you have already created one....

November 20, 2023 · 2 min · 317 words · Andreas Lay
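
As a rough illustration of the first step of the Poetry setup described above, the sketch below registers an Artifact Registry Python repository as a Poetry package source. It assumes the standard index URL pattern `https://LOCATION-python.pkg.dev/PROJECT/REPOSITORY/simple/` and Poetry 1.5 or later for `--priority`. The helper names and the example location, project and repository values are hypothetical. Authentication, for example via the `keyrings.google-artifactregistry-auth` keyring backend, is not covered here.

```python
import subprocess
from typing import List


def artifact_registry_index_url(location: str, project: str, repository: str) -> str:
    """Hypothetical helper: 'simple' index URL of an Artifact Registry Python repo."""
    return f"https://{location}-python.pkg.dev/{project}/{repository}/simple/"


def poetry_source_add_cmd(name: str, url: str, priority: str = "supplemental") -> List[str]:
    """Hypothetical helper: the `poetry source add` command for this repository."""
    # --priority needs Poetry >= 1.5; a "supplemental" source is only consulted
    # for packages that are not found in higher-priority sources.
    return ["poetry", "source", "add", f"--priority={priority}", name, url]


def register_source(name: str, location: str, project: str, repository: str) -> None:
    """Run the command in the current project (writes the source to pyproject.toml)."""
    url = artifact_registry_index_url(location, project, repository)
    subprocess.run(poetry_source_add_cmd(name, url), check=True)
```

For instance, `register_source("gcp", "europe-west3", "my-project", "python-packages")` would add a source named `gcp` pointing at `https://europe-west3-python.pkg.dev/my-project/python-packages/simple/`. Whether `supplemental` or `explicit` is the better priority depends on whether you want Poetry to search the private repository for every package or only for those you pin to it.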