ARIMA Models in Python: All Just Statsmodels Under The Hood?

What’s ARIMA and Why Should You Care?

If you’re working with time series and need to produce forecasts, autoregressive integrated moving average (ARIMA) models are still a good place to start. But which Python implementation should you use (if you don’t want to use R)?

Recently I’ve been looking again into what the Python ecosystem has to offer for time series analysis in general and ARIMA models in particular. There are quite a few options, but you should have a rough understanding of what’s happening under the hood: are you dealing with a native implementation or with a framework that wraps existing libraries?

Time Series Packages in the Python Ecosystem

statsmodels: The Original

statsmodels is the classic Python package for estimating ARIMA models. statsmodels.tsa.arima.model.ARIMA provides support for standard (p,d,q) ARIMA models, seasonal orders (P,D,Q) as well as exogenous regressors. statsmodels implements the ARIMA model itself, sped up with Cython.

sktime: A Unified Time Series API

sktime is “an easy-to-use, easy-to-extend, comprehensive python framework for ML and AI with time series” which provides a “unified API for ML/AI with time series, for model building, fitting, application, and validation”. In other words, sktime wraps other libraries behind a single API.

Let’s take a look at the source repo, specifically sktime/forecasting/arima: This folder contains the modules _statsmodels.py and _pmdarima.py.

_statsmodels.py: This is a wrapper around statsmodels.tsa.arima.model.ARIMA
_pmdarima.py: This is a wrapper around pmdarima.arima.auto_arima

We already know statsmodels! But how about pmdarima?

pmdarima: Automatic ARIMA Model Selection

Surely pmdarima implements its own ARIMA estimation! Let’s take a look at the docs:

“Pmdarima wraps statsmodels under the hood, but is designed with an interface that’s familiar to users coming from a scikit-learn background.”

It’s statsmodels again!

Darts: User-friendly Forecasting

Now let’s take a look at the Darts package. The module darts/models/forecasting/arima.py contains the ARIMA class implementation. And it’s also a wrapper around statsmodels.tsa.arima.model.ARIMA!

There’s also an AutoARIMA class in the darts/models/forecasting/sf_auto_arima.py module. This one actually wraps the statsforecast package, not to be confused with statsmodels.

statsforecast: Fast Forecasting

The statsforecast package provides us with an ARIMA class in python/statsforecast/models.py. This module in turn imports their own compiled C++ implementation src/arima.cpp. So finally, no wrapper around statsmodels!

But even statsforecast relies on statsmodels: To handle exogenous regressors, statsmodels’ ordinary least squares (sm.OLS) is used!
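For comparison, here is a hedged sketch of fitting the ARIMA class from statsforecast directly on a plain array, without any wrapper framework. The function name and the chosen order are assumptions for illustration, and the import is again deferred so the file loads where statsforecast is not installed.

```python
def forecast_with_statsforecast(y, horizon=6):
    """Fit a non-seasonal ARIMA(1,1,1) with statsforecast and forecast ahead."""
    # Imported lazily so this file loads even if statsforecast is missing.
    import numpy as np
    from statsforecast.models import ARIMA

    model = ARIMA(order=(1, 1, 1), season_length=1)
    # The model classes operate on a one-dimensional float array.
    model.fit(np.asarray(y, dtype=float))
    # predict returns a dict; the point forecasts live under "mean".
    return model.predict(h=horizon)["mean"]
```

The interface differs from statsmodels in small ways (a horizon h instead of steps, a dict instead of a results object), which is part of what the wrapper frameworks above smooth over.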

autogluon: Automated Time Series Forecasting

Last but not least, autogluon aims to provide easy-to-use automated time series forecasting. The module timeseries/src/autogluon/timeseries/models/local/statsforecast.py reveals its dependency on statsforecast, including statsforecast.models.AutoARIMA.

So this is actually the first library we found without any direct reliance on statsmodels. But that hasn’t always been the case: this commit from April 2024 removed a former statsmodels dependency!

statsmodels vs. statsforecast

So ARIMA models in the Python ecosystem apparently rely on either the statsmodels or the statsforecast implementation (or both). Why would you choose one over the other? I have not benchmarked it myself yet, but statsforecast claims to be around 4x faster than statsmodels. So if performance matters to you, e.g. because you need to fit a large number of ARIMA models, you might want to check out statsforecast or statsforecast-based frameworks.
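If you want to check the speed claim on your own data, a small timing harness is enough to get a first impression. The sketch below is library-agnostic: time_fits and its parameters are hypothetical names, and you pass in one callable per implementation that fits a single series.

```python
import time
from statistics import median


def time_fits(fit_fn, series_list, repeats=3):
    """Return the median wall-clock seconds needed to fit every series once."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for y in series_list:
            fit_fn(y)  # e.g. a lambda that builds and fits one ARIMA model
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to a single slow run than the mean.
    return median(timings)
```

Comparing time_fits for a statsmodels-based callable and a statsforecast-based callable on the same series and the same model order gives a rough ratio. For compiled implementations it may be worth running one untimed warm-up fit first.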

Lesson Learned: Always Spend Some Time Understanding Your Tools

I was actually surprised to find out that all of the above time series packages, with the exception of autogluon, depend on the statsmodels package! While it makes sense that not every library reinvents the wheel, it isn’t always immediately clear what the respective library offers itself vs. what core functionality is inherited from another package. To be fair: each package is generally transparent about this.

Nevertheless, I recommend at least taking a quick look at the underlying source code and asking yourself: do you really need the additional layers of abstraction that a “wrapper” package offers?
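A quick first check before reading any source code is to look at what an installed package declares as its dependencies. The sketch below uses only the standard library; the helper names are hypothetical. It only sees direct, declared requirements (including optional extras) of the installed version, so it complements reading the code rather than replacing it.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the bare distribution name from a requirement string."""
    # "statsmodels>=0.13.2; extra == 'x'" -> "statsmodels"
    return re.split(r"[\s;<>=!~@\[(]", requirement.strip(), maxsplit=1)[0]


def declared_dependencies(package):
    """Return declared requirement strings, or None if not installed."""
    try:
        return metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return None


def depends_on(package, dependency):
    """True or False for an installed package, None if it is not installed."""
    requirements = declared_dependencies(package)
    if requirements is None:
        return None
    wanted = dependency.lower()
    return any(requirement_name(r).lower() == wanted for r in requirements)
```

For example, depends_on("sktime", "statsmodels") answers the question for whatever version is installed in the current environment.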