Forecasting algorithm

AutoARIMA

Classical seasonal ARIMA with exogenous regressors (SARIMAX) for a single time series. Searches for the best ARIMA structure automatically, then produces a point forecast plus a native, statistically-derived uncertainty band for every period in the horizon.

It is the classical baseline alongside lightgbm-v1 — the model every forecaster is expected to run as a yardstick. Pick it when your series is regular and reasonably stationary, when you want honest model-derived prediction intervals, or as a sanity check against the gradient-boosted forecaster.

What it does

You point it at a DataSource and pick a date column, a target column, and (optionally) some drivers. It outputs a new DataSource with one row per future period, in the same shape that every forecast algorithm produces:

| Column | Type | Meaning |
| --- | --- | --- |
| <date_col> | date | Echoes the input date column name |
| yhat | float | Point prediction |
| yhat_lower | float | Lower bound at chosen interval |
| yhat_upper | float | Upper bound at chosen interval |

The output DataSource is a first-class table the rest of the app can plot, pin to dashboards, or include in reports.
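As a rough illustration of that output shape, the sketch below builds rows with the four columns listed above. The helper `forecast_rows`, the column name `order_month`, and all values are hypothetical; only the `yhat`, `yhat_lower`, and `yhat_upper` column names come from this page.

```python
from datetime import date

def forecast_rows(date_col, periods, yhat, lower, upper):
    """Build one output row per future period, echoing the input date column name."""
    if not (len(periods) == len(yhat) == len(lower) == len(upper)):
        raise ValueError("all inputs must have one value per period")
    return [
        {date_col: d, "yhat": y, "yhat_lower": lo, "yhat_upper": hi}
        for d, y, lo, hi in zip(periods, yhat, lower, upper)
    ]

# Hypothetical two-period monthly forecast; the numbers are made up.
rows = forecast_rows(
    "order_month",
    [date(2025, 1, 1), date(2025, 2, 1)],
    yhat=[120.0, 125.0],
    lower=[110.0, 112.0],
    upper=[130.0, 138.0],
)
```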

How it works

Automatic ARIMA search

An ARIMA model describes a series through its own past values (the autoregressive and moving-average terms) and as many rounds of differencing as it takes to make the series stationary. A seasonal ARIMA adds a second set of those terms at the seasonal lag — month-of-year for monthly data, day-of-week for daily, and so on.
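To make the differencing idea concrete, here is a minimal pure-Python sketch. The series is synthetic (a linear trend plus a repeating period-4 pattern), and `difference` is an illustrative helper, not part of the app. One seasonal difference removes the repeating pattern, and one ordinary difference removes what is left of the trend.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Synthetic example: linear trend plus a repeating period-4 pattern.
pattern = [0.0, 5.0, -3.0, 2.0]
series = [2.0 * t + pattern[t % 4] for t in range(16)]

# Seasonal difference at lag 4: the repeating pattern cancels out,
# leaving only the trend's constant 4-step increase.
seasonal = difference(series, lag=4)

# Ordinary first difference: the remaining constant level cancels out.
stationary = difference(seasonal, lag=1)
```

Real series are noisy, so differencing leaves a stationary residual rather than exact zeros; the autoregressive and moving-average terms then model that residual.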

AutoARIMA does the hard part for you: it searches over candidate (p, d, q)(P, D, Q) structures and keeps the one with the lowest corrected Akaike information criterion (AICc). You do not pick orders by hand. The chosen structure is recorded on the run as arima_order (e.g. ARIMA(1,1,1)(0,1,0)[12]) so the fit is transparent.
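The selection rule can be sketched as follows, using the standard AICc formula. The candidate log-likelihoods and parameter counts are invented for illustration, and the helpers `aicc` and `format_order` are hypothetical. How the app enumerates candidates, and which observation count it uses, is not specified here.

```python
import math

def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion; lower is better."""
    if n_obs - n_params - 1 <= 0:
        return math.inf  # too few observations for the correction term
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)

def format_order(order, seasonal_order, period):
    """Render a structure in the ARIMA(p,d,q)(P,D,Q)[m] style used for arima_order."""
    p, d, q = order
    P, D, Q = seasonal_order
    return f"ARIMA({p},{d},{q})({P},{D},{Q})[{period}]"

# Hypothetical fitted candidates:
# (order, seasonal_order, log-likelihood, number of estimated parameters)
candidates = [
    ((1, 1, 1), (0, 1, 0), -210.4, 3),
    ((2, 1, 2), (1, 1, 1), -208.9, 7),
    ((0, 1, 1), (0, 1, 0), -214.0, 2),
]
n_obs = 60  # assumed number of observations used in the fit

best = min(candidates, key=lambda c: aicc(c[2], c[3], n_obs))
arima_order = format_order(best[0], best[1], period=12)
```

The second candidate has the highest likelihood here but loses on AICc, because the criterion penalises its four extra parameters.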

The seasonal period is derived from the cadence you choose — 7 (daily), 52 (weekly), 12 (monthly), 4 (quarterly), 1 (yearly; no seasonal terms).
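That mapping is small enough to write down directly. In this sketch the cadence strings ("daily", "weekly", and so on) are assumed spellings; the values actually stored in spec.frequency may differ.

```python
# Seasonal period per cadence, as listed above.
# The key spellings are an assumption for illustration.
SEASONAL_PERIOD = {
    "daily": 7,
    "weekly": 52,
    "monthly": 12,
    "quarterly": 4,
    "yearly": 1,
}

def seasonal_period(frequency):
    """Look up the seasonal period for a cadence, rejecting unknown values."""
    try:
        return SEASONAL_PERIOD[frequency]
    except KeyError:
        raise ValueError(f"unsupported cadence: {frequency!r}") from None

def has_seasonal_terms(frequency):
    """A period of 1 means there is no seasonal cycle to model."""
    return seasonal_period(frequency) > 1
```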

Native prediction intervals

Unlike the LightGBM forecaster — which fits three quantile models and then conformally widens the band — ARIMA's interval is analytic: it falls straight out of the fitted model's error variance and propagates forward through the forecast horizon. The band naturally widens the further out you forecast, because the model's own uncertainty compounds.

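There is no calibration split and no conformalized quantile regression (CQR) step: the interval_level you choose is the model's nominal coverage by construction. Whether the backtest actuals fell inside the band at that rate is reported separately as pi_coverage. The forecast viz never shows "(uncalibrated)".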
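The widening band can be illustrated with the simplest case. The sketch below assumes Gaussian errors and an AR(1)-style process, where the h-step forecast error variance is sigma^2 times the sum of phi^(2j) for j from 0 to h-1. With phi = 1.0 this is a random walk, i.e. ARIMA(0,1,0). The function name and the choice to express interval_level as a fraction such as 0.95 are assumptions. A fitted seasonal model has a more involved variance formula, but the same compounding applies.

```python
from statistics import NormalDist

def interval_half_widths(sigma, horizon, interval_level, phi=1.0):
    """Half-width of a Gaussian prediction interval at steps 1..horizon.

    sigma is the one-step error standard deviation; phi=1.0 is a random walk.
    """
    # Two-sided interval: e.g. 0.95 coverage uses the 0.975 quantile.
    z = NormalDist().inv_cdf(0.5 + interval_level / 2)
    widths = []
    cumulative = 0.0
    for j in range(horizon):
        cumulative += phi ** (2 * j)  # variance contribution of step j + 1
        widths.append(z * sigma * cumulative ** 0.5)
    return widths

# Hypothetical model with a one-step error standard deviation of 10.
widths = interval_half_widths(sigma=10.0, horizon=4, interval_level=0.95)
```

For the random walk the half-width grows with the square root of the step count, so the band at step 4 is twice as wide as at step 1.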

Exogenous regressors

Driver columns you select enter the model as exogenous regressors — the X in SARIMAX. ARIMA fits a linear coefficient for each, on top of the time-series structure. As with the LightGBM forecaster, the model assumes the drivers' future values are knowable (calendar events, planned spend); the backtest reads them straight from the held-out window.

ARIMA's use of drivers is linear and additive — it does not capture non-linear driver effects or interactions. If a driver matters non-linearly, the LightGBM forecaster will use it better.
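"Linear and additive" means each driver contributes its coefficient times its value, and nothing more. A minimal sketch, with hypothetical driver names, coefficients, and baseline values (in a real fit the coefficients are estimated jointly with the time-series terms):

```python
def add_driver_effects(baseline, drivers, coefficients):
    """Add a linear driver contribution to each period's time-series forecast.

    baseline: forecast from the time-series component alone, one value per period
    drivers: driver name -> known future values, one per period
    coefficients: driver name -> fitted linear coefficient
    """
    forecast = []
    for t, base in enumerate(baseline):
        effect = sum(coefficients[name] * values[t] for name, values in drivers.items())
        forecast.append(base + effect)
    return forecast

# Hypothetical three-period example.
baseline = [100.0, 102.0, 104.0]
drivers = {"promo_flag": [0, 1, 0], "planned_spend": [10.0, 20.0, 10.0]}
coefficients = {"promo_flag": 15.0, "planned_spend": 0.5}

yhat = add_driver_effects(baseline, drivers, coefficients)
```

In this form a promotion adds the same 15 units whether spend is high or low. An interaction between the two drivers has no term to live in, which is the limitation described above.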

Direct forecast

ARIMA forecasts the whole horizon directly — there is no recursive feeding of one step's prediction into the next step's inputs. Inference on fresh data re-applies the discovered ARIMA order to the new history (a fast fixed-order re-fit, no re-search) and reports the model's one-step-ahead fitted values.

Configuration

The form maps directly to the spec — same inputs as the LightGBM forecaster:

| Form input | Stored as |
| --- | --- |
| Source | PredictionModel.source_id |
| Date column | spec.date_col |
| Target column | spec.target_col |
| Drivers (multi-select) | spec.exogenous_cols |
| Cadence | spec.frequency |
| Horizon | spec.horizon (in periods) |
| Interval level | spec.interval_level |
| Validation horizon | spec.validation_horizon |

algorithm, version, task, and hyperparams are server-defaulted. ARIMA's fit is deterministic — the same input and spec always produce the same model.
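For orientation, a spec with those fields might look like the sketch below. The field names come from the table above; the `ForecastSpec` class, the field types, the validation rules, and every example value are assumptions, not the app's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ForecastSpec:
    """Illustrative container for the user-supplied spec fields.

    source_id is omitted because it is stored on PredictionModel, not on the spec.
    """
    date_col: str
    target_col: str
    frequency: str
    horizon: int             # in periods
    interval_level: float    # assumed to be a fraction, e.g. 0.9
    validation_horizon: int  # assumed to be in periods as well
    exogenous_cols: tuple = ()

    def __post_init__(self):
        # Assumed sanity checks; the real server-side rules may differ.
        if self.horizon < 1 or self.validation_horizon < 1:
            raise ValueError("horizons must be at least one period")
        if not 0.0 < self.interval_level < 1.0:
            raise ValueError("interval_level must be strictly between 0 and 1")
        if self.target_col in self.exogenous_cols:
            raise ValueError("the target cannot also be a driver")

# Hypothetical monthly forecast with one driver.
spec = ForecastSpec(
    date_col="order_month",
    target_col="units_sold",
    frequency="monthly",
    horizon=6,
    interval_level=0.9,
    validation_horizon=6,
    exogenous_cols=("promo_flag",),
)
```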

Metrics

A successful run writes these into PredictionRun.metrics:

| Metric | Meaning |
| --- | --- |
| mae | Mean absolute error against the held-out backtest period |
| mape | Mean absolute % error (rows with target=0 excluded) |
| smape | Symmetric MAPE, robust to zero/near-zero targets |
| pi_coverage | Fraction of backtest rows where actual fell inside band |
| arima_order | The ARIMA structure the search selected |

smape is the primary "is this any good" number. There is no feature_importances — ARIMA has no engineered feature set; the arima_order string is the interpretability surface instead.
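The four numeric metrics can be reproduced from a backtest window along these lines. This is a sketch under stated assumptions: the function name is hypothetical, errors are returned as fractions rather than percentages, and smape uses the common 2|y - yhat| / (|y| + |yhat|) form. The app's exact conventions may differ.

```python
def backtest_metrics(actual, yhat, lower, upper):
    """Compute mae, mape, smape, and pi_coverage over a held-out window."""
    n = len(actual)
    abs_err = [abs(a - p) for a, p in zip(actual, yhat)]

    mae = sum(abs_err) / n

    # MAPE skips rows where the target is exactly zero.
    pct = [e / abs(a) for e, a in zip(abs_err, actual) if a != 0]
    mape = sum(pct) / len(pct) if pct else float("nan")

    # sMAPE divides by the sum of magnitudes, so a zero target alone is fine;
    # a row where both actual and prediction are zero counts as zero error.
    sym = [
        2 * e / (abs(a) + abs(p)) if (abs(a) + abs(p)) > 0 else 0.0
        for e, a, p in zip(abs_err, actual, yhat)
    ]
    smape = sum(sym) / n

    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)

    return {"mae": mae, "mape": mape, "smape": smape, "pi_coverage": inside / n}

# Made-up four-period backtest, including one zero target.
metrics = backtest_metrics(
    actual=[100.0, 0.0, 50.0, 80.0],
    yhat=[110.0, 5.0, 50.0, 60.0],
    lower=[90.0, -5.0, 40.0, 45.0],
    upper=[130.0, 15.0, 60.0, 75.0],
)
```

The zero-target row drops out of mape entirely but still counts in smape, where it scores the maximum value of 2.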

Good for

  • Regular, reasonably stationary series with clear seasonality. Monthly KPIs, weekly demand, quarterly figures — the classic ARIMA home ground.
  • An honest baseline. Running AutoARIMA next to the LightGBM forecaster is the single most useful habit in forecasting — "we replaced our fancy model with ARIMA and it got better" is a common story.
  • Calibrated uncertainty. When the width of the band matters (capacity planning, risk), ARIMA's analytic interval is derived from the model rather than conformally patched on.
  • Small data. ARIMA is well-behaved on short series where a tree model would overfit.

Limitations

  • Linear, single-series. ARIMA models one series and treats drivers linearly. Non-linear effects, regime shifts, threshold behaviour, and driver interactions are better served by the LightGBM forecaster.
  • No multi-series support. One series in, one forecast out — split per series upstream.
  • Stationarity assumptions. ARIMA expects structure that differencing can stabilise. Series with abrupt level shifts or changing seasonality can defeat it; check the backtest band before trusting it.
  • No feature importance. There is no drivers panel — arima_order is the only structural readout.
  • No "project beyond the data" yet. AutoARIMA models are backtested but are not wired into the future_forecast analysis in this version; use the LightGBM forecaster when you need a forward projection past the dataset.

See also

  • lightgbm-v1.md — gradient-boosted sister; non-linear, handles regime shifts and non-linear drivers, usually higher accuracy on rich data.

Not sure which to pick?

Choosing a forecasting algorithm

LightGBM forecasting vs AutoARIMA — when gradient-boosted forecasting wins, when the classical SARIMAX model is enough, and why it is worth running both.