Regression algorithm
Linear regression (OLS)
The classic linear model for predicting a numeric value from feature columns. Trains a scikit-learn LinearRegression (ordinary least squares — no regularization) behind a preprocessing pipeline (impute → scale numeric, impute → one-hot categorical), scores on a random hold-out split, and surfaces the standard regression metric set plus coefficient-based feature importance.
It is the simplest interpretable baseline in the regression family — pick it when you want plain, textbook least squares whose coefficients are the raw partial effect of each feature, with nothing shrinking them.
What it does
You point it at a DataSource and pick:
- a numeric target column you want to predict, and
- one or more feature columns the model gets to look at.
Feature columns may be numeric, boolean, or string/categorical. Like the Ridge regressor — and unlike LightGBM — a linear model needs every feature numeric, so the model does that conversion for you, internally: there are no encoder or scaler nodes to wire. (You still can wire preprocessing upstream if you want explicit control.)
The output is a trained model + an eval_result carrying the metrics, predictions on the test rows, and a feature-importance chart.
How it works
The pipeline shape is identical to the Ridge regressor — the same regressor_train / regressor_eval nodes, the same regression task. regressor_train is fit-only; regressor_eval runs the real prediction pass on the held-out test frame and emits the final scored result. The evaluation step is algorithm-agnostic — every regressor shares the exact same scoring + metric code.
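The split between a fit-only step and a separate scoring step can be sketched as below. This is a minimal illustration, not the product's node code: `train_step`, `eval_step`, the synthetic data, and the 80/20 split ratio are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def train_step(X_train, y_train):
    """Hypothetical fit-only step: returns a fitted model, computes no scores."""
    return LinearRegression().fit(X_train, y_train)


def eval_step(model, X_test, y_test):
    """Hypothetical scoring step: needs only a fitted object with .predict()."""
    preds = model.predict(X_test)
    rmse = float(np.sqrt(np.mean((y_test - preds) ** 2)))
    return {"predictions": preds, "rmse": rmse}


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Random hold-out split; the 20% test share is an assumed value.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = train_step(X_train, y_train)
result = eval_step(model, X_test, y_test)
```

Because `eval_step` only calls `.predict()`, the same function would score a Ridge or tree-based model unchanged, which is the sense in which the evaluation is algorithm-agnostic.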
The preprocessing pipeline
The model is a scikit-learn Pipeline. Inside it:
| Feature kind | Steps |
|---|---|
| Numeric / boolean | impute missing values with the median → standardize to zero mean, unit variance |
| String / categorical | impute missing values with the most frequent value → one-hot encode |
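A pipeline with the two branches in the table above might look like the following sketch. The column names, the toy data, and `handle_unknown="ignore"` are assumptions for illustration; the actual internal configuration is not documented here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names.
numeric_cols = ["area", "rooms"]
categorical_cols = ["city"]

# Numeric / boolean branch: median impute, then standardize.
numeric_branch = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# String / categorical branch: most-frequent impute, then one-hot encode.
# handle_unknown="ignore" is an assumed setting, not a documented one.
categorical_branch = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", numeric_branch, numeric_cols),
        ("cat", categorical_branch, categorical_cols),
    ])),
    ("ols", LinearRegression()),
])

# Toy data with one missing value in each kind of column.
df = pd.DataFrame({
    "area": [50.0, 80.0, np.nan, 120.0, 65.0, 95.0],
    "rooms": [2, 3, 3, 5, 2, 4],
    "city": pd.Series(["A", "B", "A", np.nan, "B", "A"], dtype=object),
    "price": [150.0, 240.0, 210.0, 380.0, 200.0, 290.0],
})
features = df[numeric_cols + categorical_cols]
model.fit(features, df["price"])
preds = model.predict(features)
```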
A subtlety worth knowing: standardizing the numeric features does not change OLS predictions — ordinary least squares is scale-equivariant, so rescaling a feature just rescales its coefficient inversely and the fitted values are identical. It is kept anyway because it puts the fitted coefficients on a comparable scale for the importance chart. (For Ridge the scaler also matters for the penalty; for OLS it is purely a presentation choice.)
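The claim that scaling leaves OLS predictions unchanged can be checked directly. The sketch below uses synthetic data (an assumption) and fits the same model on raw and standardized features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Two synthetic features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=[10.0, 500.0], scale=[2.0, 100.0], size=(100, 2))
y = 3.0 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(size=100)

raw = LinearRegression().fit(X, y)

scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)
scaled = LinearRegression().fit(X_scaled, y)

# Fitted values agree up to floating-point error.
same_predictions = np.allclose(raw.predict(X), scaled.predict(X_scaled))

# Dividing a feature by its standard deviation multiplies its coefficient by it.
coef_relation = np.allclose(scaled.coef_, raw.coef_ * scaler.scale_)
```

Both checks should come out true: only the coefficients move, and they move by exactly the per-feature standard deviation stored in `scaler.scale_`.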
The whole fitted pipeline — imputers, scaler, encoder, and coefficients — is serialized as one unit, so inference replays exactly what was fit.
OLS vs. Ridge
OLS minimizes squared error with no penalty on the coefficients. Ridge adds an L2 penalty (alpha) that shrinks them. The practical differences:
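- OLS coefficients are the raw partial effects, directly interpretable as "holding everything else fixed, one unit of this feature moves the target by this much." Because numeric features are standardized before the fit, one unit here means one standard deviation of the original column. Ridge's are biased toward zero by the penalty.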
- OLS has no tuning knob. There is no alpha to set: the fit is fully determined by the data.
- OLS is less stable. With many one-hot columns or correlated features, the unpenalized fit can produce large, erratic coefficients. Ridge's penalty tames exactly that.
Use OLS as the transparent reference; reach for Ridge when the feature set is wide or collinear.
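To see the stability difference concretely, the sketch below fits both models on two nearly identical features. The data is synthetic and `alpha=1.0` is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # almost a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # alpha chosen arbitrarily

# Compare overall coefficient size; the L2 penalty can only shrink it.
ols_norm = float(np.linalg.norm(ols.coef_))
ridge_norm = float(np.linalg.norm(ridge.coef_))
print("OLS  :", ols.coef_, "norm", ols_norm)
print("Ridge:", ridge.coef_, "norm", ridge_norm)
```

With features this collinear, the OLS fit is free to split the shared effect between the two columns in an arbitrary way, while Ridge tends to spread it evenly. The two models usually predict almost the same values, so the difference shows up in the coefficients rather than in the metrics.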
Metric set
Same as every regressor — the eval step is shared:
| Metric | Meaning |
|---|---|
| MAE | Mean absolute error — average prediction error in target units |
| RMSE | Root mean squared error — penalizes large errors more heavily |
| R² | Coefficient of determination — fraction of variance explained (1.0 = perfect, 0.0 = no better than predicting the mean) |
| MAPE | Mean absolute percentage error — relative error, None if any test row has target == 0 |
The runs panel surfaces RMSE as the headline number; all four are visible on the eval result detail.
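The four metrics can be computed as in the following sketch. `regression_metrics` is a hypothetical helper, not the shared eval code, and returning MAPE as a fraction rather than a percentage is an assumption.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Hypothetical helper computing MAE, RMSE, R² and MAPE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    ss_res = float(np.sum(err ** 2))                        # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # variance around the mean

    # MAPE is undefined when any true target is exactly zero.
    if np.any(y_true == 0):
        mape = None
    else:
        mape = float(np.mean(np.abs(err / y_true)))  # fraction, not percent (assumed)

    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "r2": 1.0 - ss_res / ss_tot,  # assumes y_true is not constant
        "mape": mape,
    }


metrics = regression_metrics([2, 4, 6, 8], [3, 4, 5, 8])
metrics_with_zero = regression_metrics([0, 1, 2], [0, 1, 3])
```

For the first call the errors are -1, 0, 1, 0, giving an MAE of 0.5 and an R² of 0.9; the second call returns `None` for MAPE because one target is zero.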
Feature importance
The chart shows standardized-coefficient magnitude — |coefficient| for each (one-hot-expanded) feature. Because features are scaled to unit variance before the fit, these magnitudes are roughly comparable across columns.
These are linear coefficients, not split gains — not numerically comparable to the LightGBM regressor's importance bars — and, being unregularized, not shrunk the way Ridge's are.
Hyperparameters
OLS has no regularization knob — there is nothing equivalent to Ridge's alpha. The only model-node hyperparams scikit-learn exposes are niche:
| Key | Default | Meaning |
|---|---|---|
| fit_intercept | true | Whether to fit an intercept term. Leave on unless you have already centered the target |
| positive | false | Constrain all coefficients to be non-negative |
Most runs leave hyperparams empty.
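The effect of the two options can be seen on scikit-learn's `LinearRegression` directly. The sketch below uses synthetic data (an assumption) with one truly negative effect and a non-zero offset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# True model: positive effect, negative effect, offset of 4.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 4.0 + rng.normal(scale=0.1, size=200)

# Defaults: intercept fitted, coefficients unconstrained.
default = LinearRegression(fit_intercept=True, positive=False).fit(X, y)

# positive=True forces every coefficient to be >= 0, so the
# negative effect of the second feature cannot be represented.
constrained = LinearRegression(positive=True).fit(X, y)

# fit_intercept=False pins the intercept at zero, so the offset
# of 4 has to be absorbed by the residuals.
no_intercept = LinearRegression(fit_intercept=False).fit(X, y)
```

Both non-default settings restrict what the model can fit, which is why they are best left alone unless the constraint reflects something known about the data.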
Limitations
- Linear relationship only. OLS models a linear relationship between features and the target — no interactions, no non-linear effects. If accuracy lags the LightGBM regressor badly, that is usually why.
- Unstable on wide or collinear feature sets. With no penalty, correlated features (including the full one-hot expansion of a categorical alongside the intercept — the "dummy-variable trap") leave the individual coefficients non-unique and sometimes wildly large. Predictions and R² are still well-defined, but the importance chart can mislead. Switch to Ridge if you see this.
- One-hot blow-up on high-cardinality columns. A categorical feature with hundreds of distinct values becomes hundreds of indicator columns. Prefer the LightGBM regressor for high-cardinality features, or reduce cardinality upstream.
- No prediction intervals. This is point regression — a single number per row. For uncertainty bands, use the time-series forecaster.
- Random split assumes IID rows. If your data has temporal structure, use the forecast template instead.
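The dummy-variable trap mentioned in the limitations above can be shown with plain NumPy. In this sketch (toy data, hand-picked coefficient vectors) a full one-hot expansion sits next to an intercept column, and two different coefficient vectors produce identical predictions.

```python
import numpy as np

categories = np.array(["A", "B", "C", "A", "B", "C", "A", "B"])
levels = np.unique(categories)

# Full one-hot expansion: one indicator column per level.
one_hot = (categories[:, None] == levels[None, :]).astype(float)
intercept = np.ones((len(categories), 1))
design = np.hstack([intercept, one_hot])

# The indicator columns sum to the intercept column, so the matrix
# has fewer independent columns than it has columns.
n_columns = design.shape[1]
rank = int(np.linalg.matrix_rank(design))

# Two coefficient vectors (intercept first) with identical predictions:
# moving 10 from the intercept into every level changes nothing.
beta_a = np.array([10.0, 1.0, 2.0, 3.0])
beta_b = np.array([0.0, 11.0, 12.0, 13.0])
identical = np.allclose(design @ beta_a, design @ beta_b)
```

Since the data cannot distinguish `beta_a` from `beta_b`, the individual coefficient magnitudes in the importance chart carry less meaning than the predictions themselves.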
See also
- ridge-regressor-v1.md: the regularized linear sister; pick it for wide or collinear feature sets.
- lightgbm-regressor-v1.md: the gradient-boosted sister; non-linear, native categorical handling, usually higher accuracy.
Not sure which to pick?
Choosing a regression algorithm: LightGBM vs Ridge vs OLS vs Random forest for predicting a number, when to start with a linear baseline, and when to reach for a tree-based model.