Overview
Most supervised-learning pipelines for financial time series start with the same brittle step: pick a fixed look-ahead horizon, compute the forward return, then threshold it into a label. If the horizon is too short, labels are dominated by noise. If it is too long, the regime may change before the prediction materializes.
Trend scanning, introduced by Marcos López de Prado, replaces that fixed choice with an optimization over a set of candidate horizons. At each point in time, it fits a simple linear regression of price against time for every candidate window length, selects the horizon that produces the largest absolute t-statistic, and uses the sign of that t-statistic as the label.
The result is a label that adapts to the local structure of the price path. When a strong trend is running, the method naturally selects a longer window. When the signal is fleeting, it selects a shorter one. The t-statistic provides a built-in confidence score that can be used downstream for sample weighting or filtering.
Visual
Price Path Colored by Trend-Scan t-Value
Each point on the synthetic price path is colored by the trend-scanning t-statistic at that location. Strongly positive t-values (uptrend) appear in light blue; strongly negative t-values (downtrend) appear in dark blue. Flat or ambiguous zones sit in between.
Visual
|t-stat| vs Forward Horizon
For a single starting point, this chart shows the absolute t-statistic from the linear trend regression across every candidate horizon h. The selected horizon h* is the one that maximizes |t|, shown by the highlighted marker.
Visual
Fixed-Horizon vs Trend-Scanning Labels
A direct comparison of labels generated by a fixed 20-day forward return versus trend scanning. Fixed labeling assigns many ambiguous or contradictory labels in choppy zones, while trend scanning concentrates confident labels where the trend evidence is strongest.
Article Section
Why fixed-horizon labeling fails
The standard approach in financial machine learning is to compute the forward return over a fixed number of bars and classify it as up, down, or flat. The problem is that the choice of horizon is arbitrary and has a large impact on label quality.
A 5-day return label mostly captures noise and microstructure effects. A 60-day return label may span two or more distinct regimes. Neither is clearly correct, and the downstream model is forced to learn from labels that do not reflect the actual trend structure of the data.
This is not a minor nuisance. Label quality is the ceiling for any supervised model. If labels are noisy or misaligned with the real trend, no amount of feature engineering or model complexity will recover the lost signal.
label quality is the ceiling for any supervised model
Fixed forward return
rₜ = (Pₜ₊ₕ − Pₜ) / Pₜ
Label rule
yₜ = sign(rₜ) if |rₜ| > τ, else 0
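For contrast, a minimal fixed-horizon labeler can be sketched in a few lines (the horizon h = 20 and dead zone τ = 0.02 are illustrative choices, not prescriptions from the text):

```python
import numpy as np

def fixed_horizon_labels(prices, h=20, tau=0.02):
    """Label each bar by the sign of its h-bar forward return,
    with a dead zone of width tau around zero."""
    prices = np.asarray(prices, dtype=float)
    n = len(prices)
    labels = np.zeros(n)
    fwd = np.full(n, np.nan)
    # r_t = (P_{t+h} - P_t) / P_t; undefined (NaN) for the last h bars
    fwd[: n - h] = prices[h:] / prices[: n - h] - 1.0
    labels[fwd > tau] = 1.0
    labels[fwd < -tau] = -1.0
    return labels, fwd

# A monotonically rising path gets labeled +1 wherever a forward window
# exists, regardless of whether 20 bars is the "right" horizon.
prices = 100 * 1.005 ** np.arange(60)
labels, fwd = fixed_horizon_labels(prices, h=20, tau=0.02)
```

Note that the last h observations can never be labeled, and every labeled point is judged over the same window length no matter what the local price path looks like.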
Article Section
The trend scanning procedure
Trend scanning works by fitting a linear regression of the price path against time for every candidate horizon in a specified range. At each observation, it evaluates all windows from h_min to h_max, computes the t-statistic of the slope coefficient, and selects the horizon where |t| is maximized.
The t-statistic measures how many standard errors the slope is away from zero. A large positive t means the price path over that window is well-described by an upward trend. A large negative t means a clear downtrend. Selecting the horizon with the largest |t| is equivalent to choosing the window where the linear trend explains the most variance relative to noise.
Regression model
Pₜ₊ⱼ = β₀ + β₁ · j + εⱼ, j = 0, 1, …, h
t-statistic of slope
t(β̂₁) = β̂₁ / SE(β̂₁)
Selected horizon
h* = argmaxₕ |t(β̂₁; h)|
Final label
yₜ = sign(t(β̂₁; h*))
Article Section
Python implementation
The core implementation requires only NumPy. The inner function computes the t-statistic for a single (start, horizon) pair using the standard OLS formula. The outer function loops over observations and candidate horizons, selecting the best one at each point.
import numpy as np


def _linear_trend_t_value(prices, start, horizon):
    """
    Compute the t-statistic of the slope for a simple
    linear regression of prices[start : start + horizon + 1]
    against an integer time index.
    """
    y = prices[start : start + horizon + 1]
    n = len(y)
    if n < 3:
        return 0.0
    x = np.arange(n, dtype=np.float64)
    x_bar = x.mean()
    y_bar = y.mean()
    ss_xx = np.sum((x - x_bar) ** 2)
    ss_xy = np.sum((x - x_bar) * (y - y_bar))
    if ss_xx == 0:
        return 0.0
    beta_1 = ss_xy / ss_xx
    y_hat = y_bar + beta_1 * (x - x_bar)
    residuals = y - y_hat
    sse = np.sum(residuals ** 2)
    mse = sse / (n - 2)
    se_beta = np.sqrt(mse / ss_xx) if mse > 0 else 0.0
    if se_beta == 0:
        return 0.0
    return beta_1 / se_beta


def trend_scanning_labels(prices, h_min=5, h_max=20):
    """
    For each observation, scan forward horizons from h_min
    to h_max, pick the one with the largest |t-stat|,
    and return the label (+1 / -1) plus metadata.
    Observations too close to the end of the series to fit
    any candidate window keep label 0 and horizon 0.
    """
    n = len(prices)
    labels = np.zeros(n)
    t_values = np.zeros(n)
    best_horizons = np.zeros(n, dtype=int)
    for i in range(n):
        best_t = 0.0
        best_h = 0
        for h in range(h_min, h_max + 1):
            if i + h >= n:
                break
            t = _linear_trend_t_value(prices, i, h)
            if abs(t) > abs(best_t):
                best_t = t
                best_h = h
        labels[i] = np.sign(best_t)
        t_values[i] = best_t
        best_horizons[i] = best_h
    return labels, t_values, best_horizons


# Generate a synthetic price path
np.random.seed(42)
returns = np.random.normal(0, 0.01, 200)
returns[40:80] += 0.005    # inject uptrend
returns[120:160] -= 0.005  # inject downtrend
prices = 100 * np.cumprod(1 + returns)

# Run trend scanning
labels, t_vals, horizons = trend_scanning_labels(
    prices, h_min=5, h_max=30
)

labeled = horizons > 0  # exclude the unlabeled tail from the summary
print(f"Avg selected horizon: {horizons[labeled].mean():.1f}")
print(f"Fraction labeled up: {(labels == 1).mean():.2%}")
print(f"Fraction labeled dn: {(labels == -1).mean():.2%}")
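As a sanity check on the closed-form t-statistic: for simple linear regression, the slope t-value satisfies the identity t = r · √((n − 2) / (1 − r²)), where r is the Pearson correlation between price and the time index. A standalone sketch verifying the identity on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = np.arange(n, dtype=float)
y = 0.3 * x + rng.normal(0.0, 1.0, n)  # noisy upward trend

# t-stat via the OLS formulas used in the implementation above.
ss_xx = np.sum((x - x.mean()) ** 2)
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_xx
resid = y - (y.mean() + beta_1 * (x - x.mean()))
se = np.sqrt(np.sum(resid ** 2) / (n - 2) / ss_xx)
t_ols = beta_1 / se

# t-stat via the correlation identity.
r = np.corrcoef(x, y)[0, 1]
t_corr = r * np.sqrt((n - 2) / (1.0 - r ** 2))
```

The two values agree to floating-point precision, which is a cheap regression test to keep next to any hand-rolled OLS code.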
Article Section
Why the t-statistic is the right metric
Using the t-statistic rather than raw return or slope magnitude has a specific advantage: it normalizes for volatility. A small slope in a low-volatility regime can have a higher t-value than a large slope in a high-volatility regime. This means the method naturally adjusts for local noise levels.
The t-statistic also provides a built-in confidence measure. Labels with |t| > 2 correspond roughly to significance at the 5% level under the null of no trend. Labels with |t| near zero are ambiguous and can be filtered or down-weighted in training.
This is a key practical advantage. Most labeling methods produce binary outputs with no confidence score. Trend scanning gives you a continuous measure of label reliability that can be fed directly into sample-weighted loss functions.
Confidence filter
|t| > 2 → include in training set (practical rule)
Sample weight
wᵢ = |tᵢ| / Σ|tⱼ|
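The filter and weighting rules can be applied directly to the arrays a trend-scanning pass returns. A minimal sketch with illustrative t-values (the threshold 2.0 follows the practical rule above):

```python
import numpy as np

# Illustrative trend-scan outputs: per-observation t-statistics and labels.
t_values = np.array([3.1, -0.4, -2.7, 0.9, 4.2, -3.5])
labels = np.sign(t_values)

# Confidence filter: keep only observations with |t| > 2.
mask = np.abs(t_values) > 2.0
train_labels = labels[mask]

# Sample weights: normalize |t| over the retained observations.
abs_t = np.abs(t_values[mask])
weights = abs_t / abs_t.sum()
```

The resulting `weights` vector sums to one and can be passed as per-sample weights to any loss function or estimator that supports them.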
Article Section
Comparison with triple-barrier labeling
The triple-barrier method, also from López de Prado, constructs labels by defining take-profit and stop-loss barriers plus a time barrier. The label depends on which barrier is hit first. It produces path-dependent labels that account for risk management.
Trend scanning is different in philosophy. It does not impose barriers. Instead, it asks: over what forward window is the linear trend evidence strongest? The two methods answer different questions and can be complementary in a pipeline.
Triple-barrier is useful when you need labels that reflect executable trading outcomes. Trend scanning is useful when you need labels that reflect directional structure for feature learning or regime classification.
Triple-barrier method
Labels based on which barrier (profit, loss, or time) is hit first. Path-dependent and execution-aware.
Trend scanning
Labels based on the horizon with strongest linear trend evidence. Confidence-weighted and regime-adaptive.
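For concreteness, a minimal sketch of the triple-barrier idea, with fixed percentage barriers for illustration (López de Prado's full version sets the horizontal barriers from estimated volatility; `tp`, `sl`, and `max_h` here are hypothetical parameters, not his API):

```python
import numpy as np

def triple_barrier_label(prices, start, tp=0.02, sl=0.02, max_h=20):
    """Return +1 / -1 / 0 depending on which barrier the path hits first:
    take-profit (+tp), stop-loss (-sl), or the time barrier at max_h bars."""
    p0 = prices[start]
    end = min(start + max_h, len(prices) - 1)
    for j in range(start + 1, end + 1):
        ret = prices[j] / p0 - 1.0
        if ret >= tp:
            return 1   # profit barrier hit first
        if ret <= -sl:
            return -1  # loss barrier hit first
    return 0           # time barrier: neither level was reached

prices = np.array([100.0, 100.5, 101.0, 102.5, 101.0, 99.0])
label = triple_barrier_label(prices, 0, tp=0.02, sl=0.02, max_h=5)
```

Note the path dependence: the label depends on the order in which levels are touched, not on where the path ends up, which is exactly the property trend scanning does not have.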
Article Section
Practical considerations
The choice of h_min and h_max defines the range of horizons the method can select from. Setting h_min too low exposes the method to noise. Setting h_max too high makes it slow and may span multiple regimes.
A reasonable starting point for daily equity data is h_min = 5 and h_max = 20. For intraday data with 5-minute bars, h_min = 12 and h_max = 48 covers one to four hours. These should be tuned to the frequency and the features being used.
Computational cost of the naive double loop is O(n · H · h_max), where H = h_max − h_min + 1 is the number of candidate horizons, because each OLS fit costs O(h). The slope and its standard error are closed-form, so running sums over the window reduce the per-horizon cost to O(1), for O(n · H) total; the inner loop can also be vectorized or parallelized.
the inner loop is closed-form OLS and can be fully vectorized
Complexity
O(n · H · h_max) naive; O(n · H) with running sums, H = h_max − h_min + 1
Typical daily range
h_min = 5, h_max = 20 (starting point)
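One way to realize the vectorization: with running sums, the t-statistics for every candidate horizon at a single start index come out of one pass over the window, with no per-horizon regression loop. A sketch under those assumptions, not a canonical implementation:

```python
import numpy as np

def t_values_all_horizons(prices, start, h_min=5, h_max=30):
    """Vectorized slope t-stats for every horizon h in [h_min, h_max]
    at one start index, built from cumulative sums of y, j*y, y^2."""
    y = np.asarray(prices[start : start + h_max + 1], dtype=float)
    h = np.arange(h_min, min(h_max, len(y) - 1) + 1)  # feasible horizons
    n = h + 1.0                                       # points per window
    j = np.arange(len(y), dtype=float)
    cy, cjy, cyy = np.cumsum(y), np.cumsum(j * y), np.cumsum(y * y)
    S_y, S_jy, S_yy = cy[h], cjy[h], cyy[h]           # sums over y[0..h]
    ss_xx = n * (n * n - 1.0) / 12.0                  # sum of (j - h/2)^2
    beta1 = (S_jy - (h / 2.0) * S_y) / ss_xx          # OLS slope per window
    sse = np.maximum(S_yy - S_y ** 2 / n - beta1 ** 2 * ss_xx, 0.0)
    se = np.sqrt(sse / (n - 2.0) / ss_xx)
    t = np.zeros_like(beta1)
    ok = se > 0
    t[ok] = beta1[ok] / se[ok]
    return h, t

# Usage: all horizon t-stats at index 0 in one shot.
rng = np.random.default_rng(1)
prices = 100 * np.cumprod(1 + rng.normal(0.001, 0.01, 60))
h, t = t_values_all_horizons(prices, 0, h_min=5, h_max=30)

# Direct check for one window (h = 10) with an explicit regression.
w, x = prices[:11], np.arange(11.0)
b1 = np.sum((x - x.mean()) * (w - w.mean())) / np.sum((x - x.mean()) ** 2)
res = w - (w.mean() + b1 * (x - x.mean()))
t_direct = b1 / np.sqrt(np.sum(res ** 2) / 9.0 / np.sum((x - x.mean()) ** 2))
```

The running-sum form is algebraically identical to the per-window regression, so the two computations should agree to floating-point precision.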
Article Section
Integration into an ML pipeline
Trend scanning labels integrate cleanly into standard financial ML workflows. The labels and t-values become the target variable and sample weights respectively. Features are computed as of time t, and the model learns to predict the direction and strength of the trend that will unfold.
Because the method selects different horizons at different points in time, it naturally produces a mixture of short-term and longer-term labels. This can improve model robustness compared to a fixed-horizon approach that forces the model to predict over one timescale.
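A minimal end-to-end sketch, assuming scikit-learn is available; the synthetic features and t-values below stand in for a real feature matrix and a real trend-scanning pass, and the confidence threshold of 2 follows the practical rule from earlier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical setup: one informative feature drives the trend t-stats.
rng = np.random.default_rng(7)
n = 300
signal = rng.normal(size=n)
t_values = 3.0 * signal + rng.normal(0.0, 0.5, n)  # synthetic trend t-stats
labels = np.sign(t_values)
X = np.column_stack([signal, rng.normal(size=n)])  # column 0 is informative

# Filter weak labels, then fit with |t| as per-sample weights.
keep = np.abs(t_values) > 2.0
model = LogisticRegression()
model.fit(X[keep], labels[keep], sample_weight=np.abs(t_values[keep]))
acc = model.score(X[keep], labels[keep])
```

The same three arrays (filtered features, signed labels, |t| weights) drop into any estimator that accepts a `sample_weight` argument in `fit`.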
Conclusion
Why the framework still holds up
Trend scanning solves a specific and important problem in financial machine learning: how to generate directional labels without committing to an arbitrary fixed horizon.
By selecting the look-ahead window where the linear trend evidence is strongest, the method produces labels that are both adaptive and confidence-scored. The t-statistic gives a natural measure of label quality that can be used for sample weighting and filtering.
The approach is simple to implement, computationally cheap, and has a clear statistical interpretation. It is not a forecasting model itself, but a better way to define the target variable that forecasting models are trained on. That distinction matters: the ceiling on any supervised learner is set by the quality of its labels, and trend scanning raises that ceiling.