Lag-Llama: The First Open-Source Foundation Model for Time Series Forecasting

Lag-Llama

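Lag-Llama is built for univariate probabilistic forecasting. It tokenizes time series data with a general method that does not depend on the frequency of the data, which lets the model generalize well to frequencies it has not seen.

It uses a Transformer architecture with a distribution head to process the input tokens and map them to future forecasts with confidence intervals.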

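I. Tokenization with Lag Features

Lag-Llama's tokenization strategy is to build lag features for a series from a specified set of lags.

For a given dataset, it selects all the appropriate frequencies from this list:

quarterly, monthly, weekly, daily, hourly, per second

In other words, if a dataset is supplied at daily frequency, Lag-Llama will try to build features from the daily lag (t-1), the weekly lag (t-7), the monthly lag (t-30), and so on.

The strategy is illustrated in the figure below.

The figure also shows that the model builds additional static covariates, such as second-of-minute, hour-of-day, and so on up to quarter-of-year. This generalizes well to all kinds of time series, but it has a serious drawback: because the list of lag indices is fixed, the input tokens can become very large.

For example, looking at the monthly frequency of hourly data requires 730 time steps. That means the input token has a length of at least 730, in addition to all the static covariates.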
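
To make the lag construction concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not the library's implementation: the helper build_lag_tokens, the toy series, and the lag set [1, 7, 30] are assumptions chosen to mirror the daily example above.

```python
# Minimal sketch of lag-feature tokenization (illustrative, not the library's code).

def build_lag_tokens(series, lags):
    """Return one token per time step, each holding the values at t - lag.

    Only time steps with a full history (t >= max(lags)) get a token.
    """
    max_lag = max(lags)
    tokens = []
    for t in range(max_lag, len(series)):
        tokens.append([series[t - lag] for lag in lags])
    return tokens


# Hypothetical daily series and lag set: previous day, previous week, previous month.
daily_series = list(range(40))  # values 0..39 stand in for real observations
daily_lags = [1, 7, 30]

tokens = build_lag_tokens(daily_series, daily_lags)
print(len(tokens))  # 10 tokens, for t = 30..39
print(tokens[0])    # [29, 23, 0]: values at t-1, t-7, t-30 for t = 30
```

Note how the largest lag decides how much history is consumed before the first token can be built: 30 steps here, and 730 steps in the hourly example with a monthly lag.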

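II. Lag-Llama Architecture

Lag-Llama is a decoder-only, Transformer-based model whose design is inspired by the architecture of the large language model LLaMA.

As the figure shows, each input token is the concatenation of the lagged time steps and the static covariates. The input sequence passes through a linear projection layer that maps the features to the hidden dimension of the attention modules inside the decoder. At the output, the sequence is sent to a distribution head, which is responsible for producing a probability distribution.

During inference, the input sequence produces the distribution of the next time point. The model then generates the rest of the forecast one step at a time, autoregressively, until it reaches the configured length.

This autoregressive process is what allows the model to produce uncertainty intervals for its forecasts. The drawback is that, for long horizons, autoregressive generation compounds errors.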
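
The autoregressive loop described above can be sketched as follows. This is a toy under stated assumptions: toy_next_step_distribution is a hypothetical stand-in for the real model, and it returns a Gaussian mean and standard deviation purely to keep the example short. The point is the feedback loop, in which each sampled value is appended to the history before the next step is predicted.

```python
import random


def toy_next_step_distribution(history):
    """Hypothetical stand-in for the model: returns (mean, std) of the next value."""
    mean = sum(history[-3:]) / 3  # average of the last three values
    return mean, 0.1


def sample_path(history, prediction_length, rng):
    """Generate one future trajectory by feeding each sample back as input."""
    extended = list(history)
    for _ in range(prediction_length):
        mean, std = toy_next_step_distribution(extended)
        extended.append(rng.gauss(mean, std))
    return extended[len(history):]


rng = random.Random(0)
history = [1.0, 1.2, 1.1, 1.3]
paths = [sample_path(history, prediction_length=5, rng=rng) for _ in range(100)]

# The spread across sampled paths at each horizon gives a rough uncertainty band.
for h in range(5):
    values = sorted(path[h] for path in paths)
    print(h + 1, round(values[5], 3), round(values[94], 3))  # roughly 5% to 95%
```

Because every step conditions on previously sampled values, any error made early on is carried into later steps, which is the error-compounding issue mentioned above.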

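III. Lag-Llama Distribution Head

Lag-Llama's distribution head is responsible for outputting a probability distribution, which is what allows the model to produce prediction intervals.

In the current iteration of the model, the last layer uses a Student's t-distribution to construct the uncertainty intervals.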
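
As a rough illustration of how a Student's t output turns into an interval, the sketch below draws samples from a location-scale Student's t-distribution and reads off empirical quantiles. The parameter values (df, loc, scale) are made up for the example, and the helper sample_student_t is not part of Lag-Llama; it only uses Python's standard library.

```python
import math
import random


def sample_student_t(df, loc, scale, rng):
    """Draw one sample from a location-scale Student's t-distribution."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with `df` degrees of freedom
    return loc + scale * z / math.sqrt(chi2 / df)


rng = random.Random(42)

# Hypothetical distribution-head outputs for a single time step.
df, loc, scale = 5.0, 10.0, 2.0
samples = sorted(sample_student_t(df, loc, scale, rng) for _ in range(10_000))

lower = samples[int(0.05 * len(samples))]   # empirical 5% quantile
median = samples[len(samples) // 2]         # empirical median
upper = samples[int(0.95 * len(samples))]   # empirical 95% quantile
print(lower, median, upper)
```

The heavier tails of the t-distribution (compared with a Gaussian) make the resulting intervals more tolerant of outliers.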

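IV. Training Lag-Llama

As a foundation model, Lag-Llama is trained on a large corpus of time series data, so it can generalize to unseen time series and make zero-shot forecasts.

According to the paper, Lag-Llama was trained on 27 time series datasets from different domains, such as energy, transportation, and economics.

The data contains 7,965 univariate time series, for a total of about 352 million tokens.

All of the datasets are open source, including ETTh, Exchange, and Weather.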

V. Zero-Shot Forecasting with Lag-Llama

1. Environment setup

 !git clone https://github.com/time-series-foundation-models/lag-llama/
 %cd lag-llama
 !pip install -r requirements.txt --quiet

Cloning into 'lag-llama'...
remote: Enumerating objects: 167, done.
remote: Counting objects: 100% (68/68), done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 167 (delta 45), reused 50 (delta 40), pack-reused 99
Receiving objects: 100% (167/167), 198.32 KiB | 4.84 MiB/s, done.
Resolving deltas: 100% (75/75), done.

2. Next, we download the pretrained model weights from HuggingFace 🤗.

 !huggingface-cli download time-series-foundation-models/Lag-Llama lag-llama.ckpt --local-dir /content/lag-llama

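If the download fails, you can instead get the Lag-Llama model files and the lag-llama.ckpt file from the link at the end of this article.

3. We import the required packages and the Lag-Llama estimator object that we use to make forecasts.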

from itertools import islice

from matplotlib import pyplot as plt
import matplotlib.dates as mdates

import torch
from gluonts.evaluation import make_evaluation_predictions, Evaluator
from gluonts.dataset.repository.datasets import get_dataset

from gluonts.dataset.pandas import PandasDataset
import pandas as pd

from lag_llama.gluon.estimator import LagLlamaEstimator

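4. Lag-Llama prediction function

We create a function for Lag-Llama inference that can be reused for all the different kinds of datasets below. It returns the forecasts for a given prediction horizon. Each forecast has shape (num_samples, prediction_length), where num_samples is the number of samples drawn from the predicted probability distribution at each time step.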

def get_lag_llama_predictions(dataset, prediction_length, device, context_length=32, use_rope_scaling=False, num_samples=100):
    ckpt = torch.load("lag-llama.ckpt", map_location=device) # Uses GPU since in this Colab we use a GPU.
    estimator_args = ckpt["hyper_parameters"]["model_kwargs"]

    rope_scaling_arguments = {
        "type": "linear",
        "factor": max(1.0, (context_length + prediction_length) / estimator_args["context_length"]),
    }

    estimator = LagLlamaEstimator(
        ckpt_path="lag-llama.ckpt",
        prediction_length=prediction_length,
        context_length=context_length, # Lag-Llama was trained with a context length of 32, but can work with any context length

        # estimator args
        input_size=estimator_args["input_size"],
        n_layer=estimator_args["n_layer"],
        n_embd_per_head=estimator_args["n_embd_per_head"],
        n_head=estimator_args["n_head"],
        scaling=estimator_args["scaling"],
        time_feat=estimator_args["time_feat"],
        rope_scaling=rope_scaling_arguments if use_rope_scaling else None,

        batch_size=1,
        num_parallel_samples=100,
        device=device,
    )

    lightning_module = estimator.create_lightning_module()
    transformation = estimator.create_transformation()
    predictor = estimator.create_predictor(transformation, lightning_module)

    forecast_it, ts_it = make_evaluation_predictions(
        dataset=dataset,
        predictor=predictor,
        num_samples=num_samples
    )
    forecasts = list(forecast_it)
    tss = list(ts_it)

    return forecasts, tss

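5. Loading different types of datasets

We now show how to load data stored in different formats. This part of the demo draws on the "pandas.DataFrame based dataset" tutorial in the GluonTS documentation, written by the GluonTS authors. We thank them for putting together such a detailed tutorial.

6. Notes

1. The prediction function provided in this notebook performs autoregressive forecasting on the last prediction_length steps of the dataset passed to it.

For now, if you want to forecast the future, include the timestamps you want to forecast in the CSV/DataFrame (with dummy values) and set the prediction length to the desired horizon.

2. Keep in mind that Lag-Llama needs a minimum context of 32 timestamps before the forecast timestamps begin. Beyond those 32 timestamps, Lag-Llama can use up to 1092 more timestamps of history for the lags. This part is optional, but you will find that Lag-Llama's performance improves as you give it more context, up to (32+)1092 timestamps.

The context length passed below should not be changed and stays at 32. Lag-Llama automatically uses the context beyond 32 for the lags, if it is available.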
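
The first note above, adding the future timestamps with dummy values, can be sketched with pandas as follows. This is a minimal example under stated assumptions: the hourly history is synthetic, and the placeholder value 0.0 is arbitrary, since these rows only mark the horizon to be forecast.

```python
import pandas as pd

step = pd.Timedelta(hours=1)
prediction_length = 24

# Hypothetical hourly history; in practice this comes from your CSV.
history = pd.DataFrame(
    {"target": [float(i) for i in range(48)]},
    index=pd.date_range("2021-01-01", periods=48, freq=step),
)

# Lag-Llama needs at least 32 timestamps of context before the forecast window.
assert len(history) >= 32

# Append `prediction_length` future timestamps filled with dummy values.
future_index = pd.date_range(
    history.index[-1] + step, periods=prediction_length, freq=step
)
future = pd.DataFrame({"target": 0.0}, index=future_index)

extended = pd.concat([history, future])
extended["target"] = extended["target"].astype("float32")
print(extended.tail(3))
```

Passing a dataset built from extended with prediction_length set to 24 would make the last 24 rows, the dummy ones, the window that the prediction function forecasts.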

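7. Loading data from a long CSV

We are given multiple time series stacked on top of each other in a single DataFrame, with an item_id column that distinguishes the different series.

item_id is only needed when the dataset contains multiple series.

When your CSV contains only one series, item_id is not needed. In that case, you do not need to pass item_id when creating the PandasDataset.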

import pandas as pd
from gluonts.dataset.pandas import PandasDataset

url = (
    "https://gist.githubusercontent.com/rsnirwan/a8b424085c9f44ef2598da74ce43e7a3"
    "/raw/b6fdef21fe1f654787fa0493846c546b7f9c4df2/ts_long.csv"
)
df = pd.read_csv(url, index_col=0, parse_dates=True)
df

                     target item_id
2021-01-01 00:00:00 -1.3378       A
2021-01-01 01:00:00 -1.6111       A
2021-01-01 02:00:00 -1.9259       A
2021-01-01 03:00:00 -1.9184       A
2021-01-01 04:00:00 -1.9168       A
…                         …       …
2021-01-10 19:00:00  1.2349       J
2021-01-10 20:00:00  1.1525       J
2021-01-10 21:00:00  1.1485       J
2021-01-10 22:00:00  1.3248       J
2021-01-10 23:00:00  1.1657       J
2400 rows × 2 columns

# Set numerical columns as float32
for col in df.columns:
    # Skip string/object columns; cast everything else to float32
    if df[col].dtype != 'object' and not pd.api.types.is_string_dtype(df[col]):
        df[col] = df[col].astype('float32')

# Create the PandasDataset
dataset = PandasDataset.from_long_dataframe(df, target="target", item_id="item_id")

backtest_dataset = dataset
prediction_length = 24  # Define your prediction length. We use 24 here since the data is of hourly frequency
num_samples = 100 # number of samples sampled from the probability distribution for each timestep
device = torch.device("cuda:0") # You can switch this to CPU or other GPUs if you'd like, depending on your environment

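8. Getting the forecasts

We run zero-shot inference.

forecasts, tss = get_lag_llama_predictions(backtest_dataset, prediction_length, device, num_samples=num_samples)

We plot the model's forecasts on this dataset together with the ground truth.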

plt.figure(figsize=(20, 15))
date_formater = mdates.DateFormatter('%b, %d')
plt.rcParams.update({'font.size': 15})

# Iterate through the first 9 series, and plot the predicted samples
for idx, (forecast, ts) in islice(enumerate(zip(forecasts, tss)), 9):
    ax = plt.subplot(3, 3, idx+1)

    plt.plot(ts[-4 * prediction_length:].to_timestamp(), label="target", )
    forecast.plot( color='g')
    plt.xticks(rotation=60)
    ax.xaxis.set_major_formatter(date_formater)
    ax.set_title(forecast.item_id)

plt.gcf().tight_layout()
plt.legend()
plt.show()

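VI. Forecasting on Data with Missing Values

If the timestamp column is not evenly spaced and monotonically increasing, PandasDataset raises an error. Here we show how to fill in the missing gaps.

To demonstrate this, we first remove some random rows from the long dataset.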

import pandas as pd
import numpy as np
from gluonts.dataset.pandas import PandasDataset

url = (
    "https://gist.githubusercontent.com/rsnirwan/a8b424085c9f44ef2598da74ce43e7a3"
    "/raw/b6fdef21fe1f654787fa0493846c546b7f9c4df2/ts_long.csv"
)
df = pd.read_csv(url, index_col=0, parse_dates=True)
remove_ind = np.random.choice(np.arange(df.shape[0]), size=100, replace=False)
mask = [False if i in remove_ind else True for i in range(df.shape[0])]
df_missing_val = df.loc[mask, :]  # dataframe with 100 rows removed from df
df_missing_val

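Now we group by item_id and reindex each group's DataFrame. Reindexing, as shown below, adds new rows that contain NaN wherever data is missing. If desired, you can call the fillna() method on each DataFrame to fill in whatever values you want.

However, Lag-Llama supports datasets that contain NaN rows, so imputation is entirely optional.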

# Get the max end date
max_end = max(df.groupby("item_id").apply(lambda _df: _df.index[-1]))
dfs_dict = {}
for item_id, gdf in df_missing_val.groupby("item_id"):
    # Get the full (regular) date range
    new_index = pd.date_range(gdf.index[0], end=max_end, freq="1H")
    # Reindex the dataframe
    dfs_dict[item_id] = gdf.reindex(new_index).drop("item_id", axis=1)
    # Convert the columns to float32 for Lag-Llama
    for col in dfs_dict[item_id]:
        # Skip string/object columns; cast everything else to float32
        if dfs_dict[item_id][col].dtype != 'object' and not pd.api.types.is_string_dtype(dfs_dict[item_id][col]):
            dfs_dict[item_id][col] = dfs_dict[item_id][col].astype('float32')

# Create a PandasDataset
ds = PandasDataset(dfs_dict, target="target")

backtest_dataset = ds
prediction_length = 24  # Define your prediction length. We use 24 here since the data is of hourly frequency
num_samples = 100 # number of samples sampled from the probability distribution for each timestep
device = torch.device("cuda:0") # You can switch this to CPU or other GPUs if you'd like, depending on your environment

# Run zero-shot inference on the reindexed dataset (NaN rows included)
forecasts, tss = get_lag_llama_predictions(backtest_dataset, prediction_length, device, num_samples=num_samples)

plt.figure(figsize=(20, 15))
date_formater = mdates.DateFormatter('%b, %d')
plt.rcParams.update({'font.size': 15})

# Iterate through the first 9 series, and plot the predicted samples
for idx, (forecast, ts) in islice(enumerate(zip(forecasts, tss)), 9):
    ax = plt.subplot(3, 3, idx+1)

    plt.plot(ts[-4 * prediction_length:].to_timestamp(), label="target", )
    forecast.plot( color='g')
    plt.xticks(rotation=60)
    ax.xaxis.set_major_formatter(date_formater)
    ax.set_title(forecast.item_id)

plt.gcf().tight_layout()
plt.legend()
plt.show()
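
For readers who do want to impute, which this section describes as optional, the sketch below shows what reindexing and filling look like on a tiny synthetic series. The data and the two fill strategies (forward fill and filling with the mean) are illustrative choices, not something the Lag-Llama demo prescribes.

```python
import pandas as pd

step = pd.Timedelta(hours=1)
full_index = pd.date_range("2021-01-01", periods=6, freq=step)
series = pd.DataFrame({"target": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=full_index)

# Drop two rows to simulate gaps, then restore the regular hourly grid.
with_gaps = series.drop(full_index[[2, 3]])
reindexed = with_gaps.reindex(full_index)  # the missing rows come back as NaN

# Two simple, optional imputation strategies.
forward_filled = reindexed.ffill()                            # repeat last observed value
mean_filled = reindexed.fillna(reindexed["target"].mean())    # fill with the series mean

print(reindexed["target"].tolist())
print(forward_filled["target"].tolist())
print(mean_filled["target"].tolist())
```

Either filled frame can be cast to float32 and passed to PandasDataset in the same way as the unfilled one.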

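VII. Forecasting on a Wide DataFrame Dataset

Here the data comes in wide format, where the time series are stacked side by side in a single DataFrame. We can simply turn it into a dictionary of Series objects with dict and then use that to construct a PandasDataset.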

import pandas as pd
from gluonts.dataset.pandas import PandasDataset

url_wide = (
    "https://gist.githubusercontent.com/rsnirwan/c8c8654a98350fadd229b00167174ec4"
    "/raw/a42101c7786d4bc7695228a0f2c8cea41340e18f/ts_wide.csv"
)
df_wide = pd.read_csv(url_wide, index_col=0, parse_dates=True)
df_wide

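# Convert numerical columns to float32 for Lag-Llama
for col in df_wide.columns:
    if df_wide[col].dtype != 'object' and not pd.api.types.is_string_dtype(df_wide[col]):
        df_wide[col] = df_wide[col].astype('float32')

# Create a PandasDataset from a dictionary of Series, one per column
ds = PandasDataset(dict(df_wide))

backtest_dataset = ds
prediction_length = 24  # Define your prediction length. We use 24 here since the data is of hourly frequency
num_samples = 100 # number of samples sampled from the probability distribution for each timestep

forecasts, tss = get_lag_llama_predictions(backtest_dataset, prediction_length, device, num_samples=num_samples)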


plt.figure(figsize=(20, 15))
date_formater = mdates.DateFormatter('%b, %d')
plt.rcParams.update({'font.size': 15})

# Iterate through the first 9 series, and plot the predicted samples
for idx, (forecast, ts) in islice(enumerate(zip(forecasts, tss)), 9):
    ax = plt.subplot(3, 3, idx+1)

    plt.plot(ts[-4 * prediction_length:].to_timestamp(), label="target", )
    forecast.plot( color='g')
    plt.xticks(rotation=60)
    ax.xaxis.set_major_formatter(date_formater)
    ax.set_title(forecast.item_id)

plt.gcf().tight_layout()
plt.legend()
plt.show()

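VIII. Demonstration: Why Tuning the Context Length Matters for Zero-Shot Forecasting

This section demonstrates the importance of tuning the context_length hyperparameter when using the model zero-shot.

1. We use the aus_retail dataset here.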

# To filter warnings for readability
import warnings
warnings.simplefilter("ignore", UserWarning)

# For this dataset
from gluonts.dataset.common import ListDataset

df = pd.read_csv("https://gist.githubusercontent.com/dannymorris/ac176586e0236bd9278e9c81e06851a8/raw/54fd7c7520702d3dd7d4bd59c9dfbed5385af438/aus_retail.csv")
df = df.set_index('Month')

df.head()

metadata = {
    'prediction_length': 12,
    'freq': '1M'
}

train_data = [{"start": df.index[0], "target": df[i].values[:-metadata['prediction_length']]} for i in df.columns]
test_data = [{"start": df.index[0], "target": df[i].values} for i in df.columns]

train_ds = ListDataset(
    data_iter=train_data,
    freq=metadata['freq']
)

test_ds = ListDataset(
    data_iter=test_data,
    freq=metadata['freq']
)
device = torch.device("cuda:0") # You can switch this to CPU or other GPUs if you'd like, depending on your environment

2. Get forecasts with the default context length (32)

forecasts_ctx_len_32, tss_ctx_len_32 = get_lag_llama_predictions(test_ds, prediction_length=metadata['prediction_length'], device=device, 
                                           context_length=32, use_rope_scaling=False, num_samples=30)
forecasts_ctx_len_32 = list(forecasts_ctx_len_32)
tss_ctx_len_32 = list(tss_ctx_len_32)

evaluator = Evaluator()
agg_metrics_ctx_len_32, ts_metrics_ctx_len_32 = evaluator(iter(tss_ctx_len_32), iter(forecasts_ctx_len_32))
print("CRPS:", agg_metrics_ctx_len_32['mean_wQuantileLoss'])

Running evaluation: 152it [00:00, 5881.04it/s]
CRPS: 0.07745884580723007
We get a CRPS of 0.0775.
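
The value printed as CRPS here is the evaluator's mean_wQuantileLoss, that is, a quantile (pinball) loss normalized by the absolute target sum and averaged over several quantile levels. The sketch below shows the idea in plain Python on made-up numbers. It is an approximation for intuition only: the helper names are hypothetical, the quantile estimate is a simple index-based one, and the GluonTS Evaluator may differ in its details.

```python
def weighted_quantile_loss(targets, sample_paths, q):
    """Pinball loss at level q, summed over time, divided by sum(|target|)."""
    total = 0.0
    for t, y in enumerate(targets):
        values = sorted(path[t] for path in sample_paths)
        # Empirical q-quantile of the samples at step t (simple index-based estimate).
        pred = values[min(len(values) - 1, int(q * len(values)))]
        indicator = 1.0 if y <= pred else 0.0
        total += 2.0 * abs((pred - y) * (indicator - q))
    return total / sum(abs(y) for y in targets)


def mean_weighted_quantile_loss(targets, sample_paths, quantiles):
    """Average the weighted quantile loss over several quantile levels."""
    losses = [weighted_quantile_loss(targets, sample_paths, q) for q in quantiles]
    return sum(losses) / len(losses)


quantile_levels = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
targets = [10.0, 20.0]

perfect_paths = [[10.0, 20.0]] * 5  # every sample equals the target
biased_paths = [[12.0, 22.0]] * 5   # every sample overshoots by 2

perfect_score = mean_weighted_quantile_loss(targets, perfect_paths, quantile_levels)
biased_score = mean_weighted_quantile_loss(targets, biased_paths, quantile_levels)
print(perfect_score, biased_score)
```

Lower is better: a forecast whose samples sit exactly on the targets scores zero, and the score grows as the sampled distribution moves away from the observed values.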

3. Get forecasts with a context length of 64

forecasts_ctx_len_64, tss_ctx_len_64 = get_lag_llama_predictions(test_ds, prediction_length=metadata['prediction_length'], device=device, 
                                           context_length=64, use_rope_scaling=False, num_samples=30)
forecasts_ctx_len_64 = list(forecasts_ctx_len_64)
tss_ctx_len_64 = list(tss_ctx_len_64)

evaluator = Evaluator()
agg_metrics_ctx_len_64, ts_metrics_ctx_len_64 = evaluator(iter(tss_ctx_len_64), iter(forecasts_ctx_len_64))
print("CRPS:", agg_metrics_ctx_len_64['mean_wQuantileLoss'])

Running evaluation: 152it [00:00, 5517.20it/s]
CRPS: 0.07220623044285954
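We get a better CRPS of 0.0722.

Now we enable RoPE scaling, which helps the model handle context lengths larger than the one it was trained with (here, larger than 32). This is done by passing use_rope_scaling=True to the function.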

forecasts_ctx_len_64_rope_scaled, tss_ctx_len_64_rope_scaled = get_lag_llama_predictions(test_ds, prediction_length=metadata['prediction_length'], device=device, 
                                           context_length=64, use_rope_scaling=True, num_samples=30)
forecasts_ctx_len_64_rope_scaled = list(forecasts_ctx_len_64_rope_scaled)
tss_ctx_len_64_rope_scaled = list(tss_ctx_len_64_rope_scaled)

evaluator = Evaluator()
agg_metrics_ctx_len_64_rope_scaled, ts_metrics_ctx_len_64_rope_scaled = evaluator(iter(tss_ctx_len_64_rope_scaled), iter(forecasts_ctx_len_64_rope_scaled))
print("CRPS:", agg_metrics_ctx_len_64_rope_scaled['mean_wQuantileLoss'])

Running evaluation: 152it [00:00, 5895.67it/s]
CRPS: 0.07104961715320961
We get a still better CRPS of 0.0710. This suggests that RoPE scaling helps when the context length is increased.
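
For reference, the linear RoPE scaling factor that get_lag_llama_predictions computes can be worked out by hand. The sketch below repeats the same formula as the function above, assuming the checkpoint's stored context length is 32, as the comment in that function states; the helper name linear_rope_factor is only for illustration.

```python
def linear_rope_factor(context_length, prediction_length, trained_context_length=32):
    """Same formula as in get_lag_llama_predictions, never below 1.0."""
    return max(1.0, (context_length + prediction_length) / trained_context_length)


# prediction_length is 12 for the monthly aus_retail data used in this section.
for ctx in (32, 64, 128, 256):
    print(ctx, linear_rope_factor(ctx, prediction_length=12))
```

With a context length of 64 and a horizon of 12, the positions are stretched by a factor of 76 / 32 = 2.375.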

4. Get forecasts with a context length of 128, again with RoPE scaling enabled

forecasts_ctx_len_128_rope_scaled, tss_ctx_len_128_rope_scaled = get_lag_llama_predictions(test_ds, prediction_length=metadata['prediction_length'], device=device, 
                                           context_length=128, use_rope_scaling=True, num_samples=30)
forecasts_ctx_len_128_rope_scaled = list(forecasts_ctx_len_128_rope_scaled)
tss_ctx_len_128_rope_scaled = list(tss_ctx_len_128_rope_scaled)

evaluator = Evaluator()
agg_metrics_ctx_len_128_rope_scaled, ts_metrics_ctx_len_128_rope_scaled = evaluator(iter(tss_ctx_len_128_rope_scaled), iter(forecasts_ctx_len_128_rope_scaled))
print("CRPS:", agg_metrics_ctx_len_128_rope_scaled['mean_wQuantileLoss'])

Running evaluation: 152it [00:00, 5502.06it/s]
CRPS: 0.06576577186286284
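We get an even better CRPS of 0.0658.

So far, increasing the context length (together with RoPE scaling) has given better performance.

5. Get forecasts with a context length of 256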

forecasts_ctx_len_256_rope_scaled, tss_ctx_len_256_rope_scaled = get_lag_llama_predictions(test_ds, prediction_length=metadata['prediction_length'], device=device, 
                                           context_length=256, use_rope_scaling=True, num_samples=30)
forecasts_ctx_len_256_rope_scaled = list(forecasts_ctx_len_256_rope_scaled)
tss_ctx_len_256_rope_scaled = list(tss_ctx_len_256_rope_scaled)

evaluator = Evaluator()
agg_metrics_ctx_len_256_rope_scaled, ts_metrics_ctx_len_256_rope_scaled = evaluator(iter(tss_ctx_len_256_rope_scaled), iter(forecasts_ctx_len_256_rope_scaled))
print("CRPS:", agg_metrics_ctx_len_256_rope_scaled['mean_wQuantileLoss'])

Running evaluation: 152it [00:00, 6991.12it/s]
CRPS: 0.07051453819039323

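We get a worse CRPS of 0.0705.

When the context length is increased to 256, the model's performance drops. This shows that tuning the context length for each dataset/task matters, and that, at least for Lag-Llama at present, the largest possible context length is not always the best. For a new dataset, you can run the model with several different context lengths and compare the resulting CRPS values to find the best setting.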
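
The comparison across runs can be written down as a small lookup. The sketch below simply collects the CRPS values printed in the runs above and picks the setting with the lowest one; the dictionary layout is an assumption made for the example.

```python
# CRPS values printed in the runs above, keyed by (context_length, use_rope_scaling).
crps_by_setting = {
    (32, False): 0.07745884580723007,
    (64, False): 0.07220623044285954,
    (64, True): 0.07104961715320961,
    (128, True): 0.06576577186286284,
    (256, True): 0.07051453819039323,
}

# Lower CRPS is better, so take the setting with the minimum value.
best_setting = min(crps_by_setting, key=crps_by_setting.get)
print(best_setting, crps_by_setting[best_setting])
```

In practice it is safer to make this choice on a held-out validation window rather than on the test window itself, so that the reported score is not optimistic.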

If there is interest, my next article will cover how to fine-tune Lag-Llama so that it achieves better forecasting performance.

Lag-Llama model files and lag-llama.ckpt (Baidu Netdisk): https://pan.baidu.com/s/10wph3rvpMPG1OYLB8nBCbQ?pwd=t2ta (extraction code: t2ta)
