Custom Datasets

2025-02-12

This page was generated from a Jupyter Notebook; the original file is on GitHub.

In [1]:

# Import packages and set up the device

import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

torch.__version__, device

Out[1]:

('2.5.1+cu124', 'cuda')

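Getting and Preparing the Data

Getting the Data

First, we need some data. The data used here is a subset of the Food101 dataset.

Food101 contains 101 food categories with 1,000 images each, for a total of 101,000 images (75,750 training images and 25,250 test images).

To build a custom dataset, we start with just 3 food categories: pizza, steak, and sushi. Rather than using all 1,000 images per class, we start with a random 10% of them (start small and scale up when necessary).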
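The figures above can be checked with a few lines of arithmetic. This is only a sanity-check sketch; it assumes the 10% subset keeps the same per-class train/test ratio as the full dataset, and the constant names are made up for illustration. Under that assumption the result agrees with the 225 training and 75 test images that ImageFolder reports later on this page.

```python
# Food101 figures as stated in the text above.
NUM_CLASSES = 101
IMAGES_PER_CLASS = 1_000
TRAIN_TOTAL, TEST_TOTAL = 75_750, 25_250

total_images = NUM_CLASSES * IMAGES_PER_CLASS
train_per_class = TRAIN_TOTAL // NUM_CLASSES
test_per_class = TEST_TOTAL // NUM_CLASSES

# Assumption: the 10% subset keeps the same per-class train/test ratio.
SUBSET_FRACTION = 0.10
SUBSET_CLASSES = 3  # pizza, steak, sushi
subset_train = round(train_per_class * SUBSET_FRACTION) * SUBSET_CLASSES
subset_test = round(test_per_class * SUBSET_FRACTION) * SUBSET_CLASSES

print(total_images, train_per_class, test_per_class)
print(subset_train, subset_test)
```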

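The dataset can be obtained from the following sources:

The original Food101 dataset and paper website.
The notebook from Learn PyTorch ( https://www.learnpytorch.io ), which provides the subset used here.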

In [2]:

import requests
import zipfile
from pathlib import Path

# Setup path to data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# If the image folder doesn't exist, download it and prepare it... 
if image_path.is_dir():
    print(f"{image_path} directory exists.")
else:
    print(f"Did not find {image_path} directory, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)
    
    # Download pizza, steak, sushi data
    with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
        request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
        print("Downloading pizza, steak, sushi data...")
        f.write(request.content)

    # Unzip pizza, steak, sushi data
    with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
        print("Unzipping pizza, steak, sushi data...") 
        zip_ref.extractall(image_path)

data\pizza_steak_sushi directory exists.

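You can also download the dataset yourself and split it into train and test sets.

In this example, the pizza, steak, and sushi images are stored in the standard image classification format.

In this format, each class has its own directory, named after the class, containing that class's images. For example, all pizza images live in the pizza/ directory.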

pizza_steak_sushi/
    train/
        pizza/
        steak/
        sushi/
    test/
        pizza/
        steak/
        sushi/
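To get a feel for this layout, the sketch below counts the files in each class directory. It builds a small throwaway tree of empty placeholder files instead of touching the real data; the helper name count_images_per_class and the demo file counts are assumptions made for illustration.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_images_per_class(split_dir: Path, pattern: str = "*.jpg") -> Counter:
    """Count files matching `pattern` in each class sub-directory of `split_dir`."""
    counts = Counter()
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for _ in class_dir.glob(pattern))
    return counts


# Build a tiny tree that mimics the layout (empty placeholder files, not real images).
with tempfile.TemporaryDirectory() as tmp:
    demo_train_dir = Path(tmp) / "pizza_steak_sushi" / "train"
    for class_name, n_files in [("pizza", 3), ("steak", 2), ("sushi", 1)]:
        class_dir = demo_train_dir / class_name
        class_dir.mkdir(parents=True)
        for i in range(n_files):
            (class_dir / f"{i}.jpg").touch()

    demo_counts = count_images_per_class(demo_train_dir)

print(dict(demo_counts))
```

Pointing the same helper at the real data/pizza_steak_sushi/train directory should give a quick per-class overview of the dataset.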

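The goal is to turn this storage structure into a dataset that PyTorch can use.

Now let's open a few images and take a look:

Use pathlib.Path.glob() to collect all image paths, i.e. every file ending in .jpg.
Use Python's random.choice() to pick a random image path.
Use pathlib.Path.parent.stem to get the image's class name.
Use PIL.Image.open() (PIL stands for Python Imaging Library) to open the random image path.
Display the image and print some metadata.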

In [3]:

import random
from PIL import Image

random.seed(42) 

image_path_list = list(image_path.glob("*/*/*.jpg"))
random_image_path = random.choice(image_path_list)
image_class = random_image_path.parent.stem
img = Image.open(random_image_path)

print(f"Random image path: {random_image_path}")
print(f"Image class: {image_class}")
print(f"Image height: {img.height}") 
print(f"Image width: {img.width}")
img

Random image path: data\pizza_steak_sushi\test\sushi\2394442.jpg
Image class: sushi
Image height: 408
Image width: 512

Out[3]:

[Image: the randomly selected sushi image]

The image can also be displayed with matplotlib:

In [4]:

import numpy as np
import matplotlib.pyplot as plt

img_as_array = np.asarray(img)

plt.figure(figsize=(5, 5))
plt.imshow(img_as_array)
plt.title(f"Image class: {image_class} | Image shape: {img_as_array.shape} -> [height, width, color_channels]")
plt.axis(False);

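Transforming the Data

Next, we want to load the image data into PyTorch. Before image data can be used with PyTorch, we need to:

Turn it into tensors (a numerical representation of the images).
Convert it into a torch.utils.data.Dataset and then into a torch.utils.data.DataLoader; we will call these Dataset and DataLoader for short.

PyTorch has several kinds of prebuilt datasets and dataset loaders, depending on the problem at hand.

Vision: torchvision.datasets;
Audio: torchaudio.datasets;
Text: torchtext.datasets;
Recommender systems: torchrec.datasets.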

In [5]:

# Import packages
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

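Use torchvision.transforms to transform the data:

Use transforms.Resize() to resize the images.
Use transforms.RandomHorizontalFlip() to randomly flip images horizontally (this can be considered a form of data augmentation, since it artificially alters our image data).
Use transforms.ToTensor() to convert a PIL image into a PyTorch tensor.

All of these steps can be chained together with torchvision.transforms.Compose().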

In [6]:

data_transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),
    transforms.RandomHorizontalFlip(p=0.5), # p is the probability of flipping
    transforms.ToTensor()
])

Next, let's see what the transform does:

In [7]:

def plot_transformed_images(image_paths, transform, n=3, seed=42):
    random.seed(seed)
    random_image_paths = random.sample(image_paths, k=n)
    for image_path in random_image_paths:
        with Image.open(image_path) as f:
            fig, ax = plt.subplots(1, 2)
            ax[0].imshow(f) 
            ax[0].set_title(f"Original \nSize: {f.size}")
            ax[0].axis("off")

            # permute() rearranges the image dimensions to suit matplotlib
            # (PyTorch default is [C, H, W] but Matplotlib is [H, W, C])
            transformed_image = transform(f).permute(1, 2, 0) 
            ax[1].imshow(transformed_image) 
            ax[1].set_title(f"Transformed \nSize: {transformed_image.shape}")
            ax[1].axis("off")
            fig.suptitle(f"Class: {image_path.parent.stem}", fontsize=16)

plot_transformed_images(image_path_list, 
                        transform=data_transform, 
                        n=3)

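Loading Data with ImageFolder

Since the data is already in the standard image classification format, we can use the torchvision.datasets.ImageFolder class. We pass it the path of the target image directory along with the series of transforms we want to apply to the images.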

In [8]:

from torchvision import datasets

train_dir = image_path / "train"
test_dir = image_path / "test"

train_data = datasets.ImageFolder(root=train_dir,
                                  transform=data_transform,
                                  target_transform=None) # transform applied to the labels (none here)
test_data = datasets.ImageFolder(root=test_dir,
                                 transform=data_transform)

print(f"Train data:\n{train_data}\nTest data:\n{test_data}")

Train data:
Dataset ImageFolder
    Number of datapoints: 225
    Root location: data\pizza_steak_sushi\train
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
               RandomHorizontalFlip(p=0.5)
               ToTensor()
           )
Test data:
Dataset ImageFolder
    Number of datapoints: 75
    Root location: data\pizza_steak_sushi\test
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
               RandomHorizontalFlip(p=0.5)
               ToTensor()
           )

PyTorch has now registered the datasets. Let's inspect them by checking the classes and class_to_idx attributes as well as the lengths of the training and test sets:

In [9]:

class_names = train_data.classes
class_dict = train_data.class_to_idx
class_names, class_dict, len(train_data), len(test_data)

Out[9]:

(['pizza', 'steak', 'sushi'], {'pizza': 0, 'steak': 1, 'sushi': 2}, 225, 75)

Next, inspect a single sample and its label from the training data:

In [10]:

img, label = train_data[0][0], train_data[0][1]
img.shape, img.dtype, label, type(label)

Out[10]:

(torch.Size([3, 64, 64]), torch.float32, 0, int)

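The image is now a tensor (with shape [3, 64, 64] -> [color_channels, height, width]), and the label is an integer tied to a specific class (as given by the class_to_idx attribute).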
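Going from an integer label back to a class name only needs the mapping inverted. The snippet below is a minimal sketch that copies the class_to_idx dictionary from the output above; the name idx_to_class is an assumption introduced here and is not an ImageFolder attribute.

```python
# Mapping copied from the `class_to_idx` output shown earlier on this page.
class_to_idx = {"pizza": 0, "steak": 1, "sushi": 2}

# Invert it so an integer label can be turned back into a readable name.
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

label = 0  # the label of the first training sample shown above
print(idx_to_class[label])
```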

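We also need to turn the data into a DataLoader. Wrapping a Dataset in a DataLoader makes it iterable, so a model can go through it and learn the relationship between samples and targets (features and labels).

To keep things simple, we use batch_size=1 and num_workers=1.

batch_size, covered earlier, is the number of samples per batch.
num_workers defines how many subprocesses are created to load data; the higher the value, the more CPU resources PyTorch uses for data loading. A common choice is to set it to the number of CPU cores via Python's os.cpu_count(), so that the DataLoader can use as many cores as possible, although the best value depends on the machine and the dataset.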
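One possible way to choose num_workers is sketched below. The helper pick_num_workers and its reserve parameter are hypothetical names introduced for illustration; the only library call is os.cpu_count(), which may return None when the core count cannot be determined.

```python
import os


def pick_num_workers(reserve: int = 1) -> int:
    """Return a worker count that leaves `reserve` cores free (never negative)."""
    # os.cpu_count() may return None when the core count cannot be determined.
    available = os.cpu_count() or 1
    return max(available - reserve, 0)


num_workers_demo = pick_num_workers(reserve=1)
print(num_workers_demo)  # for a DataLoader, 0 means data is loaded in the main process
```

Leaving a core free is a judgment call rather than a rule; the value could then be passed as num_workers when constructing a DataLoader.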

In [11]:

from torch.utils.data import DataLoader

train_dataloader = DataLoader(dataset=train_data,
                              batch_size=1,
                              num_workers=1,
                              shuffle=True)

test_dataloader = DataLoader(dataset=test_data,
                             batch_size=1,
                             num_workers=1,
                             shuffle=False)

Finally, get the shape of one batch from train_dataloader:

In [12]:

img, label = next(iter(train_dataloader))
img.shape, label.shape

Out[12]:

(torch.Size([1, 3, 64, 64]), torch.Size([1]))

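Loading Data with a Custom Dataset Class

If a prebuilt dataset creator such as torchvision.datasets.ImageFolder() did not exist, or no ready-made solution existed for your specific problem, you can write your own.

Pros and cons of writing a custom way to load a Dataset:

Pros: you can create a dataset out of almost anything, without being limited to PyTorch's prebuilt Dataset classes.
Cons: being able to create a dataset out of almost anything does not mean it will work; it also means writing more code, which is more prone to bugs or performance problems.

In practice, we subclass torch.utils.data.Dataset (the base class for all Datasets in PyTorch) to replicate torchvision.datasets.ImageFolder().

Start by importing the modules we need:

Python's os for working with directories (the data is stored in directories).
Python's pathlib for working with file paths (every image has a unique file path).
torch for everything PyTorch.
PIL's Image class for loading images.
torch.utils.data.Dataset to subclass when creating the custom dataset.
torchvision.transforms to turn images into tensors.
Various types from Python's typing module to add type hints to the code.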

In [13]:

import os
import pathlib
import torch

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
from typing import Tuple, Dict, List

Getting the Class Names

First, implement a function that retrieves the class names, returning information such as ['pizza', 'steak', 'sushi'], {'pizza': 0, 'steak': 1, 'sushi': 2}:

In [14]:

def find_classes(directory: str) -> Tuple[List[str], Dict[str, int]]:
    """Finds the class folder names in a target directory.
    
    Assumes target directory is in standard image classification format.

    Args:
        directory (str): target directory to load classnames from.

    Returns:
        Tuple[List[str], Dict[str, int]]: (list_of_class_names, dict(class_name: idx...))
    
    Example:
        find_classes("food_images/train")
        >>> (["class_1", "class_2"], {"class_1": 0, ...})
    """
    # 1. Get the class names by scanning the target directory
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    
    # 2. Raise an error if no class names are found
    if not classes:
        raise FileNotFoundError(f"Couldn't find any classes in {directory}.")
        
    # 3. Create a dictionary mapping class names to index labels
    class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
    return classes, class_to_idx

Test it:

In [15]:

find_classes(train_dir)

Out[15]:

(['pizza', 'steak', 'sushi'], {'pizza': 0, 'steak': 1, 'sushi': 2})

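Building the Custom Dataset Class

We will build a class that replicates the functionality of torchvision.datasets.ImageFolder().

The plan:

Subclass torch.utils.data.Dataset.
Initialize the subclass with a targ_dir parameter (the target data directory) and a transform parameter.
Create attributes for the target image paths, transform (which may be None), classes, and class_to_idx (from the find_classes() function).
Write a function that loads an image from file and returns it, using either PIL or torchvision.io.
Override the __len__ method of torch.utils.data.Dataset to return the number of samples in the dataset (optional, but expected by the default DataLoader options and many samplers).
Override the __getitem__ method of torch.utils.data.Dataset to return a single sample from the dataset (required).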

In [16]:

from torch.utils.data import Dataset

class CustomImageFolder(Dataset):
    def __init__(self, targ_dir: str, transform=None) -> None:
        self.paths = list(pathlib.Path(targ_dir).glob("*/*.jpg"))
        self.transform = transform
        self.classes, self.class_to_idx = find_classes(targ_dir)

    def load_image(self, index: int) -> Image.Image:
        image_path = self.paths[index]
        return Image.open(image_path)

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
        img = self.load_image(index)
        class_name = self.paths[index].parent.name # expects paths like data_folder/class_name/image.jpeg
        class_idx = self.class_to_idx[class_name]

        if self.transform:
            return self.transform(img), class_idx # (X, y)
        else:
            return img, class_idx # (X, y)

Set up the data transforms again:

In [17]:

train_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor()
])

test_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])

Then instantiate the datasets:

In [18]:

train_data_custom = CustomImageFolder(targ_dir=train_dir,
                                      transform=train_transforms)
test_data_custom = CustomImageFolder(targ_dir=test_dir,
                                     transform=test_transforms)

train_data_custom.classes, train_data_custom.class_to_idx, len(train_data_custom), len(test_data_custom)

Out[18]:

(['pizza', 'steak', 'sushi'], {'pizza': 0, 'steak': 1, 'sushi': 2}, 225, 75)

Testing `__getitem__`

Here is a helper function for that:

In [19]:

# 1. Take in a Dataset as well as a list of class names
def display_random_images(dataset: torch.utils.data.dataset.Dataset,
                          classes: List[str] = None,
                          n: int = 10,
                          display_shape: bool = True,
                          seed: int = None):
    
    # 2. Adjust display if n too high
    if n > 10:
        n = 10
        display_shape = False
        print(f"For display purposes, n shouldn't be larger than 10, setting to 10 and removing shape display.")
    
    # 3. Set random seed
    if seed is not None:  # `if seed:` would silently ignore seed=0
        random.seed(seed)

    # 4. Get random sample indexes
    random_samples_idx = random.sample(range(len(dataset)), k=n)

    # 5. Setup plot
    plt.figure(figsize=(16, 5))

    # 6. Loop through samples and display random samples 
    for i, targ_sample in enumerate(random_samples_idx):
        targ_image, targ_label = dataset[targ_sample][0], dataset[targ_sample][1]

        # 7. Adjust image tensor shape for plotting: [color_channels, height, width] -> [height, width, color_channels]
        targ_image_adjust = targ_image.permute(1, 2, 0)

        # Plot adjusted samples
        plt.subplot(1, n, i+1)
        plt.imshow(targ_image_adjust)
        plt.axis("off")
        title = ""  # avoids a NameError below when classes is None
        if classes:
            title = f"class: {classes[targ_label]}"
            if display_shape:
                title = title + f"\nshape: {targ_image_adjust.shape}"
        plt.title(title)

Call it to test:

In [20]:

# Display random images from ImageFolder created Dataset
display_random_images(train_data, 
                      n=5, 
                      classes=class_names,
                      seed=None)

In [21]:

display_random_images(train_data_custom, 
                      n=5, 
                      classes=class_names,
                      seed=None) # Try setting the seed for reproducible images

It appears to work.

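Turning the Custom Dataset into a DataLoader

With the CustomImageFolder class, raw images can be turned into a dataset (features mapped to labels, or X mapped to y).

Because the custom dataset inherits from torch.utils.data.Dataset, it can be used directly with torch.utils.data.DataLoader().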

In [22]:

from torch.utils.data import DataLoader

train_dataloader_custom = DataLoader(dataset=train_data_custom, 
                                     batch_size=1, 
                                     num_workers=0, 
                                     shuffle=True)
test_dataloader_custom = DataLoader(dataset=test_data_custom,
                                    batch_size=1, 
                                    num_workers=0, 
                                    shuffle=False)

Finally, get the shape of one batch from train_dataloader_custom:

In [23]:

img_custom, label_custom = next(iter(train_dataloader_custom))

img_custom.shape, label_custom.shape

Out[23]:

(torch.Size([1, 3, 64, 64]), torch.Size([1]))

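Other Forms of Transforms (Data Augmentation)

We have already seen a few transforms applied to the data, but there are many more, which are listed in the torchvision.transforms documentation.

The purpose of a transform is to alter an image in some way, such as cropping it, randomly erasing part of it, or randomly rotating it. Applying transforms like these is commonly called data augmentation.

Data augmentation is the process of altering the data so as to artificially increase the diversity of the training set.

Many examples of data augmentation on images can be found at: https://pytorch.org/vision/main/auto_examples/transforms/plot_transforms_illustrations.html

Research suggests that random transforms (such as transforms.RandAugment() and transforms.TrivialAugmentWide()) generally perform better than hand-picked ones.

The main parameter to note in transforms.TrivialAugmentWide() is num_magnitude_bins=31.

It defines how many intensity levels are available when a transform is applied: 0 means no range, while 31 means the widest range (the highest chance of the highest intensity).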
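The idea of splitting a transform's strength into bins can be illustrated without torchvision. The sketch below is a simplified illustration and not torchvision's implementation: the function names, the uniform choice of bin, and the 30.0 maximum (think of a hypothetical rotation range in degrees) are all assumptions.

```python
import random


def magnitude_bins(max_magnitude: float, num_bins: int) -> list:
    """Evenly spaced strengths from 0 to `max_magnitude` (inclusive)."""
    if num_bins < 2:
        return [0.0]
    step = max_magnitude / (num_bins - 1)
    return [i * step for i in range(num_bins)]


def sample_magnitude(max_magnitude: float, num_bins: int, rng: random.Random) -> float:
    """Pick one of the bins uniformly at random."""
    return rng.choice(magnitude_bins(max_magnitude, num_bins))


rng = random.Random(42)
bins = magnitude_bins(max_magnitude=30.0, num_bins=31)  # hypothetical strength range
picked = sample_magnitude(30.0, 31, rng)
print(len(bins), bins[0], bins[-1], picked)
```

With more bins the sampled strength can take finer-grained values between zero and the maximum; with a single bin only the zero-strength value remains.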

Add transforms.TrivialAugmentWide() to a transforms.Compose() pipeline:

In [24]:

from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.TrivialAugmentWide(num_magnitude_bins=31),
    transforms.ToTensor()
])

test_transforms = transforms.Compose([
    transforms.Resize((224, 224)), 
    transforms.ToTensor()
])

See the effect:

In [25]:

image_path_list = list(image_path.glob("*/*/*.jpg"))

plot_transformed_images(
    image_paths=image_path_list,
    transform=train_transforms,
    n=3,
    seed=None
)

Model 0: TinyVGG without Data Augmentation

Define the transform:

In [26]:

simple_transform = transforms.Compose([ 
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

Load the data:

In [27]:

import os
from torchvision import datasets
from torch.utils.data import DataLoader

train_data_simple = datasets.ImageFolder(root=train_dir, transform=simple_transform)
test_data_simple = datasets.ImageFolder(root=test_dir, transform=simple_transform)

BATCH_SIZE = 32
NUM_WORKERS = 3 # personal tweak: do not use every core
print(f"batch size: {BATCH_SIZE}, workers: {NUM_WORKERS}")

train_dataloader_simple = DataLoader(train_data_simple, 
                                     batch_size=BATCH_SIZE, 
                                     shuffle=True, 
                                     num_workers=NUM_WORKERS)

test_dataloader_simple = DataLoader(test_data_simple, 
                                    batch_size=BATCH_SIZE, 
                                    shuffle=False, 
                                    num_workers=NUM_WORKERS)

len(train_dataloader_simple), len(test_dataloader_simple)

batch size: 32, workers: 3

Out[27]:

(8, 3)
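The (8, 3) result follows from the dataset sizes and the batch size. A short check, assuming the DataLoader default drop_last=False so that the final partial batch is kept (the variable names here are illustrative):

```python
import math

BATCH_SIZE = 32
num_train_samples, num_test_samples = 225, 75  # dataset sizes reported above

# With drop_last=False (the DataLoader default), the final partial batch is kept.
train_batches = math.ceil(num_train_samples / BATCH_SIZE)
test_batches = math.ceil(num_test_samples / BATCH_SIZE)
last_train_batch = num_train_samples - (train_batches - 1) * BATCH_SIZE

print(train_batches, test_batches, last_train_batch)
```

Note that the last training batch holds a single image, which matters for any metric that is averaged per batch.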

Create the model directly, the same as in the computer vision section:

In [28]:

class TinyVGG(nn.Module):
    """
    Model architecture copying TinyVGG from: 
    https://poloclub.github.io/cnn-explainer/
    """
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(input_shape, hidden_units, kernel_size=3,stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units,  hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2,stride=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units * 16 * 16,
                      out_features=output_shape)
        )
    
    def forward(self, x: torch.Tensor):
        return self.classifier(self.conv_block_2(self.conv_block_1(x)))

torch.manual_seed(42)
model_0 = TinyVGG(input_shape=3, # (3, RGB) 
                  hidden_units=10, 
                  output_shape=len(train_data.classes)).to(device)
model_0

Out[28]:

TinyVGG(
  (conv_block_1): Sequential(
    (0): Conv2d(3, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv_block_2): Sequential(
    (0): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=2560, out_features=3, bias=True)
  )
)

Using `torchinfo` to Inspect Model Shapes

This requires the torchinfo library: pip install torchinfo.

Usage: summary(model, input_size=(batch_size, color_channels, height, width))

In [29]:

from torchinfo import summary
summary(model_0, input_size=[1, 3, 64, 64]) # do a test pass with an example input size

Out[29]:

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
TinyVGG                                  [1, 3]                    --
├─Sequential: 1-1                        [1, 10, 32, 32]           --
│    └─Conv2d: 2-1                       [1, 10, 64, 64]           280
│    └─ReLU: 2-2                         [1, 10, 64, 64]           --
│    └─Conv2d: 2-3                       [1, 10, 64, 64]           910
│    └─ReLU: 2-4                         [1, 10, 64, 64]           --
│    └─MaxPool2d: 2-5                    [1, 10, 32, 32]           --
├─Sequential: 1-2                        [1, 10, 16, 16]           --
│    └─Conv2d: 2-6                       [1, 10, 32, 32]           910
│    └─ReLU: 2-7                         [1, 10, 32, 32]           --
│    └─Conv2d: 2-8                       [1, 10, 32, 32]           910
│    └─ReLU: 2-9                         [1, 10, 32, 32]           --
│    └─MaxPool2d: 2-10                   [1, 10, 16, 16]           --
├─Sequential: 1-3                        [1, 3]                    --
│    └─Flatten: 2-11                     [1, 2560]                 --
│    └─Linear: 2-12                      [1, 3]                    7,683
==========================================================================================
Total params: 10,693
Trainable params: 10,693
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 6.75
==========================================================================================
Input size (MB): 0.05
Forward/backward pass size (MB): 0.82
Params size (MB): 0.04
Estimated Total Size (MB): 0.91
==========================================================================================

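The output of torchinfo.summary() provides a lot of information about the model.

Total params is the total number of parameters in the model;
Estimated Total Size is the estimated total size (in MB).

You can also see how the input and output shapes change as data of the given input_size moves through the model.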
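The shapes and parameter counts in the summary can be reproduced by hand. The sketch below uses the standard output-size formula for convolution and pooling layers, floor((size + 2 * padding - kernel) / stride) + 1, and counts weights plus biases; the helper names are made up for illustration.

```python
def conv2d_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Spatial output size of a convolution (or pooling) layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_params(in_ch: int, out_ch: int, kernel: int = 3) -> int:
    """Weights plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch


hidden_units, num_classes, size = 10, 3, 64

# Each block: two 3x3 convolutions (padding=1 keeps the size), then 2x2 max pooling.
for _ in range(2):
    size = conv2d_out(conv2d_out(size))
    size = conv2d_out(size, kernel=2, stride=2, padding=0)

flattened = hidden_units * size * size
linear_params = flattened * num_classes + num_classes
total_params = (
    conv2d_params(3, hidden_units)
    + 3 * conv2d_params(hidden_units, hidden_units)
    + linear_params
)
print(size, flattened, total_params)
```

This is where the hidden_units * 16 * 16 in the classifier's in_features comes from: two rounds of pooling shrink a 64x64 input to 16x16.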

Wrapping the Per-Step Training and Test Functions

In [30]:

def train_step(model: torch.nn.Module, 
               dataloader: torch.utils.data.DataLoader, 
               loss_fn: torch.nn.Module, 
               optimizer: torch.optim.Optimizer):
    model.train()
    
    train_loss, train_acc = 0, 0
    
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        y_pred = model(X)

        loss = loss_fn(y_pred, y)
        train_loss += loss.item() 

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item()/len(y_pred)

    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    return train_loss, train_acc

def test_step(model: torch.nn.Module, 
              dataloader: torch.utils.data.DataLoader, 
              loss_fn: torch.nn.Module):
    model.eval() 
    
    test_loss, test_acc = 0, 0
    
    with torch.inference_mode():
        for batch, (X, y) in enumerate(dataloader):
            X, y = X.to(device), y.to(device)
    
            test_pred_logits = model(X)
            
            loss = loss_fn(test_pred_logits, y)
            test_loss += loss.item()
            
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))
            
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)
    return test_loss, test_acc
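One detail of the functions above: accuracy is computed per batch and then averaged over batches, so a small final batch carries as much weight as a full one. The numbers below are made up to show how this can differ from per-sample accuracy; they are not taken from the runs on this page.

```python
# Hypothetical numbers of correct predictions per batch (illustration only).
batch_sizes = [32, 32, 11]  # 75 test images with batch_size=32
correct_per_batch = [8, 8, 11]

# What test_step() computes: the mean of the per-batch accuracies.
mean_of_batch_acc = sum(c / n for c, n in zip(correct_per_batch, batch_sizes)) / len(batch_sizes)

# Per-sample accuracy over the whole split.
overall_acc = sum(correct_per_batch) / sum(batch_sizes)

print(f"{mean_of_batch_acc:.4f} vs {overall_acc:.4f}")
```

If per-sample accuracy is preferred, one option is to accumulate the number of correct predictions and divide by the dataset size at the end.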

Wrapping the Training Function

In [31]:

def train(model: torch.nn.Module, 
          train_dataloader: torch.utils.data.DataLoader, 
          test_dataloader: torch.utils.data.DataLoader, 
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module = nn.CrossEntropyLoss(),
          epochs: int = 5):

    results = {"train_loss": [],
        "train_acc": [],
        "test_loss": [],
        "test_acc": []
    }
    
    for epoch in range(epochs):
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer)
        test_loss, test_acc = test_step(model=model,
            dataloader=test_dataloader,
            loss_fn=loss_fn)
        
        print(
            f"Epoch: {epoch+1} | "
            f"train_loss: {train_loss:.4f} | "
            f"train_acc: {train_acc:.4f} | "
            f"test_loss: {test_loss:.4f} | "
            f"test_acc: {test_acc:.4f}"
        )

        results["train_loss"].append(train_loss.item() if isinstance(train_loss, torch.Tensor) else train_loss)
        results["train_acc"].append(train_acc.item() if isinstance(train_acc, torch.Tensor) else train_acc)
        results["test_loss"].append(test_loss.item() if isinstance(test_loss, torch.Tensor) else test_loss)
        results["test_acc"].append(test_acc.item() if isinstance(test_acc, torch.Tensor) else test_acc)

    return results

Building the Training and Test Loop

In [32]:

torch.manual_seed(42) 
torch.cuda.manual_seed(42)

NUM_EPOCHS = 5

model_0 = TinyVGG(input_shape=3, # 3, RGB 
                  hidden_units=10, 
                  output_shape=len(train_data.classes)).to(device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_0.parameters(), lr=0.001)

from timeit import default_timer as timer 
start_time = timer()

model_0_results = train(model=model_0, 
                        train_dataloader=train_dataloader_simple,
                        test_dataloader=test_dataloader_simple,
                        optimizer=optimizer,
                        loss_fn=loss_fn, 
                        epochs=NUM_EPOCHS)

end_time = timer()
print(f"Time taken: {end_time-start_time:.3f} seconds")

Epoch: 1 | train_loss: 1.1078 | train_acc: 0.2578 | test_loss: 1.1362 | test_acc: 0.2604
Epoch: 2 | train_loss: 1.0846 | train_acc: 0.4258 | test_loss: 1.1622 | test_acc: 0.1979
Epoch: 3 | train_loss: 1.1153 | train_acc: 0.2930 | test_loss: 1.1695 | test_acc: 0.1979
Epoch: 4 | train_loss: 1.0990 | train_acc: 0.2891 | test_loss: 1.1343 | test_acc: 0.1979
Epoch: 5 | train_loss: 1.0989 | train_acc: 0.2930 | test_loss: 1.1435 | test_acc: 0.1979
Time taken: 46.904 seconds

The results are poor. Let's visualize the loss with a helper function:

In [33]:

def plot_loss_curves(results: Dict[str, List[float]]):
    """Plots training curves of a results dictionary.

    Args:
        results (dict): dictionary containing list of values, e.g.
            {"train_loss": [...],
             "train_acc": [...],
             "test_loss": [...],
             "test_acc": [...]}
    """
    
    # Get the loss values of the results dictionary (training and test)
    loss = results['train_loss']
    test_loss = results['test_loss']

    # Get the accuracy values of the results dictionary (training and test)
    accuracy = results['train_acc']
    test_accuracy = results['test_acc']

    # Figure out how many epochs there were
    epochs = range(len(results['train_loss']))

    # Setup a plot 
    plt.figure(figsize=(10, 3))

    # Plot loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, loss, label='train_loss')
    plt.plot(epochs, test_loss, label='test_loss')
    plt.title('Loss')
    plt.xlabel('Epochs')
    plt.legend()

    # Plot accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, accuracy, label='train_accuracy')
    plt.plot(epochs, test_accuracy, label='test_accuracy')
    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.legend();

Call it:

In [34]:

plot_loss_curves(model_0_results)

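Exploring the Loss Curves

Looking at the training and test loss curves is a good way to see whether a model is overfitting.

An overfitting model performs better on the training set than on the validation/test set, with training loss far below test loss.
When both the training and test loss are not as low as desired, the model is considered to be underfitting.

Ideally, the training and test loss curves track each other closely.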

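Dealing with Overfitting

Since the main problem with overfitting is that the model fits the training data too well, the techniques used to prevent it are commonly called regularization.

Ways to prevent overfitting:

Get more data: having more data gives the model more opportunities to learn patterns, and those patterns may generalize better to new examples.
Simplify the model: if the current model is already overfitting the training data, it may be too complex. It learns the patterns of the training data too well and fails to generalize to unseen data. One way to simplify a model is to reduce the number of layers it uses or the number of hidden units in each layer.
Use data augmentation: this artificially adds diversity to the data. If a model can learn the patterns in augmented data, it may generalize better to unseen data.
Use transfer learning: transfer learning takes the patterns one model has already learned (also called pretrained weights) as the foundation for your own task. In this example, we could take a computer vision model pretrained on a wide variety of images and fine-tune it slightly to specialize in food images.
Use dropout layers: during training, dropout layers randomly remove connections between hidden layers of the network, which effectively simplifies the model and makes the remaining connections more robust.
Use learning rate decay: slowly lower the learning rate as the model trains. The closer the model gets to convergence, the smaller the weight updates should be.
Use early stopping: early stopping halts training before the model starts to overfit. For example, if the model's loss has stopped decreasing for the past 10 epochs (an arbitrary number), you may want to stop training there and go back to the model weights with the lowest loss (from 10 epochs earlier).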
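Early stopping, the last item above, is easy to sketch in plain Python. This is a minimal patience-based version, not a PyTorch API: the function name, the patience value, and the loss values are all made up for illustration.

```python
from typing import List, Tuple


def early_stopping_epoch(losses: List[float], patience: int = 3) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive epochs; if that never happens, stop_epoch
    is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(losses) - 1, best_epoch


# Made-up test losses for illustration only.
demo_losses = [1.10, 1.05, 1.01, 1.03, 1.04, 1.06, 1.02]
stop_epoch, best_epoch = early_stopping_epoch(demo_losses, patience=3)
print(stop_epoch, best_epoch)
```

In a real training loop the same bookkeeping would sit inside the epoch loop, and the model weights would be saved whenever a new best loss is seen so they can be restored after stopping.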

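Dealing with Underfitting

When a model underfits, it is considered to have poor predictive power on both the training and test sets. In essence, an underfitting model fails to reduce the loss to the desired level.

Judging by the current loss curves, the TinyVGG model model_0 appears to be underfitting the data. The main idea behind dealing with underfitting is to increase the model's predictive power.

Ways to deal with underfitting:

Add more layers or hidden units: if the model is underfitting, it may not have enough capacity to learn the patterns/weights/representations needed to make predictions. One way to add predictive power is to increase the number of hidden layers or the number of units within those layers.
Tweak the learning rate: perhaps the learning rate is too high, so the model changes its weights too much at each update step and ends up learning nothing. In that case, lower the learning rate.
Use transfer learning: transfer learning can help with both overfitting and underfitting. It involves taking the patterns from a previously working model and adapting them to the problem at hand.
Train for longer: the model may simply need more time to learn representations of the data. If small experiments show that the model is not learning anything, letting it train for more epochs may lead to better performance.
Use less regularization: perhaps the model is underfitting because of attempts to prevent overfitting.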
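The effect of a learning rate that is too high can be seen on a toy problem. The sketch below runs plain gradient descent on f(x) = x**2, which is an assumption chosen for simplicity and has nothing to do with TinyVGG itself; the function name and the two learning rates are illustrative.

```python
def gradient_descent(lr: float, steps: int = 20, start: float = 1.0) -> float:
    """Minimise f(x) = x**2 (gradient 2x) and return the final x."""
    x = start
    for _ in range(steps):
        x = x - lr * 2 * x
    return x


x_small_lr = gradient_descent(lr=0.1)  # each step multiplies x by 0.8
x_large_lr = gradient_descent(lr=1.1)  # each step multiplies x by -1.2
print(abs(x_small_lr) < 1.0, abs(x_large_lr) > 1.0)
```

With the smaller rate x shrinks towards the minimum at 0, while with the larger rate every update overshoots and x moves further away, which is the "updates too much and learns nothing" behaviour described above.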

Model 1: TinyVGG with Data Augmentation

Modify the data transform:

In [35]:

train_transform_trivial_augment = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.TrivialAugmentWide(num_magnitude_bins=31),
    transforms.ToTensor() 
])

test_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])

Build the datasets again:

In [36]:

train_data_augmented = datasets.ImageFolder(train_dir, transform=train_transform_trivial_augment)
test_data_simple = datasets.ImageFolder(test_dir, transform=test_transform)

train_data_augmented, test_data_simple

Out[36]:

(Dataset ImageFolder
     Number of datapoints: 225
     Root location: data\pizza_steak_sushi\train
     StandardTransform
 Transform: Compose(
                Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
                TrivialAugmentWide(num_magnitude_bins=31, interpolation=InterpolationMode.NEAREST, fill=None)
                ToTensor()
            ),
 Dataset ImageFolder
     Number of datapoints: 75
     Root location: data\pizza_steak_sushi\test
     StandardTransform
 Transform: Compose(
                Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
                ToTensor()
            ))

Turn them into DataLoaders:

In [37]:

BATCH_SIZE = 32
NUM_WORKERS = 3

torch.manual_seed(42)
train_dataloader_augmented = DataLoader(train_data_augmented, 
                                        batch_size=BATCH_SIZE, 
                                        shuffle=True,
                                        num_workers=NUM_WORKERS)

test_dataloader_simple = DataLoader(test_data_simple, 
                                    batch_size=BATCH_SIZE, 
                                    shuffle=False, 
                                    num_workers=NUM_WORKERS)

Instantiate the model again:

In [38]:

torch.manual_seed(42)
model_1 = TinyVGG(
    input_shape=3,
    hidden_units=10,
    output_shape=len(train_data_augmented.classes)).to(device)
model_1

Out[38]:

TinyVGG(
  (conv_block_1): Sequential(
    (0): Conv2d(3, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv_block_2): Sequential(
    (0): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=2560, out_features=3, bias=True)
  )
)

Start training:

In [39]:

torch.manual_seed(42) 
torch.cuda.manual_seed(42)

NUM_EPOCHS = 5

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_1.parameters(), lr=0.001)

start_time = timer()

model_1_results = train(model=model_1, 
                        train_dataloader=train_dataloader_augmented,
                        test_dataloader=test_dataloader_simple,
                        optimizer=optimizer,
                        loss_fn=loss_fn, 
                        epochs=NUM_EPOCHS)

end_time = timer()
print(f"Time taken: {end_time-start_time:.3f} seconds")

Epoch: 1 | train_loss: 1.1073 | train_acc: 0.2500 | test_loss: 1.1060 | test_acc: 0.2604
Epoch: 2 | train_loss: 1.0793 | train_acc: 0.4258 | test_loss: 1.1380 | test_acc: 0.2604
Epoch: 3 | train_loss: 1.0805 | train_acc: 0.4258 | test_loss: 1.1684 | test_acc: 0.2604
Epoch: 4 | train_loss: 1.1287 | train_acc: 0.3047 | test_loss: 1.1618 | test_acc: 0.2604
Epoch: 5 | train_loss: 1.0895 | train_acc: 0.4258 | test_loss: 1.1470 | test_acc: 0.2604
Time taken: 47.582 seconds

The results are not good here either. Plot the loss curves:

In [40]:

plot_loss_curves(model_1_results)

Comparing and Evaluating the Models

Use pandas and plot the results:

In [41]:

import pandas as pd
model_0_df = pd.DataFrame(model_0_results)
model_1_df = pd.DataFrame(model_1_results)

# Setup a plot 
plt.figure(figsize=(15, 8))

# Get number of epochs
epochs = range(len(model_0_df))

# Plot train loss
plt.subplot(2, 2, 1)
plt.plot(epochs, model_0_df["train_loss"], label="Model 0")
plt.plot(epochs, model_1_df["train_loss"], label="Model 1")
plt.title("Train Loss")
plt.xlabel("Epochs")
plt.legend()

# Plot test loss
plt.subplot(2, 2, 2)
plt.plot(epochs, model_0_df["test_loss"], label="Model 0")
plt.plot(epochs, model_1_df["test_loss"], label="Model 1")
plt.title("Test Loss")
plt.xlabel("Epochs")
plt.legend()

# Plot train accuracy
plt.subplot(2, 2, 3)
plt.plot(epochs, model_0_df["train_acc"], label="Model 0")
plt.plot(epochs, model_1_df["train_acc"], label="Model 1")
plt.title("Train Accuracy")
plt.xlabel("Epochs")
plt.legend()

# Plot test accuracy
plt.subplot(2, 2, 4)
plt.plot(epochs, model_0_df["test_acc"], label="Model 0")
plt.plot(epochs, model_1_df["test_acc"], label="Model 1")
plt.title("Test Accuracy")
plt.xlabel("Epochs")
plt.legend();

Finally, wrap up a function that takes an external image path and makes a prediction on it:

In [42]:

import torchvision

def pred_and_plot_image(model: torch.nn.Module, 
                        image_path: str, 
                        class_names: List[str] = None, 
                        transform=None,
                        device: torch.device = device):
    """Makes a prediction on a target image and plots the image with its prediction."""
    
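    # 1. Load the image and convert the tensor values to float32
    target_image = torchvision.io.read_image(str(image_path)).type(torch.float32)
    
    # 2. Divide the pixel values by 255 to get values in [0, 1]
    target_image = target_image / 255. 
    
    # 3. Apply the transform, if any
    if transform:
        target_image = transform(target_image)
    
    # 4. Make sure the model is on the target device
    model.to(device)
    
    # 5. Turn on model evaluation mode and inference mode
    model.eval()
    with torch.inference_mode():
        # Add an extra (batch) dimension to the image
        target_image = target_image.unsqueeze(dim=0)
    
        # Send the batched image to the target device and make a prediction
        target_image_pred = model(target_image.to(device))
        
    # 6. Convert logits -> prediction probabilities
    target_image_pred_probs = torch.softmax(target_image_pred, dim=1)

    # 7. Convert prediction probabilities -> prediction label
    target_image_pred_label = torch.argmax(target_image_pred_probs, dim=1)
    
    # 8. Plot the image alongside the prediction and prediction probability
    plt.imshow(target_image.squeeze().permute(1, 2, 0)) # rearrange to [height, width, color_channels] for matplotlib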
    if class_names:
        title = f"Pred: {class_names[target_image_pred_label.cpu()]} | Prob: {target_image_pred_probs.max().cpu():.3f}"
    else: 
        title = f"Pred: {target_image_pred_label} | Prob: {target_image_pred_probs.max().cpu():.3f}"
    plt.title(title)
    plt.axis(False);

In [43]:

test_img_path = "data/pizza_steak_sushi/test/pizza/1687143.jpg"
custom_image_transform = transforms.Compose([
    transforms.Resize((64, 64))
])

pred_and_plot_image(model=model_1,
                    image_path=test_img_path,
                    class_names=class_names,
                    transform=custom_image_transform,
                    device=device)
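Steps 6 and 7 of the function (logits -> probabilities -> label) can be illustrated without PyTorch. The sketch below implements softmax in plain Python on made-up logits; the logits and helper names are assumptions and were not produced by model_1.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


demo_class_names = ["pizza", "steak", "sushi"]
demo_logits = [0.2, -1.3, 1.1]  # made-up logits, not produced by model_1

probs = softmax(demo_logits)
pred_from_probs = max(range(len(probs)), key=probs.__getitem__)
pred_from_logits = max(range(len(demo_logits)), key=demo_logits.__getitem__)

print(demo_class_names[pred_from_probs], round(sum(probs), 6))
```

Because softmax preserves the ordering of the scores, taking the argmax of the logits gives the same label as taking the argmax of the probabilities; the softmax step is only needed when the probability itself is displayed.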


Style Transfer By Fingsinz