Gatys 等人提出的风格迁移方法

2025-04-09

GATYS L, ECKER A, BETHGE M. A Neural Algorithm of Artistic Style[J/OL]. Journal of Vision, 2016: 326. http://dx.doi.org/10.1167/16.12.326. DOI:10.1167/16.12.326.

In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance-optimised artificial neural networks and biological vision, our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.

摘要：在美术中，尤其是绘画，人类已经掌握了通过构成图像内容和风格之间复杂的相互作用来创造独特视觉体验的技能。到目前为止，这一过程的算法基础尚不清楚，也没有具有类似能力的人工系统。然而，在视觉感知的其他关键领域，如物体和人脸识别，最近一类被称为深度神经网络的生物启发视觉模型证明了接近人类的表现。在这里，我们介绍了一个基于深度神经网络的人工系统，它可以创建高感知质量的艺术图像。该系统利用神经表征对任意图像的内容和风格进行分离和重组，为艺术图像的创作提供了神经算法。此外，鉴于性能优化的人工神经网络和生物视觉之间惊人的相似性，我们的工作为理解人类如何创造和感知艺术图像的算法提供了一条道路。

主要内容

内容与风格的分离

利用卷积神经网络（CNN）不同层次的特征表示：

内容特征：由 CNN 高层（如 conv4_2）捕获，保留图像的高层语义信息（如物体及其布局），但忽略细节像素。
风格特征：通过计算多层特征图的 Gram 矩阵（特征相关性）来捕捉纹理、颜色和局部结构，形成多尺度的风格表示。

图像生成方法

损失函数：联合优化内容损失（$\mathcal{L}_{content}$）和风格损失（$\mathcal{L}_{style}$）：

$$ \mathcal{L}_{total} = \alpha \mathcal{L}_{content} + \beta \mathcal{L}_{style} $$

内容损失：基于目标图像与生成图像在指定层的特征差异（均方误差）。
风格损失：基于 Gram 矩阵的差异，通过多层（如conv1_1至conv5_1）加权求和。

优化过程：从白噪声图像出发，通过梯度下降逐步调整，使生成图像同时匹配目标内容和风格。

网络架构与改进

使用 VGG-19 网络，移除全连接层，仅保留卷积和池化层。
将最大池化替换为平均池化，以改善梯度流动和生成效果。

关键创新点

Gram 矩阵表征风格：Gram 矩阵通过计算不同特征图之间的相关性，有效捕捉纹理的统计特性（如颜色分布、笔触方向），从而将风格抽象为多尺度的统计信息。
分层控制风格与内容：
- 高层内容层（如 conv4_2）保留全局结构，适合内容重建。
- 多层级风格层（低层到高层）分别捕捉不同尺度的局部纹理（低层）和整体色彩协调（高层）。
- 通过调整使用的层数（如仅用低层生成局部纹理）和损失权重（$\alpha/\beta$），可灵活控制生成效果。

实验结果

成功将名画风格（如梵高《星空》、蒙克《呐喊》）应用到同一张照片（图宾根内卡河畔），生成图像既保留原图内容，又复现艺术风格。

参数影响分析：

层数选择：使用更高层风格特征（如包含 conv5_1）会生成更平滑、连贯的视觉效果。
权重调整：增大 $\alpha/\beta$（侧重内容）保留更多原图结构；减小 $\alpha/\beta$（侧重风格）则强化纹理，弱化内容。

实现

TensorFlow-Github：https://github.com/lengstrom/fast-style-transfer

风格迁移评价

特征变换 - WCT

Style Transfer By Fingsinz