How To Convert Tensorflow Model to PyTorch

The two major deep learning frameworks, Tensorflow and PyTorch, store the parameters of the same deep learning model in slightly different layouts. As a result, even when both implement the same model, parameters trained in one framework cannot be loaded into the other as they are. This article describes, layer type by layer type, how to convert the parameters of a Tensorflow implementation into the parameters of the PyTorch implementation.

Tensorflow Dense, PyTorch Linear

First, the most basic layer: the Fully Connected Layer. Tensorflow implements it as a Dense layer and PyTorch as a Linear layer. The basic operation of a Fully Connected Layer is as follows.

\begin{equation} \vb{X}_o = \vb{X}_i \vb{A} + \vb{B} \end{equation}

Here, $\vb{X}_i, \vb{X}_o$ are row vectors of dimensions $N_i, N_o$, respectively, $\vb{A}$ is a weight matrix of dimension $N_i \times N_o$, and $\vb{B}$ is a bias row vector of dimension $N_o$.

However, the two frameworks do not store $\vb{A}$ in the same way. Tensorflow's Dense layer stores $\vb{A}$ itself, with shape $(N_i, N_o)$, while PyTorch's Linear layer stores its transpose $\vb{A}^T$, with shape $(N_o, N_i)$. Therefore, the operation in the Linear layer of PyTorch is as follows.

\begin{equation} \vb{X}_o = \vb{X}_i \qty(\vb{A}^T)^T + \vb{B} \end{equation}

Therefore, when converting from the Tensorflow model to the PyTorch model, the weight must be transposed before it is stored, while the bias can be copied as it is.

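import einops
import torch

# Keras Dense returns [kernel, bias]; the kernel has shape (N_i, N_o)
tf_weight, tf_bias = tf_fc.get_weights()

# nn.Linear stores the weight as (N_o, N_i), so transpose the kernel
tf_weight_rearrange = einops.rearrange(tf_weight, 'a b -> b a')
torch_fc.weight.data.copy_(torch.Tensor(tf_weight_rearrange))

# The bias has shape (N_o,) in both frameworks
torch_fc.bias.data.copy_(torch.Tensor(tf_bias))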
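The transposition can be sanity-checked without Tensorflow installed. The following is a minimal sketch, assuming random NumPy arrays as hypothetical stand-ins for what `tf_fc.get_weights()` would return, and using a plain NumPy transpose in place of the einops call. It compares $\vb{X}_i \vb{A} + \vb{B}$ computed by hand with the output of `nn.Linear` after the copy.

```python
import numpy as np
import torch
from torch import nn

# Hypothetical stand-ins for tf_fc.get_weights():
# a (N_i, N_o) kernel and a (N_o,) bias, as a Dense layer would store them.
rng = np.random.default_rng(0)
n_in, n_out = 4, 3
tf_weight = rng.standard_normal((n_in, n_out)).astype(np.float32)
tf_bias = rng.standard_normal(n_out).astype(np.float32)

torch_fc = nn.Linear(n_in, n_out)
with torch.no_grad():
    # nn.Linear.weight has shape (N_o, N_i), so the kernel is transposed.
    torch_fc.weight.copy_(torch.from_numpy(tf_weight.T.copy()))
    torch_fc.bias.copy_(torch.from_numpy(tf_bias))

x = rng.standard_normal((5, n_in)).astype(np.float32)
expected = x @ tf_weight + tf_bias  # X_i A + B, the Dense-style computation
with torch.no_grad():
    actual = torch_fc(torch.from_numpy(x)).numpy()

dense_max_abs_diff = float(np.abs(expected - actual).max())
print("max abs difference:", dense_max_abs_diff)
```

The difference should be at the level of float32 rounding. With a real model, the same comparison can be made between the outputs of the two layers on an identical input.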

Batch Normalization Layer

Next, let’s look at the conversion of the Batch Normalization Layer. Batch Normalization performs the following normalization operation.

\begin{equation} \vb{X}_o = \frac{\vb{X}_i - \mathbb{E}[\vb{X}]}{\sqrt{\mathbb{V}[\vb{X}] + \varepsilon}} \times \gamma + \beta \end{equation}

Here, $\gamma, \beta$ are trainable parameters, and $\mathbb{E}[\vb{X}], \mathbb{V}[\vb{X}]$ are the moving mean and variance accumulated during training. All four are stored in the layer, so they must be converted and passed to PyTorch. In addition, the default value of $\varepsilon$ is $10^{-3}$ in Tensorflow, while it is $10^{-5}$ in PyTorch, so it must also be changed.

# This order assumes the layer was built with center=True and scale=True (the defaults)
gamma, beta, moving_mean, moving_var = tf_batch.get_weights()

torch_batch.weight.data.copy_(torch.Tensor(gamma))
torch_batch.bias.data.copy_(torch.Tensor(beta))

torch_batch.running_mean.data.copy_(torch.Tensor(moving_mean))
torch_batch.running_var.data.copy_(torch.Tensor(moving_var))

torch_batch.eps = tf_batch.epsilon
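As a quick check of the mapping, the sketch below copies four arrays into an `nn.BatchNorm1d` layer and compares its inference-mode output with the normalization formula evaluated in NumPy. The arrays are random, hypothetical stand-ins for `tf_batch.get_weights()`, and `eps` is set to the Tensorflow default of $10^{-3}$. The layer must be in `eval()` mode, because in training mode PyTorch normalizes with the statistics of the current batch instead of the stored ones.

```python
import numpy as np
import torch
from torch import nn

rng = np.random.default_rng(0)
channels, eps = 3, 1e-3  # 1e-3 is the Tensorflow (Keras) default epsilon

# Hypothetical stand-ins for tf_batch.get_weights()
gamma = rng.standard_normal(channels).astype(np.float32)
beta = rng.standard_normal(channels).astype(np.float32)
moving_mean = rng.standard_normal(channels).astype(np.float32)
moving_var = rng.uniform(0.5, 2.0, channels).astype(np.float32)

torch_batch = nn.BatchNorm1d(channels, eps=eps)
with torch.no_grad():
    torch_batch.weight.copy_(torch.from_numpy(gamma))
    torch_batch.bias.copy_(torch.from_numpy(beta))
    torch_batch.running_mean.copy_(torch.from_numpy(moving_mean))
    torch_batch.running_var.copy_(torch.from_numpy(moving_var))
torch_batch.eval()  # use the running statistics, not batch statistics

x = rng.standard_normal((8, channels)).astype(np.float32)
# Reference: (x - E[X]) / sqrt(V[X] + eps) * gamma + beta
expected = (x - moving_mean) / np.sqrt(moving_var + eps) * gamma + beta
with torch.no_grad():
    actual = torch_batch(torch.from_numpy(x)).numpy()

bn_max_abs_diff = float(np.abs(expected - actual).max())
print("max abs difference:", bn_max_abs_diff)
```

If `eps` is left at the PyTorch default, the two results drift apart slightly, most visibly for channels whose variance is small.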

2D Convolution Layer

Finally, let’s look at the conversion of the 2D Convolution Layer. The tensor layouts differ significantly between Tensorflow and PyTorch, so both the weight and the input and output must be handled accordingly. The weight of the Convolution Layer is determined by the input and output channel dimensions $C_{in}, C_{out}$ and the 2D kernel size $H, W$. Tensorflow stores the weight in the shape of $(H, W, C_{in}, C_{out})$, while PyTorch stores it in the shape of $(C_{out}, C_{in}, H, W)$. The code to convert it is as follows.

tf_weight, tf_bias = tf_conv.get_weights()

tf_weight_rearrange = einops.rearrange(tf_weight, 'h w i o -> o i h w')
torch_conv.weight.data.copy_(torch.Tensor(tf_weight_rearrange))
torch_conv.bias.data.copy_(torch.Tensor(tf_bias))
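The permutation can be verified against a reference convolution written directly in the Tensorflow layout. This is a minimal sketch under stated assumptions: stride 1, no padding, random arrays as hypothetical stand-ins for `tf_conv.get_weights()`, and a helper `conv2d_nhwc` that is written here for illustration and is not part of either framework. The NumPy `transpose(3, 2, 0, 1)` performs the same permutation as the einops pattern `'h w i o -> o i h w'`.

```python
import numpy as np
import torch
from torch import nn

def conv2d_nhwc(x, kernel, bias):
    """Naive stride-1, unpadded cross-correlation: NHWC input, HWIO kernel."""
    n, h, w, _ = x.shape
    kh, kw, _, c_out = kernel.shape
    out = np.zeros((n, h - kh + 1, w - kw + 1, c_out), dtype=x.dtype)
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw, :]  # (n, kh, kw, c_in)
            out[:, i, j, :] = np.tensordot(patch, kernel, axes=([1, 2, 3], [0, 1, 2]))
    return out + bias

rng = np.random.default_rng(0)
# Hypothetical stand-ins for tf_conv.get_weights(): (H, W, C_in, C_out) and (C_out,)
tf_weight = rng.standard_normal((3, 3, 2, 4)).astype(np.float32)
tf_bias = rng.standard_normal(4).astype(np.float32)

torch_conv = nn.Conv2d(2, 4, kernel_size=3)
with torch.no_grad():
    # (H, W, C_in, C_out) -> (C_out, C_in, H, W)
    torch_conv.weight.copy_(torch.from_numpy(tf_weight.transpose(3, 2, 0, 1).copy()))
    torch_conv.bias.copy_(torch.from_numpy(tf_bias))

x_nhwc = rng.standard_normal((2, 5, 6, 2)).astype(np.float32)
expected = conv2d_nhwc(x_nhwc, tf_weight, tf_bias)  # (N, H', W', C_out)

x_nchw = torch.from_numpy(x_nhwc.transpose(0, 3, 1, 2).copy())
with torch.no_grad():
    # Run in NCHW, then permute the result back to NHWC for comparison
    actual = torch_conv(x_nchw).permute(0, 2, 3, 1).numpy()

conv_max_abs_diff = float(np.abs(expected - actual).max())
print("max abs difference:", conv_max_abs_diff)
```

Note that the input and the output also have to be permuted between $(N, H, W, C)$ and $(N, C, H, W)$ for the comparison to line up.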

Flatten after Convolution

In addition, the layout of the input and output features differs between the two frameworks. For batch size $N$, channel count $C$, and image size $H, W$, Tensorflow uses the shape $(N, H, W, C)$, while PyTorch uses $(N, C, H, W)$. After the Convolution layers, the feature map is usually flattened so that it can be fed to a Fully Connected layer. Because of the layout difference, flattening directly in PyTorch produces the feature vector in $(C, H, W)$ order instead of the $(H, W, C)$ order that the Tensorflow weights expect, so the indices of the feature vector no longer line up. There are two ways to solve this problem.

  1. Rearrange the feature vector order.
  2. Rearrange the weight of the Fully Connected Layer that follows.

I chose the second method because it keeps the model structure simple (the rearrangement only needs to be done once, at the conversion stage). Both methods are shown below.

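# Method 1: restore the Tensorflow (h, w, c) order before flattening
import einops
from torch import nn

class Model(nn.Module):
    def __init__(self, c_in, c_out, kernel_size, f_in, f_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=kernel_size)
        self.flatten = nn.Flatten()
        # f_in must equal h * w * c_out of the convolution output
        self.fc = nn.Linear(f_in, f_out)

    def forward(self, x):
        x = self.conv(x)
        x = einops.rearrange(x, 'n c h w -> n h w c')
        x = self.flatten(x)
        x = self.fc(x)
        return x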

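# Method 2: rearrange the weight once, at conversion time
# For conversion of the first FC layer after convolution
# (h and w are the spatial size of the last convolution output)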
tf_weight, tf_bias = tf_fc.get_weights()
tf_weight_rearrange = einops.rearrange(tf_weight, '(h w c) o -> o (c h w)', h=.., w=..)
torch_fc.weight.data.copy_(torch.Tensor(tf_weight_rearrange))
torch_fc.bias.data.copy_(torch.Tensor(tf_bias))
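To see why the rearrangement is needed and that both methods agree, the following sketch works through the index bookkeeping with NumPy only. All shapes and arrays are hypothetical. The reshape and transpose chain mirrors the einops pattern `'(h w c) o -> o (c h w)'`, and the matrix products mirror what `nn.Flatten` followed by `nn.Linear` would compute.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, h, w, f_out = 2, 3, 4, 5, 6

# The same convolution feature map in both layouts
feat_nhwc = rng.standard_normal((n, h, w, c)).astype(np.float32)  # Tensorflow
feat_nchw = feat_nhwc.transpose(0, 3, 1, 2)                       # PyTorch

# Hypothetical Dense weights; rows follow the flattened (h, w, c) order
tf_weight = rng.standard_normal((h * w * c, f_out)).astype(np.float32)
tf_bias = rng.standard_normal(f_out).astype(np.float32)

# Reference: flatten in NHWC order, then apply the Dense layer
expected = feat_nhwc.reshape(n, -1) @ tf_weight + tf_bias

# Naive conversion: flatten NCHW directly and reuse the weight unchanged
naive = feat_nchw.reshape(n, -1) @ tf_weight + tf_bias

# Method 1: move the features back to (n, h, w, c) before flattening
method1 = feat_nchw.transpose(0, 2, 3, 1).reshape(n, -1) @ tf_weight + tf_bias

# Method 2: '(h w c) o -> o (c h w)', giving the (f_out, f_in) Linear weight
torch_weight = (
    tf_weight.reshape(h, w, c, f_out).transpose(3, 2, 0, 1).reshape(f_out, c * h * w)
)
method2 = feat_nchw.reshape(n, -1) @ torch_weight.T + tf_bias

naive_matches = bool(np.allclose(naive, expected, atol=1e-4))
method1_diff = float(np.abs(method1 - expected).max())
method2_diff = float(np.abs(method2 - expected).max())
print(naive_matches, method1_diff, method2_diff)
```

Method 1 pays for the permutation on every forward pass, while Method 2 moves it into the weight once, which is the trade-off described above.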

Summary

There are many more layer types than these, but the ones covered above are the layers I actually used in this work. There is also a standard model format called ONNX that acts as a bridge between frameworks: using the tf2onnx module and the onnx2torch module, a model can be converted through the ONNX format. Even so, I think the more laborious manual method described here is worthwhile for studying the internal structure of the layers.