PyTorch is an open-source machine-learning/deep-learning library for Python, first released in 2016.
Its main strengths are that it works alongside the basic data-handling tools such as NumPy and pandas, and that models are quick and easy to implement.
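As a minimal sketch of that NumPy interoperability (variable names here are illustrative): torch.from_numpy() and Tensor.numpy() share the same underlying memory, so conversion is essentially free.

import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(a)   # NumPy -> Tensor; shares memory with a
b = t.numpy()             # Tensor -> NumPy
t += 1                    # an in-place change on the tensor ...
print(a)                  # ... is visible on the NumPy side: [2. 3. 4.]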
The following assumes an environment where Anaconda is already installed.
Selecting your setup (OS, package manager, CUDA version) at https://pytorch.org produces an install command; run it, press 'y' once when prompted, and the installation is complete.
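For example, at the time of writing, choosing conda with a CUDA 11.3 build produced a command like the following (the exact packages and versions depend on the options selected):

% conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch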
On M1 Macs, support was not yet complete at the time of writing, so PyTorch has to be installed inside a conda virtual environment.
# Create a virtual environment
# Environment name: HDMT-torch (any name works)
% conda create --name HDMT-torch python=3.8
# Install pytorch
% conda activate HDMT-torch
% conda install -c conda-forge pytorch=1.9.0
# Verify the installation
% python
>>> import torch
>>> x = torch.rand(5,3)
>>> print(x)
Connecting to Jupyter Notebook
# Install jupyter notebook inside the virtual environment
% conda activate HDMT-torch
% conda install jupyter notebook
# Register the environment's Python as a kernel in the local Jupyter installation
% python -m ipykernel install --user --name HDMT-torch --display-name "PyTorch_kernel"
# To remove the kernel later (kernelspecs are stored under the --name value, not the display name)
% jupyter kernelspec uninstall HDMT-torch
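To confirm the kernel was registered (or removed), list the installed kernelspecs:

% jupyter kernelspec list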
PyTorch stores and manipulates data in a structure called the Tensor.
import torch
t = torch.FloatTensor([0., 1., 2., 3., 4., 5., 6.])
print(t)
tensor([0., 1., 2., 3., 4., 5., 6.])
Operations on tensors work almost exactly as they do in numpy.
t = torch.arange(6)
t
tensor([0, 1, 2, 3, 4, 5])
t * 3
tensor([ 0, 3, 6, 9, 12, 15])
print(t.dim())
print(t.size())
print(t.shape)
1
torch.Size([6])
torch.Size([6])
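Reshaping and broadcasting also mirror NumPy. A small sketch (reusing the t defined above):

m = t.view(2, 3)                        # reshape to 2 x 3; shares memory with t
print(m + torch.tensor([10, 20, 30]))   # broadcasting adds the row vector to each row
# tensor([[10, 21, 32],
#         [13, 24, 35]])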
A tensor is moved to the GPU with .cuda() and back to the CPU with .cpu():
x = torch.FloatTensor([[1,2,3],[4,5,6]])
x_gpu = x.cuda()
x_gpu
tensor([[1., 2., 3.],
        [4., 5., 6.]], device='cuda:0')
x_cpu = x_gpu.cpu()
x_cpu
tensor([[1., 2., 3.],
        [4., 5., 6.]])
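In practice a device-agnostic pattern is more common than calling .cuda() directly, since the same code then runs on machines without a GPU; .to(device) is a no-op when the tensor already lives on that device (this is the pattern used in the ResNet training code below):

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.FloatTensor([[1, 2, 3], [4, 5, 6]]).to(device)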
For $y = w^2$ and $z = 2y + 5$, differentiating with respect to $w$ gives

$$ \begin{aligned} \frac{dy}{dw} &= 2w \\ \frac{dz}{dw} &= \frac{dz}{dy} \times \frac{dy}{dw} = 2 \times 2w = 4w \end{aligned} $$

Autograd computes this automatically:
import torch
w = torch.tensor(3.0, requires_grad=True)
y = w**2
z = 2*y + 5
z.backward()
print('Derivative with respect to w: {}'.format(w.grad))
Derivative with respect to w: 12.0
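One caveat worth knowing: gradients accumulate across backward() calls, so w.grad should be reset before reusing the same leaf tensor. A small sketch:

w = torch.tensor(3.0, requires_grad=True)
z = 2 * w**2 + 5
z.backward()
print(w.grad)   # tensor(12.)
w.grad.zero_()  # without this, the next backward() would add another 12
z = 2 * w**2 + 5
z.backward()
print(w.grad)   # tensor(12.) again, not 24.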
ResNet is a CNN architecture built by stacking residual blocks.
Deep Residual Learning for Image Recognition (Kaiming He et al., 2015) defines five networks by total layer count: ResNet-18, 34, 50, 101, and 152. The implementation below follows the ResNet-34 structure.
[Functions and structures to implement]
Data loading and normalization
Residual block & ResNet architecture
Loss function and metric computation
Final training function
To make training feasible on our own hardware, we use the STL10 dataset, which is much smaller than the data used in the paper.
It is one of the datasets provided by the torchvision package and has 10 labeled classes.
### import packages
# For model
import torch
import torch.nn as nn # basic building blocks: convolution layers, activation functions, losses
import torch.nn.functional as F
from torchsummary import summary # used to print the network structure
from torch import optim # optimization functions
from torch.optim.lr_scheduler import StepLR # learning rate scheduling
# For dataset and transformation
from torchvision import datasets # computer-vision datasets and models used with torch
import torchvision.transforms as transforms # data preprocessing
from torch.utils.data import DataLoader # wraps a dataset in the form PyTorch models consume
import os
# For displaying images
from torchvision import utils
import matplotlib.pyplot as plt
%matplotlib inline
# utils
import numpy as np
import time
import copy
# Set the data path
path2data = '/Users/user5/opt/Jupyter/ResNet_Torch'
# Load the data
train_ds = datasets.STL10(path2data, split='train', download=True, transform=transforms.ToTensor())
val_ds = datasets.STL10(path2data, split='test', download=True, transform=transforms.ToTensor())
Downloading http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz to /Users/user5/opt/Jupyter/ResNet_Torch/stl10_binary.tar.gz
Extracting /Users/user5/opt/Jupyter/ResNet_Torch/stl10_binary.tar.gz to /Users/user5/opt/Jupyter/ResNet_Torch
Files already downloaded and verified
# Per-image mean and standard deviation of the R, G, B channels over the train data.
# Each image yields three values, one per channel.
train_meanRGB = [np.mean(x.numpy(), axis=(1,2)) for x, _ in train_ds]
train_stdRGB = [np.std(x.numpy(), axis=(1,2)) for x, _ in train_ds]
# Average the per-image statistics over the whole dataset for each of R, G, B.
train_meanR = np.mean([m[0] for m in train_meanRGB])
train_meanG = np.mean([m[1] for m in train_meanRGB])
train_meanB = np.mean([m[2] for m in train_meanRGB])
train_stdR = np.mean([s[0] for s in train_stdRGB])
train_stdG = np.mean([s[1] for s in train_stdRGB])
train_stdB = np.mean([s[2] for s in train_stdRGB])
# For validation set
val_meanRGB = [np.mean(x.numpy(), axis=(1,2)) for x, _ in val_ds]
val_stdRGB = [np.std(x.numpy(), axis=(1,2)) for x, _ in val_ds]
val_meanR = np.mean([m[0] for m in val_meanRGB])
val_meanG = np.mean([m[1] for m in val_meanRGB])
val_meanB = np.mean([m[2] for m in val_meanRGB])
val_stdR = np.mean([s[0] for s in val_stdRGB])
val_stdG = np.mean([s[1] for s in val_stdRGB])
val_stdB = np.mean([s[2] for s in val_stdRGB])
print(train_meanR, train_meanG, train_meanB)
print(val_meanR, val_meanG, val_meanB)
0.4467106 0.43980986 0.40664646
0.44723064 0.4396425 0.40495726
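The Python loop above is easy to read but slow. A vectorized sketch (assuming the whole train set fits in memory; STL10 images are 3 x 96 x 96) computes comparable statistics in one shot. Note it takes the std over all pixels jointly, so the value differs slightly from the mean of per-image stds used above:

imgs = torch.stack([x for x, _ in train_ds])  # shape: (5000, 3, 96, 96)
print(imgs.mean(dim=(0, 2, 3)))               # per-channel mean
print(imgs.std(dim=(0, 2, 3)))                # per-channel std over all pixels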
# Define the transformations
train_transformation = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize(224),
    transforms.Normalize([train_meanR, train_meanG, train_meanB], [train_stdR, train_stdG, train_stdB]),
])
# The validation transform also uses the train-set statistics, which is standard practice.
val_transformation = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize(224),
    transforms.Normalize([train_meanR, train_meanG, train_meanB], [train_stdR, train_stdG, train_stdB]),
])
# Apply the transformations
train_ds.transform = train_transformation
val_ds.transform = val_transformation
# Build DataLoaders from the transformed datasets
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=32, shuffle=True)
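A quick sanity check that the loader yields what the model will expect:

xb, yb = next(iter(train_dl))
print(xb.shape, yb.shape)  # expected: torch.Size([32, 3, 224, 224]) torch.Size([32])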
# display sample images
def show(img, y=None, color=True):
    npimg = img.numpy()
    npimg_tr = np.transpose(npimg, (1,2,0)) # (C, H, W) -> (H, W, C) for matplotlib
    plt.imshow(npimg_tr)
    if y is not None:
        plt.title('labels :' + str(y))
# np.random.seed(1)
# torch.manual_seed(1)
grid_size = 4
rnd_inds = np.random.randint(0, len(train_ds), grid_size)
print('image indices:',rnd_inds)
x_grid = [train_ds[i][0] for i in rnd_inds]
y_grid = [train_ds[i][1] for i in rnd_inds]
x_grid = utils.make_grid(x_grid, nrow=grid_size, padding=2)
show(x_grid, y_grid)
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
image indices: [3112 1193 2540 1692]
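The clipping warning appears because the grid images were normalized and no longer lie in [0, 1]. A hypothetical denormalize helper (not part of the original code) restores them for display:

def denormalize(img, mean, std):
    # undo transforms.Normalize: x_norm * std + mean, channel by channel
    mean = torch.tensor(mean).view(3, 1, 1)
    std = torch.tensor(std).view(3, 1, 1)
    return img * std + mean

x_show = denormalize(x_grid, [train_meanR, train_meanG, train_meanB],
                     [train_stdR, train_stdG, train_stdB])
show(x_show, y_grid)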
class BasicBlock(nn.Module):
    # expansion adjusts the output dimension when the channel count grows
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Conv -> BN -> ReLU -> Conv -> BN
        # nn.Sequential stacks the layers in order
        self.residual_function = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels * BasicBlock.expansion, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels * BasicBlock.expansion),
        )
        # identity mapping (used when input and output have the same feature-map size and filter count)
        self.shortcut = nn.Sequential()
        self.relu = nn.ReLU()
        # projection mapping (used when input and output dimensions differ)
        if stride != 1 or in_channels != BasicBlock.expansion * out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * BasicBlock.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * BasicBlock.expansion)
            )

    def forward(self, x):
        x = self.residual_function(x) + self.shortcut(x)
        x = self.relu(x)
        return x
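As a quick check that the block behaves as intended (a sketch; the shapes are chosen to match the conv3_x stage below):

block = BasicBlock(64, 128, stride=2)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 128, 28, 28]) -- projection shortcut is used

block = BasicBlock(64, 64, stride=1)
print(block(x).shape)  # torch.Size([1, 64, 56, 56])  -- identity shortcut is used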
class ResNet(nn.Module):
    def __init__(self, block, num_block, num_classes=10, init_weights=True):
        # block : the residual block defined above
        # num_block : number of blocks in each stage
        super().__init__()
        # start with 64 filters
        self.in_channels = 64
        ## conv1, ..., conv5 are the five stages of ResNet-34
        # initial 7 x 7 conv
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        # 3 x 3 conv stages built from residual blocks
        self.conv2_x = self._make_layer(block, 64, num_block[0], 1)
        self.conv3_x = self._make_layer(block, 128, num_block[1], 2)
        self.conv4_x = self._make_layer(block, 256, num_block[2], 2)
        self.conv5_x = self._make_layer(block, 512, num_block[3], 2)
        # output layer
        self.avg_pool = nn.AdaptiveAvgPool2d((1,1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        # weight initialization
        if init_weights:
            self._initialize_weights()

    def _make_layer(self, block, out_channels, num_blocks, stride):
        # only the first block of a stage downsamples; the rest keep stride 1
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2_x(x)
        x = self.conv3_x(x)
        x = self.conv4_x(x)
        x = self.conv5_x(x)
        x = self.avg_pool(x)
        x = x.view(x.size(0), -1) # -1 : infer the remaining dimension automatically
        x = self.fc(x)
        return x

    # define weight initialization function
    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
def resnet18():
    return ResNet(BasicBlock, [2,2,2,2])

def resnet34():
    return ResNet(BasicBlock, [3, 4, 6, 3])
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = resnet34().to(device) # model on GPU
x = torch.randn(3, 3, 224, 224).to(device) # data on GPU
output = model(x)
print(output.size())
torch.Size([3, 10])
summary(model, (3, 224, 224), device=device.type)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
               ...                        ...                ...
AdaptiveAvgPool2d-123          [-1, 512, 1, 1]               0
        Linear-124                   [-1, 10]           5,130
================================================================
Total params: 21,289,802
Trainable params: 21,289,802
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 96.28
Params size (MB): 81.21
Estimated Total Size (MB): 178.07
----------------------------------------------------------------
Checking the parameter counts by hand confirms that the model was built with exactly the expected number of parameters.
1. In the first conv layer, 64 filters of size $7 \times 7 \times 3$ are convolved over the $224 \times 224 \times 3$ input, so it requires
$$(7 \times 7 \times 3) \times 64 = 9{,}408$$
parameters.
2. Batch normalization needs $\gamma, \beta$ for each of the 64 channels, so it requires
$$2 \times 64 = 128$$
parameters.
3. The ReLU and pooling layers require no parameters (0).
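These hand calculations can be verified directly on the model:

print(model.conv1[0].weight.numel())                        # 9408  = 7*7*3*64
print(sum(p.numel() for p in model.conv1[1].parameters()))  # 128   = 2*64 (gamma, beta)
print(sum(p.numel() for p in model.parameters()))           # 21289802 in total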
# Use cross-entropy as the loss function.
# reduction='sum' : return the summed loss over the batch (averaged per sample later)
loss_func = nn.CrossEntropyLoss(reduction='sum')
# Use Adam as the optimizer.
opt = optim.Adam(model.parameters(), lr=0.001)
# If the validation loss does not improve for 10 epochs, multiply the learning rate by 0.1.
from torch.optim.lr_scheduler import ReduceLROnPlateau
lr_scheduler = ReduceLROnPlateau(opt, mode='min', factor=0.1, patience=10)
# function to get the current lr
def get_lr(opt):
    for param_group in opt.param_groups:
        return param_group['lr']

def metric_batch(output, target):
    pred = output.argmax(1, keepdim=True) # index of the largest logit per sample
    corrects = pred.eq(target.view_as(pred)).sum().item() # number of correct predictions
    return corrects

def loss_batch(loss_func, output, target, opt=None):
    loss = loss_func(output, target)
    metric_b = metric_batch(output, target)
    if opt is not None:
        opt.zero_grad()
        loss.backward() # back-propagation runs automatically via autograd
        opt.step()
    return loss.item(), metric_b
# function to calculate loss and metric per epoch
def loss_epoch(model, loss_func, dataset_dl, sanity_check=False, opt=None):
    running_loss = 0.0
    running_metric = 0.0
    len_data = len(dataset_dl.dataset)
    for xb, yb in dataset_dl:
        xb = xb.to(device)
        yb = yb.to(device)
        output = model(xb)
        loss_b, metric_b = loss_batch(loss_func, output, yb, opt)
        running_loss += loss_b
        if metric_b is not None:
            running_metric += metric_b
        if sanity_check is True:
            break
    loss = running_loss / len_data # average loss per sample
    metric = running_metric / len_data # fraction of correct predictions
    return loss, metric
# function to run training
def train_val(model, params): # model : ResNet-34
    num_epochs = params['num_epochs']
    loss_func = params['loss_func']
    opt = params['optimizer']
    train_dl = params['train_dl']
    val_dl = params['val_dl']
    sanity_check = params['sanity_check']
    lr_scheduler = params['lr_scheduler']
    # path2weights = params['path2weights']
    loss_history = {'train': [], 'val': []}
    metric_history = {'train': [], 'val': []}
    best_loss = float('inf')
    start_time = time.time()
    for epoch in range(num_epochs):
        current_lr = get_lr(opt)
        print('Epoch {}/{}, current lr={}'.format(epoch, num_epochs-1, current_lr))
        model.train()
        train_loss, train_metric = loss_epoch(model, loss_func, train_dl, sanity_check, opt)
        loss_history['train'].append(train_loss)
        metric_history['train'].append(train_metric)
        model.eval()
        with torch.no_grad():
            val_loss, val_metric = loss_epoch(model, loss_func, val_dl, sanity_check)
        loss_history['val'].append(val_loss)
        metric_history['val'].append(val_metric)
        if val_loss < best_loss:
            best_loss = val_loss
            print('Get best val_loss')
        lr_scheduler.step(val_loss)
        print('train loss: %.6f, val loss: %.6f, accuracy: %.2f, time: %.4f min' %(train_loss, val_loss, 100*val_metric, (time.time()-start_time)/60))
        print('-'*10)
    return model, loss_history, metric_history
# define the training parameters
params_train = {
    'num_epochs': 60,
    'optimizer': opt,
    'loss_func': loss_func,
    'train_dl': train_dl,
    'val_dl': val_dl,
    'sanity_check': False,
    'lr_scheduler': lr_scheduler,
    'path2weights': './models/weights.pt',
}
model, loss_hist, metric_hist = train_val(model, params_train)
Epoch 0/59, current lr=0.001
Get best val_loss
train loss: 1.906893, val loss: 1.838223, accuracy: 27.80, time: 1.8935 min
----------
Epoch 1/59, current lr=0.001
train loss: 1.711235, val loss: 2.005734, accuracy: 25.50, time: 3.7734 min
----------
...
Epoch 14/59, current lr=0.001
Get best val_loss
train loss: 0.778776, val loss: 1.181465, accuracy: 59.17, time: 28.1277 min
----------
...
Epoch 23/59, current lr=0.0001
train loss: 0.089278, val loss: 1.309136, accuracy: 66.89, time: 44.9770 min
----------
...
Epoch 59/59, current lr=1.0000000000000002e-07
train loss: 0.006239, val loss: 1.481498, accuracy: 67.42, time: 112.3638 min
----------
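Validation accuracy plateaus around 67.5% after the first learning-rate drop. With training finished, the weights can be saved to the path defined in params_train; a minimal sketch (in the original, the corresponding save step inside train_val is commented out):

os.makedirs('./models', exist_ok=True)  # make sure the target directory exists
torch.save(model.state_dict(), params_train['path2weights'])  # './models/weights.pt'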
# Train-Validation Progress
num_epochs=params_train["num_epochs"]
# plot loss progress
plt.title("Train-Val Loss")
plt.plot(range(1,num_epochs+1),loss_hist["train"],label="train")
plt.plot(range(1,num_epochs+1),loss_hist["val"],label="val")
plt.ylabel("Loss")
plt.xlabel("Training Epochs")
plt.legend()
plt.show()
# plot accuracy progress
plt.title("Train-Val Accuracy")
plt.plot(range(1,num_epochs+1),metric_hist["train"],label="train")
plt.plot(range(1,num_epochs+1),metric_hist["val"],label="val")
plt.ylabel("Accuracy")
plt.xlabel("Training Epochs")
plt.legend()
plt.show()
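Finally, a small sketch of running the trained model for inference on one validation batch:

model.eval()
with torch.no_grad():
    xb, yb = next(iter(val_dl))
    preds = model(xb.to(device)).argmax(dim=1).cpu()  # predicted class per image
print('batch accuracy:', (preds == yb).float().mean().item())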