CNN Image Classification
Github: https://github.com/PSLeon24/LearnAI/blob/main/CNN_Image_Classification_with_PyTorch.ipynb
1. Data Preprocessing
import matplotlib.pyplot as plt
from torchvision.datasets.cifar import CIFAR10
# CIFAR-10 is a dataset with 10 classes -> e.g. airplane, truck, automobile, ...
from torchvision.transforms import ToTensor

# Load CIFAR-10 Dataset
training_data = CIFAR10(
    root='./',
    train=True,
    download=True,
    transform=ToTensor())

test_data = CIFAR10(
    root='./',
    train=False,
    download=True,
    transform=ToTensor())
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./cifar-10-python.tar.gz
100%|██████████| 170498071/170498071 [00:02<00:00, 79693735.88it/s]
Extracting ./cifar-10-python.tar.gz to ./
Files already downloaded and verified
for i in range(9):
    plt.subplot(3, 3, i+1)
    plt.imshow(training_data.data[i])
plt.show()
1_2. Data Augmentation
- Cropping: cutting out a portion of the image; used to remove unnecessary regions
- Horizontal flip: mirroring the image left-to-right
- After cropping and flipping, use padding so the image size does not change, filling the removed region with zeros
- Padding: filling a region of the image with 0 (or any other value); filling with 0 is called zero padding
Function signature | Description | Library |
---|---|---|
Compose([*tf]) | Takes preprocessing functions tf and runs them in order | torchvision.transforms |
RandomCrop(size) | Removes part of the image, then restores it to size via a random crop | torchvision.transforms |
RandomHorizontalFlip(p) | Flips the image horizontally with probability p | torchvision.transforms |
import matplotlib.pyplot as plt
import torchvision.transforms as T
from torchvision.datasets.cifar import CIFAR10
from torchvision.transforms import Compose
from torchvision.transforms import RandomHorizontalFlip, RandomCrop

transforms = Compose([               # data preprocessing pipeline
    T.ToPILImage(),
    RandomCrop((32, 32), padding=4), # randomly remove part of the image, then pad
    RandomHorizontalFlip(p=0.5)      # flip left-right about the y-axis
])
training_data = CIFAR10(
    root='./',
    train=True,
    download=True,
    transform=transforms)  # transform takes the function that converts the data

test_data = CIFAR10(
    root='./',
    train=False,
    download=True,
    transform=transforms)  # transform takes the function that converts the data
for i in range(9):
    plt.subplot(3, 3, i+1)
    plt.imshow(transforms(training_data.data[i]))
plt.show()
1_3. Image Normalization
- If values are skewed toward one of the R, G, B channels, training can suffer, so it helps to normalize the data so that each channel roughly follows a normal distribution
- Normalization: transforming the distribution of the data toward a normal distribution
- Normal distribution (Gaussian distribution): a distribution described by its mean and standard deviation; the normal distribution with mean 0 and standard deviation 1 is called the standard normal distribution
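Per channel, normalization computes x' = (x - mean) / std. A plain-torch illustration of that formula (the channel statistics used here are the CIFAR-10 values reported in this post):

```python
import torch

# per-channel CIFAR-10 mean and std, reshaped to broadcast over (3, H, W)
mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
std  = torch.tensor([0.2470, 0.2435, 0.2616]).view(3, 1, 1)

img = torch.rand(3, 32, 32)      # a fake image with values in [0, 1]
normalized = (img - mean) / std  # what transforms.Normalize(mean, std) computes

# undoing the formula recovers the original image
restored = normalized * std + mean
print(torch.allclose(restored, img))
```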
Computing the mean and standard deviation of the CIFAR-10 dataset
Function signature | Description | Library |
---|---|---|
stack(tensors, dim) | Stacks the tensors along dimension dim. For example, stacking three (224, 224) tensors along dim=0 yields a (3, 224, 224) tensor. | torch |
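A minimal illustration of the stack call from the table, using fake CIFAR-sized tensors:

```python
import torch

# five fake image tensors, each of shape (3, 32, 32)
imgs = [torch.rand(3, 32, 32) for _ in range(5)]

# stack along a new leading dimension -> (5, 3, 32, 32)
batch = torch.stack(imgs, dim=0)
print(batch.shape)  # torch.Size([5, 3, 32, 32])
```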
import torch

training_data = CIFAR10(
    root='./',
    train=True,
    download=True,
    transform=ToTensor())

imgs = [item[0] for item in training_data]
# stack imgs into a single (N, 3, 32, 32) tensor
imgs = torch.stack(imgs, dim=0).numpy()

# per-channel mean
mean_r = imgs[:, 0, :, :].mean()
mean_g = imgs[:, 1, :, :].mean()
mean_b = imgs[:, 2, :, :].mean()
print('mean')
print(f'R:{mean_r:.4f}, G:{mean_g:.4f}, B:{mean_b:.4f}')

# per-channel standard deviation
std_r = imgs[:, 0, :, :].std()
std_g = imgs[:, 1, :, :].std()
std_b = imgs[:, 2, :, :].std()
print('std')
print(f'R:{std_r:.4f}, G:{std_g:.4f}, B:{std_b:.4f}')
Files already downloaded and verified
mean
R:0.4914, G:0.4822, B:0.4465
std
R:0.2470, G:0.2435, B:0.2616
import matplotlib.pyplot as plt
import torchvision.transforms as T
from torchvision.datasets.cifar import CIFAR10
from torchvision.transforms import Compose
from torchvision.transforms import RandomHorizontalFlip, RandomCrop, Normalize

transforms = Compose([               # data preprocessing pipeline
    T.ToPILImage(),
    RandomCrop((32, 32), padding=4), # randomly remove part of the image, then pad
    RandomHorizontalFlip(p=0.5),     # flip left-right about the y-axis
    T.ToTensor(),
    Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.247, 0.243, 0.261)),
    T.ToPILImage()                   # back to a PIL image, only so it can be plotted below
])
training_data = CIFAR10(
    root='./',
    train=True,
    download=True,
    transform=transforms)  # transform takes the function that converts the data

test_data = CIFAR10(
    root='./',
    train=False,
    download=True,
    transform=transforms)  # transform takes the function that converts the data
for i in range(9):
    plt.subplot(3, 3, i+1)
    plt.imshow(transforms(training_data.data[i]))
plt.show()
2. Image Classification with a CNN
- 3x3 convolution
- ReLU
- 3x3 convolution
- ReLU
- Max pooling (an operation that halves the spatial size of the feature map)
2_1. Defining the Basic Block
# define the VGG-style basic block
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    # layers that make up the basic block
    def __init__(self, in_channels, out_channels, hidden_dim):
        super(BasicBlock, self).__init__()
        # Conv2d(in, out, kernel, stride): computes the convolution
        self.conv1 = nn.Conv2d(in_channels, hidden_dim,
                               kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(hidden_dim, out_channels,
                               kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        # stride: how far the kernel moves at each step
        # MaxPool2d(kernel, stride): applies max pooling
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):  # forward pass of the basic block
        x = self.conv1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.relu(x)
        x = self.pool(x)
        return x
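A quick shape check of one block's conv-relu-conv-relu-pool path (a standalone sketch using the same layer settings as block1, not the class itself): the padded 3x3 convolutions preserve the 32x32 resolution, and the pooling halves it.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                        # one fake CIFAR-10 image
conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # padding=1 keeps 32x32
conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)         # halves H and W

out = pool(torch.relu(conv2(torch.relu(conv1(x)))))
print(out.shape)  # torch.Size([1, 32, 16, 16])
```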
2_2. Defining the CNN Model
class CNN(nn.Module):
    def __init__(self, num_classes):
        super(CNN, self).__init__()
        # convolutional basic blocks
        self.block1 = BasicBlock(in_channels=3, out_channels=32, hidden_dim=16)
        self.block2 = BasicBlock(in_channels=32, out_channels=128, hidden_dim=64)
        self.block3 = BasicBlock(in_channels=128, out_channels=256, hidden_dim=128)
        # classifier
        self.fc1 = nn.Linear(in_features=4096, out_features=2048)
        self.fc2 = nn.Linear(in_features=2048, out_features=256)
        self.fc3 = nn.Linear(in_features=256, out_features=num_classes)
        # activation function of the classifier
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)  # x shape is (-1, 256, 4, 4)
        x = torch.flatten(x, start_dim=1)  # flatten to one dimension per sample
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x
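Why fc1 takes in_features=4096: each block halves the 32x32 input, so after block3 the feature map is (batch, 256, 4, 4), and 256 * 4 * 4 = 4096. A quick check of the flatten step:

```python
import torch

x = torch.randn(2, 256, 4, 4)         # shape after block3 for a batch of 2
flat = torch.flatten(x, start_dim=1)  # flatten everything except the batch dim
print(flat.shape)  # torch.Size([2, 4096])
```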
2_3. Training the Model
# define the data augmentation
from torch.utils.data.dataloader import DataLoader
from torch.optim.adam import Adam

transforms = Compose([
    RandomCrop((32, 32), padding=4),
    RandomHorizontalFlip(p=0.5),
    ToTensor(),
    Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.247, 0.243, 0.261))
])

# load the data and define the model
training_data = CIFAR10(root='./', train=True, download=True, transform=transforms)
test_data = CIFAR10(root='./', train=False, download=True, transform=transforms)

train_loader = DataLoader(training_data, batch_size=32, shuffle=True)
test_loader = DataLoader(test_data, batch_size=32, shuffle=False)

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

model = CNN(num_classes=10)
model.to(device)
CNN(
  (block1): BasicBlock(
    (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu): ReLU()
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (block2): BasicBlock(
    (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu): ReLU()
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (block3): BasicBlock(
    (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu): ReLU()
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc1): Linear(in_features=4096, out_features=2048, bias=True)
  (fc2): Linear(in_features=2048, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=10, bias=True)
  (relu): ReLU()
)
# learning rate
lr = 1e-3
optim = Adam(model.parameters(), lr=lr)

for epochs in range(100):
    for data, label in train_loader:
        optim.zero_grad()
        preds = model(data.to(device))
        loss = nn.CrossEntropyLoss()(preds, label.to(device))
        loss.backward()
        optim.step()
    # note: the printed loss is from the last batch of the epoch only
    print(f'epochs: {epochs + 1}, loss: {loss.item()}')

torch.save(model.state_dict(), "CIFAR.pth")
epochs: 1, loss: 1.2944953441619873
epochs: 2, loss: 1.256959080696106
epochs: 3, loss: 1.2103475332260132
epochs: 4, loss: 0.6277247071266174
epochs: 5, loss: 0.8739399909973145
epochs: 6, loss: 0.656192421913147
epochs: 7, loss: 0.5774560570716858
epochs: 8, loss: 0.11375464498996735
epochs: 9, loss: 0.7365378141403198
epochs: 10, loss: 0.3127354085445404
epochs: 11, loss: 0.8014481067657471
epochs: 12, loss: 0.7422015070915222
epochs: 13, loss: 0.5816378593444824
epochs: 14, loss: 0.5835270881652832
epochs: 15, loss: 0.5221394300460815
epochs: 16, loss: 0.8089097738265991
epochs: 17, loss: 0.6186450123786926
epochs: 18, loss: 0.2968730926513672
epochs: 19, loss: 0.4735616147518158
epochs: 20, loss: 0.17729789018630981
epochs: 21, loss: 0.8242084383964539
epochs: 22, loss: 0.7224799394607544
epochs: 23, loss: 0.8725035190582275
epochs: 24, loss: 0.35641077160835266
epochs: 25, loss: 0.5685468912124634
epochs: 26, loss: 0.6796103715896606
epochs: 27, loss: 0.38071227073669434
epochs: 28, loss: 0.6666141748428345
epochs: 29, loss: 0.3475969135761261
epochs: 30, loss: 0.26084691286087036
epochs: 31, loss: 1.0520521402359009
epochs: 32, loss: 0.16453635692596436
epochs: 33, loss: 0.45064014196395874
epochs: 34, loss: 0.31848952174186707
epochs: 35, loss: 0.16753819584846497
epochs: 36, loss: 0.12775918841362
epochs: 37, loss: 0.4465509057044983
epochs: 38, loss: 0.5896815657615662
epochs: 39, loss: 0.34515488147735596
epochs: 40, loss: 0.6784462928771973
epochs: 41, loss: 0.2775740921497345
epochs: 42, loss: 0.6451463103294373
epochs: 43, loss: 0.317112535238266
epochs: 44, loss: 0.4523209035396576
epochs: 45, loss: 0.3064001798629761
epochs: 46, loss: 0.6443012952804565
epochs: 47, loss: 0.3107648491859436
epochs: 48, loss: 0.2591986656188965
epochs: 49, loss: 0.6829649209976196
epochs: 50, loss: 0.45758822560310364
epochs: 51, loss: 0.43198031187057495
epochs: 52, loss: 0.24181154370307922
epochs: 53, loss: 0.26728639006614685
epochs: 54, loss: 0.30310603976249695
epochs: 55, loss: 0.3887348771095276
epochs: 56, loss: 0.028082484379410744
epochs: 57, loss: 0.2346067875623703
epochs: 58, loss: 0.3004593253135681
epochs: 59, loss: 0.28349030017852783
epochs: 60, loss: 0.16054654121398926
epochs: 61, loss: 0.13317736983299255
epochs: 62, loss: 0.24746283888816833
epochs: 63, loss: 0.15954481065273285
epochs: 64, loss: 0.16638357937335968
epochs: 65, loss: 0.17073221504688263
epochs: 66, loss: 0.6243990659713745
epochs: 67, loss: 0.3548056185245514
epochs: 68, loss: 0.24232488870620728
epochs: 69, loss: 0.06432937830686569
epochs: 70, loss: 0.3951784372329712
epochs: 71, loss: 0.592050313949585
epochs: 72, loss: 0.2057608962059021
epochs: 73, loss: 0.6377749443054199
epochs: 74, loss: 0.15353797376155853
epochs: 75, loss: 0.27110201120376587
epochs: 76, loss: 0.3324567675590515
epochs: 77, loss: 0.20056085288524628
epochs: 78, loss: 0.5185072422027588
epochs: 79, loss: 0.4486525058746338
epochs: 80, loss: 0.24327629804611206
epochs: 81, loss: 0.6180362105369568
epochs: 82, loss: 0.5132676362991333
epochs: 83, loss: 0.3146440088748932
epochs: 84, loss: 0.23053902387619019
epochs: 85, loss: 0.4025624990463257
epochs: 86, loss: 0.29679837822914124
epochs: 87, loss: 0.5279446244239807
epochs: 88, loss: 0.3573974668979645
epochs: 89, loss: 0.5636845827102661
epochs: 90, loss: 0.19891107082366943
epochs: 91, loss: 0.34471043944358826
epochs: 92, loss: 0.11527696996927261
epochs: 93, loss: 0.5898762345314026
epochs: 94, loss: 0.18890637159347534
epochs: 95, loss: 0.16639317572116852
epochs: 96, loss: 0.4204927682876587
epochs: 97, loss: 0.4960172176361084
epochs: 98, loss: 0.03426114097237587
epochs: 99, loss: 0.21638162434101105
epochs: 100, loss: 0.20015659928321838
2_4. Evaluating the Model
model.load_state_dict(torch.load('CIFAR.pth', map_location=device))

# note: test_loader above was built with the augmentation transforms;
# for a strict evaluation, the test set should use only ToTensor and Normalize
num_corr = 0
with torch.no_grad():
    for data, label in test_loader:
        output = model(data.to(device))
        preds = output.data.max(1)[1]  # index of the largest logit = predicted class
        corr = preds.eq(label.to(device).data).sum().item()
        num_corr += corr

print(f'Accuracy: {num_corr / len(test_data)}')
Accuracy: 0.8205