๐Ÿง  ํŠธ๋žœ์Šคํฌ๋จธ(Transformer) ๊ธฐ์ดˆ์™€ ์›๋ฆฌ

ํŠธ๋žœ์Šคํฌ๋จธ๋Š” ์ธ๊ณต์ง€๋Šฅ ๋ถ„์•ผ์—์„œ ํ˜๋ช…์ ์ธ ์„ฑ๊ณผ๋ฅผ ์ด๋ฃฌ ๋”ฅ๋Ÿฌ๋‹ ์•„ํ‚คํ…์ฒ˜์ž…๋‹ˆ๋‹ค. 2017๋…„ ๊ตฌ๊ธ€ ์—ฐ๊ตฌ์ง„์ด ๋ฐœํ‘œํ•œ "Attention Is All You Need" ๋…ผ๋ฌธ์„ ํ†ตํ•ด ์†Œ๊ฐœ๋˜์—ˆ์œผ๋ฉฐ, ์ดํ›„ GPT, BERT ๋“ฑ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด๋ชจ๋ธ์˜ ๊ธฐ๋ฐ˜์ด ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

๐ŸŒŸ ํŠธ๋žœ์Šคํฌ๋จธ์˜ ํƒ„์ƒ ๋ฐฐ๊ฒฝ

ํŠธ๋žœ์Šคํฌ๋จธ ์ด์ „์—๋Š” ์‹œํ€€์Šค ๋ชจ๋ธ๋ง์„ ์œ„ํ•ด RNN(Recurrent Neural Network)๊ณผ LSTM(Long Short-Term Memory)์ด ์ฃผ๋กœ ์‚ฌ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด๋“ค ์•„ํ‚คํ…์ฒ˜๋Š” ์ˆœ์ฐจ์  ์ฒ˜๋ฆฌ ๋ฐฉ์‹์œผ๋กœ ์ธํ•ด ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ๊ฐ€ ๋ถˆ๊ฐ€๋Šฅํ•˜๊ณ , ๊ธด ์‹œํ€€์Šค ํ•™์Šต ์‹œ Gradient Vanishing ๋ฌธ์ œ๊ฐ€ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

ํŠธ๋žœ์Šคํฌ๋จธ๋Š” Self-Attention Mechanism์„ ๋„์ž…ํ•˜์—ฌ ๋ชจ๋“  ํ† ํฐ ๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ๋ณ‘๋ ฌ๋กœ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ–ˆ๊ณ , ์ด๋กœ ์ธํ•ด ํ›จ์”ฌ ๋น ๋ฅธ ํ•™์Šต ์†๋„์™€ ๋” ๋†’์€ ์ •ํ™•๋„๋ฅผ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

โš™๏ธ ์ฃผ์š” ๊ตฌ์„ฑ ์š”์†Œ

  • Encoder: encodes the input sequence into context vectors
  • Decoder: takes the Encoder's output and generates the target sequence
  • Self-Attention: computes how each token relates to the other tokens
  • Multi-Head: runs several attention heads in parallel to learn different perspectives (see the sketch after this list)

๐Ÿš€ Self-Attention Mechanism

Self-Attention์€ ํŠธ๋žœ์Šคํฌ๋จธ์˜ ๊ฐ€์žฅ ํ•ต์‹ฌ์ ์ธ ๊ธฐ์ˆ ์ž…๋‹ˆ๋‹ค. ์ด ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ํ†ตํ•ด ๊ฐ ๋‹จ์–ด๊ฐ€ ๋ฌธ์žฅ ๋‚ด ๋‹ค๋ฅธ ๋‹จ์–ด๋“ค๊ณผ ์–ด๋–ค ๊ด€๊ณ„๋ฅผ ๋งบ๋Š”์ง€(์—ฐ๊ด€์„ฑ)์„ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embed_size, heads):
        super(SelfAttention, self).__init__()
        self.embed_size = embed_size
        self.heads = heads
        self.head_dim = embed_size // heads
        
        assert self.head_dim * heads == embed_size, "Embed size must be divisible by heads"
        
        self.values = nn.Linear(self.head_dim, self.head_dim, bias=False)
        self.keys = nn.Linear(self.head_dim, self.head_dim, bias=False)
        self.queries = nn.Linear(self.head_dim, self.head_dim, bias=False)
        self.fc_out = nn.Linear(heads * self.head_dim, embed_size)
    
    def forward(self, values, keys, queries, mask=None):
        N = queries.shape[0]
        value_len, key_len, query_len = values.shape[1], keys.shape[1], queries.shape[1]
        
        # Split the embedding into self.heads pieces of size head_dim
        values = values.reshape(N, value_len, self.heads, self.head_dim)
        keys = keys.reshape(N, key_len, self.heads, self.head_dim)
        queries = queries.reshape(N, query_len, self.heads, self.head_dim)
        
        # Project each head with its learned linear map
        values = self.values(values)
        keys = self.keys(keys)
        queries = self.queries(queries)
        
        # Attention scores for every query-key pair: (N, heads, query_len, key_len)
        energy = torch.einsum("nqhd,nkhd->nhqk", [queries, keys])
        
        # Mask out disallowed positions (e.g. padding or future tokens)
        if mask is not None:
            energy = energy.masked_fill(mask == 0, float("-1e20"))
        
        # Scale by sqrt(d_k), the per-head dimension as in the paper,
        # not by sqrt(embed_size)
        attention = torch.softmax(energy / (self.head_dim ** 0.5), dim=3)
        
        # Weighted sum of values, then concatenate the heads back together
        out = torch.einsum("nhql,nlhd->nqhd", [attention, values]).reshape(
            N, query_len, self.heads * self.head_dim
        )
        return self.fc_out(out)

The code above is a self-attention layer implemented in PyTorch. It projects the input into the three vectors Query, Key, and Value, computes attention scores from them, and uses those scores to decide how much weight each token receives.
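As a quick smoke test of the layer above (batch size, sequence length, and embed_size are arbitrary example values):

Python
attention = SelfAttention(embed_size=256, heads=8)
x = torch.randn(2, 10, 256)          # (batch, sequence length, embed_size)
out = attention(x, x, x, mask=None)  # self-attention: Q, K, V are all x
print(out.shape)                     # torch.Size([2, 10, 256])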

๐Ÿ“š ํŠธ๋žœ์Šคํฌ๋จธ์˜ ์‘์šฉ

ํŠธ๋žœ์Šคํฌ๋จธ ์•„ํ‚คํ…์ฒ˜๋Š” ์ด์ œ ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์—์„œ ํ™œ์šฉ๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค:

  • NLP: language models such as GPT, BERT, and T5
  • Computer Vision: Vision Transformer (ViT)
  • Audio: Wav2Vec 2.0, Whisper
  • Reinforcement Learning: Decision Transformer
  • Multi-modal: CLIP, DALL-E

โ€” ์•„๋ฆฌ์•„ ์˜ฎ๊น€ โ€”
๐ŸŒŸ ์ฐธ๊ณ : "Attention Is All You Need" (2017, Google Research)
