Paper link (2,942 citations)

Summary

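The paper trains a linear model and a deep neural network at the same time in order to combine the strengths of memorization and generalization.
The linear model contributes the wide features and the deep neural network contributes the deep features.
The wide features of the linear model and the final hidden output of the deep neural network are concatenated and fed into one shared layer, so the whole network is trained together.
The approach performed well on data from Google's app store.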

Motivation

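In recommender systems, generalized linear models such as logistic regression are widely used because they are simple and interpretable. A linear model can achieve memorization by using cross-product features.
The drawback is that cross-product features cannot generalize to query-item feature pairs that never appeared in the training data.
Embedding-based models generalize to unseen query-item feature pairs through low-dimensional dense embedding vectors.
However, when the query-item matrix is sparse and high-rank, such low-dimensional dense embedding vectors are hard to learn. Because an embedding-based model still forces a low-dimensional representation in this case, it tends to over-generalize and predict interactions where none actually exist.
Generalized linear models and embedding-based models each have their own strengths and weaknesses, so the paper builds both into a single model to combine their advantages.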

Approach

The wide component

The wide component is a generalized linear model of the form $y = \mathbf{w}^T \mathbf{x} + b$, shown on the left of Figure 1. The important point is that the feature set includes not only the raw inputs $\mathbf{x} = [x_1, x_2, \cdots, x_d]$ but also cross-product transformations.

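The $k$-th transformation $\phi_k$ is a cross-product transformation over the $d$ features:

$$\phi_k(\mathbf{x}) = \prod_{i=1}^{d} x_i^{c_{ki}}, \quad c_{ki} \in \{0, 1\}$$

$c_{ki}$ is 1 if the $i$-th feature is part of the $k$-th transformation $\phi_k$, and 0 otherwise.
For example, suppose a dataset has 10 features and $\phi_k$ is AND(gender=female, language=English). Then the cross-product transformation is 1 only for rows where the gender is female and the language is English.
Such cross-product transformations add nonlinearity to the generalized linear model, which lets it capture feature interactions.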
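To make the cross-product transformation concrete, here is a minimal sketch in plain Python. It assumes binary (one-hot) features; the four-column feature layout and the function name `cross_product` are made up for illustration and do not come from the paper.

```python
from typing import Sequence


def cross_product(x: Sequence[int], c_k: Sequence[int]) -> int:
    """Compute phi_k(x) = prod_i x_i ** c_ki for binary features x."""
    result = 1
    for x_i, c_ki in zip(x, c_k):
        if c_ki == 1:      # feature i is part of the k-th transformation
            result *= x_i  # features with c_ki == 0 contribute x_i ** 0 == 1
    return result


# Hypothetical one-hot layout:
# [gender=female, gender=male, language=English, language=Korean]
c_k = [1, 0, 1, 0]  # phi_k = AND(gender=female, language=English)

female_english = [1, 0, 1, 0]
female_korean = [1, 0, 0, 1]
male_english = [0, 1, 1, 0]

print(cross_product(female_english, c_k))  # 1
print(cross_product(female_korean, c_k))   # 0
print(cross_product(male_english, c_k))    # 0
```

Only the row that satisfies both conditions yields 1, which is why the wide part can memorize a specific feature combination but says nothing about combinations it has never seen.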

The deep component

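The deep component is a feed-forward neural network, shown on the right of Figure 1. Sparse categorical features are converted into low-dimensional dense real-valued vectors. Writing $a^{(l)}$ for the output of the $l$-th layer, the goal is to learn a network of the following form.

$$a^{(l+1)} = f\left(W^{(l)} a^{(l)} + b^{(l)}\right)$$

Here $W^{(l)}$, $b^{(l)}$, and $f$ are the weight matrix and bias of the $l$-th layer and the activation function. The paper uses ReLU as the activation function.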
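The layer recursion of the deep component, $a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)})$ with ReLU as $f$, can be sketched in a few lines of plain Python. The toy weights and the helper names (`relu`, `dense`, `deep_forward`) are assumptions for illustration only; a real model would learn the weights and use a tensor library.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def relu(v: Vector) -> Vector:
    """Element-wise max(0, z)."""
    return [max(0.0, z) for z in v]


def dense(W: Matrix, a: Vector, b: Vector) -> Vector:
    """Return W a + b, with W stored as a list of rows."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]


def deep_forward(a0: Vector, layers: Sequence[Tuple[Matrix, Vector]]) -> Vector:
    """Apply a^(l+1) = relu(W^(l) a^(l) + b^(l)) for every (W, b) pair."""
    a = a0
    for W, b in layers:
        a = relu(dense(W, a, b))
    return a


# Hypothetical toy network: 3 inputs -> 2 hidden units -> 2 hidden units
layers = [
    ([[1.0, -1.0, 0.0], [0.5, 0.5, 0.5]], [0.0, -1.0]),
    ([[1.0, 1.0], [-1.0, 2.0]], [0.0, 0.5]),
]
a_final = deep_forward([2.0, 1.0, 4.0], layers)
print(a_final)  # [3.5, 4.5]
```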

Joint training of wide & deep model

Look at the middle panel of Figure 1. The outputs of the wide model and the deep model are concatenated in a single layer and passed through a shared activation function to produce one output. The key point is that this is joint learning, not an ensemble. An ensemble combines models that were trained independently, whereas joint learning updates all parameters at the same time. Wide & Deep learning is joint learning.

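The prediction of the logistic regression in the middle panel of Figure 1 can be written as follows.

$$P(Y = 1 \mid \mathbf{x}) = \sigma\left(\mathbf{w}_{wide}^T [\mathbf{x}, \phi(\mathbf{x})] + \mathbf{w}_{deep}^T a^{(l_f)} + b\right)$$

$\mathbf{x}, \phi(\mathbf{x})$: raw inputs and transformed inputs
$\mathbf{w}_{wide}, \mathbf{w}_{deep}, b$: weights on the wide features, weights on the final deep output, and bias
$a^{(l_f)}$: output of the final layer
$\sigma$: sigmoid function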
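As a rough sketch of what joint training means in practice, the code below computes the combined prediction $\sigma(\mathbf{w}_{wide}^T [\mathbf{x}, \phi(\mathbf{x})] + \mathbf{w}_{deep}^T a^{(l_f)} + b)$ and then takes one gradient step in which the same error term updates both the wide and the deep weights. Two simplifications are assumed here: the step is plain SGD (the paper uses FTRL for the wide part and AdaGrad for the deep part), and only the output-layer weights are updated, whereas a full implementation would also backpropagate into the hidden layers and embeddings. The function names and toy numbers are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(wide_feats: List[float], a_final: List[float],
            w_wide: List[float], w_deep: List[float], b: float) -> float:
    """P(Y=1|x) = sigmoid(w_wide . [x, phi(x)] + w_deep . a_final + b)."""
    logit = (sum(w * f for w, f in zip(w_wide, wide_feats))
             + sum(w * a for w, a in zip(w_deep, a_final))
             + b)
    return sigmoid(logit)


def joint_sgd_step(wide_feats: List[float], a_final: List[float], y: float,
                   w_wide: List[float], w_deep: List[float], b: float,
                   lr: float = 0.1) -> Tuple[List[float], List[float], float]:
    """One plain-SGD step on the logistic loss (an assumed simplification).

    The gradient of the loss with respect to the logit is (p - y), and that
    single error term drives the update of both the wide and the deep weights.
    """
    err = predict(wide_feats, a_final, w_wide, w_deep, b) - y
    new_w_wide = [w - lr * err * f for w, f in zip(w_wide, wide_feats)]
    new_w_deep = [w - lr * err * a for w, a in zip(w_deep, a_final)]
    return new_w_wide, new_w_deep, b - lr * err


# Hypothetical inputs: wide features [x, phi(x)] and the final deep output
wide_feats = [1.0, 0.0, 1.0]
a_final = [3.5, 4.5]
w_wide, w_deep, b = [0.0, 0.0, 0.0], [0.0, 0.0], 0.0

p_before = predict(wide_feats, a_final, w_wide, w_deep, b)
w_wide, w_deep, b = joint_sgd_step(wide_feats, a_final, 1.0, w_wide, w_deep, b)
p_after = predict(wide_feats, a_final, w_wide, w_deep, b)
print(p_before, p_after)  # p_after is larger because the label was 1
```

In an ensemble, by contrast, each model would compute its own error against the label and never see the other model's contribution to the logit.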

Results

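This is the architecture of the neural network trained in the paper. Continuous features are fed in as-is or after a transformation, while categorical features are first turned into embeddings by the network. The two kinds of features are concatenated in one layer and passed through further hidden layers, and the final logistic loss is computed on the result.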
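The input handling described above, continuous features as-is and categorical features through an embedding lookup, followed by concatenation, might look like the sketch below. The feature names, embedding values, and the helper `build_deep_input` are all hypothetical; in a real model the embedding vectors are learned parameters, not constants.

```python
from typing import Dict, List

# Hypothetical embedding tables: one dense vector per category value.
embeddings: Dict[str, Dict[str, List[float]]] = {
    "language": {"en": [0.1, -0.2], "ko": [0.3, 0.0]},
    "device": {"phone": [0.5, 0.5], "tablet": [-0.4, 0.2]},
}


def build_deep_input(categorical: Dict[str, str],
                     continuous: List[float]) -> List[float]:
    """Concatenate continuous values with the embedding of each categorical feature."""
    a0 = list(continuous)
    for name in sorted(categorical):  # fixed order keeps the vector layout stable
        a0.extend(embeddings[name][categorical[name]])
    return a0


a0 = build_deep_input({"language": "en", "device": "tablet"}, [0.7])
print(a0)  # [0.7, -0.4, 0.2, 0.1, -0.2]
```

The resulting vector is what the hidden layers of the deep part would consume as their first input.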

The Wide & Deep network showed good results in both the offline and the online experiments.

Conclusion

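This is a very simple deep-learning-based recommendation paper. The key idea is to feed continuous features in as-is and to build embeddings for categorical features.
I think the idea can be applied to problems other than recommendation as well. In the end, it comes down to how you handle the features.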
