A Survey of Model Compression and Acceleration for Deep Neural Networks

I. Introduction

Notes

Summary table

| Theme | Parameter pruning and sharing | Low-rank factorization | Transferred/compact convolutional filters | Knowledge distillation |
| --- | --- | --- | --- | --- |
| Description | Reduce redundant parameters | Use matrix/vector decomposition to estimate the informative parameters | Design convolutional filters with special structure to save parameters | Train a compact NN with knowledge distilled from a larger model |
| Applications | Conv & FC layers | Conv & FC layers | Conv layer only | Conv & FC layers |

II. Parameter pruning and sharing

The paper "Compressing deep convolutional networks using vector quantization" shows that network pruning is very effective at reducing NN complexity and addressing overfitting.

A. Quantization and Binarization
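Quantization reduces storage by letting many weights share a small codebook of values. A minimal sketch of k-means weight quantization, in the spirit of the vector-quantization paper cited above (the function name and this plain scalar k-means are illustrative assumptions, not that paper's exact product-quantization scheme):

```python
import numpy as np

def kmeans_quantize(weights, n_clusters=4, n_iter=20, seed=0):
    """Quantize a weight array to n_clusters shared values.

    Minimal k-means sketch: after quantization only the small codebook
    and per-weight cluster indices need to be stored.
    """
    rng = np.random.default_rng(seed)
    flat = weights.ravel()
    # initialize centroids by sampling distinct weight values
    centroids = rng.choice(flat, size=n_clusters, replace=False)
    for _ in range(n_iter):
        # assign each weight to its nearest centroid
        assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # move each centroid to the mean of its assigned weights
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = flat[assign == k].mean()
    quantized = centroids[assign].reshape(weights.shape)
    return quantized, centroids

W = np.array([[0.1, 0.12, -0.5], [-0.52, 0.9, 0.88]])
Wq, codebook = kmeans_quantize(W, n_clusters=3)
```

Binarization is the extreme case: a codebook of only two values (e.g. {-1, +1}).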

B. Pruning and Sharing
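The simplest pruning criterion is weight magnitude: zero out the smallest-magnitude weights so the matrix becomes sparse. A minimal sketch (the fixed global sparsity threshold is an illustrative assumption, not the survey's only criterion):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the given fraction of smallest-magnitude weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # threshold = magnitude of the k-th smallest weight
    threshold = np.sort(flat)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

W = np.array([[0.05, -0.8, 0.01], [0.6, -0.02, 0.3]])
Wp = magnitude_prune(W, sparsity=0.5)
```

In practice pruning is followed by fine-tuning to recover accuracy; parameter *sharing* goes further by tying surviving weights to common values.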

C. Designing Structural Matrix
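A structural matrix replaces a dense m×n FC weight matrix with a structured one described by far fewer parameters. A common choice is a circulant matrix: it is fully determined by its first column, and its matrix-vector product can be computed in O(n log n) via the FFT. A sketch (assuming NumPy; the helper name is ours):

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by vector x.

    Uses the fact that a circulant matvec is a circular convolution,
    so it is diagonalized by the FFT: only c (n values) is stored,
    never the full n x n matrix.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

c = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([0.5, -1.0, 2.0, 0.0])
y_fast = circulant_matvec(c, x)

# reference: build the dense circulant matrix explicitly and compare
n = len(c)
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
y_dense = C @ x
```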

III. Low-rank factorization
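Low-rank factorization approximates a weight matrix W (m×n) as a product of two thin matrices, cutting parameters from m·n to r·(m+n) for rank r. A minimal truncated-SVD sketch (function name is ours; convolutional variants decompose the 4-D kernel tensor instead):

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W as U_r @ V_r via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]  # absorb singular values into U
    V_r = Vt[:rank, :]
    return U_r, V_r

rng = np.random.default_rng(0)
# an exactly rank-2 "weight matrix" for illustration
W = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 10))
U_r, V_r = low_rank_factorize(W, rank=2)
W_approx = U_r @ V_r
```

Here the factorized form stores 2·(8+10)=36 values instead of 80, and at inference the single FC layer becomes two smaller ones.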

IV. Transferred/compact convolutional filters
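One well-known compact filter design is the depthwise separable convolution (popularized by MobileNet), used here purely as an illustration of the parameter savings; a quick parameter-count comparison:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filters plus a 1x1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

std = conv_params(64, 128, 3)
compact = depthwise_separable_params(64, 128, 3)
```

For a 3×3 layer with 64 input and 128 output channels, the compact design needs 8,768 parameters versus 73,728 for the standard convolution, which is why this family applies to conv layers only.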

V. Knowledge distillation
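In distillation, the compact student is trained to match the larger teacher's temperature-softened output distribution. A sketch of the soft-target term of the distillation loss (the hard-label cross-entropy term and the temperature value are simplifying assumptions here):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between the teacher's softened distribution
    and the student's (soft-target term only)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return float(
        -np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1).mean()
    )

teacher = np.array([[5.0, 1.0, -2.0]])
student_good = np.array([[4.0, 0.5, -1.5]])   # agrees with teacher
student_bad = np.array([[-2.0, 1.0, 5.0]])    # reversed ranking
loss_good = distillation_loss(student_good, teacher)
loss_bad = distillation_loss(student_bad, teacher)
```

A student whose logits rank the classes like the teacher's incurs a much smaller loss, which is the signal that transfers the teacher's "dark knowledge" to the compact model.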
