Conference paper, Year: 2020

And the Bit Goes Down: Revisiting the Quantization of Neural Networks

Abstract

In this paper, we address the problem of reducing the memory footprint of convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs rather than that of its weights. The principle of our approach is that it minimizes the loss reconstruction error for in-domain inputs. Our method only requires a set of unlabelled data at quantization time and allows for efficient inference on CPU by using byte-aligned codebooks to store the compressed weights. We validate our approach by quantizing a high-performing ResNet-50 model to a memory size of 5 MB (a 20× compression factor) while preserving a top-1 accuracy of 76.1% on ImageNet object classification, and by compressing a Mask R-CNN with a 26× factor.
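
The sketch below illustrates the core idea stated in the abstract for a single fully-connected layer: the codebook is fit to reconstruct the layer's outputs on a batch of unlabelled in-domain activations X rather than the weights themselves. It is a minimal NumPy illustration under assumed shapes and hyper-parameters, not the authors' released implementation.

# Minimal sketch (not the authors' code) of output-aware vector quantization
# for one fully-connected layer y = X @ W.
import numpy as np

def quantize_layer(W, X, k=256, d=4, n_iter=20, seed=0):
    """W: (C_in, C_out) weight matrix, X: (n, C_in) unlabelled in-domain activations."""
    rng = np.random.default_rng(seed)
    C_in, C_out = W.shape
    assert C_in % d == 0 and C_out >= k
    W_hat = np.empty_like(W)
    for b in range(C_in // d):                 # one codebook per block of d input dims
        rows = slice(b * d, (b + 1) * d)
        V = W[rows].T                          # (C_out, d) subvectors to quantize
        G = X[:, rows].T @ X[:, rows]          # (d, d) Gram matrix of the activations
        C = V[rng.choice(C_out, k, replace=False)].copy()
        for _ in range(n_iter):
            # E-step: assign each subvector v to the codeword c minimizing the
            # output error ||Xv - Xc||^2 = v'Gv - 2 v'Gc + c'Gc, not ||v - c||^2.
            VG, CG = V @ G, C @ G
            dist = (np.einsum('id,id->i', VG, V)[:, None]
                    - 2.0 * VG @ C.T
                    + np.einsum('jd,jd->j', CG, C)[None, :])
            assign = dist.argmin(axis=1)
            # M-step: least-squares minimizer of the same objective; with a
            # full-rank Gram matrix it reduces to the mean of the assigned subvectors.
            for j in range(k):
                members = V[assign == j]
                if len(members):
                    C[j] = members.mean(axis=0)
        W_hat[rows] = C[assign].T              # in practice, store assignments + codebook
    return W_hat

In the paper, a quantization of this kind is applied layer by layer to the convolutional and linear layers (convolutions reshaped into 2D matrices of subvectors), and the codebooks are then fine-tuned by distillation from the uncompressed network; the sketch only covers the single-layer assignment/update loop.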
Main file: 1907.05686 (1).pdf (1.51 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02434572, version 1 (10-01-2020)

Identifiers

  • HAL Id: hal-02434572, version 1

Cite

Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou. And the Bit Goes Down: Revisiting the Quantization of Neural Networks. ICLR 2020 - Eighth International Conference on Learning Representations, Apr 2020, Addis Ababa, Ethiopia. pp.1-11. ⟨hal-02434572⟩
240 Views
709 Downloads
