D. Berthelot, N. Carlini, I. Goodfellow, N. Papernot, A. Oliver et al., MixMatch: A holistic approach to semi-supervised learning, 2019.

M. Á. Carreira-Perpiñán and Y. Idelbayev, Model compression as constrained optimization, with application to neural nets. Part II: quantization, 2017.

Y. Cheng, D. Wang, P. Zhou, and T. Zhang, A survey of model compression and acceleration for deep neural networks, 2017.

Y. Choi, M. El-khamy, and J. Lee, Towards the limit of network quantization, 2016.

M. Courbariaux, Y. Bengio, and J. David, Binaryconnect: Training deep neural networks with binary weights during propagations, 2015.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A Large-Scale Hierarchical Image Database, Conference on Computer Vision and Pattern Recognition, 2009.

E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus, Exploiting linear structure within convolutional networks for efficient evaluation, Advances in Neural Information Processing Systems 27, 2014.

T. Ge, K. He, Q. Ke, and J. Sun, Optimized product quantization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.

Y. Gong, L. Liu, M. Yang, and L. Bourdev, Compressing deep convolutional networks using vector quantization, 2014.

Y. Guo, A survey on methods and theories of quantized neural networks, 2018.

S. Han, H. Mao, and W. J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding, International Conference on Learning Representations, 2016.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, 2015.

K. He, G. Gkioxari, P. Dollár, and R. Girshick, Mask R-CNN, International Conference on Computer Vision (ICCV), 2017.

G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, NIPS Deep Learning Workshop, 2014.

A. Howard, M. Sandler, G. Chu, L. Chen, B. Chen et al., Searching for MobileNetV3, 2019.

G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, Densely connected convolutional networks, Conference on Computer Vision and Pattern Recognition, 2017.

F. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. Dally et al., Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5MB model size, 2016.

H. Jégou, M. Douze, and C. Schmid, Product quantization for nearest neighbor search, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.
URL: https://hal.archives-ouvertes.fr/inria-00514462

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 2012.

Y. Lecun, J. S. Denker, and S. A. Solla, Optimal brain damage, Advances in Neural Information Processing Systems, 1990.

F. Li and B. Liu, Ternary weight networks, 2016.

X. Lin, C. Zhao, and W. Pan, Towards accurate binary convolutional neural network, 2017.

Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan et al., Learning efficient convolutional networks through network slimming, International Conference on Computer Vision, 2017.

R. Gontijo Lopes, S. Fenu, and T. Starner, Data-free knowledge distillation for deep neural networks, 2017.

J. Luo, J. Wu, and W. Lin, Thinet: A filter level pruning method for deep neural network compression, 2017.

N. Ma, X. Zhang, H. Zheng, and J. Sun, Shufflenet V2: practical guidelines for efficient CNN architecture design, 2018.

D. Mahajan, R. B. Girshick, V. Ramanathan, K. He, M. Paluri, A. Bharambe, L. van der Maaten et al., Exploring the limits of weakly supervised pretraining, 2018.

M. D. McDonnell, Training wide residual networks for deployment using a single bit for each weight, 2018.

A. K. Mishra and D. Marr, Apprentice: Using knowledge distillation techniques to improve low-precision network accuracy, 2017.

A. K. Mishra, E. Nurvitadhi, J. J. Cook, and D. Marr, WRPN: wide reduced-precision networks, 2017.

M. Norouzi and D. J. Fleet, Cartesian k-means, Conference on Computer Vision and Pattern Recognition, 2013.

M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, Xnor-net: Imagenet classification using binary convolutional neural networks, European Conference on Computer Vision, 2016.

M. Sandler, A. G. Howard, M. Zhu, A. Zhmoginov, and L. Chen, Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation, 2018.

O. Shayer, D. Levi, and E. Fetaya, Learning discrete weights using the local reparameterization trick, 2017.

M. Tan and Q. V. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, 2019.

B. Thomee, D. A. Shamma, G. Friedland, B. Elizalde, K. Ni et al., The new data and new challenges in multimedia research, 2015.

F. Tung and G. Mori, Deep neural network compression by in-parallel pruning-quantization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

K. Wang, Z. Liu, Y. Lin, J. Lin, and S. Han, HAQ: hardware-aware automated quantization, 2018.

J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, Quantized convolutional neural networks for mobile devices, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, Aggregated residual transformations for deep neural networks, Conference on Computer Vision and Pattern Recognition, 2017.

I. Z. Yalniz, H. Jégou, K. Chen, M. Paluri, and D. Mahajan, Billion-scale semi-supervised learning for image classification, 2019.

X. Zhang, X. Zhou, M. Lin, and J. Sun, Shufflenet: An extremely efficient convolutional neural network for mobile devices, 2017.

A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen, Incremental network quantization: Towards lossless cnns with low-precision weights, 2017.

S. Zhou, Z. Ni, X. Zhou, H. Wen, Y. Wu et al., Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients, 2016.

C. Zhu, S. Han, H. Mao, and W. J. Dally, Trained ternary quantization, 2016.

B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, Learning transferable architectures for scalable image recognition, 2017.