Deep Neural Network Compression
Deep neural networks have recently achieved great success in many visual recognition tasks. However, their memory consumption must be considered carefully before they can be deployed on real-world problems: the parameters of a large network, together with the activations stored for large datasets, consume a massive amount of memory. A number of approaches have been proposed to reduce this memory footprint, including network quantization, tensor decomposition, knowledge distillation, and network pruning; a generic pruning sketch is given below. The natural goal is to compress and accelerate deep networks without significantly decreasing their performance. We work on Dependency-based Neuron Trimming to speed up large neural networks in both sequential and batch settings, exploring the tradeoff between computational complexity and accuracy for both data-driven and dynamic approaches.
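
To make the pruning family of methods concrete, the following is a minimal sketch of structured, neuron-level pruning in PyTorch. It is a generic magnitude-based illustration only, not the Dependency-based Neuron Trimming method described above; the function name prune_neurons, the keep_ratio parameter, and the L1 scoring rule are all illustrative assumptions.

    # Generic magnitude-based structured pruning (illustrative only,
    # NOT Dependency-based Neuron Trimming).
    import torch
    import torch.nn as nn

    def prune_neurons(layer: nn.Linear, next_layer: nn.Linear, keep_ratio: float = 0.5):
        """Remove the lowest-magnitude output neurons of `layer` and the
        corresponding input columns of `next_layer`."""
        # Score each output neuron by the L1 norm of its weight row.
        scores = layer.weight.abs().sum(dim=1)
        n_keep = max(1, int(keep_ratio * layer.out_features))
        keep_idx = torch.topk(scores, n_keep).indices.sort().values

        # Build a smaller layer that retains only the selected neurons.
        pruned = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
        pruned.weight.data = layer.weight.data[keep_idx].clone()
        if layer.bias is not None:
            pruned.bias.data = layer.bias.data[keep_idx].clone()

        # Shrink the next layer's input dimension to match.
        shrunk_next = nn.Linear(n_keep, next_layer.out_features,
                                bias=next_layer.bias is not None)
        shrunk_next.weight.data = next_layer.weight.data[:, keep_idx].clone()
        if next_layer.bias is not None:
            shrunk_next.bias.data = next_layer.bias.data.clone()
        return pruned, shrunk_next

    # Example: prune half of the hidden neurons of a two-layer MLP.
    fc1, fc2 = nn.Linear(784, 256), nn.Linear(256, 10)
    fc1_small, fc2_small = prune_neurons(fc1, fc2, keep_ratio=0.5)

Because entire neurons (rows and the matching columns of the following layer) are removed, the resulting layers are genuinely smaller dense matrices, which reduces both memory and computation without requiring sparse kernels; the dependency-based approach differs in how it decides which neurons to trim.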