Deep Learning
General Structures and Techniques in Deep Learning
1. Adversarial Training
- Adversarial risk via optimal transport and optimal couplings
- Certified Robustness to Adversarial Examples with Differential Privacy
- Certified Adversarial Robustness via Randomized Smoothing
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing
- Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers
- Robust Estimation and Generative Adversarial Networks
- Robust Descent using Smoothed Multiplicative Noise
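
Several of the certification papers above share one core construction: classify many Gaussian-noised copies of the input and take a majority vote. A minimal Monte Carlo sketch of that prediction rule, assuming a trained `base_classifier` mapping a batch to logits (the `sigma` and `n_samples` values here are illustrative, not from any paper):

```python
import torch

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=100):
    """Majority-vote class of f(x + eps), eps ~ N(0, sigma^2 I)."""
    with torch.no_grad():
        # Replicate x, add i.i.d. Gaussian noise, classify every copy.
        noisy = x.unsqueeze(0) + sigma * torch.randn(n_samples, *x.shape)
        votes = base_classifier(noisy).argmax(dim=1)
        return torch.mode(votes).values.item()
```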
2. Architecture
- Dynamic Routing Between Capsules
- Matrix Capsules with EM Routing
- Deep Residual Learning for Image Recognition
- Neural Ordinary Differential Equations
- Augmented Neural ODEs
- Fully Convolutional Networks for Semantic Segmentation
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- TabNet: Attentive Interpretable Tabular Learning (Arik and Pfister, AAAI 2021)
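
The residual connection that several of these architectures build on is a one-line idea: learn a correction F(x) and output F(x) + x. A minimal PyTorch block in the spirit of "Deep Residual Learning for Image Recognition" (channel counts are illustrative):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        # y = F(x) + x: the identity path lets gradients bypass F entirely.
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)
```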
3. Deep Causal Inference
- Causal Effect Inference with Deep Latent-Variable Models
- Causal Deep Information Bottleneck
- Learning Representations for Counterfactual Inference
- Estimating Individual Treatment Effect: Generalization Bounds and Algorithms
4. Deep Learning Theory
- Deep Neural Networks as Gaussian Processes
- Neural Tangent Kernel: Convergence and Generalization in Neural Networks
- Theoretical Guarantees for Sampling and Inference in Generative Models with Latent Diffusions
- Deep Double Descent: Where Bigger Models and More Data Hurt
- On Lazy Training in Differentiable Programming
5. Feature Interaction
- Deep & Cross Network for Ad Click Predictions (Wang et al., KDD 2017)
- xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems (Lian et al., KDD 2018)
- AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks (Song et al., CIKM 2019)
- S3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization (Zhou et al., CIKM 2020)
6. Generative Adversarial Network
- Generative Adversarial Nets
- Towards Principled Methods for Training Generative Adversarial Networks
- Wasserstein GAN
- Improved Techniques for Training GANs
- Training Generative Neural Networks via Maximum Mean Discrepancy Optimization
- Generative Moment Matching Networks
- MMD GAN: Towards Deeper Understanding of Moment Matching Network
- On Gradient Regularizers for MMD GANs
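
The original GAN recipe alternates a discriminator step with a non-saturating generator step. A toy sketch on 1-D Gaussian data, assuming nothing beyond PyTorch; all network sizes and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for _ in range(2000):
    real = 2.0 + 0.5 * torch.randn(64, 1)   # target distribution: N(2, 0.5^2)
    fake = G(torch.randn(64, 8))
    # Discriminator step: push real -> 1, fake -> 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Non-saturating generator step: make D label fakes as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The MMD papers in this section replace the learned discriminator with a fixed kernel two-sample statistic, and Wasserstein GAN replaces the BCE objective with an estimate of the earth-mover distance.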
7. Generalization
- Dark Knowledge
- Distilling the Knowledge in a Neural Network
- Understanding Black-box Predictions via Influence Functions
- Understanding Deep Learning Requires Rethinking Generalization
- DeepFool: a simple and accurate method to fool deep neural networks
- On the Accuracy of Influence Functions for Measuring Group Effects
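
For the two distillation papers, the key object is the soft-target loss: match the student's temperature-softened distribution to the teacher's. A sketch, assuming precomputed logits (the `T` and `alpha` values are illustrative):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)   # T^2 keeps the soft-target gradients on the hard-label scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```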
8. Graph Models
- Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (Defferrard et al., NIPS 2016)
- Semi-Supervised Classification with Graph Convolutional Networks (Kipf and Welling, ICLR 2017)
- Simplifying Graph Convolutional Networks (Wu et al., ICML 2019)
- Inductive Representation Learning on Large Graphs (Hamilton et al., NIPS 2017)
- Graph Attention Networks (Veličković et al., ICLR 2018)
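
The propagation rule shared by the GCN line of work is H' = σ(D^{-1/2} (A + I) D^{-1/2} H W). A dense NumPy sketch of one layer (real implementations use sparse operations; the inputs here are stand-ins for a real graph):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN step: relu(normalized adjacency @ H @ W)."""
    A_hat = A + np.eye(A.shape[0])                     # add self-loops
    d_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)    # D^{-1/2}
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)
```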
9. Information Theory
- The Information Bottleneck Method
- Deep Variational Information Bottleneck
- Deep Learning and the Information Bottleneck Principle
- Opening the Black Box of Deep Neural Networks via Information
- On the Information Bottleneck Theory of Deep Learning
- Estimating Information Flow in Deep Neural Networks
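
All of these papers revolve around one objective, the IB Lagrangian: compress the input X into a representation T while keeping T informative about the label Y,

```latex
\min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y)
```

where β trades compression against prediction.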
10. Measuring Uncertainty
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
- On Calibration of Modern Neural Networks
- Predicting Good Probabilities With Supervised Learning
- A Comprehensive Review of Neural Network-based Prediction Intervals and New Advances
- Lower Upper Bound Estimation Method for Construction of Neural Network-Based Prediction Intervals
- Estimating the Mean and Variance of the Target Probability Distribution
- Wind Power Interval Prediction Based on Improved PSO and BP Neural Network
- Prediction intervals for artificial neural networks
- Practical confidence and prediction intervals
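
The calibration papers above motivate a remarkably small fix worth seeing concretely: temperature scaling fits one scalar T on held-out logits and divides test logits by it. A sketch, assuming `val_logits` and `val_labels` come from an already trained model:

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels):
    """Fit a single temperature by minimizing NLL on validation logits."""
    log_t = torch.zeros(1, requires_grad=True)   # optimize log T so T stays positive
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()   # divide test logits by this T before softmax
```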
11. Meta Learning
- Learning to Learn by Gradient Descent by Gradient Descent
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (Finn et al., ICML 2017)
- Optimization as a Model for Few-shot Learning
- Neural Architecture Search: A Survey (Elsken et al., JMLR 2019)
- Learning Transferable Architectures for Scalable Image Recognition (Zoph et al., CVPR 2018)
- Neural Architecture Search with Reinforcement Learning (Zoph and Le, ICLR 2017)
- DARTS: Differentiable Architecture Search (Liu et al., ICLR 2019)
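
The MAML loop is easiest to see on a scalar problem: one inner gradient step per task, then an outer step that differentiates through the inner one. A toy sketch with explicit tensors (the task data and learning rates are illustrative):

```python
import torch

w = torch.randn(1, requires_grad=True)   # meta-learned initialization
inner_lr, outer_lr = 0.1, 0.01

# Hypothetical task: support/query pairs drawn from y = 3x.
xs, ys = torch.tensor([1.0]), torch.tensor([3.0])
xq, yq = torch.tensor([2.0]), torch.tensor([6.0])

# Inner step on the support set; create_graph=True keeps the graph so the
# outer update can differentiate through the adaptation.
inner_loss = ((w * xs - ys) ** 2).mean()
(g,) = torch.autograd.grad(inner_loss, w, create_graph=True)
w_adapted = w - inner_lr * g

# Outer step: evaluate adapted weights on the query set, update the init.
outer_loss = ((w_adapted * xq - yq) ** 2).mean()
outer_loss.backward()
with torch.no_grad():
    w -= outer_lr * w.grad
```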
12. Natural Language Processing
- Teaching Machines to Read and Comprehend
- Using the output embedding to improve language models
- Tying word vectors and word classifiers: A loss framework for language modeling
- Smart Reply: Automated Response Suggestion for Email
- An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (Bai et al., 2018)
- Temporal Convolutional Networks for Action Segmentation and Detection
- Temporal Convolutional Attention-based Network For Sequence Modeling
- Long Short-Term Memory-Networks for Machine Reading
- Effective Approaches to Attention-based Neural Machine Translation
- Neural Machine Translation by Jointly Learning to Align and Translate
- Sequence to Sequence Learning with Neural Networks
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Attention Is All You Need
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Neural Turing Machines
- Pointer Networks
- ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information (Sun et al., ACL 2021)
- Chinese NER Using Lattice LSTM (Zhang and Yang, ACL 2018)
- An Encoding Strategy Based Word-Character LSTM for Chinese NER (Liu et al., ACL 2019)
- A Neural Multi-digraph Model for Chinese NER with Gazetteers (Ding et al., ACL 2019)
- Leverage Lexical Knowledge for Chinese Named Entity Recognition via Collaborative Graph Network (Sui et al., EMNLP 2019)
- A Lexicon-Based Graph Neural Network for Chinese NER (Gui et al., EMNLP 2019)
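
The common primitive behind the attention and Transformer papers above is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. A minimal sketch (shapes are illustrative):

```python
import math
import torch

def attention(Q, K, V, mask=None):
    scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))  # (..., L_q, L_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V

q = k = v = torch.randn(2, 5, 16)   # batch 2, length 5, d_k = 16 (self-attention)
out = attention(q, k, v)            # -> (2, 5, 16)
```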
13. Privacy
- Deep Leakage from Gradients
14. Reinforcement Learning
- Mastering the Game of Go with Deep Neural Networks and Tree Search
- Mastering the Game of Go without Human Knowledge
- Continuous control with deep reinforcement learning
- Deterministic Policy Gradient Algorithms
- Trust Region Policy Optimization
- Deep Reinforcement Learning with Double Q-learning
- Prioritized Experience Replay
- Taming the Noise in Reinforcement Learning via Soft Updates
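
Double Q-learning is simplest in its tabular form: one value table selects the next action, the other evaluates it, which damps the overestimation caused by a single max. A NumPy sketch (the transition arguments come from a hypothetical environment loop):

```python
import numpy as np

def double_q_update(Q1, Q2, s, a, r, s_next, alpha=0.1, gamma=0.99):
    if np.random.rand() < 0.5:
        a_star = int(np.argmax(Q1[s_next]))   # select with Q1 ...
        Q1[s, a] += alpha * (r + gamma * Q2[s_next, a_star] - Q1[s, a])  # ... evaluate with Q2
    else:
        a_star = int(np.argmax(Q2[s_next]))   # and symmetrically
        Q2[s, a] += alpha * (r + gamma * Q1[s_next, a_star] - Q2[s, a])
```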
15. Semi-supervised Learning
- Semi-supervised Learning with Deep Generative Models
- Semi-supervised Learning with Ladder Networks
- Auxiliary Deep Generative Models
- Semi-Supervised Learning with Generative Adversarial Networks
- Semi-supervised Learning with GANs: Revisiting Manifold Regularization
- Data-Efficient Image Recognition with Contrastive Predictive Coding
- Temporal Ensembling for Semi-Supervised Learning
- Good Semi-supervised Learning that Requires a Bad GAN
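
A recurring idea in this section is consistency regularization: two stochastic passes over the same unlabeled input should agree. A sketch in the spirit of the Pi-model from "Temporal Ensembling for Semi-Supervised Learning", assuming `model` uses dropout or similar per-pass noise (the consistency weight is illustrative):

```python
import torch.nn.functional as F

def pi_model_loss(model, x_labeled, y_labeled, x_unlabeled, weight=1.0):
    sup = F.cross_entropy(model(x_labeled), y_labeled)
    # The two passes differ because dropout samples independent masks.
    p1 = F.softmax(model(x_unlabeled), dim=1)
    p2 = F.softmax(model(x_unlabeled), dim=1)
    return sup + weight * F.mse_loss(p1, p2)
```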
16. Stochastic Optimization and Generalization
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
- Sharp Minima Can Generalize For Deep Nets
- Entropy-SGD: Biasing Gradient Descent into Wide Valleys
- Geometry of Optimization and Implicit Regularization in Deep Learning
- Path-SGD: Path-Normalized Optimization in Deep Neural Networks
- Norm-Based Capacity Control in Neural Networks
- An Empirical Analysis of the Optimization of Deep Network Loss Surfaces
- Theory of Deep Learning II: Landscape of the Empirical Risk in Deep Learning
- Theory of Deep Learning III: Generalization Properties of SGD
- Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes
- On Dropout and Nuclear Norm Regularization
- Neural Stochastic Differential Equations: Deep Latent Gaussian Models in the Diffusion Limit
- The Landscape of Empirical Risk for Non-convex Losses
17. Time Series
- Deep State Space Models for Time Series Forecasting
- Deep Factors for Forecasting
- DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks
- Structured Inference Networks for Nonlinear State Space Models
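
The DeepAR-style setup is an RNN that emits the parameters of a predictive distribution per step and trains on negative log-likelihood. A Gaussian-head sketch (network sizes are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianRNN(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.mu = nn.Linear(hidden, 1)
        self.scale = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, time, 1)
        h, _ = self.rnn(x)
        return self.mu(h), F.softplus(self.scale(h)) + 1e-6   # sigma > 0

model = GaussianRNN()
x, target = torch.randn(8, 20, 1), torch.randn(8, 20, 1)     # toy series
mu, sigma = model(x)
loss = nn.GaussianNLLLoss()(mu, target, sigma ** 2)          # takes variance
loss.backward()
```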
18. Training Technique
- Improving neural networks by preventing co-adaptation of feature detectors
- Dropout: A simple way to prevent neural networks from overfitting
- An empirical analysis of dropout in piecewise linear networks
- Understanding Dropout
- Asynchronous Stochastic Gradient Descent with Delay Compensation
- Understanding Synthetic Gradients and Decoupled Neural Interfaces
- Decoupled Neural Interfaces using Synthetic Gradients
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- How Does Batch Normalization Help Optimization?
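
Of the techniques above, inverted dropout is the one worth writing out: zero units at train time and rescale by 1/keep_prob, so test-time activations need no correction. A pure NumPy sketch:

```python
import numpy as np

def dropout_forward(x, keep_prob=0.5, train=True):
    if not train:
        return x                     # identity at test time
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
    return x * mask                  # E[output] == x thanks to the rescale
```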
19. Transfer Learning
- How transferable are features in deep neural networks?
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- Neural Style Transfer: A Review
- Image Style Transfer Using Convolutional Neural Networks
- Perceptual Losses for Real-Time Style Transfer and Super-Resolution
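
The standard recipe these papers motivate is feature reuse: freeze a pretrained backbone and retrain only a new head. A sketch assuming a recent torchvision; the weight tag and class count are illustrative:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # ImageNet-pretrained backbone
for p in model.parameters():
    p.requires_grad = False                        # freeze every layer
model.fc = nn.Linear(model.fc.in_features, 10)     # fresh head for 10 classes
# Only model.fc receives gradients during fine-tuning.
```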
20. Unsupervised Learning
- Representation Learning with Contrastive Predictive Coding
- A Simple Framework for Contrastive Learning of Visual Representations
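
Both papers optimize an InfoNCE-style objective: two views of the same example form a positive pair, and all other batch entries serve as negatives. A minimal NT-Xent sketch in the spirit of SimCLR, assuming `z1` and `z2` are the embeddings of the two views (`tau` is illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # (2N, d), unit norm
    sim = (z @ z.t()) / tau                                  # cosine similarities
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    # Row i's positive is its other view: i + n (first half) or i - n (second).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```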