Deep Residual Networks

Microsoft Research Asia (MSRA).

Table of Contents

  1. Introduction
  2. Citation
  3. Disclaimer and known issues
  4. Models
  5. Results
  6. Third-party re-implementations

Introduction

This repository contains the original models (ResNet-50, ResNet-101, and ResNet-152) described in the paper "Deep Residual Learning for Image Recognition" (http://arxiv.org/abs/1512.03385). These are the models used in the ILSVRC and COCO 2015 competitions, which won 1st place in: ImageNet classification, ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
Note
  1. Re-implementations with training code and models from Facebook AI Research (FAIR): blog, code
  2. Code of improved 1K-layer ResNets with 4.62% test error on CIFAR-10 in our new arXiv paper: https://github.com/KaimingHe/resnet-1k-layers

Citation

If you use these models in your research, please cite:
@article{He2015,
    author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
    title = {Deep Residual Learning for Image Recognition},
    journal = {arXiv preprint arXiv:1512.03385},
    year = {2015}
}

Disclaimer and known issues

  1. These models are converted from our own implementation to a recent version of Caffe (2016/2/3, b590f1d). The numerical results using this code are as in the tables below.
  2. These models are intended for testing or fine-tuning.
  3. These models were not trained using this version of Caffe.
  4. If you want to train these models using this version of Caffe without modifications, please note that:
    • GPU memory might be insufficient for extremely deep models.
    • Changes to the mini-batch size may affect accuracy (we use a mini-batch of 256 images on 8 GPUs, that is, 32 images per GPU).
    • Implementation of data augmentation might be different (see our paper about the data augmentation we used).
    • We randomly shuffle data at the beginning of every epoch.
    • There might be some other untested issues.
  5. In our BN layers, the provided mean and variance are strictly computed as a plain average (not a moving average) over a sufficiently large training batch after the training procedure. The numerical results are very stable (variation of val error < 0.1%). Using a moving average might lead to different results (this computation is sketched after this list).
  6. In the BN paper, the BN layer learns gamma/beta. To implement BN in this version of Caffe, we use its provided "batch_norm_layer" (which learns no gamma/beta) followed by "scale_layer" (which learns gamma/beta); this pairing is sketched after this list.
  7. We use Caffe's implementation of SGD with momentum: v := momentum*v + lr*g. If you want to port these models to other libraries (e.g., Torch, CNTK), please pay careful attention to the possibly different implementation of SGD with momentum: v := momentum*v + (1-momentum)*lr*g, which changes the effective learning rate (a numerical sketch of this difference follows this list).
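
A minimal NumPy sketch of the BN-statistics computation described in item 5 (the function name and the way activations are collected are illustrative assumptions, not code from this repository):

import numpy as np

def recompute_bn_stats(collected_activations):
    # collected_activations: a list of (N, C, H, W) arrays of pre-BN responses,
    # gathered by forwarding a sufficiently large set of training images
    # through the trained network.
    x = np.concatenate(collected_activations, axis=0)
    mean = x.mean(axis=(0, 2, 3))  # plain per-channel average over the whole set,
    var = x.var(axis=(0, 2, 3))    # not an exponential moving average
    return mean, var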
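
A minimal pycaffe NetSpec sketch of the BatchNorm + Scale pairing from item 6 (the helper function and layer arrangement are illustrative; the released prototxt files define these layers directly):

from caffe import layers as L

def bn_scale(bottom):
    # "BatchNorm" normalizes only and learns no gamma/beta; use_global_stats=True
    # makes it use the stored mean/variance at test time.
    bn = L.BatchNorm(bottom, use_global_stats=True, in_place=True)
    # "Scale" with bias_term=True supplies the learned gamma (scale) and beta (bias).
    return L.Scale(bn, bias_term=True, in_place=True)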
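
A plain-Python numerical sketch of the two momentum formulations from item 7 (the learning rate, momentum, and gradient values are illustrative only):

lr, momentum, g = 0.1, 0.9, 1.0   # illustrative values; g is a constant gradient

v_caffe, v_other = 0.0, 0.0
for _ in range(100):
    v_caffe = momentum * v_caffe + lr * g                   # Caffe-style update
    v_other = momentum * v_other + (1 - momentum) * lr * g  # "dampened" variant

# In steady state v_caffe -> lr*g/(1-momentum) = 1.0 while v_other -> lr*g = 0.1,
# i.e. the effective learning rate differs by a factor of 1/(1-momentum).
print(round(v_caffe, 3), round(v_other, 3))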

Models

  1. Visualizations of network structures (tools from ethereon):
  2. Model files:
    • MSR download: link
    • OneDrive download: link

Results

  1. Curves on ImageNet (solid lines: 1-crop val error; dashed lines: training error): Training curves
  2. 1-crop validation error on ImageNet (center 224x224 crop from the resized image with shorter side = 256); a pycaffe sketch of this evaluation protocol follows this list:
    model       top-1   top-5
    VGG-16      28.5%   9.9%
    ResNet-50   24.7%   7.8%
    ResNet-101  23.6%   7.1%
    ResNet-152  23.0%   6.7%
  3. 10-crop validation error on ImageNet (averaging softmax scores of 10 224x224 crops from resized image with shorter side=256), the same as those in the paper:
    model       top-1   top-5
    ResNet-50   22.9%   6.7%
    ResNet-101  21.8%   6.1%
    ResNet-152  21.4%   5.7%
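
A minimal pycaffe sketch of the 1-crop evaluation above (file names, mean values, and blob names are placeholders/assumptions, not exact values from this repository):

import numpy as np
import caffe

net = caffe.Net('ResNet-50-deploy.prototxt',   # placeholder file names for the
                'ResNet-50-model.caffemodel',  # downloaded deploy net and weights
                caffe.TEST)

img = caffe.io.load_image('example.jpg')       # HxWx3, RGB, float in [0, 1]

# Resize so the shorter side is 256, then take the center 224x224 crop.
h, w = img.shape[:2]
scale = 256.0 / min(h, w)
img = caffe.io.resize_image(img, (int(round(h * scale)), int(round(w * scale))))
h, w = img.shape[:2]
top, left = (h - 224) // 2, (w - 224) // 2
crop = img[top:top + 224, left:left + 224]

# HWC/RGB/[0,1] -> CHW/BGR/[0,255], then subtract a per-channel mean
# (the repository ships a mean file; the values below are assumed ImageNet means).
x = crop.transpose(2, 0, 1)[::-1] * 255.0
x -= np.array([104.0, 117.0, 123.0]).reshape(3, 1, 1)

net.blobs['data'].reshape(1, 3, 224, 224)
net.blobs['data'].data[...] = x
prob = net.forward()['prob'][0]                # assumes the softmax blob is named 'prob'
print('top-1 class:', prob.argmax())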

Third-party re-implementations

Deep residual networks are very easy to implement and train. We also recommend the following third-party re-implementations and extensions:
  1. By Facebook AI Research (FAIR), with training code in Torch and pre-trained ResNet-18/34/50/101 models for ImageNet: blog, code
  2. Torch, CIFAR-10, with ResNet-20 to ResNet-110, training code, and curves: code
  3. Lasagne, CIFAR-10, with ResNet-32 and ResNet-56 and training code: code
  4. Neon, CIFAR-10, with pre-trained ResNet-32 to ResNet-110 models, training code, and curves: code
  5. Torch, MNIST, 100 layers: blog, code
  6. A winning entry in Kaggle's right whale recognition challenge: blog, code
  7. Neon, Place2 (mini), 40 layers: blog, code
  8. MatConvNet, CIFAR-10, with ResNet-20 to ResNet-110, training code, and curves: code
  9. TensorFlow, CIFAR-10, with ResNet-32/110/182, training code and curves: code
  10. MatConvNet, reproducing CIFAR-10 and ImageNet experiments (supporting official MatConvNet), training code and curves: blog, code
Converters:
  1. MatConvNet: url
  2. TensorFlow: url
