MXNet baseline model for iNaturalist Challenge at FGVC 2017 competition

I have prepared for a baseline model using MXNet for iNaturalist Challenge at FGVC 2017 competition on Kaggle. Github link is the public LB score is 0.117. Please follow this discussion thread if any questions.

How to use

Install MXNet

Run pip install mxnet-cu80 after installing CUDA driver or go to for the latest version from Github.

Windows users? no CUDA 8.0? no GPU? Please run pip search mxnet and find the good package for your platform.

Generate lists

After downloading and unzipping the train and test set in to data, along with the necessary .json annotation files, run python under data and generate train.lst val.lst test.lst

Generate rec files

A good way to speed up training is maximizing the IO by using .rec format, which also provides convenience of data augmentation. In the data/ directory, can generate train.rec and val.rec for the train and validate datasets, and can be obtained from MXNet repo . One can adjust --quality 95 parameter to lower quality for saving disk space, but it may take risk of loosing training precision.


Run sh which looks like (a 4 GTX 1080 machine for example):

python --pretrained-model model/resnet-152 \
    --load-epoch 0 --gpus 0,1,2,3 \
    --model-prefix model/iNat-resnet-152 \
	--data-nthreads 48 \
    --batch-size 48 --num-classes 5089 --num-examples 579184

please adjust --gpus and --batch-size according to the machine configuration. A sample calculation: batch-size = 12 can use 8 GB memory on a GTX 1080, so --batch-size 48 is good for a 4-GPU machine.

Please have internet connection for the first time run because needs to download the pretrained model from If the machine has no internet connection, please download the corresponding model files from other machines, and ship to model/ directory.

Generate submission file

After a long run of some epochs, e.g. 30 epochs, we can select some epochs for the submission file. Run sub.pywhich two parameters : num of epoch and gpu id like:

python 21 0

selects the 21st epoch and infer on GPU #0. One can merge multiple epoch results on different GPUs and ensemble for a good submission file.

How ‘fine-tune’ works

Fine-tune method starts with loading a pretrained ResNet 152 layers (Imagenet 11k classes) from MXNet model zoo, where the model has gained some prediction power, and applies the new data by learning from provided data.

The key technique is from lr_step_epochs where we assign a small learning rate and less regularizations when approach to certain epochs. In this example, we give lr_step_epochs='10,20' which means the learning rate changes slower when approach to 10th and 20th epoch, so the fine-tune procedure can converge the network and learn from the provided new samples. A similar thought is applied to the data augmentations where fine tune is given less augmentation. This technique is described in Mu’s thesis

This pipeline is not limited to ResNet-152 pretrained model. Please experiment the fine tune method with other models, like ResNet 101, Inception, from MXNet’s model zoo by following this tutorial and this sample code . Please feel free submit issues and/or pull requests and/or discuss on the Kaggle forum if have better results.



Deep learning for hackers with MXnet (1) GPU installation and MNIST


I am going to have a series of blogs about implementing deep learning models and algorithms with MXnet. The topic list covers MNIST, LSTM/RNN, image recognition, neural artstyle image generation etc. Everything here is about programing deep learning (a.k.a. deep learning for hackers), instead of theoritical tutorials, so basic knowledge of machine learning and neural network is a prerequisite. I assume readers know how neural network works and what backpropagation is. If difficulties, please review Andew Ng’s coursera class Week 4:

Surely, this blog doesn’t cover everything about deep learning. It is very important to understand the fundamental deep learning knowledge. For readers who want to know in-depth theoritical deep learning knowledge, please read some good tutorials, for example,

MXnet: lightweight, distributed, portable deep learning toolkit

MXnet is a deep learning toolkit written in C++11, and it comes with DMLC (Distributed (Deep) Machine Learning Common You might have known MXnet’s famous DMLC-sibling xgboost, a parallel gradient boosting decision tree which dominates most Kaggle competitions and is generally used in many projects.

MXnet is very lightweight, dynamic, portable, easy to distribute, memory efficient, and one of the coolest features is, it can run on portable devices (e.g. image recognition on your Android phone ) MXnet also has clear design plus clean C++11 code, let go star and fork it on github:

Recently MXnet has received much attention in multiple conferences and blogs for its unique features of speed and efficient memory usage. Professionals are comparing MXnet with Caffe, Torch7 and Google’s TensorFlow. These benchmarks show that MXnet is a new rising star. Go check this recent tweet from Quora’s Xavier Amatriain:

Install MXnet with GPU

MXnet natively supports multiple platforms (Linux, Mac OS X and Windows) and multiple languages (C++, Java, Python, R and Julia, plus a recent support on javascript, no joking MXnet.js). In this tutorial, we use Ubuntu 14.04 LTS and Python for example. Just a reminder that, since we use CUDA for GPU computing and CUDA hasn’t yet support ubuntu 15.10 or newer (with gcc 5.2), let’s stay with 14.04 LTS, or, at latest 15.04.

The installation can be done on physical machines with nVidia CUDA GPUs or cloud instance, for example AWS GPU instance g2.2xlarge or g2.8xlarge. The following steps mostly come from the official installation guide, with some CUDA modification.

Please note: for installing CUDA on AWS from scratch, some additional steps are needed for updating linux-image-extra-virtual and disabling nouveau, for more details, please refer to Caffe’s guide:,-CUDA-7,-cuDNN)

Install dependency

MXnet only needs minimal dependency: gcc, BLAS, and OpenCV (optional), that is it. One can install git just in case it hasn’t been installed.

sudo apt-get update
sudo apt-get install -y build-essential git libblas-dev libopencv-dev

Clone mxnet

git clone --recursive

Just another reminder that --recursive is needed: MXnet depends on DMLC common packages mshadow, ps-lite and dmlc-core, where --recursive can clone all necessary ones. Please don’t compile now, and we need to install CUDA firstly.

Install CUDA

CUDA installation here is universal for other deep learning packages. Please go to for selecting the CUDA installation for the corresponding system. For example, installing CUDA for Ubuntu 14.04 should looks like this, and deb(network) is suggested for fastest downloading from the closest Ubuntu source.


Or, here it is the command-line-only solution:

sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda

If everything goes well, please check the video card status by nvidia-smi, and it should look like this:


CPU info may vary, and I am using a GTX 960 4GB (approximately 200$ now). MXnet has very efficient memory usage, and 4GB is good for most of the problems. If your video card has only 2GB, MXnet is fine with it with some small parameter tunes too.

Optional: CuDNN. Mxnet supports cuDNN too. cuDNN is nVidia deep learning toolkit which optimizes operations like convolution, poolings etc, for better speed and memory usage. Usually it can speed up MXnet by 40% to 50%. If interested, please go apply for the developer program here, and install cuDNN by the official instruction when approved,

Compile MXnet with CUDA support

MXnet needs to turn on CUDA support in the configuration. Please find from mxnet/make/, copy to mxnet/, and edit these three lines:

USE_CUDA_PATH = /usr/local/cuda
USE_BLAS = blas

where the second line is for CUDA installation path. The path usually is /usr/local/cuda or /usr/local/cuda-7.5. If readers prefer other BLAS implementations. e.g. OpenBlas or Atlas, please change USE_BLAS to openblas or atlas and add the blas path to ADD_LDFLAGS and ADD_CFLAGS.

We can compile MXnet with CUDA (-j4 for multi-thread compiling):

make -j4

One more reminder that, if one has non-CUDA video cards, for example Intel Iris or AMD R9, or there is not video card, please change USE_CUDA to 0. MXnet is dynamic for switching between CPU and GPU: instead of GPU version, one can compile multi-theading CPU version by setting USE_OPENMP = 1 or leave it to 0 so BLAS can take care of multi-threading, either way is fine with MXnet.

Install Python support

MXnet natively supports Python, one can simply do:

cd python; python install

Python 2.7 is suggested while Python 3.4 is also supported. One might need setuptools and numpy if not yet installed. I personally suggest Python from Anaconda or Miniconda

(answer some installation questions)
conda install numpy

Let’s run MNIST, a handwritten digit recognizer

Now we have a GPU-ready MXnet, let’s have the first deep learning example: MNIST. MNIST is a handwritten digit dataset with 60,000 training samples and 10,000 testing samples, where each sample is a 28X28 greyscale picture of digits, and the goal of MNIST is training a smart machine learning model for recognizing hand-writing digits, for example, it recognizes the zip code that people write on envelops and helps our post masters distribute the mails.

Let’s run the example:

cd mxnet/example/image-classification/
python will download MNIST dataset for the first time, please be patient while downloading. The sample code will print out the loss and precision for each step, and the final result should be approximately 97%. by default uses CPU only. MXnet has easy switch between CPU and GPU. Since we have GPU, let’s turn it on by:

python --gpus 0

That is it. --gpus 0 means using the first GPU. If one has multiple GPUs, for example 4 GPUs, one can set --gpus 0,1,2,3 for using all of them. While running with GPU, the nvidia-smi should look like this:


where one can see python is using GPU. Since MNIST is not a heavy task, with MXnet efficient GPU meomory usage, GPU usage is about 30-40% while memory usage at 67MB。

Trouble shooting

When run with GPU for the first time, readers may encounter something like this:

ImportError: cannot open shared object file: No such file 

It is because of the PATH of CUDA dynamic link lib, one can add this to ./bashrc

export LD_LIBRARY_PATH=/usr/local/cuda-7.5/targets/x86_64-linux/lib/:$LD_LIBRARY_PATH

Or compile it to MXnet by adding in

ADD_LDFLAGS = -I/usr/local/cuda-7.5/targets/x86_64-linux/lib/
ADD_CFLAGS =-I/usr/local/cuda-7.5/targets/x86_64-linux/lib/

MNIST code secrete revealed: design a simple MLP

In, there is an function get_mlp(). It implements a multilayer perceptron (MLP). In MXnet, a MLP needs some structure definition, like this in the code:

data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(data = fc1, name='relu1', act_type="relu")
fc2 = mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = 64)
act2 = mx.symbol.Activation(data = fc2, name='relu2', act_type="relu")
fc3 = mx.symbol.FullyConnected(data = act2, name='fc3', num_hidden=10)
mlp = mx.symbol.Softmax(data = fc3, name = 'mlp')

Let’s understand what is going on for this neural network. Samples in MNIST look like these:

  • Each same (a digit) is a 28X28 pixel grey scale image, which can be represented as a vector of 28X28=784 float value where each value is the grey scale of the pixel.
  • In MLP, each layer needs a layer structure. For example in the first layer, fc1 is a full connected layer mx.symbol.FullyConnected which takes input from data. This first layer has 128 nodes, defined as num_hidden.
  • Each layer also need an activation function Activation to connect to the next layer, in other words, transferring values from the current layer to the next layer. In this example, the Activation function for connecting layer fc1 and fc2 is ReLu, which is short for rectified linear unit or Rectifier, a function as f(x)=max(0,x). ReLU is a very commonly used activation function in deep learning, mostly because it is easy to calculate and easy to converge in gradient decent. In addition, we choose ReLU for the MNIST problem because MNIST has sparse feature where most values are 0. For more information about ReLU, please check wikipedia ) and other deep learning books or tutorials.
  • The second layer fc2 is similar to the first layer fc1: it takes the output from fc1 as input, and output to the third layer fc3.
  • fc3 works similar to the previous layer, and the only difference is that, it is an output layer, which has 10 nodes where each node outputs the probability of being one of the 10 digits, thus num_hidden=10.

With this network structure, MXnet also needs the stucture of input. Since each sample is 28X28 grey scale, MXnet takes the grey scale value vector of 28X28=784 elements, and give a python iterator get_iterator() for feeding data to the network defined above. The detailed code is in the example which is very clean, so I don’t copy-n-paste here.

The final step is running the model. If readers know scikit-learn, MXnet’s python looks very familiar, right?, net, get_iterator)

Congratulations! We can implement a MLP! It is the first step of deep learning, not hard, right?

It is Q&A time now. Some of my dear readers may ask, “Do I need to design my MNIST recognizer or some other deep learning network exactly like what you did here?” “hey, I see another function get_lenet() in the code, what is that?”

The answer to these two questions can be, most real life problems on deep learning are about designing the networks, and, no, you don’t have to design your network exactly like what I did here. Designing the neural network is art, and each problem needs a different network, and a single problem may have multiple solutions. In the MNIST example code, get_lenet() implements Yann Lecun’s convolution network LeNet for digit recognition, where each layer needs Convolution Activation and Pooling where the kernel size and filter are needed, instead of FullyConnected and ReLU. FYI: the detailed explanation of super cool convolution network (ConvNet) can be found at Yann Lecun’s tutorial: . Another good reference can be “Understanding ConvNet” by Colah. I may later on write a blog post for explaining ConvNet, since convnet is my personal favorite.

I have a fun homework for my dear readers. Let’s tune up some network parameters like the number of nodes and the activation function in get_mlp(), and see how it helps the precision and accuracy. One can also try changing num_epoch (number of iterations of learning from the data) and learning_rate (the speed of gradient decent, a.k.a the rate of learning) for better speed and precision. Please leave your comment for your network structure and precision score. Kaggle also has a MNIST competition, one can also go compete with mxnet MNIST models, please mention it is MXnet model. The portal:


Thanks for reading the first blog of “Deep learning for hackers with MXnet”. The following blogs will include some examples in MXnet, which may include RNN/LSTM for generating a Shakespeare script (well, looks like Shakespeare), generative models of simulating Van Gogh for painting a cat, etc. Some of these models are available now on MXnet github Is MXnet cool? Go star and fork it on github