Setup Amazon AWS GPU instance with MXnet

Niba Meisje met de parel

(Every blog should have a cat, right? This cat is “Niba”, and she is Pogo’s younger sister. The neural art style is “Girl with a Pearl Earring”. For more about neural art with MXnet, please refer to my last blog.)

By popular request, I wrote this post on starting an Amazon AWS GPU instance and installing MXnet for Kaggle competitions, like the Second Annual Data Science Bowl. Installing CUDA on AWS is kind of tricky: one needs to update the kernel and resolve some conflicts. Special thanks to the Caffe EC2 installation guide.

Have you clicked “star” on MXnet’s github repo? If not yet, do it now:

Create AWS GPU instance

To create an AWS instance, please follow the AWS documentation. A spot instance can be an inexpensive option for competing on Kaggle, and one can request a g2.2xlarge (single GPU) or a g2.8xlarge (4 GPUs) instance from the AWS console. The price when I wrote this post was about $0.08/h for g2.2xlarge and $0.34/h for g2.8xlarge. Once the request is approved, one can start an official Ubuntu 14.04 instance and begin installing MXnet. One can save the result as an AMI image for future use.
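
To get a feel for the budget, here is a minimal sketch that multiplies hours by the hourly price. The helper name spot_cost is my own, and the prices are just the ones quoted above; spot prices change over time, so treat the numbers as an illustration only.

```shell
# spot_cost HOURS PRICE_PER_HOUR
# Prints the estimated cost in dollars, rounded to two decimals.
spot_cost() {
    awk -v h="$1" -v p="$2" 'BEGIN { printf "%.2f\n", h * p }'
}

# A 24-hour run at the prices quoted in this post:
spot_cost 24 0.08   # g2.2xlarge, prints 1.92
spot_cost 24 0.34   # g2.8xlarge, prints 8.16
```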

Install dependencies


sudo apt-get update && sudo apt-get upgrade
sudo apt-get install -y build-essential git libcurl4-openssl-dev libatlas-base-dev libopencv-dev python-numpy unzip
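
After the packages are installed, a quick sanity check can save some time later. The sketch below only reports which commands are missing from PATH; the helper name missing_cmds and the list of commands passed to it are my own choice, not something required by MXnet.

```shell
# missing_cmds CMD...
# Prints each command name that cannot be found on PATH, one per line.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Tools the later steps rely on; no output means everything was found.
missing_cmds git make g++ unzip
```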


Update Linux kernel on AWS

sudo apt-get install linux-image-extra-virtual

Important: while installing linux-image-extra-virtual, you may be prompted “What would you like to do about menu.lst?” As in the Caffe reference, I selected “keep the local version currently installed”.

Disable nouveau

nouveau conflicts with NVIDIA’s kernel module on AWS. One needs to edit /etc/modprobe.d/blacklist-nouveau.conf and disable nouveau.

sudo vi /etc/modprobe.d/blacklist-nouveau.conf

    blacklist nouveau
    blacklist lbm-nouveau
    options nouveau modeset=0
    alias nouveau off
    alias lbm-nouveau off
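
If one prefers not to use an interactive editor, the same five lines can be written from a script. This is a sketch: the function name write_nouveau_blacklist and the scratch path under /tmp are my own assumptions.

```shell
# write_nouveau_blacklist FILE
# Writes the five blacklist lines shown above to FILE, replacing any old content.
write_nouveau_blacklist() {
    cat > "$1" <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
}

# On the instance, write to a scratch file first and install it with sudo:
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo cp /tmp/blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf
```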

echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
sudo reboot

Wait for the reboot to finish (usually about 1 min), then log back in to the instance to continue the installation:

sudo apt-get install -y linux-source linux-headers-`uname -r`
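
Before installing CUDA, it may be worth confirming that nouveau really stayed out after the reboot. The sketch below scans lsmod-style output for a module name; the helper name module_loaded is hypothetical.

```shell
# module_loaded NAME
# Reads `lsmod`-style text on stdin; succeeds if NAME appears in the first column.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# On the instance:
#   if lsmod | module_loaded nouveau; then echo "nouveau is still loaded"; fi
```

The helper reads from stdin instead of calling lsmod itself, so it can be tried on saved output as well.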


Install CUDA

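# cuda-repo-ubuntu1404_7.5-18_amd64.deb must already be downloaded from NVIDIA into the current directory
sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb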
sudo apt-get update
sudo apt-get install -y cuda

Important: please reboot the instance to load the driver.

sudo reboot

If everything is fine, nvidia-smi should list the GPUs. For example, on a 4 GPU instance:

[Screenshot: nvidia-smi output, 2016-01-15]
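
The same check can be scripted. The sketch below counts the lines printed by nvidia-smi -L, assuming each GPU is reported on a line that starts with "GPU " followed by its index; the helper name count_gpus is my own.

```shell
# count_gpus
# Reads `nvidia-smi -L` output on stdin and prints how many lines start with "GPU ".
count_gpus() {
    awk '/^GPU / { n++ } END { print n + 0 }'
}

# On the instance (expect 1 on g2.2xlarge and 4 on g2.8xlarge):
#   nvidia-smi -L | count_gpus
```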

Optional: cuDNN

One can apply for NVIDIA's developer program to get cuDNN. When approved, download cuDNN for Linux (either v4 RC or v3 is fine), upload the package from the local computer to the instance, and install it:

tar -zxf cudnn-7.0-linux-x64-v4.0-rc.tgz #or cudnn-7.0-linux-x64-v3.0-prod.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/
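
To double check which cuDNN version ended up in the CUDA directory, one can read the version macros from the header. This is a sketch under an assumption: that cudnn.h contains #define lines for CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. I have not checked every release, so please look at your own copy of the header. The helper name cudnn_version is hypothetical.

```shell
# cudnn_version HEADER
# Prints MAJOR.MINOR.PATCHLEVEL taken from the #define lines in HEADER.
# Assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}

# On the instance:
#   cudnn_version /usr/local/cuda/include/cudnn.h
```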


Install MXnet

git clone --recursive
cd mxnet; cp make/ .

#add CUDA options
echo "USE_CUDA=1" >>
echo "USE_CUDA_PATH=/usr/local/cuda" >>
#if you have cuDNN, uncomment the following line
#echo "USE_CUDNN=1" >>
echo "USE_BLAS=atlas" >>
echo "USE_DIST_KVSTORE = 1" >>
echo "USE_S3=1" >>
make -j8
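
Appending options with echo works, but running the same lines twice leaves duplicate entries in the file. As a sketch, the helper below removes an earlier assignment of the same key before appending the new one. The name set_make_option is hypothetical, and the file to edit is passed as the first argument (it is whichever build config file the options above are appended to).

```shell
# set_make_option FILE KEY VALUE
# Drops any existing "KEY =" assignment from FILE, then appends KEY=VALUE.
set_make_option() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp)
    # keep every line that is not an assignment to exactly KEY
    awk -v k="$key" '{ line = $0; sub(/^[ \t]+/, "", line) }
        !(index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/)' "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example, with CONFIG standing for the build config file:
#   set_make_option "$CONFIG" USE_CUDA 1
#   set_make_option "$CONFIG" USE_CUDA_PATH /usr/local/cuda
```

The key is matched as a whole name, so setting USE_CUDA does not touch a USE_CUDA_PATH line.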


Add the library paths

echo "export LD_LIBRARY_PATH=/home/ubuntu/mxnet/lib/:/usr/local/cuda-7.5/targets/x86_64-linux/lib/" >> ~/.bashrc
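
If the setup is scripted and may run more than once, the export line would be appended to ~/.bashrc again each time. A minimal sketch to avoid that, with a hypothetical helper named append_once:

```shell
# append_once LINE FILE
# Appends LINE to FILE unless an identical line is already present.
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# On the instance:
#   append_once 'export LD_LIBRARY_PATH=/home/ubuntu/mxnet/lib/:/usr/local/cuda-7.5/targets/x86_64-linux/lib/' ~/.bashrc
```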

Install python package

One can use either the system’s Python or Miniconda/Anaconda Python, as mentioned in my previous blog. If using the system’s Python, do:

sudo apt-get install -y python-pip
cd python
python install --user
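
A quick way to see whether the package is visible to Python is to try importing it. The sketch below wraps that in a small helper; the name py_can_import is my own, and the module name mxnet is the one the Python package is expected to install under.

```shell
# py_can_import PYTHON MODULE
# Succeeds if the interpreter PYTHON can import MODULE.
py_can_import() {
    "$1" -c "import $2" >/dev/null 2>&1
}

# On the instance:
#   if py_can_import python mxnet; then echo "mxnet is importable"; fi
```

If the import fails, the library path from the previous step is the first thing to check.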

Test it

python example/image-classification/ --network lenet --gpus 0

One can also pass --gpus 0,1,2,3 to use all 4 GPUs when running on a g2.8xlarge (4 GPUs) instance. Enjoy Kaggle competitions with MXnet!
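
The value for --gpus is just a comma-separated list of device ids starting at 0, so it can be generated from the number of GPUs. A small sketch, with a hypothetical helper named gpu_list:

```shell
# gpu_list N
# Prints the ids 0..N-1 joined by commas, suitable for --gpus.
gpu_list() {
    i=0
    out=""
    while [ "$i" -lt "$1" ]; do
        out="${out:+$out,}$i"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

gpu_list 1   # prints 0
gpu_list 4   # prints 0,1,2,3
```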

Troubleshooting

One may see an annoying message when starting training with a GPU:

libdc1394 error: Failed to initialize libdc1394

It is an OpenCV problem on AWS. One can simply silence it by:

sudo ln /dev/null /dev/raw1394
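
I assume the link does not survive a reboot, since /dev is normally recreated at boot, so the command may need to be run again. The sketch below only creates the link when it is missing, which makes it safe to call repeatedly; the helper name ensure_hard_link is hypothetical.

```shell
# ensure_hard_link TARGET LINK
# Creates LINK as a hard link to TARGET unless LINK already exists.
ensure_hard_link() {
    [ -e "$2" ] || ln "$1" "$2"
}

# On the instance, the same guard in one line with root rights:
#   sudo sh -c '[ -e /dev/raw1394 ] || ln /dev/null /dev/raw1394'
```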



10 thoughts on “Setup Amazon AWS GPU instance with MXnet”

  1. I am seeing the following error:
    g++: error: dmlc-core/libdmlc.a: No such file or directory
    Thank you for the pointer on the last issue I posted. Your solution worked, but now I am wondering why this error appears. Any ideas?

  2. started all over and ran into an OSError this time around:
    OSError: cannot open shared object file: No such file or directory

    I’m wondering if this has something to do with the link lib path?
