Setup Amazon AWS GPU instance with MXnet

Niba Meisje met de parel

(every blog should have a cat, right? This cat is “Niba”, and she is Pogo’s younger sister. The neural art style is “Girl with a Pearl Earring“. More about Neural Art with MXnet, please refer to my last blog.)

With popular requests, I wrote this blog for starting an Amazon AWS GPU instance and install MXnet for kaggle competitions, like Second Annual Data Science Bowl. Installing CUDA on AWS is kind of tricky: one needs to update kernels and solve some conflicts. I want to have special thanks to Caffe EC2 installation guide https://github.com/BVLC/caffe/wiki/Install-Caffe-on-EC2-from-scratch-(Ubuntu,-CUDA-7,-cuDNN)

Have you clicked “star” on MXnet’s github repo? If not yet, do it now: https://github.com/dmlc/mxnet

Create AWS GPU instance

For creating an AWS instance, please follow http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html. AWS spot instance can be an inexpensive solution for competing on Kaggle, and one can request a g2.2xlarge (single GPU) or a g2.8xlarge (4x GPUs) instance from the AWS console. The price when I wrote this blog is about 0.08$/h for g2.2xlarge and 0.34$/h for g2.8xlarge. Once approved, one can start an official Ubuntu 14.04 instance and start installing MXnet. One can save this AMI image for future reference.

Install dependencies

Preparation

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install -y build-essential git libcurl4-openssl-dev libatlas-base-dev libopencv-dev python-numpy unzip

Reference: http://mxnt.ml/en/latest/build.html

Update Linux kernel on AWS

sudo apt-get install linux-image-extra-virtual

“Important: While installing the linux-image-extra-virtual, you may be prompted “What would you like to do about menu.lst?” I selected “keep the local version currently installed” ” as on Caffe reference.

Disable nouveau

nouveau conflicts with NVIDIA’s kernel module on AWS. One needs to edit /etc/modprobe.d/blacklist-nouveau.conf and disable nouveau.

sudo vi /etc/modprobe.d/blacklist-nouveau.conf

    blacklist nouveau
    blacklist lbm-nouveau
    options nouveau modeset=0
    alias nouveau off
    alias lbm-nouveau off

echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
sudo reboot

Wait before finish rebooting, usually 1 min, and login back to the instance to continue installation:

sudo apt-get install -y linux-source linux-headers-`uname -r`

Reference: https://github.com/BVLC/caffe/wiki/Install-Caffe-on-EC2-from-scratch-(Ubuntu,-CUDA-7,-cuDNN)

Install CUDA

wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda

Important: Please reboot the instance for loading the driver

sudo reboot

If everything is fine, nvidia-smi should look like this (4 GPU instance) for example:

Screen Shot 2016-01-15 at 11.58.02 PM

Optional: cuDNN

One can apply for the developer program here https://developer.nvidia.com/cudnn. When approved, download cuDNN for Linux (either v4 RC or v3 is fine), upload the cuDNN package from the local computer to the instance, and install cuDNN:

tar -zxf cudnn-7.0-linux-x64-v4.0-rc.tgz #or cudnn-7.0-linux-x64-v3.0-prod.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/

Reference: https://no2147483647.wordpress.com/2015/12/07/deep-learning-for-hackers-with-mxnet-1/

Install MXnet

git clone --recursive https://github.com/dmlc/mxnet
cd mxnet; cp make/config.mk .

#add CUDA options
echo "USE_CUDA=1" >>config.mk
echo "USE_CUDA_PATH=/usr/local/cuda" >>config.mk
#if you have cuDNN, uncomment the following line
#echo "USE_CUDNN=1" >>config.mk
echo "USE_BLAS=atlas" >> config.mk
echo "USE_DIST_KVSTORE = 1" >>config.mk
echo "USE_S3=1" >>config.mk
make -j8

Reference: http://mxnt.ml/en/latest/build.html

Add some link lib path

echo "export LD_LIBRARY_PATH=/home/ubuntu/mxnet/lib/:/usr/local/cuda-7.5/targets/x86_64-linux/lib/" >>> ~/.bashrc

Install python package

One can either use the system’s python or Miniconda/Anaconda python as mentioned in my previous blog. If use system’s python, do:

sudo apt-get install -y python-pip
cd python
python setup.py install --user

Test it

python example/image-classification/train_mnist.py --network lenet --gpus 0

One can also give --gpus 0,1,2,3 for using all 4 GPUs, if runs on a g2.8xlarge (4x GPUs) instance. Enjoy Kaggle competitions with MXnet!

Some trouble shooting

One may see some annoying message when starting training with GPU

libdc1394 error: Failed to initialize libdc1394

It is OpenCV problem on AWS. One can simply disable it by:

sudo ln /dev/null /dev/raw1394

Reference: http://stackoverflow.com/questions/12689304/ctypes-error-libdc1394-error-failed-to-initialize-libdc1394

Advertisements