
(Every blog should have a cat, right? This cat is “Niba”, and she is Pogo’s younger sister. The neural art style is “Girl with a Pearl Earring”. For more about neural art with MXnet, please refer to my last blog.)
By popular request, I wrote this post on starting an Amazon AWS GPU instance and installing MXnet for Kaggle competitions, such as the Second Annual Data Science Bowl. Installing CUDA on AWS is somewhat tricky: one needs to update the kernel and resolve some conflicts. Special thanks to the Caffe EC2 installation guide: https://github.com/BVLC/caffe/wiki/Install-Caffe-on-EC2-from-scratch-(Ubuntu,-CUDA-7,-cuDNN)
Have you clicked “star” on MXnet’s github repo? If not yet, do it now: https://github.com/dmlc/mxnet
Create AWS GPU instance
To create an AWS instance, please follow http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html. An AWS spot instance can be an inexpensive option for competing on Kaggle, and one can request a g2.2xlarge (single GPU) or a g2.8xlarge (4 GPUs) instance from the AWS console. At the time of writing, the price was about $0.08/h for g2.2xlarge and $0.34/h for g2.8xlarge. Once the request is approved, one can start an official Ubuntu 14.04 instance and begin installing MXnet. One can also save the finished setup as an AMI image for future use.
Install dependencies
Preparation
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install -y build-essential git libcurl4-openssl-dev libatlas-base-dev libopencv-dev python-numpy unzip
Reference: http://mxnt.ml/en/latest/build.html
Update Linux kernel on AWS
sudo apt-get install linux-image-extra-virtual
“Important: While installing the linux-image-extra-virtual, you may be prompted ‘What would you like to do about menu.lst?’ I selected ‘keep the local version currently installed’”, as noted in the Caffe reference.
Disable nouveau
nouveau conflicts with NVIDIA’s kernel module on AWS. To disable it, open /etc/modprobe.d/blacklist-nouveau.conf in an editor:
sudo vi /etc/modprobe.d/blacklist-nouveau.conf
and add the following lines to the file:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
Then disable kernel mode setting for nouveau, rebuild the initramfs, and reboot:
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
sudo reboot
Wait for the reboot to finish (usually about 1 minute), then log back in to the instance to continue the installation:
sudo apt-get install -y linux-source linux-headers-`uname -r`
Reference: https://github.com/BVLC/caffe/wiki/Install-Caffe-on-EC2-from-scratch-(Ubuntu,-CUDA-7,-cuDNN)
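After logging back in, one may want to confirm that nouveau is really gone before installing CUDA. The snippet here is only a sketch: module_absent is a hypothetical helper name (not part of any tool), and it simply scans lsmod-style output for a module name in the first column.

```shell
# Hypothetical helper: succeeds (exit 0) when the module named in $1
# does NOT appear in the first column of lsmod-style output on stdin.
module_absent() {
    if awk '{ print $1 }' | grep -qx -- "$1"; then
        return 1    # module is listed, so it is still loaded
    fi
    return 0
}

# Typical use on the instance:
#   lsmod | module_absent nouveau && echo "nouveau is not loaded"
```

If the check fails, it is probably worth re-reading the blacklist file and re-running update-initramfs before going further.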
Install CUDA
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda
Important: Please reboot the instance to load the driver:
sudo reboot
If everything is fine, running nvidia-smi should list every GPU on the instance (for example, 4 GPUs on a g2.8xlarge).
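For a quick scripted check, one can count the devices that the driver reports. This is a minimal sketch: count_gpus is a hypothetical helper name, and it assumes the usual `nvidia-smi -L` listing format with one line per device starting with "GPU <index>:".

```shell
# Hypothetical helper: count devices in `nvidia-smi -L` style output
# read from stdin (one line per GPU, each starting with "GPU <index>:").
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}

# Typical use on the instance:
#   nvidia-smi -L | count_gpus
```

One would expect 1 on a g2.2xlarge and 4 on a g2.8xlarge; a count of 0 suggests the driver did not load.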

Optional: cuDNN
One can apply for the developer program here https://developer.nvidia.com/cudnn. When approved, download cuDNN for Linux (either v4 RC or v3 is fine), upload the cuDNN package from the local computer to the instance, and install cuDNN:
tar -zxf cudnn-7.0-linux-x64-v4.0-rc.tgz #or cudnn-7.0-linux-x64-v3.0-prod.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/
Reference: https://no2147483647.wordpress.com/2015/12/07/deep-learning-for-hackers-with-mxnet-1/
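To double-check which cuDNN release ended up under /usr/local/cuda, one can read the version macros from the copied header. This is a sketch under an assumption: that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain `#define` lines, which may not hold for every release. cudnn_version is a hypothetical helper name.

```shell
# Hypothetical helper: print "major.minor.patch" from the version macros
# in the cudnn.h file given as $1.
# Assumes lines of the form "#define CUDNN_MAJOR 4" exist in the header.
cudnn_version() {
    awk '$1 == "#define" && $2 ~ /^CUDNN_(MAJOR|MINOR|PATCHLEVEL)$/ { v[$2] = $3 }
         END { print v["CUDNN_MAJOR"] "." v["CUDNN_MINOR"] "." v["CUDNN_PATCHLEVEL"] }' "$1"
}

# Typical use on the instance:
#   cudnn_version /usr/local/cuda/include/cudnn.h
```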
Install MXnet
git clone --recursive https://github.com/dmlc/mxnet
cd mxnet; cp make/config.mk .
#add CUDA options
echo "USE_CUDA=1" >>config.mk
echo "USE_CUDA_PATH=/usr/local/cuda" >>config.mk
#if you have cuDNN, uncomment the following line
#echo "USE_CUDNN=1" >>config.mk
echo "USE_BLAS=atlas" >> config.mk
echo "USE_DIST_KVSTORE = 1" >>config.mk
echo "USE_S3=1" >>config.mk
make -j8
Reference: http://mxnt.ml/en/latest/build.html
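The echo commands above append to config.mk instead of editing it, so the file may end up with two assignments for the same variable. make uses the last one, so the appended values win. The sketch here prints the value make would see for a given variable; effective_flag is a hypothetical helper name, and it only handles plain "NAME = value" or "NAME=value" lines (not ":=" or "+=").

```shell
# Hypothetical helper: print the value of the LAST plain assignment of
# variable $1 in the makefile fragment $2 (the one make would use).
effective_flag() {
    sed -n "s/^[[:space:]]*${1}[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Typical use from the mxnet directory:
#   effective_flag USE_CUDA config.mk
#   effective_flag USE_CUDA_PATH config.mk
```

If USE_CUDA does not come back as 1, the build will most likely be CPU-only, and it is cheaper to notice that before the long make step than after.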
Add the library paths
echo "export LD_LIBRARY_PATH=/home/ubuntu/mxnet/lib/:/usr/local/cuda-7.5/targets/x86_64-linux/lib/" >> ~/.bashrc
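The new path only takes effect in a new shell or after running `source ~/.bashrc`. To see whether the dynamic linker can then find everything MXnet needs, one can filter the output of ldd for unresolved entries. This is a sketch: missing_libs is a hypothetical helper name, and the location of libmxnet.so in the usage line is an assumption based on the clone path used in this post.

```shell
# Hypothetical helper: read ldd-style output on stdin and print the
# names of shared libraries that the linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Typical use on the instance (path assumed from the clone location):
#   ldd /home/ubuntu/mxnet/lib/libmxnet.so | missing_libs
```

Empty output means every dependency was resolved; a name such as libcudart.so.7.5 in the output points to a directory that is still missing from LD_LIBRARY_PATH.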
Install python package
One can use either the system’s Python or Miniconda/Anaconda Python, as mentioned in my previous blog. If using the system’s Python, do:
sudo apt-get install -y python-pip
cd python
python setup.py install --user
Test it
#go back to the mxnet root directory first
cd ..
python example/image-classification/train_mnist.py --network lenet --gpus 0
One can also pass --gpus 0,1,2,3 to use all 4 GPUs when running on a g2.8xlarge (4 GPUs) instance. Enjoy Kaggle competitions with MXnet!
Troubleshooting
One may see this annoying message when starting training on a GPU:
libdc1394 error: Failed to initialize libdc1394
This is an OpenCV issue on AWS. One can suppress the message with:
sudo ln /dev/null /dev/raw1394
I am seeing the following error:
g++: error: dmlc-core/libdmlc.a: No such file or directory
Thank you for the pointer on the last issue I posted. Your solution worked, but now I am wondering why this error occurs. Any ideas?
started all over and ran into an OSError this time around:
OSError: libcudart.so.7.5: cannot open shared object file: No such file or directory
I’m wondering if this has something to do with the link lib path?
I ran into the same error.
This post mentions a similar error.
https://no2147483647.wordpress.com/2015/12/07/deep-learning-for-hackers-with-mxnet-1/
So I tried the following command.
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/targets/x86_64-linux/lib/:$LD_LIBRARY_PATH
Then, I tested this,
python example/image-classification/train_mnist.py --network lenet --gpus 0
and achieved Validation-accuracy=0.992188.
Hope this helps.
Good guide! Not just for AWS, but also for my gaming laptop!
The MXNet people need to give your guide more attention! It helped me a lot!
Oh, thanks. Actually, I am working with the MXNet team, and this tutorial is included in the `awesome mxnet` page https://github.com/dmlc/mxnet/tree/master/example
Please add ‘export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH’ to your awesome blog entry.
Thanks. I thought CUDA installation should have done it.