This post was originally published in Chinese on my zhihu.com column.

It has been a long time since my last post, and thanks for coming back. Here I want to present an MXNet example of instant neural art style transfer, from which you can build your own Prisma app.
Did you know that MXNet can now be installed via pip?

```
$ pip search mxnet
mxnet-cu75 (0.9.3a3)  - MXNet is an ultra-scalable deep learning framework. This version uses CUDA-7.5.
mxnet (0.9.3a3)       - MXNet is an ultra-scalable deep learning framework. This version uses openblas.
mxnet-cu80 (0.9.3a3)  - MXNet is an ultra-scalable deep learning framework. This version uses CUDA-8.0.
```

After installing MXNet, please do

```
git clone https://github.com/zhaw/neural_style
```
which includes three different implementations of fast neural style transfer. Big thanks to the author, Zhao Wei. In this post, I am going to talk about the Perceptual Losses method by Justin Johnson et al., described in this paper. After the git clone, please go to neural_style/perceptual/ and execute the following script:

```python
import make_image

maker = make_image.Maker('models/s4', (512, 512))
maker.generate('output.jpg', 'niba.jpg')
```
output.jpg is the output, and niba.jpg is a picture of the cutest deep learning cat, Niba. Within a blink, we can see an output like this:
Besides this art style, multiple other pretrained neural style models are mentioned on the README page under neural_style/perceptual/; please download them via the links mentioned there. Each pretrained model produces its own artwork, which you can combine with

```
montage output*.jpg -geometry +7+7+7 merge.jpg
```
Please note: some machines may encounter the following error:

```
terminate called after throwing an instance of 'dmlc::Error'
  what(): [21:25:23] src/engine/./threaded_engine.h:306: [21:25:23] src/operator/./convolution-inl.h:299: Check failed: (param_.workspace) >= (required_size)
```
The reason behind this is the workspace size of the convolution layers: the default workspace may be too small for large images. Please edit symbol.py by adding workspace=4092 to each convolution layer.
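For illustration only, such an edit might look like the fragment below (the layer arguments here are made up; only the added `workspace` argument matters, which enlarges the temporary buffer MXNet may use for convolution):

```python
# in symbol.py -- hypothetical convolution definition
conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                          pad=(1, 1), workspace=4092)  # added workspace
```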
I hope you have some fun with your own Prisma app 🙂
Neural style transfer has been a hot topic in deep learning, and it started from the paper A Neural Algorithm of Artistic Style. As we discussed in the last post, the idea leverages the power of convolutional networks, whose high-level features can describe the so-called style of an image; if you apply these high-level features to a new image, you can transfer the art style and generate a new piece of artwork. In the original paper, the Gram matrix is used for this magic. To understand the Gram matrix magic, take a look at my friend's paper Demystifying Neural Style Transfer. Many blogs and papers try to explain why neural style transfer works, and this paper is probably the only correct one.
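As a quick sketch of what the Gram matrix computes (a toy NumPy illustration, not code from the papers above): for a feature map with C channels, it is the C×C matrix of inner products between flattened channel activations, capturing which feature channels co-activate — the statistic used as a style descriptor.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map with shape (C, H, W).

    Entry (i, j) is the inner product of channel i and channel j,
    normalized by the feature map size.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # flatten spatial dimensions
    return flat @ flat.T / (c * h * w)  # (C, C) style descriptor

feats = np.random.rand(8, 16, 16)  # pretend VGG-style feature map
g = gram_matrix(feats)
print(g.shape)  # (8, 8)
```

Note that the spatial layout is discarded by the flattening, which is why the Gram matrix describes texture-like style rather than content.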
Back to the original neural style transfer: the original version calculates a per-pixel loss from the content image to the style image and introduces a very large Gram matrix; meanwhile, it has to run a logistic regression to tune the weight of each layer. This method needs a lot of computing time due to the combined heavy load of the per-pixel loss, the Gram matrix, and the regression. There are several faster implementations around, of which the Perceptual Losses method is one of the fastest.
Perceptual Losses introduces a loss network pretrained on ImageNet and reuses the content loss and style loss to calculate a perceptual loss; however, it never updates the loss network, which saves much computing time. It works like this: when the input image (e.g. Niba) is given to the transform network, the loss is calculated through the pretrained loss network and propagated back to the transform network to minimize the loss, so the transform network learns the style from the loss network by minimizing that loss.
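The training loop above can be sketched in miniature (plain NumPy, not the repo's actual code): here the loss network is a frozen random linear feature extractor standing in for the pretrained ImageNet network, and the transform network is a single trainable linear map; only the transform parameters receive gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "loss network": a fixed random linear feature extractor
# standing in for the pretrained ImageNet network.
W_loss = rng.normal(size=(4, 8))

def features(x):
    return W_loss @ x  # never updated during training

# "Transform network": a single trainable linear map (toy stand-in).
W_t = rng.normal(size=(8, 8))

x = rng.normal(size=8)                 # toy "input image"
target = features(rng.normal(size=8))  # target style/content features

lr = 0.001
for _ in range(50000):
    y = W_t @ x                   # transformed image
    diff = features(y) - target   # feature-space ("perceptual") error
    # Gradient of 0.5 * ||diff||^2 with respect to W_t only;
    # W_loss receives no update, mirroring the frozen loss network.
    W_t -= lr * np.outer(W_loss.T @ diff, x)

loss = 0.5 * np.sum((features(W_t @ x) - target) ** 2)
```

The key structural point is that gradients flow through `features` but only `W_t` is changed, which is exactly why the method is fast: the expensive pretrained network is evaluated, never retrained.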
The perceptual loss method needs a set of pretrained networks, one network per style. You can follow train.py in the same repo to create new styles.
Why did I pause updating this blog for such a long time, and why resume now? Because I was carefully thinking about how to teach deep learning in a different way, much different from many other blogs or Medium posts where each tutorial starts with theory, math, or other fundamentals and needs at least 30 minutes of reading time: professionals don't like the repeated fundamentals since they already know them, while new readers still can't figure out what to do.
I believe the only way readers can remember the knowledge is to JUST DO IT! Last year, I opened my column on zhihu.com and started publishing two-minute demos of deep learning in Chinese. It turned out to be very welcome: my 2000+ followers had much fun trying these demos, and they really learned by doing them and then reading the theory part. If they were missing some math background, I showed them where to learn it. So I thought: why not translate the demos back into English and share them with more readers? I will keep posting more entries like this; I hope you like them.
And, as always, have you clicked the fork button on the MXNet repo https://github.com/dmlc/mxnet ?