The latest Paris Machine Learning meetup (#12), hosted at Google Paris, was actually held Europe-wide, jointly with the London, Berlin and Zurich ML meetups. Andrew Ng was the guest star, joining from San Francisco via Hangout. You can watch the video of the entire meetup on YouTube, but make yourself comfortable: it’s almost 3h30 long. Here I will only write about Andrew Ng’s talk and the first set of questions he was asked, which is only a bit more than half an hour.
Andrew Ng talked about deep learning, a subject he and his teams have worked on extensively, and which he describes as “our best shot at progress towards real AI”.
In the traditional learning framework, there are 3 steps:
- take an input,
- design & extract features,
- feed a learning algorithm.
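To make the three steps concrete, here is a minimal sketch of that traditional pipeline. Everything in it is a hypothetical illustration: the toy 1-D signals, the hand-picked features (mean, variance, range) and the nearest-centroid classifier standing in for “a learning algorithm” are my own choices, not anything from the talk. The point is that step 2 is done by a human, which is exactly what deep learning tries to remove.

```python
from statistics import mean, pvariance

def extract_features(signal):
    # Step 2: hand-designed features, chosen by a human (hypothetical choice).
    return [mean(signal), pvariance(signal), max(signal) - min(signal)]

def train_nearest_centroid(X, y):
    # Step 3: a very simple learning algorithm, one centroid per class.
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    # Assign the class whose centroid is closest in squared Euclidean distance.
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Step 1: raw inputs (toy signals made up for illustration).
raw = [[0.1, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.1],
       [1.0, -1.0, 1.2, -0.9], [0.9, -1.1, 1.0, -1.0]]
labels = ["quiet", "quiet", "noisy", "noisy"]

X = [extract_features(s) for s in raw]
model = train_nearest_centroid(X, labels)
print(predict(model, extract_features([1.1, -1.0, 0.8, -1.2])))  # → noisy
```

If the hand-picked features miss what matters in the data, no learning algorithm in step 3 can recover it.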
The idea behind deep learning is driven by the “one learning algorithm” hypothesis, i.e. the idea that there might be a single algorithm that could learn to process almost any type of data. Our brain is capable of very impressive rewiring: when the input from one sense is cut off, the corresponding area of the cortex can be reused for a different kind of learning. There must be something in common in the way our brain learns to process the different channels.
One way to implement this is representation learning, that is, learning the features instead of designing them by hand. Andrew Ng described how sparse coding is a useful way to learn features, not only for images but also for audio. And you can repeat the process on the learned features to build feature hierarchies.
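For readers who want to see what “learning the features” can look like, here is a minimal pure-Python sketch of sparse coding. It assumes the usual L1-penalized formulation, minimizing 0.5·Σ‖x − D·a‖² + λ·Σ‖a‖₁ with each dictionary atom of norm at most 1, solved by alternating ISTA for the codes and a projected gradient step for the dictionary. The talk did not specify any of this; the optimizer, the function names and the toy data are all my own hypothetical choices, not how Andrew Ng’s teams implemented it.

```python
import math
import random

def reconstruct(D, a):
    """Return D·a, where D is a list of K atoms (each a list of n floats)."""
    out = [0.0] * len(D[0])
    for atom, coef in zip(D, a):
        for i, v in enumerate(atom):
            out[i] += coef * v
    return out

def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v toward zero, exactly zero if |v| <= t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def objective(X, D, A, lam):
    """0.5 * squared reconstruction error + lam * L1 norm of the codes."""
    total = 0.0
    for x, a in zip(X, A):
        r = [xi - yi for xi, yi in zip(x, reconstruct(D, a))]
        total += 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(c) for c in a)
    return total

def ista_codes(x, D, a, lam, steps=50):
    """ISTA: gradient step on the squared error, then soft-thresholding."""
    # ||D||_F^2 upper-bounds the largest eigenvalue of D^T D, so 1/L is a safe step.
    L = max(sum(v * v for atom in D for v in atom), 1e-12)
    a = list(a)
    for _ in range(steps):
        r = [xi - yi for xi, yi in zip(x, reconstruct(D, a))]
        grad = [-sum(ri * di for ri, di in zip(r, atom)) for atom in D]
        a = [soft_threshold(ak - gk / L, lam / L) for ak, gk in zip(a, grad)]
    return a

def update_dictionary(X, D, A):
    """One projected gradient step on all atoms, keeping each ||d_k|| <= 1."""
    # ||A||_F^2 upper-bounds the Lipschitz constant of the gradient w.r.t. D.
    L = sum(c * c for a in A for c in a)
    if L == 0.0:
        return D  # all codes are zero: nothing to learn from
    residuals = [[xi - yi for xi, yi in zip(x, reconstruct(D, a))]
                 for x, a in zip(X, A)]
    new_D = []
    for k, atom in enumerate(D):
        step = [sum(r[i] * a[k] for r, a in zip(residuals, A)) for i in range(len(atom))]
        d = [v + s / L for v, s in zip(atom, step)]
        norm = math.sqrt(sum(v * v for v in d))
        new_D.append([v / max(norm, 1.0) for v in d])  # project onto the unit ball
    return new_D

def sparse_coding(X, n_atoms, lam=0.1, iters=20, seed=0):
    """Alternate code inference and dictionary updates; returns (D, A)."""
    rng = random.Random(seed)
    D = []
    for _ in range(n_atoms):
        d = [rng.gauss(0.0, 1.0) for _ in range(len(X[0]))]
        norm = math.sqrt(sum(v * v for v in d))
        D.append([v / norm for v in d])
    A = [[0.0] * n_atoms for _ in X]
    for _ in range(iters):
        A = [ista_codes(x, D, a, lam) for x, a in zip(X, A)]  # warm-started codes
        D = update_dictionary(X, D, A)
    return D, A

# Toy data (made up): two underlying patterns at different scales.
X = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0],
     [2.0, 2.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
D, A = sparse_coding(X, n_atoms=3)
# Stacking (illustrative only): learn a second layer on the first layer's codes.
D2, A2 = sparse_coding(A, n_atoms=2)
```

Both sub-steps use a step size of 1/L with L an upper bound on the relevant Lipschitz constant, so the objective should never increase from one sub-step to the next.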
Having observed that this kind of system kept working better as the number of features increased, the idea was to really scale up. With the Google Brain project, he started working on neural networks with millions of parameters. They scaled them up to 1 billion parameters, and famously made the neural network watch YouTube for a week. It learnt the concepts of faces & cats. They were also able to ship new products with the same technology.
The next question was: how to make this more easily available, i.e. without needing the huge Google infrastructure? In a word: use GPUs.
Looking ahead, Andrew Ng explained that practical applications of deep learning have so far mostly relied on supervised learning, exploiting the large amount of labeled data that comes with the digitization of our society. An interesting property of deep learning algorithms is that they keep getting better with more data. He pointed out that there has been an underinvestment in unsupervised learning. The obvious reason is that it is hard. However, it is probably closer to how we learn. He gave the example of teaching a child to recognize a car: however loving a parent you are, you won’t point out tens of thousands of cars to your child! A few labeled examples are enough. Most of the learning is unsupervised.
About his own future work, Andrew Ng explained that, after Coursera, he wants to spend more time working on machine learning, GPUs and AI, which is what he’ll be doing at Baidu.
Andrew Ng then proceeded to answer the questions asked on Google Moderator.
1. “Could you give your top 3 most promising topics in machine learning?” The first answer was, “unsupervised learning”. Then, about supervised learning, he mentioned the importance of giving tools and making infrastructures easily available to teams, and then listed “speech recognition” and “language embedding”.
2. “Your introductory course to ML at coursera was really great. Will you teach an advanced ML course at coursera with latest techniques around Convolutional Neural Networks, Deep Learning etc.?” He isn’t sure how he’ll find the time, but he is thinking about it and would like to.
3. “How do you see the job of a Data Scientist in the future?” The increase in digital activities that create data and the rapid drop in computational cost are at the origin of the “big data” trend. As long as these two phenomena continue, the demand for data scientists will grow, and don’t worry, deep learning won’t replace them any time soon. This is an exciting discipline. Data science creates value.
4. “What are common use-cases where re-sampling (e.g. bootstrapping) is not sufficient for estimating distributions and considering the whole (Big)Data set is a real advantage?” Large neural networks need lots of data. When you have a lot of parameters, with a lot of flexibility in the model, bootstrapping doesn’t help. With a high VC dimension, it is simply better to increase the size of the data set.
5. “What will be the next “killer application” of Deep Learning after visual object recognition and speech recognition?” He thinks it is speech recognition. Vision & language are still to come.
6. “How do you see the gap between the research and practical implementation of ML algorithms?” You should minimize this gap. Have researchers come up with the innovative ideas, but avoid hand-offs between researchers and the production implementation: the same person does the research and works with the product team.
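As a reminder of what bootstrapping is good at, here is a minimal sketch that estimates the sampling distribution of a mean by resampling a small data set with replacement. The data and the helper name `bootstrap` are made up for illustration. This works well for a low-dimensional statistic like a mean, but resampling only reuses the observations you already have: it cannot add the new information that a model with many parameters needs, which is the point of the answer above.

```python
import random
from statistics import mean, stdev

def bootstrap(data, statistic, n_resamples=1000, seed=0):
    """Compute `statistic` on n_resamples resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]  # made-up sample
estimates = bootstrap(data, mean)
se = stdev(estimates)                       # bootstrap standard error of the mean
sorted_est = sorted(estimates)
ci = (sorted_est[25], sorted_est[974])      # approximate 95% percentile interval
print(f"mean={mean(data):.2f}  SE≈{se:.3f}  95% CI≈({ci[0]:.2f}, {ci[1]:.2f})")
```

Here the bootstrap standard error should land close to the textbook estimate σ/√n, roughly 0.32 for this sample.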
7. “What is it that you find most difficult in machine learning?” Innovation. Innovation requires teamwork. This entails employee development and teaching, to empower people to innovate “like crazy”.
I’m partial to question n°7, since I’m the one who asked it. I really like his answer; it’s no wonder he founded Coursera.
Deep learning is not a new fad. The theory behind it is decades old, it’s the result of years of research, and it’s booming due to hardware improvements. It’s not pure chance that it works so well. And there’s still research to do to make it even better.
More data & more computational power make for better performance. OK. No intelligence required? It seems it’s not that obvious.
Keep in mind that innovation is at the core of a researcher’s work, and that it is the most difficult part. Choose/educate your team well. Share knowledge.
Yann LeCun took a different approach to introducing deep learning in his talk at ESIEE a few days before Andrew Ng’s. Maybe I’ll write about it later.
Please share your thoughts about this post and the talk on Reddit.