Address: 梦之帆留学 (Mengzhifan Study Abroad), Room 0807, 8th Floor, Tower B, East Zone, Jianwai SOHO, 39 Dongsanhuan Middle Road, Chaoyang District, Beijing (100 m from Exit C of Guomao subway station).
The University of California, Berkeley is a public research university located in Berkeley, in the eastern San Francisco Bay Area of California. Many of its departments rank among the global top ten, making it one of the most prestigious universities in the world. UC Berkeley's faculty and students have included 72 Nobel laureates, as well as 9 Wolf Prize, 13 Fields Medal, 22 Turing Award, 45 MacArthur Fellowship, 20 Academy Award, and 14 Pulitzer Prize winners.
Research mentor: UCB faculty in EECS-related fields.
Research location: UCB research group meeting room. Research period: summer vacation, with each session lasting about 4 weeks; specifics are adjusted by the US side based on the student's interview.
An interview is arranged within 1 week of registration; before the interview, the student is coached through reading 1 academic paper.
I. [DL+System] Large-Scale Deep Neural Network Training on Supercomputers
Keywords: deep learning, supercomputing, distributed systems
Candidates: Students must have a strong programming background (C/C++ and Python) and solid machine learning knowledge to begin with.
Students should be comfortable with the Linux command line.
Students with good knowledge of TensorFlow programming, linear algebra, optimization, and parallel/distributed programming are preferred.
After the research project, in addition to a technical report or a paper, students should have learned the following skills:
● The process of computer science research: analyzing the pros and cons of an algorithm, designing numerical experiments, and writing a good scientific paper;
● The application of distributed systems and supercomputers to emerging applications such as deep learning;
● The co-design of systems (supercomputers) and algorithms (deep learning techniques).
Introduction: Deep neural networks (i.e., deep learning) are the most successful artificial intelligence technique today. However, training deep neural networks is extremely slow.
For example, finishing 90-epoch ImageNet-1k training with the ResNet-50 model on an NVIDIA M40 GPU takes 14 days.
It can take several months on a Mac laptop. This training requires 10^18 single-precision operations in total.
On the other hand, the world's current fastest supercomputer can finish 2 × 10^17 single-precision operations per second. If we could make full use of such a supercomputer for DNN training, we should be able to finish the 90-epoch ResNet-50 training in five seconds.
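As a sanity check on these figures, a few lines of Python reproduce the arithmetic (the operation counts and rates are the ones quoted above, not measurements of ours):

```python
# Pure-Python check of the arithmetic quoted above.
total_ops = 1e18            # single-precision ops for 90-epoch ResNet-50 training
supercomputer_rate = 2e17   # ops/second of the fastest supercomputer cited
m40_days = 14               # reported M40 training time

print(total_ops / supercomputer_rate)      # 5.0 -> five seconds, in the ideal case
print(total_ops / (m40_days * 24 * 3600))  # ~8.3e11 ops/s sustained on one M40
```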
However, the current bottleneck for fast DNN training is at the algorithm level.
Specifically, the current batch size (e.g., 512) is too small to make efficient use of many processors.
In this project, students will focus on designing a new optimization algorithm that can make full use of thousands of computing servers.
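To see why the batch size is the bottleneck, consider synchronous data parallelism, where each step's global batch is split evenly across workers. A tiny illustrative script (the worker counts are hypothetical examples):

```python
# Why a fixed global batch of 512 cannot feed thousands of workers:
# under synchronous data parallelism each worker processes batch/workers samples.
global_batch = 512
for workers in (8, 64, 512, 4096):
    per_worker = global_batch / workers
    print(f"{workers:5d} workers -> {per_worker:7.3f} samples/worker/step")
# Past 512 workers each one receives < 1 sample per step, so most of the
# machine idles; scaling out therefore requires growing the global batch.
```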
Students are also welcome to propose their own projects in related areas.
Specific ideas cannot be disclosed in this introduction, but rough directions include:
● Explore and explain why extremely large batches often lose accuracy. It would be good if the students could give either a mathematical or an empirical answer.
● Study advanced optimization methods and try to replace momentum SGD or state-of-the-art adaptive optimization solvers (see the sketch after this list). Ideally, the newly proposed solver should scale the batch size to at least 64K without losing accuracy on ImageNet training.
● Design new parallel machine learning algorithms, such as model-parallel or asynchronous approaches.
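As a concrete starting point for the second direction, here is a minimal NumPy sketch of one layer-wise adaptive (LARS-style) update in the spirit of reference 3 below; the function name and hyperparameter values are our own illustrative choices, not the project's prescribed method:

```python
import numpy as np

def lars_step(w, grad, v, lr=0.1, momentum=0.9, weight_decay=5e-4, trust=1e-3):
    """One LARS-style momentum-SGD step for a single layer (cf. reference 3).

    The layer-wise "trust ratio" keeps the effective step size proportional
    to the weight norm, which is what allows very large batches (and hence
    large learning rates) without immediate divergence.
    Returns the updated weights and momentum buffer.
    """
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(grad)
    # Layer-wise learning rate: trust * ||w|| / (||g|| + wd * ||w||).
    local_lr = trust * w_norm / (g_norm + weight_decay * w_norm + 1e-12)
    v = momentum * v + lr * local_lr * (grad + weight_decay * w)
    return w - v, v
```

Applied per layer rather than globally, this trust-ratio mechanism is what reference 3 uses to push ImageNet batch sizes into the tens of thousands.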
A tentative 4-week plan:
● Week 1: get familiar with programming on supercomputers and set up platforms such as distributed TensorFlow or Uber's Horovod (a minimal setup sketch follows this plan). Read 3-5 related papers.
● Week 2: reproduce the results of state-of-the-art approaches and evaluate their pros and cons.
● Week 3: design a new algorithm and write the design documentation.
● Week 4: implement the proposed algorithm, conduct experiments, and write a technical report.
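For the week-1 platform setup, the following sketch shows the standard Horovod + TensorFlow (Keras) data-parallel boilerplate; it is a hedged example rather than the project's official recipe, and the model choice and base learning rate are illustrative assumptions (the linear LR scaling follows reference 1):

```python
# Minimal Horovod + TensorFlow (Keras) data-parallel setup for week 1.
# Sketch only: model, base LR, and process count are illustrative assumptions.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per GPU, launched e.g. via: horovodrun -np 8 python train.py

# Pin each process to its own local GPU.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

model = tf.keras.applications.ResNet50(weights=None)

# Linear scaling rule (reference 1): base LR times the number of workers.
opt = tf.keras.optimizers.SGD(learning_rate=0.1 * hvd.size(), momentum=0.9)
opt = hvd.DistributedOptimizer(opt)  # averages gradients via ring all-reduce

model.compile(optimizer=opt, loss='sparse_categorical_crossentropy')

# Broadcast rank-0 weights so every worker starts from identical parameters.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
```

Launched with one process per GPU, each worker computes gradients on its own data shard, and Horovod averages them across workers before the optimizer applies the update, which is exactly the synchronous data-parallel regime the project targets.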
References:
1. Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour", 2017.
2. Keskar et al., "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima", ICLR 2017.
3. You, Gitman, and Ginsburg, "Large Batch Training of Convolutional Networks", 2017.
4. Brock, Donahue, and Simonyan, "Large Scale GAN Training for High Fidelity Natural Image Synthesis", ICLR 2019.
5. Hoffer, Hubara, and Soudry, "Train Longer, Generalize Better: Closing the Generalization Gap in Large Batch Training of Neural Networks", NeurIPS 2017.
6. Smith et al., "Don't Decay the Learning Rate, Increase the Batch Size", ICLR 2018.
- Enriches your CV and personal statement (PS); participants earn a research certificate, and outstanding students may receive a recommendation letter from the mentor to support their applications.