Episode 10: Notes from Deep Learning Course (Part I)

This is part of a series of posts with some of my notes from the course on Deep Learning taught by Prof. Yann LeCun at NYU in Spring 2020 [1].

From Weeks 1–3:

Concepts:

Source: James Dellinger, “Weight Initialization in Neural Networks: A Journey From the Basics to Kaiming” [3]
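
The source above walks through why naive weight initialization makes the activations of a deep network vanish or explode, and how the Kaiming (He) trick fixes this for ReLU layers: draw each weight from a normal distribution with variance 2/fan_in. Here is a minimal NumPy sketch of the idea (my own illustration, not code from the course or the article):

    import numpy as np

    def kaiming_init(fan_in, fan_out, rng):
        # He/Kaiming initialization for ReLU networks: sample weights from
        # N(0, 2 / fan_in) so activation variance is preserved from layer
        # to layer instead of shrinking toward zero or blowing up.
        std = np.sqrt(2.0 / fan_in)
        return rng.normal(0.0, std, size=(fan_in, fan_out))

    # Activations keep a roughly constant scale through 50 ReLU layers.
    rng = np.random.default_rng(0)
    x = rng.normal(size=512)
    for _ in range(50):
        x = np.maximum(0.0, x @ kaiming_init(512, 512, rng))  # linear + ReLU
    print(x.std())  # stays on the order of 1 rather than collapsing to ~0

The factor of 2 compensates for ReLU zeroing out half of the activations; with plain 1/fan_in scaling, the same 50-layer stack would shrink x toward zero.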

Thought of the Week:

Generative Pre-trained Transformer 3 (GPT-3) is here! It is now the largest language model we know of, developed by OpenAI, an independent lab in San Francisco. It has 175 billion parameters and was trained on a massive collection of text, including book corpora and Wikipedia articles. The underlying principle behind GPT models is semi-supervised learning: first train the model on a large unlabeled corpus in an unsupervised manner, then fine-tune it with supervised learning on a small set of labeled examples. [4]
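
To make that recipe concrete, here is a toy sketch (entirely my own illustration, nothing like the actual GPT-3 code): the unsupervised phase learns next-word statistics from raw, unlabeled text, after which the model can already complete prompts; a supervised fine-tuning phase would then adjust the pretrained model on a small labeled dataset.

    from collections import Counter, defaultdict

    def pretrain(corpus):
        # Unsupervised phase: learn next-word counts from raw text alone.
        # No labels are needed; the text itself is the training signal.
        model = defaultdict(Counter)
        for sentence in corpus:
            words = sentence.split()
            for prev, nxt in zip(words, words[1:]):
                model[prev][nxt] += 1
        return model

    def complete(model, prompt, n=4):
        # Use the pretrained model for greedy next-word completion.
        words = prompt.split()
        for _ in range(n):
            followers = model.get(words[-1])
            if not followers:
                break
            words.append(followers.most_common(1)[0][0])
        return " ".join(words)

    unlabeled_text = [
        "the model predicts the next word",
        "the next word is predicted by the model",
    ]
    lm = pretrain(unlabeled_text)
    print(complete(lm, "the"))  # -> "the model predicts the model"

GPT-3 applies the same two-phase idea at scale: swap the bigram counts for a 175-billion-parameter transformer and the toy corpus for web-scale text.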

Even though GPT-3 was developed by its creators for sentence-completion tasks, it turns out that it can also be used to write code! Seems like it’s time for us computer scientists to get insecure about our jobs. Check out this interesting article in The New York Times for more fun facts about GPT-3.

Until next time!

References:

[1] Yann LeCun’s Deep Learning course at NYU, Spring 2020
[2] Zeroth-order methods
[3] Kaiming weight initialization
[4] GPT-3
