Data Science: Transformers for Natural Language Processing

ChatGPT, GPT-4, BERT, Deep Learning, Machine Learning, & NLP with Hugging Face, Attention in Python, TensorFlow, PyTorch

Course Data

Lectures: 128
Length: 18h 20m
Skill Level: All Levels
Languages: English
Includes: Lifetime access, certificate of completion (shareable on LinkedIn, Facebook, and Twitter), Q&A forum

Course Description

Hello friends!

Welcome to Data Science: Transformers for Natural Language Processing.

Ever since Transformers arrived on the scene, deep learning hasn't been the same.

  • Machine learning is able to generate text essentially indistinguishable from that created by humans
  • We've reached new state-of-the-art performance in many NLP tasks, such as machine translation, question-answering, entailment, named entity recognition, and more
  • We've created multi-modal (text and image) models that can generate amazing art using only a text prompt
  • We've solved a longstanding problem in molecular biology known as "protein structure prediction"

In this course, you will learn very practical skills for applying transformers, and if you want, detailed theory behind how transformers and attention work.

This is different from most other resources, which only cover the former.

The course is split into 3 major parts:

  1. Using Transformers
  2. Fine-Tuning Transformers
  3. Transformers In-Depth

PART 1: Using Transformers

In this section, you will learn how to use transformers that have already been trained for you. Training them costs millions of dollars, so it's not something you want to try by yourself!

We'll see how these prebuilt models can already be used for a wide array of tasks, including:
  • text classification (e.g. spam detection, sentiment analysis, document categorization)
  • named entity recognition
  • text summarization
  • machine translation
  • question-answering
  • generating (believable) text
  • masked language modeling (article spinning)
  • zero-shot classification

This is already very practical.

If you need to do sentiment analysis, document categorization, entity recognition, translation, summarization, etc. on documents at your workplace or for your clients, you already have the most powerful state-of-the-art models at your fingertips with very few lines of code.

One of the most amazing applications is "zero-shot classification", where you will observe that a pretrained model can categorize your documents, even without any training at all.
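To give a feel for what "very few lines of code" can look like, here is a minimal sketch of zero-shot classification using the Hugging Face `pipeline` API. It assumes the `transformers` package and a backend such as PyTorch are installed, and that a default pretrained model can be downloaded on first use. The helper names `classify_zero_shot` and `top_label` are made up for this illustration and are not necessarily what the course code uses.

```python
def classify_zero_shot(text, candidate_labels):
    """Score `text` against `candidate_labels` with a pretrained model, no training."""
    # Imported lazily so that `top_label` below works even without the library.
    from transformers import pipeline

    # Assumption: the pipeline's default model is acceptable; it is downloaded
    # on first use, which requires network access.
    classifier = pipeline("zero-shot-classification")

    # The result is a dict with "sequence", "labels", and "scores" keys.
    return classifier(text, candidate_labels=candidate_labels)


def top_label(result):
    """Return the (label, score) pair with the highest score."""
    pairs = zip(result["labels"], result["scores"])
    return max(pairs, key=lambda pair: pair[1])
```

For example, `top_label(classify_zero_shot("The team won the championship last night", ["sports", "politics", "cooking"]))` would return the label the model considers most likely, even though the model was never trained on those categories.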

PART 2: Fine-Tuning Transformers

In this section, you will learn how to improve the performance of transformers on your own custom datasets. By using "transfer learning", you can leverage the millions of dollars of training that have already gone into making transformers work very well.

You'll see that you can fine-tune a transformer with relatively little work (and little cost).

We'll cover how to fine-tune transformers for the most practical tasks in the real world, like text classification (sentiment analysis, spam detection), entity recognition, and machine translation.

PART 3: Transformers In-Depth

In this section, you will learn how transformers really work. The previous sections are nice, but a little too nice. Libraries are OK for people who just want to get the job done, but they don't work if you want to do anything new or interesting.

Let's be clear: this is very practical.

How practical, you might ask?

Well, this is where the big bucks are.

Those who have a deep understanding of these models and can do things no one has ever done before are in a position to command higher salaries and prestigious titles. Machine learning is a competitive field, and a deep understanding of how things work can be the edge you need to come out on top.

We'll look at the inner workings of encoders, decoders, encoder-decoders, BERT, GPT, GPT-2, GPT-3, GPT-3.5, ChatGPT, and GPT-4 (for the latter, we are limited to what OpenAI has revealed).

We'll also look at how to implement transformers from scratch.

As the great Richard Feynman once said, "what I cannot create, I do not understand".
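As a small taste of what "from scratch" means, here is a dependency-free sketch of scaled dot-product attention, the core operation inside a transformer. It is a simplified illustration and not the course's implementation: it leaves out the learned query/key/value projections, multiple heads, and batching, and the function names are chosen only for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values, causal=False):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    With causal=True, position i may only attend to positions j <= i,
    which is the masking used in decoder-style (GPT-like) models.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    output = []
    for i, q in enumerate(queries):
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        if causal:
            # Masked positions get -inf, so their softmax weight is exactly 0.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        output.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(d_v)]
        )
    return output
```

In self-attention, the same sequence plays all three roles, so a call looks like `scaled_dot_product_attention(x, x, x)`, where `x` is a list of token vectors.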


Suggested Prerequisites

  • Decent Python coding skills
  • Deep learning with CNNs and RNNs useful but not required
  • Deep learning with Seq2Seq models useful but not required
  • For the in-depth section: understanding the theory behind CNNs, RNNs, and seq2seq is very useful

Thank you for reading and I hope to see you soon!

Testimonials and Success Stories

I am one of your students. Yesterday, I presented my paper at ICCV 2019. You have a significant part in this, so I want to sincerely thank you for your in-depth guidance to the puzzle of deep learning. Please keep making awesome courses that teach us!

I just watched your short video on “Predicting Stock Prices with LSTMs: One Mistake Everyone Makes.” Giggled with delight.

You probably already know this, but some of us really and truly appreciate you. BTW, I spent a reasonable amount of time making a learning roadmap based on your courses and have started the journey.

Looking forward to your new stuff.

Thank you for doing this! I wish everyone who calls themselves a Data Scientist would take the time to do this either as a refresher or learn the material. I have had to work with so many people in prior roles that wanted to jump right into machine learning on my teams and didn’t even understand the first thing about the basics you have in here!!

I am signing up so that I have the easy refresh when needed and to see what you consider important, as well as to support your great work, thank you.

Thank you, I think you have opened my eyes. I was using API to implement Deep learning algorithms and each time I felt I was missing out on some things. So thank you very much.

I have been intending to send you an email expressing my gratitude for the work that you have done to create all of these data science courses in Machine Learning and Artificial Intelligence. I have been looking long and hard for courses that have mathematical rigor relative to the application of the ML & AI algorithms as opposed to just exhibiting some 'canned routine' and then voilà, here is your neural network or logistic regression. ...


I have now taken a few classes from some well-known AI profs at Stanford (Andrew Ng, Christopher Manning, …) with an overall average mark in the mid-90s. Just so you know, you are as good as any of them. But I hope that you already know that.

I wish you a happy and safe holiday season. I am glad you chose to share your knowledge with the rest of us.

Hi Sir I am a student from India. I've been wanting to write a note to thank you for the courses that you've made because they have changed my career. I wanted to work in the field of data science but I was not having proper guidance but then I stumbled upon your "Logistic Regression" course in March and since then, there's been no looking back. I learned ANNs, CNNs, RNNs, Tensorflow, NLP and whatnot by going through your lectures. The knowledge that I gained enabled me to get a job as a Business Technology Analyst at one of my dream firms even in the midst of this pandemic. For that, I shall always be grateful to you. Please keep making more courses with the level of detail that you do in low-level libraries like Theano.

I just wanted to reach out and thank you for your most excellent course that I am nearing finishing.

And, I couldn't agree more with some of your "rants", and found myself nodding vigorously!

You are an excellent teacher, and a rare breed.

And, your courses are frankly, more digestible and teach a student far more than some of the top-tier courses from ivy leagues I have taken in the past.

(I plan to go through many more courses, one by one!)

I know you must be deluged with complaints in spite of the best content around. That's just human nature.

Also, satisfied people rarely take the time to write, so I thought I will write in for a change. :)

Hello, Lazy Programmer!

In the process of completing my Master’s at Hunan University, China, I am writing this feedback to you in order to express my deep gratitude for all the knowledge and skills I have obtained studying your courses and following your recommendations.

The first course of yours I took was on Convolutional Neural Networks (“Deep Learning p.5”, as far as I remember). Answering one of my questions on the Q&A board, you suggested I should start from the beginning – the Linear and Logistic Regression courses. Despite that I assumed I had already known many basic things at that time, I overcame my “pride” and decided to start my journey in Deep Learning from scratch. ...


By the way, in case you are interested to hear: I used the HMM classification as it was in your course (95% of the script, with a few small adjustments) for the customer-care department of a big, well-known fintech company, to predict which customers will call them, so they can call those customers before the rush hours and improve the service. Instead of a poem, I had a sequence of the customer's events from the last 24 hours, like "Loaded money", "Usage in the food service", "Entering the app", "Trying to change the password", etc. The label was "called" or "didn't call". The outcome was great. They use it for their VIP customers. Our data science department and I got a lot of praise.



2 Lectures · 13min

Getting Setup

3 Lectures · 17min
  1. Where to get the code and data - instant access (01:42)
  2. How to use Github & Extra Coding Tips (Optional) (11:12)
  3. Are You Beginner, Intermediate, or Advanced? All are OK! (05:01)

Beginner's Corner

21 Lectures · 03hr 28min
  1. Beginner's Corner Section Introduction (10:14)
  2. From RNNs to Attention and Transformers - Intuition (17:01)
  3. Sentiment Analysis (10:32)
  4. Sentiment Analysis in Python (17:00)
  5. Text Generation (10:47)
  6. Text Generation in Python (11:47)
  7. Masked Language Modeling (Article Spinner) (11:37)
  8. Masked Language Modeling (Article Spinner) in Python (08:26)
  9. Named Entity Recognition (NER) (04:53)
  10. Named Entity Recognition (NER) in Python (09:49)
  11. Text Summarization (05:15)
  12. Text Summarization in Python (07:00)
  13. Neural Machine Translation (06:18)
  14. Neural Machine Translation in Python (09:50)
  15. Question Answering (07:20)
  16. Question Answering in Python (06:14)
  17. Zero-Shot Classification (05:30)
  18. Zero-Shot Classification in Python (13:47)
  19. Beginner's Corner Section Summary (04:53)
  20. Beginner Q&A: Can We Use GPT-4 For Everything? (27:13)
  21. Suggestion Box (03:10)

Fine-Tuning (Intermediate)

14 Lectures · 02hr 27min
  1. Fine-Tuning Section Introduction (04:30)
  2. Text Preprocessing and Tokenization Review (13:35)
  3. Models and Tokenizers (15:22)
  4. Models and Tokenizers in Python (13:16)
  5. Transfer Learning & Fine-Tuning (pt 1) (09:29)
  6. Transfer Learning & Fine-Tuning (pt 2) (10:37)
  7. Transfer Learning & Fine-Tuning (pt 3) (10:08)
  8. Fine-Tuning Sentiment Analysis and the GLUE Benchmark (12:22)
  9. Fine-Tuning Sentiment Analysis in Python (19:36)
  10. Fine-Tuning Transformers with Custom Dataset (15:04)
  11. Hugging Face AutoConfig (05:45)
  12. Fine-Tuning with Multiple Inputs (Textual Entailment) (07:16)
  13. Fine-Tuning Transformers with Multiple Inputs in Python (07:36)
  14. Fine-Tuning Section Summary (03:13)

Named Entity Recognition (NER) and POS Tagging (Intermediate)

15 Lectures · 01hr 34min
  1. Token Classification Section Introduction (06:58)
  2. Data & Tokenizer (Code Preparation) (05:04)
  3. Data & Tokenizer (Code) (07:45)
  4. Target Alignment (Code Preparation) (09:57)
  5. Create Tokenized Dataset (Code Preparation) (03:46)
  6. Target Alignment (Code) (10:09)
  7. Data Collator (Code Preparation) (03:42)
  8. Data Collator (Code) (03:15)
  9. Metrics (Code Preparation) (06:47)
  10. Metrics (Code) (05:40)
  11. Model and Trainer (Code Preparation) (02:26)
  12. Model and Trainer (Code) (03:27)
  13. POS Tagging & Custom Datasets (Exercise Prompt) (05:18)
  14. POS Tagging & Custom Datasets (Solution) (18:16)
  15. Token Classification Section Summary (02:02)

Seq2Seq and Neural Machine Translation (Intermediate)

12 Lectures · 01hr 07min
  1. Translation Section Introduction (04:34)
  2. Data & Tokenizer (Code Preparation) (05:35)
  3. Things Move Fast (01:48)
  4. Data & Tokenizer (Code) (06:16)
  5. Aside: Seq2Seq Basics (Optional) (10:39)
  6. Model Inputs (Code Preparation) (08:15)
  7. Model Inputs (Code) (08:05)
  8. Translation Metrics (BLEU Score & BERT Score) (Code Preparation) (03:52)
  9. Translation Metrics (BLEU Score & BERT Score) (Code) (05:43)
  10. Train & Evaluate (Code Preparation) (04:34)
  11. Train & Evaluate (Code) (05:00)
  12. Translation Section Summary (02:39)

Question-Answering (Advanced)

18 Lectures · 02hr 34min
  1. Question-Answering Section Introduction (04:50)
  2. Exploring the Dataset (SQuAD) (04:20)
  3. Exploring the Dataset (SQuAD) in Python (05:05)
  4. Using the Tokenizer (08:30)
  5. Using the Tokenizer in Python (11:55)
  6. Aligning the Targets (14:53)
  7. Aligning the Targets in Python (16:15)
  8. Applying the Tokenizer (09:56)
  9. Applying the Tokenizer in Python (10:39)
  10. Question-Answering Metrics (03:46)
  11. Question-Answering Metrics in Python (02:41)
  12. From Logits to Answers (21:23)
  13. From Logits to Answers in Python (16:14)
  14. Computing Metrics (05:30)
  15. Computing Metrics in Python (05:49)
  16. Train and Evaluate (02:53)
  17. Train and Evaluate in Python (05:45)
  18. Question-Answering Section Summary (04:00)

Transformers and Attention Theory (Advanced)

18 Lectures · 02hr 06min
  1. Theory Section Introduction (05:06)
  2. Basic Self-Attention (09:35)
  3. Self-Attention & Scaled Dot-Product Attention (18:02)
  4. Attention Efficiency (04:36)
  5. Attention Mask (03:56)
  6. Multi-Head Attention (07:13)
  7. Transformer Block (06:45)
  8. Positional Encodings (07:16)
  9. Encoder Architecture (06:23)
  10. Decoder Architecture (10:58)
  11. Encoder-Decoder Architecture (08:31)
  12. BERT (04:52)
  13. GPT (06:45)
  14. GPT-2 (06:30)
  15. GPT-3 (05:14)
  16. ChatGPT (06:33)
  17. GPT-4 (03:00)
  18. Theory Section Summary (04:50)

Implement Transformers From Scratch (Advanced)

14 Lectures · 02hr 06min
  1. Implementation Section Introduction (05:57)
  2. Encoder Implementation Plan & Outline (06:11)
  3. How to Implement Multihead Attention From Scratch (12:33)
  4. How to Implement the Transformer Block From Scratch (02:15)
  5. How to Implement Positional Encoding From Scratch (05:14)
  6. How to Implement Transformer Encoder From Scratch (04:38)
  7. Train and Evaluate Encoder From Scratch (13:32)
  8. How to Implement Causal Self-Attention From Scratch (05:08)
  9. How to Implement a Transformer Decoder (GPT) From Scratch (04:40)
  10. How to Train a Causal Language Model From Scratch (18:55)
  11. Implement a Seq2Seq Transformer From Scratch for Language Translation (pt 1) (12:30)
  12. Implement a Seq2Seq Transformer From Scratch for Language Translation (pt 2) (16:15)
  13. Implement a Seq2Seq Transformer From Scratch for Language Translation (pt 3) (16:45)
  14. Implementation Section Summary (01:40)

Setting Up Your Environment (Appendix/FAQ by Student Request)

2 Lectures · 37min
  1. Anaconda Environment Setup (20:21)
  2. How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, and TensorFlow (17:33)

Extra Help With Python Coding for Beginners (Appendix/FAQ by Student Request)

3 Lectures · 37min
  1. How to Code Yourself (part 1) (15:55)
  2. How to Code Yourself (part 2) (09:24)
  3. Proof that using Jupyter Notebook is the same as not using it (12:29)

Effective Learning Strategies for Machine Learning (Appendix/FAQ by Student Request)

4 Lectures · 59min
  1. How to Succeed in this Course (Long Version) (10:25)
  2. Is this for Beginners or Experts? Academic or Practical? Fast or slow-paced? (22:05)
  3. What order should I take your courses in? (part 1) (11:19)
  4. What order should I take your courses in? (part 2) (16:07)

Appendix / FAQ Finale

2 Lectures · 08min
  1. What is the Appendix? (02:48)
  2. Where to get discount coupons and FREE deep learning material (05:31)