Course Catalog: Sample-Based Learning Methods
Course Outline:

    Sample-Based Learning Methods

Welcome to the Course!
Welcome to the second course in the Reinforcement Learning Specialization:
Sample-Based Learning Methods, brought to you by the University of Alberta,
Onlea, and Coursera.
In this pre-course module, you'll be introduced to your instructors,
and get a flavour of what the course has in store for you.
Make sure to introduce yourself to your classmates in the "Meet and Greet" section!
Monte Carlo Methods for Prediction & Control
This week you will learn how to estimate value functions and optimal policies,
using only sampled experience from the environment.
This module represents our first step toward incremental learning methods
that learn from the agent’s own interaction with the world,
rather than a model of the world.
You will learn about on-policy and off-policy methods for prediction
and control, using Monte Carlo methods---methods that use sampled returns.
You will also revisit the exploration problem, now in the general RL setting
rather than only in the bandit setting.
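To make the idea of learning from sampled returns concrete, here is a minimal first-visit Monte Carlo prediction sketch in Python. The `env.reset()`/`env.step()` interface and the `policy` callable are illustrative assumptions, not the course's own codebase.

```python
from collections import defaultdict

def mc_prediction(env, policy, num_episodes, gamma=1.0):
    """First-visit Monte Carlo prediction for a fixed policy.

    Assumes a hypothetical env with reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    V = defaultdict(float)   # state-value estimates
    n = defaultdict(int)     # number of first visits per state

    for _ in range(num_episodes):
        # Roll out one complete episode under the target policy.
        episode = []
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            episode.append((state, reward))
            state = next_state

        # Record the first visit index of each state.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)

        # Sweep backwards, accumulating the return G_t.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = gamma * G + r
            if first_visit[s] == t:           # first-visit update only
                n[s] += 1
                V[s] += (G - V[s]) / n[s]     # incremental sample average
    return V
```

Note that values only change once a full episode has terminated, because the return is only known at the end; this is exactly the property that TD learning, in the next module, relaxes.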
Temporal Difference Learning Methods for Prediction
This week, you will learn about one of the most fundamental concepts in reinforcement learning:
temporal difference (TD) learning.
TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods.
TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world,
and do not require knowledge of the model.
TD methods are similar to DP methods in that they bootstrap,
and thus can learn online---no waiting until the end of an episode.
You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping.
For this module, we first focus on TD for prediction, and discuss TD for control in the next module.
This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
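As a rough sketch of that kind of implementation, here is tabular TD(0) under the same hypothetical `env` interface as above; the step size `alpha` and discount `gamma` are illustrative defaults, not values mandated by the course.

```python
from collections import defaultdict

def td0_prediction(env, policy, num_episodes, alpha=0.1, gamma=1.0):
    """Tabular TD(0) prediction for a fixed policy.

    Unlike Monte Carlo, V(s) is updated after every step by
    bootstrapping from the current estimate of the next state.
    """
    V = defaultdict(float)
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            # Bootstrapped target: reward plus discounted next-state estimate.
            target = reward + (0.0 if done else gamma * V[next_state])
            # Move V(state) a fraction alpha toward the target (TD error).
            V[state] += alpha * (target - V[state])
            state = next_state
    return V
```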
Temporal Difference Learning Methods for Control
This week,
you will learn about using temporal difference learning for control,
as a generalized policy iteration strategy.
You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa,
Q-learning and Expected Sarsa. You will see some of the differences between
the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both.
You will implement Expected Sarsa and Q-learning, on Cliff World.
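As a hedged illustration of how these targets differ, the sketch below runs tabular TD control with either a Q-learning or an Expected Sarsa target; the environment interface, `actions` list, and hyperparameters are assumptions for the example.

```python
import random
from collections import defaultdict

def td_control(env, actions, num_episodes, method="q_learning",
               alpha=0.5, gamma=1.0, eps=0.1):
    """Tabular TD control; `method` selects the bootstrapped target."""
    Q = defaultdict(float)  # keyed by (state, action)

    def greedy(state):
        return max(actions, key=lambda a: Q[(state, a)])

    def behave(state):
        # Epsilon-greedy behaviour policy used to collect experience.
        return random.choice(actions) if random.random() < eps else greedy(state)

    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = behave(state)
            next_state, reward, done = env.step(action)
            if done:
                target = reward
            elif method == "q_learning":
                # Off-policy: bootstrap from the greedy action's value.
                target = reward + gamma * Q[(next_state, greedy(next_state))]
            else:  # expected_sarsa
                # Expectation of Q under the epsilon-greedy policy itself.
                g = greedy(next_state)
                exp_q = sum(
                    ((1 - eps) + eps / len(actions) if a == g
                     else eps / len(actions)) * Q[(next_state, a)]
                    for a in actions
                )
                target = reward + gamma * exp_q
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```

The two methods share everything except the bootstrapped target, which is why Expected Sarsa can be seen as subsuming both on-policy and off-policy control.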
Planning, Learning & Acting
Up until now,
you might think that learning with and without a model are two distinct,
and in some ways, competing strategies: planning with
Dynamic Programming versus sample-based learning via TD methods.
This week we unify these two strategies with the Dyna architecture.
You will learn how to estimate the model from data and then use this model
to generate hypothetical experience (a bit like dreaming)
to dramatically improve sample efficiency compared to sample-based methods like Q-learning.
In addition, you will learn how to design learning systems that are robust to inaccurate models.
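Here is a minimal tabular Dyna-Q sketch along these lines: direct Q-learning from real transitions, a simple deterministic model of observed transitions, and extra planning updates replayed from that model. The interface and hyperparameters are again illustrative assumptions, not the course's reference implementation.

```python
import random
from collections import defaultdict

def dyna_q(env, actions, num_episodes, planning_steps=10,
           alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Dyna-Q: learn from real experience, then replay model samples."""
    Q = defaultdict(float)
    model = {}  # (state, action) -> (reward, next_state, done)

    def act(state):
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(s, a, r, s2, done):
        bootstrap = 0.0 if done else max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * bootstrap - Q[(s, a)])

    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = act(state)
            next_state, reward, done = env.step(action)
            # (a) Direct RL: Q-learning update from the real transition.
            q_update(state, action, reward, next_state, done)
            # (b) Model learning: remember the observed transition.
            model[(state, action)] = (reward, next_state, done)
            # (c) Planning: replay hypothetical experience from the model.
            for _ in range(planning_steps):
                s, a = random.choice(list(model))
                r, s2, d = model[(s, a)]
                q_update(s, a, r, s2, d)
            state = next_state
    return Q
```

With `planning_steps=0` this reduces to plain Q-learning; raising it lets each real transition drive many value updates, which is the sample-efficiency gain the module describes.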
