Deep Reinforcement Learning
Markov Decision Process
1. Sequential decision-making problems, agents, and environments
2. Markov Decision Process (MDP)
3. Policy, return, and value functions
4. Bellman equations: the recursive property of value functions
5. Approximating value functions: stochastic approximation
Policy gradient methods
6. Policy Gradient Theorem
7. REINFORCE
8. REINFORCE implementation
9. REINFORCE with baseline
10. Actor-critic algorithms
11. Online/batch actor-critic implementation
12. \(n\)-step return actor-critic
13. Generalized Advantage Estimation (GAE)
14. Trust Region Policy Optimization (TRPO)
15. TRPO implementation
16. Proximal Policy Optimization (PPO)
17. PPO implementation
18. Soft Actor-Critic (SAC)
19. SAC implementation
Advanced topics
20. Unsupervised RL
References