Reinforcement Learning--Deep Q Network[DQN]
DQN的全称是Deep Q Network,其中的Q就是指Q-Learning。从名字上就能看出,这是一种把Q-Learning和DNN[Deep Neural Network]结合起来的模型构架。
DQN的全称是Deep Q Network,其中的Q就是指Q-Learning。从名字上就能看出,这是一种把Q-Learning和DNN[Deep Neural Network]结合起来的模型构架。
与MC一样,时序差分学习[Temporal-Difference Learning,TD]也从Episode学习,不需要了解模型本身,而是主动的是对环境做试验来得到相应”经验”。与MC不同的是,它通过学习不完整的Episode和自身的引导[bootstrapping],猜测Episode的结果,同时持续更新这个猜测。也就是说,TD方法不要求$不完整的Episode$,所以可以走一步就估算一次,也就是进行Step-Based的强化学习。因为可以克服诸多限制,TD方法是强化学习理论中最核心的内容,是强化学习领域最重要的成果。
蒙特卡洛强化学习[Monte-Carlo Reinforcement Learning,MC]是强化学习中的经典方法,应用于model-free的场景中,并且可以得到相对好的结果。
本年介绍了作为强化学习基础的Markov Decision Process,可以帮助理解比较重要但容易忽略的知识。
Abstract: Machine learning sits at the core of many essential products and services at Facebook. This paper describes the hardware and software infrastructure that supports machine learning at global scale. Facebook’s machine learning workloads are extremely diverse: services require many different types of models in practice. This diversity has implications at all layers in the system stack. In addition, a sizable fraction of all data stored at Facebook flows through machine learning pipelines, presenting significant challenges in delivering data to high-performance distributed training flows. Computational requirements are also intense, leveraging both GPU and CPU platforms for training and abundant CPU capacity for real-time inference. Addressing these and other emerging challenges continues to require diverse efforts that span machine learning algorithms, software, and hardware design.
BlinkDB[1] uses two key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional stratified samples from original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query’s accuracy or response time requirements.
Reinforcement Learning发展至今日已无法进行非常明确而清晰的分类,很多新方法也是博采众长。不过,为了更好的了解强化学习算法主要学习什么以及怎么学习,本文尝试从不同角度对主要的RL算法进行大致分类。
强化学习[Reinforcement Learning]是一个独立的机器学习研究领域,主要任务是根据不同的环境通过训练机器学习的模型来进行一系列决策。