Reinforcement Learning--A3C
The A3C model [Asynchronous Advantage Actor-Critic] is another very well-known model in deep reinforcement learning.
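As a rough sketch of the core objective only [the asynchronous machinery of many workers pushing gradients to shared parameters is omitted], each worker optimizes a policy-gradient loss weighted by the advantage $A = R - V(s)$, plus a value-regression term and an entropy bonus. The network shape, coefficients, and names below are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Shared trunk with separate policy (actor) and value (critic) heads."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: V(s)

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)

def a3c_loss(model, obs, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """Loss on one worker's rollout: policy gradient weighted by the
    advantage A = R - V(s), value regression toward the return R,
    and an entropy bonus that encourages exploration."""
    logits, values = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    advantages = returns - values.detach()           # A(s, a) estimate
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = F.mse_loss(values, returns)
    entropy = dist.entropy().mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```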
Value-function-based methods compute the value of every state or state-action pair [via $V(s)$ or $Q(s,a)$] and then execute the action with the highest value. This is actually a rather indirect approach, because what we ultimately want is the optimal policy itself.
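To make the indirect route concrete, here is a toy tabular sketch: the policy is never represented explicitly, it is derived on the fly by taking the argmax over a learned value table. The table sizes and random values are placeholders:

```python
import numpy as np

n_states, n_actions = 10, 4               # toy sizes, purely illustrative
Q = np.random.rand(n_states, n_actions)   # stand-in for a learned Q(s, a) table

def greedy_action(state):
    """Derive the policy indirectly: act by picking the action whose
    Q(s, a) is largest in the current state."""
    return int(np.argmax(Q[state]))

print(greedy_action(0))  # e.g. 2
```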
DQN stands for Deep Q Network, where the Q refers to Q-Learning. As the name suggests, it is an architecture that combines Q-Learning with a DNN [Deep Neural Network].
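A minimal sketch of that combination, assuming PyTorch and a replay-batch layout of my own choosing [the `batch` fields and network sizes are assumptions]: a DNN approximates $Q(s,a)$, and the Q-Learning target $r + \gamma \max_{a'} Q(s',a')$ is computed with a separate, slowly-updated target network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """A DNN that maps a state to one Q-value per action."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, obs):
        return self.net(obs)

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Q-Learning loss on a replay batch. The batch layout is an
    assumption: obs/next_obs are float tensors, actions is a long
    tensor, done holds 0/1 floats marking terminal transitions."""
    obs, actions, rewards, next_obs, done = batch
    q_sa = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)  # Q(s, a) taken
    with torch.no_grad():  # target: r + gamma * max_a' Q_target(s', a')
        target = rewards + gamma * (1 - done) * target_net(next_obs).max(dim=1).values
    return F.mse_loss(q_sa, target)
```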
Like MC, temporal-difference learning [Temporal-Difference Learning, TD] also learns from episodes and needs no knowledge of the model itself; it actively experiments on the environment to gather "experience". Unlike MC, it learns from incomplete episodes and bootstraps: it guesses an episode's final return and keeps refining that guess as it goes. In other words, TD does not require complete episodes, so it can update an estimate after every single step, i.e., perform step-based reinforcement learning. Because it overcomes so many limitations, TD is the most central idea in reinforcement-learning theory and among the field's most important results.
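A minimal sketch of the one-step TD[0] update this describes: after a single transition, $V(s)$ is nudged toward the bootstrapped target $r + \gamma V(s')$, so learning proceeds step by step rather than episode by episode. The function name and step sizes are illustrative:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One-step TD(0) update on a value table V (e.g. a dict or array):
    after a single transition (s, r, s_next), move V(s) toward the
    bootstrapped target r + gamma * V(s_next). No complete episode needed."""
    td_target = r + gamma * V[s_next]   # a guess at the return from s
    V[s] += alpha * (td_target - V[s])  # keep refining that guess each step
    return V
```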
Monte-Carlo reinforcement learning [Monte-Carlo Reinforcement Learning, MC] is a classic method in reinforcement learning; it applies to model-free settings and can achieve relatively good results.
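A minimal sketch of first-visit Monte Carlo evaluation, assuming episodes are given as lists of `(state, reward)` pairs [my own layout]: $V(s)$ is estimated as the average of the complete discounted returns observed after the first visit to $s$, with no model of the environment required:

```python
from collections import defaultdict

def mc_evaluate(episodes, gamma=0.99):
    """First-visit Monte Carlo evaluation: V(s) is the average of the full
    discounted returns observed after the first visit to s. Model-free,
    but each episode must run to termination."""
    returns = defaultdict(list)
    for episode in episodes:                 # episode: list of (state, reward)
        G = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = r + gamma * G                # return from step t onward
            if s not in (x[0] for x in episode[:t]):  # first visit to s?
                returns[s].append(G)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}
```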
This article introduced the Markov Decision Process, the foundation of reinforcement learning; it helps clarify concepts that are important yet easily overlooked.
Abstract: Machine learning sits at the core of many essential products and services at Facebook. This paper describes the hardware and software infrastructure that supports machine learning at global scale. Facebook’s machine learning workloads are extremely diverse: services require many different types of models in practice. This diversity has implications at all layers in the system stack. In addition, a sizable fraction of all data stored at Facebook flows through machine learning pipelines, presenting significant challenges in delivering data to high-performance distributed training flows. Computational requirements are also intense, leveraging both GPU and CPU platforms for training and abundant CPU capacity for real-time inference. Addressing these and other emerging challenges continues to require diverse efforts that span machine learning algorithms, software, and hardware design.
BlinkDB[1] uses two key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional stratified samples from original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query’s accuracy or response time requirements.
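As a hedged sketch of idea (2) only: one could imagine the selection step walking the pre-built stratified samples from smallest to largest and returning the first one whose estimated error fits the query's accuracy requirement. The dict layout and the $1/\sqrt{n}$ error model below [reasonable for aggregates like AVG] are illustrative assumptions, not BlinkDB's actual implementation:

```python
import math

def pick_sample(samples, error_budget):
    """Hypothetical dynamic sample selection: return the smallest
    pre-built sample whose estimated error meets the query's accuracy
    requirement; fall back to the largest sample if none qualifies."""
    for sample in sorted(samples, key=lambda s: s["rows"]):
        est_error = sample["stddev"] / math.sqrt(sample["rows"])
        if est_error <= error_budget:
            return sample
    return max(samples, key=lambda s: s["rows"])
```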