A Libertine of Computer Science

Reinforcement Learning--Temporal-Difference

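Like Monte Carlo (MC) methods, Temporal-Difference (TD) learning learns from episodes and needs no knowledge of the model itself; it gains "experience" by actively experimenting with the environment. Unlike MC, it learns from incomplete episodes through bootstrapping: it guesses the outcome of the episode and keeps updating that guess. In other words, TD methods do not require complete episodes, so they can update the estimate after every single step, that is, perform step-based reinforcement learning. Because it removes many of these restrictions, TD learning is the central topic of reinforcement learning theory and among the most important results in the field.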
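The step-by-step update described above can be illustrated with a minimal sketch of tabular TD(0) value prediction. The excerpt names no environment, so the random-walk task here, along with the names `td0_update` and `random_walk_td0`, the step size `alpha`, and the discount `gamma`, are assumptions chosen for illustration only.

```python
import random


def td0_update(V, s, r, s_next, done, alpha=0.1, gamma=1.0):
    """Apply one TD(0) update to V[s] in place and return the TD error."""
    # Bootstrapping: the target uses the current guess V[s_next]
    # instead of waiting for the full return at the end of the episode.
    target = r if done else r + gamma * V[s_next]
    td_error = target - V[s]
    V[s] += alpha * td_error
    return td_error


def random_walk_td0(n_states=5, episodes=1000, alpha=0.1, seed=0):
    """Estimate state values of a symmetric random walk (assumed toy task).

    States 1..n_states are non-terminal; 0 and n_states + 1 are terminal.
    Reaching the right terminal state pays reward 1, everything else 0.
    """
    rng = random.Random(seed)
    V = [0.5] * (n_states + 2)
    V[0] = V[-1] = 0.0  # terminal states have value 0
    for _ in range(episodes):
        s = (n_states + 1) // 2  # start in the middle
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next in (0, n_states + 1)
            r = 1.0 if s_next == n_states + 1 else 0.0
            td0_update(V, s, r, s_next, done, alpha)  # update after every step
            if done:
                break
            s = s_next
    return V[1:-1]
```

Calling `random_walk_td0()` returns one value estimate per non-terminal state. Because each update happens inside the episode loop, the estimates change before the episode has finished, which is the contrast with MC drawn above.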

Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective

Abstract: Machine learning sits at the core of many essential products and services at Facebook. This paper describes the hardware and software infrastructure that supports machine learning at global scale. Facebook’s machine learning workloads are extremely diverse: services require many different types of models in practice. This diversity has implications at all layers in the system stack. In addition, a sizable fraction of all data stored at Facebook flows through machine learning pipelines, presenting significant challenges in delivering data to high-performance distributed training flows. Computational requirements are also intense, leveraging both GPU and CPU platforms for training and abundant CPU capacity for real-time inference. Addressing these and other emerging challenges continues to require diverse efforts that span machine learning algorithms, software, and hardware design.

BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data

BlinkDB[1] uses two key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional stratified samples from original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query’s accuracy or response time requirements.
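The two ideas above can be sketched in a few lines. This is not BlinkDB's code or API: the function names, the per-group `cap`, and the normal-approximation error formula `z * stddev / sqrt(n)` are all assumptions made for illustration, and the real system's selection strategy may differ.

```python
import math
import random
from collections import defaultdict


def stratified_sample(rows, key, cap, seed=0):
    """Keep at most `cap` rows per distinct value of column `key`.

    Rare groups are kept whole, so they are not lost the way they
    could be under uniform sampling (hypothetical helper).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        if len(members) <= cap:
            sample.extend(members)
        else:
            sample.extend(rng.sample(members, cap))
    return sample


def pick_sample_size(available_sizes, stddev, max_error, z=1.96):
    """Return the smallest sample size whose estimated error fits the bound.

    Assumes the error of a mean shrinks as z * stddev / sqrt(n).
    Falls back to the largest available sample if none qualifies.
    """
    for n in sorted(available_sizes):
        if z * stddev / math.sqrt(n) <= max_error:
            return n
    return max(available_sizes)
```

For example, with samples of 100, 1000, and 10000 rows, a standard deviation of 10, and an error bound of 1.0, `pick_sample_size` skips the 100-row sample (estimated error 1.96) and picks the 1000-row one (estimated error about 0.62).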
