Ramstedt, S., Bouteiller, Y., Beltrame, G., Pal, C., & Binas, J. (2020). Reinforcement Learning with Random Delays. arXiv preprint arXiv:2010.02966.
Delays between action and reward are common and are a central problem in RL. The real world shows this too: an action can produce a reward either immediately (e.g., negative rewards for pain that come immediately after a fall) or after a very long delay (doing well in school eventually gets you a job and keeps you out of financial trouble). The whole intermediate spectrum is covered as well: an action can produce rewards arbitrarily distant in time. Conversely, a reward…
Wauthier, S. T., Mazzaglia, P., Çatal, O., De Boom, C., Verbelen, T., & Dhoedt, B. (2021). A learning gap between neuroscience and reinforcement learning. arXiv preprint arXiv:2104.10995.
Here are two possible configurations. The reward is represented by the red circle. In the first configuration, the reward is on the left, in the second it is on the right.
Kalashnikov, D., Varley, J., Chebotar, Y., Swanson, B., Jonschkowski, R., Finn, C., … & Hausman, K. (2021). MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale. arXiv preprint arXiv:2104.08212.
Intelligent robots: an inexhaustible source of inspiration for science fiction, and a rich field of study for reinforcement learning. To be truly “intelligent”, a robot must master a large repertoire of skills (we call such a robot a generalist robot). Many robots have successfully learned a single task using reinforcement learning. These robots share one problem, however: learning requires a lot of training. …
Leblond, R., Alayrac, J., Sifre, L., Pislar, M., Lespiau, J., Antonoglou, I., Simonyan, K., & Vinyals, O. (2021). Machine Translation Decoding beyond Beam Search. arXiv preprint arXiv:2104.05336.
Let’s talk about machine translation. BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of a text that has been translated by a machine from one language to another. The closer a machine translation is to a professional human translation, the better the evaluation. The best results are obtained with beam search. This is a heuristic search algorithm that explores a graph by considering only a…
To the memory of Andréas.
Hamrick, J. B., Friesen, A. L., Behbahani, F., Guez, A., Viola, F., Witherspoon, S., … & Weber, T. (2020). On the role of planning in model-based deep reinforcement learning. arXiv preprint arXiv:2011.04021.
What is the contribution of planning in reinforcement learning? It’s hard to know: it is part of many very powerful algorithms like MuZero. But to what extent is this planning phase necessary for good learning results? This is the question that the authors of this publication try to answer. …
Raposo, D., Ritter, S., Santoro, A., Wayne, G., Weber, T., Botvinick, M., van Hasselt H. & Song, F. (2021). Synthetic Returns for Long-Term Credit Assignment. arXiv preprint arXiv:2102.12425.
Good actions produce high rewards. Besides, the principle of causality tells us that the cause always precedes the effect. Put the two together: a good action is associated with a high reward in the future. A logician would ask: is the converse true? Does a high reward imply that all preceding actions were good? No. And it is on this false assertion that all RL algorithms are based…
Eysenbach, B., Levine, S., & Salakhutdinov, R. (2021). Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification. arXiv preprint arXiv:2103.12656.
The core of reinforcement learning is the reward function: a measure of how well the agent does. In some cases, this reward function is easy to describe: in video games, we can simply take the score. In other cases, it is not easy to specify a reward function. Take the example from the article: closing a drawer. …
[Paper] - Panerati J. et al.
It is quite common to see robotic environments containing robotic arms or navigation robots. But have you ever tried your learning algorithms on drones? That’s what the authors of this paper propose: an open-source OpenAI Gym environment based on PyBullet for different tasks involving one or more quadcopters.
[Paper] in Nature - Ecoffet A., Huizinga J. et al.
Let’s start with a very important article, recently published in Nature. The authors tackle a fundamental problem in reinforcement learning: exploration.
They suggest that one of the obstacles to efficient exploration is the difficulty of returning to an interesting state. Example: an agent manages to reach an advanced state in Mario, and thus gets a high reward. Nevertheless, in the next training episode, this agent is not able to return to that state to continue exploring the game. …
Here is the minimal directory structure you need to create a package.
│ └── __init__.py
Let’s see what’s in each of the files.
This is the main file, where you’ll put all your functions, classes, objects… Here is an example:
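As an illustration, a minimal `__init__.py` might look like the sketch below. The function name `greet` and its behavior are hypothetical, chosen only to show the idea; they are not taken from the original post.

```python
"""Hypothetical content for a package's __init__.py (illustrative only)."""


def greet(name: str) -> str:
    """Return a short greeting for the given name."""
    return f"Hello, {name}!"
```

Once the package is installed, anything defined in this file can be imported directly from the package, for example `from my_package import greet` (where `my_package` stands for whatever name you gave your package).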
And that’s it!
This file is used for installation. Here is a minimal example:
from setuptools import setup, find_packages
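find_packages() automatically discovers the packages in your project (the directories that contain an __init__.py file) so that all of them are included when your package is installed. Dependencies on other projects are declared separately, through the install_requires argument of setup().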
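To see what `find_packages()` returns in practice, here is a small sketch that builds a throwaway project layout and runs the discovery on it. The directory names (`my_package`, `utils`, `notes`) and the helper `discover` are assumptions made up for this example.

```python
import tempfile
from pathlib import Path

from setuptools import find_packages


def discover(root: Path) -> list[str]:
    """Return the packages that find_packages() sees under root, sorted."""
    return sorted(find_packages(where=str(root)))


# Build a temporary project layout (all names here are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "my_package" / "utils").mkdir(parents=True)
    (root / "my_package" / "__init__.py").touch()
    (root / "my_package" / "utils" / "__init__.py").touch()
    (root / "notes").mkdir()  # no __init__.py, so it is not picked up

    found = discover(root)
    print(found)
```

Only the directories containing an `__init__.py` are reported, which is why every sub-package you want to ship needs its own `__init__.py` file.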
Open a terminal, and…