Every Monday, I present 4 publications from my research area. Let’s discuss them!


Paper 1: Reinforcement Learning with Random Delays

Ramstedt, S., Bouteiller, Y., Beltrame, G., Pal, C., & Binas, J. (2020). Reinforcement Learning with Random Delays.

Delays between action and reward are common, and they are a central problem in RL. The real world is no different: an action can produce a reward either immediately (e.g., the negative reward of pain right after a fall) or with a very long delay (doing well in school keeps you away from financial trouble years later). And the whole intermediate spectrum is covered: an action can produce rewards arbitrarily distant in time. Conversely, a reward…
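To make the problem concrete, here is a minimal sketch (my illustration, not the authors' method) of a Gym wrapper that withholds each reward for a random number of steps. The class name and the max_delay parameter are invented for the example, and the classic 4-tuple Gym API is assumed:

import random

import gym


class RandomRewardDelayWrapper(gym.Wrapper):
    """Illustrative only: delivers each reward a random number
    of steps after the action that earned it."""

    def __init__(self, env, max_delay=5):
        super().__init__(env)
        self.max_delay = max_delay
        self.pending = []  # list of [steps_left, reward]

    def reset(self, **kwargs):
        self.pending = []
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Hold the fresh reward back for a random number of steps.
        self.pending.append([random.randint(0, self.max_delay), reward])
        delayed_reward, still_pending = 0.0, []
        for steps_left, r in self.pending:
            if steps_left == 0:
                delayed_reward += r  # this reward finally arrives
            else:
                still_pending.append([steps_left - 1, r])
        self.pending = still_pending
        return obs, delayed_reward, done, info

An agent trained through such a wrapper can no longer assume that the reward it receives was caused by its latest action.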



Paper 1: A learning gap between neuroscience and reinforcement learning

Wauthier, S. T., Mazzaglia, P., Çatal, O., De Boom, C., Verbelen, T., & Dhoedt, B. (2021). A learning gap between neuroscience and reinforcement learning.

Here are two possible configurations; the reward is represented by the red circle. In the first configuration, the reward is on the left; in the second, it is on the right.



Paper 1: MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale

Kalashnikov, D., Varley, J., Chebotar, Y., Swanson, B., Jonschkowski, R., Finn, C., … & Hausman, K. (2021). MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale.

Intelligent robots: an inexhaustible source of inspiration for science fiction, and a rich field of study for reinforcement learning. To be truly “intelligent”, a robot must master a large repertoire of skills (we then call it a generalist robot). Many robots have successfully learned a single task using reinforcement learning. These robots share one problem, however: learning requires a huge amount of training. …



Paper 1: Machine Translation Decoding beyond Beam Search

Leblond, R., Alayrac, J., Sifre, L., Pislar, M., Lespiau, J., Antonoglou, I., Simonyan, K., & Vinyals, O. (2021). Machine Translation Decoding beyond Beam Search.

Let’s talk about machine translation. BLEU (BiLingual Evaluation Understudy) is an algorithm for evaluating the quality of text that has been machine-translated from one language to another: the closer a machine translation is to a professional human translation, the better the score. The best results are obtained with beam search, a heuristic search algorithm that explores a graph by considering only a…
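To make this concrete, here is a minimal beam search sketch (an illustration, not the paper's implementation); step_log_probs is a hypothetical model interface that returns the log-probability of each possible next token given a prefix:

def beam_search(step_log_probs, start_token, eos_token, beam_size=4, max_len=20):
    # Each beam is a (prefix, cumulative log-probability) pair.
    beams = [([start_token], 0.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix[-1] == eos_token:
                candidates.append((prefix, score))  # finished hypothesis
                continue
            for token, logp in step_log_probs(prefix).items():
                candidates.append((prefix + [token], score + logp))
        # Keep only the beam_size most promising hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(prefix[-1] == eos_token for prefix, _ in beams):
            break
    return beams[0][0]  # best-scoring hypothesis

At each step, only the beam_size best prefixes survive: that restriction is exactly what makes beam search a heuristic rather than an exhaustive search.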



To the memory of Andréas.

Paper 1: On the role of planning in model-based deep reinforcement learning

Hamrick, J. B., Friesen, A. L., Behbahani, F., Guez, A., Viola, F., Witherspoon, S., … & Weber, T. (2020). On the role of planning in model-based deep reinforcement learning.

What is the contribution of planning in reinforcement learning? It is hard to know: planning is part of many very powerful algorithms, such as MuZero. But to what extent is this planning phase necessary for good learning results? This is the question the authors of this publication try to answer. …



Paper 1: Synthetic Returns for Long-Term Credit Assignment

Raposo, D., Ritter, S., Santoro, A., Wayne, G., Weber, T., Botvinick, M., van Hasselt, H., & Song, F. (2021). Synthetic Returns for Long-Term Credit Assignment.

Good actions produce high rewards. Moreover, the principle of causality tells us that the cause always precedes the effect. Put the two together: a good action is associated with a high reward in the future. A logician would ask: is the converse true? Does a high reward imply that all preceding actions were good? No: a lucky reward may just as well follow irrelevant or even bad actions. And yet it is on this false assertion that all RL algorithms are based…



Paper 1: Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification

Eysenbach, B., Levine, S., & Salakhutdinov, R. (2021). Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification.

The core of reinforcement learning is the reward function: it measures how well the agent is doing. In some cases, this reward function is easy to define: in video games, for example, one can simply use the score. In other cases, giving a reward function is not so easy. Take the example from the article: closing a drawer. …
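To give a feel for the asymmetry, here is a hedged sketch contrasting the two cases; the drawer_position measurement is a hypothetical quantity that a real setup would need extra instrumentation to provide:

def game_reward(previous_score, current_score):
    # Easy case: the game engine already exposes a scalar signal.
    return current_score - previous_score

def drawer_reward(drawer_position, closed_position=0.0, tolerance=0.01):
    # Hard case: we must somehow measure how far the drawer is
    # from its closed position, which the robot may not observe.
    return 1.0 if abs(drawer_position - closed_position) < tolerance else 0.0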



Paper 1: Learning to Fly — a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control

[Paper] Panerati, J., et al.

It is quite common to see robotic environments containing robotic arms or navigation robots. But have you ever tried your learning algorithms on drones? That is what the authors of this paper propose: an open-source OpenAI Gym environment based on PyBullet, with various tasks involving one or more quadcopters.
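As a taste of the API, here is a hedged usage sketch following the standard OpenAI Gym loop; the environment id "takeoff-aviary-v0" is my recollection of the project's README and may differ in your version, so check the repository:

import gym
import gym_pybullet_drones  # registers the drone environments

env = gym.make("takeoff-aviary-v0")  # id is an assumption, see the README
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # random policy, for illustration
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()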



Paper 1: First return, then explore

[Paper] Ecoffet, A., Huizinga, J., et al.

Let’s start with a very important article, recently published in Nature. The authors tackle a fundamental problem in reinforcement learning: exploration.
They suggest that one of the obstacles to efficient exploration is the difficulty of returning to an interesting state. For example: an agent manages to reach an advanced state in Mario, and thus gets a high reward. Nevertheless, in the next training episode, this agent is not able to return to this state to continue exploring the game. …
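In pseudo-Python, one iteration of this "first return, then explore" idea looks roughly like the following. This is a sketch of the principle, not the authors' implementation; select_cell, return_to, explore_from and downscale are assumed helper functions:

archive = {}  # cell (coarse state representation) -> shortest trajectory reaching it

def go_explore_iteration(env, select_cell, return_to, explore_from, downscale):
    # 1. Select a promising, previously visited cell from the archive.
    cell, trajectory = select_cell(archive)
    # 2. First return: go back to that state, e.g. by replaying the
    #    stored trajectory in a resettable simulator.
    state = return_to(env, trajectory)
    # 3. Then explore: act (possibly randomly) from there.
    for new_state, new_trajectory in explore_from(env, state, trajectory):
        new_cell = downscale(new_state)  # e.g. a downsampled screen image
        # 4. Archive newly discovered cells, or shorter routes to known ones.
        if new_cell not in archive or len(new_trajectory) < len(archive[new_cell]):
            archive[new_cell] = new_trajectory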



Create the directory for your package

Here is the minimal directory structure you need to create a package.

my_package
├── my_package
│   └── __init__.py
└── setup.py

Let’s see what’s in each of the files.

__init__.py

This is the main file, where you’ll put all your functions, classes, objects… Here is an example:

def my_function():
    print("Hello world!")

And that’s it!

setup.py

This file is used for installation. Here is a minimal example:

from setuptools import setup, find_packages

setup(
    name="my_package",
    version="0.1",
    packages=find_packages(),
)

find_packages() automatically discovers every package in your project (that is, every directory containing an __init__.py) so that they are all included when your package is installed. Note that it does not install your dependencies; those are declared separately, via the install_requires argument of setup().

Let’s test it

Open a terminal, and…
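The preview is cut off here, but the test step presumably looks something like this (a sketch: install the package with pip in editable mode, then import it):

# In a terminal, from the directory containing setup.py:
#     pip install -e .
# Then, in Python, from anywhere:
import my_package

my_package.my_function()  # prints "Hello world!"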

Quentin Gallouédec

PhD student in machine learning. Engineer from Ecole Centrale de Lyon, France. https://qgallouedec.github.io
