This data was taken from 734 crowdsourced games of Space Invaders in which crowdsourcers provided real-time feedback on the performance of a DQN-based AI. Votes were binary, signaling either a good move or a bad move, and were filtered by taking the median wherever at least two crowdsourcers reached consensus within one second. The filtered votes turned out to yield negative rewards where the spaceship dies. This is useful not only because deaths are not present in the original Space Invaders reward signal, but also because the technique (voting on binary rewards) can be scaled to arbitrarily sophisticated AIs, is algorithm-agnostic, and is tractable to crowdsource, since such signals are simple for humans to provide. The application and interface used to obtain the reward votes can be seen here.
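The vote filtering described above can be sketched roughly as follows. This is only one possible reading of the description, not the code used to build the dataset: the function name, the `(timestamp_s, voter_id, value)` tuple layout, the fixed one-second windows, and the rule that a tied median counts as no consensus are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def filter_votes(votes, window_s=1.0, min_voters=2):
    """Reduce raw votes to at most one label per time window.

    votes: iterable of (timestamp_s, voter_id, value), where value is
    +1 (good move) or -1 (bad move). Hypothetical layout, not the
    dataset's own format.
    """
    buckets = defaultdict(dict)
    for timestamp_s, voter_id, value in votes:
        window = int(timestamp_s // window_s)
        # A later vote from the same voter in the same window
        # overwrites the earlier one (in input order).
        buckets[window][voter_id] = value

    labels = {}
    for window, by_voter in sorted(buckets.items()):
        if len(by_voter) < min_voters:
            continue  # fewer than min_voters distinct voters
        m = median(by_voter.values())
        if m != 0:  # a median of 0 means the votes were tied
            labels[window] = int(m)
    return labels
```

Windows with a single voter or a tied vote are simply dropped, so only windows with a clear majority produce a label.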
[
  {
    "action": "FIRE" | "MOVE_RIGHT_AND_FIRE" | "MOVE_LEFT_AND_FIRE",
    "action_number": 1 | 11 | 12,
    "game_over": true | false,
    "reward": -1 | 0 | 5 | 10 | 15 | 20 | 25 | 30 | 100 | 200,
    "screen_hex": < full screen hex from ALE (non run-length encoded - NTSC colors) >
  },
  ...
]
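As a rough illustration of how the "screen_hex" field might be unpacked, the sketch below splits the hex string into rows of NTSC palette indices. It assumes two hex digits per pixel, row-major order, and a 160-pixel-wide screen (the usual ALE width); none of these is stated in the schema, and `decode_screen` is a hypothetical helper name.

```python
def decode_screen(screen_hex, width=160):
    """Split a hex-encoded screen into rows of palette indices (0-255).

    Assumes two hex digits per pixel in row-major order; the default
    width of 160 is an assumption, not something the schema states.
    """
    pixels = bytes.fromhex(screen_hex)
    if len(pixels) % width != 0:
        raise ValueError("pixel count is not a multiple of the assumed width")
    # Slice the flat byte string into rows of `width` pixels each.
    return [list(pixels[i:i + width]) for i in range(0, len(pixels), width)]
```

Mapping the palette indices to RGB values would need an NTSC palette table, which is not part of this sketch.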
The "reward" field is -1 for deaths and the regular Space Invaders reward otherwise.
pip install python-snappy
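import json
import snappy

# Open in binary mode: the file holds compressed bytes, not text.
with open('/your-download-path/episode_000.json.snappy', 'rb') as file_ref:
    json_str = snappy.decompress(file_ref.read())

# Each episode decodes to a list of frame dicts.
frames = json.loads(json_str)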
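Once an episode is loaded, a quick summary can help sanity-check it. The sketch below assumes `frames` is a list of dicts with the fields shown in the schema; `summarize_episode` and the keys of the returned dict are hypothetical names chosen for illustration.

```python
from collections import Counter


def summarize_episode(frames):
    """Return basic counts for one episode (a list of frame dicts)."""
    # Positive rewards are the regular Space Invaders score.
    game_score = sum(f["reward"] for f in frames if f["reward"] > 0)
    # A reward of -1 marks a death.
    deaths = sum(1 for f in frames if f["reward"] == -1)
    actions = Counter(f["action"] for f in frames)
    return {
        "frames": len(frames),
        "game_score": game_score,
        "deaths": deaths,
        "actions": actions,
    }
```

For example, `summarize_episode(frames)["deaths"]` gives the number of frames carrying the crowdsourced death penalty.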