Training the AWS DeepRacer

From AWS DeepRacer Community Wiki
Revision as of 14:44, 5 November 2019 by Tptak (talk | contribs) (Add rewards section)

AWS DeepRacer has to be trained before it can get around the track. Training uses a technique called reinforcement learning.

AWS DeepRacer documentation about training

Reward Functions

The reward function describes immediate feedback (as a score for reward or penalty) when the vehicle takes an action to move from a given position on the track to a new position. Its purpose is to encourage the vehicle to make moves along the track to reach its destination quickly. The model training process will attempt to find a policy which maximizes the average total reward the vehicle experiences.

The reward function is where you define what the car learns. During training the car is rewarded for good behavior and gradually favors the actions that earn more reward. This is one of the basics of reinforcement learning. It is similar to training a dog, where you might get the dog to sit or lie down by offering a treat. In the same way, the car is encouraged to drive fast on the track by receiving a reward every time it does something right. As the developer, you define which behaviors the car is rewarded for.

For DeepRacer, the reward function is a Python function that takes the input dictionary params and returns a float reward.
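As a minimal sketch of that shape, the function below reads two of the input parameters listed in the Parameters section and returns a float. The quarter-width threshold and the two reward values are illustrative assumptions, not taken from any official sample.

```python
def reward_function(params):
    """Illustrative only: reward staying near the center line."""
    # Both keys are documented input parameters.
    distance_from_center = params["distance_from_center"]
    track_width = params["track_width"]

    # Assumed threshold: within a quarter of the track width of the center.
    if distance_from_center <= 0.25 * track_width:
        reward = 1.0
    else:
        reward = 1e-3  # assumed small, non-zero value for everything else

    return float(reward)
```

Whatever logic goes inside, the contract stays the same: one params dictionary in, one float out.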

Parameters

The reward function input parameters (params) are passed in as a dictionary object. They describe the state the agent is in (params["x"], params["y"], params["all_wheels_on_track"], params["distance_from_center"], etc.) and the action the agent takes (params["speed"] and params["steering_angle"]). You combine one or more of the input parameters to create the customized reward function most appropriate for your solution.

all_wheels_on_track (boolean): A flag indicating whether the vehicle is on-track or off-track. The vehicle is off-track (False) if any of its wheels is outside the track borders. It is on-track (True) if all of its wheels are inside the two track borders.
x (float): Location in meters of the vehicle center along the x axis of the simulated environment containing the track. The origin is at the lower-left corner of the simulated environment.
y (float): Location in meters of the vehicle center along the y axis of the simulated environment containing the track. The origin is at the lower-left corner of the simulated environment.
distance_from_center (float, [0, ~track_width/2]): Distance from the center of the track, in meters. The observable maximum displacement occurs when any of the agent's wheels is outside a track border and, depending on the width of the track border, can be slightly smaller or larger than half of track_width.
is_left_of_center (boolean): A flag indicating whether the vehicle is to the left of the track center (True) or to the right (False).
heading (float, (-180, 180]): Heading direction of the vehicle, in degrees, with respect to the x axis of the coordinate system.
progress (float, [0, 100]): Percentage of the track completed.
steps (integer): Number of steps completed. One step is one (state, action, next state, reward) tuple.
speed (float, [0.0, 8.0]): The observed speed of the vehicle, in meters per second (m/s).
steering_angle (float, [-30, 30]): Steering angle, in degrees, of the front wheels from the center line of the vehicle. A negative value means steering to the right and a positive value means steering to the left.
track_width (float): Track width in meters.
waypoints (list of (float, float)): An ordered list of milestones along the track center. Each milestone is described by a coordinate of (x, y). A list of waypoints for each track is found in the resources section.
closest_waypoints ((integer, integer)): The zero-based indices of the two neighboring waypoints closest to the vehicle's current position of (x, y). The distance is measured as the Euclidean distance from the center of the vehicle.
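To try a reward function outside the simulator, it can help to build a params dictionary by hand. The sketch below uses made-up values for a single step (every number is an assumption chosen only to fall inside the documented ranges) and a hypothetical helper, next_waypoint_distance, that shows how waypoints and closest_waypoints fit together.

```python
import math

# Hypothetical values for one step; the simulator supplies the real ones.
sample_params = {
    "all_wheels_on_track": True,
    "x": 1.5,
    "y": 0.6,
    "distance_from_center": 0.07,
    "is_left_of_center": True,
    "heading": 45.0,
    "progress": 10.0,
    "steps": 20,
    "speed": 2.0,
    "steering_angle": -5.0,
    "track_width": 0.6,
    "waypoints": [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 1.0)],
    "closest_waypoints": (1, 2),
}


def next_waypoint_distance(params):
    """Hypothetical helper: meters from the vehicle center to the waypoint ahead."""
    # The second index is used as the waypoint in front of the car.
    next_index = params["closest_waypoints"][1]
    wx, wy = params["waypoints"][next_index]
    return math.hypot(wx - params["x"], wy - params["y"])


print(next_waypoint_distance(sample_params))
```

Any reward function written against params can be called with sample_params in the same way, which makes it easy to spot a misspelled key before starting a training run.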

Sample rewards

Pure Pursuit

Source

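Rewards the car for pointing at the next waypoint in front of it (the "rabbit"). The source adds an exponential speed component, which is not part of the code shown here.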

import math

def reward_function(params):
    # Adapted to the params dictionary described above
    x = params['x']
    y = params['y']
    # heading is given in degrees; the trigonometry below needs radians
    heading = math.radians(params['heading'])
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']

    reward = 1e-3

    # Reward when the heading is pointed to the next waypoint IN FRONT.
    # The "rabbit" is that next waypoint.
    rabbit = waypoints[closest_waypoints[1]]

    radius = math.hypot(x - rabbit[0], y - rabbit[1])

    # Project a point the same distance away along the current heading
    pointing = [x + radius * math.cos(heading),
                y + radius * math.sin(heading)]

    vector_delta = math.hypot(pointing[0] - rabbit[0], pointing[1] - rabbit[1])

    # Max distance for pointing away will be the radius * 2
    # Min distance means we are pointing directly at the next waypoint
    # We can set up a reward that is a ratio to this max.
    if radius == 0:
        # The car is exactly on the waypoint; avoid dividing by zero
        reward += 1
    else:
        reward += 1 - vector_delta / (radius * 2)

    return float(reward)
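The geometry behind this reward can be checked in isolation. The sketch below uses a hypothetical helper, pointing_score, that assumes the heading is given in degrees (as in the parameter list) and a made-up target one meter ahead along the x axis.

```python
import math


def pointing_score(x, y, heading_deg, target):
    """Hypothetical helper: 1.0 when heading straight at target, 0.0 when facing away."""
    radius = math.hypot(target[0] - x, target[1] - y)
    if radius == 0:
        return 1.0  # already on the target
    heading = math.radians(heading_deg)
    # Point at the same distance as the target, but along the heading
    px = x + radius * math.cos(heading)
    py = y + radius * math.sin(heading)
    delta = math.hypot(px - target[0], py - target[1])
    return 1 - delta / (2 * radius)


target = (1.0, 0.0)  # assumed waypoint, one meter ahead along the x axis
for heading_deg in (0, 90, 180):
    print(heading_deg, round(pointing_score(0.0, 0.0, heading_deg, target), 3))
```

Because the projected point and the target lie on the same circle around the car, the distance between them is a chord of length 2 * radius * sin(theta / 2), where theta is the angle between the heading and the direction to the target. The score is therefore 1 - sin(theta / 2): it drops quickly for small errors and is already below 0.3 at 90 degrees.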

Center Line Square Root

Source

This is inspired by Pure Pursuit reward.

def reward_function(params):
    '''
    Center line reward. Despite the name, the falloff used here
    is a fourth power, not a square root.
    '''
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    speed = params['speed']
    progress = params['progress']
    all_wheels_on_track = params['all_wheels_on_track']
    SPEED_THRESHOLD = 6
    # 1 on the center line, falling to 0 at half the track width
    reward = 1 - (distance_from_center / (track_width / 2)) ** 4
    # distance_from_center can slightly exceed track_width / 2
    if reward < 0:
        reward = 0
    # Scale the reward down above the speed threshold
    if speed > SPEED_THRESHOLD:
        reward *= 0.8
    if not all_wheels_on_track:
        reward = 0
    # Bonus for completing the lap
    if progress == 100:
        reward += 100
    return float(reward)

Waypoint System

Source

Uses waypoints and lane preference to encourage a racing line

import math
def reward_function(params):

    # Read input variables
    progress = params['progress']
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']
    heading = params['heading']

    reward = 1.0

    # Bonus for completing the lap
    if progress == 100:
        reward += 100

    # Calculate the direction of the center line based on the closest waypoints
    next_point = waypoints[closest_waypoints[1]]
    prev_point = waypoints[closest_waypoints[0]]
    # Calculate the direction in radians, atan2(dy, dx), the result is in [-pi, pi]
    track_direction = math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
    # Convert to degrees
    track_direction = math.degrees(track_direction)
    # Calculate the difference between the track direction and the heading direction of the car
    direction_diff = abs(track_direction - heading)
    # Both angles wrap at +/-180 degrees, so take the shorter way around
    if direction_diff > 180:
        direction_diff = 360 - direction_diff
    # Penalize the reward if the difference is too large
    DIRECTION_THRESHOLD = 10.0

    if direction_diff > DIRECTION_THRESHOLD:
        # Scale linearly, reaching 0 at a difference of 50 degrees
        malus = max(0.0, 1 - direction_diff / 50)
        reward *= malus

    return float(reward)
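One detail worth checking in any heading comparison: heading and the atan2 result both wrap around at 180 degrees, so a raw absolute difference can overstate the error when the two angles sit on either side of that boundary. A minimal sketch of the usual correction, with heading_difference as a hypothetical helper name:

```python
def heading_difference(track_direction, heading):
    """Smallest absolute angle, in degrees, between two directions."""
    diff = abs(track_direction - heading)
    # Going the other way around the circle is shorter
    if diff > 180:
        diff = 360 - diff
    return diff


# Track points at 179 degrees, car at -179 degrees: only 2 degrees apart
print(heading_difference(179.0, -179.0))
```

Without the correction the same pair of angles would be treated as 358 degrees apart and the reward would be zeroed even though the car is almost perfectly aligned.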

Self Motivator

Source

Simply encourage getting around the track in as few steps as possible

def reward_function(params):

    if params["all_wheels_on_track"] and params["steps"] > 0:
        reward = ((params["progress"] / params["steps"]) * 100) + (params["speed"]**2)
    else:
        reward = 0.01

    return float(reward)
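To see why this favors fewer steps, compare two cars that have covered the same share of the track at the same speed. The sketch below uses a hypothetical helper, self_motivator_reward, that mirrors the on-track branch of the formula; the input numbers are made up for illustration.

```python
def self_motivator_reward(progress, steps, speed):
    """Hypothetical helper mirroring the on-track formula above."""
    return (progress / steps) * 100 + speed ** 2


# Same progress and speed, different number of steps taken to get there
slow = self_motivator_reward(progress=50.0, steps=200, speed=2.0)
fast = self_motivator_reward(progress=50.0, steps=100, speed=2.0)
print(slow, fast)
```

The car that reached 50% progress in half the steps gets the larger reward, because the progress-per-step term doubles while the speed term is unchanged.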