- 1 Is my usage of AWS DeepRacer Console covered by Free Usage Tier?
- 2 Can I stop the training and continue later?
- 3 How much training is enough to complete a lap?
- 4 When does the episode end?
- 5 Why is my car having a break in the training? Why does it leave the track and wander off?
- 6 Is there a one and only reliable way of training the car or is it more about trying solutions based on the gut feeling and checking the outcomes?
- 7 I received a NaN in training. What to do?
- 8 What tracks can I train on?
- 9 Action Space
- 10 Reward function
- 11 Hyperparameters
- 12 Evaluation
Is my usage of AWS DeepRacer Console covered by Free Usage Tier?
It depends. In a new free tier account, approximately $30 worth of training is covered and gets refunded when the bill is prepared in the first days of the following month. That sum is a one-time credit dedicated to SageMaker usage. That said, the rules can change in the upcoming season.
Costs can rise quickly when you're having fun, so be careful. Use the Billing Dashboard and Cost Explorer (you need to enable it first) to monitor the costs, and set billing alarms. DeepRacer can suck you in, and there have been many reports of bills over $500 from people who did not know about the costs or did not think about them.
Consider using local training to cut costs, but remember it also cuts simplicity and convenience.
Can I stop the training and continue later?
Yes you can. This is a common scenario, supported in every training environment (AWS DeepRacer Console, AWS SageMaker Notebook, local training). It's called cloning. Refer to docs for your environment to learn how to do it.
While you do so, you can change a couple things:
- reward function
- action space values (it's a risky area: you may improve things, you may break them big time) - but never change the number of actions!
How much training is enough to complete a lap?
There isn't one answer to this question. The factors that can influence the training time include:
- size of the action space and what actions it contains - whether they can give a complete lap and whether there aren't too many actions in there
- minimum/maximum speed in the training - if it's too high, the car can fail to get around the turns
- complexity of the reward function - too many elements and it will take longer to find the right behaviour (and more difficult), not enough and the car may learn to perform suboptimal actions
- strategy in the reward function - are you rewarding behaviour or state, or maybe something else?
- correctness of the reward function - is your reward actually rewarding completion of a lap?
- strategy in the hyperparameters
- correctness of the hyperparameters
- size of the neural network
- number of sensors used in the car
- setup of the training environment - do you have any tweaks in your environment? Are they aimed at building robust models or rather having a faster training?
- starting point for the training - are you starting with a fresh model or cloning an existing one? If cloning, how generic is it?
When does the episode end?
There are a couple conditions that can stop the episode:
- not a single wheel is touching the tarmac (this is the definition of off-track). It is not fully confirmed yet, but starting with the 2020 version of the training environment, at least one wheel touching the white line may count as the car still being on track.
- the car makes no progress (stays in place) for a bit
- the car completes the track
- the car takes 10000 steps and doesn't reach the finish line. There are 15 steps per second, so that would mean about 11 minutes
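The arithmetic behind the step limit can be checked with a few lines of Python. This is only a sketch: the constants are the figures quoted in the list above, and the function name `steps_to_minutes` is made up for illustration.

```python
STEPS_PER_SECOND = 15   # step rate quoted above
MAX_STEPS = 10_000      # step limit quoted above


def steps_to_minutes(steps, steps_per_second=STEPS_PER_SECOND):
    """Convert a number of simulation steps to minutes of simulated driving."""
    return steps / steps_per_second / 60


# 10000 steps at 15 steps per second is a little over 11 minutes.
print(round(steps_to_minutes(MAX_STEPS), 1))
```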
The car will not just stop if you don't have an action with zero speed value. That said, it can get blocked. In the New York City track there was a bridge that could block the car - this would normally stop the training episode. But at a specific angle the car would neither go off the track nor stop completely. It would make tiny progress and stay stuck for a while.
In the AWS DeepRacer Console after the 2019 re:Invent the car doesn't get reset immediately after leaving the track. This might be because there is some new communication mechanism between rl_coach simulation and gazebo.
Why is my car having a break in the training? Why does it leave the track and wander off?
After the car completes a set of episodes for the iteration, SageMaker starts learning and the car's behaviour isn't always deterministic. It might stand on the track and wait, or slowly drive off track or along it. Sometimes it can be funny or confusing, like driving into water or falling off the edge of the track.
Once the learning is complete and all the outcome files are in place, the car picks up the training again.
Is there a one and only reliable way of training the car or is it more about trying solutions based on the gut feeling and checking the outcomes?
Some people share their training strategies - you can follow them. Some have it more organised, some rely more on trying things out and learning. With local training it is much easier (cheaper) to just try things and reach interesting conclusions. Many attempts will fail to give the outcomes you desire. Based on your own findings you can work out your own training strategy. I recommend getting familiar with log analysis to be able to measure progress in training.
I received a NaN in training. What to do?
NaN ("not a number") is the value that shows up when an arithmetic operation has no result the computer can represent. During model training there can be a division by zero in some unfortunate conditions. AWS engineers and scientists have improved the tools to solve the issue, but there have still been some reports in the community.
There are ways to delay a NaN:
- move a couple checkpoints back
- decrease learning rate
- change track
- change reward function
Sometimes they work, sometimes they don't.
If you train in a local training environment, check the Local Training Troubleshooting on how to remove the .finished files that will block you from training again.
What tracks can I train on?
You have many options for training on a track. Currently the AWS Console offers 12 different tracks. The AWS SageMaker Notebook for DeepRacer lets you use some tracks that are not available in the console on top of the 12 mentioned. As far as local training is concerned, all tracks that were ever released are available, although some of them might not be working. New tracks are added as they become available; there is a manual from RichardFan on how to get the tracks from the AWS stack.
Some models have speeds 1-4m/s, some have 1-12m/s, what are the speeds allowed?
After the AWS re:Invent 2019 releases, the action space values are 1-4 m/s and reflect reality better. The code, however, is backwards compatible, both in the console and on the car. It detects which range applies based on the additional values in the model-metadata.json outside the action space.
Remember that the speeds are not applied discretely - the car will accelerate (or slow down) towards a given value.
In the early versions of DeepRacer (before re:Invent 2019) there was some mix-up in the code for speed, which meant the speeds were roughly three times lower than in reality. It also caused some confusion in understanding what the value represented. Luckily it is resolved now. Treat old speeds as new speeds times 3.5, as the source code shows.
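The conversion between the two speed scales can be sketched as below. The factor 3.5 is the one quoted above; the function names are hypothetical and are not taken from the DeepRacer source code.

```python
LEGACY_SPEED_FACTOR = 3.5  # factor quoted above


def current_to_legacy_speed(current_speed):
    """Express a post-2019 speed value on the old (pre re:Invent 2019) scale."""
    return current_speed * LEGACY_SPEED_FACTOR


def legacy_to_current_speed(legacy_speed):
    """Express an old-scale speed value on the current scale."""
    return legacy_speed / LEGACY_SPEED_FACTOR


# A current top speed of 4 corresponds to 14 on the old scale.
print(current_to_legacy_speed(4.0))
```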
Does the action space need to be evenly spaced and all combinations of speed and turns?
No, it does not. AWS DeepRacer Console has the main goal of making it as easy as possible to start training, so it comes with certain simplifications and limitations. In local training values are being set directly in files which gives more power.
The action space can have one action (well, it might not learn very well) or many. If you have too many actions, it will be difficult for the model to converge within a reasonable time.
Can I submit models with custom action-space?
Yes you can.
In local training, the model metadata file in your bucket/custom_files folder is your buddy: set your values and make sure you set correct indices (see below). Do not change the number of actions in between trainings. Use caution when changing the values for actions.
In the console it requires a bit more fiddling. Start a training for a minimum time (a minute?) with an action space that has as many actions as you want. Then go to your DeepRacer bucket, to the model-metadata folder (at the bottom), to the folder for the given training, download the model metadata file, alter the values, upload it and clone the model. Going forward it will use those values. The reason for a short training is to make sure the model doesn't get trained on the initial actions, so that you don't have to train them away.
When you alter the file, make sure you number the indices properly: they need to be numbered from 0 to N-1, where N is the count of actions.
The actions don't have to be specifically aligned (left to right or slow to fast). There isn't a clear answer on what would happen if action values were duplicated.
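The index rule (0 to N-1, in any order) and the "same number of actions" rule can be checked before uploading an edited file. This is a minimal sketch: the key names `action_space`, `steering_angle`, `speed` and `index` are assumptions about the file layout, so compare them with the file you actually downloaded, and `validate_action_space` is a made-up helper name.

```python
import json


def validate_action_space(metadata, expected_count=None):
    """Return the number of actions, or raise ValueError if the rules are broken."""
    actions = metadata["action_space"]  # assumed key name
    # Actions may appear in any order, so compare the sorted indices.
    indices = sorted(action["index"] for action in actions)  # assumed key name
    if indices != list(range(len(actions))):
        raise ValueError(f"indices must be 0..{len(actions) - 1}, got {indices}")
    if expected_count is not None and len(actions) != expected_count:
        raise ValueError(
            f"expected {expected_count} actions, found {len(actions)}"
        )
    return len(actions)


# Hypothetical example content, deliberately not sorted by index.
example = json.loads("""
{"action_space": [
  {"steering_angle": 0, "speed": 3.0, "index": 1},
  {"steering_angle": -20, "speed": 1.5, "index": 0},
  {"steering_angle": 20, "speed": 1.5, "index": 2}
]}
""")

print(validate_action_space(example, expected_count=3))
```

Passing the action count of the model you are cloning as `expected_count` guards against accidentally adding or removing an action.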
How do I reward/penalize terminal states (off-track/lap complete)?
There are parameters like all wheels on track and distance from the centre. That said, the training code detects whether at least one wheel is touching the tarmac. If not, it will stop the training episode - the hidden reward value for such a step is 0.0001. It is a penalty on its own for not completing the lap.
The value 0.0001 has some interesting consequences - if you constantly grant your car rewards below that value, especially negative ones, it will learn that the best reward is given for getting off the track.
A complete lap can be detected using progress: a progress value of 100.0 means a complete lap.
Read the docs to learn more about parameters available in the reward function.
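The points above can be put together in a small reward function. It is a sketch rather than a recommended strategy: the bonus and floor values are arbitrary illustrations, and the parameter names (`progress`, `all_wheels_on_track`, `distance_from_center`, `track_width`) should be verified against the docs for your environment.

```python
OFF_TRACK_REWARD = 1e-4      # hidden reward for the terminal off-track step
MIN_REWARD = 1e-3            # illustrative floor, kept above OFF_TRACK_REWARD
LAP_COMPLETE_BONUS = 10.0    # illustrative bonus for finishing the lap


def reward_function(params):
    """Reward staying near the centre line and add a bonus for a full lap."""
    if params["progress"] >= 100.0:
        return LAP_COMPLETE_BONUS

    half_width = params["track_width"] / 2.0
    # 1.0 on the centre line, falling to 0.0 at the edge of the track.
    reward = max(0.0, 1.0 - params["distance_from_center"] / half_width)

    if not params["all_wheels_on_track"]:
        reward *= 0.5  # mild penalty while some wheels are off the tarmac

    # Never go below the off-track value, so leaving the track is never
    # the best-paying option.
    return float(max(reward, MIN_REWARD))
```

The floor is the important design choice here: every regular step pays more than the 0.0001 granted for going off track.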
Why the car learns to complete the lap with reward
This is a valid reward. Your car will get the best rewards for completing a track in as many steps as possible (lowest velocity, greatest distance). It's pretty much the same as rewarding higher speeds - the car will go fast, but will try to cover the greatest distance. If it goes too fast, it might learn to spin and go back and forth to gather even more.
Is the reward function run on a real DeepRacer car?
No, it is not. Reward function is used in the virtual environment during the simulation of the training. The outcome of a training is a model that can be loaded onto a car and then perform inference which picks an action based on the camera input. It can also be submitted for evaluation or a virtual race.
That said the submission to evaluation/race has logs that are built the same way, which means the field that normally holds the reward values has something written to it. It appears that those values reflect distance from the centre of the track and are not used for anything.
How complex should a reward function be?
If you put sophisticated things in your reward function, or too many factors, the car may get confused. If you put too few, the car may not progress. In many cases the training may succeed with regards to reward, but not with the performance that you want. Think of it as a mean 13-year-old that always complies with your rules but only in a way that will annoy you the most.
We've had good players who would come and nail it using a very simple function. It's doable, but it may take longer or require other strategies.
Whichever you go for, verify progress and reevaluate.
How to adjust the hyperparameters for the initial training and how to progress them?
If you don't really know what the hyperparameters represent, starting with default values may be the best option. Then with time, on subsequent cloning operations, reduce the learning rate if you notice that your reward function draws a zigzag on the graph.
Some people lower the learning rate to 0.0001 and then drop even further. Some start with default 0.0003 and lower it from that as a starting point.
The official documentation says that a discount rate of 0.9 means the reward is analysed 10 steps ahead, 0.99 - 100 steps and 0.999 - 1000 steps. If one tries to derive a formula from that, the number of steps (s) for a discount rate (dr) is s=1/(1-dr), so dr=1-1/s. Then 0.9875 would mean 80 steps - but it isn't quite that clear-cut, especially since even for 0.999 there is some fading of values applied.
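The formula above, and the fading it glosses over, can be tried out in a few lines. This is a sketch with made-up function names; the fading uses the standard exponential discounting, in which a reward n steps ahead is weighted by dr to the power n.

```python
def horizon_steps(discount):
    """Approximate look-ahead in steps: s = 1 / (1 - dr)."""
    return 1.0 / (1.0 - discount)


def discount_for_horizon(steps):
    """Inverse of the formula above: dr = 1 - 1 / s."""
    return 1.0 - 1.0 / steps


def future_weight(discount, steps_ahead):
    """Weight applied to a reward received steps_ahead steps in the future."""
    return discount ** steps_ahead


for dr in (0.9, 0.99, 0.999, 0.9875):
    s = round(horizon_steps(dr))
    # The weight has already faded to roughly a third at the "horizon".
    print(dr, s, round(future_weight(dr, s), 2))
```

So the look-ahead is not a hard cut-off: rewards beyond s steps still count, only with ever smaller weight.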
What will happen if my hyperparameters go wrong?
If you, for instance, set a low discount factor or high entropy, your training may go wrong and the reward will drop and go flat. Try to avoid such cases: either start your training over from the previous pretrained state, or fetch the last unbroken checkpoint from RoboMaker and use that as the starting point.
"Unable to finish 1 lap" - what does it mean?
It may be a good thing. You've successfully submitted a model, and it got loaded and evaluated. Now make sure you get around the track. Don't stop at one attempt - keep submitting every 30 minutes, which is how often you can do it. Each submission is 5 lap attempts.
Check out AWS DeepRacer Autosubmitter by Cahya Wirawan.