This topic is well covered elsewhere.
"Your Logits are shit"
- some wizard mumbled that once, and it's been a motivating mantra in my head ever since.
You know those diagrams of circles and lines? Those really are how neural nets work! Each circle and each line has its own floating point number. Each line's number is called a "weight" (which might be negative). Each circle holds one number, and has an extra number added to it called the "bias."
You start with some numbers in the first row (the input data), and each number gets multiplied by the weight of the line it follows. At each circle in the next row, all the incoming weighted values are added together, plus that circle's bias. Then some simple function is applied, for example making sure the result is at least zero (this one is called a ReLU). Now that circle has its own number, and the process moves forward to the next row.
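That whole "multiply by the line, add up at the circle, add the bias, clip at zero" step can be sketched in a few lines of plain Python. This is just an illustration of the idea, not anyone's actual implementation; the names (`forward_layer`, `weights`, `biases`) and the example numbers are made up.

```python
def forward_layer(inputs, weights, biases):
    """One row of circles: each output circle sums its weighted inputs,
    adds its own bias, then clips negatives to zero (ReLU)."""
    outputs = []
    for j in range(len(biases)):
        total = biases[j]
        for i, x in enumerate(inputs):
            total += x * weights[i][j]  # multiply by the line's weight
        outputs.append(max(0.0, total))  # "at least zero" rule
    return outputs

# Two input circles feeding two output circles:
inputs = [1.0, 2.0]
weights = [[0.5, -1.0],    # lines leaving input circle 0
           [0.25, 0.75]]   # lines leaving input circle 1
biases = [0.1, -0.2]
print(forward_layer(inputs, weights, biases))
```

Running a whole net is just calling this once per row, feeding each row's output into the next.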
The last row of circles is the output, called the "logits." For a project like this, each logit corresponds to one specific action, and whichever has the highest value is the final choice. For example, if you're training a digit recognizer, it could have 10 output circles (10 logits), one for each digit 0-9. After the net runs, if the circle for '7' has 9.3 and the circle for '1' has 3.1 (with small numbers for the rest), your net is pretty sure the digit is a '7' but thinks there's a small chance it could be a '1'. Often these logits are then passed through a normalizing function called softmax that converts them into probabilities adding up to 100%, so instead of raw scores you can say something like "99% confident this is a 7."
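The softmax step is short enough to show directly: exponentiate each logit, then divide by the sum so everything adds up to 1. This is a generic textbook softmax, not code from any particular project, and the logit values are the digit-recognizer numbers from above.

```python
import math

def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max first for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The digit example: 9.3 for '7', 3.1 for '1', 0.1 for the rest.
logits = [0.1] * 10
logits[7] = 9.3
logits[1] = 3.1
probs = softmax(logits)
print(probs[7])  # very close to 1: the net is quite sure it's a 7
```

One thing worth noticing: because softmax exponentiates, even a modest gap in logits (9.3 vs 3.1) turns into a lopsided probability.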