This topic is well covered elsewhere.
"Your Logits are shit"
- some wizard mumbled that once, and it's been a motivating mantra in my head ever since.
You know those diagrams of circles and lines? Those really are how neural nets work! Each circle and each line has its own floating point number. Each line's number is called a "weight" (which might be negative). Each circle holds one number, and has an extra number added to it called the "bias."
You start with some numbers in the first row (the input data), and each number gets multiplied by the weight of the line it follows. At each circle in the next row, all the incoming weighted values are added together, plus that circle's bias. Then some simple function is applied, for example making sure the result is at least zero (this one is called a ReLU). Now that circle has its own number, and the process moves forward to the next row.
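That whole "multiply by the line, add up at the circle, add the bias, clip at zero" step can be sketched in a few lines of plain Python. This is just an illustration of the idea, not anyone's actual implementation; the names (`forward_layer`, `weights`, `biases`) and the example numbers are made up.

```python
def forward_layer(inputs, weights, biases):
    """One row of circles: each output circle sums its weighted inputs,
    adds its own bias, then clips negatives to zero (ReLU)."""
    outputs = []
    for j in range(len(biases)):
        total = biases[j]
        for i, x in enumerate(inputs):
            total += x * weights[i][j]  # multiply by the line's weight
        outputs.append(max(0.0, total))  # "at least zero" rule
    return outputs

# Two input circles feeding two output circles:
inputs = [1.0, 2.0]
weights = [[0.5, -1.0],    # lines leaving input circle 0
           [0.25, 0.75]]   # lines leaving input circle 1
biases = [0.1, -0.2]
print(forward_layer(inputs, weights, biases))
```

Running a whole net is just calling this once per row, feeding each row's output into the next.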
The last row of circles is the output, called the "logits." For a project like this, each logit corresponds to one specific action, and whichever has the highest value is the final choice. For example, if you're training a digit recognizer, it could have 10 output circles (10 logits), one for each digit 0-9. After the net runs, if the circle for '7' has 9.3 and the circle for '1' has 3.1 (with small numbers for the rest), your net is pretty sure the digit is a '7' but thinks there's a small chance it could be a '1'. Often these logits are then passed through a normalizing function called softmax that converts them into probabilities adding up to 100%, so instead of raw scores you can say something like "99% confident this is a 7."
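The softmax step is short enough to show directly: exponentiate each logit, then divide by the sum so everything adds up to 1. This is a generic textbook softmax, not code from any particular project, and the logit values are the digit-recognizer numbers from above.

```python
import math

def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max first for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The digit example: 9.3 for '7', 3.1 for '1', 0.1 for the rest.
logits = [0.1] * 10
logits[7] = 9.3
logits[1] = 3.1
probs = softmax(logits)
print(probs[7])  # very close to 1: the net is quite sure it's a 7
```

One thing worth noticing: because softmax exponentiates, even a modest gap in logits (9.3 vs 3.1) turns into a lopsided probability.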