To train the shortest-path-solving GPT:

  ./beaver.py

Same, with lighter settings (~95% test success instead of ~99%):

  ./beaver.py --nb_train_samples=25000 --nb_test_samples=10000

To train with non-causal attention on the prompt and a random
auto-regression order:

  ./beaver.py --nb_epochs=50 --learning_rate_schedule='25: 2e-4' --random_regression_order --noncausal_prompt
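The two flags combine into one attention pattern. The following is a minimal, generic sketch of that pattern. It assumes a standard prefix-LM style mask and is not taken from beaver.py. The function name `attention_mask` and its arguments are hypothetical. Prompt tokens attend to each other in both directions. Each remaining token sees the whole prompt, plus the non-prompt tokens that come before it, and itself, in a chosen generation order.

```python
import random


def attention_mask(prompt_len, total_len, order=None):
    """Hypothetical illustration: mask[i][j] is True when position i may attend to j.

    - Prompt positions (0 .. prompt_len-1) attend to one another in both
      directions (non-causal prompt); they never see non-prompt positions.
    - Each non-prompt position attends to the full prompt and to the
      non-prompt positions at or before it in `order`, a permutation of
      range(prompt_len, total_len). order=None means left-to-right.
    """
    if order is None:
        order = list(range(prompt_len, total_len))  # usual causal order
    rank = {pos: r for r, pos in enumerate(order)}  # generation step of each position
    mask = [[False] * total_len for _ in range(total_len)]
    for i in range(total_len):
        for j in range(total_len):
            if j < prompt_len:
                # Everyone sees the entire prompt.
                mask[i][j] = True
            elif i >= prompt_len:
                # Generated tokens see those produced no later than themselves.
                mask[i][j] = rank[j] <= rank[i]
    return mask


if __name__ == "__main__":
    rng = random.Random(0)
    order = list(range(3, 6))
    rng.shuffle(order)  # a random auto-regression order over the non-prompt part
    for row in attention_mask(3, 6, order):
        print("".join("x" if v else "." for v in row))
```

With `order=None` this reduces to an ordinary causal mask on the non-prompt part. Shuffling `order` per sample is one way to realize a random auto-regression order.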

To get the one-shot prediction from an existing checkpoint (trained
with --random_regression_order and --noncausal_prompt):

  ./beaver.py --oneshot