To train the shortest-path solving GPT:

  ./beaver.py

Same, with lighter settings (~95% test success instead of ~99%):

  ./beaver.py --nb_train_samples=25000 --nb_test_samples=10000

To train with non-causal attention on the prompt and a random auto-regression order:

  ./beaver.py --nb_epochs=50 --learning_rate_schedule='25: 2e-4' --random_regression_order --noncausal_prompt

To get the one-shot prediction from an existing checkpoint (trained with --random_regression_order and --noncausal_prompt):

  ./beaver.py --oneshot
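
As a rough illustration of what "non-causal attention on the prompt" could mean, the sketch below builds an attention mask in which prompt positions see the whole prompt while the remaining positions stay causal. It is not taken from beaver.py: the function name `attention_mask` and its parameters `seq_len` and `prompt_len` are hypothetical, and the script's actual masking may differ.

```python
def attention_mask(seq_len, prompt_len, noncausal_prompt=False):
    """Return mask[i][j] == True when position i may attend to position j.

    Hypothetical helper for illustration only; not part of beaver.py.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            # Standard causal rule: a position sees itself and the past.
            allowed = j <= i
            # Assumed non-causal variant: inside the prompt, every position
            # may also attend to later prompt positions.
            if noncausal_prompt and i < prompt_len and j < prompt_len:
                allowed = True
            row.append(allowed)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Print the mask for a 4-token sequence whose first 2 tokens are the prompt.
    for row in attention_mask(4, 2, noncausal_prompt=True):
        print("".join("x" if allowed else "." for allowed in row))
```

With `noncausal_prompt=False` the function returns the usual lower-triangular causal mask, so the two settings differ only in the prompt-by-prompt block.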