Trionyx-2-2B

I trained another 2B-parameter LLM from scratch today. I'm running SFT (supervised fine-tuning) now, but I'm excited that the base-model eval shows a noticeable improvement over the last run on a number of key benchmarks.

Here is a quick comparison.

Task           Trionyx 2B   Trionyx-2 2B   Change
ARC Easy       0.354        0.610          +0.256
ARC Challenge  0.046        0.420          +0.374
COPA           0.240        0.610          +0.370
CommonsenseQA  0.079        0.380          +0.301
PIQA           0.296        0.630          +0.334
SQuAD          0.235        0.485          +0.250
CoQA           0.245        0.365          +0.120
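
For anyone who wants to double-check the Change column: it is just the Trionyx-2 score minus the original Trionyx score for each task. Here's a quick Python sketch. The scores are copied from the comparison. The dictionary layout, the `deltas` helper, and the unweighted mean at the end are my own illustrative additions, not part of the eval harness or a reported metric.

```python
# Scores copied from the comparison: task -> (Trionyx 2B, Trionyx-2 2B)
SCORES = {
    "ARC Easy": (0.354, 0.610),
    "ARC Challenge": (0.046, 0.420),
    "COPA": (0.240, 0.610),
    "CommonsenseQA": (0.079, 0.380),
    "PIQA": (0.296, 0.630),
    "SQuAD": (0.235, 0.485),
    "CoQA": (0.245, 0.365),
}


def deltas(scores):
    """Hypothetical helper: per-task change (new minus old), rounded to 3 decimals like the table."""
    return {task: round(new - old, 3) for task, (old, new) in scores.items()}


if __name__ == "__main__":
    changes = deltas(SCORES)
    for task, change in changes.items():
        print(f"{task:<15} {change:+.3f}")
    # Unweighted mean across the seven tasks; an illustrative summary only
    print(f"mean change: {sum(changes.values()) / len(changes):+.3f}")
```

Every task moves in the same direction, so the mean is just a rough one-number summary; the per-task rows are the real story.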

Pretty pleased with this!