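So: there are language models that produce several tokens per forward pass instead of one. These could plausibly run more efficiently on embedded systems. If a model produces n tokens at once, the cost of going through all its layers is amortized over those n tokens. On a device where the weights don't fit in RAM and each layer has to be streamed in from slower storage on every pass, that amortization makes it cheaper to offload the layers: the load cost is paid once per pass, so each token carries roughly 1/n of it.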
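A back-of-the-envelope sketch of that amortization, under a deliberately simple cost model (an assumption, not a measurement): every forward pass loads each layer's weights once at a fixed cost, and compute grows linearly with the number of tokens produced. The function name and all numbers below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def per_token_cost(num_layers: int,
                   load_ms_per_layer: float,
                   compute_ms_per_layer_per_token: float,
                   tokens_per_pass: int) -> float:
    """Hypothetical cost model: average milliseconds spent per generated token."""
    if tokens_per_pass < 1:
        raise ValueError("tokens_per_pass must be at least 1")
    # Assumption: offloaded weights are streamed in once per forward pass,
    # no matter how many tokens that pass produces.
    load_ms = num_layers * load_ms_per_layer
    # Assumption: compute scales linearly with tokens produced in the pass.
    compute_ms = num_layers * compute_ms_per_layer_per_token * tokens_per_pass
    # Spread the whole pass's cost over the tokens it produced.
    return (load_ms + compute_ms) / tokens_per_pass


if __name__ == "__main__":
    # Illustrative numbers only: 24 layers, 5 ms to load a layer, 0.5 ms compute per token.
    for n in (1, 2, 4, 8):
        cost = per_token_cost(24, 5.0, 0.5, n)
        print(f"n={n}: {cost:.1f} ms/token")
    # n=1: 132.0 ms/token
    # n=2: 72.0 ms/token
    # n=4: 42.0 ms/token
    # n=8: 27.0 ms/token
```

Under this model the load term shrinks as 1/n while the compute term stays fixed per token, so the per-token cost approaches the pure compute cost (12 ms here) as n grows. Real multi-token models add extra heads or verification work, so the actual savings would be smaller than this idealized curve.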