canteen's blog

Machine Learning

it's cool

It feels like I'm living in the future, friends. All these tasty generative language models!

what am I up to

Because there are 7 of us and the human body was not designed to contain this much awareness, we sleep terribly. So I am now exploring efficient language models that perform well enough that I would enjoy reading their output. Behold:

It is very cool and it is supposed to perform adequately even on CPU. I will try this tomorrow, if I wake up.

what's next

To demo this to my non-tech friends (this is important to me), I think the model needs to be better at following instructions (i.e. be susceptible to prompt engineering). Some fine-tuning may be in order for this, for which I've found:

Which appears to be of decent quality, if a little small. I've also found a ton of chat datasets, but evaluating their quality on my tiny phone screen is hard. A lot of them are also not useful (e.g. a Facebook dataset said "thankfully this is pretty easy to recreate if you have access to a SLURM cluster"), sometimes hilariously so.


I wish a paper were only deemed good if it also published the datasets behind its results (so they can be replicated), because that would make all these LLM papers way cooler. Sadly, we know from experience that a lot of papers are full of bogus results. But perhaps these are different! :)


I'm going to attempt to sleep now. Thanks for reading!
