>> can you please explain how I can recreate the files *.tiktoken?
>> There seem to be some sources missing ...
>
> The two files in question are 50k lines of ASCII text that seem to be
> some kind of index / vocabulary, and I have no idea how they were
> created.

Perhaps there are some clues to be had from the reimplementation at
https://github.com/ggerganov/whisper.cpp/ - or perhaps its authors know?

...and perhaps you might be interested in packaging that C++
reimplementation too/instead? ;-)

 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/
 * Sponsorship: https://ko-fi.com/drjones

 [x] quote me freely  [ ] ask before reusing  [ ] keep private