@cr0n0s @simon I just realized how unhappy I was with just using `ttok` to truncate the transcript to fit the model. I wrote a Python script to strip out the garbage instead.
I went from 100k+ tokens down to `25626` with no loss of context!
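For anyone curious, a minimal sketch of that kind of cleanup (the actual script may differ; this assumes the bloat is the usual WebVTT header, timestamp lines, inline `<c>`/timing tags, and the duplicated lines YouTube auto-captions emit):
```
import re
import sys

def strip_vtt(text: str) -> str:
    """Strip WebVTT cruft and collapse duplicated auto-caption lines."""
    out = []
    last = None
    for line in text.splitlines():
        line = line.strip()
        # Skip the header, metadata, timing lines, and blanks
        if not line or line.startswith(("WEBVTT", "Kind:", "Language:")):
            continue
        if "-->" in line:
            continue
        # Remove inline tags like <00:00:01.500> and <c>...</c>
        line = re.sub(r"<[^>]+>", "", line).strip()
        if not line:
            continue
        # Auto-captions repeat each line in consecutive cues; keep one copy
        if line == last:
            continue
        last = line
        out.append(line)
    return "\n".join(out)

if __name__ == "__main__":
    sys.stdout.write(strip_vtt(sys.stdin.read()))
```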
@cr0n0s
I don't know if this is useful, but I use it a lot.
```
# Download only the auto-generated English subtitles and capture the .vtt filename
set -l vttfilename (yt-dlp --write-auto-sub --skip-download -o '%(id)s.%(ext)s' 'https://www.youtube.com/watch?v=IuF0GlO2Myk' 2>&1 | rg "Destination: " | rg -o '[a-zA-Z0-9_-]+\.en\.vtt')
# Truncate to 120k tokens, then have the model rewrite the captions as prose
cat $vttfilename | ttok -m gpt-4 -t 120000 | llm -m 4o 'convert this vtt file to readable prose'
```
This requires tooling from @simon: `ttok` and `llm`.
Also, this is fish shell.
Hope it's useful!