Bobinas P4G

#!/usr/bin/env python3.11
import webvtt
import sys
import argparse


def vtt_to_markdown(vtt_file, markdown_file):
    with open(markdown_file, 'w') as md:
        for caption in webvtt.read(vtt_file):
            md.write(caption.text + '\n\n')


def main():
    parser = argparse.ArgumentParser(description="Convert VTT to Markdown.")
    parser.add_argument('vtt_file', nargs='?', type=argparse.FileType('r'),
                        default=sys.stdin, help="Input VTT file")
    parser.add_argument('markdown_file', nargs='?', type=str,
                        default='output.md', help="Output Markdown file")
    args = parser.parse_args()

    vtt_file_path = args.vtt_file.name
    markdown_file_path = args.markdown_file

    vtt_to_markdown(vtt_file_path, markdown_file_path)
    print(f"Conversion complete. Markdown file saved as {markdown_file_path}")


if __name__ == "__main__":
    main()

Download link

https://media.hachyderm.io/media_attachments/files/112/458/865/762/880/668/original/fe45e97f9ebe2e49.png

Notices where this attachment appears

  1. Brahn (brahn@hachyderm.io)'s status on Monday, 20-May-2024 15:58:13 UTC

    @cr0n0s @simon I just realized how unhappy I was with just using `ttok` to fit it into the model. I wrote this Python script to strip out the garbage.

    I went from 100k+ tokens to `25626` and no loss of context!
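
    To illustrate what "stripping out garbage" means for a VTT file, here is a minimal standard-library sketch. It is not the script from the post, which relies on the `webvtt` package. The function name `vtt_text_lines`, the regular expressions, and the choice to collapse consecutive repeated lines are all assumptions made for illustration. It drops the `WEBVTT` header, cue identifiers, timing lines, and inline tags, and keeps only the spoken text.

```python
import re

# Cue timing line, e.g. "00:00:01.000 --> 00:00:04.000 align:start".
# The hours field is optional in WebVTT timestamps.
TIMING = re.compile(
    r"^(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}\.\d{3}"
)
# Inline markup such as <c.color> or <00:00:01.500> inside cue text.
TAG = re.compile(r"<[^>]+>")


def vtt_text_lines(vtt_source: str) -> list[str]:
    """Return caption text lines only (hypothetical helper, not from the post)."""
    lines: list[str] = []
    in_cue = False
    for raw in vtt_source.splitlines():
        line = raw.strip()
        if not line:
            in_cue = False  # a blank line ends the current cue block
            continue
        if TIMING.match(line):
            in_cue = True  # text lines follow until the next blank line
            continue
        if not in_cue:
            continue  # header, NOTE blocks, and cue identifiers
        text = TAG.sub("", line).strip()
        # Assumption: drop a line that merely repeats the previous one,
        # which is common in rolling auto-generated captions.
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return lines


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "1\n00:00:01.000 --> 00:00:03.000\nHello <c.color>there</c>\n\n"
        "2\n00:00:03.000 --> 00:00:05.000\nHello there\ngeneral Kenobi\n"
    )
    # Blank lines between entries mirror the '\n\n' separator in the script.
    print("\n\n".join(vtt_text_lines(sample)))
```

    How many tokens this saves depends on the transcript; the sketch makes no claim to reproduce the figures quoted in the post.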

Bobinas P4G is a social network. It runs on GNU social, version 2.0.1-beta0, available under the GNU Affero General Public License.

All Bobinas P4G content and data are available under the Creative Commons Attribution 3.0 license.