 # PDF2Text
 I converted 8 GB of PDFs to text in one afternoon on a decade-old x270 using this script. Performant enough, imho. Try getting 8 GB into your LLM and getting it to actually use it. That's the challenge.
 ## Convert all PDFs to text
 This is a [script](/pdf2text) for converting a batch of PDFs to text for machine learning.
 It only has two dependencies:
  - `python3`
  - `pdf.miner` (Python requirement, specified in [requirements.txt](/requirements.txt))
 ## Installation
 ```bash
 python3 -m venv .venv
 source .venv/bin/activate
 pip install -r requirements.txt
 ```
 ## Usage:
 Activate your virtual environment, then run the script on the directory containing your PDFs:
 ```bash
 source .venv/bin/activate
 ./pdf2text [source/destination dir]
 ```
 You read that correctly: the source directory is also the destination directory.
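
For the curious, here is a rough sketch of what "source is also destination" can look like in Python. This is not the actual script, just an illustration under some assumptions: `convert_dir` is a made-up name, the recursive search and the skip-if-already-converted check are my guesses, and the default extractor assumes the `extract_text` helper from `pdfminer.high_level` (shipped with pdfminer.six) is available.

```python
from pathlib import Path
from typing import Callable, List, Optional


def convert_dir(directory: str,
                extract: Optional[Callable[[str], str]] = None) -> List[Path]:
    """Write a .txt file next to every PDF found under `directory`.

    Hypothetical helper, not part of pdf2text itself.
    """
    if extract is None:
        # Lazy import: only needed when no custom extractor is passed in.
        # Assumes pdfminer.six is installed in the active environment.
        from pdfminer.high_level import extract_text as extract

    written = []
    # Assumption: search subdirectories too, in a stable order.
    for pdf in sorted(Path(directory).rglob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        if txt.exists():
            continue  # assumption: skip PDFs converted in an earlier run
        txt.write_text(extract(str(pdf)), encoding="utf-8")
        written.append(txt)
    return written
```

Calling `convert_dir("papers")` would then leave `papers/foo.txt` next to `papers/foo.pdf`. The `extract` parameter is only there so you can swap in another extractor or try the loop without pdfminer installed.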
 ## Todo:
 Make a decent Python package so it can be installed system-wide without having to activate a virtual environment first. Not sure it's worth it; it's not something you use daily.