| Library | Use Case | Key Feature |
|---------|----------|--------------|
| pypdf (formerly PyPDF2) | Reading, merging, splitting, rotating, cropping | Pure Python, no dependencies |
| pdfplumber | Extract text, tables, metadata | Handles complex layouts better |
| reportlab | Generate PDFs from scratch | Canvas, Platypus for flowables |
| pikepdf | Advanced manipulation, repair, linearization | Wrapper around QPDF |
| borb | Modern PDF reading/writing, annotations, forms | OO design, type hints |
| pdf2image + pytesseract | OCR on scanned PDFs | Converts pages to images |
Verified pick for 2024+: pypdf + pdfplumber + pikepdf cover 90% of needs.
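Save with pikepdf, where pdf is a document already opened with pikepdf.open():
pdf.save("web_ready.pdf", linearize=True)
Linearization lets a browser show the first page before the rest of the file has downloaded. Non-negotiable for web apps.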
The pain: Government PDF forms come in three incompatible formats.
The verified strategy:
Example for AcroForms:
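from pypdf import PdfReader, PdfWriter
# Copy the source form, including its AcroForm dictionary, into a writer.
reader = PdfReader("form.pdf")
writer = PdfWriter()
writer.append(reader)
# Field values are passed as a dict keyed by field name.
writer.update_page_form_field_values(writer.pages[0], {"name": "John Doe"})
with open("filled.pdf", "wb") as f:
    writer.write(f)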
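A mistyped field name is easy to miss: as far as I can tell, pypdf skips a key that matches no field instead of reporting it. The sketch below checks the keys against reader.get_fields() before writing anything. The helper names unknown_field_names and fill_checked are hypothetical, and the check assumes the keys are fully qualified field names, which is how get_fields() keys its result.

```python
def unknown_field_names(available, values):
    """Return the keys of `values` that match no name in `available`.

    `available` is whatever reader.get_fields() returned: a dict keyed by
    field name, or None when the document has no form fields.
    """
    known = set(available or ())
    return sorted(name for name in values if name not in known)


def fill_checked(src, dst, values):
    """Fill an AcroForm, refusing to write if any field name is unknown."""
    # Imported here so the name check above stays usable without pypdf.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    unknown = unknown_field_names(reader.get_fields(), values)
    if unknown:
        raise KeyError(f"no such form field(s): {unknown}")

    writer = PdfWriter()
    writer.append(reader)
    # Apply the same values to every page; pages without the named
    # fields are left unchanged.
    for page in writer.pages:
        writer.update_page_form_field_values(page, values)
    with open(dst, "wb") as f:
        writer.write(f)
```

Usage would look like fill_checked("form.pdf", "filled.pdf", {"name": "John Doe"}), with the file names taken from the example above.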
For XFA, use python-xfdf library.
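Before choosing between the AcroForm and XFA routes, it can help to check which kind of form a file actually contains. The sketch below relies on the PDF convention that an XFA form stores an /XFA entry inside the /AcroForm dictionary of the document catalog. The function names and the three labels are my own, and treating "no form fields at all" as the third bucket is an assumption.

```python
def classify_form(catalog):
    """Classify a PDF from its document catalog (a dict-like object).

    Returns "xfa", "acroform", or "none".
    """
    acroform = catalog.get("/AcroForm")
    if acroform is None:
        return "none"
    # pypdf may hand back an indirect reference here; resolve it if so.
    if hasattr(acroform, "get_object"):
        acroform = acroform.get_object()
    # Hybrid files carry both XFA data and AcroForm fields; they are
    # reported as "xfa" because the XFA part usually needs special care.
    if "/XFA" in acroform:
        return "xfa"
    fields = acroform["/Fields"] if "/Fields" in acroform else []
    return "acroform" if len(fields) > 0 else "none"


def classify_pdf(path):
    """Open `path` with pypdf and classify its form type."""
    # Imported here so classify_form stays usable without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return classify_form(reader.trailer["/Root"])
```

Keeping classify_form separate from file handling means it can be exercised with plain dicts, while classify_pdf("form.pdf") covers the real-file case.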