Below is a short Bash script that generates the raw material for many of the sections above and drops the results into a folder called MIDV-354_report. It expects ffmpeg/ffprobe, PySceneDetect, Ultralytics YOLOv8, Tesseract, and OpenAI Whisper on your PATH; adapt paths and parameters to your environment.
#!/usr/bin/env bash
set -euo pipefail
VIDEO="MIDV-354.mp4"
OUTDIR="${VIDEO%.*}_report"   # strips the extension: MIDV-354_report
mkdir -p "$OUTDIR"
# 1️⃣ Basic media info
ffprobe -v error -show_format -show_streams "$VIDEO" > "$OUTDIR/ffprobe.txt"
# 2️⃣ Checksum
sha256sum "$VIDEO" > "$OUTDIR/checksum.sha256"
# 3️⃣ Keyframes (I‑frames)
ffmpeg -i "$VIDEO" -vf "select='eq(pict_type\,I)'" -vsync vfr -frame_pts true "$OUTDIR/keyframe_%04d.jpg"
# 4️⃣ Scene detection (PySceneDetect)
scenedetect -i "$VIDEO" detect-content list-scenes -f "$OUTDIR/scenes.csv"
# 5️⃣ Object detection (YOLOv8 – assumes you have it installed)
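# save_txt writes one label file per frame to $OUTDIR/yolo_preds/labels/
yolo task=detect mode=predict model=yolov8n.pt source="$OUTDIR/keyframe_*.jpg" conf=0.25 save=False save_txt=True save_conf=True project="$OUTDIR" name="yolo_preds" exist_ok=True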
# 6️⃣ OCR (Tesseract)
mkdir -p "$OUTDIR/ocr"
for img in "$OUTDIR"/keyframe_*.jpg; do
  fname=$(basename "$img" .jpg)
  tesseract "$img" "$OUTDIR/ocr/$fname" -l eng txt
done
# 7️⃣ Audio extraction + Whisper transcription
ffmpeg -i "$VIDEO" -vn -acodec pcm_s16le -ar 16000 "$OUTDIR/audio.wav"
whisper "$OUTDIR/audio.wav" --model medium --language en --output_format txt --output_dir "$OUTDIR"
mv "$OUTDIR/audio.txt" "$OUTDIR/transcript.txt"   # whisper names its output after the input file
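# 8️⃣ Speaker diarization (pyannote.audio)
# pyannote.audio is used through its Python API (Pipeline.from_pretrained), not a
# "pyannote-audio diarization" command. Call your own script here that writes RTTM:
# python3 diarize.py "$OUTDIR/audio.wav" > "$OUTDIR/diarization.rttm"   # diarize.py is hypothetical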
echo "Report assets generated in $OUTDIR"
Tip: After running the script, open the generated ffprobe.txt, scenes.csv, and the OCR text files in the ocr/ subfolder to fill in the bold placeholders in the template above.
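To pull single values out of ffprobe.txt instead of reading the whole dump, a small awk filter is enough. The sketch below assumes ffprobe's default output writer, which prints key=value lines between [STREAM]/[/STREAM] and [FORMAT]/[/FORMAT] markers; the helper name probe_field is hypothetical, and it only looks at the first block with the requested section name.

```bash
#!/usr/bin/env bash
# probe_field SECTION KEY FILE (hypothetical helper, not part of FFmpeg)
# Prints the value of KEY from the first [SECTION] ... [/SECTION] block of an
# ffprobe dump written with the default writer (-show_format -show_streams).
probe_field() {
  local section="$1" key="$2" file="$3"
  awk -v sec="$section" -v key="$key" '
    $0 == ("[" sec "]")  { inside = 1; next }          # entering the section
    $0 == ("[/" sec "]") { if (inside) exit; next }    # stop after the first block
    inside {
      eq = index($0, "=")                             # split on the first "="
      if (eq > 0 && substr($0, 1, eq - 1) == key) { print substr($0, eq + 1); exit }
    }
  ' "$file"
}
```

For example, probe_field FORMAT duration MIDV-354_report/ffprobe.txt prints the container duration in seconds, and probe_field STREAM codec_name MIDV-354_report/ffprobe.txt prints the codec of the first stream listed. A missing key prints nothing.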
| Goal | Command / Tool | Output / notes |
|------|----------------|----------------|
| Extract basic technical info | ffprobe -v error -show_format -show_streams MIDV-354.mp4 | – |
| Generate key-frame thumbnails | ffmpeg -i MIDV-354.mp4 -vf "select='eq(pict_type\,I)'" -vsync vfr -frame_pts true key_%04d.jpg | – |
| Detect objects | yolo detect predict model=yolov8n.pt source='key_*.jpg' conf=0.25 save_txt=True | Label .txt files in the run's labels/ folder |
| OCR on frames | tesseract frame_001.png out -l eng | Writes out.txt |
| Audio transcription | whisper MIDV-354.mp4 --model medium --language en --output_format txt | Writes MIDV-354.txt to the current directory |
| Speaker diarization | pyannote.audio Python API: Pipeline.from_pretrained("pyannote/speaker-diarization-3.1") applied to MIDV-354.wav | Gated model; needs a Hugging Face access token |
| Music / sound classification | essentia_streaming_extractor_music MIDV-354.wav features.json | – |
| Checksum | sha256sum MIDV-354.mp4 | – |
| Metadata dump | exiftool MIDV-354.mp4 | – |
| Scene change detection | scenedetect -i MIDV-354.mp4 detect-content list-scenes | – |
| Export annotated frames (COCO) | Custom Python script using pycocotools + detection boxes | – |
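Once YOLO has written its label files, a per-class tally is a quick way to summarize what was detected. This is a sketch assuming the Ultralytics save_txt layout (one .txt per image, one detection per line in the form "class x_center y_center width height", optionally followed by a confidence); count_classes is a hypothetical helper, and mapping the numeric ids to names (COCO classes for yolov8n.pt) is left out.

```bash
#!/usr/bin/env bash
# count_classes LABELS_DIR (hypothetical helper)
# Prints "class_id count" lines, sorted by class id, summed over every
# label file in LABELS_DIR.
count_classes() {
  local labels_dir="$1"
  # nullglob makes an empty directory yield an empty array instead of a
  # literal pattern; it is switched off again afterwards for simplicity.
  shopt -s nullglob
  local files=("$labels_dir"/*.txt)
  shopt -u nullglob
  # awk with no file arguments would read stdin, so stop early when empty
  (( ${#files[@]} )) || return 0
  # first field is the class id; require at least the 5 box fields
  awk 'NF >= 5 { n[$1]++ } END { for (c in n) print c, n[c] }' "${files[@]}" | sort -n
}
```

Point it at the run's labels/ folder, e.g. count_classes MIDV-354_report/yolo_preds/labels when YOLO was run with project and name set as in the script.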
The filename MIDV-354.mp4 follows a common catalog-style pattern: an alphabetic prefix, a hyphen, a sequence number, and a file extension. Let's break it down:
- Prefix: "MIDV"
- Numbering: "354"
- Extension: ".mp4" (MPEG-4 Part 14 container)
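The same breakdown can be done mechanically with Bash's [[ =~ ]] operator and the BASH_REMATCH array. This sketch assumes names made of letters, a hyphen, digits, and a single extension; split_name is a hypothetical helper, and anything that does not fit the pattern is reported as unrecognized.

```bash
#!/usr/bin/env bash
# split_name PATH (hypothetical helper)
# Splits a PREFIX-NUMBER.ext filename into its three parts.
split_name() {
  local name="${1##*/}"   # drop any leading directory
  local re='^([A-Za-z]+)-([0-9]+)\.([A-Za-z0-9]+)$'
  if [[ $name =~ $re ]]; then
    # capture groups 1..3 hold prefix, number and extension
    printf 'prefix=%s number=%s extension=.%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    echo "unrecognized name: $name" >&2
    return 1
  fi
}
```

Running split_name MIDV-354.mp4 prints prefix=MIDV number=354 extension=.mp4. Keeping the regex in a variable avoids quoting pitfalls, since a quoted pattern on the right of =~ is matched literally.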