Copyright © 2026 Digital Exception Sthlm AB. All rights reserved.
Most tiny models require you to hunt for a separate tokenizer configuration or manually implement generation loops. The CompleteTinyModelRaven Top ships as a self-contained .bin file paired with a generation_config.json. A few lines of Python load the model and its tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("completetinymodelraven_top")
tokenizer = AutoTokenizer.from_pretrained("completetinymodelraven_top")
Here is a standard script to get you started:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
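A minimal sketch of how such a script might continue from these imports. It assumes the checkpoint loads through the standard transformers Auto classes under the identifier used earlier, and that the BitsAndBytesConfig import signals 4-bit loading, which needs the bitsandbytes and accelerate packages and a CUDA GPU. The helper names build_generation_kwargs and load_and_generate, and the sampling values, are illustrative rather than taken from the model's documentation.

```python
def build_generation_kwargs(max_new_tokens=64, temperature=0.7, top_p=0.9):
    """Collect sampling settings for model.generate (illustrative values)."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
    }


def load_and_generate(prompt, model_id="completetinymodelraven_top"):
    """Load the model in 4-bit and return a decoded completion for `prompt`."""
    # Imported here so the helper above stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Assumption: 4-bit quantization is the intended use of BitsAndBytesConfig.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # lets accelerate place the weights on the available device
    )

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, **build_generation_kwargs())
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling load_and_generate("Once upon a time") would then return the prompt followed by the sampled continuation. If generation_config.json already carries suitable defaults, the explicit sampling arguments can be dropped.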
Unlike standard decoder-only models, the Raven architecture uses Recursive Attention with Variable Extraction Nodes (RAVEN). This lets the model maintain a longer effective context window (up to 8k tokens) without the quadratic cost growth of standard attention. The "Top" variant trims the top 2 layers during inference, reducing latency by 30%.
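To put the quadratic cost in concrete terms, the sketch below counts the entries in the score matrix that standard self-attention builds per head and per layer, which is n × n for a sequence of n tokens. It illustrates only the baseline being compared against and does not model the RAVEN mechanism, whose internals are not described here. The function name attention_score_entries is illustrative, and 8k is taken to mean 8,192 tokens.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one head's full self-attention score matrix (seq_len x seq_len)."""
    if seq_len < 0:
        raise ValueError("seq_len must be non-negative")
    return seq_len * seq_len


# Print the count for a few context lengths up to 8k.
for n in (1024, 2048, 4096, 8192):
    print(f"{n:>5} tokens -> {attention_score_entries(n):>12,} score entries")

# Doubling the sequence length quadruples the count.
ratio = attention_score_entries(8192) / attention_score_entries(4096)
print(f"8192 vs 4096 tokens: {ratio:.0f}x more entries")
```

At 8,192 tokens a full score matrix holds 67,108,864 entries per head per layer, four times the count at 4,096 tokens.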
Because the CompleteTinyModelRaven Top runs locally, no data is sent to external API endpoints. However, the model is not aligned against harmful content by default. The base "Raven Top" was trained on a filtered Common Crawl subset, but developers should implement their own safety guardrails when deploying it in public-facing applications.
A lightweight safety filter is included in the safety/ folder of the repository. Enable it via:
model.enable_safety_filter(threshold=0.85)
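How enable_safety_filter applies its threshold is not documented here, so the following is only a sketch of how a threshold of this kind is commonly used: a classifier assigns each generated text a harm score between 0 and 1, and text scoring at or above the threshold is withheld. The function apply_threshold, the SafetyResult type, and the toy keyword scorer are hypothetical stand-ins, not the repository's code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SafetyResult:
    text: str      # text to show the user (placeholder if blocked)
    score: float   # harm score returned by the scorer
    blocked: bool  # True when the score reached the threshold


def apply_threshold(
    text: str,
    scorer: Callable[[str], float],
    threshold: float = 0.85,
) -> SafetyResult:
    """Block `text` when the scorer's harm score reaches the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    score = scorer(text)
    blocked = score >= threshold
    # Replace blocked output with a neutral placeholder instead of returning it.
    return SafetyResult("[filtered]" if blocked else text, score, blocked)


def toy_scorer(text: str) -> float:
    """Hypothetical stand-in for a real classifier: flags a single phrase."""
    return 0.95 if "attack plan" in text.lower() else 0.05


print(apply_threshold("Here is a pasta recipe.", toy_scorer))
print(apply_threshold("Step one of the attack plan is", toy_scorer))
```

Under this assumed convention, a higher threshold blocks less, since only high-confidence detections reach it. A public-facing deployment would swap the toy scorer for a trained classifier and tune the threshold on its own data.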