GET /api/v1/recommendations/foryou - Personalized Recommendations
Overview
This endpoint returns a personalized list of video recommendations for the currently authenticated user. It works by representing both videos and the user's watch/interest history as high-dimensional vectors, then finding the videos whose vectors are most similar to the user's preference vector using an Approximate Nearest Neighbor (ANN) search in Cassandra.
Why it exists: Chronological or popularity-based feeds are easy to implement but poor at personalization. Vector similarity search lets the system surface videos that are semantically related to what a user has engaged with, without needing to hand-code similarity rules.
HTTP Details
- Method: GET
- Path: /api/v1/recommendations/foryou
- Auth Required: Yes — viewer role (JWT bearer token)
- Success Status: 200 OK
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| page | integer | 1 | Page number (1-based) |
| pageSize | integer | 10 | Number of recommendations per page |
Response Body
{
"items": [
{
"videoId": "550e8400-e29b-41d4-a716-446655440000",
"title": "Introduction to Cassandra Data Modeling",
"thumbnailUrl": "https://cdn.killrvideo.com/thumbs/550e8400.jpg",
"description": "Learn the fundamentals of Cassandra table design",
"tags": ["cassandra", "databases", "tutorial"],
"addedDate": "2025-10-15T09:00:00Z",
"userId": "7f3e1a2b-dead-beef-cafe-123456789abc"
}
],
"total": 42,
"page": 1,
"pageSize": 10
}
Cassandra Concepts Explained
Vector Embeddings
A vector embedding is a list of floating-point numbers that represents the semantic meaning of content in a high-dimensional space. For videos, the embedding might encode topics, style, pacing, and subject matter as coordinates in a 1,536-dimensional space (a common size for embedding models).
The key property of embeddings is that semantically similar content has numerically similar vectors. Two videos about Cassandra data modeling will have embeddings that are "close" to each other in vector space, even if they share no keywords.
How video embeddings are generated:
- A machine learning model (e.g., OpenAI's text-embedding model) processes the video title, description, and tags
- It outputs a fixed-length array of floats
- This array is stored in the content_features column of the videos table
Vector Similarity Search
Given a query vector (representing the user's preferences), a vector similarity search finds the stored vectors that are closest to it. "Closeness" is typically measured by:
- Cosine similarity: Angle between vectors (measures directional similarity, ignores magnitude)
- Dot product: Magnitude-weighted similarity
- Euclidean distance: Straight-line distance in vector space
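All three metrics are simple arithmetic over the vector components. A toy sketch with 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions; the example vectors are made up for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Angle-based: 1.0 = same direction, 0.0 = orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dot_product(a: list[float], b: list[float]) -> float:
    # Magnitude-weighted: longer vectors score higher
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a: list[float], b: list[float]) -> float:
    # Straight-line distance: smaller = more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two "Cassandra tutorial" videos vs. one unrelated video
v1 = [0.9, 0.1, 0.0]
v2 = [0.8, 0.2, 0.1]
v3 = [0.0, 0.1, 0.9]

print(cosine_similarity(v1, v2))  # high: similar content
print(cosine_similarity(v1, v3))  # low: unrelated content
```

Note how cosine similarity ranks v2 far closer to v1 than v3, even though all three vectors have similar magnitudes — this is the property the ANN search exploits.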
Cassandra 5.0 supports vector search natively with the ANN OF syntax:
SELECT videoid, title, thumbnail_url
FROM killrvideo.videos
ORDER BY content_features ANN OF [0.12, -0.45, 0.78, ...]
LIMIT 10;
This returns the 10 videos whose content_features vectors are most similar to the provided query vector.
Approximate Nearest Neighbor (ANN) Search
"Exact" nearest neighbor search requires comparing the query vector against every stored vector — O(n) per query, impractical for large datasets.
ANN algorithms trade a small amount of accuracy for dramatically better performance. They use data structures like HNSW (Hierarchical Navigable Small World graphs) to organize vectors so that similar ones cluster together. A search then explores only a fraction of the total vectors.
Cassandra's vector search uses HNSW internally:
- Typical recall: > 95% (finds most of the true nearest neighbors)
- Query time: O(log n) rather than O(n)
- Suitable for real-time recommendation at scale
User Preference Modeling
To generate personalized recommendations, the system needs a vector that represents the user's preferences. Several approaches exist:
- Average of watched video vectors: Sum the content_features vectors of videos the user has viewed, then normalize. Simple and effective.
- Weighted average: Give recent views more weight than older ones.
- Dedicated user embedding: Train a model that learns user preference vectors directly.
The service retrieves the user's preference vector (however it was computed) and uses it as the ANN query vector.
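The weighted-average approach can be sketched in a few lines. The decay factor and function name are illustrative assumptions, not part of the service's actual API:

```python
import math

def preference_vector(watched_embeddings: list[list[float]],
                      decay: float = 0.8) -> list[float]:
    """Weighted average of watched-video embeddings, most recent first.

    Assumes at least one watched video. Each older view is down-weighted
    by `decay`, and the result is L2-normalized so magnitude does not
    skew the ANN query.
    """
    dims = len(watched_embeddings[0])
    acc = [0.0] * dims
    for age, emb in enumerate(watched_embeddings):  # index 0 = most recent
        weight = decay ** age
        for i, x in enumerate(emb):
            acc[i] += weight * x
    norm = math.sqrt(sum(x * x for x in acc)) or 1.0
    return [x / norm for x in acc]
```

Setting `decay=1.0` recovers the plain (unweighted) average of the first approach.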
Data Model
Table: videos (relevant columns)
CREATE TABLE killrvideo.videos (
videoid uuid PRIMARY KEY,
userid uuid,
name text,
description text,
tags set<text>,
thumbnail_url text,
added_date timestamp,
content_features vector<float, 1536> -- The embedding vector
);
-- Vector index required for ANN search
CREATE CUSTOM INDEX videos_content_features_idx
ON killrvideo.videos(content_features)
USING 'StorageAttachedIndex';
Key detail: The content_features column stores the video's semantic embedding. Without the SAI vector index, you cannot perform ANN queries on this column.
User Preference Vector
The user preference vector is not stored in a separate Cassandra table for this endpoint — it is derived or retrieved from a user profile service or computed from the user's watch history. The exact storage is implementation-specific, but the result is a float[] of the same dimensionality as content_features.
Database Queries
1. Retrieve User Preference Vector
from uuid import UUID

async def get_user_preference_vector(user_id: UUID) -> list[float] | None:
# Retrieve pre-computed preference vector for the user
# This might come from a user_preferences table or be computed on the fly
prefs_table = await get_table("user_preferences")
row = await prefs_table.find_one(filter={"userid": str(user_id)})
return row["preference_vector"] if row else None
2. ANN Vector Search on Videos
async def find_similar_videos(
preference_vector: list[float],
limit: int
) -> list[dict]:
videos_table = await get_table("videos")
results = await videos_table.find(
sort={"content_features": {"$vector": preference_vector}},
limit=limit,
projection={"videoid": 1, "name": 1, "thumbnail_url": 1,
"description": 1, "tags": 1, "added_date": 1, "userid": 1}
)
return results
Equivalent CQL:
SELECT videoid, name, thumbnail_url, description, tags, added_date, userid
FROM killrvideo.videos
ORDER BY content_features ANN OF [0.12, -0.45, 0.78, ...]
LIMIT 10;
Performance: Sub-100ms for typical datasets thanks to HNSW indexing. Exact performance depends on vector dimensionality and dataset size.
3. Exclude Already-Watched Videos (Optional)
For a better user experience, the service can filter out videos the user has already seen:
watched_ids = await get_watched_video_ids(user_id)
recommendations = [r for r in raw_results if r["videoid"] not in watched_ids]
This filtering happens in the application layer because Cassandra's ANN query does not support complex post-filters natively.
Implementation Flow
┌─────────────────────────────────────────────────────────┐
│ 1. Client sends GET /api/v1/recommendations/foryou │
│ Header: Authorization: Bearer <jwt> │
│ ?page=1&pageSize=10 │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ 2. JWT middleware validates token, extracts userid │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ 3. Retrieve user preference vector │
│ ├─ Found: use as ANN query vector │
│ └─ Not found: fall back to popular/recent videos │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ 4. ANN search: ORDER BY content_features ANN OF <vector> │
│ LIMIT pageSize * 2 (over-fetch for filtering) │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ 5. Filter out already-watched videos (optional) │
│ Apply pagination offset │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ 6. Return 200 OK with PaginatedResponse │
│ { items (VideoSummary[]), total, page, pageSize } │
└─────────────────────────────────────────────────────────┘
Special Notes
1. Cold Start Problem
A new user has no watch history, so there is no preference vector. The system must handle this gracefully:
- Fall back to trending/popular videos
- Use demographic signals (if available) to seed recommendations
- Show onboarding content to build initial preferences
2. Vector Dimensionality Must Match
The preference vector and content_features column must have the same number of dimensions. If videos were ingested with 1,536-dimensional embeddings, the user preference vector must also be 1,536-dimensional. Dimension mismatch causes a query error.
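A cheap guard before issuing the ANN query turns a confusing server-side error into a clear one. The constant matching the table schema is the assumption here:

```python
EXPECTED_DIMS = 1536  # must match vector<float, 1536> in the videos table

def validate_query_vector(vec: list[float]) -> list[float]:
    # Fail fast in the application instead of erroring inside Cassandra
    if len(vec) != EXPECTED_DIMS:
        raise ValueError(
            f"preference vector has {len(vec)} dimensions; "
            f"content_features expects {EXPECTED_DIMS}"
        )
    return vec
```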
3. ANN is Not Exact
The ANN search may occasionally miss videos that would technically be "closer" in vector space. The probability of missing a true nearest neighbor is low (typically < 5%) and the tradeoff for performance is worth it at scale.
4. Over-Fetching for Post-Processing
If the service filters results after the ANN query (e.g., removing watched videos), it should fetch more results than needed from Cassandra and then trim to pageSize. Fetching exactly pageSize results, then filtering, risks returning fewer items than requested.
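The over-fetch-then-trim step can be sketched as a pure function; the caller is assumed to have already over-fetched (e.g. page * pageSize * 2 rows) from the ANN query:

```python
def page_of_unwatched(raw_results: list[dict],
                      watched_ids: set,
                      page: int,
                      page_size: int) -> list[dict]:
    # Drop already-watched videos, then slice out the requested page.
    # raw_results must be over-fetched so filtering still leaves
    # enough items to fill the page.
    unwatched = [r for r in raw_results if r["videoid"] not in watched_ids]
    start = (page - 1) * page_size
    return unwatched[start:start + page_size]
```

If the trimmed list comes back short, the service can re-query with a larger limit rather than return a partial page.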
5. Embedding Model Versioning
If the embedding model is upgraded (e.g., from 1,536 to 3,072 dimensions), all existing video vectors and user preference vectors must be regenerated with the new model before the ANN search will work correctly. Plan for this migration before changing models.
Developer Tips
Common Pitfalls
- Forgetting the vector SAI index: ANN queries (ORDER BY ... ANN OF) require a Storage-Attached Index on the vector column. Without it, the query will fail or fall back to a full scan.
- Cold start with no preference vector: Always implement a fallback for users who have not yet developed a preference profile.
- Returning videos the user already watched: Without filtering, the top ANN results often include videos the user has already seen (high similarity because they've engaged with them). Always filter or deprioritize watched content.
- Dimension mismatch: Ensure the user preference vector and video embedding vector always use the same model and dimensionality.
Query Performance Expectations
| Operation | Performance | Why |
|---|---|---|
| Fetch user preference vector | < 10ms | Partition key lookup |
| ANN vector search | < 50ms | HNSW index traversal |
| Post-processing (filtering, pagination) | < 5ms | In-memory |
| Total | < 65ms | Dominated by vector search |
Testing Tips
async def test_foryou_requires_auth():
response = await client.get("/api/v1/recommendations/foryou")
assert response.status_code == 401
async def test_foryou_returns_video_items():
response = await client.get(
"/api/v1/recommendations/foryou",
headers={"Authorization": f"Bearer {viewer_token}"}
)
assert response.status_code == 200
data = response.json()
assert "items" in data
assert isinstance(data["items"], list)
for item in data["items"]:
assert "videoId" in item
assert "title" in item
Related Endpoints
- GET /api/v1/videos/{video_id}/related - Content-based related videos
- POST /api/v1/reco/ingest - Ingest video embeddings that power this endpoint