Concepts covered: vector-search sai-indexes nvidia-embeddings similarity-scoring

GET /api/v1/search/videos - Video Search (Semantic & Keyword)

Overview

This endpoint searches videos using either semantic search (AI-powered, meaning-based) or keyword search (traditional text matching). It demonstrates Astra DB's vector search with NVIDIA embeddings, alongside Storage-Attached Indexes (SAI) for text search.

Why it exists: Modern search needs go beyond exact keyword matching. Semantic search understands intent and meaning, enabling queries like "funny cat videos" to match videos titled "Hilarious Feline Compilation" even though they share no common words.

HTTP Details

Method: GET
Path: /api/v1/search/videos
Auth Required: No (public endpoint)
Success Status: 200 OK

Request Parameters

GET /api/v1/search/videos?query=python+tutorials&mode=semantic&page=1&pageSize=10

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | string | Yes | - | Search term (min 1 char) |
| `mode` | string | No | `keyword` | Search mode: `semantic` or `keyword` |
| `page` | integer | No | 1 | Page number (≥1) |
| `pageSize` | integer | No | 10 | Results per page (1-100) |
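
The constraints in the table can be expressed as a small validation step. The sketch below is framework-agnostic and purely illustrative: `SearchParams` and `validate_search_params` are hypothetical names, not part of the documented service, which may rely on its web framework's own validation instead.

```python
from dataclasses import dataclass

VALID_MODES = {"semantic", "keyword"}


@dataclass(frozen=True)
class SearchParams:
    query: str
    mode: str = "keyword"
    page: int = 1
    page_size: int = 10


def validate_search_params(query, mode="keyword", page=1, page_size=10):
    """Return SearchParams or raise ValueError, mirroring the table's limits."""
    if not isinstance(query, str) or len(query) < 1:
        raise ValueError("query must be at least 1 character")
    if mode not in VALID_MODES:
        raise ValueError("mode must be 'semantic' or 'keyword'")
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= 100:
        raise ValueError("pageSize must be between 1 and 100")
    return SearchParams(query, mode, page, page_size)


params = validate_search_params("python tutorials", mode="semantic")
print(params)
```

A real endpoint would translate the `ValueError` into the 422 response described in the implementation flow.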

Response Body

{
  "data": [
    {
      "videoid": "550e8400-e29b-41d4-a716-446655440000",
      "name": "Advanced Python Tutorial",
      "description": "Learn advanced Python concepts",
      "preview_image_location": "https://...",
      "userid": "...",
      "added_date": "2025-10-31T10:00:00Z",
      "tags": ["python", "tutorial", "programming"],
      "$similarity": 0.87
    }
  ],
  "pagination": {
    "currentPage": 1,
    "pageSize": 10,
    "totalItems": 42,
    "totalPages": 5
  }
}

Note: $similarity only appears in semantic mode results.
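
The `pagination` fields in the sample are related by simple arithmetic: 42 items at 10 per page gives 5 pages. A minimal sketch of that calculation follows. `build_pagination` is a hypothetical helper, and reporting zero pages for an empty result set is an assumption rather than documented behavior.

```python
import math


def build_pagination(page: int, page_size: int, total_items: int) -> dict:
    """Assemble the pagination object using the field names from the response."""
    total_pages = math.ceil(total_items / page_size) if total_items else 0
    return {
        "currentPage": page,
        "pageSize": page_size,
        "totalItems": total_items,
        "totalPages": total_pages,
    }


print(build_pagination(page=1, page_size=10, total_items=42))
```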

Cassandra Concepts Explained

What is Vector Search?

Vector search finds items based on semantic similarity rather than exact text matches.

How it works:

Text → Numbers: Convert text to a vector (array of numbers)

"python tutorial" → [0.23, 0.87, -0.45, ..., 0.12]  # 4096 numbers

Measure distance: Calculate how similar two vectors are

cosine_similarity(query_vector, video_vector) = 0.87  # 0-1 scale

Rank by similarity: Return results sorted by similarity score

Example:

Query: "learn coding"
Results:
  1. "Programming Tutorial" (similarity: 0.92)
  2. "Software Development Basics" (similarity: 0.89)
  3. "How to Code" (similarity: 0.85)

Notice: None match exactly, but all are semantically related!
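
The three steps can be tried out locally with plain Python. This is a sketch only: the 3-dimensional vectors are made up to stand in for real embeddings, and the function names are illustrative. Note that raw cosine ranges from -1 to 1; the 0-1 scale quoted above presumably reflects how the database reports `$similarity`, which this sketch does not attempt to reproduce.

```python
import math


def cosine_similarity(a, b):
    """Raw cosine of the angle between two equal-length vectors (-1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, candidates):
    """Sort (title, vector) pairs from most to least similar to the query."""
    scored = [(title, cosine_similarity(query_vector, vec))
              for title, vec in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors standing in for real embeddings
query = [1.0, 0.0, 0.0]
videos = [
    ("Cooking Pasta", [0.0, 1.0, 0.0]),
    ("Programming Tutorial", [0.9, 0.1, 0.0]),
    ("How to Code", [0.6, 0.4, 0.2]),
]
for title, score in rank_by_similarity(query, videos):
    print(f"{title}: {score:.2f}")
```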

Vector Column in Cassandra

Cassandra 5.0 introduces the vector data type:

CREATE TABLE videos (
    videoid uuid PRIMARY KEY,
    name text,
    content_features vector<float, 4096>  -- 4096-dimensional vector
);

What's stored:

{
  "videoid": "...",
  "name": "Python Tutorial",
  "content_features": [0.234, 0.876, -0.453, ..., 0.123]
}

Size: 4096 floats × 4 bytes = ~16 KB per video embedding

Vectorize Feature (Astra DB)

Problem: How do we convert text to vectors?

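Traditional approach: Generate embeddings yourself by calling an embedding model

import openai

# Reads the API key from the OPENAI_API_KEY environment variable
response = openai.embeddings.create(
    model="text-embedding-ada-002",
    input="Python tutorial"
)
embedding = response.data[0].embedding  # The vector, as a list of floats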

Astra approach: Built-in $vectorize (automatic)

# Just insert text, Astra handles embedding
await videos_table.insert_one({
    "videoid": video_id,
    "content_features": "Python tutorial for beginners"  # Text, not vector!
})
# Astra automatically converts to 4096-dim vector using NVIDIA model

For search:

# Query with text, Astra embeds it automatically
results = videos_table.find(
    sort={"content_features": "learn python"}  # Text query
)
# Astra embeds "learn python" and finds similar vectors

Benefits:

No need to run embedding models yourself
Consistent embeddings (same model for all data)
Lower latency (embeddings happen in-database)

Vector Index with SAI

To enable fast vector search, create a SAI index:

CREATE CUSTOM INDEX videos_content_features_idx
ON videos(content_features)
USING 'StorageAttachedIndex'
WITH OPTIONS = {
  'similarity_function': 'COSINE',  -- How to measure similarity
  'source_model': 'nv-qa-4'         -- NVIDIA NV-Embed-QA model
};

Index Properties:

Similarity Function: COSINE (others: EUCLIDEAN, DOT_PRODUCT)
Source Model: nv-qa-4 (NVIDIA NV-Embed-QA, 4096 dimensions)
Performance: Approximate Nearest Neighbor (ANN) search, not exhaustive

Keyword Search with SAI

For traditional text search, use SAI on text columns:

CREATE CUSTOM INDEX videos_name_idx
ON videos(name)
USING 'StorageAttachedIndex';

Query:

results = videos_table.find(
    filter={"name": {"$regex": "python", "$options": "i"}}  # Case-insensitive
)
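
Because the search term is used as a regular expression, characters such as `+` or `(` in user input would be interpreted as pattern syntax. One possible precaution, sketched here with Python's standard `re` module, is to escape the term before building the filter. `build_name_filter` is a hypothetical helper, and the sketch assumes the server-side regex dialect treats these escapes the same way Python does, which the source does not confirm.

```python
import re


def build_name_filter(query: str) -> dict:
    """Build a case-insensitive filter that matches the query literally."""
    return {"name": {"$regex": re.escape(query), "$options": "i"}}


# Local check of the escaped pattern using Python's own regex engine
name_filter = build_name_filter("c++")
pattern = re.compile(name_filter["name"]["$regex"], re.IGNORECASE)
print(bool(pattern.search("Modern C++ Tips")))
print(bool(pattern.search("C Tips")))
```

Without escaping, `c++` is not even a valid pattern in Python's engine, so an unescaped term like this would likely fail rather than match.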

Semantic vs Keyword Search

| Aspect | Semantic Search | Keyword Search |
|--------|-----------------|----------------|
| Matching | Meaning-based | Literal text (case-insensitive pattern) |
| Example Query | "learn coding" | "python tutorial" |
| Matches | "Programming Basics", "Software Dev" | Only names containing "python tutorial" |
| Technology | Vector embeddings + ANN | Text index + regex |
| Speed | Slower (~50-200ms) | Faster (~10-50ms) |
| Accuracy | Better for natural language | Better for exact terms |
| Storage | +16KB per video | Minimal |

Data Model

Table: `videos`

CREATE TABLE killrvideo.videos (
    videoid uuid PRIMARY KEY,
    added_date timestamp,
    description text,
    name text,
    tags set<text>,
    content_features vector<float, 4096>,  -- For semantic search
    userid uuid,
    preview_image_location text
);

-- Vector search index
CREATE CUSTOM INDEX videos_content_features_idx
ON killrvideo.videos(content_features)
USING 'StorageAttachedIndex'
WITH OPTIONS = {
  'similarity_function': 'COSINE',
  'source_model': 'nv-qa-4'
};

-- Keyword search index
CREATE CUSTOM INDEX videos_name_idx
ON killrvideo.videos(name)
USING 'StorageAttachedIndex';

Database Queries

Mode Selection Logic

use_semantic = (
    mode == "semantic" and
    settings.VECTOR_SEARCH_ENABLED  # Feature flag
)

if use_semantic:
    results = await search_videos_by_semantic(query, page, page_size)
else:
    results = await search_videos_by_keyword(query, page, page_size)

Why a feature flag?

Vector search requires embeddings to be populated
May be disabled for cost/performance reasons
Allows A/B testing

Query 1: Semantic Search

async def semantic_search_with_threshold(
    db_table,
    vector_column="content_features",
    query="python tutorials",
    page=1,
    page_size=10,
    similarity_threshold=0.7,   # Min similarity score
    overfetch_factor=3          # Fetch 3x to allow filtering
):
    # Calculate how many docs to fetch
    overfetch = page_size * overfetch_factor * page

    # Vector search query
    cursor = db_table.find(
        filter={},                      # No WHERE clause (search all)
        sort={vector_column: query},    # Sort by similarity to query
        limit=overfetch,                # Fetch extra for filtering
        include_similarity=True         # Return $similarity score
    )

    docs = await cursor.to_list()

    # Client-side filtering by similarity threshold
    docs = [d for d in docs if d.get("$similarity", 0) >= similarity_threshold]

    # Paginate client-side
    start = (page - 1) * page_size
    end = start + page_size
    page_docs = docs[start:end]

    # len(docs) counts matches within the overfetched window, not the whole table
    return page_docs, len(docs)

What Astra does:

Embeds query "python tutorials" using NVIDIA model → 4096-dim vector
Performs ANN search to find nearest neighbors
Returns results with $similarity score (0.0-1.0)

Performance: ~50-200ms depending on dataset size

Query 2: Keyword Search

async def search_videos_by_keyword(query: str, page: int, page_size: int):
    videos_table = await get_table("videos")

    # Case-insensitive regex search on video name
    cursor = videos_table.find(
        filter={"name": {"$regex": query, "$options": "i"}},
        limit=page_size,
        skip=(page - 1) * page_size
    )

    docs = await cursor.to_list()
    # Note: len(docs) counts only the rows on this page, not all matching videos
    return [VideoSummary.model_validate(d) for d in docs], len(docs)

Actual Data API syntax:

{
  "find": {
    "filter": {"name": {"$regex": "python", "$options": "i"}},
    "options": {"limit": 10, "skip": 0}
  }
}

Performance: ~10-50ms (faster than vector search)

Implementation Flow

┌───────────────────────────────────────────────────────────┐
│ 1. Client sends GET /api/v1/search/videos?               │
│    query=python&mode=semantic                            │
└────────────────────┬──────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────────────────┐
│ 2. Validate query parameters                             │
│    ├─ query too short? → 422 Validation Error            │
│    ├─ page/pageSize invalid? → 422                       │
│    └─ Valid? → Continue                                  │
└────────────────────┬──────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────────────────┐
│ 3. Determine search mode                                 │
│    mode == "semantic" AND VECTOR_SEARCH_ENABLED?         │
│    ├─ Yes → Use semantic_search_with_threshold()         │
│    └─ No → Use search_videos_by_keyword()                │
└────────────────────┬──────────────────────────────────────┘
                     │
                ┌────┴────┐
                │         │
                ▼         ▼
    ┌────────────────┐  ┌──────────────────┐
    │ SEMANTIC PATH  │  │  KEYWORD PATH    │
    └────────────────┘  └──────────────────┘
                │         │
                ▼         ▼
┌───────────────────────────────────────────────────────────┐
│ 4. Execute database query                                │
│                                                           │
│ SEMANTIC:                                                 │
│   find(sort={content_features: query},                   │
│        limit=overfetch, include_similarity=True)         │
│                                                           │
│ KEYWORD:                                                  │
│   find(filter={name: {$regex: query}},                   │
│        limit=pageSize, skip=offset)                      │
└────────────────────┬──────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────────────────┐
│ 5. Post-process results (semantic only)                  │
│    ├─ Filter by similarity_threshold (>=0.7)             │
│    ├─ Paginate client-side                               │
│    └─ Map to VideoSummary models                         │
└────────────────────┬──────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────────────────┐
│ 6. Build paginated response                              │
│    {data: [...], pagination: {page, totalItems, ...}}    │
└────────────────────┬──────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────────────────┐
│ 7. Return 200 OK with search results                     │
└───────────────────────────────────────────────────────────┘

Semantic Search Queries: 1 vector search

Keyword Search Queries: 1 text search

Special Notes

1. The Overfetch Pattern

Problem: We want to filter by similarity threshold and paginate

Naive approach (doesn't work):

# Can't filter by $similarity in the query itself
cursor = db_table.find(
    filter={"$similarity": {"$gte": 0.7}},  # Not supported
    sort={"content_features": query}
)

Solution: Overfetch + client-side filter

# Fetch 3x the page size for every page up to the requested one
cursor = db_table.find(
    sort={"content_features": query},
    limit=page_size * 3 * page,  # Overfetch
    include_similarity=True
)

docs = await cursor.to_list()

# Filter client-side
docs = [d for d in docs if d.get("$similarity", 0) >= 0.7]

# Then paginate
start = (page - 1) * page_size
end = start + page_size
page_docs = docs[start:end]

Why overfetch by 3x?

With threshold=0.7, ~60-70% of results typically pass
3x gives us enough results after filtering
Grows with page number (page 10 fetches more to account for earlier pages)

Trade-off: More data transferred, but necessary for filtering
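
As a worked illustration of the pattern, the sketch below isolates the two pieces that need no database: computing the fetch limit and applying the threshold before slicing. The helper names and the hand-written scores are illustrative assumptions.

```python
def overfetch_limit(page: int, page_size: int, overfetch_factor: int = 3) -> int:
    """Rows to request so that earlier pages are covered after filtering."""
    return page_size * overfetch_factor * page


def filter_and_paginate(docs, page, page_size, threshold=0.7):
    """Drop low-similarity docs, then slice out the requested page."""
    kept = [d for d in docs if d.get("$similarity", 0) >= threshold]
    start = (page - 1) * page_size
    return kept[start:start + page_size], len(kept)


# Stand-in for a result set already sorted by similarity
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50]
fake_docs = [{"name": f"video-{i}", "$similarity": s}
             for i, s in enumerate(scores)]

page_docs, total = filter_and_paginate(fake_docs, page=2, page_size=2)
print(overfetch_limit(page=2, page_size=2))
print([d["name"] for d in page_docs], total)
```

Here six of the ten documents pass the threshold, and page 2 contains the third and fourth of them.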

2. Token Limit for Embeddings

Limitation: NVIDIA NV-Embed-QA has a 512-token limit

What's a token? Roughly 1 token ≈ 0.75 words

Examples:

"Python tutorial" ≈ 2 tokens
300-word description ≈ 400 tokens
1000-word description ≈ 1333 tokens - Too long!
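
The estimates above follow directly from the 0.75 words-per-token rule of thumb. A short sketch of the arithmetic, with hypothetical helper names; real tokenizers will give different counts for any particular text.

```python
def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Approximate token count from a word count using the rule of thumb."""
    return round(word_count / words_per_token)


def exceeds_limit(word_count: int, token_limit: int = 512) -> bool:
    """True when the estimated token count is over the embedding limit."""
    return estimate_tokens(word_count) > token_limit


for words in (300, 1000):
    print(words, estimate_tokens(words), exceeds_limit(words))
```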

Solution: Clip text before embedding

def clip_to_512_tokens(text: str) -> str:
    """Clip text to roughly 512 tokens using a character-count estimate."""
    # Rough estimate: 1 token ≈ 4 characters
    max_chars = 512 * 4
    if len(text) <= max_chars:
        return text
    # Reserve room for the ellipsis so the result never exceeds max_chars
    return text[:max_chars - 3] + "..."

Used during video submission:

content_text = f"{video.name} {video.description} {' '.join(video.tags)}"
clipped_text = clip_to_512_tokens(content_text)

await videos_table.insert_one({
    "content_features": clipped_text  # Safe to embed
})

3. Similarity Score Interpretation

Scale: 0.0 (completely different) to 1.0 (identical)

Typical values:

| Score | Interpretation | Action |
|-------|----------------|--------|
| 0.9-1.0 | Nearly identical | Perfect match |
| 0.7-0.9 | Highly relevant | Include in results |
| 0.5-0.7 | Somewhat related | Marginal |
| 0.0-0.5 | Unrelated | Filter out |

Current threshold: 0.7 (configurable in code)
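
The bands in the table share their boundary values, so a score of exactly 0.7 could fall in either of two rows. The sketch below resolves each boundary upward, which keeps 0.7 in the "Highly relevant" band and matches the `>= 0.7` threshold. That choice, and the function name, are assumptions made for illustration.

```python
def interpret_similarity(score: float) -> str:
    """Map a 0-1 similarity score to the interpretation bands in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.9:
        return "Nearly identical"
    if score >= 0.7:
        return "Highly relevant"
    if score >= 0.5:
        return "Somewhat related"
    return "Unrelated"


for score in (0.95, 0.87, 0.7, 0.55, 0.2):
    print(score, interpret_similarity(score))
```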

Tuning:

Higher threshold (0.8+): Fewer, more relevant results
Lower threshold (0.5+): More results, some less relevant

4. Cold Start Problem

Scenario: New video just uploaded, no embedding yet

Vector search behavior:

# Videos without embeddings won't appear in vector search
cursor = db_table.find(sort={"content_features": query})
# Only returns videos with populated content_features

Solution:

Process embeddings during video submission (synchronous)
OR ensure background job runs quickly
OR fall back to keyword search for new videos

Current implementation: Embeddings created during submission (synchronous)

5. Cost Considerations

Vector embeddings have costs:

| Aspect | Cost |
|--------|------|
| Storage | ~16KB per video (4096 floats) |
| Compute | Embedding API calls (NVIDIA charges per call) |
| Search | ANN index maintenance |

For 1 million videos:

Storage: 1M × 16KB = ~16GB of vector data
Embedding cost: Depends on pricing model
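
The storage figure can be reproduced with a few lines of arithmetic. This sketch counts only the raw float data; index structures, replication, and on-disk compression are ignored, and the function name is illustrative.

```python
def vector_storage_bytes(num_videos: int, dimensions: int = 4096,
                         bytes_per_float: int = 4) -> int:
    """Raw size of the embeddings alone, ignoring index and replication overhead."""
    return num_videos * dimensions * bytes_per_float


per_video = vector_storage_bytes(1)
total = vector_storage_bytes(1_000_000)
print(f"{per_video} bytes per video, {total / 1e9:.1f} GB for 1M videos")
```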

Optimization:

Only embed text up to 512 tokens (saves compute)
Batch embedding operations
Cache frequent queries
Use keyword search for simple exact-match queries

Developer Tips

Common Pitfalls

Forgetting token limits: Clip text to 512 tokens before embedding
Not handling empty results: Vector search may return no results above threshold
Overfetching too much: Balance between filtering flexibility and performance
Mixing search modes: Don't combine vector + keyword in the same query; run them as separate queries and merge the results
Ignoring similarity scores: Threshold is crucial for result quality

Best Practices

Fall back to keyword search when semantic search returns few results:

# Try semantic first
results = semantic_search(query)
if len(results) < 5:
    # Fall back to keyword
    results += keyword_search(query)
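
Concatenating the two result lists can return the same video twice when both modes match it. A minimal sketch of a merge step that keeps the first occurrence of each `videoid`; `merge_results` and the sample rows are hypothetical.

```python
def merge_results(primary, fallback, limit=10):
    """Append fallback results to primary ones, skipping duplicate videoids."""
    merged = []
    seen = set()
    for video in list(primary) + list(fallback):
        if len(merged) >= limit:
            break
        video_id = video["videoid"]
        if video_id in seen:
            continue
        seen.add(video_id)
        merged.append(video)
    return merged


semantic_hits = [{"videoid": "a", "name": "Programming Tutorial"}]
keyword_hits = [
    {"videoid": "a", "name": "Programming Tutorial"},
    {"videoid": "b", "name": "Python Tutorial"},
]
print([v["videoid"] for v in merge_results(semantic_hits, keyword_hits)])
```

Semantic hits stay first, so their similarity-based ordering is preserved ahead of the keyword matches.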

Tune similarity threshold based on user feedback

Add query expansion:

# "python" → ["python", "programming", "coding"]
expanded_query = expand_synonyms(query)
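
The snippet above calls `expand_synonyms` without defining it. One possible shape for such a helper, assuming a small static synonym table whose contents are made up here:

```python
# Tiny hand-written synonym table; a real one would be curated or generated
SYNONYMS = {
    "python": ["programming", "coding"],
    "cat": ["feline", "kitten"],
}


def expand_synonyms(query: str) -> list[str]:
    """Return the query's words followed by known synonyms, without duplicates."""
    expanded = []
    for word in query.lower().split():
        for term in [word, *SYNONYMS.get(word, [])]:
            if term not in expanded:
                expanded.append(term)
    return expanded


print(expand_synonyms("python"))
```

Expansion of this kind mainly benefits keyword mode, since semantic search already matches on meaning.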

Cache popular queries:

# Include page_size so requests with different page sizes don't share an entry
cache_key = f"search:{mode}:{query}:{page}:{page_size}"
if cached := await redis.get(cache_key):
    return cached

Log zero-result queries: Identify gaps in content

Testing Tips

# Test semantic search
async def test_semantic_search():
    response = await client.get(
        "/api/v1/search/videos",
        params={"query": "learn programming", "mode": "semantic"}
    )

    assert response.status_code == 200
    data = response.json()

    # Check pagination structure
    assert "data" in data
    assert "pagination" in data

    # Check similarity scores
    for video in data["data"]:
        assert "$similarity" in video
        assert 0 <= video["$similarity"] <= 1

# Test keyword search
async def test_keyword_search():
    response = await client.get(
        "/api/v1/search/videos",
        params={"query": "python", "mode": "keyword"}
    )

    assert response.status_code == 200
    data = response.json()

    # Results should contain "python" in name
    for video in data["data"]:
        assert "python" in video["name"].lower()

# Test empty results
async def test_no_results():
    response = await client.get(
        "/api/v1/search/videos",
        params={"query": "xyzabc123notfound"}
    )

    assert response.status_code == 200
    data = response.json()
    assert len(data["data"]) == 0
    assert data["pagination"]["totalItems"] == 0

# Test pagination
async def test_search_pagination():
    response = await client.get(
        "/api/v1/search/videos",
        params={"query": "tutorial", "page": 2, "pageSize": 5}
    )

    data = response.json()
    assert data["pagination"]["currentPage"] == 2
    assert data["pagination"]["pageSize"] == 5
    assert len(data["data"]) <= 5
