Tech Implementation
1. HIGH-LEVEL ARCHITECTURE: THE "HYBRID CLOUD"
We use a hybrid approach: N8N acts as the API gateway and orchestrator, while all heavy computation and storage are offloaded to AWS and specialized APIs.
The 4-Layer Stack
| Layer | Component | Technology Stack | Function |
|---|---|---|---|
| 1. Interaction | Frontend / UI | React / Next.js | User dashboard, approval interface, upload portal. |
| 2. Orchestration | The Controller | N8N (self-hosted) | Traffic control, webhook listening (reviews), API routing. |
| 3. Intelligence | The Brain (RAG) | OpenAI (GPT-4o) + Pinecone | Scriptwriting, semantic search, brand-voice memory. |
| 4. Factory | The Engine | AWS Lambda + S3 + Shotstack | File storage, computer vision, beat-sync, rendering. |
2. CORE COMPONENT DEEP DIVE
A. The "Data Vault" (Storage & Ingestion)
Design Pattern: Asynchronous S3 Trigger
We do not pass video files through the application server. We use a Direct-to-Cloud pattern.
- Architecture: Amazon S3 (Simple Storage Service) configured with Transfer Acceleration.
- Security: All uploads use presigned URLs. The frontend requests a temporary "key" to upload directly to AWS. This keeps our servers lightweight.
- Event Loop:
  1. Event: A file lands in s3://raw-uploads.
  2. Trigger: An AWS Lambda function fires automatically.
  3. Action: The function calls Banana.dev / Google Gemini Vision to analyze the footage.
  4. Output: Metadata tags [Sunset, Oysters, Luxury, 4K] are sent to the vector database.
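The event loop above can be sketched as a minimal Lambda handler. The S3 event parsing follows the shape Lambda actually delivers for S3 triggers, but `analyze_footage` is a hypothetical stub standing in for the Banana.dev / Gemini Vision call:

```python
# Sketch of the S3-triggered ingestion Lambda.
# `analyze_footage` is a placeholder: a real implementation would fetch the
# object and send frames to a vision API, returning descriptive tags.
def analyze_footage(bucket, key):
    return ["Sunset", "Oysters", "Luxury", "4K"]  # illustrative output only

def handler(event, context=None):
    """Parse the S3 event and emit tag records destined for the vector DB."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        results.append({"bucket": bucket, "key": key,
                        "tags": analyze_footage(bucket, key)})
    return results
```

In production the returned records would be embedded and upserted into Pinecone rather than returned to the caller.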
B. The "Nexus Brain" (RAG - Retrieval Augmented Generation)
Design Pattern: Vector Semantic Search
A standard LLM (e.g., ChatGPT) hallucinates when it lacks context. We implement RAG to ground the AI in real assets.
- Vector Database (Pinecone): We convert every asset tag and every client brand document into "embeddings" (numerical representations of meaning).
- The Query Logic:
  1. Input: "Make a video for a luxury foodie couple."
  2. Vector Search: The system queries Pinecone for vectors matching "Luxury" + "Food" + "Couples" + "High Resolution."
  3. Retrieval: It pulls the exact file IDs for the best footage from the Data Vault.
- Script Generation: GPT-4o receives the context (the selected file descriptions) and generates the JSON script.
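The retrieval step can be illustrated with a toy in-memory index. In production the embeddings come from an embedding model and live in Pinecone; here they are hand-made 3-dimensional vectors and the asset IDs are invented for the example:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy asset index (file ID -> embedding). Real embeddings are ~1,500-d.
ASSETS = {
    "clip_sunset_oysters": [0.9, 0.8, 0.1],
    "clip_budget_burger":  [0.1, 0.9, 0.0],
    "clip_spa_couple":     [0.8, 0.1, 0.9],
}

def search(query_vec, top_k=2, min_score=0.3):
    """Return the best-matching file IDs; an empty list signals a 'Gap'."""
    scored = sorted(((cosine(query_vec, v), fid) for fid, v in ASSETS.items()),
                    reverse=True)
    return [(fid, round(s, 3)) for s, fid in scored[:top_k] if s >= min_score]
```

The `min_score` floor is what feeds the Fallback Protocol in section D: when no asset clears it, the system flags a gap instead of forcing a bad match.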
C. The "Video Factory" (Rendering Engine)
Design Pattern: Serverless Media Processing
This is the core engineering challenge. We use Shotstack driven by code.
- Audio Sync Engine (Python):
  - We run a Python microservice using the librosa library.
  - It analyzes the selected music track to extract onsets (beats) and energy levels.
  - It mathematically calculates the cut points (e.g., cut at 3.4s, 5.8s, 9.2s) to ensure the video is rhythmically satisfying.
- The Assembly (JSON):
  - The Logic Layer compiles a JSON payload containing the S3 URLs for the video, HTML/CSS for text overlays (reviews), and the cut points.
  - This payload is sent to the Shotstack API for cloud rendering.
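The cut-point math and timeline assembly can be sketched as below. The beat times are hard-coded here; in the real microservice they would come from librosa (e.g. `librosa.beat.beat_track` followed by `librosa.frames_to_time`), and the clip dictionaries mirror the general Shotstack clip shape (`asset`/`start`/`length`) rather than a complete, validated edit payload:

```python
def pick_cut_points(beat_times, min_shot=1.5):
    """Keep only beats far enough apart that every shot lasts >= min_shot seconds."""
    cuts, last = [], 0.0
    for t in beat_times:
        if t - last >= min_shot:
            cuts.append(round(t, 2))
            last = t
    return cuts

def clips_from_cuts(cuts, asset_urls):
    """Turn cut points + S3 URLs into Shotstack-style clip entries."""
    clips, start = [], 0.0
    for cut, url in zip(cuts, asset_urls):
        clips.append({"asset": {"type": "video", "src": url},
                      "start": start, "length": round(cut - start, 2)})
        start = cut
    return clips

# Beat times (seconds) as librosa might report them for a short track.
beats = [0.5, 1.2, 2.1, 3.4, 4.6, 5.8, 7.0, 9.2]
cuts = pick_cut_points(beats)  # -> [2.1, 4.6, 7.0, 9.2]
```

A real implementation would also weight cuts by onset energy so high-impact beats win ties; the minimum-shot-length filter is the core idea.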
D. The Fallback Protocol (Nano Banana / Gemini)
Design Pattern: Conditional Logic
- Confidence Check: If the vector search returns a match score < 0.3 (missing footage), the system flags a "Gap."
- Generative Pipeline:
  1. Call Nano Banana Pro (Gemini 3 Image) to generate a photorealistic static image.
  2. Pipe the image to the Runway/Luma API (image-to-video) to add 2 seconds of ambient motion (steam, light leaks, water movement).
  3. Inject the resulting URL into the Shotstack timeline.
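The conditional logic can be sketched as follows; `generate_image` and `animate_image` are stubs standing in for the Gemini image and Runway/Luma calls, and the URL formats are invented for the example:

```python
CONFIDENCE_THRESHOLD = 0.3  # below this, retrieval is treated as a "Gap"

def generate_image(prompt):
    # Stub for the image-generation API call.
    return f"s3://generated/{abs(hash(prompt)) % 10_000}.png"

def animate_image(image_url):
    # Stub for the image-to-video API call (2 s of ambient motion).
    return image_url.replace(".png", ".mp4")

def resolve_asset(match, prompt):
    """Return a clip URL, falling back to the generative pipeline on a gap."""
    if match is not None and match["score"] >= CONFIDENCE_THRESHOLD:
        return match["url"]
    return animate_image(generate_image(prompt))  # the "Gap" path
```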
3. DATA FLOW DIAGRAMS
Pipeline 1: The "Smart Ingestion" (Building the Moat)
Goal: Turn raw files into searchable data without human tagging.
User Upload -> AWS S3 Bucket -> (Trigger) -> AWS Lambda (Computer Vision) -> Extract Tags -> Store in Pinecone (Vector DB)
Pipeline 2: The "Review-to-Revenue" Engine
Goal: Autonomous Video Creation.
1. Trigger: Google Maps webhook (new 5-star review).
2. Analysis: N8N parses the text -> GPT-4o extracts keywords & sentiment.
3. Retrieval: Pinecone searches for matching video assets + cross-pollination partner assets.
4. Fallback: If an asset is missing -> trigger the generative AI pipeline.
5. Logic: Python calculates the music beat sync.
6. Assembly: N8N constructs the Shotstack JSON.
7. Render: Shotstack processes the 4K video -> saves to s3://rendered-output.
8. Notify: The user receives an SMS/email with an approval link.
Pipeline 3: Cross-Pollination Logic (Relational Filtering)
Goal: Promoting the right partner.
We query the PostgreSQL relational database before the Vector search.
```sql
SELECT id
FROM businesses
WHERE distance_miles < 2        -- assumed precomputed (e.g., via PostGIS)
  AND category = 'Restaurant'
  AND tier <> 'Budget'          -- applied only when the client is 'Luxury'
ORDER BY distance_miles
LIMIT 3;                        -- top 3 partner IDs
```
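The same filter can be exercised end-to-end with SQLite as a stand-in for PostgreSQL. The schema and sample rows are invented, and `distance_miles` is assumed precomputed (e.g., by PostGIS) rather than derived in the query:

```python
import sqlite3

# In-memory stand-in for the PostgreSQL partner table (illustrative schema).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE businesses
              (id INTEGER PRIMARY KEY, name TEXT, category TEXT,
               tier TEXT, distance_miles REAL)""")
db.executemany("INSERT INTO businesses VALUES (?,?,?,?,?)", [
    (1, "Oyster House", "Restaurant", "Luxury", 0.8),
    (2, "Burger Shack", "Restaurant", "Budget", 0.5),
    (3, "Spa Retreat",  "Spa",        "Luxury", 1.2),
    (4, "Wine Bar",     "Restaurant", "Luxury", 1.9),
    (5, "Steakhouse",   "Restaurant", "Luxury", 3.5),
])

def partner_ids(client_is_luxury=True):
    """Relational pre-filter run before the vector search."""
    sql = """SELECT id FROM businesses
             WHERE distance_miles < 2 AND category = 'Restaurant'"""
    if client_is_luxury:
        sql += " AND tier <> 'Budget'"   # tier guard only for Luxury clients
    sql += " ORDER BY distance_miles LIMIT 3"
    return [row[0] for row in db.execute(sql)]
```

Only the IDs that survive this filter are passed on as candidates for the Pinecone asset search.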
4. DATABASE SCHEMA DESIGN
We utilize a Dual-Database Strategy:
A. Relational DB (PostgreSQL) - Structured Data
- Table: Users (Auth, Credits, Role)
- Table: Businesses (Name, Location, Brand_Tier, Google_Place_ID)
- Table: Assets (S3_URL, File_Type, Owner_ID, License_Status)
- Table: Relations (which businesses are allowed to cross-pollinate)
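A minimal sketch of the four tables, with column names inferred from the list above (SQLite syntax for the runnable example, but the DDL translates to PostgreSQL directly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users      (id INTEGER PRIMARY KEY, email TEXT,
                         credits INTEGER, role TEXT);
CREATE TABLE businesses (id INTEGER PRIMARY KEY, name TEXT, location TEXT,
                         brand_tier TEXT, google_place_id TEXT);
CREATE TABLE assets     (id INTEGER PRIMARY KEY, s3_url TEXT, file_type TEXT,
                         owner_id INTEGER REFERENCES businesses(id),
                         license_status TEXT);
-- Which businesses may cross-pollinate with which partners.
CREATE TABLE relations  (business_id INTEGER REFERENCES businesses(id),
                         partner_id  INTEGER REFERENCES businesses(id),
                         PRIMARY KEY (business_id, partner_id));
""")
```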
B. Vector DB (Pinecone) - Unstructured Intelligence
- Index: Brand_Voice (embeddings of previous successful captions/scripts)
- Index: Asset_Library (embeddings of visual content descriptions)
5. SECURITY & INFRASTRUCTURE
Multi-Tenancy (Data Isolation)
To address your concern about confidentiality, we implement Row Level Security (RLS) in the database.
- Rule: A client can only query assets where Owner_ID == Self OR Asset_Type == Public_B_Roll.
- Logic: The API enforces this filter on every request. Even if the AI selects a competitor's clip, the database layer blocks access before it is used.
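The rule itself is declared at the database level in PostgreSQL (`CREATE POLICY ... USING (...)`); the equivalent application-layer guard, with illustrative field names, looks like:

```python
def visible_assets(assets, requester_id):
    """Mirror of the DB-level RLS policy: own assets or public B-roll only."""
    return [a for a in assets
            if a["owner_id"] == requester_id
            or a["asset_type"] == "public_b_roll"]
```

Enforcing the same predicate at both layers means a bug in the API cannot leak a competitor's footage: the database rejects the row regardless.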
Scalability
- Statelessness: Our N8N and Python workers are stateless, so any instance can pick up any job.
- Load Handling: If 500 reviews arrive at once, they are added to a message queue (AWS SQS). The rendering engine processes them in order (FIFO), buffering bursts so the workers are never overwhelmed.
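The buffering behaviour can be sketched with Python's stdlib queue as a stand-in for SQS FIFO (the job and worker shapes are invented for the example):

```python
from queue import Queue

render_queue = Queue()  # stand-in for an AWS SQS FIFO queue

def enqueue(review_ids):
    """Producer side: bursts of reviews land in the queue instantly."""
    for rid in review_ids:
        render_queue.put(rid)

def drain(render):
    """Consumer side: process queued jobs one at a time, in arrival order."""
    finished = []
    while not render_queue.empty():
        finished.append(render(render_queue.get()))
    return finished
```

With SQS, the consumer would instead long-poll `ReceiveMessage` and delete each message after a successful render, so a crashed worker's job is redelivered.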
Here is the architecture diagram: https://necessary-ivory-byr5lmsdsi.edgeone.dev/
Summary
This architecture moves Viral Vacation from a "Tool" to a "Platform." By leveraging S3 for heavy storage, vector DBs for intelligence, and Shotstack for rendering, we create a system that is robust, secure, and horizontally scalable.
Chandan & Tushar