project live + expanding

Spoken Word Archive

3,800+ lectures, morning walks, room conversations, and interviews — with full transcripts, audio playback, and upcoming AI-powered word-level sync. Hear and read Srila Prabhupada simultaneously.

3,800+ recordings
1966-77, twelve years
100% transcribed
SRT timecoded (planned)

What We Have Today

complete

Full Transcripts

Every lecture is transcribed, sourced from the original VedaBase 2025 export. Speaker markers are preserved, Sanskrit terms are set in italics, and scripture references are linked. The text has been meticulously cleaned up over years, and formatting fixes are still in progress.

complete

Audio Files

MP3 audio for every recording, hosted at media.prabhupada.io. Playable inline with the transcript, with speed control, position memory, and chapter-based audiobook queues.

complete

Structured Metadata

Every lecture tagged: date, location, type (BG class, SB class, morning walk, room conversation, initiation, arrival address), speaker markers, scripture references.
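
As an illustration only, the tags listed above could be modelled as a small record type. The class name, field names, and the placeholder values below are assumptions made for this sketch; they are not the archive's actual schema or a real archive entry.

```python
from dataclasses import dataclass, field
from datetime import date

# Lecture types named in the description above.
KNOWN_TYPES = {
    "BG class", "SB class", "morning walk",
    "room conversation", "initiation", "arrival address",
}

@dataclass
class LectureMetadata:
    """Hypothetical record for one lecture's tags."""
    recorded_on: date
    location: str
    lecture_type: str
    speakers: list[str] = field(default_factory=list)
    scripture_refs: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject types outside the known list so typos surface early.
        if self.lecture_type not in KNOWN_TYPES:
            raise ValueError(f"unknown lecture type: {self.lecture_type!r}")

# Placeholder values, not a real archive entry.
example = LectureMetadata(
    recorded_on=date(1970, 1, 1),
    location="Placeholder City",
    lecture_type="morning walk",
    speakers=["Speaker A", "Speaker B"],
)
```

Validating the type at construction time is one possible design choice; a looser schema could accept free-form strings instead.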

in progress

Formatting Quality

A 21-test formatting suite validates the markdown structure of all 3,800 files. It targets italic pairing, diacritical marks, speaker markers, and wiki link formatting. Thousands of issues are being resolved systematically.
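
To give a feel for what one such check might look like, here is a minimal sketch of an italic-pairing test. It is not taken from the actual suite: the function name and the line-by-line approach are assumptions, and it would misreport lines that use `*` as a list bullet.

```python
import re

def find_unpaired_italics(markdown_text):
    """Return (line_number, line) for lines with an odd number of asterisks."""
    problems = []
    for line_no, line in enumerate(markdown_text.splitlines(), start=1):
        # Count asterisks that are not backslash-escaped.
        count = len(re.findall(r"(?<!\\)\*", line))
        if count % 2 != 0:
            problems.append((line_no, line))
    return problems

sample = "The word *bhakti* means devotion.\nThis *span is never closed."
for line_no, line in find_unpaired_italics(sample):
    print(f"line {line_no}: unpaired asterisk: {line}")
```

A per-line parity count is deliberately crude; italic spans that legitimately cross a line break would need a paragraph-level check instead.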

The Audio Sync Pipeline

The next major step: matching every word in the transcript to its exact moment in the audio. This enables highlight-as-spoken reading, quotable audio clips, and searchable audio.

1

AI Transcription Pass

Run each audio file through a speech-to-text model (Whisper or Gemini) that produces a timestamped transcript. This gives us word-level timecodes — but the text won't match our cleaned transcripts perfectly.

Output: raw AI transcript with timestamps per word/phrase
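
A rough sketch of what handling that output could look like, assuming the open-source `openai-whisper` package is used with `word_timestamps=True`, in which case each segment of the result carries a `words` list with `word`, `start`, and `end` keys. The helper name `flatten_words` and the sample data are made up for illustration; a Gemini-based pass would produce a different shape.

```python
def flatten_words(result):
    """Flatten a Whisper-style result dict into (word, start, end) tuples."""
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            # Whisper tends to prefix words with a space, so strip it.
            words.append((w["word"].strip(), float(w["start"]), float(w["end"])))
    return words

# Assumed usage with openai-whisper (not run here):
#   import whisper
#   result = whisper.load_model("base").transcribe("lecture.mp3", word_timestamps=True)

# Hand-written stand-in for a real result, for illustration only.
fake_result = {
    "segments": [
        {"start": 0.0, "end": 0.9, "words": [
            {"word": " So", "start": 0.0, "end": 0.3},
            {"word": " this", "start": 0.3, "end": 0.9},
        ]},
    ]
}
print(flatten_words(fake_result))
```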

2

Alignment Against Verified Transcript

Our existing transcripts (from VedaBase, manually cleaned) are the source of truth for text. The AI transcript is the source of truth for timing. We align the two using sequence matching — transferring timestamps from the AI output onto our verified text.

This is the critical step: the AI might hear "Krishna" where our transcript has "Krsna", and the alignment has to handle these mismatches.
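
One way this timestamp transfer could be sketched is with Python's standard `difflib.SequenceMatcher`. This is an assumption about technique, not the project's actual implementation: words are normalized (diacritics stripped, lowercased) before matching, matched words inherit the AI timestamps directly, and mismatched runs share the time span of the AI words they replace. The function names are hypothetical.

```python
import difflib
import unicodedata

def normalize(word):
    """Lowercase, strip diacritics and punctuation for comparison only."""
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in stripped.lower() if c.isalnum())

def align(verified_words, ai_words):
    """ai_words: list of (word, start, end). Returns (verified_word, start, end);
    start and end are None where no timing could be transferred."""
    a = [normalize(w) for w in verified_words]
    b = [normalize(w) for w, _, _ in ai_words]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = [(w, None, None) for w in verified_words]
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same word on both sides: copy the AI timing as-is.
            for k in range(i2 - i1):
                _, start, end = ai_words[j1 + k]
                out[i1 + k] = (verified_words[i1 + k], start, end)
        elif tag == "replace":
            # Mismatch (e.g. "Krishna" vs "Krsna"): spread the AI time span
            # evenly over the verified words it corresponds to.
            span_start, span_end = ai_words[j1][1], ai_words[j2 - 1][2]
            step = (span_end - span_start) / (i2 - i1)
            for k in range(i2 - i1):
                out[i1 + k] = (verified_words[i1 + k],
                               span_start + k * step,
                               span_start + (k + 1) * step)
        # "delete": verified words the AI missed keep None timings.
        # "insert": extra AI words are simply dropped.
    return out

verified = ["So", "this", "Krsna", "consciousness", "movement"]
ai = [("so", 0.0, 0.2), ("this", 0.2, 0.4), ("Krishna", 0.4, 0.9),
      ("consciousness", 0.9, 1.6), ("movement", 1.6, 2.0)]
for row in align(verified, ai):
    print(row)
```

The verified spelling is always what gets emitted; the AI text is used only to locate the timing.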

3

SRT / Subtitle Generation

Produce standard SRT subtitle files for every lecture. Each subtitle entry maps a passage of text to a time range. SRT is widely supported: usable in most media players, embeddable in web players, and easy for apps to parse.

1
00:01:12,400 --> 00:01:18,200
So this Krishna consciousness movement
is not a sentimental movement.

2
00:01:18,200 --> 00:01:24,800
It is the most scientific movement
for the benefit of the whole human society.

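
Entries in this format are straightforward to generate. The sketch below assumes cues are available as `(start_seconds, end_seconds, text)` tuples, a representation chosen for this example; the function names are likewise hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm form used in SRT files."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def build_srt(cues):
    """Render (start, end, text) cues as SRT text, numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Each block ends with a newline; joining with another newline
    # leaves the blank line SRT expects between entries.
    return "\n".join(blocks)

cues = [
    (72.4, 78.2, "So this Krishna consciousness movement\nis not a sentimental movement."),
    (78.2, 84.8, "It is the most scientific movement\nfor the benefit of the whole human society."),
]
print(build_srt(cues))
```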
4

Quality Review

Spot-check alignment accuracy. Flag sections where audio quality is poor (early recordings, background noise, multiple speakers talking over each other). Mark confidence levels per segment. Human review for flagged sections.
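
As a sketch of how per-segment confidence might be derived, the example below averages per-word probabilities and flags segments under a threshold. It assumes Whisper-style segments whose `words` entries carry a `probability` key; the 0.6 threshold, the function name, and the sample values are arbitrary choices for illustration, not measured figures.

```python
def flag_low_confidence(segments, threshold=0.6):
    """Return segments whose mean word probability falls below threshold."""
    flagged = []
    for seg in segments:
        probs = [w["probability"] for w in seg.get("words", [])]
        # A segment with no word data is treated as zero confidence.
        confidence = sum(probs) / len(probs) if probs else 0.0
        if confidence < threshold:
            flagged.append({
                "start": seg["start"],
                "end": seg["end"],
                "confidence": confidence,
            })
    return flagged

# Made-up segments for illustration only.
segments = [
    {"start": 0.0, "end": 4.0, "words": [{"probability": 0.9}, {"probability": 0.95}]},
    {"start": 4.0, "end": 9.0, "words": [{"probability": 0.3}, {"probability": 0.5}]},
]
for item in flag_low_confidence(segments):
    print(f"review {item['start']}s to {item['end']}s")
```

Flagged ranges would then go to the human review described above.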

5

Integration

Once we have SRT files, the possibilities open up:

Highlight-as-spoken

Current passage lights up as audio plays, like karaoke for lectures

Quotable audio clips

Select a passage in the transcript, get a shareable audio clip of just that quote

Audio search

Search for a phrase, jump to the exact moment in the audio where it's spoken

Transcript verification

Compare AI hearing vs existing transcript to catch transcription errors
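
Two of these features reduce to simple lookups over timed cues. The sketch below assumes cues sorted by start time as `(start_seconds, end_seconds, text)` tuples; the function names are hypothetical, and a real player would likely do this client-side rather than in Python.

```python
import bisect

def active_cue(cues, position):
    """Index of the cue playing at `position` seconds, or None (highlight-as-spoken)."""
    starts = [cue[0] for cue in cues]
    i = bisect.bisect_right(starts, position) - 1
    if i >= 0 and position < cues[i][1]:
        return i
    return None

def find_phrase(cues, phrase):
    """Start times of cues containing `phrase`, case-insensitive (audio search)."""
    needle = phrase.lower()
    return [
        start
        for start, _end, text in cues
        # Collapse line breaks so a phrase can span subtitle lines.
        if needle in " ".join(text.lower().split())
    ]

cues = [
    (72.4, 78.2, "So this Krishna consciousness movement\nis not a sentimental movement."),
    (78.2, 84.8, "It is the most scientific movement\nfor the benefit of the whole human society."),
]
print(active_cue(cues, 75.0))
print(find_phrase(cues, "scientific movement"))
```

This search only finds phrases that sit inside a single cue; matching across cue boundaries would need the word-level timings instead.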

How We Got Here

Source

Transcripts exported from VedaBase 2025 — the authoritative source. Raw text with encoding issues, formatting inconsistencies, and legacy markup.

Cleanup

Years of meticulous cleanup: fixing character encoding, restoring Sanskrit diacriticals, structuring speaker markers, linking scripture references with wiki links, formatting stage directions.

Testing

Comprehensive 21-test formatting suite scanning all 3,800 files. Catches unpaired asterisks, broken italic spans, orphaned markers, misplaced speaker names. Thousands of issues identified and being resolved.

Next

AI-powered audio sync: Whisper/Gemini transcription, alignment against verified text, SRT generation, highlight-as-spoken integration in VaniReader and prabhupada.io.

What's in the Archive

Bhagavad-gita Classes [bg]: Verse-by-verse lectures on all 18 chapters
Srimad-Bhagavatam Classes [sb]: Daily morning lectures on SB verses
Caitanya-caritamrta Classes [cc]: Lectures on Lord Caitanya's pastimes
Morning Walks [mw]: Informal discussions while walking
Room Conversations [r1, r2...]: Meetings with guests, scholars, devotees
Arrival Addresses, Initiations, Festivals [ar, in, mf]: Special occasions and ceremonies

Browse the Archive

Every lecture is available now — with transcript and audio. Audio sync is the next step.

Browse Lectures at prabhupada.io