What Is Transcription? Meaning, Types & How It Works

Sarah T

2026-06-25 · 5 mins read

If you've ever needed the written version of a recorded conversation, you've run into transcription. Here's what the word actually means, the different forms it takes, and how modern tools do it in seconds.

The short answer

Transcription is the process of converting speech into written text. You start with spoken audio — an interview, a meeting, a lecture, a podcast, a voice memo — and you end with a written record of what was said, typically broken up by speaker so you can tell who said what.

That's it at its core: spoken words in, written words out, in the same language.

The main types of transcription

Not every transcript is made the same way. The right type depends on what you need the text for.

Verbatim transcription: every single word exactly as spoken, including "um," "uh," false starts, and repeated words. Useful for legal work, research, or anything where exactly how something was said matters.
Clean (or "intelligent") verbatim: the same content, but tidied up. Filler words, stutters, and obvious slips are removed so the transcript reads smoothly. This is what most people want for meetings, interviews, and content.
Edited transcription: lightly rewritten for grammar and clarity while keeping the meaning. Common for publishing quotes or turning a talk into an article.
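
To make the difference between verbatim and clean verbatim concrete, here is a minimal Python sketch that strips fillers and immediately repeated words from a verbatim line. The filler list and the `clean_verbatim` helper are assumptions made up for this illustration, not how any particular tool does it.

```python
import re

# Assumed filler list for illustration; real tools use larger, language-specific lists.
FILLERS = {"um", "uh", "er", "erm"}


def normalize(token: str) -> str:
    """Lowercase a token and strip punctuation so "Um," and "um" compare equal."""
    return re.sub(r"[^\w']", "", token).lower()


def clean_verbatim(text: str) -> str:
    """Drop filler words and immediately repeated words from a verbatim transcript."""
    kept = []
    for token in text.split():
        bare = normalize(token)
        if bare in FILLERS:
            continue  # filler word
        if kept and bare == normalize(kept[-1]):
            continue  # stutter or repeated word
        kept.append(token)
    return " ".join(kept)


verbatim = "So, um, I I think we should uh ship it on on Friday."
print(clean_verbatim(verbatim))
```

A rule this simple also removes legitimate repeats such as "had had", which is one reason clean verbatim usually still gets a human read-through.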

There's also a different, more technical meaning worth knowing about: in linguistics, "phonetic transcription" means writing out the exact sounds of speech using a system like the International Phonetic Alphabet (IPA), so you record how a word is pronounced rather than how it is spelled (for example, "thought" written as /θɔːt/). That's a separate discipline from transcribing a recording into readable text, even though it goes by the same name.

Transcription vs. translation

These two get mixed up constantly, but they're different jobs:

Transcription stays in the same language — spoken English becomes written English.
Translation changes the language — English text becomes Spanish text.

A common workflow is to transcribe a recording first, then translate the resulting transcript into another language.

How AI transcription works

Traditional transcription meant a person listening to a recording and typing it out — accurate, but slow and expensive (often an hour of typing for 15 minutes of audio).

Modern AI transcription uses speech-to-text models that:

Break the audio into sound units and predict the most likely words.
Add punctuation and capitalization automatically.
Separate speakers ("Speaker 1," "Speaker 2") so the transcript isn't one unbroken block.
Add timestamps so you can jump back to any moment in the recording.

The result is a transcript in seconds rather than hours — which you can then read, search, edit, and export.
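
One way to picture what comes out of those steps is a list of short segments, each with a speaker label, a start time, and text. The sketch below assumes that shape (the `Segment` class and its field names are hypothetical, not any specific tool's output format) and merges consecutive segments from the same speaker into a readable, timestamped transcript.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # e.g. "Speaker 1"
    start: float  # seconds from the start of the recording
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS, dropping fractions of a second."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render(segments: list[Segment]) -> str:
    """Merge consecutive segments by the same speaker into one labeled line."""
    lines = []
    previous_speaker = None
    for seg in segments:
        if seg.speaker == previous_speaker:
            lines[-1] += " " + seg.text
        else:
            lines.append(f"[{format_timestamp(seg.start)}] {seg.speaker}: {seg.text}")
            previous_speaker = seg.speaker
    return "\n".join(lines)


segments = [
    Segment("Speaker 1", 0.0, "Thanks for joining."),
    Segment("Speaker 1", 2.4, "Let's start with the budget."),
    Segment("Speaker 2", 5.1, "Sure, I have the numbers here."),
]
print(render(segments))
```

Keeping the start time on each segment is what makes "jump back to that moment" possible: the text stays linked to a position in the audio.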

What people use transcription for

Meetings & calls — a searchable record of decisions and action items.
Interviews & research — quotable, speaker-separated text instead of scrubbing audio.
Lectures & classes — study-ready notes from a recorded session.
Podcasts & videos — show notes, captions, and pull-quotes.
Accessibility — captions and transcripts that make audio and video usable by everyone.
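
For the captions use case, a timestamped transcript maps fairly directly onto a subtitle file. As a rough sketch, the Python below writes cues in the widely used SubRip (.srt) layout: a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the caption text. The `to_srt` helper and the tuple shape for cues are assumptions for this example.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, caption) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # Each block ends with a newline, so joining with "\n" leaves a blank line between cues.
    return "\n".join(blocks)


cues = [
    (0.0, 2.4, "Thanks for joining."),
    (2.4, 5.1, "Let's start with the budget."),
]
print(to_srt(cues))
```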

Try it yourself

The fastest way to understand transcription is to make one. With AudioScribe's free transcript maker, you can upload any audio or video file and get a speaker-labeled transcript in seconds — no signup to start. From there you can read it, get an AI summary, and export it.

If you want to go deeper on the practical side, see our guide on how to transcribe audio.

Frequently asked questions

What is transcription?
Transcription is the process of converting speech into written text. It takes a recording (an interview, meeting, lecture, or podcast) and produces a written, usually speaker-labeled record of what was said. It can be done by a person typing or automatically by AI speech-to-text.

What are the main types of transcription?
Verbatim (every word and filler captured), clean or intelligent verbatim (filler removed for readability), and edited (lightly rewritten for clarity). In linguistics, "phonetic transcription" is a separate meaning: writing the exact sounds of speech using symbols like the IPA.

What is the difference between transcription and translation?
Transcription keeps the same language (speech becomes written text); translation changes the language (text or speech from one language into another). You often transcribe first, then translate the result.

Is transcription done by a person or by AI?
Both. Human transcription is accurate but slow and costly; AI transcription produces a transcript in seconds, separates speakers, and adds timestamps. Many workflows use AI for the first draft plus a quick human review.

How accurate is AI transcription?
On clear audio with little background noise, modern AI transcription is often 90% accurate or better, and the transcript is fully editable. Accuracy drops with heavy accents, crosstalk, poor microphones, or background noise.
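
This article doesn't say how that accuracy figure is measured. A common yardstick for speech-to-text is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the AI transcript into a trusted reference, divided by the number of reference words. The Python sketch below assumes that "accuracy" means roughly 1 minus WER; the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j] holds the edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from the hypothesis
            insertion = current[j - 1] + 1  # extra word in the hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "we agreed to ship the release on friday after review"
hypothesis = "we agreed to shift the release on friday after review"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")
```

Here one wrong word out of ten gives a 10% WER, or 90% accuracy under that assumption, and it shows how even a high score can still include an error that changes the meaning.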