How to Transcribe an Interview (Step-by-Step Guide)

A good interview transcript is the difference between a quote you can use and a recording you'll never relisten to. But interviews are also the hardest thing to transcribe well: there are multiple speakers, people talk over each other, and a single mislabeled line can put words in the wrong person's mouth.

Here's how to do it properly: quickly, accurately, and with the speakers correctly separated.

Step 1: Record for the transcript, not just the conversation

Accuracy is decided before you transcribe anything. A few minutes of setup saves hours of cleanup:

  • Get the mic close to each speaker. Distance is the #1 enemy of accuracy. A lapel mic or a phone near each person beats one recorder in the middle of the table.
  • Record in a quiet room. Background noise, music, and echo all degrade results.
  • Where you can, ask people to speak one at a time. Crosstalk is the hardest thing for any transcriber, human or AI, to get right.
  • Save a lossless or high-bitrate file (WAV, M4A, or a good MP3). Heavily compressed audio loses detail the model needs.

Step 2: Choose your method

There are three honest options, and the right one depends on your accuracy needs and budget:

Method | Speed | Cost | Best when
AI transcription tool | Minutes | Free–$20/mo | You want accurate text fast and will do a quick edit pass
Human transcription service | Hours–days | ~$1.25–$2/min | The transcript is legal, published, or accuracy-critical
Manual typing | 4–6 hrs per hour | Your time | Tiny clips, or you need to relive every detail

For most people — journalists, researchers, podcasters, students — an AI tool plus a short edit hits the sweet spot. Human services (like Rev, ~$1.25/min) are worth it when an error could cost you. Manual typing is rarely the best use of your time anymore.
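
To make the trade-off concrete, here is a rough cost sketch for a single interview. It uses only the per-minute range quoted in the table; the function name and the 60-minute example are illustrative assumptions, and real services vary their pricing with turnaround time and audio quality.

```python
def human_service_cost(minutes, low_rate=1.25, high_rate=2.00):
    """Return the (low, high) cost in dollars for a human transcription service.

    The default rates are the ~$1.25 to $2 per audio minute range quoted above.
    """
    return (minutes * low_rate, minutes * high_rate)


low, high = human_service_cost(60)
print(f"One-hour interview, human service: ${low:.2f} to ${high:.2f}")
```

At those rates, one hour of audio costs about as much as four to six months of a $20/month AI plan, which is why human services make sense mainly when an error is expensive.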

Step 3: Upload and transcribe

With an AI tool, the workflow is simple:

  1. Upload your recording (audio or video — MP3, M4A, WAV, MP4, MOV).
  2. The tool transcribes it in a few minutes and automatically separates speakers into "Speaker 1," "Speaker 2," etc.
  3. You get back a full, time-stamped transcript you can read, search, and edit.

If your interview is on video, choose a tool that keeps the video and plays it back next to the transcript — being able to click a line and see that moment on screen makes editing and quoting far easier. (Many tools strip the video and keep only the audio.)

You can try this on a real clip, with no signup, on our interview transcription or audio-to-text tools.

Step 4: Fix the speaker labels (the step everyone skips)

This is what separates a usable interview transcript from a frustrating one. Automatic diarization (the step that works out who spoke when) gets you 80–95% of the way, but it leaves two kinds of problems:

  1. It labels speakers generically. You'll want to change every "Speaker 1" to a real name.
  2. It misattributes lines. When people overlap, a few words often get assigned to the wrong person.

So choose a tool that lets you fix both. You want to be able to:

  • Rename a speaker everywhere at once (every "Speaker 1" becomes "Maria"), and
  • Reassign an individual line — or even individual words — to the correct speaker when diarization crosses them up.
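
If you ever work with an exported transcript directly, the two fixes above can be sketched in a few lines. This is a minimal illustration, not how any particular tool implements it: the `Segment` structure, the function names, and the second speaker's name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds from the start of the recording
    end: float
    speaker: str
    text: str


def rename_speaker(segments, old, new):
    """Rename a speaker label everywhere it appears."""
    for seg in segments:
        if seg.speaker == old:
            seg.speaker = new


def reassign(segments, index, speaker):
    """Move one misattributed segment to the correct speaker."""
    segments[index].speaker = speaker


transcript = [
    Segment(0.0, 4.2, "Speaker 1", "Thanks for making the time today."),
    Segment(4.2, 6.0, "Speaker 2", "Happy to be here."),
    # Diarization got this one wrong: Speaker 2 actually said it.
    Segment(6.0, 7.1, "Speaker 1", "Right, exactly."),
]

# Fix the misattribution first, while the generic labels are still in place.
reassign(transcript, 2, "Speaker 2")
rename_speaker(transcript, "Speaker 1", "Maria")
rename_speaker(transcript, "Speaker 2", "James")

for seg in transcript:
    print(f"{seg.speaker}: {seg.text}")
```

Reassigning individual words works the same way in principle, but it needs a timestamp per word rather than per segment, so the misattributed words can be split off into their own segment.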

That last capability matters more than it sounds: in a real interview, the moments people talk over each other are often the most quotable, and they're exactly where labels get scrambled. (This is one area where AudioScribe goes further than most — you can re-assign at the word level — whereas tools like Otter and TurboScribe only let you rename a label, not move a misattributed segment.)

Step 5: Edit and clean up

A quick pass turns a raw transcript into a finished one:

  • Read along with the audio and fix any words the model misheard (names, jargon, and acronyms are the usual culprits).
  • Decide on verbatim vs clean. For research, keep the "ums" and false starts. For publication, remove them.
  • Add light formatting — paragraph breaks at topic changes make the transcript scannable.
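
For a clean (non-verbatim) version, a first pass at removing fillers can be automated. The sketch below is an assumption about what "clean" means: the filler list is a hypothetical starting point, and it only handles the simple case where the transcript has no commas around the filler. Keep the verbatim original alongside it, and still read the result against the audio.

```python
import re

# Hypothetical filler list; extend it to match your own style guide.
FILLERS = re.compile(r"\b(?:um+|uh+)\b[,.]?\s*", flags=re.IGNORECASE)


def clean_verbatim(text):
    """Strip common filler words, tidy spacing, and re-capitalize the start."""
    cleaned = FILLERS.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned[:1].upper() + cleaned[1:]


print(clean_verbatim("Um we started the project uh in 2019."))
```

The word boundaries in the pattern keep it from touching ordinary words that merely contain "um" or "uh", such as "umbrella".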

If your tool has an AI summary, generate one — a short list of key points is invaluable for finding the moments worth quoting in a long interview.

Step 6: Export and use it

Export to the format you need — plain text for quoting, subtitles (SRT/VTT) for video, or a document for your records. Then put it to work: search it for a specific quote, pull the highlights, or feed it into your article, paper, or show notes.
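
Most tools export subtitles for you, but the SRT format is simple enough to show what the export contains: a running number, a start and end timestamp, and the text, with a blank line between entries. The sketch below assumes segments arrive as (start, end, speaker, text) tuples with times in seconds. That layout and the function names are illustrative, not any tool's actual export format.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Build SRT text from (start, end, speaker, text) tuples."""
    blocks = []
    for number, (start, end, speaker, text) in enumerate(segments, start=1):
        blocks.append(
            f"{number}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    # Entries are separated by a blank line.
    return "\n".join(blocks)


segments = [
    (0.0, 4.2, "Maria", "Thanks for making the time today."),
    (4.2, 6.0, "James", "Happy to be here."),
]
print(to_srt(segments))
```

Prefixing each line with the speaker name is a choice, not part of the format; drop it if the subtitles are meant for viewers rather than for your own reference.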

A realistic time estimate

  • Recording: however long the interview is.
  • AI transcription: a few minutes.
  • Speaker fixes + edit pass: 15–40 minutes for a one-hour interview, depending on audio quality.

Set against 4–6 hours of manual typing for the same hour of audio, that saving is the main reason AI transcription took over.

Frequently asked questions

What is the fastest way to transcribe an interview? Upload it to an AI transcription tool — it transcribes a one-hour interview in minutes, separates speakers, and lets you fix any mislabeled lines.

How do I transcribe an interview for free? Use a free tool or free tier. You can transcribe a clip with no signup on a free interview transcription tool, and free accounts (3 files/day) cover most short interviews.

How do I separate the speakers in an interview transcript? Use a tool with speaker diarization, and pick one that lets you both rename a speaker everywhere and reassign individual lines or words to the correct speaker.

How accurate is AI interview transcription? Very accurate on clear, two-person audio; less so with noise, crosstalk, or distant mics. Recording close to each speaker is the biggest factor.