Extracting Audio from Video: Best Format Choices
What extraction actually does
"Extracting audio from video" sounds like it should be straightforward — and for the most part it is. A video file like MP4, MOV, or MKV is a container that holds separate video and audio streams. Extraction means reading the audio stream and saving it separately, without the video.
What happens to the audio data during extraction depends on the output format you choose:
- If the output format matches the source codec: the audio stream can be remuxed (copied directly without re-encoding). There is no quality loss, and extraction is fast. This is what happens when you extract an MP4's AAC audio to M4A: same codec, just a different container.
- If the output format requires a different codec: the audio is decoded and re-encoded. Re-encoding to another lossy codec adds a small quality cost on top of the source's existing compression, which is what happens when you extract to MP3 from an MP4 that contains AAC audio. Decoding to a lossless format such as WAV or FLAC adds no further loss.
In either case, the quality is bounded by the source. If the video's audio track was compressed AAC at 128 kbps, extracting to WAV gives you a large WAV file containing 128 kbps AAC-quality audio. The WAV container doesn't add quality.
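The copy-versus-re-encode decision described above can be sketched as a small lookup. This is a minimal illustration, not a complete rule set: the `COPY_COMPATIBLE` mapping and the `plan_extraction` helper are hypothetical names, and the mapping lists only a few common pairings.

```python
# Which source codecs each output extension can hold without re-encoding.
# Illustrative and deliberately incomplete; extend it for your own files.
COPY_COMPATIBLE = {
    "m4a": {"aac", "alac"},
    "mp3": {"mp3"},
    "ogg": {"vorbis"},
    "opus": {"opus"},
    "flac": {"flac"},
}


def plan_extraction(source_codec: str, output_ext: str) -> str:
    """Return 'copy' if the stream can be remuxed as-is, else 'transcode'."""
    ext = output_ext.lower().lstrip(".")
    allowed = COPY_COMPATIBLE.get(ext, set())
    return "copy" if source_codec.lower() in allowed else "transcode"


print(plan_extraction("aac", "m4a"))  # copy
print(plan_extraction("aac", "mp3"))  # transcode
print(plan_extraction("aac", "wav"))  # transcode (decode only, no extra loss)
```

Note that "transcode" here covers both cases: a lossy re-encode (AAC to MP3) and a plain decode to a lossless format (AAC to WAV). Only the first costs additional quality.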
What audio codec is in your video
Most video files use one of these audio codecs inside:
| Container | Typical audio codec | Typical quality |
|---|---|---|
| MP4 (YouTube, iPhone, web) | AAC | 128–256 kbps |
| MOV (iPhone, Final Cut) | AAC or PCM | AAC 128 kbps or lossless |
| MKV (films, rips) | AAC, AC3, or DTS | Varies widely |
| WebM (browser video) | Opus or Vorbis | 128–192 kbps |
| AVI (older Windows format) | MP3 or PCM | Varies |
Most MP4 files — downloaded YouTube videos, phone recordings, screencasts, conference recordings — contain AAC audio at 128–192 kbps. That's the quality ceiling for everything you extract from them.
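If you would rather check than guess, `ffprobe` (shipped with FFmpeg) can report the codec of the first audio stream. The sketch below assumes `ffprobe` is installed and on your `PATH`; the helper names are made up for illustration. Some containers, MKV in particular, often do not report a per-stream `bit_rate`, so the parser treats it as optional.

```python
import json
import subprocess


def ffprobe_command(path: str) -> list[str]:
    """Build an ffprobe call that prints the first audio stream's info as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name,bit_rate",
        "-of", "json",
        path,
    ]


def parse_audio_info(ffprobe_json: str) -> dict:
    """Extract the codec name and bitrate (kbps, if reported) from ffprobe JSON."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        return {"codec": None, "kbps": None}  # no audio stream found
    stream = streams[0]
    bit_rate = stream.get("bit_rate")  # a string such as "128000", or missing
    kbps = round(int(bit_rate) / 1000) if bit_rate and bit_rate.isdigit() else None
    return {"codec": stream.get("codec_name"), "kbps": kbps}


def probe_audio(path: str) -> dict:
    """Run ffprobe on a file and return its parsed audio info."""
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return parse_audio_info(result.stdout)
```

Calling `probe_audio("talk.mp4")` (a placeholder filename) on a typical phone recording would be expected to return something like `{"codec": "aac", "kbps": 128}`, though the exact figures depend on the file.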
Choosing your output format
The output format choice comes down to what you need to do with the audio:
MP3 — for sharing, uploading, or general use
The safe default. Small, and accepted by virtually every player and platform. Choose 192 kbps for music-heavy content; 128 kbps is fine for speech. The re-encode from AAC to MP3 causes minimal quality loss at these bitrates. Use the MP4 to MP3 converter for this workflow.
WAV — for editing or software compatibility
Extract to WAV if you'll edit the audio in a DAW, video editor, or any tool that prefers lossless input. Remember: the WAV contains the same lossy audio as the source — it's not a quality upgrade. The benefit is avoiding further re-encoding during editing. Use the MP4 to WAV converter for this.
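To put a number on the size cost of WAV, here is a rough back-of-the-envelope calculation. It assumes 44.1 kHz, 16-bit stereo PCM (many video files use 48 kHz, which would be slightly larger) and compares it with a 128 kbps compressed stream. The function names are illustrative only.

```python
def pcm_bytes_per_second(sample_rate=44_100, bit_depth=16, channels=2):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate * (bit_depth // 8) * channels


def compressed_bytes_per_second(kbps):
    """Data rate of a constant-bitrate stream in bytes per second."""
    return kbps * 1000 // 8


wav_rate = pcm_bytes_per_second()            # 176400 bytes/s
aac_rate = compressed_bytes_per_second(128)  # 16000 bytes/s

print(f"WAV: {wav_rate * 60 / 1e6:.2f} MB per minute")   # WAV: 10.58 MB per minute
print(f"AAC: {aac_rate * 60 / 1e6:.2f} MB per minute")   # AAC: 0.96 MB per minute
print(f"WAV is about {wav_rate / aac_rate:.0f}x larger")  # WAV is about 11x larger
```

Under these assumptions the WAV is roughly eleven times the size of the original audio track, with identical audible content.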
M4A — for Apple ecosystem or same-codec efficiency
If the video's audio was AAC (most MP4s), extracting to M4A keeps the same codec — just changing the container. No re-encoding, no quality loss. Use this if the file will live on an iPhone, Mac, or Apple ecosystem device. It's also slightly smaller than the equivalent MP3 at the same perceived quality.
FLAC — not usually worth it for video-extracted audio
FLAC is lossless, but if the source audio was compressed AAC, FLAC doesn't restore what AAC discarded. It just stores the decoded AAC-quality audio in a lossless format. The file is much larger than MP3 for no audible benefit. The only reason to do this is if the video's audio track was genuinely lossless PCM (some professional MOV files), in which case FLAC extraction is a genuine lossless-to-lossless operation.
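For readers who prefer the command line, the four choices above map onto FFmpeg invocations roughly as follows. This is a sketch under stated assumptions: `ffmpeg_extract_command` is a hypothetical helper, the MP3 branch assumes an FFmpeg build that includes `libmp3lame`, and the function only builds the argument list without running anything.

```python
def ffmpeg_extract_command(src, dst, source_codec, kbps=192):
    """Build an ffmpeg argument list that extracts audio from src into dst."""
    ext = dst.rsplit(".", 1)[-1].lower()
    cmd = ["ffmpeg", "-i", src, "-vn"]  # -vn drops the video stream

    if ext == "mp3":
        cmd += ["-c:a", "libmp3lame", "-b:a", f"{kbps}k"]  # lossy re-encode
    elif ext == "wav":
        cmd += ["-c:a", "pcm_s16le"]  # decode to 16-bit PCM
    elif ext == "m4a" and source_codec == "aac":
        cmd += ["-c:a", "copy"]  # same codec: stream copy, no re-encode
    elif ext == "m4a":
        cmd += ["-c:a", "aac", "-b:a", f"{kbps}k"]  # source is not AAC: re-encode
    elif ext == "flac":
        cmd += ["-c:a", "flac"]  # lossless encode of the decoded audio
    else:
        raise ValueError(f"unsupported output extension: {ext}")

    return cmd + [dst]


print(" ".join(ffmpeg_extract_command("in.mp4", "out.m4a", "aac")))
# ffmpeg -i in.mp4 -vn -c:a copy out.m4a
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. The filenames are placeholders.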
Special cases
YouTube videos
YouTube serves video with AAC audio at 128–192 kbps for standard streams. Downloaded YouTube MP4 files contain that AAC stream. Extracting to MP3 at 192 kbps gives you a small quality hit from the AAC-to-MP3 transcode, but the result is clean and universally compatible. For music, extracting to M4A (same codec, no transcode) is technically better — but you'll need a player that supports M4A.
Lecture recordings and interviews
If you're extracting audio from a Zoom recording, a lecture capture, or a video interview, MP3 at 128 kbps is plenty. Speech compresses well, and these files are often long (1–2 hours), so file size matters. Pure speech needs nothing higher than 128 kbps, and mono is enough.
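As a rough guide to what those long recordings weigh, the size of a constant-bitrate MP3 follows directly from duration and bitrate. The sketch below ignores container overhead and metadata, so treat the figures as estimates; `mp3_size_mb` is an illustrative name.

```python
def mp3_size_mb(duration_minutes, kbps=128):
    """Approximate size in MB of a constant-bitrate file of the given length."""
    total_bits = duration_minutes * 60 * kbps * 1000
    return total_bits / 8 / 1_000_000


for minutes in (60, 120):
    print(f"{minutes} min at 128 kbps: {mp3_size_mb(minutes):.1f} MB")
# 60 min at 128 kbps: 57.6 MB
# 120 min at 128 kbps: 115.2 MB
```

Halving the bitrate halves the size, so a one-hour recording at 64 kbps comes to about 28.8 MB.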
Concert or live music recordings
If the source video had good audio (a well-recorded live concert, for example), extract to M4A when the source audio is AAC, since a stream copy involves no transcode at all. Otherwise, use 320 kbps MP3 to preserve as much of the source quality as possible. You're still bounded by the source encoding, but a high bitrate minimises the additional loss from the extraction transcode.
Last updated: March 26, 2026