// a practical guide for podcasters, YouTubers, and voiceover artists
When you finish processing your audio in Calvio, you're presented with a choice: export as WAV or MP3. It seems like a simple technical decision, but getting it wrong can mean unnecessary quality loss, wasted storage, or a workflow that introduces cumulative degradation across every project.
This guide breaks down what each format actually does, how they sound in practice, and — most importantly — when you should use each one in a real content production workflow.
WAV (Waveform Audio File Format) is an audio container format developed by Microsoft and IBM. In the standard form used for audio work, it stores raw PCM (Pulse-Code Modulation) data: every single audio sample, at full bit depth and sample rate, with absolutely nothing discarded.
A typical speech WAV file uses a 44.1 kHz sample rate and 16-bit depth. That works out to approximately 10 MB per minute of stereo audio, or about 5 MB per minute for mono. At 24-bit (common in professional studio recording), files run to around 15 MB per minute in stereo.
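These sizes follow directly from the format: sample rate × bytes per sample × channels × 60 seconds. The minimal sketch below does that arithmetic. The helper name `wav_bytes_per_minute` is illustrative, and the WAV header (a few dozen bytes) is ignored. The rounded figures in the text line up with binary megabytes (1 MiB = 1,048,576 bytes), so the sketch prints both units.

```python
def wav_bytes_per_minute(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bytes of uncompressed PCM audio per minute (header not included)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


for label, rate, depth, ch in [
    ("16-bit stereo", 44_100, 16, 2),
    ("16-bit mono", 44_100, 16, 1),
    ("24-bit stereo", 44_100, 24, 2),
]:
    size = wav_bytes_per_minute(rate, depth, ch)
    # Decimal megabytes and binary mebibytes, since both conventions are common.
    print(f"{label}: {size / 1_000_000:.1f} MB ({size / 2**20:.1f} MiB) per minute")
```

For 16-bit stereo this gives 10,584,000 bytes per minute: about 10.6 MB in decimal units, or 10.1 MiB.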
Because no data is ever discarded, WAV is considered lossless. What you recorded is exactly what is stored, with no algorithmic approximation and no quality ceiling imposed by the container format.
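To see what "lossless" means in practice, here is a minimal sketch using Python's standard-library `wave` module. It writes one second of a synthetic 16-bit mono tone into an in-memory WAV file, then reads the file back. The tone and the in-memory buffer are illustrative choices. The sample bytes that come out are identical to the ones that went in.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100

# One second of a 440 Hz tone at half scale, as signed 16-bit integers.
samples = [
    int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
]
pcm = struct.pack(f"<{len(samples)}h", *samples)  # little-endian 16-bit PCM

# Write the PCM into a WAV container held in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes per sample = 16-bit
    w.setframerate(SAMPLE_RATE)
    w.writeframes(pcm)

# Read it back: nothing was approximated, so the bytes match exactly.
buf.seek(0)
with wave.open(buf, "rb") as r:
    restored = r.readframes(r.getnframes())

print(restored == pcm)  # True
```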
MP3 (MPEG-1 Audio Layer III) is a lossy compressed audio format. It reduces file size by discarding audio information that psychoacoustic models predict the human ear is unlikely to notice. This process is permanent and irreversible — once information is discarded during encoding, it cannot be recovered.
At 128 kbps (a common podcast setting), MP3 produces files of roughly 1 MB per minute, about a 10× reduction compared to WAV. At 192 kbps, quality improves further, with files at around 1.4 MB per minute. At 320 kbps, files are approximately 2.4 MB per minute and are virtually indistinguishable from WAV for speech in most listening tests.
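For constant-bitrate (CBR) MP3, file size depends only on bitrate and duration. In MP3, "kbps" means thousands of bits per second, so one minute is bitrate × 1000 ÷ 8 × 60 bytes. The sketch below is an approximation that ignores ID3 tags and frame padding. It compares each bitrate against 16-bit stereo WAV at 44.1 kHz, and the constant and helper names are illustrative.

```python
# 44.1 kHz x 2 bytes per sample x 2 channels x 60 s
WAV_STEREO_16BIT_BYTES_PER_MIN = 44_100 * 2 * 2 * 60


def mp3_bytes_per_minute(bitrate_kbps: int) -> int:
    """Approximate size of one minute of CBR MP3 (tags and padding ignored)."""
    return bitrate_kbps * 1000 // 8 * 60


for kbps in (128, 192, 320):
    size = mp3_bytes_per_minute(kbps)
    ratio = WAV_STEREO_16BIT_BYTES_PER_MIN / size
    print(f"{kbps} kbps: {size / 1_000_000:.2f} MB/min, about {ratio:.1f}x smaller than WAV")
```

At 128 kbps the ratio works out to about 11×, which is where the "about 10×" rule of thumb comes from.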
The most important concept to understand about lossy formats like MP3 is generation loss. Every time you encode audio to MP3, quality degrades — even if only slightly. The first encode from a clean WAV source is fine. But if you re-encode an existing MP3 to a new MP3, you're encoding already-degraded audio again, compounding the loss each time.
For speech content at reasonable bitrates (128 kbps+), a single generation of loss is generally imperceptible. But after two or three generations of re-encoding, artefacts accumulate: metallic resonance on sustained vowels, smeared transients at the start of consonants, and a kind of harsh, over-compressed quality on sibilants ("s", "sh", "ch" sounds).
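Never stack MP3 encoding passes. If your only copy of a recording is an MP3 and you need to edit it, Calvio decodes it to PCM, you make your edits, and you encode back to MP3 once, at the final step. That single extra generation is unavoidable. What you must avoid is repeating the cycle: exporting an MP3, re-importing it, and exporting MP3 again.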
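Generation loss can be illustrated with a toy model. This is not MP3. Real MP3 generation loss comes from the encoder discarding a slightly different set of information on each pass. Here, that is replaced with a simpler non-idempotent lossy step: requantizing to a coarse grid with fresh triangular (TPDF) dither. The step size, test tone and seed are arbitrary assumptions. Each pass adds its own independent error, so the noise power grows roughly in proportion to the number of passes, and the signal-to-noise ratio (SNR) keeps falling.

```python
import math
import random


def lossy_pass(signal, step, rng):
    """Toy lossy 'encode + decode': requantize with fresh TPDF dither.

    Stands in for one lossy encoding generation; it is NOT an MP3 encoder."""
    return [
        step * math.floor((x + rng.triangular(-step, step, 0.0)) / step + 0.5)
        for x in signal
    ]


def snr_db(reference, degraded):
    """Signal-to-noise ratio of `degraded` measured against `reference`."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, degraded))
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(42)
original = [math.sin(2 * math.pi * 220 * n / 44_100) for n in range(44_100)]

audio = original
snrs = []
for generation in range(1, 6):
    # Each generation re-encodes the previous generation's output.
    audio = lossy_pass(audio, step=1 / 64, rng=rng)
    snrs.append(snr_db(original, audio))
    print(f"generation {generation}: SNR {snrs[-1]:.1f} dB")
```

In this model the fifth generation ends up roughly 7 dB worse than the first (10·log10(5)), even though each individual pass is equally mild.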
For music — especially harmonically rich content like orchestral, jazz, or acoustic guitar — WAV vs MP3 differences become audible at lower bitrates and in double-blind tests with trained listeners. For speech — podcasts, voiceover, narration, educational content — the difference at 128 kbps and above is generally imperceptible in real-world listening conditions.
ABX blind listening tests, in which listeners hear two reference samples (A and B) and must say whether a third sample (X) matches A or B, consistently show that most people cannot reliably distinguish 192 kbps MP3 from WAV for speech content. At 128 kbps, highly trained ears may notice subtle differences in sibilants and room reverb tails, but this is the exception, not the rule.
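"Reliably distinguish" has a concrete meaning in ABX testing. Each trial is a 50/50 guess for someone who hears no difference, so a score can be checked against the binomial distribution. The sketch below computes the one-sided probability of scoring at least that well by pure guessing. Treating p < 0.05 as evidence of an audible difference is a common convention, not a rule taken from any particular test.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` of `trials` ABX trials right by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials


print(f"{abx_p_value(12, 16):.3f}")  # 0.038: unlikely to be luck
print(f"{abx_p_value(10, 16):.3f}")  # 0.227: could easily be luck
```

A listener scoring 10 out of 16 (62.5%) may feel confident, but that result is entirely consistent with guessing.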
The practical takeaway for most creators: your listeners are very unlikely to hear the difference between 128 kbps MP3 and WAV when listening to speech on standard headphones or speakers. The format choice matters for your workflow, not for your audience's experience.
Speech-only podcast (mono): 96–128 kbps · Speech with music beds (stereo): 192 kbps · Maximum quality / archive: WAV. For YouTube the choice matters less: it re-encodes all uploaded audio (to AAC and Opus) regardless.
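The recommendations above can be captured as a small preset table, which also makes storage costs per episode easy to estimate. This is a hypothetical sketch, not a Calvio API. The preset names are invented, the speech preset uses the top of the 96–128 kbps range, and the archive preset assumes 16-bit stereo at 44.1 kHz. For CBR MP3 the channel count does not change file size, only how the bits are spent.

```python
# Hypothetical presets mirroring the recommendations above (not a Calvio API).
PRESETS = {
    "speech_mono": {"format": "mp3", "kbps": 128, "channels": 1},
    "speech_with_music": {"format": "mp3", "kbps": 192, "channels": 2},
    "archive": {"format": "wav", "sample_rate": 44_100, "bit_depth": 16, "channels": 2},
}


def estimated_size_mb(preset: dict, minutes: float) -> float:
    """Rough file size in decimal megabytes (headers and tags ignored)."""
    seconds = minutes * 60
    if preset["format"] == "mp3":
        total_bytes = preset["kbps"] * 1000 / 8 * seconds  # CBR: bitrate alone sets the size
    else:
        total_bytes = (
            preset["sample_rate"] * preset["bit_depth"] / 8 * preset["channels"] * seconds
        )
    return total_bytes / 1_000_000


for name, preset in PRESETS.items():
    print(f"{name}: ~{estimated_size_mb(preset, 60):.0f} MB for a 60-minute episode")
```

A one-hour speech episode comes to about 58 MB as 128 kbps MP3, against roughly 635 MB as 16-bit stereo WAV.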
Podcast editing workflow: Record in WAV → run silence removal in Calvio → export WAV → edit in DAW (Audacity, Logic, GarageBand) → export to MP3 at 128 kbps for hosting. This chain never re-encodes MP3 to MP3 at any step.
YouTube voiceover: Record in WAV → Calvio silence removal → export WAV → import into video editor → final video export handles audio encoding. The video editor's AAC or MP3 output is the only lossy pass before upload. YouTube's own re-encode follows regardless.
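Quick podcast distribution: Record in M4A on phone → process in Calvio → export MP3 128 kbps → upload directly to RSS/hosting. No DAW involved. Phone M4A recordings are usually AAC, which is already lossy, so the MP3 export is a second lossy generation. Because Calvio adds only that one MP3 encode, this is still an acceptable trade-off for podcasting, as long as you don't add further encodes.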
Professional voiceover delivery: Record in WAV 24-bit → Calvio silence removal → export WAV → send to client or mastering engineer. Always deliver WAV to professionals who will do further processing.
Social media clips: Process in Calvio → export MP3 192 kbps → upload to Instagram, TikTok, Twitter. These platforms re-encode anyway, so MP3 is more than adequate as the source.
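A simple rule of thumb for checking any of these workflows: count how many files in the chain are in a lossy format. Producing each one required a run of a lossy encoder. The sketch below is a deliberately simplified model. It assumes each listed file was actually encoded, not just copied, and it treats M4A as lossy because phone recordings are usually AAC, although M4A can also hold lossless ALAC. The function and workflow names are illustrative.

```python
# Assumption: these formats are treated as lossy. M4A is usually AAC,
# but it can also contain lossless ALAC.
LOSSY_FORMATS = {"mp3", "aac", "m4a"}


def lossy_generations(chain: list[str]) -> int:
    """Count lossy encoding passes in a workflow.

    `chain` lists the format of each file the workflow writes, in order,
    starting with the original recording. Each lossy file is assumed to
    have been produced by one run of a lossy encoder."""
    return sum(1 for fmt in chain if fmt.lower() in LOSSY_FORMATS)


workflows = {
    "podcast editing": ["wav", "wav", "mp3"],
    "youtube voiceover": ["wav", "wav", "aac"],  # before YouTube's own re-encode
    "quick distribution": ["m4a", "mp3"],
    "repeated mp3 exports (avoid)": ["wav", "mp3", "mp3", "mp3"],
}
for name, chain in workflows.items():
    print(f"{name}: {lossy_generations(chain)} lossy generation(s)")
```

Requires Python 3.9+ for the `list[str]` annotation.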
When you choose MP3 export in Calvio, the tool encodes your processed audio using lamejs — a JavaScript port of the widely respected LAME MP3 encoder, loaded from Cloudflare's CDN. This encoding happens entirely in your browser with no server involved whatsoever.
Calvio decodes your input file to raw PCM, runs silence removal on that PCM data, then encodes the result to MP3 using lamejs, all within your browser tab. lamejs is ported from LAME, the encoder that Audacity and FFmpeg (via libmp3lame) rely on for MP3 output and that has been refined for decades. Its output quality for speech content is well established.
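The decode, process, encode order can be illustrated with a simple threshold-based silence remover working on decoded mono PCM samples. This is a hypothetical sketch, not Calvio's actual algorithm. The RMS threshold, 10 ms frame size and 300 ms minimum pause are assumed values. In the real pipeline, the trimmed samples would then go to an MP3 encoder such as lamejs, or be written out as WAV.

```python
import math


def remove_silence(samples, sample_rate, threshold=0.01, frame_ms=10, min_silence_ms=300):
    """Drop runs of quiet frames lasting at least `min_silence_ms`.

    `samples` are mono PCM floats in [-1, 1]. Hypothetical sketch only:
    the threshold and timing parameters are assumptions, not Calvio's."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_frames = max(1, min_silence_ms // frame_ms)
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    # A frame is "quiet" when its RMS level falls below the threshold.
    quiet = [math.sqrt(sum(s * s for s in f) / len(f)) < threshold for f in frames]

    out = []
    i = 0
    while i < len(frames):
        if not quiet[i]:
            out.extend(frames[i])
            i += 1
            continue
        # Find the end of this run of quiet frames.
        j = i
        while j < len(frames) and quiet[j]:
            j += 1
        if j - i < min_frames:
            # Short pause: keep it so speech rhythm stays natural.
            for f in frames[i:j]:
                out.extend(f)
        # Long pause: skipped entirely.
        i = j
    return out


# Usage: 1 s tone, 1 s of silence, 1 s tone at 16 kHz.
SR = 16_000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SR) for n in range(SR)]
trimmed = remove_silence(tone + [0.0] * SR + tone, SR)
print(len(trimmed) / SR, "seconds")  # 2.0 seconds
```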
One practical note: MP3 encoding is more CPU-intensive than WAV export, which amounts to writing a short header followed by the raw PCM data. For very long files (60+ minutes), MP3 encoding may take a few extra seconds compared with WAV export. If you're in a hurry, WAV exports faster.
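WAV export is cheap because, for plain PCM, the file is a fixed 44-byte RIFF header followed by the samples, with no per-sample computation. The sketch below builds that canonical header by hand with `struct`; the helper name `wav_bytes` is hypothetical. It then checks the result by reading it back with the standard `wave` module. It assumes an even-length PCM payload, because RIFF requires a pad byte after odd-sized chunks.

```python
import io
import struct
import wave


def wav_bytes(pcm: bytes, sample_rate: int, channels: int, bit_depth: int) -> bytes:
    """Wrap little-endian PCM in a canonical 44-byte WAV (RIFF) header."""
    block_align = channels * bit_depth // 8  # bytes per frame
    byte_rate = sample_rate * block_align  # bytes per second
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(pcm), b"WAVE",  # RIFF chunk: size excludes first 8 bytes
        b"fmt ", 16, 1, channels, sample_rate,  # fmt chunk, format tag 1 = PCM
        byte_rate, block_align, bit_depth,
        b"data", len(pcm),  # data chunk header; samples follow
    )
    return header + pcm


# Usage: 10 ms of digital silence as 16-bit stereo at 44.1 kHz (441 frames).
pcm = bytes(441 * 2 * 2)
wav = wav_bytes(pcm, 44_100, 2, 16)
with wave.open(io.BytesIO(wav), "rb") as r:
    channels, rate, frames = r.getnchannels(), r.getframerate(), r.getnframes()
    payload = r.readframes(frames)
print(len(wav), channels, rate, frames)  # 1808 2 44100 441
```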
For completeness, here's where other formats fit in a creator's workflow:
// about WAV vs MP3 for audio export