How we turn a raw recording into a polished podcast

When we receive a recording that needs editing and enhancing, it’s usually an MP3 file. The first thing we do is open it in Adobe Audition and convert it to a WAV file so that each time we save our work, we’re saving an uncompressed file. In other words, we’re not throwing away more data with every save, which is what happens when a file is re-encoded as an MP3 again and again. We need the file to keep enough information to “manipulate” (i.e., enhance) so that it sounds its very best when we’re finished. Next, we adjust the volume so the recording sounds as loud as possible without sounding distorted or unnatural. We keep adjusting the volume as we work on the recording, so volume adjustment is an ongoing procedure.
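
For readers who like to see the idea in code, here is a minimal sketch of one simple way to make a recording “as loud as possible without distortion”: peak normalization. This is not what Audition does internally and not our exact procedure. The function name, the sample values, and the -1 dBFS target are all assumptions chosen for illustration.

```python
def peak_normalize(samples, target_dbfs=-1.0):
    """Scale samples (floats in the range -1.0..1.0) so the loudest
    peak lands at target_dbfs. The -1 dBFS default is an assumed
    safety margin, not a setting taken from our workflow."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    target_linear = 10 ** (target_dbfs / 20.0)  # dBFS -> linear amplitude
    gain = target_linear / peak
    return [s * gain for s in samples]


# Hypothetical quiet take whose loudest sample is only 0.25 of full scale.
quiet_take = [0.0, 0.1, -0.25, 0.2]
louder_take = peak_normalize(quiet_take)
```

Because every sample is multiplied by the same gain, the loudest peak stays just under full scale and nothing clips. In practice, loudness is judged by ear as well, which is why we keep revisiting the volume throughout the edit.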

After that first volume adjustment in Audition, we open the file in SpectraLayers and remove low-frequency rumble (usually 80 Hz and below). If there are obvious plosives (popped P’s) that we can see on the spectral display (they look like little downward spikes, usually below the 100 Hz line), we erase those with the eraser tool. We save the file and move on to the next and most important piece of software we use.
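
To illustrate what removing rumble below 80 Hz means, here is a minimal sketch of a first-order high-pass filter. It is a textbook filter, far gentler than the spectral editing we do in SpectraLayers, and it is not that program’s method. The function name, the 44.1 kHz sample rate, and the test tones are assumptions for illustration.

```python
import math


def high_pass(samples, sample_rate=44100, cutoff_hz=80.0):
    """First-order (RC-style) high-pass filter: attenuates content
    below cutoff_hz and passes higher frequencies nearly unchanged."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for n in range(1, len(samples)):
        # Each output follows changes in the input and slowly forgets
        # any steady (low-frequency) offset.
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out


def tone(freq_hz, seconds=1.0, sample_rate=44100):
    """Generate a full-scale sine wave (hypothetical test signal)."""
    count = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


rumble_after = high_pass(tone(20.0))    # 20 Hz stands in for room rumble
voice_after = high_pass(tone(1000.0))   # 1 kHz stands in for voice content
```

Comparing the two results shows the 20 Hz tone coming out much quieter while the 1 kHz tone is almost untouched, which is the general effect we are after when we clear out rumble.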

We open the recording in iZotope RX 8 (the current version as of this posting) and start the meticulous task of cleaning and editing the recording. We start by sampling the room noise (when no one is speaking) using the Spectral De-Noise tool. We’re careful not to reduce the noise by more than 12 dB because we want the people speaking to sound natural. Over-manipulating a recording can give it a robotic sound.
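
To put the 12 dB limit in perspective, here is a small sketch of the standard decibel-to-amplitude conversion, plus a helper that caps a requested reduction at our ceiling. Neither function is part of iZotope RX. Both names are hypothetical and exist only to show the arithmetic.

```python
MAX_REDUCTION_DB = 12.0  # the ceiling described above


def db_to_amplitude_ratio(db):
    """Convert a level change in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20.0)


def clamp_reduction(requested_db, ceiling_db=MAX_REDUCTION_DB):
    """Keep a requested noise reduction between 0 dB and the ceiling."""
    return min(max(requested_db, 0.0), ceiling_db)


# Amplitude of the noise that remains after the maximum reduction.
remaining = db_to_amplitude_ratio(-clamp_reduction(20.0))
```

A 12 dB reduction leaves the room noise at roughly a quarter of its original amplitude, which is usually enough to push it into the background without making the voices sound processed.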

When we’re satisfied that we have reduced the background room noise sufficiently, we start at the very first word spoken in the recording and move word by word through it, removing clicks, breath noise, thumps, cars passing, computer alerts (like incoming email), and verbal flubs. Basically, all sounds other than voice are removed or reduced. Sometimes we rearrange or repair words so the person speaking sounds coherent. For example, if a person forgot to add an S to a word meant to be plural, we copy an S from elsewhere in the recording and paste it onto the end of the mispronounced word. Our job is to make the people talking sound their very best. Just a quick note here: we don’t rearrange words in an old recording (for example, someone’s grandmother speaking) because we are trying to preserve authenticity.

After we’ve finished combing through a recording and feel confident we’ve edited and enhanced it to sound the best it can, we mix in an intro and outro if one is available. By the way, Audiobag also creates intros and outros. We then save the final presentation in the format requested by the customer. We give the customer options to choose from on our Script and Instructions form (which the customer fills out when making a purchase from us). We explain on the form that a 320 kbps MP3 file is the highest MP3 quality they can get, but a 128 kbps MP3 file will download and stream more quickly and smoothly. Of course, the customer can also choose to have the finished presentation delivered as an uncompressed WAV file and then convert it to whatever format(s) they want.
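
The trade-off between the two MP3 options comes down to simple arithmetic: file size is bitrate multiplied by duration. Here is a minimal sketch that estimates sizes for a recording. The 45-minute length is just an illustrative figure, and the WAV line assumes CD-style audio (44.1 kHz, 16-bit, stereo), which may not match every delivery. Constant bitrate is assumed throughout, and file headers are ignored.

```python
def file_size_mb(bitrate_kbps, minutes):
    """Estimate file size in decimal megabytes for constant-bitrate audio."""
    seconds = minutes * 60
    kilobits = bitrate_kbps * seconds
    return kilobits / 8 / 1000  # kilobits -> kilobytes -> megabytes


MINUTES = 45  # illustrative recording length

high_quality_mp3 = file_size_mb(320, MINUTES)
streaming_mp3 = file_size_mb(128, MINUTES)

# Assumed uncompressed format: 44,100 samples/s x 16 bits x 2 channels.
wav_kbps = 44100 * 16 * 2 / 1000
uncompressed_wav = file_size_mb(wav_kbps, MINUTES)

for label, size in [
    ("320 kbps MP3", high_quality_mp3),
    ("128 kbps MP3", streaming_mp3),
    ("WAV (assumed CD-style)", uncompressed_wav),
]:
    print(f"{label}: about {size:.0f} MB")
```

For a recording of that length, the 320 kbps file works out to about 108 MB and the 128 kbps file to about 43 MB, which is why the smaller file downloads and streams more easily.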

Many people don’t realize that a 45-minute recording can take 3 or 4 days to edit and enhance. You can find faster turnaround times from other editing companies, but the result probably won’t be as clean as it could be. Like a fine wine, great editing takes time. If you’d like to learn more about Audiobag’s editing service, visit our Editing and Enhancing web page, where you can also hear samples of our work and place an order.