Introduction to Digital Audio
To understand how digital audio works, it helps to know a bit about how sound itself works. What is sound? We all know it when we hear it, but how do we describe it?
Last updated: Tuesday, 13 October 2015
You light a firecracker. Bang! Let's look closer... As the firecracker explodes the surrounding air is pushed away. Because air has mass and elasticity (as we know from experience when we inflate a tire) the air resists and is compressed. The compressed air then re-expands, pushing against the air around it. This neighbouring air is compressed in turn, creating a 'shell' of high pressure slightly further away from the disturbing source, the firecracker. The expansion of this shell causes yet another shell to form, and so on.
As successive shells of air compress and expand, waves move outwards in all directions. (Throw a pebble into a pool of water and watch the ripples - it's the same effect, more or less.) Note that the actual particles that make up the air do not flow along with the waves, like water in a river. Instead the particles move slightly out and then back again as the waves pass by. Air is the medium through which the waves are transmitted.
When the waves arrive at your inner ear a fraction of a second after the explosion they cause specialized hairs there to vibrate. This is interpreted by your brain as sound. 'Sound waves' are really just nested regions of compressed and rarefied air.
Now we have a rough idea of what's happening when we hear a sound, we can begin to make sense of what audio experts cryptically refer to as 'waveform' diagrams. Here's what a waveform looks like:
If you've played around with the audio-edit window in either Band-in-a-Box® or PowerTracks Pro Audio it's likely you've seen one of these before. Waveforms aren't as intimidating as they look. In essence, a waveform is a graph that charts minute changes in air pressure as sound waves propagate. The y axis represents air pressure and the x axis represents time.
Let's chart some of the sound waves that are created by the exploding firecracker...
A sound wave arrives. The air pressure goes up (remember that we can think of a sound wave as an expanding region of compressed air) and the green line on the waveform rises. As the sound wave passes the air pressure falls again and the green line falls as well.
These changes in pressure happen very quickly - thousands of times a second. Waveform diagrams representing even a few seconds of sound are, consequently, very big. This is the reason that waveforms of audio recordings often look complicated and squiggly when you view them on your computer. Chances are you are "zoomed-out" in order to view the entire waveform at once.
There are several technical terms used to describe waveforms that you should know. They'll come in useful when we get to discussing digital audio...
Zero Line: The horizontal line running through the middle of the graph is called the zero line. It represents the rest state, when there is no compression or rarefaction.
Amplitude: Amplitude is the amount of compression or rarefaction at any point on the waveform. Graphically, it is the distance above or below the zero line. In general, the greater the amplitude, the louder the volume.
Cycle: A cycle is one complete repetition of the waveform: the amplitude rises, falls, and returns to the level it started from, ready to repeat. The length of a cycle is the amount of time this takes.
Frequency: The frequency of a sound is the number of cycles that happen every second. The higher the frequency, the higher the perceived pitch of a sound. Most people can hear sounds that have frequencies of between 20 cycles a second and 16,000 cycles a second, or 20 Hz and 16,000 Hz. (In case you're interested, the average dog can hear frequencies as high as 45,000 Hz. Cats can hear up to 63,000 Hz. And the beluga whale can hear frequencies of up to 123,000 Hz.)
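The link between frequency and the length of a single cycle can be sketched in a few lines of Python. This is only an illustration: the function name `cycle_duration` is made up for this example, and the frequencies are the hearing limits quoted above.

```python
def cycle_duration(frequency_hz: float) -> float:
    """Return the length of one cycle, in seconds, for a given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    # Frequency is cycles per second, so one cycle lasts 1 / frequency seconds.
    return 1.0 / frequency_hz


for hz in (20, 16_000):
    milliseconds = cycle_duration(hz) * 1000
    print(f"{hz} Hz -> one cycle lasts {milliseconds:.4f} ms")
```

The higher the frequency, the shorter each cycle, which is why high-pitched sounds look more tightly packed on a waveform diagram.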
Analog representations of sound...
The changes in air pressure that are caused by sound waves happen 'smoothly.' By smooth I mean that in the process of moving from a state of low pressure to a state of high pressure, every state of pressure in-between exists, however briefly.
Analog recordings of sound store all of these smooth changes, making them "analogous" to the waveforms of the sounds they represent. Records, for example, have grooves cut into them. The depth of these grooves changes smoothly up and down in direct correspondence with the waveform of the recorded music. Cassette tapes work along the same principle, except that they use magnetic flux instead of physical grooves to represent the waveform.
Digital representations of sound...
Digitally recorded sound is different from analog sound because of the way that digital computers work. Digital computers can only understand two values, on and off. Smooth changes in information cannot be stored on a digital computer.
Let's get back to our firecracker. Watch what happens as a digital computer equipped with a microphone records the explosion...
Bang! Sound waves ripple past the microphone, causing a diaphragm inside to vibrate just like the hairs in your inner ear. The vibrating diaphragm creates a change in voltage in the wire that runs from the microphone to the computer. This fluctuating voltage is an analog representation of the sound, for it changes smoothly from one amplitude to the next, encompassing all values in-between.
Inside our computer there is a special device called an Analog to Digital Converter (usually a part of the sound card). At regular intervals the ADC measures the microphone's analog signal and outputs a number representing the amplitude of the signal at that precise instant. This is called a "sample." Before long there are a huge number of samples all arranged in chronological order - a kind of "spot-map" of the original waveform. Voilà. Digital audio.
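Here is a rough sketch of what taking samples at regular intervals means. It is a simplification under stated assumptions, not how a real sound card is programmed: the "analog" signal is stood in for by a Python function, and the names `sample_signal` and `tone`, the 1,000 Hz test tone and the 8,000 Hz sample rate are all chosen purely for illustration.

```python
import math


def sample_signal(signal, sample_rate_hz, duration_s):
    """Measure `signal` (a function of time in seconds) at regular intervals."""
    count = round(sample_rate_hz * duration_s)
    # Sample number n is taken at time n / sample_rate_hz.
    return [signal(n / sample_rate_hz) for n in range(count)]


def tone(t):
    # A 1,000 Hz sine wave standing in for the microphone's analog voltage.
    return math.sin(2 * math.pi * 1000 * t)


samples = sample_signal(tone, sample_rate_hz=8000, duration_s=0.001)
print(len(samples), "samples")
print([round(s, 3) for s in samples])
```

One millisecond of sound at this rate yields eight numbers. Those numbers, in order, are the digital recording.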
Digital audio samples, unlike analog information, can be saved in a computer file. When the time comes to play the recording back, the samples are sent to a Digital to Analog Converter (again usually a part of the sound card). The DAC "connects the dots," creating a smoothly changing signal between the samples. The result is an approximation of the original waveform.
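To make "connecting the dots" concrete, here is a minimal sketch that draws straight lines between neighbouring samples. This is an assumption made for the sake of a simple example: real converters use more sophisticated smoothing, and the function name `connect_the_dots` and the tiny 4 Hz sample rate are invented for illustration.

```python
def connect_the_dots(samples, sample_rate_hz, t):
    """Estimate the signal at time t (seconds) by drawing a straight line
    between the two samples on either side of t."""
    if t < 0:
        raise ValueError("t must not be negative")
    position = t * sample_rate_hz      # where t falls, measured in samples
    left = int(position)               # index of the sample just before t
    if left >= len(samples) - 1:
        return samples[-1]             # at or past the last sample
    fraction = position - left         # how far t is towards the next sample
    return samples[left] + fraction * (samples[left + 1] - samples[left])


dots = [0.0, 1.0, 0.0, -1.0]
print(connect_the_dots(dots, sample_rate_hz=4, t=0.125))  # halfway from 0.0 to 1.0
print(connect_the_dots(dots, sample_rate_hz=4, t=0.25))   # exactly on a sample
```

The values between samples are educated guesses, which is why the reconstructed waveform is an approximation rather than a copy.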
Let's take a close look at a segment of the original waveform and the same segment after the DAC has reconstructed it from digital information:
Uh oh. Clearly these waveforms are different. Shouldn't they also sound different? How is it that a digital recording can sound identical to an original?
A closer look at digital recording...
The human ear is marvelously sensitive. Nevertheless, it has limits. If you take very accurate samples of a sound often enough, there is a point where the ear can no longer tell the difference between an original and a recording.
You may have heard the phrase "CD quality audio" before. CD audio is digital audio encoded at a bit depth of 16 and a sample rate of 44.1 kHz. Few (if any) can tell the difference between a CD quality recording and an original, so obviously these numbers mean that CD audio is extremely good. But what exactly do the numbers mean, and how do they relate to waveforms?
Let's start with the sample rate, 44.1 kHz.
44.1 kHz means that for every second of audio, 44,100 samples are measured. For reasons I'm not going to get into, this also means that it is possible to accurately reproduce frequencies of up to about 22 kHz, half the sample rate. Higher frequencies are lost. However, for practical purposes this doesn't matter. Even the most sensitive of us cannot hear frequencies much higher than 16 kHz. Yes, CD quality sound is missing information, but only information outside the range of our hearing.
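For the curious, the arithmetic behind the 22 kHz figure can be sketched as follows. The helper names and the 30,000 Hz test tone are assumptions made up for this example. The second half shows one way to see why frequencies above the limit are lost: sampled at 44.1 kHz, a 30,000 Hz tone produces the same samples as a 14,100 Hz tone, so the recording cannot tell them apart.

```python
import math


def highest_reproducible_frequency(sample_rate_hz):
    """Half the sample rate: the limit rounded to 22 kHz in the text."""
    return sample_rate_hz / 2


def sample_cosine(frequency_hz, sample_rate_hz, count):
    """Return `count` samples of a cosine wave at the given frequency."""
    return [math.cos(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(count)]


CD_RATE = 44_100
print(highest_reproducible_frequency(CD_RATE))  # 22050.0

too_high = sample_cosine(30_000, CD_RATE, 50)   # above the limit
stand_in = sample_cosine(14_100, CD_RATE, 50)   # 44,100 - 30,000
print(all(abs(a - b) < 1e-9 for a, b in zip(too_high, stand_in)))
```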
Now let's discuss the bit depth. What's a bit? I mentioned earlier that digital computers can only understand things in terms of on and off. The bit is the fundamental unit of information that digital computers work with. A single bit can have two values: 1 and 0, on and off.
If you start combining bits, you can express more values. For example, if you put two bits together, you can have four values because there are four possible combinations:
0 0
0 1
1 0
1 1
If you put three bits together you can have eight values because there are eight possible combinations:
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
Every time you add a bit you double the number of values you can express. Four bits can have 16 values. Five bits can have 32 values. Six bits can have 64 values. And so on.
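If you would rather let the computer do the counting, the short sketch below lists every combination for a given number of bits and confirms the doubling pattern. The function name `bit_combinations` is just an illustrative choice.

```python
from itertools import product


def bit_combinations(bit_count):
    """List every possible combination of `bit_count` bits, in counting order."""
    return [" ".join(bits) for bits in product("01", repeat=bit_count)]


for combination in bit_combinations(3):
    print(combination)

# Each extra bit doubles the number of values: n bits give 2 ** n values.
for bit_count in (1, 2, 3, 4, 5, 6, 16):
    print(bit_count, "bits ->", 2 ** bit_count, "values")
```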
In digital recording the number of samples taken per second is only one part of the waveform picture. You also have to distinguish, with a fair degree of precision, between sample amplitudes. The more bits you use to represent a single sample, the more accurate that sample will be. Let's say, for instance, that you record a sound using a bit depth of 2. This means that you can only use two bits to represent each sample, which means that each sample can only be one of four amplitudes.
As you can see by the above diagram, a bit depth of 2 does not provide a very accurate representation of the original sound. (In fact a sound digitized at a resolution of 2 bits would be unintelligible - regardless of how many samples you took.)
CD audio is recorded at a resolution of 16 bits. 16 bits per sample translates into 65,536 unique possible amplitudes per sample - enough amplitude accuracy to (when combined with a high sample rate) generate a nearly flawless reproduction.
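The effect of bit depth on accuracy can be sketched by snapping an amplitude to the nearest allowed level. This is a simplified model under stated assumptions: amplitudes run from -1.0 to 1.0, the levels are evenly spaced across that range, and the function name `quantize` is invented for this example. Real converters lay out their levels slightly differently.

```python
def quantize(amplitude, bit_depth):
    """Snap an amplitude in the range -1.0..1.0 to the nearest of the
    2 ** bit_depth evenly spaced levels."""
    levels = 2 ** bit_depth
    step = 2.0 / (levels - 1)                 # gap between neighbouring levels
    index = round((amplitude + 1.0) / step)   # nearest level, counted from -1.0
    index = max(0, min(levels - 1, index))    # stay inside the allowed range
    return -1.0 + index * step


original = 0.2
for bits in (2, 16):
    stored = quantize(original, bits)
    print(f"{bits:>2} bits: stored {stored:.6f}, error {abs(stored - original):.6f}")
```

At 2 bits the stored value is off by more than a tenth of full scale. At 16 bits the error is far too small to hear.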
Other sample rates and bit depths...
If you're just getting started recording and editing digital audio with a program like PowerTracks Pro Audio or Band-in-a-Box®, it's probably best to leave the program settings at 16 bit 44.1 kHz. However, you may notice in PowerTracks that other, higher quality settings are supported, like 24 bit 96 kHz audio.
If CD quality audio already sounds perfect to our ears, why should anyone bother to record even higher quality audio? Isn't that overkill?
The full answer is in-depth and complicated, beyond the scope of this article. (Look for a future article, coming soon!) Here's the short answer: CD quality audio, though very good, has small errors. If you are doing a substantial amount of waveform editing, like mixing multiple waveforms together or applying heavy effects, these small errors can get bigger and become noticeable as a background hiss or as distortion. Ultra high quality audio also has errors, but they are smaller than those of CD quality audio. To achieve the highest quality sound possible, it is best to record, edit and mix your audio at a higher bit depth and sample rate, and then to convert down to 16 bit 44.1 kHz as the final step.
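Here is a toy illustration of how an edit can magnify small errors. It is a sketch under assumptions, not a description of how PowerTracks or any other editor works internally: a single sample is turned down to 1% of its volume, stored as a whole number, and then turned back up. The function names and the 1% figure are invented for the example.

```python
def full_scale(bit_depth):
    """Largest positive sample value at this bit depth (signed samples)."""
    return 2 ** (bit_depth - 1) - 1


def quiet_then_loud(amplitude, bit_depth, gain=0.01):
    """Turn a sample down by `gain`, then back up, rounding to a whole
    number every time the value is stored. Returns the final amplitude."""
    stored = round(amplitude * full_scale(bit_depth))
    quiet = round(stored * gain)       # fine detail is rounded away here
    restored = round(quiet / gain)     # turning it up magnifies the rounding error
    return restored / full_scale(bit_depth)


original = 0.3
for bits in (16, 24):
    error = abs(quiet_then_loud(original, bits) - original)
    print(f"{bits}-bit error after the edit: {error:.8f}")
```

Both bit depths come back slightly wrong, but the 24 bit version is wrong by a much smaller amount, which leaves more room for heavy editing before the damage becomes audible.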
Note: Higher quality audio rates are supported by PowerTracks Pro Audio, but depending on your hardware you may also need a special sound card.
Today digital audio tends to be used more than analog audio as a method of storage. Records and cassette tapes continue to be used, but by a relatively small market. (Compare the prevalence of records 20 years ago to today.) Why is digital audio more prevalent? While there is some debate as to whether or not digital audio actually sounds better than analog audio, digital audio is certainly easier to reproduce and to manipulate without loss of quality.
Because of digital audio, it is much easier for both amateur and professional musicians today to produce studio-quality music.
PG Music Inc.