Direct Stream Digital (DSD) has become the big thing in high-end audio. Simplified encoding and decoding, along with ultra-high sampling frequencies, promise unparalleled performance. Is this what we’ve been waiting for, or just mass-marketing hype? This blog separates the hype from the technical facts. I’ll explain in what ways DSD has the advantage, and in what ways pulse-code modulation (PCM) is better.

If you're not sure if you should believe the statements in this blog that contradict much of the marketing hype, myth, and legend in the audiophile industry, feel free to check the references at the end of this blog that were written by recording engineers, such as Dan Lavry, and companies that manufacture electronics used in recording studios, such as Antelope Audio.

If you don’t want a history lesson and don’t want to wade through a lot of technical data, you may want to skip to the summary, where I hit all the major points. You also may want to refer to my other blog on “The 24-Bit Delusion.”

A Brief History:

In 1857, Édouard-Léon Scott de Martinville invented the phonautograph, which could graphically record sound waves. In early 1877, Charles Cros devised a way to reverse that process on a photoengraving to form a groove that could be traced by a stylus, causing vibrations that could be passed on to a diaphragm, recreating sound waves.

In late 1877, Thomas Edison used Cros’ theories to invent the cylinder phonograph, allowing music lovers to experience recorded music in their homes for the first time. Can you imagine a modern cylinder phonograph? Tangential tracking…no arc error…no skating error. The concept was flawless.

In 1887, Emile Berliner invented the technically inferior disk phonograph. Since disks are much cheaper to produce, fit nicely in display bins at stores, and can include larger cover art and notes, they became the standard. And so began the long history of the recorded music industry being more about consumer convenience and optimal profits than about optimal fidelity.

The digital revolution was no different. Philips and Sony collaborated on the new standard for a consumer digital format in 1979. Philips wanted a 20 cm disk, but Sony insisted on a 12 cm disk that could be played in a smaller portable device. In 1980, they published the Red Book CD-DA standard, and mass-market digital music was born. Many in the recording industry in the early days of digital joked that CD stood for “compromised disk.”

In the early 1980s, when digital recording became readily available, studios converted from analog to digital to save money. For studios, this cost less for the equipment, required less space for both recording and archiving, and made it easier to mix and edit tracks in post-production. For consumers, there weren't many advantages. Most of the early digital recordings were produced with relatively low resolution and sounded so fatiguing they would make you want to tear your ears off.

The switch from PCM to DSD was no different. In the early 1990s, Sony wanted a future-proof, less expensive medium to archive their analog masters. In 1995, they concluded that storing a 1-bit signal directly from the analog-to-digital conversion would allow them to output to any conceivable consumer digital format (LOL...later I'll explain how Sony screwed the pooch on this decision). This new 1-bit technology was achieved by outputting from the monitoring pin on Crystal’s new 1-bit 2.8MHz Bit Stream DAC chip.

Later, Sony’s consumer division caught wind of DSD and collaborated with Philips to create the SACD format. Of course, from the time the SACD was conceived until the time it came to market, DAC chip manufacturers had advanced from 64fs to a higher 128fs sampling rate (aka Double-Rate DSD) and from 1-bit to a higher-resolution 5-bit format. If the SACD format had been DSD128 instead of DSD64, and 5-bit instead of 1-bit, it would have made a huge difference in performance. Oops.

Long before the DVD, SACD, or DSD formats were developed, the Bit Stream DAC chip was introduced to the consumer market as a lower-cost alternative to the significantly more expensive R-2R multi-bit DAC chip. Bit Stream DAC chips have built-in algorithms to convert PCM input to DSD, which is then converted to analog. Once again, the result was a huge cost saving at the expense of fidelity.

It was in part Bit Stream DAC technology that allowed the development of our modern 7.1 channel audio that’s embedded into video formats. This also allowed electronics manufacturers to market DVD players in small chassis with cheap power supplies that could retail for under $70. Once again, the audio purist never stood a chance.

In contrast, not only do multi-bit R-2R DAC chips cost significantly more to manufacture than single-bit DAC chips, but they also require much larger and more sophisticated power supplies. If you were to make a 7.1 channel R-2R CD/DVD/SACD player, it would cost several times the price of Bit Stream technology, and it would be several times the size. Certainly not what the average consumer is looking for.

To sum things up, the recorded music industry has made decision after decision to maximize profits and mass consumer appeal at the expense of the audio purist. History lesson over.

DSD vs. PCM Technology:

PCM recordings are commercially available in 16-bit or 24-bit and in several sampling rates from 44.1KHz to 192KHz. The most common format is the Red Book CD, with 16 bits sampled at 44.1KHz. DSD recordings are commercially available in 1-bit with a sample rate of 2.8224MHz. This format is used for SACD and is also known as DSD64.

There are more modern, higher-resolution DSD formats, such as DSD128, DSD256, and DSD512, which I will explain later. These formats were created for recording studios and comprise only a very small portion of the recordings that are commercially available.

Though you can’t make a direct comparison between the resolution of DSD and PCM, various experts have tried. One estimate is that a 1-bit 2.8224MHz DSD64 SACD has similar resolution to a 20-bit 96KHz PCM. Another estimate is that a 1-bit 2.8224MHz DSD64 SACD is equal to 20-bit 141.12KHz PCM or 24-bit 117.6KHz PCM.

In other words, a DSD64 SACD has higher resolution than a 16-bit 44.1KHz Red Book CD, roughly the same resolution as a 24-bit 96KHz PCM recording, and not as much resolution as a 24-bit 192KHz PCM recording.
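As a back-of-the-envelope check on those estimates (a comparison method of my own, not from any of the cited sources), note that the equated formats all carry exactly the same raw per-channel data rate:

```python
# Hedged sketch: comparing the raw per-channel data rate (bits per second)
# of the formats named above. Raw bit rate is only a crude proxy for
# resolution, but it shows why these particular figures get equated.

def bit_rate(bits_per_sample, sample_rate_hz):
    """Raw data rate of one channel in bits per second."""
    return bits_per_sample * sample_rate_hz

dsd64      = bit_rate(1, 2_822_400)   # 1-bit DSD64
pcm_20_141 = bit_rate(20, 141_120)    # 20-bit 141.12KHz PCM
pcm_24_117 = bit_rate(24, 117_600)    # 24-bit 117.6KHz PCM

print(dsd64, pcm_20_141, pcm_24_117)  # all three: 2822400 bits/s
```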

Both DSD and PCM are “quantized,” meaning numeric values are set to approximate the analog signal. Both DSD and PCM have quantization errors. Both DSD and PCM have linearity errors. Both DSD and PCM have quantization noise that requires filtering. In other words, neither one is perfect.

PCM encodes the amplitude of the analog signal sampled at uniform intervals (sort of like graph paper), and each sample is quantized to the nearest value within a range of digital steps. The range of steps is based on the bit depth of the recording. A 16-bit recording has 65,536 steps, a 20-bit recording has 1,048,576 steps, and a 24-bit recording has 16,777,216 steps.
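The step counts above follow directly from the bit depth (2 raised to the number of bits):

```python
# Quantization steps for each PCM bit depth mentioned above: 2**bits.
steps = {bits: 2 ** bits for bits in (16, 20, 24)}
for bits, n in steps.items():
    print(f"{bits}-bit: {n:,} steps")
# 16-bit: 65,536 steps / 20-bit: 1,048,576 steps / 24-bit: 16,777,216 steps
```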

The more bits and/or the higher the sampling rate, the higher the resolution. That translates to a 20-bit 96KHz recording having roughly 35 times the resolution of a 16-bit 44.1KHz recording. No small difference. So why is it that a 24-bit 96KHz recording only sounds slightly better than a 16-bit 44.1KHz Red Book CD? I'll answer that later in the blog.
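One simple way to put a number on that comparison is quantization steps multiplied by samples per second. This is a rough measure of my own, not a perceptual one:

```python
# Rough sketch: "resolution" as quantization steps times samples per second.
def resolution(bits, rate_hz):
    return (2 ** bits) * rate_hz

ratio = resolution(20, 96_000) / resolution(16, 44_100)
print(round(ratio, 1))  # roughly 35x
```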

DSD encodes music using pulse-density modulation, a sequence of single-bit values at a sampling rate of 2.8224MHz. This translates to 64 times the Red Book CD sampling rate of 44.1KHz, but at only one 32,768th of its 16-bit resolution.
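Those two multiples check out arithmetically:

```python
# DSD64's stated relationships to Red Book CD, checked arithmetically.
RED_BOOK_RATE = 44_100
DSD64_RATE = 2_822_400

rate_multiple = DSD64_RATE // RED_BOOK_RATE   # 64x the sampling rate
step_ratio = (2 ** 16) // (2 ** 1)            # 16-bit steps vs. 1-bit steps

print(rate_multiple, step_ratio)  # 64 32768
```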

In the above graphical representation of PCM as a dual-axis quantization and DSD as a single-axis quantization, you can see why the accuracy of DSD reproduction depends much more heavily on the accuracy of the clock than PCM does. The accuracy of the voltage of each bit is just as important in DSD as in PCM, so the regulation of the reference voltage is equally critical in both types of converters. And the accuracy of the clocking during the recording process, which is done at several times the resolution of commercial DSD64 SACD and 24-bit 192KHz PCM recordings, matters significantly more than the accuracy of the clocking of either DSD or PCM during playback.

There are other DSD formats that use higher sampling rates, such as DSD128 (aka Double-Rate DSD), with a sampling rate of 5.6448MHz; DSD256 (aka Quad-Rate DSD), with a sampling rate of 11.2896MHz; and DSD512 (aka Octuple-Rate DSD), with a sampling rate of 22.5792MHz. All of these higher-resolution DSD formats were intended for studio use as opposed to consumer use, though there are some obscure companies selling recordings in these formats.
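The whole family is just successive doublings of the base 64 × 44.1KHz rate:

```python
# DSD family sampling rates: each format doubles the previous one.
rates = {name: 64 * 44_100 * mult
         for name, mult in [("DSD64", 1), ("DSD128", 2),
                            ("DSD256", 4), ("DSD512", 8)]}
for name, hz in rates.items():
    print(name, hz, "Hz")
# DSD64 2822400 / DSD128 5644800 / DSD256 11289600 / DSD512 22579200
```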

Note that Double-, Quad-, and Octuple-Rate DSD each come in two potential families: a 44.1KHz-multiple sampling rate that divides evenly down to DSD64 SACD and 44.1KHz Red Book, and a 48KHz-multiple sampling rate that divides evenly down to the 96KHz and 192KHz High-Definition PCM formats.
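A quick way to see the two families (the helper function is my own illustration):

```python
# Sketch: integer divisibility determines whether a DSD rate can be
# decimated to a PCM rate without interpolation. Helper is illustrative.
def divides_evenly(dsd_rate, pcm_rate):
    return dsd_rate % pcm_rate == 0

dsd128_44k1 = 128 * 44_100   # 5,644,800 Hz -- 44.1KHz-multiple family
dsd128_48k  = 128 * 48_000   # 6,144,000 Hz -- 48KHz-multiple family

print(divides_evenly(dsd128_44k1, 44_100))   # True: down to Red Book
print(divides_evenly(dsd128_48k, 96_000))    # True: down to HD PCM
print(divides_evenly(dsd128_44k1, 96_000))   # False: cross-family, no clean ratio
```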

Of course, when studios convert a 48KHz-multiple format to a 44.1KHz-multiple format, or vice versa, they introduce quantization errors. Sadly, this is often the case with older recordings when they are released in a remastered 24-bit 192KHz HD version derived from DSD64 masters, such as the ones Sony and other companies used to archive their analog masters in the mid-'90s. Note that the optimal HD PCM format that can be created from a DSD64 master is 24-bit 88.2KHz. Any sampling rate above 88.2KHz, or any rate in the 48KHz-multiple family, has to be interpolated (not good). But consumers demand 24-bit 192KHz versions of all their old favorites, so companies provide them, despite the known consequences.
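The decimation arithmetic behind that point:

```python
# Why 88.2KHz is the natural PCM target for a DSD64 master: only the
# 44.1KHz-multiple rates yield a whole-number decimation ratio.
DSD64_RATE = 2_822_400
ratios = {pcm: DSD64_RATE / pcm for pcm in (44_100, 88_200, 96_000, 192_000)}
for pcm, r in ratios.items():
    print(pcm, r)
# 44100 -> 64.0, 88200 -> 32.0, 96000 -> 29.4, 192000 -> 14.7
```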

The Problems:

There are three major areas where both PCM and DSD fall short of perfection: quantization errors, quantization noise, and non-linearity.

Quantization errors can occur in several ways. One way that was most common in the early days of digital recording had to do with the resolution being too low. Think of the intersection points on a piece of graph paper. You can’t quantize to a fraction of a bit, and you can’t quantize to a fraction of a sampling rate. You can only quantize to a value that falls on the intersection points of bit-depth and sampling rate. When the value of the analog signal falls between two quantization values, the digital recording ends up recreating the sound lower or higher in volume and/or slower or faster in frequency, distorting the time, tune, and intensity of the original music. Often this creates unnatural, odd harmonics that result in the hard, fatiguing sound associated with early digital recordings. Note on the graphic below that the solid blue line represents the actual music wave and the black dots represent the closest quantization values.

Though modern sampling rates are high enough to fool the human ear, quantization errors still occur when translating from one format to another. For example, when Sony decided to archive their analog master libraries to DSD64 back in 1995, they were wrong to believe that these masters would be future-proof and able to reproduce any consumer format. The fact is, these masters could only properly reproduce a format divisible by 44.1KHz. So any modern 96KHz or 192KHz recording created from DSD64 master files has quantization errors.

This leads me to one of the many things that enrage me about the recorded entertainment industry. If 44.1KHz was the standard that was engineered to put aliasing errors in less critical audio frequencies, then why did they start using multiples of 48KHz?!?!?!? All they had to do was go with 88.2KHz and 176.4KHz as the modern HD consumer formats, and all of this mess could have been avoided. They made DXD, a 24-bit 352.8KHz studio format, equally divisible by 44.1KHz. What blithering idiot decided to put a wrench in the works with 96KHz and 192KHz HD audio?!?!?!?

The actual reason for the 48KHz multiple has to do with optimal synchronization to video. So it makes sense for soundtracks from movies to be recorded in a 48KHz multiple, such as the 24-bit 96KHz format embedded in 7.1 channel audio on DVDs and Blu-rays. But since over 90% of all music recordings are sold in a 44.1KHz-multiple format, either Red Book CD or DSD64 SACD, it is rather ridiculous to offer HD music in 96KHz or 192KHz as opposed to the optimal 88.2KHz and 176.4KHz HD formats. But because ignorant consumers demand 192KHz, falsely believing it is better than 176.4KHz, that is what record companies market.

Quantization noise is unavoidable. No matter what format you digitize in, ultrasonic artifacts are created. The more bits you have, the lower the noise floor: each added bit lowers the noise floor by roughly 6dB. So as you can imagine, 1-bit DSD has significantly more ultrasonic noise than even 16-bit PCM. With PCM, you also have to deal with significant noise around the sampling frequency. This is why Sony and Philips engineered the Red Book CD to sample at 44.1KHz, which is over twice the human high-frequency hearing limit of 20KHz.
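The "~6dB per bit" rule of thumb comes from the standard textbook formula for the theoretical quantization SNR of an ideal N-bit PCM system with a full-scale sine input:

```python
# Theoretical quantization SNR of an ideal N-bit PCM system with a
# full-scale sine input: SNR ~= 6.02*N + 1.76 dB.
def pcm_snr_db(bits):
    return 6.02 * bits + 1.76

print(round(pcm_snr_db(16), 1))  # ~98.1 dB (16-bit)
print(round(pcm_snr_db(24), 1))  # ~146.2 dB (24-bit)
```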

Since quantization noise is present around the sampling frequency of a PCM recording, a 44.1KHz recording has quantization noise roughly one octave above the human hearing limit of 20KHz. This quantization noise needs to be filtered out, so all DACs have a low-pass filter at the output. Because the quantization noise sits only about an octave above audibility, the filters must have a very steep slope to avoid filtering out the desirable high frequencies. These steeply sloped low-pass digital filters are commonly known as "brick wall" filters.
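For a flavor of what a steep low-pass looks like in practice, here is a minimal windowed-sinc FIR sketch. The tap count, window choice, and 20KHz cutoff are illustrative assumptions of mine, not taken from any actual DAC design:

```python
import math

# Sketch of a steep ("brick wall" style) low-pass FIR built from a
# Hamming-windowed sinc. 255 taps and the 20KHz cutoff are illustrative.
def windowed_sinc_lowpass(cutoff_hz, sample_rate_hz, num_taps=255):
    fc = cutoff_hz / sample_rate_hz          # normalized cutoff (cycles/sample)
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    return taps

taps = windowed_sinc_lowpass(20_000, 44_100)
print(len(taps), sum(taps))  # DC gain (sum of taps) lands close to 1.0
```

More taps give a steeper transition band at the cost of more latency and computation, which is part of why these filters are nontrivial to do well.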

Though you hear a lot about "brick wall" filters in the top end of early Red Book CD players causing an audible distortion, the fact is that was not the reason for the unnatural-sounding top end. Most of the hard, harsh, unnatural-sounding high frequencies in early digital recordings had more to do with flaws in the power supplies and flaws in the recording process, not "brick wall" filters. Sorry to be the one to burst your bubble, but despite what many audiophiles may believe, fewer than one person in a thousand can actually hear anything above 20KHz as a child, and almost no one over the age of 40 can hear much above 15KHz.

Of course, DSD64 is another story: above 25KHz the quantization noise rises sharply, requiring far more sophisticated filters and/or noise-shaping algorithms. When you filter the output of DSD64 with a simple low-pass filter, the result is distorted phase/time and some rather nasty artifacts in the audible range. The solution is noise-shaping algorithms that move the noise to less audible frequencies, and/or higher sampling rates. This is why the DSD128 (aka Double-Rate DSD) and DSD256 (aka Quad-Rate DSD) formats came into being. This is also why advanced player software, such as JRiver, offers Double-Rate DSD output. Using player software that upsamples DSD64 to DSD128 or DSD256 significantly improves performance by moving the digital artifacts octaves above audibility, allowing more advanced noise-shaping algorithms and less severe digital filters. Note that these extremely high sampling frequencies are why ultra-accurate clocking is more important in DSD than in PCM recordings.
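To make the 1-bit idea concrete, here is a toy first-order delta-sigma modulator, my own minimal illustration of the principle behind DSD encoding (real modulators are higher-order and far more sophisticated):

```python
import math

# Toy first-order delta-sigma modulator: encodes a signal as a +/-1 bit
# stream whose pulse density tracks the input, pushing quantization
# error toward high frequencies (noise shaping).
def delta_sigma_1bit(samples):
    integrator, prev_bit, bits = 0.0, 0.0, []
    for x in samples:
        integrator += x - prev_bit       # accumulate error vs. fed-back output
        prev_bit = 1.0 if integrator >= 0 else -1.0
        bits.append(prev_bit)
    return bits

# One full cycle of a slow sine: the density of +1 vs. -1 pulses follows
# the waveform, so the bit stream's average stays near the input's (~0).
sine = [math.sin(2 * math.pi * i / 1000) for i in range(1000)]
bits = delta_sigma_1bit(sine)
density_error = abs(sum(bits)) / len(bits)
print(density_error)  # small: pulse density tracks the input's average
```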

Jitter is defined as inconsistency in playback frequency caused by inaccurate clocking. The result is audible as distortion of the time and tune of the music. Often the pattern of the frequency inconsistency produces an analog waveform with unnatural, odd harmonics. This results in the fatiguing character commonly known as “digititis.” Note in the two graphs below: jitter is an inconsistency in the horizontal (time) axis, and non-linearity is an inconsistency in the vertical (amplitude) axis. Some would consider an inconsistency in either axis to be non-linearity.
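A quick numerical illustration of jitter (the tone, jitter amount, and sample count are my own exaggerated demo parameters, not real-world figures): sample a sine wave at the DSD64 rate with a slightly unstable clock and measure the resulting amplitude error.

```python
import math
import random

# Toy jitter simulation: compare samples of a 10KHz sine taken with an
# ideal clock vs. one with Gaussian timing error. Values are illustrative.
random.seed(0)
RATE = 2_822_400        # DSD64 sampling rate
FREQ = 10_000           # test tone (Hz)
SIGMA = 50e-9           # 50ns of clock jitter (exaggerated for clarity)
N = 10_000

sq_err = 0.0
for n in range(N):
    t_ideal = n / RATE
    t_real = t_ideal + random.gauss(0.0, SIGMA)
    sq_err += (math.sin(2 * math.pi * FREQ * t_real)
               - math.sin(2 * math.pi * FREQ * t_ideal)) ** 2
rms_error = math.sqrt(sq_err / N)
print(rms_error)  # on the order of 2*pi*FREQ*SIGMA of full scale
```

The error scales with both the jitter and the signal frequency, which is why clock accuracy matters more as sampling rates climb.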