ActiveWin: Technical Frequently Asked Questions about MPEG Audio Layer-3

Technical Frequently Asked Questions about MPEG Audio Layer-3
Copyright ©1999 Fraunhofer-Gesellschaft

Version 3.0

This page gives a comprehensive overview of various topics related to audio compression. Because many external pages link to this page, we decided to maintain it and recommend it as a comprehensive overview for off-line reading as a printout. Please send questions and comments to amm_info@iis.fhg.de

Q: O.K., Layer-3 is obviously a key to many applications. Where are its limitations?

A: Well, MPEG Layer-3 is a perceptual audio coding scheme, exploiting the properties of the human ear, and trying to maintain the original sound quality as far as possible.
In contrast, a dedicated speech codec exploits the properties of the human vocal tract, trying to maintain the intelligibility of the voice signals as far as possible. Advanced speech coding schemes achieve a useful voice reproduction at very low bitrates: for example, ACELP as standardised by the ITU in G.723.1 works at 5.3 kbps with a codec delay below 40 ms, and LD-CELP as standardised in G.728 works at 16 kbps with a codec delay below 1 ms. At such very low bitrates, they perform better than MPEG Layer-3 for pure voice signals, and they offer the low delay that is necessary for full-duplex voice communications.
In the framework of MPEG-4, scalable audio coding schemes are devised that combine speech coding and perceptual audio coding.

Q: You mentioned the codec delay. May I have some figures?

A: Well, the standard gives some figures of the theoretical minimum delay:

Layer-1: 19 ms (<50 ms)
Layer-2: 35 ms (100 ms)
Layer-3: 59 ms (150 ms)

Practical values are significantly above that. As they depend on the implementation, precise figures are hard to give. So the numbers in brackets are just rough rule-of-thumb values - real codecs may show even higher values. So yes, there are certain applications that may suffer from such a delay (like feedback links for remote reporter units). For many other applications (like the ones mentioned above), delay is of minor interest.

Q: What is "MPEG"?

A: MPEG is the "Moving Picture Experts Group", working under the joint direction of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). This group works on standards for the coding of moving pictures and audio. MPEG has created its own homepage, providing information on the what, where, when and how of the standards.

Q: Are MPEG-3 and Layer-3 the same thing?

A: No! Layer-3 is a powerful audio coding scheme which certainly is part of the MPEG standard. Layer-3 is defined within the audio part of both existing international standards, MPEG-1 and MPEG-2.
But: there is no MPEG-3 defined.

Q: How do I get the MPEG documents?

A: Well, you may contact ISO, or you may order them from your national standards body. E.g., in Germany, please contact DIN.

Q: Is some public C source available?

A: Well, there is "public C source" available on various sites, e.g. at ftp://ftp.iis.fhg.de/pub/layer3/public_c/. This code has been written mainly for explanation purposes, so do not expect too much performance.

What about Layer-1, Layer-2, Layer-3?

Q: Talking about MPEG audio, I always hear "Layer 1, 2 and 3". What does it mean?

A: MPEG describes the compression of audio signals using high performance perceptual coding schemes. It specifies a family of three audio coding schemes, simply called Layer-1, Layer-2, and Layer-3. From Layer-1 to Layer-3, encoder complexity and performance (sound quality per bitrate) are increasing.
The three codecs are compatible in a hierarchical way, i.e. a Layer-N decoder may be able to decode bitstream data encoded in Layer-N and all Layers below N (e.g., a Layer-3 decoder may accept Layer-1,-2,-3, whereas a Layer-2 decoder may accept only Layer-1 and -2.)
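The hierarchical rule above can be written down as a one-line check. This is only a minimal sketch of the rule as stated in this answer; the function name `can_decode` is made up for illustration and does not come from the standard or from any codec library.

```python
def can_decode(decoder_layer: int, stream_layer: int) -> bool:
    """Return True if a Layer-N decoder accepts a bitstream coded in stream_layer.

    Hypothetical helper: it only encodes the hierarchy described in the FAQ
    (a Layer-N decoder accepts Layer-N and all Layers below N).
    """
    for layer in (decoder_layer, stream_layer):
        if layer not in (1, 2, 3):
            raise ValueError(f"unknown MPEG audio Layer: {layer}")
    return stream_layer <= decoder_layer


# A Layer-3 decoder accepts Layer-1, -2 and -3 streams;
# a Layer-2 decoder rejects Layer-3 streams.
print([can_decode(3, s) for s in (1, 2, 3)])
print(can_decode(2, 3))
```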

Q: So we have a family of three audio coding schemes. What does the MPEG standard define, exactly?

A: For each Layer, the standard specifies the bitstream format and the decoder. To allow for future improvements, it does not specify the encoder, but an informative chapter gives an example for an encoder for each Layer.

Q: What do the three audio Layers have in common?

A: All Layers use the same basic structure. The coding scheme can be described as "perceptual noise shaping" or "perceptual subband / transform coding". The encoder analyzes the spectral components of the audio signal by calculating a filterbank (transform) and applies a psychoacoustic model to estimate the just noticeable noise-level. In its quantization and coding stage, the encoder tries to allocate the available number of data bits in a way to meet both the bitrate and masking requirements.
The decoder is much less complex. Its only task is to synthesize an audio signal out of the coded spectral components.
All Layers use the same analysis filterbank (polyphase with 32 subbands). Layer-3 adds an MDCT (modified discrete cosine transform) to increase the frequency resolution.
All Layers use the same "header information" in their bitstream, to support the hierarchical structure of the standard.
All Layers have a similar sensitivity to bit errors. They use a bitstream structure that contains parts that are more sensitive to bit errors ("header", "bit allocation", "scalefactors", "side information") and parts that are less sensitive ("data of spectral components").
All Layers support the insertion of program-associated information ("ancillary data") into their audio data bitstream.
All Layers may use 32, 44.1 or 48 kHz sampling frequency.
All Layers are allowed to work with similar bitrates:

Layer-1: from 32 kbps to 448 kbps
Layer-2: from 32 kbps to 384 kbps
Layer-3: from 32 kbps to 320 kbps

The last two statements refer to MPEG-1; with MPEG-2, there is an extension for the sampling frequencies and bitrates (see below).
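As a rough illustration, the MPEG-1 limits listed above can be collected into a small lookup. This is a sketch under stated assumptions: the names `MPEG1_SAMPLE_RATES_HZ`, `MPEG1_BITRATE_RANGE_KBPS` and `is_within_mpeg1_limits` are invented for this example, and the check only tests the ranges quoted in this FAQ, not the discrete list of bitrates that a real bitstream header can signal.

```python
# Values taken from the FAQ text above (MPEG-1 only).
MPEG1_SAMPLE_RATES_HZ = (32000, 44100, 48000)
MPEG1_BITRATE_RANGE_KBPS = {
    1: (32, 448),  # Layer-1
    2: (32, 384),  # Layer-2
    3: (32, 320),  # Layer-3
}


def is_within_mpeg1_limits(layer: int, sample_rate_hz: int, bitrate_kbps: int) -> bool:
    """Check a setting against the sampling frequencies and bitrate ranges quoted above."""
    if layer not in MPEG1_BITRATE_RANGE_KBPS:
        raise ValueError(f"unknown MPEG audio Layer: {layer}")
    low, high = MPEG1_BITRATE_RANGE_KBPS[layer]
    return sample_rate_hz in MPEG1_SAMPLE_RATES_HZ and low <= bitrate_kbps <= high


print(is_within_mpeg1_limits(3, 44100, 128))  # inside the Layer-3 range
print(is_within_mpeg1_limits(3, 44100, 384))  # above the Layer-3 maximum
print(is_within_mpeg1_limits(2, 22050, 64))   # 22.05 kHz is not an MPEG-1 rate
```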

Q: What are the main differences between the three Layers, from a global view?

A: From Layer-1 to Layer-3, complexity increases (mainly true for the encoder), overall codec delay increases, and performance increases (sound quality per bitrate).

Q: What are the main differences between MPEG-1 and MPEG-2 in the audio part?

A: MPEG-1 and MPEG-2 use the same family of audio codecs, Layer-1, -2 and -3. The new audio features of MPEG-2 are:

a "low sample rate extension" to address very low bitrate applications with limited bandwidth requirements (the new sampling frequencies are 16, 22.05 or 24 kHz, the bitrates extend down to 8 kbps)
a "multichannel extension" to address surround sound applications with up to 5 main audio channels (left, center, right, left surround, right surround) and optionally 1 extra "low frequency enhancement (LFE)" channel for subwoofer signals
a "multilingual extension" that allows the inclusion of up to 7 more audio channels

Q: Is this all compatible with each other?

A: Well, more or less, yes - with the exception of the low sample rate extension. Obviously, a pure MPEG-1 decoder is not able to handle the new half sample rates.

Q: You mean: compatible!? With all these extra audio channels? Please explain!

A: Compatibility has been a major topic during the MPEG-2 definition phase. The main idea is to use the same basic bitstream format as defined in MPEG-1, with the main data field carrying two audio signals (called L0 and R0) as before, and the ancillary data field carrying the multichannel extension information. Without going further into details, two terms should be explained here:

"forwards compatible": the MPEG-2 decoder has to accept any MPEG-1 audio bitstream (that represents one or two audio channels)
"backwards compatible": the MPEG-1 decoder should be able to decode the audio signals in the main data field (L0 and R0) of the MPEG-2 bitstream

"Matrixing" may be used to get the surround information into L0 and R0:

L0 = left signal + a * center signal + b * left surround signal
R0 = right signal + a * center signal + b * right surround signal

Therefore, an MPEG-1 decoder can reproduce a comprehensive downmix of the full 5-channel information. An MPEG-2 decoder uses the multichannel extension information (3 more audio signals) to reconstruct the five surround channels.
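The two matrixing equations can be tried out on a single sample. The following is a minimal sketch, not the encoder's actual procedure: the function name `matrix_downmix` is invented, and the default weights a = b = 1/sqrt(2) are an assumption chosen for illustration, since the FAQ does not give values for a and b.

```python
import math


def matrix_downmix(left, center, right, left_surround, right_surround,
                   a=1 / math.sqrt(2), b=1 / math.sqrt(2)):
    """Compute the compatible stereo pair (L0, R0) for one sample.

    The weights a and b are assumed values for illustration only.
    """
    l0 = left + a * center + b * left_surround
    r0 = right + a * center + b * right_surround
    return l0, r0


# One sample of a 5-channel signal, with explicit example weights.
l0, r0 = matrix_downmix(0.2, 0.1, -0.2, 0.0, 0.0, a=0.5, b=0.5)
print(l0, r0)
```

An MPEG-1 decoder would play back only this (L0, R0) pair, which is why the center and surround contributions have to be mixed in on the encoder side.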

Q: In your footnotes, you indicate the use of some "non-ISO" extension inside your Fraunhofer codec, called "MPEG 2.5", to further improve the performance at very low bitrates (e.g. 8 kbps mono). What do you mean by this?

A: Oh, yes. Well, the MPEG-2 standard allows bitrates as low as 8 kbps, for the low sample rate extension. At such a low bitrate, the useful audio bandwidth has to be limited anyway, e.g. to 3 kHz. Therefore, the actual sample rate could be reduced, e.g. to 8 kHz. The lower the sample rate, the better the frequency resolution, the worse the time resolution, and the better the ratio between control information and audio payload inside the bitstream format. As the MPEG-2 standard defines 16 kHz as lowest sample rate, we introduced a further extension, again dividing the low sample rates of MPEG-2 by 2, i.e. we introduced 8, 11.025, and 12 kHz - and we named this extension to the extension "MPEG 2.5". "Layer-3" performs significantly better with 8 kbps @ 8 kHz or 16 kbps @ 11 kHz than with 8 or 16 kbps @ 16 kHz.
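The relation between the three families of sampling frequencies is plain halving, which the following sketch reproduces. The names `halve` and `max_audio_bandwidth_hz` are invented for this example; the bandwidth function simply applies the Nyquist limit (half the sample rate) to show why an 8 kHz sample rate is still sufficient for a useful bandwidth of about 3 kHz.

```python
MPEG1_SAMPLE_RATES_HZ = (32000, 44100, 48000)


def halve(rates_hz):
    """Divide every sampling frequency by 2 (all values here are even)."""
    return tuple(rate // 2 for rate in rates_hz)


def max_audio_bandwidth_hz(sample_rate_hz):
    """Nyquist limit: the highest audio frequency a sample rate can represent."""
    return sample_rate_hz / 2


mpeg2_lsf_rates_hz = halve(MPEG1_SAMPLE_RATES_HZ)  # MPEG-2 low sample rate extension
mpeg25_rates_hz = halve(mpeg2_lsf_rates_hz)        # non-ISO "MPEG 2.5" extension

print(mpeg2_lsf_rates_hz)
print(mpeg25_rates_hz)
print(max_audio_bandwidth_hz(8000))
```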

Advanced Features of Layer-3 - or: Why does Layer-3 perform so well?

Q: Well, I read your statement about "CD-like" performance, achieved at a data reduction of 4:1 (or 384 kbps total bitrate) with Layer-1, 6..8:1 (or 256..192 kbps total bitrate) with Layer-2, and 12..14:1 (or 128..112 kbps total bitrate) with Layer-3. Can you explain a little further?

A: Well, each audio Layer extends the features of the Layer with the lower number. The simplest form is Layer-1. It has been designed mainly for the DCC (Digital Compact Cassette), where it is used at 384 kbps (called "PASC"). Layer-2 has been designed as a trade-off between complexity and performance. It achieves a good sound quality at bitrates down to 192 kbps. Below, sound quality suffers. Layer-3 has been designed for low bitrates right from the start. It adds a number of "advanced features" to Layer-2:

the frequency resolution is 18 times higher, which allows a Layer-3 encoder to adapt the quantisation noise much better to the masking threshold
only Layer-3 uses entropy coding (like MPEG video) to further reduce redundancy
only Layer-3 uses a bit reservoir (like MPEG video) to suppress artefacts in critical moments
and Layer-3 may use more advanced joint-stereo coding methods
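The data reduction ratios quoted in the question can be checked with a little arithmetic. The sketch below assumes an uncompressed reference of 16-bit stereo PCM at 48 kHz (1536 kbps); that reference is an inference from the numbers, not something this FAQ states, and the function names are invented for illustration. With that assumption the quoted pairs line up (384 kbps gives 4:1, 128 kbps gives 12:1); against a 44.1 kHz CD signal the ratios come out slightly lower.

```python
def pcm_bitrate_kbps(sample_rate_hz=48000, bits_per_sample=16, channels=2):
    """Bitrate of uncompressed PCM audio in kbps (assumed reference signal)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def reduction_ratio(coded_kbps, sample_rate_hz=48000):
    """Data reduction factor of a coded stream relative to the PCM reference."""
    return pcm_bitrate_kbps(sample_rate_hz) / coded_kbps


for layer, coded_kbps in (("Layer-1", 384), ("Layer-2", 256), ("Layer-2", 192),
                          ("Layer-3", 128), ("Layer-3", 112)):
    print(f"{layer} at {coded_kbps} kbps: {reduction_ratio(coded_kbps):.1f}:1")
```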

Q: I see. Now, tell me more about sound quality. How do you assess that?

A: Today, there is no alternative to expensive listening tests. During the ISO-MPEG process, a number of international listening tests have been performed, with a lot of trained listeners. All these tests used the "triple stimulus, hidden reference" method and the "CCIR impairment scale" to assess the sound quality. The listening sequence is "ABC", with A = original, and B and C = the original and the coded signal in random order. The listener has to evaluate both B and C with a number between 1.0 and 5.0. The meaning of these values is:

5.0 = transparent (this should be the original signal)
4.0 = perceptible, but not annoying (first differences noticeable)
3.0 = slightly annoying
2.0 = annoying
1.0 = very annoying
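To make the test procedure concrete, here is a small sketch of how one "ABC" trial could be set up and how a grade maps back onto the scale. It is an illustration only: the names `make_trial` and `nearest_anchor` are invented, and mapping a fractional grade to the nearest whole-number anchor is an assumption made for this example, not part of the test method itself.

```python
import random

# Anchor points of the impairment scale as listed above.
IMPAIRMENT_SCALE = {
    5: "transparent",
    4: "perceptible, but not annoying",
    3: "slightly annoying",
    2: "annoying",
    1: "very annoying",
}


def make_trial(rng):
    """Build one ABC trial: A is the original, B and C are original/coded in random order."""
    pair = ["original", "coded"]
    rng.shuffle(pair)
    return {"A": "original", "B": pair[0], "C": pair[1]}


def nearest_anchor(grade):
    """Describe a grade between 1.0 and 5.0 by its nearest scale anchor (assumed mapping)."""
    if not 1.0 <= grade <= 5.0:
        raise ValueError("grade must lie between 1.0 and 5.0")
    return IMPAIRMENT_SCALE[int(grade + 0.5)]


trial = make_trial(random.Random(0))
print(trial)
print(nearest_anchor(4.4))
```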

Q: Listening tests are certainly an expensive task. Is there really no alternative?

A: Well, at least not today. Tomorrow may be different. To assess sound quality with perceptual codecs, all traditional "quality" parameters (like signal-to-noise ratio, total harmonic distortion, bandwidth) are rather useless, as any codec may introduce noise and distortions as long as these do not affect the perceived sound quality. So, listening tests are necessary, and, if carefully prepared and performed, they lead to rather reliable results.
Nevertheless, Fraunhofer-IIS works on the development and standardisation of objective sound quality assessment tools, too. And there is already a first product available (contact OPTICOM), a real-time measurement tool that nicely supports the analysis of perceptual audio codecs. If you need more information about the Noise-to-Mask-Ratio (NMR) technology, see our NMR-Page or contact nmr@iis.fhg.de.

Q: O.K., back to these listening tests and the performance evaluation. Come on, tell me some results.

A: Well, for more details you should study one of these AES papers or the MPEG documents. For MPEG Layer-3, the main result is that its performance was always superior at low bitrates (64 kbps per audio channel or below). Well, this is not completely surprising, as MPEG Layer-3 uses the same tool set as Layer-2, but with some additional advanced coding features that all address the demands of very low bitrate coding. One impressive example is the ISO-MPEG listening test carried out in September 1994 at NTT Japan (doc. ISO/IEC JTC1/SC29/WG11 N0848, 11 Nov. 1994). Another interesting result is the conclusion of the task group TG 10/2 within the ITU-R, which recommends the use of low bit-rate audio coding schemes for digital sound-broadcasting applications (ITU-R doc. BS.1115).

Q: Very interesting! Tell me more about this recommendation!

A: The task group TG 10/2 finished its work in October 1993. The recommendation defines three fields of broadcast applications and recommends Layer-2 with 180 kbps per channel for distribution and contribution links (20 kHz bandwidth, no audible impairments with up to 5 cascaded codecs), Layer-2 with 128 kbps per channel for emission (20 kHz bandwidth), and MPEG Layer-3 with 60 (120) kbps for mono (stereo) signals for commentary links (15 kHz bandwidth).
