From Symbian Developer Community
Title page
Audio Telephony
SHAI Specification
v1.0
Copyright (c) 2005-2009 Nokia Corporation and/or its subsidiary(-ies). All rights reserved.
This specification and the accompanying materials are made available under the terms of the Eclipse Public License v1.0 which accompanies this distribution, and is available at http://www.eclipse.org/legal/epl-v10.html
Initial Contributors:
Nokia Corporation - initial contribution.
Contributors:
References
| [1] | OpenMAX™ Integration Layer, Application Programming Interface, Specification, Version 1.1.2 | http://www.khronos.org/files/openmax_il_spec_1_1_2.pdf |
| [2] | Speech Telephony Requirements Specification | |
| [3] | Audio Telephony Open Items | |
| [4] | ModemIP Audio Concept, WGmodem Documentation | |
| [5] | MSW-2.5-17: WGModem Audio Upper Interface Specification | |
| [6] | RTP: A Transport Protocol for Real-Time Applications | http://tools.ietf.org/html/rfc3550 |
| [7] | RTP Profile for Audio and Video Conferences with Minimal Control | http://tools.ietf.org/html/rfc3551 |
| [8] | Audio HW Control SHAI Specification | |
| [9] | Audio Transducer SHAI Specification | |
| [10] | Audio Render SHAI Specification | |
| [11] | Audio Codecs SHAI Specification | |
| [12] | 3GPP TS 26.226, Technical Specification Group Services and System Aspects; Cellular Text Telephone Modem; General Description | ftp://ftp.3gpp.org/Specs/latest/Rel-5/26_series/26226-500.zip |
| [13] | 3GPP TS 26.230, Technical Specification Group Services and System Aspects; Cellular text telephone modem; Transmitter bit exact C-code | ftp://ftp.3gpp.org/Specs/latest/Rel-5/26_series/26230-502.zip |
| [14] | 3GPP TS 26.231, Technical Specification Group Services and System Aspects; Cellular Text Telephone Modem; Minimum Performance Requirements | ftp://ftp.3gpp.org/Specs/latest/Rel-5/26_series/26231-521.zip |
| [15] | 3GPP TS 23.206, Voice Call Continuity (VCC) between Circuit Switched (CS) and IP Multimedia Subsystem (IMS) | http://www.3gpp.org/ftp/Specs/html-info/23206.htm |
| [16] | 3GPP TS 43.318, Technical Specification Group GSM/EDGE, Radio Access Network, Generic Access Network (GAN) | ftp://ftp.3gpp.org/Specs/archive/43_series/43.318/ |
| [17] | 3GPP TS 44.318, Technical Specification Group GSM/EDGE Radio Access Network, Generic Access Network (GAN), Mobile GAN interface layer 3 specification | ftp://ftp.3gpp.org/Specs/archive/44_series/44.318/ |
Abbreviations
| Abbreviation | Description |
|---|---|
| ABE | Artificial Bandwidth Expansion |
| ADC | Analog to Digital Converter |
| AEC | Acoustic Echo Control |
| ANC | Active Noise Control |
| CS | Circuit Switched |
| CTM | Cellular Text Telephone Modem |
| DAC | Digital to Analog Converter |
| dBFS | Decibel digital Full Scale |
| DRC | Dynamic Range Control |
| DTX | Discontinuous transmission |
| GAN | Generic Access Network |
| HF | Handsfree |
| HP | Hand Portable |
| IHF | Integrated Handsfree |
| PoC | Push to Talk over Cellular |
| PS | Packet Switched |
| RTP | Real-time Transport Protocol |
| SMP | Symmetric Multi Processing |
| SRC | Sample Rate Converter |
| TTY | TeleTYpe |
| UMA | Unlicensed Mobile Access |
| VAD | Voice Activity Detector |
| VCC | Voice Call Continuity |
| VoIP | Voice Over IP |
Purpose
The Audio Telephony API is used for controlling the SW components needed in speech calls. The client of the API is the Nokia audio adaptation.
List of required interfaces
Data and control interface to Modem IP in circuit switched speech calls; for details, see [5] MSW-2.5-17: WGModem Audio Upper Interface Specification.
Transducer processing components are specified in [9] Audio Transducer.
Mixer and Splitter components are specified in [10] Audio Render.
Codecs needed in VoIP calls are specified in [11] Audio Codecs.
Product Platform and/or Symbian OS version constraints
The speech processing component can be integrated into the MCU or the DSP. The algorithms can be supplier specific or Nokia specific.
The speech processing component provides control for sidetone attenuation. Sidetone howling control can also be implemented as a separate component or integrated into the source or the sink. The final location depends on how the sidetone is implemented: for example in HW, in the HW driver, or with a loop outside the sink and source components.
The location of the speech processing component depends on the cycles and memory available for processing and on the location of the interfacing components. Depending on the supported algorithms and on the delay of control from speech processing via the client to CS Sink – Source, a direct control path may be needed between components.
Technical specification
Type of interface
The Audio Telephony API is based on the OpenMAX IL specification, but no standard OpenMAX components can be used because of the special requirements of audio telephony. For details about OpenMAX, see [1] OpenMAX™ Integration Layer, Application Programming Interface, Specification, Version 1.1.2.
Interface Class Structure
Classes used for implementing Audio Telephony API:
Speech Processing
Audio Capturer Class (Standard OpenMAX IL class)
PCM Audio Capturer (Standard OpenMAX IL class)
RTP Audio Source
Audio Renderer Class (Standard OpenMAX IL class)
PCM Audio Renderer (Standard OpenMAX IL class)
RTP Audio Sink
CS Audio Sink – Source
Audio Processor Class (Standard OpenMAX IL class)
CTM Decoder
CTM Encoder
In addition, Audio Mixer, Audio Splitter, sinks, sources and different transducer processing components are needed for implementing different combined use cases. For more information about the components, see [9] Audio Transducer, [10] Audio Render and [11] Audio Codecs.
The Interface
The OpenMAX IL API is a component-based media API that consists of two main segments: the core API and the component API.
Core
The OpenMAX IL core is used for dynamically loading and unloading components and for facilitating component communication. Once loaded, the API allows the user to communicate directly with the component, which eliminates any overhead for high commands. Similarly, the core allows a user to establish a communication tunnel between two components. Once established, the core API is no longer used and communications flow directly between components.
Components
In the OpenMAX Integration Layer, components represent individual blocks of functionality. Components can be sources, sinks, codecs, filters, splitters, mixers, or any other data operator. Depending on the implementation, a component could possibly represent a piece of hardware, a software codec, another processor, or a combination thereof.
The individual parameters of a component can be set or retrieved through a set of associated data structures, enumerations, and interfaces. The parameters include data relevant to the component’s operation (i.e., codec options) or the actual execution state of the component.
Buffer status, errors, and other time-sensitive data are relayed to the application via a set of callback functions. These are set via the normal parameter facilities and allow the API to expose more of the asynchronous nature of system architectures.
Data communication to and from a component is conducted through interfaces called ports. Ports represent both the connection for components to the data stream and the buffers needed to maintain the connection. Users may send data to components through input ports or receive data through output ports. Similarly, a communication tunnel between two components can be established by connecting the output port of one component to a similarly formatted input port of another component.
For more details, see OpenMAX IL Introduction and Architecture, in [1] OpenMAX™ Integration Layer, Application Programming Interface, Specification, Version 1.1.2.
Usage
Controlled hardware resources
No HW resources can be controlled directly via the Audio Telephony API, but when components are created, the required amount of memory and cycles is reserved. In addition, the interface to the modem has to be initialized when the CS Sink – Source component is created and when the modem SW runs on a separate processor or HW.
Protocol
The OpenMAX IL API interfaces with a higher-level entity denoted as the IL client, which is typically a functional piece of a filter graph multimedia framework, OpenMAX AL, or an application. The IL client interacts with a centralized IL entity called the core. The IL client uses the OpenMAX IL core for loading and unloading components, setting up direct communication between two OpenMAX IL components, and accessing the component’s methods.
An IL client always communicates with a component via the IL core. In most cases, this communication equates to calling one of the IL core’s macros, which translates directly to a call on one of the component methods. The exceptions, where the IL client calls an actual core function that does work itself, are component creation and destruction, queries about installed components and the roles they support, and connection of two components via tunneling.
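The macro-to-method translation described above can be illustrated with a minimal sketch. The names `DemoComponent`, `DemoHandle` and `DEMO_SetConfig` are hypothetical stand-ins and not the declarations from the OpenMAX IL headers; they only mirror the pattern in which a core macro casts the component handle and calls a function pointer stored in the component structure.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real declarations live in the OpenMAX IL headers. */
typedef void *DemoHandle;
typedef int DemoError;              /* 0 means success in this sketch */

typedef struct DemoComponent {
    /* Component method reached through the handle. */
    DemoError (*SetConfig)(DemoHandle hComponent, unsigned index, void *pConfig);
    unsigned lastIndex;             /* last index written, kept for illustration */
} DemoComponent;

/* Core-style macro: expands directly to a call on the component method. */
#define DEMO_SetConfig(hComponent, index, pConfig) \
    (((DemoComponent *)(hComponent))->SetConfig((hComponent), (index), (pConfig)))

/* Example component method that only remembers which index was written. */
static DemoError demo_set_config(DemoHandle hComponent, unsigned index, void *pConfig)
{
    DemoComponent *self = (DemoComponent *)hComponent;
    (void)pConfig;
    self->lastIndex = index;
    return 0;
}

/* Builds a component whose method table points at the example method. */
static DemoComponent demo_make_component(void)
{
    DemoComponent c = { demo_set_config, 0u };
    return c;
}
```

A client would create the component once and then call `DEMO_SetConfig(&component, index, &config)`; no core function is involved in the call itself.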
Components embody the media processing function or functions. Component provider defines the functionality of a given component. Components operate on four types of data that are defined according to the parameter structures that they export: audio, video, image, and other (e.g., time data for synchronization).
An OpenMAX IL component provides access to a standard set of component functions via its component handle. These functions allow a client to get and set component and port configuration parameters, get and set the state of the component, send commands to the component, receive event notifications, allocate buffers, establish communications with a single component port, and establish communication between two component ports.
Every OpenMAX IL component shall have at least one port to claim OpenMAX IL conformance. Although a vendor may provide an OpenMAX IL-compatible component without ports, the bulk of conformance testing is dependent on at least one conformant port. The four types of ports defined in OpenMAX IL correspond to the types of data a port may transfer: audio, video, and image data ports, and other ports. Each port is defined as either an input or output depending on whether it consumes or produces buffers.
For more details, see OpenMAX IL Introduction and Architecture, in [1] OpenMAX™ Integration Layer, Application Programming Interface, Specification, Version 1.1.2.
Error handling
Six kinds of events are sent by a component to the IL client:
- Error events are enumerated and can occur at any time.
- Command complete notification events are triggered upon successful execution of a command.
- Marked buffer events are triggered upon detection of a marked buffer by a component.
- A port settings changed notification event is generated when the component changes its port settings.
- A buffer flag event is triggered when an end of stream is encountered.
- A resources acquired event is generated when a component gets resources that it has been waiting for.
Ports make buffer handling callbacks upon availability of a buffer or to indicate that a buffer is needed.
OMX_ERRORTYPE
The OMX_ERRORTYPE enumeration is defined in the OpenMAX IL specification and lists the standard OpenMAX IL errors that the functions defined in the OpenMAX IL API return. These errors should cover most of the common failure cases. However, vendors are free to add error messages of their own as long as they follow these rules:
- Vendor error messages shall be in the range of 0x90000000 to 0x9000FFFF.
- Vendor error messages shall be defined in a header file provided with the component. No error messages are allowed that are not defined.
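The range rule above can be checked mechanically. The following is a minimal sketch; the error code `DEMO_VENDOR_ERROR_AEC_DIVERGED` is a hypothetical example of what a component header might define and does not come from any real component.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds of the vendor error range quoted in the rules above. */
#define VENDOR_ERROR_FIRST UINT32_C(0x90000000)
#define VENDOR_ERROR_LAST  UINT32_C(0x9000FFFF)

/* Hypothetical vendor error code, as it might appear in a component header. */
#define DEMO_VENDOR_ERROR_AEC_DIVERGED UINT32_C(0x90000001)

/* Returns true when the code lies inside the vendor error range. */
static bool is_vendor_error(uint32_t code)
{
    return code >= VENDOR_ERROR_FIRST && code <= VENDOR_ERROR_LAST;
}
```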
OMX_EventError
A component generates the OMX_EventError event when the component detects an error condition; the type of error detected is returned as an event parameter and will use values defined in OMX_ERRORTYPE. A component shall send the following errors via OMX_EventError:
- A component sends the OMX_ErrorInvalidState error if the component transitions to the OMX_StateInvalid state.
- A component sends the OMX_ErrorResourcesPreempted error if the component transitions from OMX_StateExecuting or OMX_StatePause to OMX_StateIdle due to the loss of a resource.
- A component sends the OMX_ErrorResourcesLost error if the component transitions from OMX_StateIdle to OMX_StateLoaded due to the loss of a resource.
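A client-side handler for these three errors could be sketched as follows. The enumerations are hypothetical stand-ins with placeholder numeric values (the real ones come from the OpenMAX IL headers), and the client reactions chosen here are assumptions made for illustration; the specification only defines which error is sent for which state transition.

```c
/* Hypothetical stand-ins; numeric values are placeholders, not the OpenMAX IL ones. */
typedef enum { DEMO_EventCmdComplete, DEMO_EventError } DemoEvent;

enum {
    DEMO_ErrorInvalidState = 1,
    DEMO_ErrorResourcesPreempted,
    DEMO_ErrorResourcesLost
};

/* Client reactions assumed for this sketch. */
typedef enum {
    ACTION_NONE,            /* nothing to do */
    ACTION_FREE_COMPONENT,  /* component entered the invalid state */
    ACTION_WAIT_IDLE,       /* component fell back to idle */
    ACTION_RELOAD           /* component fell back to loaded */
} ClientAction;

/* Maps an event and its first parameter (the error code) to a reaction. */
static ClientAction on_event(DemoEvent event, unsigned data1)
{
    if (event != DEMO_EventError)
        return ACTION_NONE;

    switch (data1) {
    case DEMO_ErrorInvalidState:       return ACTION_FREE_COMPONENT;
    case DEMO_ErrorResourcesPreempted: return ACTION_WAIT_IDLE;
    case DEMO_ErrorResourcesLost:      return ACTION_RELOAD;
    default:                           return ACTION_NONE;
    }
}
```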
For more details, see OpenMAX IL Introduction and Architecture, in [1] OpenMAX™ Integration Layer, Application Programming Interface, Specification, Version 1.1.2.
Memory overhead
The memory overhead is minimal compared with other APIs of the same kind.
Extensions to the API
An OpenMAX IL component may support any setting defined in the OpenMAX IL specification. Vendors can add to the list of parameters and configurations not included in the standard header files. These additions are referred to as extensions.
Any extensions approved by Khronos are considered OpenMAX IL extensions. Any extensions not approved by Khronos are vendor-defined extensions.
For speech processing component, the algorithm specific parameters will be added as vendor-defined extensions.
For more information about the usage of OpenMAX IL standard extensions, see Chapter 5, OpenMAX IL Component Extension APIs, in [1] OpenMAX™ Integration Layer, Application Programming Interface, Specification, Version 1.1.2.
Variation and configurability
No compile-time variation is needed in the API. All configuration is done at run time.
Testing and tuning
Proprietary algorithm tuning parameters are specified as OpenMAX IL extensions for each algorithm of the speech processing component and are not shown in this document. Algorithm-specific information needed in tuning and testing, such as internal algorithm variables (e.g. AEC adaptive filter coefficients), is also specified as OpenMAX IL extensions. Because information about internal variables is typically not needed in the client, it can also be delivered by other means, e.g. via P&S keys in the Symbian environment.
Control between components
Some control information needs to be delivered between components. Typically this information is not needed by the client and has to be delivered with a very short delay. The proposed way is to make a direct SetConfig call from one component to another. If the components do not use tunneled communication and therefore do not have handles to each other, the client needs to deliver the component handles to them.
Control data ports have also been specified for control between components, but they should be used only if SetConfig cannot be used for some reason.
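The direct control path can be sketched as follows. All names (`DemoComp`, `DEMO_IndexConfigSidetoneAttenuation`, `client_connect_control`) are hypothetical and chosen for this illustration; the sketch assumes the client hands the target handle to the controlling component once, after which the controlling component calls the target's SetConfig without involving the client.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component model with a SetConfig method and a peer handle. */
typedef struct DemoComp DemoComp;
struct DemoComp {
    int (*SetConfig)(DemoComp *self, unsigned index, const void *cfg);
    DemoComp *peer;             /* handle of the component to control directly */
    int sidetoneAttenuation;    /* last value received, kept for illustration */
};

enum { DEMO_IndexConfigSidetoneAttenuation = 1 };  /* hypothetical index */

/* SetConfig of the controlled component (for example a sink). */
static int sink_set_config(DemoComp *self, unsigned index, const void *cfg)
{
    if (index != DEMO_IndexConfigSidetoneAttenuation || cfg == NULL)
        return -1;
    self->sidetoneAttenuation = *(const int *)cfg;
    return 0;
}

/* Client step: deliver the target's handle to the controlling component. */
static void client_connect_control(DemoComp *controller, DemoComp *target)
{
    controller->peer = target;
}

/* Component step: push a new value straight to the peer, bypassing the client. */
static int controller_push_attenuation(DemoComp *controller, int attenuation)
{
    assert(controller != NULL);
    if (controller->peer == NULL)
        return -1;              /* no handle delivered yet */
    return controller->peer->SetConfig(controller->peer,
                                       DEMO_IndexConfigSidetoneAttenuation,
                                       &attenuation);
}
```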
Non-functional requirements
All the requirements for audio telephony are described in [2] Speech Telephony Requirements Specification.
Re-entrancy
All specified components have to be re-entrant, so that the implementation does not prevent several instances of them from being created.
SMP safety
All specified components have to be SMP (Symmetric Multi Processing) safe. Components that process more than one independent audio stream, such as the uplink and downlink streams of a speech call, have to be implemented so that the different streams can be processed in different OS threads.
Callbacks
A callback is needed for all parameters that can change without client action. The OpenMAX IL standard specifies a callback mechanism for indicating events that occur without client action, for example a speech encoder configuration change in the CS Sink – Source component.
Detailed description
This section describes the structures, parameters and configuration details for ports in audio telephony. These parameters and configuration details are specified in the OMX_Audio_Telephony.h header.
Components
PCM Audio Mixer
Audio mixer mixes two or more audio streams into one stream. For audio telephony, the real-time behaviour of the uplink stream has to be ensured, so that the other streams being mixed never delay the uplink audio stream. The mixer component used in telephony is specified in [10] Audio Render.
PCM Audio Splitter
Audio splitter splits one audio stream into two or more streams. The splitter component used in telephony is specified in [10] Audio Render.
CS Audio Sink – Source
CS Audio Sink – Source receives the CS audio call uplink stream and sends the CS audio call downlink stream and timing control. It also handles speech encoding and decoding and the interface to the wireless modem. For details about the modem audio interface, see [5] MSW-2.5-17: WGModem Audio Upper Interface Specification.
RTP Audio Sink
RTP Audio Sink receives the uplink audio stream from the speech encoder in a packet switched audio call and handles the interface to the component handling RTP.
RTP Audio Source
RTP Audio Source handles the interface to the component handling RTP and sends the downlink audio stream to the speech decoder in a packet switched audio call.
Speech Processing
The speech processing component receives the microphone signal and forwards it to the uplink. It also receives the downlink signal and forwards it to the downlink signal path. Speech processing enhances the subjective quality of the signal with processing optimized for speech. The component contains the following functionalities:
- Acoustic Echo Control
- Artificial Bandwidth Expansion
- Automatic Gain Control
- Automatic Volume Control
- Background Noise Control
- Comfort Noise Generation
- Downlink Noise Control
- Sidetone Howling Control
- Wind Noise Control
- Transducer Dynamic Range Control
- Uplink Dynamic Range Control
- Uplink Noise Control
- Voice Clarity
CTM Encoder
The CTM encoder modifies the signal produced by a TTY device so that the information can be transmitted through digital speech coding and a transmission channel that can introduce errors into the signal.
For more information about the CTM encoder algorithm, see [12] 3GPP TS 26.226, Technical Specification Group Services and System Aspects; Cellular Text Telephone Modem; General Description, [13] 3GPP TS 26.230, Technical Specification Group Services and System Aspects; Cellular text telephone modem; Transmitter bit exact C-code and [14] 3GPP TS 26.231, Technical Specification Group Services and System Aspects; Cellular Text Telephone Modem; Minimum Performance Requirements.
CTM Decoder
The CTM decoder decodes the signal encoded by the CTM encoder at the transmitting end, so that it can be decoded by a TTY device.
For more information about the CTM decoder algorithm, see [12] 3GPP TS 26.226, Technical Specification Group Services and System Aspects; Cellular Text Telephone Modem; General Description, [13] 3GPP TS 26.230, Technical Specification Group Services and System Aspects; Cellular text telephone modem; Transmitter bit exact C-code and [14] 3GPP TS 26.231, Technical Specification Group Services and System Aspects; Cellular Text Telephone Modem; Minimum Performance Requirements.
RTP Audio Sink class
| Name | Audio_sink.rtp_sink |
| Description | Delivers the uplink stream to network. |
| Ports | Index | Domain | Direction | Description |
| | APB+0 | Audio | Input | Accepts audio stream. |
| Port Index | APB+0 |
| Description | Accepts audio stream. |
| Required Parameters/Configs | Index | Access | Description |
| | OMX_IndexParamPortDefinition | r/w | Specify/query the audio port settings. eEncoding = OMX_AUDIO_CodingPCM, OMX_AUDIO_CodingAMR, OMX_AUDIO_CodingG711, OMX_AUDIO_CodingG729, OMX_AUDIO_CodingILBC |
| | OMX_IndexParamAudioPortFormat | r/w | eEncoding = OMX_AUDIO_CodingPCM, OMX_AUDIO_CodingAMR, OMX_AUDIO_CodingG711, OMX_AUDIO_CodingG729, OMX_AUDIO_CodingILBC |
RTP Audio Source class
| Name | audio_source.rtp_source |
| Description | Receives the downlink stream from network. |
| Ports | Index | Domain | Direction | Description |
| | APB+0 | Audio | Output | Emits audio stream. |
| Port Index | APB+0 |
| Description | Emits audio stream. |
| Required Parameters/Configs | Index | Access | Description |
| | OMX_IndexParamPortDefinition | r/w | Specify/query the audio port settings. eEncoding = OMX_AUDIO_CodingPCM, OMX_AUDIO_CodingAMR, OMX_AUDIO_CodingG711, OMX_AUDIO_CodingG729, OMX_AUDIO_CodingILBC |
| | OMX_IndexParamAudioPortFormat | r/w | eEncoding = OMX_AUDIO_CodingPCM, OMX_AUDIO_CodingAMR, OMX_AUDIO_CodingG711, OMX_AUDIO_CodingG729, OMX_AUDIO_CodingILBC |
CS Audio Sink – Source class
| Name | Cs_sink_source |
| Description | Sends the CS downlink speech data stream and the uplink data timing, and receives the CS uplink speech data stream. Handles the speech data and control interface towards the wireless modem and executes speech codecs, VAD and DTX functionality. |
| Ports | Index | Domain | Direction | Description |
| | APB+0 | Audio | Output | Emits speech data stream. |
| | APB+1 | Audio | Input | Accepts speech data stream. |
| | OPB+0 | Other/time | Output | Emits the uplink timing information. |
| | OPB+1 | Other/time | Output | Emits the RF TX power information. |
| Port Index | APB+0 |
| Description | Emits speech data stream. |
| Required Parameters/Configs | Index | Access | Description |
| | OMX_IndexParamPortDefinition | r/w | Specify/query the audio port settings. eEncoding = OMX_AUDIO_CodingPCM |
| | OMX_IndexParamAudioPortFormat | r/w | eEncoding = OMX_AUDIO_CodingPCM |
| | OMX_IndexParamAudioPcm | r/w | Specify/query the sampling rate and number of channels. nChannels = 1 (Mono); eNumData = OMX_NumericalDataSigned; eEndian = « Native »; bInterleaved = OMX_TRUE; nBitPerSample = 16; nSamplingRate = 8000, 16000; ePCMMode = OMX_AUDIO_PCMModeLinear |
| | OMX_IndexParamAudioDecoderStatus | r | Information about speech decoder configuration and status. |
| Port Index | APB+1 |
| Description | Accepts audio stream. |
| Required Parameters/Configs | Index | Access | Description |
| | OMX_IndexParamPortDefinition | r/w | Specify/query the audio port settings. eEncoding = OMX_AUDIO_CodingPCM |
| | OMX_IndexParamAudioPortFormat | r/w | eEncoding = OMX_AUDIO_CodingPCM |
| | OMX_IndexParamAudioPcm | r/w | Specify/query the sampling rate and number of channels. nChannels = 1 (Mono); eNumData = OMX_NumericalDataSigned; eEndian = « Native »; bInterleaved = OMX_TRUE; nBitPerSample = 16; nSamplingRate = 8000, 16000; ePCMMode = OMX_AUDIO_PCMModeLinear |
| | OMX_IndexParamAudioEncoderStatus | r | Information about speech encoder configuration and status. |
| | OMX_IndexParamAudioUplinkTiming | r | Information about uplink speech timing. |
| | OMX_IndexParamAudioRfTxPower | r | Information about power used by RF amplifier. |
| Port Index | OPB+0 |
| Description | Provides the uplink timing information. See 3.4.4.3 Uplink timing. |
| Port Index | OPB+1 |
| Description | Provides the RF-TX power information. See 3.4.4.4 RF-TX Power. |
Speech decoder status
Speech decoder status and configuration information.
| typedef struct OMX_AUDIO_DECODER_STATUS {
    OMX_U32 nSize;
    OMX_VERSIONTYPE nVersion;
    OMX_AUDIO_CODINGTYPE nCoding;
    OMX_U32 nBitrate;
} OMX_AUDIO_DECODER_STATUS; |
The parameters for OMX_AUDIO_DECODER_STATUS are:
- nCoding tells the speech coding in use.
- nBitrate tells the bit rate in use.
Speech encoder status
Speech encoder status and configuration information.
| typedef struct OMX_AUDIO_ENCODER_STATUS {
    OMX_U32 nSize;
    OMX_VERSIONTYPE nVersion;
    OMX_AUDIO_CODINGTYPE nCoding;
    OMX_U32 nBitrate;
    OMX_BOOL bDtx;
    OMX_BOOL bAudioActivityControl;
    OMX_BOOL bNsync;
} OMX_AUDIO_ENCODER_STATUS; |
The parameters for OMX_AUDIO_ENCODER_STATUS are:
- nCoding tells the speech coding in use.
- nBitrate tells the bit rate in use.
- bDtx tells the state of DTX.
- bAudioActivityControl tells whether the uplink noise suppressor may be used.
- bNsync defines whether the NSYNC synchronization procedure should be used within 2G AMR calls to temporarily disable DTX after handover.
Uplink timing
Uplink timing information tells when the next uplink speech frame has to be in the modem.
| typedef struct OMX_AUDIO_UPLINK_TIMING {
    OMX_U32 nSize;
    OMX_VERSIONTYPE nVersion;
    OMX_U32 nDeliveryTime;
    OMX_U32 nModemProcessingTime;
} OMX_AUDIO_UPLINK_TIMING; |
The parameters for OMX_AUDIO_UPLINK_TIMING are:
- nDeliveryTime tells the timing of encoded uplink speech data transmitted from the audio subsystem to the interface of the modem. The delivery time is specified in microseconds and tells when the encoded speech frame has to be in the modem, so processing delays and other delays have to be taken into account.
- nModemProcessingTime tells how much time, in microseconds, it takes for the modem to process and transmit the data. This value is for information only.
For more details, see [5] MSW-2.5-17: WGModem Audio Upper Interface Specification.
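How the delays are taken into account can be sketched with simple arithmetic. The helper below is a hypothetical illustration: `encodeDelayUs` and `transferDelayUs` are assumed names for the audio-side delays, and the result is expressed in the same time base as nDeliveryTime, whatever that time base is in the modem interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes the latest point, in the same time base as nDeliveryTime, at which
 * encoding may start so that the frame still reaches the modem on time.
 * Returns false when the combined delays exceed the delivery time. */
static bool latest_encode_start_us(uint32_t nDeliveryTime,
                                   uint32_t encodeDelayUs,
                                   uint32_t transferDelayUs,
                                   uint32_t *startUs)
{
    /* Sum in 64 bits so that two large delays cannot wrap around. */
    uint64_t total = (uint64_t)encodeDelayUs + transferDelayUs;

    if (total > nDeliveryTime)
        return false;

    *startUs = nDeliveryTime - (uint32_t)total;
    return true;
}
```

For example, with a delivery time of 20000 µs, an encoding delay of 3000 µs and a transfer delay of 500 µs, encoding has to start no later than 16500 µs.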
RF-TX Power
RF-TX power information can be used for attenuating the audio signal quickly in order to prevent device shutdown when the total current drawn from the battery is high.
| typedef struct OMX_AUDIO_RF_TX_POWER {
    OMX_U32 nSize;
    OMX_VERSIONTYPE nVersion;
    OMX_U32 nPower;
    OMX_U32 nTime;
} OMX_AUDIO_RF_TX_POWER; |
The parameters for OMX_AUDIO_RF_TX_POWER are:
- nPower tells the power used by the RF TX power amplifier in milliwatts.
- nTime tells after what time, in microseconds, the power is taken into use in the modem.
Speech Processing class
Speech processing class shall not do sample rate conversions. When the sampling rate of the input port is changed, the output port sampling rate shall automatically change to the same value.
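The rate-following rule can be sketched as follows. `DemoSpeechPorts` and `set_input_rate` are hypothetical names for this illustration; the accepted rates of 8000 and 16000 Hz are taken from the nSamplingRate values listed for the speech processing ports.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of an input port and the output port that follows it. */
typedef struct {
    uint32_t inputRateHz;
    uint32_t outputRateHz;
} DemoSpeechPorts;

/* Sets the input sampling rate and mirrors it on the output port, because the
 * component performs no sample rate conversion. Rejects unsupported rates. */
static bool set_input_rate(DemoSpeechPorts *ports, uint32_t rateHz)
{
    if (rateHz != 8000u && rateHz != 16000u)
        return false;

    ports->inputRateHz = rateHz;
    ports->outputRateHz = rateHz;
    return true;
}
```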
| Name | speech_processing |
| Description | Adds speech processing to PCM audio stream. |
| Ports | Index | Domain | Direction | Description |
| | APB+0 | Audio | Input | Accepts uplink speech. |
| | APB+1 | Audio | Input | Accepts downlink speech. |
| | APB+2 | Audio | Input | Accepts reference signal for AEC. |
| | APB+3 | Audio | Output | Emits uplink speech. |
| | APB+4 | Audio | Output | Emits downlink speech. |
| | OPB+0 | Other/Control | Output | Emits control for sidetone. |
| Port Index | APB+0 |
| Description | Accepts uplink speech. |
| Required Parameters/Configs | Index | Access | Description |
| | OMX_IndexParamAudioPcm | r/w | Specify/query the sampling rate and number of channels. nChannels = 4 (four channels), 3 (three channels), 2 (two channels), 1 (one channel) |
| Port Index | APB+1 |
| Description | Accepts downlink speech. |
| Required Parameters/Configs | Index | Access | Description |
| | OMX_IndexParamAudioPcm | r/w | nChannels = 1 (one channel); nBitPerSample = 16; nSamplingRate = 8000, 16000 |
| | OMX_IndexParamAudioDownlinkSpeechInfo | r | Information about downlink speech frame. |
| Port Index | APB+2 |
| Description | Accepts loudspeaker signal. |
| Required Parameters/Configs | Index | Access | Description |
| | OMX_IndexParamAudioPcm | r/w | nChannels = 1 (one channel); nBitPerSample = 16; nSamplingRate = 8000, 16000 |
| Port Index | APB+3 |
| Description | Emits uplink speech. |
| Required Parameters/Configs | Index | Access | Description |
| | OMX_IndexParamAudioPcm | r/w | nChannels = 1 (one channel); nBitPerSample = 16; nSamplingRate = 8000, 16000 |
| | OMX_IndexConfigUplinkAlgorithms | r/w | bEnable = False, True |
| | OMX_IndexConfigAcousticEchoControl | r/w | bEnable = False, True |
| | OMX_IndexConfigAutomaticGainControl | r/w | bEnable = False, True |
| | OMX_IndexConfigBackgroundNoiseControl | r/w | bEnable = False, True |
| | OMX_IndexConfigMultiMicrophoneNoiseControl | r/w | bEnable = False, True |
| | OMX_IndexConfigSidetoneHowlingControl | r/w | bEnable = False, True |
| | OMX_IndexConfigUplinkDynamicRangeControl | r/w | bEnable = False, True |
| | OMX_IndexConfigWindNoiseControl | r/w | bEnable = False, True |
| | OMX_IndexParamAudioDaAdTimingDifference | r | Timing difference between uplink and downlink frames. |
| Port Index | APB+4 |
| Description | Emits downlink speech. |
| Required Parameters/Configs | Index | Access | Description |
| | OMX_IndexParamAudioPcm | r/w | Specify/query the sampling rate and number of channels. |
| | OMX_IndexConfigDownlinkAlgorithms | r/w | bEnable = False, True |
| | OMX_IndexConfigAutomaticVolumeControl | r/w | bEnable = False, True |
| | OMX_IndexConfigArtificialBandwidthExpansion | r/w | bEnable = False, True |
| | OMX_IndexConfigComfortNoiseGeneration | r/w | bEnable = False, True |
| | OMX_IndexConfigDownlinkNoiseControl | r/w | bEnable = False, True |
| | OMX_IndexConfigTransducerDynamicRangeControl | r/w | bEnable = False, True |
| | OMX_IndexConfigVoiceClarity | r/w | bEnable = False, True |
| | OMX_IndexParamDlSpeechInfoType | r | bComfortNoiseFrame = False, True; bCorruptedFrame = False, True |
Acoustic Echo Control
Acoustic echo control attenuates the echo that loops from the loudspeaker or earpiece into the microphone signal.
| typedef struct OMX_AUDIO_CONFIG_ACOUSTIC_ECHO_CONTROL {
    OMX_U32 nSize;
    OMX_VERSIONTYPE nVersion;
    OMX_U32 nPortIndex;
    OMX_BOOL bEnable;
    OMX_BS32 sEchoGain;
    OMX_BU32 nEchoMinDelay;
    OMX_BU32 nEchoMaxDelay;
} OMX_AUDIO_CONFIG_ACOUSTIC_ECHO_CONTROL; |
The parameters for OMX_AUDIO_CONFIG_ACOUSTIC_ECHO_CONTROL are:
- bEnable enables the acoustic echo control if set to OMX_TRUE or disables the acoustic echo control if set to OMX_FALSE.
- sEchoGain defines the gain in the echo path. The value typically changes when an accessory is connected or disconnected or when the volume is adjusted.
- nEchoMinDelay defines the minimum delay for echo from sink to source.
- nEchoMaxDelay defines the maximum echo time that has to be handled by AEC.
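Filling in such a configuration structure before passing it to the component can be sketched as follows. The types are simplified, hypothetical stand-ins (`DemoBS32`, `DemoBU32` and `DemoAecConfig` mirror the layout above but are not the OpenMAX IL declarations, and nVersion is reduced to a plain integer). The consistency check that the minimum delay does not exceed the maximum delay is an assumption of this sketch; the units of the gain and delay fields are not stated here and are left open.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for bounded value types: a value plus its range. */
typedef struct { int32_t nValue, nMin, nMax; } DemoBS32;
typedef struct { uint32_t nValue, nMin, nMax; } DemoBU32;

/* Hypothetical mirror of the acoustic echo control configuration. */
typedef struct {
    uint32_t nSize;
    uint32_t nVersion;       /* simplified; a version structure in the real API */
    uint32_t nPortIndex;
    bool     bEnable;
    DemoBS32 sEchoGain;
    DemoBU32 nEchoMinDelay;
    DemoBU32 nEchoMaxDelay;
} DemoAecConfig;

/* Clears the structure and fills in the size and the target port. */
static void aec_config_init(DemoAecConfig *cfg, uint32_t portIndex)
{
    memset(cfg, 0, sizeof *cfg);
    cfg->nSize = (uint32_t)sizeof *cfg;
    cfg->nPortIndex = portIndex;
}

/* Checks the size field and that the delay window is not inverted. */
static bool aec_config_is_consistent(const DemoAecConfig *cfg)
{
    return cfg->nSize == sizeof *cfg
        && cfg->nEchoMinDelay.nValue <= cfg->nEchoMaxDelay.nValue;
}
```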
Artificial Bandwidth Expansion
Artificial bandwidth expansion adds high-frequency content to a signal that was originally sampled at 8 kHz.
| typedef struct OMX_AUDIO_ARTIFICIAL_BANDWIDTH_EXPANSION {
    OMX_U32 nSize;
    OMX_VERSIONTYPE nVersion;
    OMX_U32 nPortIndex;
    OMX_BOOL bEnable;
    OMX_BS32 nStrength;
} OMX_AUDIO_ARTIFICIAL_BANDWIDTH_EXPANSION; |
The parameters for OMX_AUDIO_ARTIFICIAL_BANDWIDTH_EXPANSION are:
- bEnable enables the artificial bandwidth expansion if set to OMX_TRUE or disables the artificial bandwidth expansion if set to OMX_FALSE.
- nStrength defines the strength of the effect. The value range can be read.
Automatic Volume Control
Automatic volume control adjusts the output signal level so that it is not necessary to adjust the volume in different background noise environments.
| typedef struct OMX_AUDIO_CONFIG_AUTOMATIC_VOLUME_CONTROL {
    OMX_U32 nSize;
    OMX_VERSIONTYPE nVersion;
    OMX_U32 nPortIndex;
    OMX_BOOL bEnable;
} OMX_AUDIO_CONFIG_AUTOMATIC_VOLUME_CONTROL; |
The parameters for OMX_AUDIO_CONFIG_AUTOMATIC_VOLUME_CONTROL are:
- bEnable enables the automatic volume control if set to OMX_TRUE or disables the automatic volume control if set to OMX_FALSE.
Background Noise Control
Background noise control increases the signal-to-noise ratio of the speech signal, especially in a noisy environment.
| typedef struct OMX_AUDIO_CONFIG_BACKGROUND_NOISE_CONTROL {
    OMX_U32 nSize;
    OMX_VERSIONTYPE nVersion;
    OMX_U32 nPortIndex;
    OMX_BOOL bEnable;
    OMX_BU32 nStrength;
} OMX_AUDIO_CONFIG_BACKGROUND_NOISE_CONTROL; |
The parameters for OMX_AUDIO_CONFIG_BACKGROUND_NOISE_CONTROL are:
- bEnable enables the background noise control if set to OMX_TRUE or disables the background noise control if set to OMX_FALSE.
- nStrength defines how much the background noise control suppresses the noise, in mB.
Comfort Noise Generation
Comfort noise generation hides short muted periods in the downlink caused by a bad network connection.
| typedef struct OMX_AUDIO_CONFIG_COMFORT_NOISE_GENERATION {
    OMX_U32 nSize;
    OMX_VERSIONTYPE nVersion;
    OMX_U32 nPortIndex;
    OMX_BOOL bEnable;
} OMX_AUDIO_CONFIG_COMFORT_NOISE_GENERATION; |
The parameters for OMX_AUDIO_CONFIG_COMFORT_NOISE_GENERATION are:
- bEnable enables the comfort noise generation if set to OMX_TRUE or disables the comfort noise generation if set to OMX_FALSE.
Downlink Noise Control
Background noise control increase the signal to noise ration of speech signal especially if the phone in far end does not support noise control for microphone signal and it is in noisy environment.
| typedef struct OMX_AUDIO_CONFIG_DOWNLINK_NOISE_CONTROL {
OMX_U32 nSize; OMX_VERSIONTYPE nVersion; OMX_U32 nPortIndex; OMX_BOOL bEnable; OMX_BU32 nStrength; } OMX_AUDIO_CONFIG_DOWNLINK_NOISE_CONTROL; |
The parameters for OMX_AUDIO_CONFIG_DOWNLINK_NOISE_CONTROL are:
- bEnable enables the downlink noise control if set to OMX_TRUE or disables the downlink noise control if set to OMX_FALSE.
- nStrength defines how much the downlink noise control suppresses the noise, in mB (millibels).
Multimicrophone Noise Control
Multimicrophone noise control attenuates noise in the signal by using input from more than one microphone.
| typedef struct OMX_AUDIO_CONFIG_MULTIMICROPHONE_NOISE_CONTROL {
OMX_U32 nSize; OMX_VERSIONTYPE nVersion; OMX_U32 nPortIndex; OMX_BOOL bEnable; } OMX_AUDIO_CONFIG_MULTIMICROPHONE_NOISE_CONTROL; |
The parameters for OMX_AUDIO_CONFIG_MULTIMICROPHONE_NOISE_CONTROL are:
- bEnable enables the multimicrophone noise control if set to OMX_TRUE or disables the multimicrophone noise control if set to OMX_FALSE.
Sidetone Howling Control
Sidetone is the signal looped back from the microphone and mixed into the downlink signal fed to the earpiece. Depending on the sidetone gain settings, device acoustics and environment, the looped signal can start to circulate and cause howling. To prevent howling, separate howling control may be needed.
| typedef struct OMX_AUDIO_CONFIG_SIDETONE_HOWLING_CONTROL {
OMX_U32 nSize; OMX_VERSIONTYPE nVersion; OMX_U32 nPortIndex; OMX_BOOL bEnable; OMX_S32 nSidetoneAttenuation; } OMX_AUDIO_CONFIG_SIDETONE_HOWLING_CONTROL; |
The parameters for OMX_AUDIO_CONFIG_SIDETONE_HOWLING_CONTROL are:
- bEnable enables the sidetone howling control if set to OMX_TRUE or disables the sidetone howling control if set to OMX_FALSE.
- nSidetoneAttenuation is a read-only parameter that reports the sidetone attenuation needed. If the sidetone attenuation is not applied via a direct route between components, the change has to go via the client, and the client has to be informed of the change in the sidetone attenuation with a callback.
Transducer Dynamic Range Control
Transducer dynamic range control prevents the distortion caused by a signal that the loudspeaker cannot reproduce properly.
| typedef struct OMX_AUDIO_CONFIG_TRANSDUCER_DYNAMIC_RANGE_CONTROL {
OMX_U32 nSize; OMX_VERSIONTYPE nVersion; OMX_U32 nPortIndex; OMX_BOOL bEnable; } OMX_AUDIO_CONFIG_TRANSDUCER_DYNAMIC_RANGE_CONTROL; |
The parameters for OMX_AUDIO_CONFIG_TRANSDUCER_DYNAMIC_RANGE_CONTROL are:
- bEnable enables the transducer dynamic range control if set to OMX_TRUE or disables the transducer dynamic range control if set to OMX_FALSE.
Uplink Dynamic Range Control
Uplink dynamic range control normalizes the uplink speech signal level and prevents severe signal distortion caused by saturation.
| typedef struct OMX_AUDIO_CONFIG_UPLINK_LEVEL_NORMALIZATION {
OMX_U32 nSize; OMX_VERSIONTYPE nVersion; OMX_U32 nPortIndex; OMX_BOOL bEnable; } OMX_AUDIO_CONFIG_UPLINK_LEVEL_NORMALIZATION; |
The parameters for OMX_AUDIO_CONFIG_UPLINK_LEVEL_NORMALIZATION are:
- bEnable enables the uplink dynamic range control if set to OMX_TRUE or disables the uplink dynamic range control if set to OMX_FALSE.
Voice Clarity
The voice clarity feature changes the downlink signal level and frequency content so that the signal is easier to hear in different kinds of environments. The feature is especially useful if the user has hearing loss caused, for example, by aging.
| typedef struct OMX_AUDIO_CONFIG_VOICE_CLARITY {
OMX_U32 nSize; OMX_VERSIONTYPE nVersion; OMX_U32 nPortIndex; OMX_BOOL bEnable; } OMX_AUDIO_CONFIG_VOICE_CLARITY; |
The parameters for OMX_AUDIO_CONFIG_VOICE_CLARITY are:
- bEnable enables the voice clarity feature if set to OMX_TRUE or disables the voice clarity feature if set to OMX_FALSE.
Wind Noise Control
Wind noise control attenuates wind noise in the microphone signal.
| typedef struct OMX_AUDIO_CONFIG_WIND_NOISE_CONTROL {
OMX_U32 nSize; OMX_VERSIONTYPE nVersion; OMX_U32 nPortIndex; OMX_BOOL bEnable; OMX_BU32 nSensitivity; } OMX_AUDIO_CONFIG_WIND_NOISE_CONTROL; |
The parameters for OMX_AUDIO_CONFIG_WIND_NOISE_CONTROL are:
- bEnable enables the wind noise control if set to OMX_TRUE or disables the wind noise control if set to OMX_FALSE.
- nSensitivity defines the sensitivity of the algorithm to wind noise. The supported value range can be read from the parameter.
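nSensitivity is declared as OMX_BU32, which in OpenMAX IL is a bounded value carrying nValue together with nMin and nMax. The following is a minimal sketch of how a client might keep a requested sensitivity inside the range it has read back. The struct is a stand-in that mirrors the IL type, and the helper name bu32_set_clamped is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t OMX_U32;

/* Stand-in mirroring the bounded OMX_BU32 type of OpenMAX IL
 * (assumption: a real client would take it from the IL headers). */
typedef struct OMX_BU32 {
    OMX_U32 nValue;  /* current value */
    OMX_U32 nMin;    /* lowest value the component accepts */
    OMX_U32 nMax;    /* highest value the component accepts */
} OMX_BU32;

/* Hypothetical helper: store the requested value, limited to
 * [nMin, nMax], and return the value actually stored. */
static OMX_U32 bu32_set_clamped(OMX_BU32 *b, OMX_U32 requested)
{
    assert(b != NULL && b->nMin <= b->nMax);
    if (requested < b->nMin)
        requested = b->nMin;
    else if (requested > b->nMax)
        requested = b->nMax;
    b->nValue = requested;
    return requested;
}
```

For example, with a reported range of 2 to 10 (values invented for illustration), a request of 50 would be stored as 10.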
Buffer Payload Additional Information
This section defines the additional information that is needed with the buffer payload.
Extra data base type
To include additional information with the buffer payload, a base type is defined that does not contain the actual extra data.
| typedef struct OMX_OTHER_EXTRADATABASETYPE {
OMX_U32 nSize; OMX_VERSIONTYPE nVersion; OMX_U32 nPortIndex; OMX_EXTRADATATYPE eType; OMX_U32 nDataSize; } OMX_OTHER_EXTRADATABASETYPE; |
Downlink speech info
The speech decoder indicates whether the delivered frame is normal decoded speech, comfort noise or output from the decoder's error concealment. This information has to reach the speech processing component even if there is a mixer or splitter in the signal path. The data is included as buffer payload additional information after the actual payload.
| typedef struct OMX_AUDIO_DLSPEECHINFOTYPE {
OMX_BOOL bComfortNoiseFrame; OMX_BOOL bCorruptedFrame; } OMX_AUDIO_DLSPEECHINFOTYPE; |
| typedef struct OMX_AUDIO_EXTRADATA_DLSPEECHINFOTYPE {
OMX_OTHER_EXTRADATABASETYPE base; OMX_AUDIO_DLSPEECHINFOTYPE dlspeechinfo; } OMX_AUDIO_EXTRADATA_DLSPEECHINFOTYPE; |
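A speech processing component could interpret the two flags roughly as follows. This is a sketch only: the enum dl_frame_kind and the function dl_frame_classify are hypothetical, OMX_BOOL is a simplified stand-in, and the choice that bCorruptedFrame takes precedence when both flags are set is an assumption that this specification does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the OpenMAX IL boolean type. */
typedef enum { OMX_FALSE = 0, OMX_TRUE = 1 } OMX_BOOL;

typedef struct OMX_AUDIO_DLSPEECHINFOTYPE {
    OMX_BOOL bComfortNoiseFrame;
    OMX_BOOL bCorruptedFrame;
} OMX_AUDIO_DLSPEECHINFOTYPE;

/* Hypothetical classification used only in this sketch. */
typedef enum {
    DL_FRAME_SPEECH,         /* normal decoded speech */
    DL_FRAME_COMFORT_NOISE,  /* comfort noise frame */
    DL_FRAME_CONCEALED       /* output of decoder error concealment */
} dl_frame_kind;

static dl_frame_kind dl_frame_classify(const OMX_AUDIO_DLSPEECHINFOTYPE *info)
{
    assert(info != NULL);
    /* Assumption: a corrupted frame is reported as concealment output
     * even if the comfort noise flag is also set. */
    if (info->bCorruptedFrame == OMX_TRUE)
        return DL_FRAME_CONCEALED;
    if (info->bComfortNoiseFrame == OMX_TRUE)
        return DL_FRAME_COMFORT_NOISE;
    return DL_FRAME_SPEECH;
}
```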
DA-AD data timing difference
Sources need to include the DA-AD data timing difference in the uplink data. It is calculated from the time stamps of the DA and AD samples: when a sample is written to the DA converter and a sample is read from the AD converter, the difference between their time stamps is calculated (DAtimestamp - ADtimestamp). The calculation has to use only full microseconds. The time stamps of samples/frames are always positive; the result of the calculation can be negative or positive. When started from 0, the time stamp turns negative only after 292471 years, so it is not mandatory to reset the counter at the beginning of each call. A splitter has to copy the time stamp to each of the split buffers. A mixer has to copy the time stamp from the paced input port to the paced output port; time stamps from other input ports can be ignored. The data is included as buffer payload additional information after the actual payload.
| typedef struct OMX_AUDIO_DAADTIMINGDIFFTYPE {
OMX_TICKS nDaAdTimingDifference; } OMX_AUDIO_DAADTIMINGDIFFTYPE; |
| typedef struct OMX_AUDIO_EXTRADATA_DAADTIMINGDIFFTYPE {
OMX_OTHER_EXTRADATABASETYPE base; OMX_AUDIO_DAADTIMINGDIFFTYPE daadtimingdiff; } OMX_AUDIO_EXTRADATA_DAADTIMINGDIFFTYPE; |
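The timing difference calculation described above can be sketched as follows. OMX_TICKS is shown as a stand-in for the signed 64-bit microsecond type of OpenMAX IL. The helper names and the choice of nanosecond inputs are assumptions made only to illustrate the truncation to full microseconds. The second function reproduces the 292471-year figure, assuming 365-day years.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for OMX_TICKS: a signed 64-bit count of microseconds. */
typedef int64_t OMX_TICKS;

/* Hypothetical helper: the time stamps are given in nanoseconds here
 * (an assumption for illustration) and truncated to whole microseconds
 * before subtracting, so only full microseconds enter the result. */
static OMX_TICKS da_ad_timing_difference(int64_t da_ns, int64_t ad_ns)
{
    assert(da_ns >= 0 && ad_ns >= 0);  /* sample time stamps are not negative */
    return (OMX_TICKS)(da_ns / 1000) - (OMX_TICKS)(ad_ns / 1000);
}

/* Whole 365-day years before a microsecond counter started at 0
 * exceeds the largest positive signed 64-bit value. */
static int64_t years_until_wrap(void)
{
    const int64_t us_per_year = INT64_C(1000000) * 86400 * 365;
    return INT64_MAX / us_per_year;
}
```

The result is positive when the DA time stamp is later than the AD time stamp and negative otherwise, which is why a signed type is needed even though the individual time stamps are positive.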
OpenMAX IL Data API
This section describes typical component usage for audio telephony.
Audio Telephony use case examples
These examples show how the components could be connected in selected use cases: CS speech call, speech call and tone generation, speech call recording, speech call and audio playback, and TTY. They are only example implementations of some use cases; many other use cases exist, and the components may differ between implementations. In these examples, sidetone gain attenuation is controlled via the Earpiece Sink. Filter coefficients for Active Noise Control are also updated via the Earpiece Sink. The Active Noise Control loop and the sidetone loop are not visible in the pictures; they are hidden behind the Earpiece Sink. The Microphone Source outputs all microphone channels (the primary microphone signal and secondary microphone signal(s)) for Multimicrophone Noise Control, and the reference, error and feedback loop signals for Active Noise Control.
Circuit Switched Speech call
Speech call and tone generation
In a CS speech call, generated DTMF tones are not routed to the uplink; they are mixed only into the downlink signal path so that the phone user can hear them. Recording beeps are routed to both the uplink and the downlink signal paths.
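The routing rule for generated tones can be written down as a small lookup. This is only an illustrative sketch: the type tone_kind, the ROUTE_ flags and the function tone_routes are hypothetical names, not part of the OpenMAX IL API or of this specification.

```c
#include <assert.h>

/* Hypothetical tone categories and signal path flags. */
typedef enum { TONE_DTMF, TONE_RECORDING_BEEP } tone_kind;
enum { ROUTE_DOWNLINK = 1, ROUTE_UPLINK = 2 };

/* Returns a bit mask of the signal paths a generated tone is mixed into
 * during a CS speech call. */
static unsigned tone_routes(tone_kind kind)
{
    assert(kind == TONE_DTMF || kind == TONE_RECORDING_BEEP);
    switch (kind) {
    case TONE_DTMF:
        return ROUTE_DOWNLINK;                 /* heard by the user only */
    case TONE_RECORDING_BEEP:
        return ROUTE_DOWNLINK | ROUTE_UPLINK;  /* heard by both parties */
    }
    return 0;  /* unknown tone kind: route nowhere */
}
```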
Speech call recording
Recording beeps are also generated when speech call recording is active, but they are not shown in this picture.
Ring tone
When the phone is ringing and an audio accessory is connected, the ring tone is routed to the IHF (integrated hands-free speaker). Speech channels may already be open while the phone is ringing, but the audio paths have to be muted so that no real speech signal is transferred between the phone and the network.
Speech call and audio playback
TTY
VoIP call
GAN call
During a handover between the CS and PS bearers in a GAN (Generic Access Network) call, both streams may exist simultaneously for a short period.
Parameter and Configuration Indexes
The header OMX_Audio_Telephony_Index.h contains the enumeration OMX_INDEXTYPE, which contains all standard index values used with the core functions OMX_GetParameter, OMX_SetParameter, OMX_GetConfig, and OMX_SetConfig. The table below shows the index values that relate to audio telephony.
| OpenMAX IL Indices | Corresponding OpenMAX IL Audio Structures |
| OMX_IndexConfigUplinkAlgorithms | OMX_AUDIO_CONFIG_UPLINK_ALGORITHMS |
| OMX_IndexConfigDownlinkAlgorithms | OMX_AUDIO_CONFIG_DOWNLINK_ALGORITHMS |
| OMX_IndexConfigAcousticEchoControl | OMX_AUDIO_CONFIG_ACOUSTIC_ECHO_CONTROL |
| OMX_IndexConfigAutomaticVolumeControl | OMX_AUDIO_CONFIG_AUTOMATIC_VOLUME_CONTROL |
| OMX_IndexConfigArtificialBandwidthExpansion | OMX_AUDIO_CONFIG_ARTIFICIAL_BANDWIDTH_EXPANSION |
| OMX_IndexConfigBackgroundNoiseControl | OMX_AUDIO_CONFIG_BACKGROUND_NOISE_CONTROL |
| OMX_IndexConfigComfortNoiseGeneration | OMX_AUDIO_CONFIG_COMFORT_NOISE_GENERATION |
| OMX_IndexConfigDownlinkNoiseControl | OMX_AUDIO_CONFIG_DOWNLINK_NOISE_CONTROL |
| OMX_IndexConfigMultiMicrophoneNoiseControl | OMX_AUDIO_CONFIG_MULTIMICROPHONE_NOISE_CONTROL |
| OMX_IndexConfigSidetoneHowlingControl | OMX_AUDIO_CONFIG_SIDETONE_HOWLING_CONTROL |
| OMX_IndexConfigTransducerDynamicRangeControl | OMX_AUDIO_CONFIG_TRANSDUCER_DYNAMIC_RANGE_CONTROL |
| OMX_IndexConfigUplinkDynamicRangeControl | OMX_AUDIO_CONFIG_UPLINK_DYNAMIC_RANGE_CONTROL |
| OMX_IndexConfigVoiceClarity | OMX_AUDIO_CONFIG_VOICE_CLARITY |
| OMX_IndexConfigWindNoiseControl | OMX_AUDIO_CONFIG_WIND_NOISE_CONTROL |
| OMX_IndexParamAudioEncoderStatus | OMX_AUDIO_ENCODER_STATUS |
| OMX_IndexParamAudioDecoderStatus | OMX_AUDIO_DECODER_STATUS |
| OMX_IndexParamAudioUplinkTiming | OMX_AUDIO_UPLINK_TIMING |
| OMX_IndexParamAudioRfTxPower | OMX_AUDIO_RF_TX_POWER |
| OMX_IndexOtherExtraDataBaseType | OMX_OTHER_EXTRADATABASETYPE |
| OMX_IndexParamDlSpeechInfoType | OMX_AUDIO_DLSPEECHINFOTYPE |
| OMX_IndexExtraDataDlSpeechInfoType | OMX_AUDIO_EXTRADATA_DLSPEECHINFOTYPE |
| OMX_IndexParamDaAdTimingDiffType | OMX_AUDIO_DAADTIMINGDIFFTYPE |
| OMX_IndexExtraDataDaAdTimingDiffType | OMX_AUDIO_EXTRADATA_DAADTIMINGDIFFTYPE |
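A component that supports several of these indices typically has to check that the structure it receives matches the index. The sketch below shows one possible table-driven check of nSize. The numeric index values are placeholders chosen for this example only (the real values come from OMX_Audio_Telephony_Index.h), and index_entry and index_size_matches are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t OMX_U32;

/* Placeholder values for this sketch only; the real values are defined
 * in OMX_Audio_Telephony_Index.h. */
typedef enum {
    OMX_IndexConfigVoiceClarity = 1,
    OMX_IndexConfigWindNoiseControl = 2
} OMX_INDEXTYPE;

/* Hypothetical table entry pairing an index with the size of the
 * structure that belongs to it. */
typedef struct {
    OMX_INDEXTYPE index;
    OMX_U32 expected_size;
} index_entry;

/* Returns 1 when the index is known and nSize matches the structure
 * registered for it, 0 otherwise. */
static int index_size_matches(const index_entry *table, size_t count,
                              OMX_INDEXTYPE index, OMX_U32 nSize)
{
    size_t i;
    assert(table != NULL);
    for (i = 0; i < count; ++i) {
        if (table[i].index == index)
            return table[i].expected_size == nSize;
    }
    return 0;
}
```

In a real component the expected sizes would be taken with sizeof from the structures listed in the table above, and a mismatch would be reported to the caller as an error.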
Use case examples
This chapter describes the example use cases.
Initializing IL Component
Setting Tunnel between IL Components
De-Initializing IL Component using Tunneling
CS Speech Call
Initializing CS Speech Call
Tunneled control between the speech processing component and the microphone source is not included in this example. Initialization can also happen in a different order.
Dataflow in CS Speech Call
Only one full round of data flow is shown. The uplink and downlink data flows do not depend on each other, so the timing between them can be totally different from that shown in the example sequence diagram. Buffer sharing should be used to minimize the delay in the signal path.
Ending Speech call