Monitoring the simultaneous presentation of spatialized speech signals in a virtual acoustic environment

Nelson, W. T., Bolia, R. S., Ericson, M. A., & McKinley, R. L. (1998). Monitoring the simultaneous presentation of spatialized speech signals in a virtual acoustic environment (No. ASC-98-1409). Air Force Research Lab, Wright-Patterson AFB, OH, Human Effectiveness Directorate.

In this report, the authors assessed the effects of 3D audio information on an operator’s ability to detect, identify and monitor the presentation of a critical call sign phrase among multiple simultaneous speech signals.

Things I considered worth noting in this paper:

– Spatialization of simultaneous speech signals increased the percentage of correctly detected and identified critical speech signals and did not affect the response times of correctly detected signals.

– Multi-sensory presentation of spatial information can serve to enhance performance efficiency.

– Correct detections varied inversely with the number of simultaneous talkers, and female-spoken critical signals were detected more often than male ones when four or more talkers were presented simultaneously.



In modern fighter aircraft, the increasing perceptual, perceptual-motor and cognitive loads should motivate the exploitation of humans’ ability to perceive and process spatial auditory information.

In a previous study, Bronkhorst, Veltman & van Breda (1996) created a task that consisted of localizing and pursuing a target aircraft as quickly as possible in one of the following four conditions: 1) no display, 2) 3D audio display, 3) visual display, 4) 3D audio + visual display. The average search time was reduced in the 3D audio plus visual display condition.

Inspired by the cocktail party effect, according to which spatial separation of acoustic signals improves the intelligibility of signals in noise and assists in the segregation of multiple sound streams, the authors wanted to verify the same effect using localized 3D audio signals.

As for the stimuli, the authors used speech phrases; according to Ricard and Meirs (1994), the accuracy of localizing speech stimuli is comparable to that of non-speech stimuli presented via headphones using non-individualized HRTFs.

In this experiment, the authors varied the number of simultaneous signals, the location and spatial separation of speech signals and the sex of the talker.

During each trial, participants monitored the simultaneous presentation of multiple spatialized speech signals. Their task was to listen for the occurrence of a critical call sign (“Baron”) and to identify the color-number combination that appeared to emanate from the same spatial location as the critical call sign. This was done by pressing the key on the response device that was of the appropriate color and marked with the appropriate number. Thus, the appropriate response to “Ready Baron Go To Red Six Now” would have been to press the red key labeled with the number six. If the critical call sign was not present, the listener was required to press the “no-response” key.
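To make the response rule above concrete, here is a minimal Python sketch of how a single trial could be scored. It is only an illustration under my own assumptions: the function names, the phrase template (inferred from the one example phrase), the non-critical call sign "Eagle", and the way detection and identification are separated are all hypothetical and not taken from the report.

```python
import re

CRITICAL_CALL_SIGN = "baron"   # critical call sign named in the report
NO_RESPONSE = "no-response"    # label for the "no-response" key (assumed name)

# Assumed template, inferred from "Ready Baron Go To Red Six Now".
PHRASE = re.compile(r"ready (\w+) go to (\w+) (\w+) now", re.IGNORECASE)


def expected_response(phrases):
    """Return the (color, number) key for the critical phrase, or NO_RESPONSE."""
    for phrase in phrases:
        match = PHRASE.fullmatch(phrase.strip())
        if match and match.group(1).lower() == CRITICAL_CALL_SIGN:
            return (match.group(2).lower(), match.group(3).lower())
    return NO_RESPONSE


def score_trial(phrases, response):
    """Score one trial (scoring rule is an assumption, not the authors').

    detected:   the critical call sign was present and a key other than
                "no-response" was pressed.
    identified: the pressed key matches the critical phrase's color and number.
    """
    expected = expected_response(phrases)
    present = expected != NO_RESPONSE
    return {
        "critical_present": present,
        "detected": present and response != NO_RESPONSE,
        "identified": present and response == expected,
    }


# Two simultaneous talkers; "Eagle" is a made-up non-critical call sign.
trial = ["Ready Eagle Go To Blue Two Now", "Ready Baron Go To Red Six Now"]
result = score_trial(trial, ("red", "six"))
```

Keeping detection and identification as separate flags mirrors the way the results below report detection scores and identification scores separately.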

There were five possible location conditions: front right quadrant (RQ), front hemifield (FH), right hemifield (RH), full 360° (F), and a non-spatialized control (C).



Results revealed that:

– spatialized conditions were associated with higher detection scores as compared with the non-spatialized control condition;

– spatialized conditions didn’t differ from each other;

– identification scores associated with each of the four spatialized auditory conditions were superior to those of the non-spatialized control, and the four didn’t differ from each other;

– increases in the number of simultaneous talkers produced dramatic decrements in performance efficiency;

– in the NASA-TLX questionnaire, no significant differences were found between the 360° and control conditions.


This last result is interesting. According to the authors, the data suggest that better performance was accompanied by additional information processing demands. Could it be that 3D audio boosts workload?

Also, in terms of future applications, it’s interesting to note that no differences were found between the spatialized location conditions, meaning that the design of audio interfaces has a certain freedom.


