Category: 3D Audio

Concert-hall Acoustics (3/3)

Pätynen, J., & Lokki, T. (2016). Concert halls with strong and lateral sound increase the emotional impact of orchestra music. The Journal of the Acoustical Society of America, 139(3), 1214-1224.

After several studies and hypotheses, the following experiments used skin conductance as an objective measure of arousal, or emotional impact. It would be interesting to correlate this with the previous findings, and it is something I might be able to do soon.

For the listening tests 28 subjects were chosen. They were either music consumers or music professionals.

In the first experiment, they listened to the stimuli in the following sequence:

Pilot signal + 15s Silence + 12 Stimuli (each with 15s Silence)

In the second experiment, participants made paired comparisons between two stimuli and had to choose the one that produced the higher overall impact on “you”. Impact was described as thrilling, intense, impressive or positively striking. Again, participants could jump seamlessly between stimuli to make the comparison.

Connecting the results with the floor plans of the rated concert halls, it was possible to draw the following conclusions:

– Halls with a rectangular typology have a more impressive sound (because more sound is reflected from the lateral directions);

– Positions closer to the orchestra were found to elicit stronger emotional responses.

The methodological interest I take from these three studies is the possibility of navigating seamlessly through the stimuli in order to make a rating. This, nevertheless, makes for a very specific rating, valid only for that sample.

Concert-hall Acoustics (2/3)

Lokki, T., Vertanen, H., Kuusinen, A., Pätynen, J., & Tervo, S. (2010, August). Auditorium acoustics assessment with sensory evaluation methods. In Proc. ISRA (pp. 29-31).

After this, I got interested in the method used and looked for more details. It seemed very similar to what I’ve done before with Kansei.

The previous study was carried out using this graphical user interface, where assessors could seamlessly switch between audio clips (just like between wine sips). The continuous scale ranged from 0 to 120.


The assessors were recruited via an online questionnaire with three parts: a) a pure-tone audiometric test; b) a test of vocabulary skills; c) a triangle test of the ability to discriminate between audio stimuli. (From Wikipedia: the assessors are presented with three products, two of which are identical and the other one different. The assessors are asked to state which product they believe is the odd one out.)
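To get a feel for why the triangle test works as a screening tool, here is a minimal sketch of the chance level. With three samples and one odd one out, a pure guess is right one time in three. The number of trials and the pass mark below are my own assumptions, not values from the paper.

```python
from math import comb

def p_at_least(k, n, p=1/3):
    """Probability of k or more correct answers in n triangle tests by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical screening: 6 triangle trials, pass mark of 5 correct.
chance_of_passing_by_luck = p_at_least(5, 6)
print(f"{chance_of_passing_by_luck:.4f}")
```

Under these assumed numbers, a candidate who cannot hear any difference would pass less than 2% of the time.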

20 assessors were selected, all with a musical background, and each completed four sessions in total. In the first two sessions they elicited the attributes, and in the last two they used those attributes and scales.

As for the analysis, the attributes could have been classified manually, but this was done with agglomerative hierarchical clustering (AHC). Then, further analyses were carried out using Multiple Factor Analysis (MFA), which is based on PCA. The results are presented here.
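As a reminder to myself of what agglomerative clustering does, here is a minimal sketch. The attribute names and ratings are invented for illustration, and the choice of Euclidean distance with average linkage is my assumption; these notes do not record which distance and linkage the authors used.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of attribute names."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in points]  # start with every attribute on its own
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters

# Hypothetical ratings (0 to 120 scale) of four elicited attributes over three clips.
ratings = {
    "loud": (90.0, 40.0, 70.0),
    "powerful": (85.0, 45.0, 75.0),
    "distant": (20.0, 80.0, 50.0),
    "far away": (25.0, 85.0, 45.0),
}
groups = agglomerate(ratings, 2)
print(groups)
```

Attributes that different assessors named differently but rated similarly end up in the same group, which is the point of clustering individually elicited vocabularies.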

Concert-hall Acoustics (1/3)

Lokki, T. (2014). Tasting music like wine: Sensory evaluation of concert halls. Physics Today, 67(1), 27.

A few months ago, I read this article called “Tasting Music like Wine: Sensory evaluation of concert halls” by Tapio Lokki and was fascinated by two things:

– The lightness of the article and how it introduced such a complex topic as concert-hall acoustics with an anecdotal situation;

– The methodological intricacy, with all the 3D sounds recorded in such an ingenious way (all the orchestra musicians were recorded solo; 24 loudspeakers were placed on a stage, each playing only one instrument; and the full “orchestra” was then recorded at several places in the venue. Very simply put.)

– After all, there were three interesting things: I loved the use of Wine tasting know-how for the evaluation of the subjective experience of concert-halls.

The anecdote goes like this: the author and his wife are listening to a concert while drinking some wine. The wife enjoys the concert but not so much the wine, while the author feels exactly the opposite. Both perceived the wine and the music differently. After some thought, the author concluded that wine and music have a lot in common, because each can be characterized by a multidimensional array of perceptual attributes.

Both are a matter of personal taste, and each person may concentrate on different aspects of the taste or sound. The thing is, winemakers have a solution for this, and have long since developed techniques to determine what makes a good or bad wine.

Like the aroma wheel.


The first question, then, is: could these methods be tailored for the perceptual evaluation of concert halls?

Wine-tasting methods such as sensory profiling demand the comparison of samples. Imagine a table with a row of glasses, each holding a different wine, where you, as an assessor, may and must take a sip from one and then another as many times as you find necessary. Could this be done with sound?

The answer is yes, and please read the original article to find out how.

In wine tasting, two methods are used to gather the attributes of wines: consensus vocabulary profiling, in which a number of assessors agree on a set of adjectives for each wine; and individual vocabulary profiling (the one used in this work), in which a number of assessors (usually 15 or more) each point out which characteristics can be found in the wine.

The first experiment had 20 listeners, and all heard 3 recording positions from 3 Finnish concert halls. Together, they suggested 102 attributes. After clustering the data, one cluster (overall volume and perceived distance) explained more than 50% of the variance.

The second experiment had only one distance (12 m from the stage), 9 halls and 17 assessors. They suggested 60 attributes, clustered into 7 groups.


  1. Definition
  2. Clarity
  3. Reverberance
  4. Loudness
  5. Envelopment
  6. Bassiness
  7. Proximity

After more analysis (hierarchical multiple-factor analysis), it was possible to distinguish two groups out of this last evaluation (after ordering by preference also): one group preferred intimate sound in which they could easily distinguish individual instruments and lines, and another group which preferred louder and more reverberant sound with good envelopment and strong bass.

Very impressive how it was possible to extract this information. Would Portuguese listeners make the same evaluation?

Visual Search Performance With 3-D Auditory Cues: Effects of Motion, Target Location, and Practice

McIntire, J. P., Havig, P. R., Watamaniuk, S. N., & Gilkey, R. H. (2010). Visual search performance with 3-D auditory cues: Effects of motion, target location, and practice. Human Factors: The Journal of the Human Factors and Ergonomics Society.

This was an interesting paper because it was the first to approach the facilitating effects of 3D audio cues applied to moving stimuli.

A lot of previous research has demonstrated that 3D audio does reduce visual search time, subjective workload, etc. However, to the authors’ knowledge, all research on the subject had used static targets among static distractors, and this paper addresses this gap by using an environment with dynamic stimuli.

The set-up was similar to a regular searching task. The participant’s head movements were monitored via a head-tracking system. The auditory stimuli were presented via headphones. The sound cue consisted of three consecutive 50-ms bursts of wideband white Gaussian noise separated by 25-ms gaps of silence and ending with 250 ms of silence, totaling 450 ms. The sample rate was 44,100 samples/s. The cue was repeated during each trial in the auditory conditions until a response was given and was presented at a comfortable listening level, approximately 50 to 60 dB SPL. The sound cue was filtered with the use of a generic set of head-related transfer functions (HRTFs) in the National Aeronautics and Space Administration’s Sound Lab software (see SLAB, n.d.; also see Miller & Wenzel, 2002).
When coupled with the head-tracking system, the SLAB software rendered the auditory cue so that it was collocated with the visual target (dynamic or static) regardless of where the participant’s head was pointed.
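The timing of the cue is simple enough to rebuild as a quick sketch. The code below only generates the raw mono noise pattern (three bursts, two gaps, trailing silence); the HRTF filtering and head tracking done in SLAB are not reproduced, and the function names and the seed are my own.

```python
import random

SAMPLE_RATE = 44_100  # samples per second, as reported in the paper

def ms_to_samples(ms):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(SAMPLE_RATE * ms / 1000)

def make_cue(seed=0):
    """Three 50-ms Gaussian noise bursts, 25-ms gaps between them, 250 ms of silence at the end."""
    rng = random.Random(seed)
    burst_len = ms_to_samples(50)
    gap = [0.0] * ms_to_samples(25)
    tail = [0.0] * ms_to_samples(250)
    cue = []
    for i in range(3):
        cue += [rng.gauss(0.0, 1.0) for _ in range(burst_len)]
        if i < 2:  # gaps only between bursts, not after the last one
            cue += gap
    return cue + tail

cue = make_cue()
# 25 ms is not a whole number of samples at 44.1 kHz, so the total
# lands within a sample or so of the nominal 450 ms.
print(len(cue) / SAMPLE_RATE)
```

The sum 3 × 50 + 2 × 25 + 250 does give the 450 ms stated in the paper.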

During the task, the participants had to look at a display with 15 distractors and 1 target, find the target, and report on which side of the target the gap was. It was a two-alternative forced-choice procedure.



There were four conditions in the experiment: 1. static environment with no audio cues; 2. static environment with 3D audio cues; 3. dynamic environment with no audio cues; and 4. dynamic environment with 3D audio cues. In total, the experiment recorded 2,816 trials per participant: 4 (sessions) × 4 (conditions) × 176 (trials per condition).


From the results, it was clear that conducting visual searches in moving environments was more difficult than searches in static environments, regardless of whether 3-D auditory cues were present. The significant main effect of auditory cue indicates that average search times decrease when 3-D auditory cues are provided, regardless of the search environment. The auditory cues reduced overall search times by an average of 430 ms (from 1,800 to 1,370 ms), an improvement of 24%.
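As a quick sanity check on the figures reported above, the trial count and the percentage improvement can be recomputed from the numbers in the paper. The variable names are mine.

```python
# Trial count per participant: sessions x conditions x trials per condition.
sessions, conditions, trials_per_condition = 4, 4, 176
total_trials = sessions * conditions * trials_per_condition

# Mean search times without and with the 3-D auditory cue, in milliseconds.
without_cue_ms, with_cue_ms = 1800, 1370
saving_ms = without_cue_ms - with_cue_ms
improvement = saving_ms / without_cue_ms  # fraction of the uncued search time

print(total_trials, saving_ms, round(improvement * 100))
```

Both figures agree with the text: 2,816 trials, and a 430 ms saving that rounds to 24% of the uncued search time.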


No practice effects involving 3-D audio were found, and the beneficial effects of 3-D audio were evident in the first experimental session, supporting its purported ease of use.

As for the angle of the starting location of the target, it had a strong effect on search times and on the effectiveness of 3-D audio. Search times were generally faster when targets were located closer to the fixation point (smaller eccentricities) and when located on the horizontal plane. Importantly, 3-D audio provided the largest benefits to search performance when the target appeared at farther eccentricities and/or on the horizontal plane.

To conclude, the authors offer some ideas for future research. Objects in the real world often move on nonlinear paths in three dimensions and may appear anywhere in the spatial environment. So it may be appropriate to examine different types of motion and search-field sizes in future research.

Gorillas we have missed: Sustained inattentional deafness for dynamic events.

Dalton, P., & Fraenkel, N. (2012). Gorillas we have missed: Sustained inattentional deafness for dynamic events. Cognition, 124(3), 367-372.

Selective attention is a crucial ability: it is what allows us to behave effectively in a world full of simultaneous stimuli.

Following the inattentional blindness paradigm, the authors focused on hearing, since it is considered an early warning system, tuned to detect unexpected stimuli. Is it rightly tuned?

In order to replicate the effect in hearing, Dalton and Fraenkel dissected the inattentional blindness paradigm into three components:

1. Task-relevant stimuli

2. Task-irrelevant stimuli

3. An unexpected critical stimulus

This last ingredient should be similar to the irrelevant stimuli only in the dimension that distinguishes both of them from the relevant stimuli. In other dimensions, such as spatial location, speed, trajectory or shape, it should differ from the irrelevant stimuli.

Having said this, the intriguing thing about the inattentional deafness effect is that “the similarity between the unexpected critical stimulus and the irrelevant stimuli on the dimension upon which relevant and irrelevant are defined, can prevent the detection of the critical stimulus, despite its salience on a number of other dimensions” (note to self: is this somehow shaped by expectations? Are we tuned to expect some things based on experience (we are!), and does this speed up processing? Expectations as a tool to process the world faster.)

This doesn’t seem very efficient, because in real-world situations, processing new and unexpected stimuli (fire alarms, unexpected movements) is likely to be more important than processing continually present yet task-irrelevant scene elements.

The twist in this experiment was that the authors used binaural sound to provide a realistic audio scene, and the critical stimulus was dynamic. Compared with the dichotic listening task, this set-up makes spatial separation harder.


So, two men and two women sat at two separate tables in a room, preparing a party. The dummy head was placed between the two tables. A man saying “I am a gorilla” passed near the men. This was the critical stimulus, and it lasted for 19 s.

In both experiments, the channels were reversed for half of the participants in order to balance for potential orientation effects.

In experiment 1, the gorilla passed near the men. Results showed that 90% of the participants attending to the men’s conversation mentioned the gorilla. However, only 30% of the participants attending to the women’s conversation mentioned the gorilla.

In experiment 2, the gorilla was presented in “mirror image”, such that it appeared on the other side of the scene, passing near the women. This was somewhat more flagrant than experiment 1 in the sense that the critical stimulus was near the relevant stimulus, and was different at least in the voice tone.

This time, 65% of the participants listening to men mentioned the gorilla, while only 45% listening to women mentioned it.

The results provided relevant evidence for the inattentional deafness effect with dynamic stimuli in 3D audio scenes. This finding could have serious implications for road safety.

Hopefully, more on that later.


Monitoring the simultaneous presentation of spatialized speech signals in a virtual acoustic environment

Nelson, W. T., Bolia, R. S., Ericson, M. A., & McKinley, R. L. (1998). Monitoring the simultaneous presentation of spatialized speech signals in a virtual acoustic environment (No. ASC-98-1409). AIR FORCE RESEARCH LAB WRIGHT-PATTERSON AFB OH HUMAN EFFECTIVENESS DIRECTORATE.

In this report the authors wanted to assess the effects of 3D audio information on an operator’s ability to detect, identify and monitor the presentation of a critical sign phrase among multiple simultaneous speech signals.

Things I considered worthy of note in this paper:

– Spatialization of simultaneous speech signals increased the percentage of correctly detected and identified critical speech signals and did not affect the response times of correctly detected signals.

– Multi-sensory presentation of spatial information can serve to enhance performance efficiency.

– Correct detections varied inversely with the number of simultaneous talkers, and critical signals spoken by a female talker were detected more often than those spoken by a male talker when four or more talkers were presented simultaneously.



When talking about the modern fighter aircraft, the increasing perceptual, perceptual-motor and cognitive loads should motivate the exploitation of humans’ ability to perceive and process spatial auditory information.

In a previous study, Bronkhorst, Veltman & van Breda (1996) created a task that consisted of locating and pursuing a target aircraft as quickly as possible in one of the following four conditions: 1) no display; 2) 3D audio display; 3) visual display; 4) 3D audio + visual display. Once again, the average search time was reduced in the 3D audio plus visual display condition.

Inspired by the cocktail party effect, according to which spatial separation of acoustic signals improves the intelligibility of signals in noise and assists in the segregation of multiple sound streams, the authors wanted to verify the same thing using localized 3D audio signals.

As for the stimuli, the authors used speech phrases. According to Ricard and Meirs (1994), the accuracy of localizing speech stimuli is comparable to that of non-speech stimuli presented via headphones using non-individualized HRTFs.

In this experiment, the authors varied the number of simultaneous signals, the location and spatial separation of speech signals and the sex of the talker.

During each trial, participants monitored the simultaneous presentation of multiple spatialized speech signals. Their task was to listen for the occurrence of a critical call sign (“Baron”) and to identify the color-number combination that appeared to emanate from the same spatial location as the critical call sign. This was done by pressing the key on the response device that was of the appropriate color and marked by the appropriate number. Thus, the appropriate response to “Ready Baron Go To Red Six Now” would have been to press the red key labeled with a number six. If the critical call sign were not present, the listener was required to press the “no-response” key.
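The response rule can be written down as a small sketch, treating each simultaneous talker as a text phrase. The fixed seven-word phrase format is inferred from the single example above, and the second call sign (“Eagle”), the number table and the function name are hypothetical, used only to have a distractor.

```python
# Hypothetical number vocabulary; the report's actual set is not listed in these notes.
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4,
           "five": 5, "six": 6, "seven": 7, "eight": 8}

def correct_response(phrases, call_sign="baron"):
    """Return the (color, number) key for the talker using the critical call sign, else None."""
    for phrase in phrases:
        words = phrase.lower().split()
        # Assumed format: "Ready <call sign> Go To <color> <number> Now"
        if len(words) == 7 and words[1] == call_sign:
            return words[4], NUMBERS[words[5]]
    return None  # corresponds to pressing the "no-response" key

trial = ["Ready Eagle Go To Blue Two Now", "Ready Baron Go To Red Six Now"]
print(correct_response(trial))
```

The hard part for the listener is, of course, not the rule but hearing which talker said “Baron” when several speak at once, which is where spatial separation is expected to help.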

Here are two examples of the five possible location conditions: front right quadrant (RQ), front hemifield (FH), right hemifield (RH), full 360° (F), and a non-spatialized control (C).



Results revealed that:

– spatialized conditions were associated with higher detection scores as compared with the non-spatialized control condition;

– spatialized conditions didn’t differ from each other;

– identification scores associated with each of the four spatialized auditory conditions were superior to those of the non-spatialized condition, and the four didn’t differ from each other;

– increases in the number of simultaneous talkers produced dramatic decrements in performance efficiency;

– in the NASA-TLX questionnaire, no significant differences were found between the 360° and control conditions.


This last result is interesting. According to the authors, the data suggest that better performance was accompanied by additional information-processing demands. Could it be that 3D audio boosts workload?

Also, in terms of future applications, it’s interesting to note that no differences between locations were found, meaning that the design of audio interfaces has a certain freedom.