I’ve bumped into this really interesting project that sonifies data.
It is called Data Driven DJ, and every month the author releases a video pairing algorithmically generated tunes with a visualization, based on real data.
Here is an example:
On the 28th of January 1986 the space shuttle Challenger exploded and seven astronauts died due to the erosion of two O-rings. These rings lost resiliency because the launch happened on a very cold day.
The day before, after seeing the weather forecast for launch day, the engineers responsible for building the shuttle advised NASA, the agency in charge, to postpone the launch.
These engineers had several indicators that the launch could go wrong: history of O-ring deterioration on previous launches, the physics of resiliency, and other experimental data. All this information was faxed to NASA in 13 tables.
Although it was the only no-launch recommendation in twelve years, NASA officials were quite surprised. They pointed out several flaws in the tables presented and suggested a reconsideration.
And it was reconsidered.
And the next morning, the Challenger space shuttle exploded 73 seconds after launch, because the low temperature had affected the O-rings.
Tufte enumerates several mistakes in those 13 tables. Some carried no names, so it was hard to trace who was responsible for an observation, and sometimes the same shuttle went by three different names. We can see some examples in the images below. They are not very clear about the erosion the O-rings had already suffered at low temperatures.
At one point, NASA officials and engineers were focusing not on the erosion data but on the blow-by data. They noticed that blow-by had happened on the launch with the highest temperature as well as on the one with the lowest. The blow-by data were irrelevant in this case, and they diverted attention that should have gone to other critical elements.
These 13 tables didn’t manage to prevent the launch, but the people who made them were right: they were thinking causally, but they were not illustrating causally.
Tufte gathered the information in the tables and in other reports sent afterwards, and organized it as a function of temperature, like so:
This way we can clearly see that there were more cases of erosion when temperatures were low.
The same information presented in a matrix
After the incident, presidential commissions and investigations were convened. Their illustrations made the same mistakes: they lacked labels, the cause-effect relation was not clear, and they were out of order.
See what would happen if the information were ordered by temperature instead of chronologically.
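To make the effect of ordering concrete, here is a minimal Python sketch. The records are hypothetical, invented purely for illustration (they are not the actual flight data), and the names `Launch`, `LAUNCHES` and `damage_pattern` are assumptions of this example. It prints the same damage flags twice: once in chronological order and once sorted by temperature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Launch:
    order: int      # position in the chronological sequence
    temp_f: int     # temperature at launch, in degrees Fahrenheit
    damaged: bool   # whether O-ring damage was recorded


# Hypothetical records, invented for illustration only (not the real data).
LAUNCHES = [
    Launch(1, 66, False),
    Launch(2, 70, True),
    Launch(3, 53, True),
    Launch(4, 78, False),
    Launch(5, 57, True),
    Launch(6, 75, False),
    Launch(7, 63, True),
    Launch(8, 81, False),
]


def damage_pattern(launches, key):
    """Return 'X' (damage) or '.' (no damage) per launch, sorted by `key`."""
    ordered = sorted(launches, key=key)
    return "".join("X" if rec.damaged else "." for rec in ordered)


chronological = damage_pattern(LAUNCHES, key=lambda rec: rec.order)
by_temperature = damage_pattern(LAUNCHES, key=lambda rec: rec.temp_f)

print("chronological :", chronological)
print("by temperature:", by_temperature)
```

With these made-up numbers the chronological row scatters the damage marks, while the temperature-sorted row bunches them at the cold end. The data are identical in both rows; only the ordering variable changes.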
Would anyone dare to launch the Challenger shuttle if the data was presented differently?
Although we often hear that data speak for themselves, their voices can be soft and sly.
– Frederick Mosteller, Stephen E. Fienberg, and Robert E.K. Rourke, Beginning Statistics with Data Analysis (Reading, Massachusetts, 1983), 234
Communicating information in the best way, taking into account our natural limitations (perception, attention, memory), is no easy task, especially if you want to do it in a clean, transparent and comprehensible manner.
In this case, Tufte uses two situations to illustrate a good and a bad way to communicate information: the cholera epidemic in London in 1854 and the Challenger shuttle disaster in 1986.
In the first situation, John Snow did wonderful detective work trying to discover the source of the epidemic.
According to Tufte, he placed his data in the appropriate context for drawing cause-and-effect relations. The original data listed the victims’ names according to their time of death. This could give rise to displays based on time, or epidemic chronologies like the following graphics. However, the passing of time is not an explanatory variable, and it is of little use when the goal is to find an intervention strategy.
What John Snow did was mark each death on a map, where he also indicated the location of the 13 water pumps in the neighbourhood (in the following image, the bars show the number of deaths in each home, and the water pumps are the circles. See next to the D in BroaD Street.)
The association between cholera and the location of the water pump is evident in this map, and it allows us to compare this scenario with other places with water pumps and no cholera-related deaths.
Besides analysing why people were dying, Snow wondered why other people from the same neighborhood were not dying. He mentioned the case of the brewery, painted in yellow on the map.
You guessed it. The brewery owner allowed his employees to drink a few glasses of beer… so none of them drank water when thirsty!
Faced with all this evidence, Snow suggested removing the handle of the Broad Street water pump.
This is where the funny graphical manipulations begin.
In this daily count of deaths, we can see that when the water pump handle was removed, the number of deaths occurring each day was already falling. This happened because people were fleeing the epidemic, so the neighbourhood had fewer and fewer inhabitants. So… it’s not clear whether the reduction in deaths was due exclusively to the handle removal.
What if the information was plotted this way?
In a weekly aggregation, it really looks like the reduction of daily deaths is due to the water pump handle removal.
The “time” variable is really sensitive to the choice of intervals. And we can see here how easy it is to manipulate the display in order to communicate certain points of view…
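Here is a small Python sketch of how much the choice of interval matters. The daily counts and the removal day are hypothetical, made up only to mimic the shape described above (a peak followed by a decline that starts before the intervention); they are not Snow’s figures. The helper `aggregate` is likewise an assumption of this example.

```python
# Hypothetical daily death counts, invented for illustration only.
DAILY = [1, 2, 8, 30, 60, 55, 40, 30, 24, 18, 12, 9, 6, 5, 4, 3, 2, 2, 1, 1, 1]

# Hypothetical: the handle is removed at the end of this day (0-based index).
REMOVAL_DAY = 9


def aggregate(counts, width, start=0):
    """Sum `counts` over consecutive windows of `width` days from `start`.

    Days before `start` and any incomplete final window are dropped.
    """
    return [
        sum(counts[i:i + width])
        for i in range(start, len(counts) - width + 1, width)
    ]


# Daily view: the peak comes well before the removal.
peak_day = DAILY.index(max(DAILY))
days_falling_before_removal = REMOVAL_DAY - peak_day

# Weekly view, weeks counted from day 0: the removal falls inside week 2.
weeks_from_day_0 = aggregate(DAILY, 7, start=0)

# Weekly view, weeks aligned so that one week ends exactly at the removal.
weeks_split_at_removal = aggregate(DAILY, 7, start=REMOVAL_DAY + 1 - 7)

print("peak day:", peak_day, "| days already falling:", days_falling_before_removal)
print("weeks from day 0:      ", weeks_from_day_0)
print("weeks split at removal:", weeks_split_at_removal)
```

In the daily series the counts have been falling for five days by the time the handle comes off. Once the weeks are aligned to end at the removal, the same numbers collapse into one very high total followed by one very low total, which reads as if the removal alone caused the drop.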