A few recent happenings at work, along with some internet threads, have had me pondering this lately, and I'd like to share my thoughts.
Here in part one I want to get into some of what I believe to be the root causes of these types of communication problems. Understanding the causes can help us navigate to the solutions.
First and foremost, people's ears are very highly tuned in comparison to their eyes. It takes tens of thousands of audio samples per second to convince the ears that something real is happening around them. By contrast it only takes about 20 or so video samples per second to achieve the same effect. This is why film standards have 48,000 audio samples playing back every second and only 24 frames of video in that same second.
Our ears also far outpace our eyes in terms of dynamic range.
A human is capable of hearing (and usefully discerning) anything from a quiet murmur in a soundproofed room to the sound of the loudest heavy metal concert. Such a difference can exceed 100 dB which represents a factor of 10,000,000,000 in power. A human can see objects in starlight (although colour differentiation is reduced at low light levels) or in bright sunlight, even though on a moonless night objects receive 1/1,000,000,000 of the illumination they would on a bright sunny day: that is a dynamic range of 90 dB. (Source: Wikipedia)
That last 10 dB of dynamic range that ears have over eyes sits at the far end of a logarithmic scale, where every additional 10 dB is another tenfold increase in power, which means that the difference is a LOT.
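To put numbers on that, here is a small Python sketch of the decibel arithmetic for power ratios. The function names are my own, made up for illustration; the only thing taken as given is the standard definition that decibels are 10 times the base-10 logarithm of a power ratio.

```python
import math

def db_to_power_ratio(db):
    """Convert a level difference in dB to a plain power ratio."""
    return 10 ** (db / 10)

def power_ratio_to_db(ratio):
    """Convert a plain power ratio to a level difference in dB."""
    return 10 * math.log10(ratio)

# The two figures quoted above: roughly 100 dB for hearing, 90 dB for sight.
hearing_ratio = db_to_power_ratio(100)
sight_ratio = db_to_power_ratio(90)

# How many times wider the hearing range is, as a plain ratio.
advantage = hearing_ratio / sight_ratio

print(f"hearing: {hearing_ratio:.0e}, sight: {sight_ratio:.0e}, advantage: {advantage:.0f}x")
```

The 10 dB gap works out to a factor of ten: the range the ears cover is about ten times the already enormous range the eyes cover.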
Then there is the general function of sound in creative contexts, which is to create emotion. The visuals set the stage and to some degree tell the story, but it is the sounds that get the pulse racing or tug at the heartstrings or otherwise cause the audience to feel what the director wants them to feel. This means that sound decisions are fundamentally gut-level decisions that require technical means to execute.
Finally there is the unique relationship to time that sound has. Visuals can move fast or slow without losing intelligibility, to a degree that sound cannot. Big sonic moments have to be set up with space in order to achieve full impact. Low frequency things played at high rates of speed (you know, frequencies) cease to be low frequency things. These things are part of the physics of sound and have no real counterpart on the visual side of things. Also, you cannot pause a sound and examine it the way you can a moving picture. You can't just point to something and say "change that."
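The point about speed and frequency is easy to see with a toy example. The Python sketch below is only an illustration under my own assumptions: a 48,000 Hz sample rate, a pure 50 Hz sine tone, a crude "speed up" that keeps every fourth sample, and a rough pitch estimate made by counting upward zero crossings. None of these helper names come from any audio library.

```python
import math

SAMPLE_RATE = 48_000  # samples per second (assumed for this sketch)

def sine(freq_hz, seconds):
    """Generate a pure sine tone as a list of samples."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def speed_up(samples, factor):
    """Crudely model faster playback by keeping every factor-th sample."""
    return samples[::factor]

def estimate_freq(samples):
    """Rough frequency estimate: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * SAMPLE_RATE / len(samples)

low_rumble = sine(50, 1.0)          # one second of a 50 Hz tone
fast_rumble = speed_up(low_rumble, 4)  # the same tone played four times faster

original_hz = estimate_freq(low_rumble)
fast_hz = estimate_freq(fast_rumble)
print(original_hz, fast_hz)
```

The first estimate lands near 50 Hz and the second near 200 Hz: play the rumble four times faster and it is no longer a rumble. A picture played four times faster is still the same picture.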
What all of this means is that it really takes a deep understanding of how sound works and can be manipulated for clients and directors to communicate with their audio personnel, and it truly takes years of work and study and thought to develop that level of understanding.
The end result is that in the sound world the vocabulary tends to be under-developed, especially in comparison to the visual arts. Directors have an easy time saying "brighter, darker, redder, grittier" but have an incredibly difficult time expressing parallel concepts in sound, because sound is such an abstract concept to people who don't spend all of their waking hours manipulating it. Things like dynamic range, frequency content, imaging, tempo, melody, and the effect of blank space are difficult concepts to pull out of thin air, and if your client isn't incredibly skilled in this form of communication then they are very likely to request things that are bad ideas, or to have good ideas and be unable to articulate them clearly. They're also used to being able to do things visually (like cutting many things quickly for visual impact) that can have unintended and unpleasant consequences if attempted sonically.
In the next few posts I'll do my best to break down the few stereotypical "bad" requests that we sound people get, and lay out some ways to decode the client's meaning and deliver what they're looking for.