Wednesday, May 15, 2013

Text Version ALAN R. REICH, Ph.D. Acoustic Consultant's Report

Alan R. Reich, Ph.D.
Forensic Acoustics Consultant
May 9, 2013

Richard Mantei
Assistant State Attorney
220 East Bay Street
Jacksonville,  Fl.  32202

Dear Mr.  Mantei:

May this letter serve as a partial summary of my ongoing aural and

digital acoustical examination of two 911 recordings re: State of Florida

v.  George Zimmerman.  The supplied recordings were represented as

unredacted digital copies of original digital audio recordings.  You

requested that I process and analyze two 911 Dispatch recordings, 

hereafter referred to as CALL1 and CALL3.  Immediately after receiving

them,  I archived the zip-extracted files on magnetic and laser media. 

In addition, several other digital recordings were supplied as possible

sources of voice exemplars for George Zimmerman and Trayvon Martin.  They

are described briefly in a subsequent section of this summary.

Technical Considerations Regarding the 911 Recordings

The moderate-fidelity 911 recordings presumably were the stereo output

of a 24-hour,  digital-audio recording system.  The sampling rate of the

911 recordings was only 8,000 samples/sec,  compared to the 44,100

samples/sec associated with the audio CD quality.  The frequency

bandwidth of CALL1 and CALL3 thus was estimated to be only 40 Hz to 4,000

Hz compared to an audio CD bandwidth of 10 Hz to 22,050 Hz.  However, 

this high-frequency insensitivity is not particularly troublesome in the

present investigation context,  since telephone systems are designed to

be relatively unresponsive to frequencies above 3,500 Hz.
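
The upper bandwidth limits quoted above follow from the Nyquist criterion: a sampled signal can represent frequencies only up to half its sampling rate. The minimal Python sketch below is illustrative only and is not taken from the report; the function name `nyquist_hz` is hypothetical.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Return the highest representable frequency: half the sampling rate."""
    return sample_rate_hz / 2.0


CALL_RATE_HZ = 8_000   # sampling rate reported for CALL1 and CALL3
CD_RATE_HZ = 44_100    # audio CD sampling rate

call_limit_hz = nyquist_hz(CALL_RATE_HZ)  # upper limit for the 911 recordings
cd_limit_hz = nyquist_hz(CD_RATE_HZ)      # upper limit for audio CD

print(f"911 recordings: up to {call_limit_hz:.0f} Hz")
print(f"Audio CD:       up to {cd_limit_hz:.0f} Hz")
```

The lower limits quoted in the letter (40 Hz and 10 Hz) depend on the recording hardware and cannot be derived from the sampling rate alone.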

Audio CD and 911 data-logging recordings both have 16-bit amplitude

resolution,  which divides the vertical amplitude scale of the digital

signal into 2^16 = 65,536 amplitude gradations.  Although 8-bit amplitude

resolution is attractive for situations requiring small data files,  its

vertical scale has only 2^8 = 256 amplitude gradations.  The 911-Dispatch

System's 16-bit resolution was critical to the success of this

investigation,  in which the recorded signals had a very wide dynamic

range (from very distant speech to softly whispered speech to a single

loud gunshot to several heart-poundingly loud screams).
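
The link between bit depth and the number of amplitude gradations can be checked with a few lines of Python. This is a minimal sketch, not part of the report. The dynamic-range figure it prints uses the common textbook approximation 20*log10(2^N) for an N-bit linear recording, which is an assumption added here for illustration; the function names are hypothetical.

```python
import math


def amplitude_levels(bits: int) -> int:
    """Number of distinct amplitude gradations for a given bit depth."""
    return 2 ** bits


def theoretical_dynamic_range_db(bits: int) -> float:
    """Approximate ratio, in dB, between the largest and smallest
    amplitude steps of an ideal linear recording (assumed model)."""
    return 20.0 * math.log10(2 ** bits)


for bits in (8, 16):
    levels = amplitude_levels(bits)
    dyn_range = theoretical_dynamic_range_db(bits)
    print(f"{bits:2d}-bit: {levels:6d} levels, about {dyn_range:.1f} dB")
```

Under this idealized model, each additional bit adds roughly 6 dB, which is why a 16-bit recording can hold both very soft and very loud events far better than an 8-bit one.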

General Structure and Scope of the Present Investigation

In this summary, I will try to: (a) answer general questions regarding

the nature, usefulness, and scope of the materials on the CALL1 and CALL3

recordings, (b) provide some illustrative examples of the approach that I

took to analyze selected words and phrases,  (c) discuss the complexities

and obstacles that one encounters when trying to decode highly distorted, 

emotionally driven,  overlapping speech,  and (d) provide an analytic

framework for arriving at trustworthy and perceptually stable

transcriptions and demo recordings of the most difficult-to-understand

speech on the CALL1  and CALL3 wave files.

Nature,  Usefulness, and Scope of the Material on the CALL1 and CALL3

Wave Files

CALL1 represents the digital audio record of George Zimmerman's 911 call

to report his seeing a young male whom he thought was acting

suspiciously.  The two speakers are Mr. Zimmerman and a male 911

Dispatcher.  The fidelity of CALL1 is reasonably good but the recording

has a number of puzzling acoustic anomalies.  There are numerous

instances of "nonconforming speech"  on CALL1,  e.g.,  whispered speech, 

pitch breaks,  garbled or unintelligible speech,  vocal impressions, 

tremulous speech,  and very rough voice quality.  The observed behaviors

were outside the customary speech modes of both the dispatcher and Mr. Zimmerman.

Those nonconforming segments indicate that Mr.  Zimmerman frequently

shifts or switches voice modes or speaking styles.  In his first utterance

on CALL1,  he says "best...address I can give you is

one-eleven Retreat View Circle."  During the four-second utterance,  he

shifts from whispered voice to customary voice to detective impression

back to customary voice.  At 97 seconds,  the voiced but tremulous "These

assholes,  they always get away."  is preceded by a whispered "Dear God"

and followed by a whispered "but not on me."

Mr.  Zimmerman's speech patterns periodically show measurable effects of

psychological stress (e.g., vocal tremor,  pitch breaks,  rapid speech). 

This latter finding is not to be construed necessarily as negative since

perpetrator pursuits by enforcement officers typically are accompanied

by increased levels of adrenaline and excitatory neurochemicals.  In any

case,  Mr. Zimmerman's vocal-mode switching behaviors need to be examined

in greater detail and correlated with relevant physical and behavioral

events on both recordings.

CALL3 principally represents the digital audio record of an unidentified

woman caller,  a female 911 Dispatcher,  and two males involved in a very

loud but somewhat distant confrontation just outside the woman caller's

home.  One of the male speakers appears to be George Zimmerman,  whose

idiosyncratic "voice-mode switching" behaviors,  vocal impressions, 

whispering, and tremulous voice are present on both CALL1 and CALL3.

For example,  approximately one second after the start of CALL3, Mr.

Zimmerman makes a seemingly religious proclamation,  "These shall be." 

His speech is characterized by the low pitch and exaggerated pitch

contour reminiscent of an evangelical preacher or carnival barker.  The

statement is challenging for the untrained listener to detect as it

occurs simultaneously with Trayvon Martin's loud,  high-pitched, 

distressed, and tremulous "I'm begging you." and the 911 Dispatcher's

"Nine-one-one."  Many of Mr.  Zimmerman's  "side-bar" utterances are

subject to such multiple-talker masking effects and to low signal levels.

The other male speaker was identified tentatively as Trayvon Martin from

the audio track of a digital video file present on Mr. Martin's cell

phone.  His voice is younger and he generates much of what some observers

have called screams.  If a scream is defined in operational terms as

speech with a very high pitch and loudness level,  then my findings would

support that conclusion.  The two males are engaged in a loud, 

purposeful,  mostly "turn-taking"  linguistic dialogue.  The speech

associated with the confrontation often is quite difficult to

understand,  but is amenable to individualized digital enhancement and

computer-aided transcription,  using an interactive, segment-by-segment


Example of the Analytic and Scientific Approach

It is often helpful in scientific investigations to begin at the end and

work backwards,  slogging through the inevitably complex details to

arrive at a more complete understanding of multifaceted physical or

behavioral events.  Thus, my investigation began by addressing questions

about the last "scream,"  the very high-pitched,  very loud production of

a single monosyllabic word on the CALL3 wave file.

Speech and Hearing Scientists often characterize speech as a "series of

rapid,  complex, overlapping movements that have been made audible."  The

"final cry" on the CALL3 recording is the result of very high-effort

speech movements,  but,  regrettably,  the large distance between the

highly distressed talker and the microphone of the 911 caller's phone

markedly attenuates or reduces the speech's amplitude.

Consequently,  the resulting sound pressure level of the final male

pre-gunshot utterance is 30.4 decibels (dB) below the Woman Caller's

"Yes."  When the amplitude level of the final word before the shot was

digitally gained or amplified by a factor of ten,  the word appears to be

"stop" not "help," as previously perceived by some listeners. 

Perceptually, the two monosyllabic words are quite similar and easily

confused,  especially within the context of a high-effort production.
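
The decibel figures above can be related to linear amplitude factors with a short calculation. The sketch below is illustrative and is not part of the report. It assumes that the "factor of ten" refers to linear amplitude and that levels follow the usual sound-pressure convention of 20*log10(ratio); the function names are hypothetical.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio to a level difference in dB."""
    return 20.0 * math.log10(ratio)


LEVEL_GAP_DB = 30.4   # final utterance relative to the caller's "Yes."
GAIN_FACTOR = 10.0    # amplification applied (assumed to be amplitude gain)

gap_as_ratio = db_to_amplitude_ratio(LEVEL_GAP_DB)
gain_db = amplitude_ratio_to_db(GAIN_FACTOR)
remaining_gap_db = LEVEL_GAP_DB - gain_db

print(f"30.4 dB corresponds to an amplitude ratio of about {gap_as_ratio:.1f}")
print(f"A gain of ten equals {gain_db:.1f} dB, leaving {remaining_gap_db:.1f} dB")
```

Under these assumptions, a tenfold gain raises the word by 20 dB, so it would still sit roughly 10 dB below the caller's "Yes."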

Nonetheless,  digital spectrographic examination of the word's component

frequencies supports a "stop" transcription.  On CALL3,  the first

Formant or Resonant Frequency of the /a/ vowel in /stap/ is 870 Hz,  about

10% above the adult male average.  This value is highly appropriate for a

17-year-old male who likely still had 10% more growth remaining before

reaching his "adult-male" vocal-tract length,  diameter, and tonicity. 

The resonant frequency position (largely related to oral,  nasal,  and

pharyngeal anatomy),  the fundamental frequency location (a physical

measure of pitch related principally to laryngeal anatomy),  and glottal

source spectrum (voice quality resulting from the complex, rapid

vocal-fold valving of exhaled lung air) suggest that the speaker had not

completed his hormonally-driven,  anatomical and physiological transition

into adult-male voice production.  In addition, the acoustic voice data

are consistent with the audio/video samples extracted from Mr. Martin's

cell-phone.  They are inconsistent with audio/video samples from Mr.

Zimmerman's crime-simulation video recording and from an audio recording

of a telephone conversation with his wife during his incarceration.

Taken together,  the above scientific observations of the recorded

pre-gunshot word allowed me to conclude tentatively that the word was

produced by the younger of the two male speakers,  Trayvon Martin.  The

scientific data may also explain why some witnesses have characterized

the final utterance as a "boy crying."  Of course,  the fact that the

speaker of the final word was rendered silent by the weapon's discharge

and George Zimmerman was not,  also suggests the identity of the "boy"

who was crying.

To illustrate my analytic approach to these acoustic data,  I am

attaching air pressure-versus-time waveforms and corresponding

frequency-versus-time spectrograms (KAY Pentax Multi-Speech) of the

interval that includes and closely surrounds the word "stop."  These

acoustical plots and a corresponding wave file comprise the raw speech

interval,  followed by the fully processed and enhanced version.  The

word "stop" on the raw interval is very soft on

the wave demo,  very low in amplitude on the time waveform,  and lacking

complexity on the spectrogram.

Feasibility of Using Global Enhancement Strategies on CALL1 and CALL3

To explore the feasibility of finding a less time-consuming approach to

analyzing CALL1 and CALL3,  numerous global digital-enhancement

algorithms (SONY Sound Forge Pro) were applied to the Microsoft Windows

WAV files,  with varying degrees of success.  Global enhancement

strategies are designed to improve the overall fidelity of a noisy,

distorted,  and/or unbalanced recording.  In the

present investigation,  the enhanced signals often were rendered somewhat

less noisy but the speech intelligibility was compromised or unchanged

rather than improved.

Thank you for allowing me to consult on this interesting case.  If you

have questions or need further information,  please feel free to call or



I declare under penalty of perjury under the laws of the State of New

Jersey that the foregoing is true and correct.  Dated at Oakland,  New

Jersey on May 9, 2013.
Alan R. Reich, Ph.D.
Forensic Acoustics Consultant
