
Audio Forensics: How Police Analyze Voice Recordings and Sound Evidence

How forensic audio analysis works — from voiceprints and spectrograms to the real criminal cases solved by sound evidence. The science behind what you hear.

February 7, 2026

On February 13, 2017, two teenage girls — Abby Williams and Libby German — went hiking on the Monon High Bridge Trail near Delphi, Indiana. They never came home. Their bodies were found the next day. But Libby, who was 14, did something remarkably brave in what were likely the last minutes of her life. She turned on her phone's video camera.

The footage captured a man walking toward them on the bridge. And it captured his voice — just three words: "Down the hill."

That brief, grainy audio clip became the single most important piece of evidence in the case. Indiana State Police released it publicly, hoping someone would recognize the voice. It was played on national news, shared millions of times, analyzed by forensic audio experts who tried to extract every possible detail — accent, age, speech patterns, even the emotional state of the speaker. For over five years, those three words haunted an entire community. In 2022, Richard Allen was arrested and eventually stood trial, with that audio playing a role in one of the most closely watched murder cases in recent American history.

Three words. A phone in a pocket. And an entire investigation shaped by what a microphone picked up.

This is audio forensics. And it's far stranger and more powerful than most people realize.


What audio forensics actually is (it's not "enhance")

If your entire understanding of audio forensics comes from crime shows, I need to reset your expectations. There is no magical "enhance" button. Nobody is going to take a garbled, blown-out recording and suddenly produce crystal-clear studio-quality audio. That's Hollywood, and it has done real damage to jury expectations — forensic analysts have actually complained about this, calling it the "CSI effect."

What audio forensics actually involves is the scientific examination of sound recordings for use as legal evidence. It's been accepted in U.S. courts since the 1960s, and the field has expanded enormously since digital recording became ubiquitous. There are four main branches, and each one is its own deep rabbit hole.

Authentication answers the question: is this recording real? Has it been edited, spliced, or fabricated? With deepfakes getting scarily good, this has become one of the most critical areas in forensic science. Analysts look for discontinuities in background noise, irregular edit points, and metadata anomalies that reveal tampering.

Enhancement is about making the inaudible audible. Not by inventing information that isn't there, but by using filtering, noise reduction, and spectral processing to pull a voice or sound out from under layers of interference. Think of it like cleaning a muddy painting — the image was always there, you're just removing what was covering it.

Speaker identification matches a voice to a person. This is the branch most people think of, and it's genuinely fascinating. More on this in a moment.

Sound classification identifies specific sounds — gunshots, glass breaking, vehicle types, even specific weapon calibers from the acoustic signature of the shot. The ShotSpotter system deployed in over 150 U.S. cities uses microphone arrays to detect gunfire, triangulate the location, and alert police within 60 seconds. It classifies the sound and can estimate the weapon type before officers even arrive on scene.
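The triangulation step is plain geometry: sound travels at roughly 343 m/s in air, so differences in when each microphone hears the blast constrain where it came from. Here is a minimal sketch in Python; the sensor layout, coordinates, and brute-force grid search are illustrative only, not how any deployed system actually works.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def locate_source(sensors, arrival_times, search=100.0, step=1.0):
    """Estimate a sound source's position from time-differences of
    arrival (TDOA). Brute-force grid search: at each candidate point,
    compare predicted arrival-time differences against the measured
    ones. Only differences matter, because the absolute moment of
    the shot is unknown."""
    sensors = np.asarray(sensors, dtype=float)
    t = np.asarray(arrival_times, dtype=float)
    best, best_err = None, float("inf")
    grid = np.arange(-search, search + step, step)
    for x in grid:
        for y in grid:
            dist = np.hypot(sensors[:, 0] - x, sensors[:, 1] - y)
            pred = dist / SPEED_OF_SOUND
            err = np.sum(((pred - pred[0]) - (t - t[0])) ** 2)
            if err < best_err:
                best, best_err = (x, y), err
    return best

# Simulated shot at (40, -25) metres, heard by three microphones
mics = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
true_pos = np.array([40.0, -25.0])
times = [np.hypot(*(true_pos - np.asarray(m))) / SPEED_OF_SOUND for m in mics]
print(locate_source(mics, times))
```

With clean simulated timings, the search lands back at the simulated source. Real deployments contend with echoes, timing jitter, and many more sensors, which is why production systems use far more robust estimators.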

Voiceprints: your voice is as unique as your fingerprint

Here's something that genuinely surprised me when I first dug into this. Your voice is shaped by the physical structure of your body in ways that are almost impossible to disguise. The length and thickness of your vocal cords, the size and shape of your throat, mouth, and nasal cavities — all of these create what are called formants, which are resonance frequencies unique to your anatomy. You can change your pitch. You can fake an accent. You can whisper. But your formants are determined by your bones and tissue, and they're incredibly difficult to alter.

Forensic analysts visualize these patterns using spectrograms — graphical representations that show frequency on the vertical axis, time on the horizontal axis, and intensity as color or brightness. When you look at a spectrogram of someone speaking, you're essentially seeing a visual fingerprint of their voice. The patterns of formant frequencies, the transitions between sounds, the micro-hesitations and breathing patterns — no two people produce identical spectrograms.
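You can get a feel for this with a few lines of Python. The sketch below fakes a steady "vowel" as two fixed resonances (the 700 Hz and 1200 Hz values are arbitrary stand-ins for formants, not real measurements) and then reads the dominant frequency out of each time slice of a scipy spectrogram:

```python
import numpy as np
from scipy import signal

fs = 8000  # sample rate, Hz
t = np.arange(0, 1.0, 1 / fs)

# Crude stand-in for a vowel: two strong resonances riding on
# low-level noise. The frequencies are invented for the demo.
rng = np.random.default_rng(0)
voice = (np.sin(2 * np.pi * 700 * t)
         + 0.8 * np.sin(2 * np.pi * 1200 * t)
         + 0.05 * rng.standard_normal(t.size))

# Spectrogram: frequency on one axis, time on the other, intensity
# as magnitude -- exactly the picture analysts read visually.
freqs, seg_times, Sxx = signal.spectrogram(voice, fs=fs, nperseg=512)

# The loudest frequency bin in each time slice traces the strongest
# resonance, here the one near 700 Hz.
peaks = freqs[np.argmax(Sxx, axis=0)]
print(np.median(peaks))
```

Real speech is far messier than a pair of sine waves, of course; tracing several formants at once through moving speech is what dedicated phonetics tools like Praat are built for.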

The term "voiceprint" was actually coined in the 1960s by Lawrence Kersta at Bell Labs, who was one of the first to argue that voice patterns were as individually distinctive as fingerprints. That claim was controversial at the time, and honestly, the accuracy debate continues today. Voice identification is not as reliable as DNA or actual fingerprints. Environmental conditions, emotional state, illness, and aging all affect the voice. A cold can shift your formants. Stress changes your speech patterns.

But the technology has gotten dramatically better. Modern systems using AI and machine learning can analyze hundreds of vocal characteristics simultaneously — not just formants, but pitch contour, speech rhythm, pronunciation habits, and even the way someone breathes between words. The accuracy rates reported by leading systems now exceed 95% under controlled conditions. Not perfect. But good enough that courts in dozens of countries admit voiceprint evidence, typically alongside other corroborating evidence.

I find this genuinely unsettling, if I'm being honest. Your voice is biometric data that you broadcast every time you speak. And unlike a fingerprint, you leave it behind in voicemails, phone calls, social media videos, and every Zoom meeting you've ever been on.

DetectiveOS lets you analyze audio evidence with SignalPro — isolate frequencies and uncover what's hiding in the noise.

Try SignalPro

The tools and techniques that make it work

Forensic audio analysts work with specialized software — tools like iZotope RX (the industry standard for audio restoration), Adobe Audition, CEDAR (used by police forces worldwide), and open-source tools like Audacity and Praat (a phonetics analysis program used heavily in academic forensics). These aren't toys. iZotope RX alone costs several hundred dollars and is used by Hollywood studios, but it's also a frontline forensic tool.

Spectral analysis is the bread and butter. By converting audio into a visual frequency representation, analysts can see sounds that the ear alone could never pick out of the noise. A whispered conversation behind loud music. A voice beneath traffic noise. The click of a gun's safety being disengaged. On a spectrogram, these show up as distinct frequency patterns that can be isolated and amplified.

Noise reduction has come a long way from the crude filters of decades past. Modern adaptive algorithms can identify the "noise profile" of a recording — the consistent hum of an air conditioner, the rumble of an engine, wind interference — and surgically remove it while preserving the speech. It's not magic, but the results can be startling. Recordings that sound like pure static to the human ear can yield intelligible speech after expert processing.
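The core move is easy to sketch. The toy below is a bare-bones spectral subtraction, far cruder than what tools like iZotope RX or CEDAR actually do: it learns the noise magnitude spectrum from a speech-free stretch of the recording and subtracts it from every frame.

```python
import numpy as np
from scipy import signal

fs = 8000  # sample rate, Hz
t = np.arange(0, 2.0, 1 / fs)
rng = np.random.default_rng(1)

# A 300 Hz "voice" tone buried in broadband noise. The first half
# second is noise only, which serves as the noise profile.
noise = 0.8 * rng.standard_normal(t.size)
voice = np.where(t > 0.5, np.sin(2 * np.pi * 300 * t), 0.0)
recording = voice + noise

def spectral_subtract(x, fs, noise_seconds=0.5, nperseg=512):
    """Estimate the noise magnitude spectrum from a speech-free
    stretch, subtract it from every STFT frame (flooring at zero),
    and resynthesize using the original phase."""
    f, frames, X = signal.stft(x, fs=fs, nperseg=nperseg)
    noise_mag = np.abs(X[:, frames < noise_seconds]).mean(axis=1, keepdims=True)
    cleaned_mag = np.maximum(np.abs(X) - noise_mag, 0.0)
    _, y = signal.istft(cleaned_mag * np.exp(1j * np.angle(X)),
                        fs=fs, nperseg=nperseg)
    return y[: x.size]

cleaned = spectral_subtract(recording, fs)
# Residual error against the clean tone drops sharply after processing.
print(np.mean((recording - voice) ** 2), np.mean((cleaned - voice) ** 2))
```

Forensic work layers adaptive, multi-pass methods on top of this basic idea, and every processing step gets documented so the chain of evidence holds up in court.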

But the technique that absolutely blew my mind is ENF (electrical network frequency) analysis. This is where it gets wild. The electrical power grid operates at a specific frequency — 60 Hz in North America, 50 Hz in Europe. That frequency isn't perfectly constant; it fluctuates slightly based on load and generation patterns, creating a unique signature that changes every second of every day. And here's the thing: any recording made near an electrical source — which is virtually any indoor recording — picks up this hum, even if it's inaudible to you. It's embedded in the audio like a hidden timestamp.

Forensic analysts maintain databases of ENF fluctuations going back years. By extracting the ENF pattern from a recording and matching it against the database, they can verify exactly when a recording was made — sometimes down to the minute. If someone claims a phone call happened on Tuesday but the ENF signature matches Wednesday's grid pattern, the recording is either mislabeled or fabricated. This technique can also detect edits: if two segments of a recording have different ENF patterns, they were recorded at different times and spliced together.

Yeah, really. The hum of your light bulbs is ratting you out.
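Splice detection from the hum is simple enough to demonstrate. The sketch below builds a "recording" out of two segments whose mains hum sits at slightly different grid frequencies (60.03 Hz and 59.96 Hz, values invented for the demo), then tracks the hum second by second; the jump mid-recording is the fingerprint of an edit.

```python
import numpy as np
from scipy import signal

fs = 1000  # Hz; tracking a ~60 Hz hum needs very little bandwidth

def enf_track(x, fs, nominal=60.0, window_sec=1.0):
    """Per-second estimate of the mains-hum frequency: bandpass
    around the nominal grid frequency, then average the analytic
    signal's instantaneous frequency in each window."""
    sos = signal.butter(4, [nominal - 2, nominal + 2], btype="bandpass",
                        fs=fs, output="sos")
    hum = signal.sosfiltfilt(sos, x)
    phase = np.unwrap(np.angle(signal.hilbert(hum)))
    inst = np.diff(phase) * fs / (2 * np.pi)
    n = int(window_sec * fs)
    return np.array([inst[i:i + n].mean()
                     for i in range(0, len(inst) - n + 1, n)])

# Two 10-second segments "recorded" at different times, so their hum
# sits at slightly different grid frequencies, spliced together.
# Each also carries broadband room noise.
rng = np.random.default_rng(0)
t = np.arange(10 * fs) / fs
seg_a = 0.5 * np.sin(2 * np.pi * 60.03 * t) + 0.3 * rng.standard_normal(t.size)
seg_b = 0.5 * np.sin(2 * np.pi * 59.96 * t) + 0.3 * rng.standard_normal(t.size)
spliced = np.concatenate([seg_a, seg_b])

track = enf_track(spliced, fs)
# A jump in the hum frequency mid-recording betrays the edit point.
print(track[2:8].mean(), track[12:18].mean())
```

The same per-second frequency track, matched against an archive of real grid measurements, is what lets analysts date a genuine recording.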

Gunshot acoustics is another remarkable sub-field. The sound of a gunshot carries an enormous amount of information. The initial muzzle blast, the ballistic shockwave of the bullet (which travels faster than sound with supersonic rounds), and the reflections off nearby surfaces all create a complex acoustic signature. Experts can determine the caliber of weapon, the approximate distance from the microphone, the direction of fire, and sometimes even the specific firearm model. In the 2017 Las Vegas shooting investigation, acoustic analysis of concert-goers' phone recordings helped establish the rate of fire and confirmed the shooter's location on the 32nd floor of the Mandalay Bay hotel.

Real cases where sound cracked the investigation

Audio evidence has played a pivotal role in more criminal cases than most people realize. Here are some that stuck with me.

The Delphi murders (2017)

I mentioned this at the top, but it's worth emphasizing the forensic work involved. The audio from Libby German's phone was just a few seconds long and heavily contaminated with ambient noise — wind, footsteps on the bridge, rustling fabric from the phone being in a pocket. Forensic analysts worked extensively to clean and enhance the recording. The voice analysis helped narrow the profile: adult male, likely local based on accent features, estimated age range. When Richard Allen was charged in 2022, the audio was a key exhibit, played for the jury alongside the brief video clip. Three words that took five years to lead to an arrest.

George Zimmerman / Trayvon Martin (2012)

The 911 calls from neighbors during the fatal confrontation between George Zimmerman and Trayvon Martin contained screams in the background. The central question: who was screaming for help? Two forensic audio experts testified for the prosecution that the screams did not match Zimmerman's voice. The defense challenged the methodology, and the judge ultimately ruled the voice identification testimony inadmissible, calling the scientific techniques too unreliable. This case became a landmark in the debate over voiceprint evidence admissibility and highlighted the genuine limitations of the technology. It's not a silver bullet, and courts know it.

Phil Spector murder trial (2007)

When actress Lana Clarkson was found dead in music producer Phil Spector's mansion, the 911 call made by Spector's driver became crucial evidence. Audio forensic analysis of the call — examining background sounds, timing, and Spector's audible statements — helped prosecutors reconstruct the timeline of events. The enhancement of barely audible words on the recording contributed to the prosecution's case. Spector was convicted of second-degree murder in 2009.

Background sounds as evidence

This is honestly one of the most fascinating applications. In kidnapping and ransom cases, forensic analysts have identified victims' locations from background sounds in phone calls. A specific train schedule passing at predictable times. The call to prayer from a nearby mosque, which follows a precise daily schedule. The distinctive hum of a particular factory's machinery. Bird species that only exist in certain regions. In one documented case, analysts identified the specific intersection where a phone call was made based on the pattern of traffic light changes audible in the background. Not the voice — the environment gave away the location.

Most people have no idea how much a recording can reveal. It's not just what someone says. It's everything happening around them.

Some cases in DetectiveOS hide crucial evidence in audio recordings. Can you hear what others missed?

Start a Case

The Hollywood myth vs. the real thing

We need to talk about the "enhance" problem, because it genuinely undermines public understanding of forensic science.

In movies and TV shows, someone plays a terrible recording, and a tech wizard types furiously for three seconds, hits enter, and suddenly the audio is pristine. Background noise vanishes. A whispered conversation from across a crowded room becomes perfectly clear. A voice recorded through a wall sounds like it was captured in a studio.

Not even close.

You cannot create information that doesn't exist in a recording. If a voice is completely masked by louder sounds at the same frequencies, no amount of processing will fully recover it — those sound waves are physically intertwined. You can't "zoom in" on audio any more than you can zoom in on a 50-pixel photograph and get a clear face. The data has to be there to begin with.

But — and this is important — what real forensic analysts can do is still remarkable. They can isolate frequency bands where speech exists and reduce energy in bands dominated by noise. They can use adaptive filtering to subtract consistent background interference while preserving transient sounds like speech. They can apply forensic spectral repair to fill in gaps caused by brief interruptions. They can slow audio down, shift frequencies, and apply phase analysis to separate overlapping sound sources.

The real thing is less dramatic than the movie version. There's no three-second miracle. An analyst might spend 40 hours on a single recording, trying dozens of different filter combinations, comparing results, documenting every step for the chain of evidence. But the output — a previously unintelligible recording that now yields a clear phrase, or a speaker identification that matches a suspect — can be just as case-breaking as anything Hollywood invents. It just takes patience, expertise, and a lot of math.

This is honestly terrifying if you think about it. Not the movie version, where enhancement is a magic trick. The real version, where a skilled analyst with the right tools can spend weeks pulling information out of audio you'd swear contained nothing but noise. The recording on your phone right now — the ambient sound in the room you're sitting in — contains more information than you'd ever guess.

8 forensic tools. 6 cold cases. Every piece of evidence matters — including what you hear.

Browse Cases

Hearing what changes everything

Audio forensics sits at this strange intersection of physics, linguistics, computer science, and criminal investigation. It's a field where the hum of a light bulb can prove someone is lying about when they made a recording, where three whispered words from a pocket can identify a killer, and where the ambient noise of a city street can pinpoint a location more precisely than a dropped pin on a map.

DetectiveOS includes a tool called SignalPro that lets you work with audio evidence — isolating frequency bands, filtering noise, revealing sounds buried in distorted recordings. It's simplified for gameplay, but it captures that electric moment when you strip away the interference and suddenly hear something that reframes the entire case. If you've ever wanted to know what that feels like, it's a pretty good taste.

"The ear is the only witness that cannot lie." That's not quite true — recordings can be fabricated, and ears can deceive. But the science of sound has become one of the most quietly powerful tools in criminal justice, and most people have never heard of it. Pun very much intended.

Ready to Investigate?

6 cold case mysteries. Forensic tools. Suspect interrogations. See if you can find the killer.