What is Audio Visualization?

Music visualization is a feature found in media player software and in dedicated electronic visualizers. It produces animated images that correspond to the piece of music that's playing.

These images are rendered in real-time and synchronize with the music being played.

Visual techniques range from simple ones, such as a simulated oscilloscope display, to more complex ones that combine several composited effects.

Generally, visualization techniques depend on the changes in the volume and frequency spectrum to generate the images.

The stronger the correlation between a musical track's spectral characteristics (such as amplitude and frequency) and the visual elements being generated, the more effective the visualization.

As a song plays, the visualizer reads the audio data in extremely short time slices (usually spanning less than 20 milliseconds each).

The visualizer then performs a Fourier transform on each slice, extracting the frequency components and driving the visual display from that frequency information.
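
A rough sketch of that per-slice step (using NumPy, and assuming the slice is already available as an array of samples recorded at 44.1 kHz) might look like this:

```python
import numpy as np

SAMPLE_RATE = 44100   # assumed sample rate in Hz
SLICE_SIZE = 1024     # about 23 ms of audio at 44.1 kHz

def analyze_slice(samples):
    """Return (frequencies in Hz, magnitudes) for one short slice of audio."""
    # Window the slice to reduce spectral leakage, then take a real-input FFT.
    windowed = samples * np.hanning(len(samples))
    spectrum = np.fft.rfft(windowed)
    magnitudes = np.abs(spectrum)
    # The frequency (in Hz) that each FFT bin corresponds to.
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / SAMPLE_RATE)
    return freqs, magnitudes
```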

The programmer is responsible for how the visual display responds to the frequency info.

To update the visuals in time with the music without overloading the device, the graphics routines have to be extremely fast and lightweight.

Music visualizers used to (and some still do) modify the Windows color palette directly to get the most awesome effects.

One of the trickiest parts of music visualization is that visualizers based purely on frequency components usually don't respond very well to the beats of the music, such as percussion hits and similar sounds.

However, you can write more responsive (and often more complicated) visualizers that combine the frequency-domain information with detection of the amplitude spikes in the audio that correspond to percussion hits.

Basically, you take a chunk of the audio data and analyze its frequency content. You then use that data to modify some graphic, which is redrawn over and over.

The most obvious way to run this frequency analysis is with an FFT, but you can also use simpler tone detection with lower computational overhead.
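
One cheap alternative, shown here only as an illustration (no specific method is named above), is to estimate the dominant pitch of a slice by counting zero crossings, which works tolerably well for clean, single-tone content:

```python
import numpy as np

def zero_crossing_pitch(samples, sample_rate=44100):
    """Very rough pitch estimate: each cycle of a tone crosses zero twice."""
    crossings = np.count_nonzero(np.diff(np.sign(samples)) != 0)
    duration = len(samples) / sample_rate
    return (crossings / 2.0) / duration   # estimated frequency in Hz
```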

For example, you might write a routine that repeatedly draws a series of shapes arranged in a circle, coloring the shapes according to the dominant frequencies and sizing them according to the volume.
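
A minimal sketch of that idea, assuming the current slice of samples is already available and using a hypothetical draw_circle(color, radius) call as a stand-in for whatever graphics library you're using:

```python
import colorsys
import numpy as np

SAMPLE_RATE = 44100   # assumed sample rate in Hz

def update_visual(samples, draw_circle):
    """Map a slice's dominant frequency to color and its volume to size."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / SAMPLE_RATE)
    dominant = freqs[np.argmax(spectrum)]        # strongest frequency in Hz
    hue = min(dominant / 5000.0, 1.0)            # map 0-5 kHz onto the color wheel
    volume = np.sqrt(np.mean(samples ** 2))      # RMS volume of the slice
    color = colorsys.hsv_to_rgb(hue, 1.0, 1.0)   # RGB triple, each channel 0..1
    radius = 20 + 200 * volume                   # base size plus volume-driven growth
    # draw_circle is a placeholder for your graphics library's actual draw call.
    draw_circle(color=color, radius=radius)
```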

Wait a second, what's an FFT?

When you search for FFT or “Fast Fourier Transform,” you’ll get tons of complicated math equations and graphs.

When you’re building a music visualizer, the most common way the music will be represented digitally is through a standard waveform.

This shows how loud the song gets over time: that's why we see big spikes wherever the music is loud, and these shrink as the sound gets quieter.

The representation of the amplitude (loudness) over time is referred to as the time domain.

However, this only tells us how loud the song is at specific points in time. To learn more and identify the individual frequency components, we use a Fourier transform.

So rather than representing audio in the time domain, a Fourier transform lets us represent it in the frequency domain. Instead of seeing only amplitude over time, we see amplitude per frequency.
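
A quick way to see the difference is with a synthetic signal (a sketch using NumPy): a mix of a 440 Hz tone and an 880 Hz tone is just a wiggly waveform in the time domain, but it shows up as two clear peaks in the frequency domain.

```python
import numpy as np

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate                 # one second of samples
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# Time domain: amplitude at each instant (what a waveform display shows).
# Frequency domain: amplitude at each frequency, obtained via the FFT.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

top_two = sorted(freqs[np.argsort(spectrum)[-2:]])
print(top_two)   # roughly [440.0, 880.0]
```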

Different Audio Representations

Audio data can be processed in multiple ways. The simplest is to display it as a rapidly changing waveform and then apply some graphical effects to that.
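
One common way to do that (sketched here with NumPy; the width is just an assumed screen width in pixels) is to collapse each buffer of samples into a min/max pair per pixel column and draw a vertical line between the two values:

```python
import numpy as np

def waveform_columns(samples, width=800):
    """Collapse a buffer of samples into (min, max) pairs, one per pixel column.

    Assumes the buffer contains at least `width` samples.
    """
    columns = np.array_split(np.asarray(samples), width)
    return [(float(col.min()), float(col.max())) for col in columns]
```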

In the same manner, quantities like volume can be calculated and passed along to feed a graphics routine without needing a Fast Fourier Transform to get frequencies: simply calculate the average amplitude of the signal.
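
Volume, for instance, can be computed straight from the time-domain samples. A sketch (both the mean-absolute and the RMS versions are common choices):

```python
import numpy as np

def volume(samples):
    """Simple loudness estimates that need no FFT at all."""
    average_amplitude = np.mean(np.abs(samples))   # mean absolute amplitude
    rms = np.sqrt(np.mean(np.square(samples)))     # root-mean-square level
    return average_amplitude, rms
```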

Converting the data to the frequency domain with an FFT allows you to use more complicated and advanced effects, including things like spectrograms.
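
A spectrogram is essentially many of those per-slice FFTs stacked side by side. SciPy can produce one in a single call; here is a sketch using a synthetic one-second tone in place of real audio:

```python
import numpy as np
from scipy.signal import spectrogram

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
samples = np.sin(2 * np.pi * 440 * t)   # one second of a 440 Hz tone

# Each column of `power` is one windowed FFT slice; each row is a frequency bin.
freqs, times, power = spectrogram(samples, fs=sample_rate, nperseg=1024)
print(power.shape)   # (frequency bins, time slices)
```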

It's admittedly quite tricky to detect even things that seem like they should be obvious, such as the timing of drum beats or the pitch of notes, from the FFT output.

Generally, reliable beat detection and tone detection are quite tough, especially when you want them in real time.

You can see a simple representation of amplitude-based audio detection on this site. Just play your favorite song and see it visually represented.
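
A minimal sketch of that kind of amplitude-based beat detection (the history length and threshold below are illustrative guesses, not tuned constants): keep a short history of slice energies and flag a beat whenever the current slice jumps well above the recent average.

```python
import numpy as np
from collections import deque

class BeatDetector:
    """Flags a beat when a slice's energy spikes above the recent average."""

    def __init__(self, history=43, threshold=1.5):
        self.energies = deque(maxlen=history)   # roughly one second of ~23 ms slices
        self.threshold = threshold

    def is_beat(self, samples):
        energy = float(np.mean(np.square(samples)))
        beat = (len(self.energies) > 0 and
                energy > self.threshold * np.mean(self.energies))
        self.energies.append(energy)
        return beat
```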