Synchronizing Bicycle Video
by John "Kitsch 'n' Sync" Allen

When shooting video using multiple cameras, you must synchronize their output so you can cut back and forth between them, or show a picture in picture. In my bicycle videos, I use forward-facing and rearward-facing cameras at the same time, and cameras mounted on different bicycles. Synchronizing 2D video is relatively easy. 3D video and multi-channel sound require tight synchronization and special equipment or techniques.

Connected cameras

Some of the newer action cameras can be controlled, more than one at a time, over Wi-Fi using a dedicated controller or a smartphone. Control may extend beyond simultaneous starting and stopping, to multiple views on a screen. The range of Wi-Fi is limited, though: it will work with cameras on a single bicycle, controlled by the rider, or with people riding within a few meters of one another. If you are in the market for cameras, you might check for this feature.

A high-tech way to synchronize video and audio recordings is called time code. Time code traditionally uses a timing signal sent over cables and recorded along with the audio and video. Time code can control the playback speed of analog audio tape, and can fast-forward and rewind tapes to synchronize them with each other.

Time code also may be sent and received using wireless devices, but as of now, no consumer-grade action camera supports this feature.

Professional-grade equipment which derives time code from GPS satellite signals is available. It only works where there is a view of the open sky, and is expensive and complicated to use. GPS can synchronize any number of cameras, anywhere on Planet Earth. GPS can also identify the location of the shoot to within a few meters, and the direction in which a camera is pointing, if it is moving.

Some of the newer action cameras record a GPS track and time stamp, which may be used to align recordings in post-processing. This is generally a high-end feature.

I expect that a smartphone app will be offering GPS timing sooner rather than later, but smartphone cameras are rather limited, and this feature would be more useful in a dedicated camera.


Put Your Hands Together

How do we synchronize when shooting on a budget, lacking time code?

We fall back on the classic synchronizing technique used in film: a marker visible in the image and audible in the soundtrack. Bicycle video takes are generally rather long – you start recording, and then ride – so setting this marker isn't much of an inconvenience.

A traditional wooden clapperboard

The clapperboard – with its chalkboard to indicate a take number – is the time-honored time-alignment tool in the motion-picture industry. The clapperboard provides a timing reference at the start of a take. The speed of all the cameras is synchronized to the power-line frequency. Once clips are aligned on the editing table, they remain in sync.

As I use video cameras which record audio, there's no need for a clapperboard's written identification of the take. I make a verbal announcement instead. For synchronization, I use a hand clap, visible to every camera and recorded in every soundtrack. If I am using two cameras on my helmet to get front and rear views, I back up to a window so the rear-facing camera shows the reflection, or tap the side of the helmet to shake both images and make a sound.

In the short video clip above, the main image is from a camera on my helmet. The picture-in-picture is from a camera duct-taped to the rear rack of a friend's bicycle. My camera is looking forward at him and his is looking back at me. You see and hear the hand clap, and then hear my short announcement identifying the take. Here is a still of the frame of the video where my hands come together:

Hand clap still

The timing of all modern video cameras and digital audio recorders is set by quartz-crystal electronic clocks. Timing standards are very precise. Cameras will typically run for several minutes without a noticeable loss of synchronization. We can thank documentary filmmaker Richard Leacock for introducing the use of crystal control. He used it with modified 8mm film and audio cassette recorders at the Massachusetts Institute of Technology in the early 1970s -- but there's no need any more to modify equipment.

You should run a test on your cameras to determine how well the timing between them agrees. You do this by recording a hand clap at the beginning and end of a long clip and counting the number of frames the timing drifts, if at all. I have had to add a duplicate frame once every 6 minutes to keep two of my cameras, a GoPro and a Contour, in sync.
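The arithmetic of such a drift test can be sketched in a few lines of Python. This is only an illustration: the function name `drift_report` and the frame numbers are invented for the example, and it assumes you have already noted the frame at which each hand clap appears in each camera's clip.

```python
def drift_report(start_a, end_a, start_b, end_b, fps=30.0):
    """Return (drift in frames, seconds of footage per frame of drift).

    start_*/end_* are the frame numbers at which the first and last
    hand claps appear in the clips from cameras A and B.
    """
    span_a = end_a - start_a            # frames camera A recorded between claps
    span_b = end_b - start_b            # frames camera B recorded between claps
    drift_frames = span_a - span_b      # positive: B recorded fewer frames than A
    if drift_frames == 0:
        return 0, None                  # no measurable drift
    seconds_per_frame = (span_a / fps) / abs(drift_frames)
    return drift_frames, seconds_per_frame


# Made-up readings from an 18-minute test clip at a nominal 30 frames per second.
drift, interval = drift_report(start_a=120, end_a=32520, start_b=95, end_b=32492)
print(f"Drift: {drift} frames, one frame every {interval / 60:.1f} minutes")
```

With these invented numbers, camera B comes up 3 frames short over 18 minutes, so one duplicate frame roughly every 6 minutes would keep it in step with camera A.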

Cameras generally record 4 gigabyte files, so their memory cards can use FAT32 data format, which is readable on a computer running MacOS, Windows or Linux. This means that every 10 minutes or so with HD 1080 video at 30 frames per second (longer, or shorter time, for other formats), a file must be saved and a new one started. Some cameras overlap the files or start the new one without skipping a frame. Others leave a gap of a few seconds. If you have a camera which leaves gaps, you are going to have to synchronize clips after the first one with additional hand claps or on random audio or video events -- which can be frustrating and time-consuming. My one camera which has this annoying deficiency is the Git2 Pro, otherwise a very capable camera.

One example of a random event is the sight and sound of a car rolling over a manhole cover. Footsteps also are good. Aligning on speech is harder, though the sound of the letter P begins when the mouth opens suddenly, and the waveform in the soundtrack shows a sudden increase in volume. The letters B and M are less good, because the vocal cords are vibrating before the mouth opens, but may be usable. If you do much aligning on speech, you will find that you are beginning to learn to read lips!

In post-processing, I align the clips as described below.


Some Technical Background

Now, for a bit of technical background to help you use the hand-clap technique when editing video.

A video consists of a series of frames (individual images) shown one after another quickly enough to create the illusion of motion. The rate is nominally 30 frames per second (actually, 29.97 for color signals) in countries which use 60 Hertz (cycles per second) AC power; the rate is 25 frames per second in countries which use 50 Hertz AC power. The frame rate is nominally ½ the power-line frequency so that any effect of imperfect power-supply filtering stays in the same place, or nearly, from one frame to the next, rather than causing annoying flickering or wobbling.

Video editors work only in one-frame increments of 1/25 or 1/30 second. Special software can adjust audio more precisely, and that may be desirable for some purposes. I'll discuss that later.

Some digital video storage formats align the audio to each frame. Other formats use keyframing, where the audio is only tacked to the video once every few frames. Between keyframes, these formats store the differences between images, resulting in smaller digital files. If a video clip doesn't start with a keyframe, the audio and video may start slightly out of step – and then they stay that way.

All of the better video editing software applications give you a way to realign tracks to adjust timing. To do this, you need a clear cue in every track -- preferably, your hand clap.

Speed-of-Sound Issues

For good synchronization on an audible cue, all microphones need to be within a few feet of the cameras. Sound travels only about 1100 feet (340 meters) per second, so a distant camera or microphone will record delayed audio – like when you see a dribbled basketball out of sync with its sound. A distance of only 30 feet or 10 meters will throw synchronization off by about one video frame. Align a distant camera on the visual cue, rather than the audible one. An additional hand clap near the distant camera will align its audio and video tracks.
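To get a feel for the numbers, here is a small Python sketch that converts a microphone distance into a delay measured in video frames. It uses the round figure of 1100 feet per second from the text; the function name and the sample distances are made up for illustration.

```python
SPEED_OF_SOUND_FT_PER_S = 1100.0  # round figure; the real value varies with temperature


def sound_delay_frames(distance_ft, fps=30.0):
    """Delay of the sound, in video frames, for a source distance_ft away."""
    delay_s = distance_ft / SPEED_OF_SOUND_FT_PER_S
    return delay_s * fps


for distance in (3, 30, 100):
    print(f"{distance:>4} ft: {sound_delay_frames(distance):.2f} frames at 30 fps")
```

At 30 feet the delay works out to a little over 0.8 frame at 30 frames per second, which rounds to the one-frame error described above.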

Aligning while Editing

When editing, I first turn on the application's “audio scrubbing” function so I can hear the hand clap or other cue. I check the hand clap to see whether the audio and video from each camera are in sync. If not, I detach the audio so it is in a separate track. I zoom in on the editor's timeline until I can move back and forth one frame at a time and stop the motion at the hand clap. I set a marker for the hand clap separately for each audio and video track, then slide the tracks until all the markers align.

Moving video forward one frame at a time in Pinnacle Studio, the editing suite I use, plays the audio for the frame that is being displayed. So, as I reach the same frame where the hands come together, I hear the sound of the clap, and see its sharp peak in the audio waveform display.
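If you prefer to locate the clap numerically rather than by ear, the idea can be sketched as below. This is not how Pinnacle Studio works internally; it is a minimal illustration that assumes the audio has already been loaded as a list of sample values and that the clap is the loudest thing in the clip, which will not hold if a truck passes by. The names `clap_sample` and `sample_to_frame` are invented here, and the audio is synthetic.

```python
import math


def clap_sample(samples):
    """Index of the loudest sample, taken here as the instant of the clap."""
    return max(range(len(samples)), key=lambda i: abs(samples[i]))


def sample_to_frame(sample_index, sample_rate=48000, fps=30.0):
    """Zero-based number of the video frame during which the sample occurs."""
    return int(sample_index * fps / sample_rate)


# Two seconds of quiet synthetic background with one sharp spike as the "clap".
sample_rate = 48000
samples = [0.01 * math.sin(i * 0.05) for i in range(sample_rate * 2)]
samples[60000] = 0.9

peak = clap_sample(samples)
frame = sample_to_frame(peak, sample_rate)
print(f"Clap at sample {peak}, {peak / sample_rate:.3f} s, video frame {frame}")
```

Running the same search on each camera's soundtrack gives one frame number per clip; the difference between them is how far to slide one track against the other.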

The image below is of the tracks from the video example above, in Pinnacle Studio 12. In the thumbnail images, you can see the first frame of each video. The lower thumbnail holds an icon indicating that it contains a picture in picture. In the orange line near the top, each tick represents an individual frame of video. The cursor is aligned over four markers, which I placed at the hand clap frame in each video and audio track, then slid left or right until they lined up. The video in the lower track is grayed out because it is locked, so I can move the audio independently of it.

Hand clap editing in Pinnacle Studio v. 15

I align a take this way, save the resulting file, and then when I am editing, I save under a new filename so I still have the original full-length take to re-use.

Note: Pinnacle 12 is an older version. In Pinnacle Studio 16 and up, you can insert markers in an individual clip if it is open in the Effects Editor. Markers in the timeline will stay at the same time, not move with clips. You may also fake a marker by trimming or splitting a clip at the hand clap. Editing tricks will be different in other apps.


How Close Does Timing Have to Be?

You might ask: with timing only to the nearest 1/25 or 1/30 second, will there be an echo? No! The sense of hearing and the frame rate of video are nicely matched. Humans can't hear an echo unless it is delayed by 1/30 of a second or more, a phenomenon called the “precedence effect” or the “Haas effect”, after the scientist who identified it. Two clips aligned to the nearest frame will be out of sync by no more than ½ frame, 1/50 or 1/60 of a second. The error may either increase or decrease slowly over time if the camera speeds are very slightly different. If the timing of two cameras is off by 1/2 frame, you might choose the alignment so drift decreases the error as the take progresses.
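The half-frame limit follows from simple rounding, as this short Python sketch shows. The function name `snap_to_frame` and the offsets are invented for the example; the offset stands for the true time difference between two clips, which an editor can only correct in whole frames.

```python
def snap_to_frame(offset_s, fps=30.0):
    """Round a time offset to whole frames; return (frames, leftover error in seconds)."""
    frames = round(offset_s * fps)
    residual_s = offset_s - frames / fps
    return frames, residual_s


half_frame = 0.5 / 30.0
for offset in (0.010, 0.020, 1.2345):
    frames, residual = snap_to_frame(offset)
    print(f"offset {offset:.4f} s -> {frames} frames, error {residual * 1000:+.1f} ms")
    assert abs(residual) <= half_frame + 1e-12  # never worse than half a frame
```

Whatever the true offset, the leftover error stays within half a frame, about 17 milliseconds at 30 frames per second.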

If you are especially picky about timing, then having one of your two cameras (not necessarily both) running at 50 or 60 frames per second allows you to bring the accuracy of synchronization to 1/100th of a second or less.

Sometimes, synchronization requires conversion of the video format due to peculiarities of the camera, editing software, or both. The Aiptek digital video recorder with my (early) camera recorded at the standard 29.97 frames per second, but Pinnacle Studio reported it as 25. Similarly, in Pinnacle Studio, an Insight POV HD helmet camera reports speeds slightly different from the 29.97 at which it records. Clips from these cameras do not stay in sync with others. Converting the files into another format solves the problem, and then clips will stay in sync. I use AVS4YOU software and convert to AVI with the XVid/DivX Mpeg-4 codec, which keeps the file sizes down -- though this codec uses keyframing, so I sometimes will have to realign the audio to the video.

Closer Alignment for 3D Video and Multi-Channel Audio

Two applications call for closer synchronization: 3D imaging and multi-channel (stereo or surround) sound. A very small timing error can seriously disturb either one.

Let's say that a pair of cameras recording 3D are panning from left to right. Then if the timing of the right-eye camera is slightly late, its image will be displaced to the left and objects will appear closer -- and so on, with the opposite delay and/or direction of panning. If the cameras are panning up or down, the images will become displaced vertically, resulting in eyestrain and/or failure of the images to fuse. The problem also occurs with objects in motion in the picture.

Timing must be very precise -- to a small fraction of a millisecond -- to avoid these problems. As a practical matter, the same timing source must initiate the scanning of each frame in both cameras.
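A rough calculation shows why. The Python sketch below estimates how far one eye's image shifts, in pixels, for a given panning rate and timing error. It is a simplification under stated assumptions: it spreads the field of view evenly across the image width, ignoring lens distortion, and the panning rate, field of view, and image width are illustrative numbers, not measurements from any particular camera.

```python
def pan_displacement_px(pan_deg_per_s, timing_error_s, fov_deg, width_px):
    """Approximate horizontal image shift caused by a timing error during a pan."""
    shift_deg = pan_deg_per_s * timing_error_s   # how far the camera turned in that time
    return shift_deg * (width_px / fov_deg)      # assumes a uniform number of pixels per degree


# Hypothetical setup: 30 degrees per second pan, 120-degree lens, 1920-pixel-wide image.
for label, error_s in (("one frame", 1 / 30), ("1 ms", 0.001), ("0.1 ms", 0.0001)):
    shift = pan_displacement_px(30, error_s, 120, 1920)
    print(f"timing error of {label}: about {shift:.2f} pixels")
```

Under these assumptions a whole-frame error shifts the image by about 16 pixels, and even a 1-millisecond error moves it by about half a pixel; only at a fraction of a millisecond does the shift become negligible.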

GoPro used to offer a 3D system with a case for two cameras and a wired connection between them. This held them tightly in sync. There is now a 3D attachment for GoPro cameras which uses mirrors, totally avoiding the timing issue because it records a split image using one camera.

Now let's consider multi-channel audio. If you are feeding the right and left channels from each recorder to the right and left channels of a stereo mix, then each stereo pair keeps its timing. You need to adjust levels to select which stereo pair you will use at a particular time. Even with nicely-synchronized video, though, two pairs of stereo microphones recording at the same time can create odd stereo perspectives or echoes if one pair of microphones is much farther from the sound source than the other. As already mentioned, if the sound source visible in the image is distant, you may deliberately advance the audio so it appears in sync with the video. That makes sense, for example, for a telephoto shot of a basketball being dribbled. If realism is important, though, beware! In many war documentaries, a mortar shell or rocket lands in the distance, and the dubbed sound of the explosion occurs at the same instant as the flash. It Just Doesn't Happen That Way!!!

Surround sound from a stereo microphone on each of two bicycles one behind the other can be very compelling. A front/rear bicycle spacing of 20 feet or so will result in a good surround image in a typical listening room. Shadowing the microphones with the riders' bodies -- front microphone ahead, rear microphone behind -- will increase the front/rear separation. You then of course need to take extra trouble to screen the front microphones from the wind. Use manual level control when using multiple recorders, and adjust levels later. Automatic level control that acts differently on different channels of a recording causes shifts in the auditory image.

Frame-by-frame timing is not accurate enough to establish a stereo or surround-sound image, and slight speed differences between recorders also become a problem. The simplest solution is to use a single, multitrack recorder. You might use a wireless microphone on one bicycle, connected to a multitrack recorder on the other, but then signal dropouts and interference could be a problem. Many small digital audio recorders can record four channels at once, for full surround.

It's more cumbersome, but possible, to export soundtracks to an audio editing application to adjust their speed. The free audio editing application Audacity -- available for Windows, the Macintosh and Linux -- can do this. You will want a hand clap or other similar cue at both the beginning and the end of a take the first time you adjust speeds to match one another. The hand clap should be at the same distances from the microphones both times, to avoid timing errors.

Calculating speed adjustment in Audacity

The Audacity screen shot at the right is from near the end of an 8-minute-long clip. A hand clap near the beginning has already been aligned so that it falls at the same time in both stereo tracks. Alignment is easiest if you delete audio up to the exact time of the hand clap in each track. Find your hand clap (or another transient sound) near the end of the tracks, and select the time span between its occurrences, as in the selection shown in the image. The numerical readout at the bottom of the image gives the time at the start and end. It is easiest to set the time display to read in samples, avoiding the need to convert from minutes and seconds to a single number. Some simple math tells us the ratio by which the speed of the lower track must be adjusted:

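$$\frac{\text{Selection End} - \text{Hand Clap Time}}{\text{Selection Start} - \text{Hand Clap Time}}$$

or, equivalently, when the readout shows the length of the selection rather than its end:

$$\frac{\text{Selection Start} + \text{Length} - \text{Hand Clap Time}}{\text{Selection Start} - \text{Hand Clap Time}}$$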

Adjust the speed using the Change Speed effect in Audacity. Save the result from your computer's calculator applet for later use (a text file is fine for this purpose) and also paste the result into the "Percent Change" field in Audacity's Change Speed dialog box -- but then scroll left to make sure you have the correct number, because the dialog box will at first display only trailing digits. Readjust the sync if needed and trim the start of each track to the same time, so they will align themselves identically to frames of video.
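The calculation can also be done in a few lines of Python instead of a calculator applet. Treat this as a sketch under assumptions: the sample counts are invented, the track being adjusted is taken to be the one whose closing hand clap falls later (at the selection end), and the conversion from ratio to percent, (ratio - 1) x 100, is my reading of what a percent-change field expects rather than something stated above.

```python
def speed_ratio(clap_time, selection_start, selection_end):
    """Ratio by which to speed up the track whose final clap is at selection_end."""
    return (selection_end - clap_time) / (selection_start - clap_time)


def percent_change(ratio):
    """Express a speed ratio as a percent change (assumed convention: 1.0 -> 0 %)."""
    return (ratio - 1.0) * 100.0


# Invented readings, in samples at 44,100 per second: the opening clap is at
# sample 0 in both tracks; about 8 minutes later, the closing claps are
# 441 samples (10 milliseconds) apart.
ratio = speed_ratio(clap_time=0, selection_start=21_168_000, selection_end=21_168_441)
change = percent_change(ratio)
print(f"ratio {ratio:.10f}, percent change {change:.7f}")
```

If the track you are adjusting has the earlier clap instead, swap the start and end values, which gives a ratio just under 1 and a negative percent change. The many decimal places matter: the correction here is only about two thousandths of one percent.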

Though Audacity can display multiple tracks, it can only output one pair of stereo channels at a time. Export each track separately as a .WAV file (mute the other track when exporting). Alignment of the front and rear recordings within one or two 1000ths of a second is easy to achieve, only a fraction of the delay due to the speed of sound from one bicycle to the other.

Import the tracks into your video editor, aligning their start to the same frame. Sounds which the front loudspeakers produce first will be heard at the front, and vice versa. As already described, no echo will be audible if all the speakers produce the same sound within about 30 milliseconds (one video frame) of each other.

Once you have determined the drift between two cameras, you probably can apply the same correction to additional clips from the same cameras, so you then need only a hand clap at the start of each clip.

This procedure will work with digital recordings. Analog recorders' speed stability is not good enough. It is generally good enough, though, to align an analog audio track to video.


The Garmin VIRB 360 camera

The Sony AS100V helmet camera

The Mobius M800 action camera/dashcam

The Contour HD1080 helmet camera

The GoPro Helmet Hero HD helmet camera

Synchronizing multi-camera shoots

Image stabilization for bicycle video

VirtualDub video processor

Image stabilization plugin for VirtualDub

Deinterlacing in VirtualDub

Saving to MP4 in VirtualDub

Using VirtualDub to improve video from VHS tape

Pinnacle and Avid editing software

Five Ways to Create a Picture in Picture in Pinnacle Studio Ultimate

Pinnacle overwrites voiceovers...

Techmoan Web site -- reviews of action cams



Copyright © 1997, 2007 Sheldon Brown


Last Updated: by John Allen