Stampede II is a real-time program for granular sound processing. It allows both drastic and subtle manipulations of recorded and live sounds.
The program provides access to all functionality via a single-window graphic user interface which can be manipulated during re-synthesis. Snapshots of the program state can be captured and recalled at any time; a bank of snapshots can be refined by text-editing into a score that is rendered offline by Stampede II to circumvent real-time processing limitations.
This manual is organised as follows: chapter 1 provides background knowledge that helps in understanding Stampede II; chapter 2 covers parametric access; chapter 3 serves as a reference for the user parameters.
No theoretical scholarship is necessary to achieve satisfying results with Stampede II; however, its potential can be exploited more fully once the basic principles are thoroughly understood.
Re-synthesis Methods employed by Stampede II
Stampede II provides implementations of various re-synthesis techniques in its four re-synthesis modes: quasi-sync, phase-sync, pitch-sync and fix-sync. These modes can be toggled during re-synthesis to find the mode that best suits the material. All modes allow independent parametric control over a sound's speed, pitch, dynamics and stereo positioning. The pitch-sync and fix-sync modes permit independent relative shifting of a sound's pitch and its spectral envelope; fix-sync mode allows absolute pitch specification; and all modes provide various referencing techniques for modulation and randomization of the granulation process.
In the following, an introduction to each re-synthesis method precedes a discussion of its implementation in Stampede II.
Granular sound processing synthesizes new sound by re-assembling and overlaying excerpts taken from the time-domain representation of a source sound (Truax 1994). Prior to re-synthesis, the excerpt-waveforms may be subjected to any sort of modification (filtering, sampling-rate conversion, etc.). Scaling the waveform amplitude by a local envelope is necessary to avoid waveform discontinuities during re-assembly. The duration of a sound grain constructed in this manner is a key parameter; it is typically chosen to be long enough to expose minute spectral properties, yet short enough to suppress temporal structure.
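As a minimal sketch of this grain construction (the function name and the choice of a Hann window as the local envelope are assumptions of this example, not Stampede II's internals):

```python
import math

def make_grain(source, onset, duration):
    """Extract an excerpt from the source waveform and scale it by a
    smooth local envelope (a Hann window here) so that the grain
    starts and ends at zero amplitude, avoiding discontinuities
    during re-assembly."""
    excerpt = source[onset:onset + duration]
    n = len(excerpt)
    return [s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
            for i, s in enumerate(excerpt)]

# A grain cut from a constant signal fades in from and out to zero.
grain = make_grain([1.0] * 1000, 100, 64)
```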
Quasi-synchronous granular synthesis organizes grains in voices. Each voice outputs grains with a variable synthesis period, which determines the time delay between the onsets of successive grains. The ratio of the synthesis period to the grains' duration determines the amount of grain overlap in a voice. If the grain duration is smaller than the synthesis period, grains do not overlap in a voice; consequently, this regular interruption of the audio stream produces amplitude modulation (AM) artefacts. Random deviation of the nominal synthesis period may be applied to bias the actual onset-delay between successive grains.
Synthesis is carried out in parallel voices to increase grain density. A single set of user parameters controls all voices, which behave identically in the absence of random deviations; hence the term quasi-synchronous. The modulation-depth parameters determine the amount of deviation from the mean when modulation is applied; accordingly, each voice then behaves differently. This efficiency of control, and the perceptual meaningfulness of the user parameters, are among the prime reasons for the popularity of granular methods.
Arbitrary alterations of time-flow can be achieved by appropriately choosing the positions from which successive grains are extracted. Suppose an analysis period determines the temporal expanse between the onsets of the source-sound excerpts used for successive grains; in synthesis, the ratio between the analysis period and the synthesis period then corresponds to a speed factor. If playback at half speed is desired, the analysis period is half the synthesis period (figure 1). Logically, sound deceleration involves repetition of excerpts, while acceleration involves skipping excerpts. The granulation process has its own periodicity, one completely unrelated to the periodicity of the signal.
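The speed-factor relation can be sketched as follows (a hypothetical helper, not Stampede II code):

```python
def extraction_onsets(num_grains, synthesis_period, speed):
    """Source positions (in samples) for successive grains: the
    analysis period is speed * synthesis_period, so the ratio of
    analysis period to synthesis period is the speed factor."""
    analysis_period = speed * synthesis_period
    return [round(k * analysis_period) for k in range(num_grains)]

# At half speed the source positions advance half as fast as output
# time, so neighbouring grains re-use overlapping source material.
onsets = extraction_onsets(4, 1000, 0.5)  # [0, 500, 1000, 1500]
```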
Predictably, this operation produces artefacts in the re-synthesized sound, determined by the speed factor, synthesis period, envelope shape, and degree of grain overlap. For instance, when the synthesis period is small relative to the signal period, as in the example from figure 1, interference between the two periods results. In time-stretching applications, long synthesis periods give rise to an echo effect which results from the repetition of transient source-sound material. (Behles, Starke, and Roebel 1998) provides a straightforward analytic rendering of the process. It reveals a very regular spectral structure of the artefacts, which makes this method valuable for goal-driven artistic use. In fact, by deliberately choosing "wrong" parameter values, this method becomes a dedicated synthesis tool. When "clean" acceleration or deceleration is explicitly desired, these artefacts are perceived as distortion.
An awkwardness of this approach is that the speed parameter and the synthesis period (controlled by the grain frequency parameter in Stampede II) never act as orthogonal controls; both parameters interact to describe an inharmonic spectrum, and neither relates perceptually to a unique timbral property.
Stampede II's quasi-sync mode implements all these features. Additionally, sampling-rate conversion can be applied to achieve coupled pitch and timbre transposition (controlled by the pitch / formant shift parameter); the grain envelope shape is determined by the grain fade parameter, and the ratio of grain duration to synthesis period by the grain width parameter. Extremely small synthesis periods allow the creation of dense disharmonic spectra.
High-quality audio time-stretching or compression according to (Roucos and Wilgus 1985; Jones and Parks 1988) addresses the problem of "out of phase" source-sound excerpts in overlapping grains (figure 1). An in-phase overlap with the preceding grain is achieved by shifting the extraction onset within the source-sound. Precise calculation of an adequate shift is achieved with cross-correlation, which is computationally expensive.
A somewhat different approach has been taken for Stampede II's phase-sync mode. The source signal is analysed once for periodicity; this produces a list of pitchmarkers, which reference each pitch period (the inverse of the local fundamental frequency). The onset of the source-sound excerpt is quantized to the closest pitchmarker (figure 2). The local synthesis period is modified so that the grain in progress will have used an integer number of pitch periods when the next grain begins. Thus, where the source signal is periodic, its period is taken into account by the granulation process. Where there is no period, the pitchmarkers are distributed randomly and re-synthesis occurs without artificial periodicity. This does not imply universal accommodation of all kinds of audio signals. Source signals with inharmonic spectra or mixtures of periodic signal components can be modified more accurately with frequency-domain methods, like the digital phase vocoder (Portnoff 1976) or the McAuley-Quatieri method (McAuley and Quatieri 1986). However, with judicious adjustment of the parameters, good results can be readily achieved in phase-sync mode. Apart from phase synchronization, processing and parameters are identical in both phase-sync mode and quasi-sync mode.
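The pitchmarker quantization step might be sketched like this (the pitchmarker list is a simplified stand-in for Stampede II's analysis data):

```python
def quantize_onset(desired_onset, pitchmarkers):
    """Snap a grain's desired extraction onset to the closest
    pitchmarker so that overlapping grains of a periodic source stay
    in phase (as in phase-sync mode, figure 2)."""
    return min(pitchmarkers, key=lambda m: abs(m - desired_onset))

# Irregularly spaced pitchmarkers from a (hypothetical) analysis pass.
marks = [0, 212, 430, 655, 880]
snapped = quantize_onset(500, marks)  # snaps to 430
```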
Like the FOF method for sound synthesis (Rodet 1980), the Pitch-Synchronous Overlap-Add method (Moulines and Charpentier 1990) is motivated by a source-filter model for speech production, which can also be applied to a large range of instrumental sounds. According to this model, a speech signal is the response of a time-varying filter (the vocal tract) to an excitation function (the vocal cords' movement). If the excitation function is assumed to be a pulse train, then the waveform resulting from speech can be regarded as a concatenation of filter impulse responses. Furthermore, if the filter's impulse response is assumed finite and shorter than the period of excitation (the pitch period), then cutting a sound-source's waveform at the instances of excitation effectively isolates each response within a single excerpt. Stretching the sound by a factor of two is then accomplished through simple duplication of each excerpt before reassembling the waveform; likewise, a downward octave pitch transposition is achieved by replacing every other excerpt by the corresponding amount of silence. The resulting waveform's spectral envelope is identical to the original's; in other words, the formants' positions are maintained. This remarkable property distinguishes this approach from transposition by sampling-rate conversion.
One certainly cannot expect "real-life" signals to conform to the assumptions of the oversimplification above. If, however, a smooth local envelope is used to scale the excerpts' amplitude, and if the excerpts are about twice as long as the local pitch period, one can overlap-add the excerpts to obtain the desired modification of voice and instrument sounds. In pitch-sync mode, a pitch shift downward by a perfect fifth is achieved by setting the synthesis period to three halves (3:2) of the local pitch period (figure 3). As in phase-sync mode, the onset of the source-sound excerpt of each grain in progress corresponds to a pitchmarker's position.
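The relation between pitch shift and synthesis period can be sketched as follows (using the equal-tempered semitone, which gives a ratio close to the just 3:2; the function is illustrative, not part of Stampede II):

```python
def synthesis_period_for_shift(pitch_period, semitones):
    """Synthesis period that realises a pitch shift of the given
    number of semitones: the period scales inversely with the
    frequency ratio 2 ** (semitones / 12)."""
    return pitch_period * 2.0 ** (-semitones / 12.0)

# A downward perfect fifth (-7 semitones) yields a synthesis period
# close to 3:2 of the local pitch period, as in figure 3.
p = synthesis_period_for_shift(200.0, -7)  # ~299.7 samples
```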
In contrast to operation in quasi-sync or phase-sync mode, the grains' duration is proportional to the local pitch period, rather than to the synthesis period. In fact, downward pitch-shifting by more than an octave results in silence between the grains, which inherently produces artefacts. Moulines and Charpentier have examined this method's artefacts and report that pitch manipulation alters details in the source-sound's spectral envelope, although its general contour is maintained.
Another interesting property of this method is that relative pitch-change, i.e. transposition, is just one option. Since the fundamental frequency is determined by the synthesis period, arbitrary pitches can be chosen. Stampede II's fix-sync mode allows an absolute specification of pitch from a chromatic scale which is imposed on the source-sound. When fix-sync mode is entered from pitch-sync mode, Stampede II freezes the pitch being synthesized; conversely, when pitch-sync mode is called from fix-sync mode, the prior pitch setting is used as the new reference pitch.
Low-level Granulation Mechanism
All of the re-synthesis modes described above use the same low-level mechanism to generate audio samples; however, each mode maps the user parameters differently to the low-level parameters and makes different use of the data obtained from source-signal analysis.
The granulation mechanism conducts re-synthesis in a number of parallel voices which is specified by the voice count parameter. A maximum of two grains may overlap in each voice at a given time. A trapezoid-shaped envelope is employed and grain-overlap in a single voice is constrained to the grains' fade-in and fade-out phases.
Stampede II offers a grain feedback facility. Past audio output may be mixed into the grain in progress by an amount which is determined by the grain feedback parameter. The delay time is held identical to the current synthesis period; thus, grain feedback enhances the periodicity of the granulation process. When the synthesis frequency is varied by modulation, the delay time varies accordingly. Smoothing by the grain envelope inhibits the waveform discontinuity which otherwise arises when the delay time changes. Randomised grain feedback in parallel voices results in dense, lively echo structures when the grain frequency falls below the audio range. Higher grain frequencies give rise to a resonant sound quality. In pitch-sync and fix-sync mode, grain frequency and delay time are coupled to the source signal's fundamental frequency. Here, feedback results in a temporal averaging of the waveform.
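The feedback recursion can be sketched with a fixed delay (Stampede II varies the delay with the synthesis period and smooths delay-time changes with the grain envelope, which this sketch omits):

```python
def grain_feedback(dry, synthesis_period, feedback):
    """Mix past output, delayed by one synthesis period (in samples),
    back into the current output: a comb-filter-like recursion that
    reinforces the periodicity of the granulation process."""
    out = []
    for n, x in enumerate(dry):
        delayed = out[n - synthesis_period] if n >= synthesis_period else 0.0
        out.append(x + feedback * delayed)
    return out

# An impulse produces echoes decaying by the feedback factor every
# synthesis_period samples.
y = grain_feedback([1.0] + [0.0] * 9, 4, 0.5)
```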
All controllers on Stampede II's graphic user interface (GUI) can be manipulated during re-synthesis for experimentation and live performance (figure 4). The leftmost column is labelled "modes and actions". It contains a bank of switches to toggle the re-synthesis mode, a switch to enable live-input mode (see the next section), a button that saves the contents of the current audio input buffer to a sound-file, and a switch that toggles recording of the program's output to a sound-file on and off. Stampede II automatically assigns unique names to the files that it writes; these names are displayed in the top-line.
The center columns contain continuous controllers that regulate the re-synthesis process. Chapter 3 serves as a reference for these parameters. Western chromatic scales or dB scales are used for values where applicable. Some controllers are specific to a particular synthesis mode and change appearance accordingly. The manipulation of continuous controllers may be lagged by a duration which is specified by the controller labelled lag time, allowing target values to be approached more or less slowly.
Selecting Sound for Processing
The Input Buffer
Stampede II maintains an input-buffer capable of storing several minutes of mono source signal. If live-input mode is enabled, the buffer is continuously re-filled with the signal from an audio input; disabling live-input mode instantaneously freezes the current buffer content. When Stampede II is launched with no arguments, for instance by double-clicking its icon on the desktop, or by typing
prompt> stampedeII
in a shell window, the program comes up in live-mode; the audio source and the sampling-rate to be used are defined by the settings in the audio control panel, which is accessible from the desktop menu on Silicon Graphics computers. Stampede II handles conflicts, such as disagreements between input and output sampling-rates, by overriding the settings in the audio control panel. If digital is chosen as the audio input source, but no digital signal is available, Stampede II will output an error message and refuse operation.
Stampede II's input buffer can be initialised with the contents of a sound-file by supplying an argument to the command-line invocation; live-input is then disabled by default:
prompt> stampedeII IN.aiff
This is equivalent to mouse-dragging IN.aiff's icon onto Stampede II's icon. The computer's output sampling-rate must coincide with the sound-file's sampling-rate, and Stampede II will override the settings from the audio control panel if necessary. Notice that synchronization with audio from the digital input is necessary for using Stampede II in a digital audio environment, such as with a digital mixing console. If the input signal's sampling-rate disagrees with the sound-file's sampling-rate, Stampede II gives up synchronization with the digital input.
Choosing an Input-Buffer Loop for Processing
The program grabs material for re-synthesis from the input-buffer as it traverses an adjustable loop at a specifiable speed. The speed parameter determines the rate of progression of a pointer that selects the signal to be played in the grains. Hence, this parameter controls a pitch-independent modification of the playback rate, according to one of the methods described in chapter 1.
The loop's position is adjusted by the loop start parameter, relative to the end of the signal held in the input buffer. In live-input mode, the end of the signal in the buffer corresponds to the current time, so the loop can be regarded as a range of delay times. The speed parameter can be negative if backward progression through the loop is desired.
Hierarchy of User-parameters
Primary parameters and secondary parameters are the two classes of user parameters in Stampede II. The former determine mean values and the latter control the depths of modulations which produce deviation from the mean. This distinction is clearly reflected on Stampede II's GUI. Most continuous controllers are organised in a matrix (figure 4); the top row contains the primary parameters, and each successive row contains the secondary parameters for one modulation source. Stampede II currently has three modulation sources: random, a voice-proportional offset, and intensity modulation.
The use of random modulation for grain-to-grain deviation of synthesis periods has been described above in the context of quasi-synchronous granular synthesis; it can also be applied to other parameters to disrupt periodicity and to achieve an animated texture. For instance, a brassage effect can be achieved by applying random modulation to the excerpt selection process; jitter can be imposed on the pitch of a sound by modulating the grain frequency in both pitch-sync and fix-sync mode. The character of the randomness can be shaped by specifying the amount of inertia to be imposed on the selection of random numbers.
Voice-proportional modulation adds a constant DC offset for every voice. Applying a voice-proportional modulation of seven semitones to pitch detunes voice 1 by +7 semitones, voice 2 by -7 semitones, voice 3 by +14 semitones, voice 4 by -14 semitones, and so on. This type of modulation provides an effective means for specifying beats and clusters.
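The offset scheme can be sketched as follows (the exact scheme is inferred from the seven-semitone example above; the function is hypothetical):

```python
def voice_proportional_offsets(depth, voice_count):
    """Constant per-voice DC offsets with alternating sign and
    magnitude growing every second voice, matching the example:
    depth 7 gives voice offsets +7, -7, +14, -14, ..."""
    offsets = []
    for k in range(1, voice_count + 1):
        magnitude = ((k + 1) // 2) * depth
        offsets.append(magnitude if k % 2 == 1 else -magnitude)
    return offsets

# depth 7 over four voices: [7, -7, 14, -14]
cluster = voice_proportional_offsets(7, 4)
```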
Intensity modulation can be used to adapt the re-synthesis process to the properties of the material being processed. The peak amplitude in the grain in progress is compared to an adjustable reference amplitude. This ratio, for example, can be used to influence the speed parameter. Complex speech time-warping can be achieved in this manner, as louder sections, typically vowels, are stretched by a larger amount than quieter sections, often consonants. When applied to the gain parameter, intensity modulation can also be used for dynamics modification.
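A sketch of intensity modulation applied to a parameter (mapping the amplitude ratio to dB is an assumption of this example, not Stampede II's documented mapping):

```python
import math

def intensity_modulated(primary, depth, grain_peak, reference):
    """Deviation from the primary value driven by the ratio of the
    grain's peak amplitude to the reference amplitude, expressed in
    dB here. Louder grains push the parameter further from its mean
    when depth is positive."""
    ratio_db = 20.0 * math.log10(max(grain_peak, 1e-12) / reference)
    return primary + depth * ratio_db

# A grain exactly at the reference amplitude leaves the primary value
# unchanged; louder grains raise it, quieter grains lower it.
```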
Automation and Scores
The rightmost column on Stampede II's GUI provides access to a number of snapshot locations which store the state of the controllers, the lag-time, and the re-synthesis mode. Clicking on one of the left buttons labelled s saves the current state in the corresponding snapshot. Saved states can be retrieved instantaneously by clicking on one of the r-labelled buttons; alternatively, clicking on one of the numbered buttons recalls a snapshot with the lag-time that has been stored with it. The saved state is then restored with interpolation of intermediate values.
The contents of the snapshot locations can be written to a text file by clicking the save snapshots field. Stampede II assigns a unique name to the file and displays it in the top-line. A snapshot file can be edited offline in a text editor like jot or emacs. The format of snapshot files is self-explanatory. Notice that lines which start with a #-sign are comments and are not evaluated by the program. Supplying the name of a snapshot-file at program invocation using the -s command-line option makes the snapshots in the file available to Stampede II:
prompt> stampedeII -sMY_SNAPSHOTS.stampede
Notice there must be no space between the flag and the filename.
A snapshot-file may be used as a score which defines a section of music. In this mode of operation, every snapshot defines a state, and the associated lag-time determines the duration of an interpolation from the previous state to that state. Every snapshot can be considered a collection of breakpoints, and the snapshot-file then defines a set of breakpoint-envelopes, each of which defines the state of a user parameter at any time in the piece. The total duration of a piece of music defined by a score file is the sum of the lag-times of all snapshots contained in the file. To use a snapshot-file as a score, Stampede II must be invoked from a shell window with command-line arguments, like:
prompt> stampedeII -sMY_SCORE_FILE.stampede -o IN.aiff
The graphic user interface is suppressed in this mode of operation, and the result of the calculation is a stereo sound-file named MY_SCORE_FILE.aiff.
Alternatively, the command
prompt> stampedeII -sMY_SCORE_FILE.stampede -oOUT.aiff IN.aiff
will leave the results of calculation in the sound-file OUT.aiff.
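The duration arithmetic for score files can be sketched as follows (the snapshot representation as (lag-time, state) pairs is hypothetical):

```python
def score_duration(snapshots):
    """Total duration of a score: the sum of the lag-times stored
    with its snapshots, since each lag-time is the duration of the
    interpolation into that snapshot's state."""
    return sum(lag_time for lag_time, state in snapshots)

# Three snapshots with lag-times 2.0, 5.5 and 1.5 seconds define a
# nine-second section.
total = score_duration([(2.0, {}), (5.5, {}), (1.5, {})])
```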
The following table is an overview of the user parameters available in Stampede II's four re-synthesis modes. Parameters are listed top-to-bottom according to their left-to-right appearance on Stampede II's graphic user interface (Random Inertia, Voice Count, and Reference Gain are exceptions: they appear below the Input Gain parameter). Most parameters are available in all four modes; some, however, are mode-specific. The table also lists the range and scale of each parameter. Absolute pitches are specified as midikeys according to the MIDI convention. Dimensionless parameters use an arbitrary scale.
| QUASI-SYNC | PHASE-SYNC | PITCH-SYNC | FIX-SYNC |
| LOOP START 0..1 | LOOP START 0..1 | LOOP START 0..1 | LOOP START 0..1 |
| LOOP LENGTH 0..1 | LOOP LENGTH 0..1 | LOOP LENGTH 0..1 | LOOP LENGTH 0..1 |
| INPUT GAIN -96..18 dB | INPUT GAIN -96..18 dB | INPUT GAIN -96..18 dB | INPUT GAIN -96..18 dB |
| RANDOM INERTIA 0..1 | RANDOM INERTIA 0..1 | RANDOM INERTIA 0..1 | RANDOM INERTIA 0..1 |
| VOICE COUNT 0..32 | VOICE COUNT 0..32 | VOICE COUNT 0..32 | VOICE COUNT 0..32 |
| REFERENCE GAIN -96..0 dB | REFERENCE GAIN -96..0 dB | REFERENCE GAIN -96..0 dB | REFERENCE GAIN -96..0 dB |
| SPEED -2..2 | SPEED -2..2 | SPEED -2..2 | SPEED -2..2 |
| PITCH / FORMANT SHIFT -96..+24 semitones | PITCH / FORMANT SHIFT -96..+24 semitones | PITCH-SHIFT -96..+24 semitones | PITCH 0..90 midikey |
| GRAIN FREQUENCY 0..90 midikey | GRAIN FREQUENCY 0..90 midikey | FORMANT SHIFT -96..+24 semitones | FORMANT SHIFT -96..+24 semitones |
| GRAIN WIDTH 0..1 | GRAIN WIDTH 0..1 | GRAIN WIDTH -1..1 | GRAIN WIDTH -1..1 |
| GRAIN FADE 0..1 | GRAIN FADE 0..1 | GRAIN FADE 0..1 | GRAIN FADE 0..1 |
| GRAIN RESONANCE -1..1 | GRAIN RESONANCE -1..1 | GRAIN RESONANCE -1..1 | GRAIN RESONANCE -1..1 |
| GRAIN PANNING -1..1 | GRAIN PANNING -1..1 | GRAIN PANNING -1..1 | GRAIN PANNING -1..1 |
| GRAIN VOLUME -96..18 dB | GRAIN VOLUME -96..18 dB | GRAIN VOLUME -96..18 dB | GRAIN VOLUME -96..18 dB |
Loop Start
This parameter determines the beginning of a loop over the sound material contained in the input buffer which is continuously traversed by Stampede II. The length of the slider corresponds to the total duration of the input buffer. The bottom end corresponds to the most recent signal contained in the buffer. Notice that in live-mode, the buffer contents are continuously re-filled with new sound from the audio input. The loop can be compared to a window in a train which reveals a part of the landscape (the sound) to the viewer. The image is constant when the train stands still (live-mode is off), and moves along as the train moves (live-mode is on).
Loop Length
This parameter determines the length of the loop; see the discussion of the loop start parameter. When this parameter is zero, Stampede II will choose grains from a constant position in the input buffer.
Input Gain
The signal from the audio input is mixed into Stampede II's audio output by an amount which can be controlled using this parameter.
Random Inertia
This parameter controls the amount of inertia to apply to the choice of the random numbers used for random modulation. For details, see the discussion of modulation sources in chapter 2.
Voice Count
This parameter determines how many parallel voices sound at a time. Since every voice consumes a certain amount of processing time, large values may exceed the computer's capabilities, in which case the audio output will be interrupted more or less regularly. This parameter is closely related to voice-proportional modulation. For details, see the discussion of modulation sources in chapter 2.
Reference Gain
This parameter determines a reference amplitude to use for intensity modulation. For details, see the discussion of modulation sources in chapter 2.
Speed
This parameter determines the speed of progression through the input buffer. Negative values indicate progression in the reverse direction. Values correspond to speed factors, i.e. 1 is normal speed, 2 is double speed, etc.
Pitch / Formant Shift
This parameter determines the amount of pitch shift in quasi-sync mode and in phase-sync mode. Since pitch-shifting is accomplished by sampling-rate conversion in these modes, the spectral envelope is shifted with the pitch, resulting in a timbral modification that is sometimes referred to as the "Mickey Mouse effect". For details, see the discussion of re-synthesis modes in chapter 1.
Pitch-Shift
This parameter controls the amount of pitch shift in pitch-sync mode. Pitch-shifting in pitch-sync mode does not incur a shift of the spectral envelope; formants keep their position. However, radical shifts incur other artefacts. For details, see the discussion of re-synthesis modes in chapter 1.
Pitch
This parameter determines a constant pitch which is imposed on the input sound in fix-sync mode. Pitch changes in fix-sync mode do not incur shifts of the spectral envelope; formants keep their position. However, radical changes incur other artefacts. For details, see the discussion of re-synthesis modes in chapter 1.
Grain Frequency
This parameter controls the exact frequency of grain production in quasi-sync mode and the average frequency of grain production in phase-sync mode. For details, see the discussion of re-synthesis modes in chapter 1.
Formant Shift
This parameter controls the shift of the input sound's spectral envelope in pitch-sync mode and in fix-sync mode. It allows a pitch-independent shifting of a sound's formants. For details, see the discussion of re-synthesis modes in chapter 1.
Grain Width
This parameter controls the duration of grains, relative to the frequency of grain production. Its exact meaning depends on the re-synthesis mode chosen:
Quasi-sync mode and Phase-sync mode
If the maximum value (1) is chosen, the grain duration is set so that two grains overlap (sound at the same time) in every voice. Lower values reduce the grain duration proportionally.
Pitch-sync mode and Fix-sync mode
When the value 0 is chosen, the grain duration equals twice the pitch period of the sound played by the grain. Greater values increase the grain duration towards a maximum of two grains overlapping in a voice. Smaller values decrease the grain duration towards zero.
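The two duration conventions might be sketched as follows (the interpolation formulas are assumptions of this example; only the endpoint behaviour is taken from the descriptions above):

```python
def grain_duration(width, synthesis_period, pitch_period=None):
    """Grain duration (in samples) under the two conventions.
    Quasi-/phase-sync (pitch_period is None): width 1 gives two
    overlapping grains per voice (2 * synthesis_period), lower
    values shrink the duration proportionally.
    Pitch-/fix-sync: width 0 gives twice the local pitch period;
    positive values grow toward two-grain overlap, negative values
    shrink toward zero."""
    if pitch_period is None:
        return 2.0 * synthesis_period * width
    base = 2.0 * pitch_period
    if width >= 0.0:
        return base + width * (2.0 * synthesis_period - base)
    return base * (1.0 + width)
```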
Grain Fade
This parameter controls the shape of the grain envelope by determining the ratio of the fade-in and fade-out portions of the envelope to the total grain duration. Small values result in "edgy" envelopes, large values result in "smooth" envelopes.
Grain Resonance
This parameter controls the amount of feedback to apply to grains. Feedback can be positive or negative. For details, see the discussion of grain feedback in chapter 2.
Grain Panning
This parameter controls the positioning of grains in the stereo field.
Grain Volume
This parameter controls the gain of the material being played by grains. Notice that intensity modulation can be applied to this parameter to obtain dynamics modification.
Behles, G., S. Starke, and A. Roebel. 1998. "Quasi-Synchronous and Pitch-Synchronous Granular Sound Processing with Stampede II." Computer Music Journal. Forthcoming.
Jones, D., and T. Parks. 1988. "Generation and Combination of Grains for Music Synthesis." Computer Music Journal 12(2): 27-34.
McAuley, R. J., and T. F. Quatieri. 1986. "Speech Analysis / Synthesis Based on a Sinusoidal Representation." IEEE Transactions on Acoustics, Speech, and Signal Processing 34(4): 744-754.
Moulines, E., and F. Charpentier. 1990. "Pitch-Synchronous Waveform Processing Techniques for Text-to-Speech Synthesis Using Diphones." Speech Communication 9: 453-467.
Portnoff, M. R. 1976. "Implementation of the Digital Phase Vocoder Using the Fast Fourier Transform." IEEE Transactions on Acoustics, Speech, and Signal Processing 24(3): 243-248.
Rodet, X. 1980. "Time-Domain Formant-Wave-Function Synthesis." Computer Music Journal 8(3): 9-14.
Roucos, S., and A. Wilgus. 1985. "High Quality Time-Scale Modification of Speech." In Proceedings of the 1985 International Conference on Acoustics, Speech, and Signal Processing. New York: IEEE, pp. 493-496.
Truax, B. 1994. "Discovering Inner Complexity: Time-Shifting and Transposition with a Real-Time Granulation Technique." Computer Music Journal 18(2): 38-48.
Gerhard Behles, The Electronic Music Studio at Technical University Berlin. Mail: EN-8, Einsteinufer 17, D-10587 Berlin.