Audio Tools Reference
All 33 tools the AI agent can call to edit your audio session.
Tools are deterministic functions the agent calls to manipulate your audio session. You do not invoke tools directly — instead, describe what you want in natural language and the agent selects the right tool chain. Every tool call creates a new session node (non-destructive).
Prompt tips
- Name the track when you have multiple: "normalize track 1", not just "normalize".
- Use minutes:seconds for time: "cut from 1:30 to 2:00".
- Chain operations in one message: the agent plans the full sequence before executing.
- Correct inline: if the agent misunderstood, say what was wrong, for example "not that track, the second one".
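The minutes:seconds convention maps to seconds in the obvious way. A minimal Python sketch of that conversion follows; `parse_timecode` is a hypothetical helper for illustration, not one of the agent's tools.

```python
def parse_timecode(text: str) -> float:
    """Convert 'M:SS' (also 'H:MM:SS' or plain seconds) to seconds."""
    seconds = 0.0
    for part in text.strip().split(":"):
        # Each colon-separated field is worth 60x the one to its right.
        seconds = seconds * 60 + float(part)
    return seconds


start, end = parse_timecode("1:30"), parse_timecode("2:00")
print(start, end)  # 90.0 120.0
```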
File and Track Management
load
Decode an audio file (MP3, WAV, FLAC) and create a new track in the session.
Example prompt: load /path/to/file.wav
Returns: track_id, duration_sec
add_track
Add a new empty track to the session.
Example prompt: add an empty track called "drums"
Returns: track_id
remove_track
Remove a track from the session. Does not delete the source file on disk.
Example prompt: remove track 2
Returns: node_id
Region Editing
cut_range
Remove a time range. Audio after the cut point shifts left.
Example prompt: cut from 1:30 to 2:00 on track 1
Returns: node_id
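To make "shifts left" concrete, here is a minimal Python sketch of a cut on a plain list of samples. The function name mirrors the tool, but the signature and the list-based track are assumptions for illustration only. The input list is left untouched, in keeping with the non-destructive model.

```python
def cut_range(samples, sample_rate, start_sec, end_sec):
    """Return a new sample list with [start_sec, end_sec) removed."""
    start = max(0, round(start_sec * sample_rate))
    end = min(len(samples), round(end_sec * sample_rate))
    if end <= start:
        return list(samples)  # empty or inverted range: nothing to cut
    # Everything after the cut is joined directly onto what came before it.
    return samples[:start] + samples[end:]


# Toy track: one sample per second, each value equal to its original time.
track = list(range(10))
edited = cut_range(track, sample_rate=1, start_sec=3, end_sec=6)
print(edited)  # [0, 1, 2, 6, 7, 8, 9]
```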
copy_region
Copy a time region to the clipboard.
Example prompt: copy the section from 0:30 to 1:00
Returns: duration_sec of copied region
paste_region
Insert clipboard contents into a track. Audio shifts right at the insert point.
Example prompt: paste at 2:00 on track 1
Returns: node_id
trim
Remove silence from the start and/or end of a track.
Example prompt: remove the silence at the start of track 1
Returns: node_id, trimmed_start_sec, trimmed_end_sec
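The relationship between the trimmed audio and the two returned durations can be sketched as below. How the engine actually detects silence is not documented here; the fixed amplitude `threshold` and the function name `trim_silence` are assumptions.

```python
def trim_silence(samples, sample_rate, threshold=0.001):
    """Strip leading and trailing samples whose magnitude is below threshold.

    Returns (trimmed_samples, trimmed_start_sec, trimmed_end_sec).
    """
    first = 0
    while first < len(samples) and abs(samples[first]) < threshold:
        first += 1
    last = len(samples)
    while last > first and abs(samples[last - 1]) < threshold:
        last -= 1
    return (
        samples[first:last],
        first / sample_rate,
        (len(samples) - last) / sample_rate,
    )


audio = [0.0, 0.0, 0.5, -0.4, 0.0, 0.3, 0.0]
trimmed, start_sec, end_sec = trim_silence(audio, sample_rate=2)
print(trimmed, start_sec, end_sec)  # [0.5, -0.4, 0.0, 0.3] 1.0 0.5
```

Silence inside the track (the 0.0 between -0.4 and 0.3) is kept; only the edges are trimmed.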
insert_silence
Insert a gap of silence at a position. Audio shifts right.
Example prompt: add 2 seconds of silence at 0:30
Returns: node_id
reverse
Reverse a region (or the full track).
Example prompt: reverse track 1
Returns: node_id
Volume and Dynamics
gain
Apply a static dB gain to a region of a track. Range: −60 to +12 dB.
Example prompt: boost the vocals by 3 dB
Returns: node_id
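A dB gain corresponds to a linear amplitude factor of 10^(dB/20). The sketch below illustrates that conversion in Python. The helper names are hypothetical, and clamping to the documented range is an assumption: the real tool may reject out-of-range values instead.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def apply_gain(samples, gain_db, min_db=-60.0, max_db=12.0):
    """Scale samples by gain_db, clamped to the documented range (assumed)."""
    gain_db = max(min_db, min(max_db, gain_db))
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


print(round(db_to_linear(3.0), 3))   # 1.413
print(round(db_to_linear(-6.0), 3))  # 0.501
```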
set_track_gain
Set the overall gain level for an entire track.
Example prompt: set track 2 gain to -3 dB
Returns: node_id
normalize
Normalize a track to an integrated LUFS target or true peak limit.
Example prompt: normalize to -14 LUFS for Spotify
Returns: node_id, applied_gain_db
Common targets: −14 LUFS Spotify/YouTube, −16 LUFS Apple Podcasts, −23 LUFS broadcast.
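The returned applied_gain_db is, in the simplest case, the difference between the target and the measured loudness. A rough Python sketch of that arithmetic follows. The function name, the optional peak check, and the -1.0 dBFS ceiling are assumptions for illustration; measuring LUFS itself is out of scope here.

```python
def normalization_gain_db(measured_lufs, target_lufs,
                          peak_dbfs=None, peak_ceiling_dbfs=-1.0):
    """Gain (dB) that moves measured loudness onto the target.

    If a measured peak is supplied, the gain is capped so the peak
    stays at or below peak_ceiling_dbfs (assumed default: -1.0 dBFS).
    """
    gain = target_lufs - measured_lufs
    if peak_dbfs is not None:
        gain = min(gain, peak_ceiling_dbfs - peak_dbfs)
    return gain


print(normalization_gain_db(-19.5, -14.0))                  # 5.5
print(normalization_gain_db(-19.5, -14.0, peak_dbfs=-3.0))  # 2.0
```

In the second call the loudness target would need +5.5 dB, but only +2.0 dB is available before the peak reaches the ceiling.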
fade
Apply a fade-in or fade-out envelope. Curve options: linear, exponential, logarithmic.
Example prompt: add a 3-second fade-out
Returns: node_id
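The three curve options differ in how quickly the gain moves between silence and full level. The Python sketch below uses simple stand-in shapes (a square for "exponential", a square root for "logarithmic"). These shapes and the function names are assumptions, since the engine's exact curves are not specified here.

```python
def fade_gain(position, curve="linear"):
    """Fade-in gain (0..1) at a normalized position (0..1)."""
    t = max(0.0, min(1.0, position))
    if curve == "linear":
        return t
    if curve == "exponential":
        return t ** 2    # slow start, fast finish (assumed shape)
    if curve == "logarithmic":
        return t ** 0.5  # fast start, slow finish (assumed shape)
    raise ValueError(f"unknown curve: {curve}")


def apply_fade_out(samples, fade_len, curve="linear"):
    """Return a copy whose last fade_len samples ramp down to silence."""
    fade_len = min(fade_len, len(samples))
    out = list(samples)
    start = len(out) - fade_len
    for i in range(fade_len):
        # A fade-out is the fade-in curve read from 1 down to 0.
        out[start + i] *= fade_gain(1.0 - (i + 1) / fade_len, curve)
    return out


print(apply_fade_out([1.0] * 6, fade_len=4))
# [1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```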
set_clip_envelope
Set a per-clip volume automation curve. Provide (time_sec, gain_db) pairs and the engine linearly interpolates between them.
Example prompt: set a volume fade: track 0 clip 0, from 0s at -20dB to 2s at 0dB
Returns: node_id
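Linear interpolation between (time_sec, gain_db) pairs can be sketched as follows. This Python version assumes interpolation happens in the dB domain and that the first and last values are held outside the envelope's time range. Neither detail is confirmed by this reference, and `envelope_gain_db` is a hypothetical name.

```python
def envelope_gain_db(points, time_sec):
    """Interpolate gain (dB) at time_sec from (time_sec, gain_db) pairs.

    Requires at least one point. Values before the first point and after
    the last point are held constant (an assumption).
    """
    points = sorted(points)
    if time_sec <= points[0][0]:
        return points[0][1]
    if time_sec >= points[-1][0]:
        return points[-1][1]
    for (t0, g0), (t1, g1) in zip(points, points[1:]):
        if t0 <= time_sec <= t1:
            if t1 == t0:
                return g1  # duplicate timestamps: take the later value
            frac = (time_sec - t0) / (t1 - t0)
            return g0 + frac * (g1 - g0)


# The envelope from the example prompt: -20 dB at 0 s rising to 0 dB at 2 s.
points = [(0.0, -20.0), (2.0, 0.0)]
print(envelope_gain_db(points, 1.0))  # -10.0
```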
Effects
eq
Apply a parametric EQ to a track using a chain of biquad peak filters. Specify frequency, gain (dB), and Q for each band.
Example prompt: boost the highs on track 1 by 3 dB at 8 kHz
Returns: node_id
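For readers curious what a biquad peak filter looks like, the Python sketch below computes one band with the widely used Audio EQ Cookbook (RBJ) peaking formulas and checks its response at the center frequency. Whether the engine uses exactly these formulas is an assumption, and all function names are hypothetical.

```python
import cmath
import math


def peaking_biquad(freq_hz, gain_db, q, sample_rate):
    """Peaking-EQ coefficients (b0, b1, b2, a1, a2), normalized to a0 = 1."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha / a
    return (
        (1 + alpha * a) / a0,
        -2 * math.cos(w0) / a0,
        (1 - alpha * a) / a0,
        -2 * math.cos(w0) / a0,
        (1 - alpha / a) / a0,
    )


def response_db(coeffs, freq_hz, sample_rate):
    """Magnitude response of the filter at freq_hz, in dB."""
    b0, b1, b2, a1, a2 = coeffs
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)  # z^-1
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (1 + a1 * z1 + a2 * z1 * z1)
    return 20 * math.log10(abs(h))


def process(samples, coeffs):
    """Run samples through one band (direct form I)."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


# The example prompt: +3 dB at 8 kHz (Q of 1.0 and 48 kHz are assumed).
band = peaking_biquad(8000, 3.0, 1.0, 48000)
print(round(response_db(band, 8000, 48000), 2))  # 3.0
```

A multi-band EQ is a chain: pass the output of `process` for one band into `process` for the next.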
compressor
Apply a dynamic compressor with configurable threshold, ratio, attack, and release. Uses an envelope follower for smooth gain reduction.
Example prompt: compress track 1: threshold -18 dB, ratio 4:1
Returns: node_id
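The interplay of threshold, ratio, and the envelope follower can be sketched in a few lines of Python. This is a textbook feed-forward design with a one-pole peak follower and a hard knee, not the engine's actual implementation. The default attack and release times and the function name are assumptions.

```python
import math


def compress(samples, sample_rate, threshold_db=-18.0, ratio=4.0,
             attack_ms=10.0, release_ms=100.0):
    """Feed-forward compressor with a one-pole envelope follower."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    envelope = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Track rising levels quickly (attack), falling levels slowly (release).
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        level_db = 20.0 * math.log10(max(envelope, 1e-9))
        over_db = max(0.0, level_db - threshold_db)
        # Above the threshold, every `ratio` dB of input yields 1 dB of output.
        gain_db = -over_db * (1.0 - 1.0 / ratio)
        out.append(x * 10 ** (gain_db / 20.0))
    return out


# A steady full-scale signal sits 18 dB over the threshold. At 4:1 the
# output settles 4.5 dB over it, i.e. 13.5 dB of gain reduction.
steady = compress([1.0] * 48000, 48000)
print(round(20 * math.log10(steady[-1]), 1))  # -13.5
```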
noise_reduction
Remove broadband noise via spectral subtraction (real FFT + overlap-add). Estimates the noise floor from a silent region and subtracts it from the signal.
Example prompt: reduce background noise on track 1
Returns: node_id
Time and Pitch
time_stretch
Change the duration without changing the pitch.
Example prompt: stretch track 1 to 4 minutes
Returns: node_id, new_duration_sec
pitch_shift
Change the pitch without changing the duration. Range: −12 to +12 semitones.
Example prompt: shift the vocals up 2 semitones
Returns: node_id
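In equal temperament, a shift of n semitones multiplies every frequency by 2^(n/12), so the ±12 range spans one octave down to one octave up. A small Python sketch of that mapping follows; `semitones_to_ratio` is a hypothetical helper, and raising an error outside the range is an assumption.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for an equal-tempered shift of `semitones`."""
    if not -12 <= semitones <= 12:
        raise ValueError("pitch_shift accepts -12 to +12 semitones")
    return 2 ** (semitones / 12)


print(round(semitones_to_ratio(2), 4))        # 1.1225
print(semitones_to_ratio(12))                 # 2.0
print(round(440 * semitones_to_ratio(2), 2))  # 493.88 (A4 up to B4)
```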
Analysis
analyze_track
Detect BPM, musical key, integrated loudness (LUFS), true peak, and transient count.
Example prompt: analyze track 1
Returns: bpm, key, loudness_lufs, peak_dbfs, transient_count
align_to_beat
Shift the start of a track so that it lands on the nearest beat of the beat grid.
Example prompt: align track 2 to the beat
Returns: node_id, shift_sec
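The returned shift_sec is the distance from the track's current start to the nearest beat. The Python sketch below assumes a constant tempo and a grid that starts at `grid_offset_sec`; both the function name and these parameters are hypothetical.

```python
def beat_alignment_shift(start_sec, bpm, grid_offset_sec=0.0):
    """Shift (seconds) that moves start_sec onto the nearest beat.

    Positive values move the track later, negative values earlier.
    """
    beat_len = 60.0 / bpm
    beats = round((start_sec - grid_offset_sec) / beat_len)
    nearest_beat = grid_offset_sec + beats * beat_len
    return nearest_beat - start_sec


# At 120 BPM the beats fall every 0.5 s, so the nearest beat is 2.0 s.
print(round(beat_alignment_shift(1.9, bpm=120), 3))  # 0.1
print(round(beat_alignment_shift(2.1, bpm=120), 3))  # -0.1
```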
ML Tools
separate_stems
Run Demucs stem separation on-device. Produces 4 tracks: vocals, drums, bass, other. Model: htdemucs (~80 MB). Processing: ~45 sec per minute of audio on CPU.
Example prompt: separate the stems on track 1
Returns: node_id, stem track IDs
First use downloads the model automatically. htdemucs_6s adds guitar and piano stems at ~2× the processing time.
transcribe
Transcribe spoken audio using Whisper large-v3 on-device. Stores word-level timestamps in the session. Model: ~1.5 GB. Processing: ~4–8 min per 60 min of audio on CPU.
Example prompt: transcribe track 1
Returns: node_id, word_count, language
First use downloads the model automatically. CoreML (macOS) and CUDA significantly reduce processing time.
DAG Operations
fork_node
Fork the current node to create an independent branch. The fork becomes the new head.
Example prompt: fork the session and call it "take-2"
Returns: node_id
revert_to
Move the session head to an earlier node. Does not delete any nodes.
Example prompt: revert to before the reverb
Returns: node_id
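The behavior of fork_node and revert_to follows from the session being an append-only graph with a movable head. The toy Python model below illustrates that idea. The `Session` and `Node` classes and their fields are hypothetical and say nothing about the real storage format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    node_id: int
    parent_id: Optional[int]
    operation: str


class Session:
    """Append-only node graph; edits add nodes, nothing is ever deleted."""

    def __init__(self):
        self.nodes = {0: Node(0, None, "root")}
        self.head = 0

    def apply(self, operation: str) -> int:
        """Record a tool call as a new child of the current head."""
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, self.head, operation)
        self.head = node_id
        return node_id

    def fork(self, name: str) -> int:
        """Start a branch from the head; the fork becomes the new head."""
        return self.apply(f"fork:{name}")

    def revert_to(self, node_id: int) -> int:
        """Move the head back; later nodes stay in the graph."""
        if node_id not in self.nodes:
            raise KeyError(node_id)
        self.head = node_id
        return node_id


session = Session()
loaded = session.apply("load")
session.apply("reverb")
session.revert_to(loaded)        # head moves back; the reverb node survives
branch = session.fork("take-2")  # new branch grows from the load node
print(len(session.nodes), session.nodes[branch].parent_id)  # 4 1
```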
compare_nodes
Generate a diff between two nodes: tracks added/removed, gain changes.
Example prompt: compare the current version with the one before normalization
Returns: tracks_added, tracks_removed, tracks_changed
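The shape of the result can be illustrated with a small Python sketch that diffs two snapshots. Representing a node as a mapping from track ID to gain is an assumption for illustration; the real node state holds more than gain.

```python
def compare_nodes(before, after):
    """Diff two snapshots that map track_id -> gain_db."""
    return {
        "tracks_added": sorted(after.keys() - before.keys()),
        "tracks_removed": sorted(before.keys() - after.keys()),
        "tracks_changed": sorted(
            t for t in before.keys() & after.keys() if before[t] != after[t]
        ),
    }


before = {"vocals": 0.0, "drums": -3.0, "bass": 0.0}
after = {"vocals": 2.5, "drums": -3.0, "guitar": 0.0}
print(compare_nodes(before, after))
# {'tracks_added': ['guitar'], 'tracks_removed': ['bass'], 'tracks_changed': ['vocals']}
```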
apply_diff
Apply a computed diff from compare_nodes to the current session.
Example prompt: (used internally by the agent)
Returns: node_id
name_node
Set a human-readable label on the current head node.
Example prompt: name this state "final mix"
Returns: node_id
Annotations
label
Add a named point marker or region annotation to the timeline.
Example prompt: mark the chorus at 1:05
Returns: annotation_id
Rendering
render_final
Render the full session to a WAV file at 16, 24, or 32-bit depth.
Example prompt: export to /Users/me/Desktop/final.wav
Returns: path, duration_sec, peak_dbfs, sample_rate
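To show how bit depth relates to the output, here is a Python sketch that encodes mono float samples as PCM WAV bytes with the standard-library `wave` module and reports values analogous to the tool's return fields. It assumes integer PCM at every depth (the tool's 32-bit output could equally be float) and reports sample peak rather than true peak. `render_wav` is a hypothetical name.

```python
import io
import math
import wave


def render_wav(samples, sample_rate=48000, bit_depth=16):
    """Encode mono float samples (-1.0..1.0) as integer PCM WAV bytes.

    Returns (wav_bytes, duration_sec, peak_dbfs).
    """
    if bit_depth not in (16, 24, 32):
        raise ValueError("bit_depth must be 16, 24, or 32")
    width = bit_depth // 8
    full_scale = 2 ** (bit_depth - 1) - 1
    frames = bytearray()
    for s in samples:
        value = round(max(-1.0, min(1.0, s)) * full_scale)
        frames += value.to_bytes(width, "little", signed=True)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(width)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    peak = max((abs(s) for s in samples), default=0.0)
    peak_dbfs = 20 * math.log10(peak) if peak > 0 else float("-inf")
    return buffer.getvalue(), len(samples) / sample_rate, peak_dbfs


# One second of a 440 Hz tone at half of full scale, rendered at 24-bit.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(48000)]
data, duration_sec, peak_dbfs = render_wav(tone, bit_depth=24)
print(len(data), duration_sec, round(peak_dbfs, 2))  # 144044 1.0 -6.02
```

The byte count is the 44-byte WAV header plus 48,000 frames of 3 bytes each.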
render_preview
Render a preview WAV to a temp file. The file is valid only for the current app session.
Example prompt: (used internally for playback)
Returns: path