Audio Tools Reference

All 33 tools the AI agent can call to edit your audio session.

Tools are deterministic functions the agent calls to manipulate your audio session. You do not invoke tools directly. Instead, describe what you want in natural language and the agent selects the right tool chain. Every tool call creates a new session node, so edits are non-destructive.

Prompt tips

Name the track when you have multiple: "normalize track 1", not just "normalize".
Use minutes:seconds for time: "cut from 1:30 to 2:00".
Chain operations in one message. The agent plans the full sequence before executing.
Correct inline: if the agent misunderstood, say what was wrong, for example "not that track, the second one".
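
The minutes:seconds convention converts to seconds in the usual way. The sketch below shows one way such a value could be parsed before it reaches a tool like cut_range. The helper name parse_timecode is hypothetical and is not part of the documented tool set.

```python
def parse_timecode(text: str) -> float:
    """Convert 'M:SS' (or a plain number of seconds) to seconds.

    Hypothetical helper for illustration only.
    """
    parts = text.strip().split(":")
    if len(parts) == 1:
        return float(parts[0])
    if len(parts) == 2:
        minutes, seconds = parts
        return int(minutes) * 60 + float(seconds)
    raise ValueError(f"unsupported timecode: {text!r}")


start_sec = parse_timecode("1:30")
end_sec = parse_timecode("2:00")
```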

File and Track Management

load

Decode an audio file (MP3, WAV, FLAC) and create a new track in the session.

Example prompt: load /path/to/file.wav

Returns: track_id, duration_sec

add_track

Add a new empty track to the session.

Example prompt: add an empty track called "drums"

Returns: track_id

remove_track

Remove a track. Does not delete the source file on disk.

Example prompt: remove track 2

Returns: node_id

Region Editing

cut_range

Remove a time range. Audio after the cut point shifts left.

Example prompt: cut from 1:30 to 2:00 on track 1

Returns: node_id
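
As an illustration of "audio after the cut point shifts left", here is a minimal sketch on a plain list of samples. It assumes a mono buffer and a known sample rate. The function below is a stand-in written for this example, not the engine's actual implementation, and it returns a new buffer rather than modifying the input, in the spirit of the non-destructive model.

```python
def cut_range(samples: list, sample_rate: int,
              start_sec: float, end_sec: float) -> list:
    """Return a new buffer with the range [start_sec, end_sec) removed."""
    start = int(round(start_sec * sample_rate))
    end = int(round(end_sec * sample_rate))
    if not 0 <= start <= end <= len(samples):
        raise ValueError("range lies outside the track")
    # Everything after `end` now begins at `start`, i.e. it shifts left.
    return samples[:start] + samples[end:]


# Toy 10-second "track" at 10 samples per second; each value is its own index.
track = [float(i) for i in range(100)]
edited = cut_range(track, 10, 2.0, 5.0)
```

After the cut, the sample that was at 5.0 s sits at 2.0 s, and the original buffer is unchanged.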

copy_region

Copy a time region to the clipboard.

Example prompt: copy the section from 0:30 to 1:00

Returns: duration_sec of copied region

paste_region

Insert clipboard contents into a track. Audio shifts right at the insert point.

Example prompt: paste at 2:00 on track 1

Returns: node_id
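
Copy and paste can be pictured the same way. The sketch below assumes the clipboard is simply a list of samples. Both functions are illustrative stand-ins, not the engine's code.

```python
def copy_region(samples: list, sample_rate: int,
                start_sec: float, end_sec: float) -> list:
    """Return a copy of the samples in [start_sec, end_sec)."""
    start = int(round(start_sec * sample_rate))
    end = int(round(end_sec * sample_rate))
    return samples[start:end]


def paste_region(samples: list, clipboard: list,
                 sample_rate: int, at_sec: float) -> list:
    """Insert the clipboard at at_sec; later audio shifts right."""
    pos = int(round(at_sec * sample_rate))
    return samples[:pos] + clipboard + samples[pos:]


track = list(range(10))                  # 10 samples at 1 sample per second
clipboard = copy_region(track, 1, 2, 4)  # samples at 2 s and 3 s
pasted = paste_region(track, clipboard, 1, 6)
```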

trim

Remove silence from the start and/or end of a track.

Example prompt: remove the silence at the start of track 1

Returns: node_id, trimmed_start_sec, trimmed_end_sec

insert_silence

Insert a gap of silence at a position. Audio shifts right.

Example prompt: add 2 seconds of silence at 0:30

Returns: node_id

reverse

Reverse a region (or the full track).

Example prompt: reverse track 1

Returns: node_id

Volume and Dynamics

gain

Apply a static dB gain to a region of a track. Range: −60 to +12 dB.

Example prompt: boost the vocals by 3 dB

Returns: node_id
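
A gain in dB corresponds to a linear amplitude factor of 10^(dB/20). The sketch below shows that conversion with the documented −60 to +12 dB range enforced. The function names are illustrative, and rejecting out-of-range values (rather than clamping them) is an assumption.

```python
MIN_GAIN_DB = -60.0
MAX_GAIN_DB = 12.0


def db_to_linear(gain_db: float) -> float:
    """Convert a dB gain to a linear amplitude multiplier."""
    if not MIN_GAIN_DB <= gain_db <= MAX_GAIN_DB:
        raise ValueError(f"gain must be within {MIN_GAIN_DB} to {MAX_GAIN_DB} dB")
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples: list, gain_db: float) -> list:
    """Scale every sample by the same factor (a static gain)."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


boosted = apply_gain([0.5, -0.25], 3.0)  # +3 dB is a factor of about 1.41
```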

set_track_gain

Set the overall gain level for an entire track.

Example prompt: set track 2 gain to -3 dB

Returns: node_id

normalize

Normalize a track to an integrated LUFS target or true peak limit.

Example prompt: normalize to -14 LUFS for Spotify

Returns: node_id, applied_gain_db

Common targets: −14 LUFS (Spotify, YouTube), −16 LUFS (Apple Podcasts), −23 LUFS (broadcast, EBU R 128).
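
The applied_gain_db value follows from simple arithmetic once the loudness has been measured: it is the difference between the target and the measured value. The sketch below assumes the integrated loudness and true peak are already known (for example from analyze_track). Capping the gain at a peak ceiling is shown as one possible policy, not as documented behavior.

```python
from typing import Optional


def normalization_gain_db(measured_lufs: float, target_lufs: float,
                          peak_dbfs: Optional[float] = None,
                          peak_ceiling_dbfs: float = -1.0) -> float:
    """Gain in dB that moves measured loudness to the target.

    If a peak is supplied, the gain is capped so the peak stays at or
    below the ceiling (an assumed policy, for illustration).
    """
    gain_db = target_lufs - measured_lufs
    if peak_dbfs is not None:
        headroom_db = peak_ceiling_dbfs - peak_dbfs
        gain_db = min(gain_db, headroom_db)
    return gain_db


# A track measured at -20 LUFS needs +6 dB to reach -14 LUFS.
loudness_only = normalization_gain_db(-20.0, -14.0)
# With a -3 dBFS peak and a -1 dBFS ceiling, only +2 dB of headroom remains.
peak_limited = normalization_gain_db(-20.0, -14.0, peak_dbfs=-3.0)
```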

fade

Apply a fade-in or fade-out envelope. Curve options: linear, exponential, logarithmic.

Example prompt: add a 3-second fade-out

Returns: node_id
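
The three curve names describe how quickly the gain moves between silence and full level. The exact formulas the engine uses are not documented here. The sketch below uses one common family of shapes, with a steepness constant K chosen arbitrarily for illustration.

```python
import math

K = 4.0  # assumed steepness constant, not a documented value


def fade_curve(progress: float, curve: str = "linear") -> float:
    """Map progress in [0, 1] to a gain multiplier in [0, 1]."""
    if curve == "linear":
        return progress
    if curve == "exponential":   # slow start, fast finish
        return math.expm1(K * progress) / math.expm1(K)
    if curve == "logarithmic":   # fast start, slow finish
        return math.log1p(K * progress) / math.log1p(K)
    raise ValueError(f"unknown curve: {curve!r}")


def apply_fade_out(samples: list, sample_rate: int,
                   duration_sec: float, curve: str = "linear") -> list:
    """Return a copy whose last duration_sec ramps down to silence."""
    n = min(len(samples), int(round(duration_sec * sample_rate)))
    out = list(samples)
    start = len(out) - n
    for i in range(n):
        progress = i / (n - 1) if n > 1 else 1.0
        out[start + i] *= fade_curve(1.0 - progress, curve)
    return out


faded = apply_fade_out([1.0] * 10, 1, 5)  # 5-second linear fade-out
```

A fade-in is the mirror image: apply fade_curve(progress) to the first samples instead.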

set_clip_envelope

Set a per-clip volume automation curve. Provide (time_sec, gain_db) pairs and the engine linearly interpolates between them.

Example prompt: set a volume fade: track 0 clip 0, from 0s at -20dB to 2s at 0dB

Returns: node_id
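
Linear interpolation between envelope points can be sketched as follows. Two assumptions are made here: the interpolation happens in the dB domain, and the gain is held constant before the first point and after the last. The function name is illustrative.

```python
from bisect import bisect_right


def envelope_gain_db(points: list, time_sec: float) -> float:
    """Evaluate a piecewise-linear envelope of (time_sec, gain_db) pairs."""
    pts = sorted(points)
    times = [t for t, _ in pts]
    if time_sec <= times[0]:
        return pts[0][1]        # hold the first value before the envelope
    if time_sec >= times[-1]:
        return pts[-1][1]       # hold the last value after the envelope
    i = bisect_right(times, time_sec)
    (t0, g0), (t1, g1) = pts[i - 1], pts[i]
    return g0 + (g1 - g0) * (time_sec - t0) / (t1 - t0)


# The example prompt above: -20 dB at 0 s rising to 0 dB at 2 s.
envelope = [(0.0, -20.0), (2.0, 0.0)]
midpoint_db = envelope_gain_db(envelope, 1.0)
```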

Effects

eq

Apply a parametric EQ to a track using a chain of biquad peak filters. Specify frequency, gain (dB), and Q for each band.

Example prompt: boost the highs on track 1 by 3 dB at 8 kHz

Returns: node_id
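
For readers curious what a chain of biquad peak filters looks like, the sketch below uses the widely published peaking-EQ coefficient formulas from the Audio EQ Cookbook. Whether the engine uses exactly these formulas is an assumption, and all function names are illustrative.

```python
import cmath
import math


def peaking_biquad(freq_hz: float, gain_db: float, q: float,
                   sample_rate: float) -> tuple:
    """Return normalized (b0, b1, b2, a1, a2) for one peaking band."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / a
    return ((1.0 + alpha * a) / a0,
            (-2.0 * math.cos(w0)) / a0,
            (1.0 - alpha * a) / a0,
            (-2.0 * math.cos(w0)) / a0,
            (1.0 - alpha / a) / a0)


def apply_biquad(samples: list, coeffs: tuple) -> list:
    """Run one biquad over the samples (direct form I)."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def apply_eq(samples: list, bands: list, sample_rate: float) -> list:
    """Apply each (freq_hz, gain_db, q) band in series."""
    for freq_hz, gain_db, q in bands:
        samples = apply_biquad(
            samples, peaking_biquad(freq_hz, gain_db, q, sample_rate))
    return samples


def magnitude_db(coeffs: tuple, freq_hz: float, sample_rate: float) -> float:
    """Gain of one band at freq_hz, from the filter's transfer function."""
    b0, b1, b2, a1, a2 = coeffs
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (1.0 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))


# The example prompt above: +3 dB at 8 kHz, here with Q = 1 at 48 kHz.
band = peaking_biquad(8000.0, 3.0, 1.0, 48000.0)
center_gain_db = magnitude_db(band, 8000.0, 48000.0)
```

With these formulas the gain at the center frequency equals the requested gain, and a band with 0 dB gain leaves the signal unchanged.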

compressor

Apply a dynamic compressor with configurable threshold, ratio, attack, and release. Uses an envelope follower for smooth gain reduction.

Example prompt: compress track 1: threshold -18 dB, ratio 4:1

Returns: node_id
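
The interaction of threshold, ratio, attack, and release can be sketched with a simple feed-forward design: a one-pole envelope follower tracks the signal level, and any level above the threshold is reduced according to the ratio. This is a generic textbook structure under assumed default attack and release times, not necessarily the engine's exact algorithm.

```python
import math


def compress(samples: list, sample_rate: float,
             threshold_db: float = -18.0, ratio: float = 4.0,
             attack_ms: float = 10.0, release_ms: float = 100.0) -> list:
    """Feed-forward compressor with a peak envelope follower."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    envelope = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Rise quickly (attack) and fall slowly (release) for smooth gain changes.
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        envelope_db = 20.0 * math.log10(max(envelope, 1e-9))
        over_db = envelope_db - threshold_db
        # Above threshold, keep only 1/ratio of the overshoot.
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out


# A sustained full-scale signal is 18 dB over a -18 dB threshold.
# At 4:1 the steady-state reduction is 18 * (1 - 1/4) = 13.5 dB.
loud = compress([1.0] * 2000, 1000.0)
quiet = compress([0.01] * 100, 1000.0)  # -40 dBFS, below threshold
```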

noise_reduction

Remove broadband noise via spectral subtraction (realFFT + overlap-add). Estimates the noise floor from a silent region and subtracts it from the signal.

Example prompt: reduce background noise on track 1

Returns: node_id

Time and Pitch

time_stretch

Change the duration without changing the pitch.

Example prompt: stretch track 1 to 4 minutes

Returns: node_id, new_duration_sec

pitch_shift

Change the pitch without changing the duration. Range: −12 to +12 semitones.

Example prompt: shift the vocals up 2 semitones

Returns: node_id
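
In equal temperament, a shift of n semitones multiplies every frequency by 2^(n/12), so the documented range of ±12 semitones spans one octave down to one octave up. The helper below is a sketch of that conversion only. It does not perform the pitch shift itself, and its name is illustrative.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of the given number of semitones."""
    if not -12.0 <= semitones <= 12.0:
        raise ValueError("shift must be within -12 to +12 semitones")
    return 2.0 ** (semitones / 12.0)


up_two = semitones_to_ratio(2)  # about 1.122, i.e. roughly 12 % higher
```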

Analysis

analyze_track

Detect BPM, musical key, integrated loudness (LUFS), true peak, and transient count.

Example prompt: analyze track 1

Returns: bpm, key, loudness_lufs, peak_dbfs, transient_count

align_to_beat

Shift the start of a track so that it lines up with the nearest beat on the beat grid.

Example prompt: align track 2 to the beat

Returns: node_id, shift_sec
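
The shift_sec value can be understood as the distance from the track's current start to the closest beat. The sketch below assumes a constant tempo and a grid that starts at a known offset. Both the function name and the grid_offset_sec parameter are hypothetical.

```python
def beat_alignment_shift(start_sec: float, bpm: float,
                         grid_offset_sec: float = 0.0) -> float:
    """Signed shift (seconds) that moves start_sec onto the nearest beat.

    Negative values move the track earlier, positive values later.
    """
    beat_sec = 60.0 / bpm
    beats_from_origin = (start_sec - grid_offset_sec) / beat_sec
    nearest_beat = round(beats_from_origin)
    return nearest_beat * beat_sec + grid_offset_sec - start_sec


# At 120 BPM beats fall every 0.5 s, so a start at 0.6 s moves back to 0.5 s.
shift_sec = beat_alignment_shift(0.6, 120.0)
```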

ML Tools

separate_stems

Run Demucs stem separation on-device. Produces 4 tracks: vocals, drums, bass, other. Model: htdemucs (~80 MB). Processing: roughly 45 seconds per minute of audio on CPU.

Example prompt: separate the stems on track 1

Returns: node_id, stem track IDs

First use downloads the model automatically. htdemucs_6s adds guitar and piano stems at ~2× the processing time.

transcribe

Transcribe spoken audio using Whisper large-v3 on-device. Stores word-level timestamps in the session. Model: ~1.5 GB. Processing: roughly 4 to 8 minutes per 60 minutes of audio on CPU.

Example prompt: transcribe track 1

Returns: node_id, word_count, language

First use downloads the model automatically. CoreML (macOS) and CUDA significantly reduce processing time.

DAG Operations

fork_node

Fork the current node to create an independent branch. The fork becomes the new head.

Example prompt: fork the session and call it "take-2"

Returns: node_id

revert_to

Move the session head to an earlier node. Does not delete any nodes.

Example prompt: revert to before the reverb

Returns: node_id

compare_nodes

Generate a diff between two nodes: tracks added/removed, gain changes.

Example prompt: compare the current version with the one before normalization

Returns: tracks_added, tracks_removed, tracks_changed
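
The shape of the returned diff can be illustrated with a small sketch. It assumes each node's state can be reduced to a mapping from track ID to track gain in dB, which is a simplification made for this example. The real session state is richer.

```python
def compare_nodes(before: dict, after: dict) -> dict:
    """Diff two {track_id: gain_db} snapshots."""
    shared = before.keys() & after.keys()
    return {
        "tracks_added": sorted(after.keys() - before.keys()),
        "tracks_removed": sorted(before.keys() - after.keys()),
        "tracks_changed": sorted(t for t in shared if before[t] != after[t]),
    }


before = {"track-1": 0.0, "track-2": -3.0}
after = {"track-2": -6.0, "track-3": 0.0}
diff = compare_nodes(before, after)
```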

apply_diff

Apply a computed diff from compare_nodes to the current session.

Example prompt: (used internally by the agent)

Returns: node_id

name_node

Set a human-readable label on the current head node.

Example prompt: name this state "final mix"

Returns: node_id
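
The operations above can be pictured with a small in-memory graph in which every node records its parent and the head is just a pointer. This is a conceptual sketch with hypothetical class and method names, not the application's data model. In particular, it assumes revert_to only moves the head and that fork_node adds a labeled child node.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass
class Node:
    node_id: int
    parent_id: Optional[int]
    operation: str
    label: Optional[str] = None


class SessionGraph:
    def __init__(self) -> None:
        self._ids = count(1)
        self.nodes: Dict[int, Node] = {}
        self.head = self._add(None, "init")

    def _add(self, parent_id: Optional[int], operation: str) -> int:
        node = Node(next(self._ids), parent_id, operation)
        self.nodes[node.node_id] = node
        return node.node_id

    def apply(self, operation: str) -> int:
        """Record an edit as a new child of the head (non-destructive)."""
        self.head = self._add(self.head, operation)
        return self.head

    def fork_node(self, label: Optional[str] = None) -> int:
        """Start an independent branch; the fork becomes the new head."""
        self.head = self._add(self.head, "fork")
        self.nodes[self.head].label = label
        return self.head

    def revert_to(self, node_id: int) -> int:
        """Move the head to an earlier node without deleting anything."""
        if node_id not in self.nodes:
            raise KeyError(node_id)
        self.head = node_id
        return self.head

    def name_node(self, label: str) -> int:
        self.nodes[self.head].label = label
        return self.head


graph = SessionGraph()
normalized = graph.apply("normalize")
faded = graph.apply("fade")
graph.revert_to(normalized)      # the fade node still exists
boosted = graph.apply("gain")    # new branch from the normalized state
graph.name_node("final mix")
```

Because reverting never deletes nodes, the fade branch remains reachable by its node ID after the new gain edit is added.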

Annotations

label

Add a named point marker or region annotation to the timeline.

Example prompt: mark the chorus at 1:05

Returns: annotation_id

Rendering

render_final

Render the full session to a WAV file at 16, 24, or 32-bit depth.

Example prompt: export to /Users/me/Desktop/final.wav

Returns: path, duration_sec, peak_dbfs, sample_rate

render_preview

Render a preview WAV to a temp file. The file is valid only for the current app session.

Example prompt: (used internally for playback)

Returns: path