User Guide

Complete guide to every feature in edytlab.

Sessions and the Session Graph

Every edit you make in edytlab creates a new session node — a snapshot of your audio session at that point in time. Nodes are linked to their parent, forming a directed acyclic graph (DAG).

Because every edit adds a node instead of overwriting one, editing is non-destructive: your original audio files are never modified, and you can always navigate back to any previous state.

Graph view — click the graph icon in the toolbar to see all session nodes visualised as a tree. Each node shows its label, the tool that created it, and its timestamp.
Revert to any node — click a node in the graph and select "Set as head" to jump back to that state.
Fork — branch from any node to explore a different edit direction without losing your current work.
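
The node, head, and fork behaviour described above can be sketched in a few lines of Python. This is only an illustration, not edytlab's actual implementation: the names SessionNode, SessionGraph, commit, set_head, and history are hypothetical, and the timestamp field is left out for brevity.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class SessionNode:
    """One immutable snapshot of the session (hypothetical structure)."""
    node_id: int
    label: str
    tool: str
    parent: Optional["SessionNode"] = None


class SessionGraph:
    def __init__(self) -> None:
        self._ids = count(1)
        self.root = SessionNode(next(self._ids), "empty session", "init")
        self.nodes = {self.root.node_id: self.root}
        self.head = self.root

    def commit(self, label: str, tool: str) -> SessionNode:
        """Record an edit as a new child of the current head."""
        node = SessionNode(next(self._ids), label, tool, parent=self.head)
        self.nodes[node.node_id] = node
        self.head = node
        return node

    def set_head(self, node_id: int) -> None:
        """Revert: move the head pointer. No node is deleted."""
        self.head = self.nodes[node_id]

    def history(self) -> list[str]:
        """Labels from the root down to the current head."""
        labels, node = [], self.head
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels[::-1]


graph = SessionGraph()
loaded = graph.commit("load song.wav", "load")
graph.commit("add reverb", "reverb")
graph.set_head(loaded.node_id)                      # revert to before the reverb
graph.commit("normalize to -14 LUFS", "normalize")  # fork: second child of `loaded`
```

After the last line, the reverb node still exists in graph.nodes. Reverting and forking only move the head and add nodes, which is why earlier states stay reachable.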

Tracks

A session contains one or more tracks. Each track holds a sequence of audio clips pointing to source files on disk. Clips are non-destructive — they describe a region of a source file, not a copy of it.

Load audio — drag a file onto the timeline, click "Open Audio", or type load /path/to/file.mp3 in chat. Supported formats: MP3, WAV, FLAC.
Multi-track sessions — load multiple files or ask the agent to add a track: add an empty track called drums.
Track gain — adjust the volume of any track independently.
Mute — silence a track without removing it from the session.
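
As a rough sketch of what "a region of a source file, not a copy of it" means, a clip can be modelled as a path plus two offsets, and a track as a list of clips with a gain and a mute flag. The class and field names below are assumptions made for illustration. The decibel-to-amplitude conversion (10 raised to dB/20) is the standard formula, but whether edytlab applies gain exactly this way is not stated in this guide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Clip:
    """A reference to a region of a source file; no audio is copied."""
    source_path: str
    start_s: float   # offset into the source file, in seconds
    end_s: float

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s

    def trim_start(self, seconds: float) -> "Clip":
        """Return a new clip that skips the first `seconds`; the file is untouched."""
        return Clip(self.source_path, self.start_s + seconds, self.end_s)


@dataclass
class Track:
    name: str
    clips: list[Clip] = field(default_factory=list)
    gain_db: float = 0.0
    muted: bool = False

    def linear_gain(self) -> float:
        """Amplitude multiplier at playback: 10^(dB/20), or 0 when muted."""
        return 0.0 if self.muted else 10 ** (self.gain_db / 20)


drums = Track("drums", [Clip("/path/to/file.mp3", 0.0, 180.0)], gain_db=-6.0)
drums.clips[0] = drums.clips[0].trim_start(5.0)   # "remove the first 5 seconds"
```

Trimming replaces the clip description with a new one. The file at source_path is never opened for writing.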

Session Templates

When no audio is loaded, click Start from template to choose from a set of pre-configured session layouts:

Podcast — host + guest tracks, pre-configured for voice normalization and noise reduction.
Music — lead, harmony, bass, and drums tracks.
Interview — interviewer + subject tracks.

Templates create the track structure and inject matching skills automatically. You can still modify the session freely after choosing a template.

The Chat Interface

The chat panel on the right is where you direct the agent. Type your editing instruction in plain English; you do not need to memorise any commands.

To browse the commands that are available, type / in the chat input to open an inline dropdown. Use the arrow keys to navigate and Enter to select.

The agent:

Streams its response in real time as it processes.
Shows a tool badge for each operation it executes.
Tells you what it did and why after each set of operations.
Understands follow-up instructions in the same conversation: actually make it 2 dB higher instead.

What makes a good prompt

Be specific about the track when you have multiple: normalize track 1 to -14 LUFS.
Use time references: cut from 1:30 to 2:00 or remove the first 5 seconds.
Chain multiple operations in one message: the agent plans and executes the full chain.
Reference markers you have set: cut everything before the chorus marker.

Playback

Space — play / pause from the current position.
L — toggle loop mode. When active, playback loops within the selected region.
Click the waveform — jump to that position.
Ctrl+scroll or +/−/0 — zoom the waveform in/out/reset.
Drag on the waveform — create a selection range (used for range export, loop playback, and as agent context).

Markers and Selections

Markers let you annotate points and regions in your session. The agent can reference them by name.

Ask the agent to add markers: mark the verse start at 0:30.
Drag on the waveform to select a region. The selection is passed as context to the agent so it knows where to operate.
Use Esc to clear the selection.

A/B Compare

Compare any two session nodes side by side with audio playback of both. Useful for deciding between two mixes or comparing before/after an edit.

Ask the agent: compare the current version with the one before the reverb, or use the Graph view to select two nodes and click "Compare".
The A/B compare bar appears at the top. Toggle between A and B to hear each version.
Click Accept B to make the B node the new session head, or Dismiss to keep the current head.

Stem Separation

edytlab integrates Demucs to separate a mixed track into individual stems — vocals, drums, bass, and other instruments — running entirely on your device.

Ask the agent: separate the stems on track 1. The model downloads automatically on first use (~80 MB). Processing takes roughly 45 seconds per minute of audio on a modern CPU. Apple Neural Engine and NVIDIA CUDA acceleration reduce this significantly.

The four stems appear as new tracks in your session. You can then edit each independently.
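
To turn the CPU figure quoted above into a concrete wait time, the arithmetic can be sketched as follows. It assumes processing time scales linearly with audio length, which this guide does not guarantee. The function name is hypothetical.

```python
SECONDS_PER_AUDIO_MINUTE = 45.0   # CPU figure quoted in this guide


def estimate_separation_seconds(audio_seconds: float) -> float:
    """Rough CPU processing time for stem separation, assuming linear scaling."""
    return audio_seconds / 60.0 * SECONDS_PER_AUDIO_MINUTE


# A 4-minute song is 240 seconds of audio.
estimate = estimate_separation_seconds(240)   # 180.0 seconds, about 3 minutes
```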

Transcription

edytlab uses Whisper large-v3 to transcribe spoken audio to text with word-level timestamps. The model runs on-device (~1.5 GB, downloaded on first use).

Ask the agent: transcribe track 1. A 60-minute file transcribes in approximately 4–8 minutes on CPU. The transcript is stored in the session and can be referenced by the agent.
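
Word-level timestamps are what make it possible to map a phrase in the transcript back to a position in the audio. The sketch below shows one way such a lookup could work. The Word structure, the function names, and the sample data are all assumptions made for illustration, not edytlab's stored transcript format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    text: str
    start_s: float
    end_s: float


def words_between(words: list[Word], start_s: float, end_s: float) -> list[Word]:
    """Words that overlap the time range [start_s, end_s)."""
    return [w for w in words if w.start_s < end_s and w.end_s > start_s]


def find_phrase(words: list[Word], phrase: str) -> Optional[tuple[float, float]]:
    """Time span of the first occurrence of `phrase`, ignoring case and punctuation."""
    target = phrase.lower().split()
    if not target:
        return None
    tokens = [w.text.lower().strip(".,!?") for w in words]
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            return words[i].start_s, words[i + len(target) - 1].end_s
    return None


# Made-up sample data, not real model output.
transcript = [
    Word("Welcome", 0.00, 0.42),
    Word("to", 0.42, 0.55),
    Word("the", 0.55, 0.68),
    Word("show.", 0.68, 1.10),
]
span = find_phrase(transcript, "the show")   # (0.55, 1.1)
```

A span like this is the kind of information an instruction such as "cut the sentence where the guest introduces herself" would need to resolve into a time range.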

Export

Ask the agent to export the session to a WAV file:

Full session: export to /Users/me/Desktop/final.wav
Selection only: select a region on the waveform, then export the selection to /path/output.wav
Specific range: export seconds 30 to 90 to /path/chorus.wav

Exports are non-destructive — the session graph is not changed. You can export multiple versions from different nodes without re-doing work.
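
For readers curious what a range export amounts to, here is a minimal sketch using Python's standard wave module. It is not how edytlab renders audio: it only handles a single uncompressed WAV source, whereas a real export would mix all tracks and apply edits. The function name export_range is hypothetical.

```python
import wave


def export_range(src_path: str, dst_path: str, start_s: float, end_s: float) -> int:
    """Copy the frames in [start_s, end_s) to a new WAV file; returns frames written."""
    with wave.open(src_path, "rb") as src:          # source is opened read-only
        rate = src.getframerate()
        total = src.getnframes()
        first = min(int(start_s * rate), total)     # clamp to the end of the file
        last = min(int(end_s * rate), total)
        n_frames = max(last - first, 0)
        src.setpos(first)
        frames = src.readframes(n_frames)
        with wave.open(dst_path, "wb") as dst:
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
    return n_frames
```

Calling export_range("in.wav", "chorus.wav", 30, 90) would correspond to "export seconds 30 to 90". The source file is only ever read, which matches the non-destructive behaviour described above.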

Memory

edytlab has a persistent memory system for notes the agent should remember. A note is either global (remembered across all sessions) or project-level (remembered only for the current project).

Global memory — preferences, your name, typical workflows, instrument preferences.
Project memory — BPM, key, speaker names, style notes, deadlines.

Access memory from Settings → Memory. The agent can also write to memory directly: remember that the BPM is 128.

Skills

Skills extend the agent with domain-specific instructions that activate based on your message content. For example, a "podcast" skill might inject compression and normalization preferences whenever you mention "podcast" or "voice" in a message.

Manage skills from Settings → Skills.

Trigger types: always (injected every turn), keywords, or regex pattern.
Body: Markdown instructions added to the agent's system prompt when the skill matches.
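
The three trigger types can be illustrated with a small matcher. This is a sketch under assumptions: the Skill fields, the case-insensitive substring matching for keywords, and the build_system_prompt helper are all hypothetical, and the example skill bodies are invented.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    trigger: str                     # "always", "keywords", or "regex"
    body: str                        # Markdown appended to the system prompt
    keywords: tuple[str, ...] = ()
    pattern: str = ""

    def matches(self, message: str) -> bool:
        if self.trigger == "always":
            return True
        if self.trigger == "keywords":
            text = message.lower()
            # Assumption: plain substring match, so "voice" also matches "invoice".
            return any(k.lower() in text for k in self.keywords)
        if self.trigger == "regex":
            return re.search(self.pattern, message, re.IGNORECASE) is not None
        raise ValueError(f"unknown trigger type: {self.trigger}")


def build_system_prompt(base: str, skills: list[Skill], message: str) -> str:
    """Append the body of every matching skill to the base prompt."""
    bodies = [s.body for s in skills if s.matches(message)]
    return "\n\n".join([base, *bodies])


skills = [
    Skill("house style", "always", "Prefer subtle edits."),
    Skill("podcast", "keywords",
          "Apply compression and normalization to voice tracks.",
          keywords=("podcast", "voice")),
    Skill("timecodes", "regex", "Times are given as m:ss.", pattern=r"\b\d+:\d{2}\b"),
]
prompt = build_system_prompt("You are an audio editor.", skills,
                             "Clean up the podcast intro")
```

Here the "house style" and "podcast" skills match, so their bodies are appended. The "timecodes" skill does not match because the message contains no time reference.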

Agent Profiles

Profiles let you configure the agent's model, tool set, and behaviour for different workflows. Examples:

Podcast profile — uses a fast, cheap model; restricts tools to load, cut, normalize, trim, transcribe, render.
Mastering profile — uses Claude Sonnet; has access to all tools; injects mastering instructions into the system prompt.

Set the active profile from Settings → Agent Profiles.
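
Conceptually, a profile bundles a model choice, a tool allow-list, and extra instructions. The sketch below shows that idea. The AgentProfile structure, the placeholder model names, and the tool names reverb and separate_stems are assumptions, and edytlab's real profile format may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentProfile:
    name: str
    model: str                                 # placeholder label, not a real model ID
    allowed_tools: Optional[frozenset[str]]    # None means "all tools"
    extra_instructions: str = ""

    def filter_tools(self, available: list[str]) -> list[str]:
        """Keep only the tools this profile may call, preserving order."""
        if self.allowed_tools is None:
            return list(available)
        return [t for t in available if t in self.allowed_tools]


# Hypothetical tool registry; only the first six names appear in this guide.
ALL_TOOLS = ["load", "cut", "trim", "normalize", "transcribe", "render",
             "reverb", "separate_stems"]

podcast = AgentProfile(
    "Podcast", "fast-cheap-model",
    frozenset({"load", "cut", "normalize", "trim", "transcribe", "render"}),
)
mastering = AgentProfile(
    "Mastering", "claude-sonnet", None,
    extra_instructions="Follow the mastering checklist.",
)

podcast_tools = podcast.filter_tools(ALL_TOOLS)
mastering_tools = mastering.filter_tools(ALL_TOOLS)
```

Restricting the tool list narrows what the agent can do in a given workflow, which is why the podcast example omits tools such as reverb.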

Tips and Keyboard Shortcuts

Press ? at any time to open the full keyboard shortcut overlay.
Type / in the chat to browse all commands with autocomplete — navigate with arrow keys, confirm with Enter.
Drag a selection on the waveform, then press L to loop just that region during playback.
Drag multiple audio files onto the timeline at once to create a multi-track session in one step.

LLM Provider and Model

Switch providers or models at any time from Settings → Provider. No restart needed. Your conversation history carries over.

Anthropic Claude Sonnet 4.6 — best for complex multi-track arrangements and long planning chains.
Anthropic Claude Haiku — fastest and cheapest; good for simple edits.
OpenRouter — access to 50+ models including open-weight options at lower cost.
OpenAI GPT-4o — reliable tool use; good general performance.