Introduction

Symusic (Symbolic Music) is a high-performance toolkit for symbolic music data. It provides a C++20 core that parses and manipulates MIDI at the note level and exposes ergonomic Python factories via nanobind. Typical operations, such as loading a multi-track score, shifting the pitch of entire sections, exporting piano rolls, or rendering audio with SoundFonts, run hundreds of times faster than in the pure-Python stacks traditionally used in MIR and deep-learning pipelines.
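To make two of these note-level operations concrete, here is a pure-Python sketch of transposition and a frame-based piano roll on a toy note list. It does not call Symusic. The `Note` tuple and the `shift_pitch` and `pianoroll` helpers are hypothetical and only illustrate what such operations compute. Dropping notes that leave the 0 to 127 MIDI pitch range is an assumption of this sketch, not documented Symusic behavior.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note-level record; onset and duration are in ticks."""
    time: int
    duration: int
    pitch: int
    velocity: int


def shift_pitch(notes: List[Note], offset: int) -> List[Note]:
    """Transpose every note by `offset` semitones, dropping notes outside 0..127 (sketch assumption)."""
    shifted = []
    for n in notes:
        p = n.pitch + offset
        if 0 <= p <= 127:
            shifted.append(n._replace(pitch=p))
    return shifted


def pianoroll(notes: List[Note], length: int) -> List[List[int]]:
    """Frame piano roll: roll[pitch][t] holds the velocity while a note sounds, else 0."""
    roll = [[0] * length for _ in range(128)]
    for n in notes:
        start = max(n.time, 0)
        end = min(n.time + n.duration, length)  # clip notes that run past the roll
        for t in range(start, end):
            roll[n.pitch][t] = max(roll[n.pitch][t], n.velocity)
    return roll


notes = [Note(0, 4, 60, 90), Note(4, 4, 64, 80), Note(2, 4, 67, 70)]
up = shift_pitch(notes, 2)
roll = pianoroll(up, length=8)
print([n.pitch for n in up])  # [62, 66, 69]
print(roll[62])               # [90, 90, 90, 90, 0, 0, 0, 0]
```

Symusic runs operations like these in its C++ core over whole scores. The sketch only shows the per-note arithmetic.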

What makes Symusic different?

  • Speed with fidelity – MIDI is decoded into note-level structures directly in C++ and exposed to Python without copies, so workflows that used to require custom C++ extensions now run inside a notebook.

  • Time-unit flexibility – Score/Track/Event objects are available in tick, quarter, or second space, letting you choose between raw MIDI timing, musically normalized beats, or wall-clock seconds depending on the task.

  • Cross-language friendly – The core lives entirely in standard C++, so bindings for other languages (Julia, Lua, etc.) can be built on the same foundation.

  • End-to-end pipeline – Beyond parsing, Symusic includes transformations (filtering, trimming, resampling), beat/downbeat extraction, structure-of-arrays (SoA) NumPy exports, piano-roll conversion, and synthesis via the Prestosynth engine.
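Converting between the three time units follows standard MIDI semantics. Quarter time is ticks divided by the file's ticks-per-quarter (TPQ) resolution. Second time also integrates the tempo map, where each Set Tempo event gives microseconds per quarter note; if no tempo is set, the default is 500,000, which is 120 BPM. The pure-Python sketch below illustrates that arithmetic for a single tick value. The function names and the `(tick, microseconds_per_quarter)` tempo-map format are hypothetical and are not Symusic's API, which converts entire scores in its C++ core.

```python
from typing import Sequence, Tuple

# Hypothetical helpers for illustration; not part of Symusic's API.
DEFAULT_TEMPO = 500_000  # microseconds per quarter note (120 BPM), the MIDI default


def tick_to_quarter(tick: int, tpq: int) -> float:
    """Quarter-note time is simply ticks divided by ticks-per-quarter."""
    return tick / tpq


def tick_to_second(tick: int, tpq: int, tempos: Sequence[Tuple[int, int]]) -> float:
    """Convert a non-negative absolute tick to seconds.

    `tempos` is a list of (tick, microseconds_per_quarter) pairs sorted by tick.
    Each tempo segment contributes ticks * us_per_quarter / tpq / 1e6 seconds.
    """
    tempo_map = list(tempos)
    if not tempo_map or tempo_map[0][0] > 0:
        tempo_map.insert(0, (0, DEFAULT_TEMPO))  # default tempo until the first change
    seconds = 0.0
    for i, (start, us_per_quarter) in enumerate(tempo_map):
        is_last = i + 1 == len(tempo_map)
        end = tick if is_last else min(tick, tempo_map[i + 1][0])
        seconds += (end - start) * us_per_quarter / tpq / 1e6
        if is_last or tick <= tempo_map[i + 1][0]:
            break  # the target tick lies inside this segment
    return seconds


tempo_map = [(0, 500_000), (960, 1_000_000)]  # 120 BPM, then 60 BPM from tick 960
print(tick_to_quarter(1440, 480))             # 3.0
print(tick_to_second(1440, 480, tempo_map))   # 2.0
```

The quarter unit ignores tempo entirely, which is why it suits musically normalized tasks. The second unit depends on the whole tempo history up to each event.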

Architecture at a glance

| Layer | Purpose | Key technologies |
|---|---|---|
| C++ engine | MIDI/ABC parsing, time-unit conversions, vectorized containers | C++20, fmt, minimidi, zpp_bits |
| Python bindings | Factories (Score, Track, Note, …) + helper utilities | nanobind, stubs in python/symusic/core.pyi |
| Synthesizer | SoundFont 2/3 rendering and WAV dumping | Prestosynth, NumPy |
| Documentation | Canonical versioned docs on Read the Docs; legacy mdBook kept as an archive | Sphinx, MyST, Read the Docs |
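The vectorized containers and NumPy exports above rely on a structure-of-arrays (SoA) layout. Instead of a list of note objects, each field is stored as its own contiguous, typed array. The stdlib-only sketch below converts an array-of-structures list into that form. The `to_soa` helper and the field names are hypothetical. Symusic's own exports are NumPy arrays, as the text states, rather than the `array.array` columns used here.

```python
from array import array
from typing import Dict, List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note record (array-of-structures element)."""
    time: int
    duration: int
    pitch: int
    velocity: int


def to_soa(notes: List[Note]) -> Dict[str, array]:
    """Array-of-structures -> structure-of-arrays: one contiguous int64 column per field."""
    soa = {field: array("q") for field in Note._fields}  # 'q' = signed 64-bit integer
    for n in notes:
        for field, value in zip(Note._fields, n):
            soa[field].append(value)
    return soa


notes = [Note(0, 480, 60, 90), Note(480, 240, 64, 80), Note(720, 240, 67, 70)]
soa = to_soa(notes)
print(soa["pitch"].tolist())  # [60, 64, 67]

# Per-field computations iterate over flat columns, e.g. the end time of every note:
ends = [t + d for t, d in zip(soa["time"], soa["duration"])]
print(ends)  # [480, 720, 960]

# Each column exposes the buffer protocol, so a library such as NumPy could wrap it
# without copying (e.g. numpy.frombuffer); here we only inspect the raw view.
view = memoryview(soa["velocity"])
print(view.format, len(view))  # q 3
```

Columns like these map directly onto NumPy arrays. That mapping is what makes per-field operations and batch exports cheap compared with iterating over Python objects.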

Installation overview

pip install symusic

We publish CPython wheels for Python 3.9 through 3.14 across the Linux, macOS, and Windows targets listed in .github/workflows/wheel.yml, including Windows ARM64. PyPy wheels are currently published for pp311 on manylinux_x86_64, macosx_x86_64, and macosx_arm64.

To build from source, clone the repository with its submodules and install it with pip:

git clone --recursive https://github.com/Yikai-Liao/symusic
cd symusic
pip install .

Symusic requires a C++20 compiler (GCC ≥ 11, Clang ≥ 15, MSVC 2022). If you cannot install or upgrade the system compiler on Linux, a toolchain from conda-forge::gcc combined with the CC=/path/to/gcc override works well.

Note

The canonical documentation lives on Read the Docs at https://symusic.readthedocs.io/en/stable/. The legacy mdBook remains online at https://yikai-liao.github.io/symusic/ for historical links, but the Read the Docs site and the codebase are the primary references for current behavior.

For doc migration history and local build instructions, see Documentation Notes.

Benchmarks

Symusic’s decoding core is optimized for note-level MIDI workloads and outperforms the MIDI libraries commonly used in MIR tooling:

  • midifile (C++) emits both event- and note-level data but spends significant time in iostream.

  • mido (Python) parses only event-level structures and is the foundation of many Python stacks.

  • pretty_midi and miditoolkit build on top of mido to expose note-level abstractions.

Libraries accessible from Python are timed with timeit, C++ projects with nanobench, and Julia libraries with BenchmarkTools. The end-to-end scripts live in symusic-benchmark and currently run on GitHub Actions M1 runners.
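Here is a minimal sketch of the timeit side of that methodology, reporting the best-of-N time per call. The `parse_stub` function is a hypothetical placeholder for whatever parser is being measured, such as a call that loads one MIDI file. The actual scripts in symusic-benchmark may use different repeat counts and inputs.

```python
import timeit
from typing import Callable


def best_time_per_call(fn: Callable[[], object], number: int = 100, repeat: int = 5) -> float:
    """Return the fastest observed seconds-per-call over `repeat` runs of `number` calls.

    The timeit documentation recommends the minimum: it is the run least
    disturbed by other processes, so it best reflects the code's own cost.
    """
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number


def parse_stub() -> int:
    # Hypothetical stand-in for a parser call; it just scans a 16 KiB buffer.
    data = bytes(range(256)) * 64
    return sum(b & 0x7F for b in data)


if __name__ == "__main__":
    t = best_time_per_call(parse_stub, number=50, repeat=3)
    print(f"{t * 1e3:.3f} ms per call")
```

To compare libraries, time each one's loader on the same files with the same `number` and `repeat`, and report the ratio of per-call times.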

Figure: Benchmark comparison

Motivation

Symusic aims to provide a fast symbolic music preprocessing backend for MIDI and ABC workflows, with room to grow into additional formats over time.

Historically, the dominant MIDI parsing backend has been mido (used by pretty_midi and miditoolkit), which is written in pure Python. However, it is too slow for large-scale symbolic music preprocessing in the deep-learning era, which makes it impossible to tokenize the required data in real time during training.

We developed this library to meet that need. It is written in C++, exposes Python bindings through nanobind, and is over 100 times faster than mido. It parses MIDI files at the note level (similar to miditoolkit) rather than at the event level (like mido), which is better suited to large-scale symbolic music preprocessing. ABC support is already integrated, while formats such as MusicXML remain future work.
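An event-level parser yields note_on/note_off messages, while a note-level representation pairs them into notes with onsets and durations. The pure-Python sketch below shows that pairing step. The `Event` and `Note` tuples and `events_to_notes` are hypothetical. The sketch follows the common MIDI convention that a note_on with velocity 0 acts as a note_off. It also makes two choices that other libraries may handle differently: overlapping notes on the same channel and pitch close first-in, first-out, and unterminated notes are dropped.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical event-level record with an absolute time in ticks."""
    time: int
    kind: str  # "note_on" or "note_off"
    channel: int
    pitch: int
    velocity: int


class Note(NamedTuple):
    """Hypothetical note-level record."""
    time: int
    duration: int
    pitch: int
    velocity: int
    channel: int


def events_to_notes(events: List[Event]) -> List[Note]:
    """Pair note_on/note_off events into notes; notes never switched off are dropped."""
    open_notes: Dict[Tuple[int, int], Deque[Tuple[int, int]]] = defaultdict(deque)
    notes: List[Note] = []
    for ev in sorted(events, key=lambda e: e.time):  # stable sort keeps file order at equal times
        key = (ev.channel, ev.pitch)
        if ev.kind == "note_on" and ev.velocity > 0:
            open_notes[key].append((ev.time, ev.velocity))
        elif ev.kind in ("note_off", "note_on") and open_notes[key]:
            start, vel = open_notes[key].popleft()  # FIFO: close the oldest open note
            notes.append(Note(start, ev.time - start, ev.pitch, vel, ev.channel))
    notes.sort(key=lambda n: (n.time, n.pitch))
    return notes


events = [
    Event(0, "note_on", 0, 60, 90),
    Event(0, "note_on", 0, 64, 80),
    Event(480, "note_off", 0, 60, 0),
    Event(480, "note_on", 0, 64, 0),  # velocity-0 note_on ends the E
    Event(480, "note_on", 0, 67, 70),
    Event(960, "note_off", 0, 67, 0),
]
print([(n.time, n.duration, n.pitch) for n in events_to_notes(events)])
# [(0, 480, 60), (0, 480, 64), (480, 480, 67)]
```

Doing this pairing once, in compiled code, is what lets note-level consumers skip per-message bookkeeping in Python.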

For those who only need event-level MIDI parsing in C++, we also split the event-level parsing code into minimidi, a lightweight, efficient, header-only C++ library.
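A basic step in any event-level MIDI parser is decoding variable-length quantities (VLQs). The Standard MIDI File format uses them for delta times: each byte contributes 7 bits, and a set high bit means another byte follows, up to 4 bytes. This Python sketch only illustrates the encoding. minimidi itself is C++, and none of its API is shown or implied here. The `read_vlq` name is hypothetical.

```python
from typing import Tuple


def read_vlq(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a MIDI variable-length quantity starting at `pos`.

    Returns (value, next_pos). Standard MIDI Files cap VLQs at 4 bytes (max 0x0FFFFFFF).
    """
    value = 0
    for i in range(4):
        if pos + i >= len(data):
            raise ValueError("truncated variable-length quantity")
        byte = data[pos + i]
        value = (value << 7) | (byte & 0x7F)  # append the low 7 bits
        if (byte & 0x80) == 0:  # high bit clear: this was the last byte
            return value, pos + i + 1
    raise ValueError("variable-length quantity longer than 4 bytes")


# Encodings listed in the Standard MIDI File specification:
print(read_vlq(b"\x00"))              # (0, 1)
print(read_vlq(b"\x81\x00"))          # (128, 2)
print(read_vlq(b"\xff\xff\xff\x7f"))  # (268435455, 4)
```

Because `read_vlq` returns the next position, a track parser can chain calls: read a delta time, read the event that follows, and repeat until the end of the chunk.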

In the future, we also plan to bind Symusic to other languages, such as Julia and Lua.