Sound with XAudio2 on Windows

We'll initialize a low-latency audio output buffer into which we can feed blocks of bytes representing any sound. The code also handles device switching automatically when the default device changes, and it recovers and restarts when errors occur. Running the code below gives a console program which outputs a square wave sound. Press up/down on your keyboard to adjust the frequency, and escape/q/k to quit the program.

I'm using XAudio2, which is Microsoft's API for game audio. Earlier versions supported C, but it is now C++ only. It is built on top of WASAPI - for the absolute lowest latency, you may want to look into using WASAPI directly - but XAudio2 lets us specify the format of our audio (48 kHz, single channel, 32-bit float) and lets the system convert it to whatever the output hardware needs. With WASAPI, you have to handle the conversion yourself and conform to whichever formats are available (for example 192 kHz, 5-channel, 24-bit).

Build with:

g++ main.c -lole32 -lAvrt

int main () {
        DO_OR_QUIT (CoInitializeEx(NULL, COINIT_MULTITHREADED), "Sound thread failed to CoInitializeEx");
        
        { DWORD task_index = 0; AvSetMmThreadCharacteristics(TEXT("Pro Audio"), &task_index); }

XAudio2 relies on the Windows COM library, which must be initialized before use (and uninitialized with CoUninitialize before the program exits). We then set the thread's multimedia task to "Pro Audio". "Games" is another option, but Pro Audio requests lower latency and a higher priority over other threads, where the system can provide it.

while (restart_audio && !quit) {

I wrapped all the audio setup code in this loop so that it can be automatically re-run just by setting this restart_audio variable on critical errors. This allows sound recovery without restarting the program.
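The control flow of that loop can be sketched without any Windows APIs. In the sketch below, try_setup and run are hypothetical stand-ins for the XAudio2 setup code and for the part of the program that waits while audio plays, and max_attempts is my own addition so the sketch always terminates (the real program simply keeps retrying).

```cpp
#include <cassert>
#include <functional>

// Returned only so the sketch can be checked from the outside.
struct LoopStats {
    int setup_attempts = 0;  // how many times setup was (re)run
};

// `try_setup` returns true when all audio setup succeeded.
// `run` returns true if it ended because the user asked to quit,
// and false if it ended because an error requested a restart.
LoopStats run_with_restarts(const std::function<bool()>& try_setup,
                            const std::function<bool()>& run,
                            int max_attempts) {
    assert(max_attempts > 0);
    LoopStats stats;
    bool restart_audio = true;  // starts true so the body runs at least once
    bool quit = false;
    while (restart_audio && !quit && stats.setup_attempts < max_attempts) {
        ++stats.setup_attempts;
        restart_audio = false;  // cleared first; any failure sets it again
        if (!try_setup()) { restart_audio = true; continue; }
        quit = run();
        if (!quit) restart_audio = true;
    }
    return stats;
}
```

The important detail is that the flag is cleared at the top of each iteration, so a failure anywhere in setup only has to set it and continue.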

        if (xaudio_source_voice != nullptr) { xaudio_source_voice->DestroyVoice(); xaudio_source_voice = nullptr; }
        if (xaudio_master_voice != nullptr) { xaudio_master_voice->DestroyVoice(); xaudio_master_voice = nullptr; }
        if (xaudio != nullptr) { xaudio->Release(); xaudio = nullptr; }
        if (xaudio_dll) { FreeLibrary (xaudio_dll); xaudio_dll = nullptr; }

On the first iteration, all of these if conditions will be false, but on restarts this is where we release all resources before recreating them.

typedef HRESULT (WINAPI *XAudio2Create_t)(IXAudio2**, UINT32, XAUDIO2_PROCESSOR);

        xaudio_dll = LoadLibraryA("xaudio2_9.dll");
        if (!xaudio_dll) { LOG("Failed to load xaudio2_9.dll"); restart_audio = true; continue; }
        XAudio2Create_t XAudio2Create = (XAudio2Create_t)GetProcAddress (xaudio_dll, "XAudio2Create");
        if (!XAudio2Create) { LOG ("Failed to get XAudio2Create"); restart_audio = true; continue; }

I don't link to XAudio2 in the usual way with a linker flag, like -lxaudio2. This is only because the library isn't usually included in MinGW, so I load it manually in my program. We only need to retrieve one function, so it's not a big deal, but if you want to link against it the normal way, check the instructions in the comment below these lines.

DO_OR_RESTART (XAudio2Create(&xaudio, 0, XAUDIO2_DEFAULT_PROCESSOR), "Sound thread failed to create XAudio2");

This initializes XAudio2.
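The DO_OR_RESTART macro isn't shown in these snippets, so here is a guess at how such a macro could be written. HResult is a stand-in for the Windows HRESULT type (negative values mean failure), and attempts_until_success is a made-up function that only exists to exercise the macro.

```cpp
#include <cassert>
#include <cstdio>

// Stand-in for HRESULT: negative means failure, as with the FAILED() check.
using HResult = long;

static bool restart_audio = true;

// Hypothetical reconstruction. Evaluate `call`; on failure log the message,
// flag a restart, and jump to the next iteration of the enclosing loop.
// Plain braces are used instead of do { } while (0) because a `continue`
// inside do/while would only leave the macro, not restart the outer loop.
#define DO_OR_RESTART(call, message)                              \
    {                                                             \
        HResult hr_ = (call);                                     \
        if (hr_ < 0) {                                            \
            std::fprintf(stderr, "%s [%lx]\n", (message),         \
                         static_cast<unsigned long>(hr_));        \
            restart_audio = true;                                 \
            continue;                                             \
        }                                                         \
    }

// Runs a fake setup step that fails `failures` times before succeeding,
// and returns how many loop iterations were needed.
int attempts_until_success(int failures) {
    assert(failures >= 0);
    int attempts = 0;
    restart_audio = true;
    while (restart_audio) {
        restart_audio = false;
        ++attempts;
        HResult step = (attempts <= failures) ? -1L : 0L;
        DO_OR_RESTART(step, "setup step failed");
    }
    return attempts;
}
```

Because the macro contains a bare continue, it can only be used directly inside the loop it is meant to restart.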

class EngineCallback : public IXAudio2EngineCallback {
public:
    void OnCriticalError(HRESULT Error) override {
        restart_audio = true;
        LOG ("Critical XAudio2 Engine error [%lx] [%s]", Error, HResultToStr(Error));
    }
    void OnProcessingPassEnd() {}
    void OnProcessingPassStart() {}
};
EngineCallback xaudio_engine_callback;

        DO_OR_RESTART (xaudio->RegisterForCallbacks(&xaudio_engine_callback), "Failed to register for engine callbacks");

We create a class inheriting IXAudio2EngineCallback to implement the engine callback functions. The only one we're interested in is OnCriticalError. We print the error and flag the whole audio system to restart.
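The snippets don't show how restart_audio is declared. Since the callbacks are invoked by XAudio2 rather than by our own main loop, one reasonable option is an atomic flag. This is a sketch under that assumption, with restart_requested, request_restart and consume_restart_request as made-up names.

```cpp
#include <atomic>

// Assumption: the flag is written from a callback thread and read from the
// main thread, so it is declared atomic here.
static std::atomic<bool> restart_requested{false};

// What a callback such as OnCriticalError would do: only raise the flag.
void request_restart() {
    restart_requested.store(true, std::memory_order_release);
}

// What the main loop would do: read and clear the flag in a single step,
// so a request raised between the read and the clear is never lost.
bool consume_restart_request() {
    return restart_requested.exchange(false, std::memory_order_acq_rel);
}
```

Keeping the callback down to a single store also means it returns quickly, which matters for code that runs inside the audio engine.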

        DO_OR_RESTART (xaudio->CreateMasteringVoice(&xaudio_master_voice, 1, SAMPLING_RATE, 0, NULL, NULL, AudioCategory_GameEffects), "Failed to create mastering voice");

XAudio2 can play multiple source voices, which it mixes together into one mastering voice. The mastering voice needs to be configured with the number of channels and the sampling rate, and it has a few other settings. Check the documentation for all possible values.

const WAVEFORMATEX wave_format = {
    .wFormatTag = WAVE_FORMAT_IEEE_FLOAT,
    .nChannels = 1,
    .nSamplesPerSec = SAMPLING_RATE,
    .nAvgBytesPerSec = SAMPLING_RATE * sizeof (float),
    .nBlockAlign = sizeof (float),
    .wBitsPerSample = sizeof (float) * 8,
    .cbSize = 0
};

        DO_OR_RESTART (xaudio->CreateSourceVoice(&xaudio_source_voice, &wave_format, 0, XAUDIO2_DEFAULT_FREQ_RATIO, &xaudio_voice_callback, NULL, NULL), "Failed to create source voice");

        DO_OR_RESTART (xaudio_source_voice->Start(0, XAUDIO2_COMMIT_NOW), "Failed to start source voice");

We specify an audio format and initialize a source voice. I've chosen single-channel, 48 kHz, 32-bit float. See the WAVEFORMATEX and WAVEFORMATEXTENSIBLE documentation for all possible values. We also pass in the voice callback object, which implements all the voice callback functions we want XAudio2 to call.
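Two of the format fields are derived from the others: the block alignment is the size of one frame (one sample for every channel), and the average byte rate is the sampling rate times that. The sketch below shows the arithmetic on a simplified stand-in struct, PcmFormat, rather than on the real WAVEFORMATEX, so it compiles anywhere.

```cpp
#include <cstdint>

// Simplified stand-in for the interdependent fields of WAVEFORMATEX.
struct PcmFormat {
    std::uint16_t channels;
    std::uint32_t samples_per_sec;
    std::uint16_t bits_per_sample;
    std::uint16_t block_align;        // bytes per frame
    std::uint32_t avg_bytes_per_sec;  // bytes consumed per second of audio
};

// Derive the dependent fields from the three independent ones.
constexpr PcmFormat make_format(std::uint16_t channels,
                                std::uint32_t samples_per_sec,
                                std::uint16_t bits_per_sample) {
    const std::uint16_t block_align =
        static_cast<std::uint16_t>(channels * bits_per_sample / 8);
    return PcmFormat{channels, samples_per_sec, bits_per_sample, block_align,
                     samples_per_sec * block_align};
}
```

For the mono 48 kHz float format used here this gives 4 bytes per frame and 192000 bytes per second, which matches the values written out by hand above.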

static XAUDIO2_BUFFER xaudio_buffers[2];

        xaudio_buffers[0].AudioBytes = xaudio_buffers[1].AudioBytes = sizeof(float) * PERIOD_SIZE;
        xaudio_buffers[0].pAudioData = (const BYTE*)&audio_buffers[0];
        xaudio_buffers[1].pAudioData = (const BYTE*)&audio_buffers[1];

        xaudio_source_voice->SubmitSourceBuffer (&xaudio_buffers[0], nullptr);
        xaudio_source_voice->SubmitSourceBuffer (&xaudio_buffers[1], nullptr);

Audio data is sent to XAudio2 in XAUDIO2_BUFFERs. Here we set two buffers up, pointing at the two areas of memory where we'll store our audio samples. Then we submit both buffers, currently filled with zeroes, to the source voice to begin playback. It's important to queue two buffers so that, when one completes playback, we can refill the used buffer and queue it again while the other buffer is playing. From here on, XAudio2 will be calling our callback functions.
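The size of these buffers sets how much audio is queued ahead of the speaker. The value of PERIOD_SIZE isn't shown here, so the 480 frames in the usage note below is purely a hypothetical figure. The sketch only counts the audio we queue ourselves and ignores whatever the engine and the device add on top.

```cpp
#include <cassert>

// Duration of one buffer of `frames` samples per channel, in milliseconds.
constexpr double buffer_ms(unsigned frames, unsigned sample_rate) {
    return 1000.0 * frames / sample_rate;
}

// Total audio held in the queue when `buffers` full buffers are submitted.
inline double queued_ms(unsigned frames, unsigned sample_rate, unsigned buffers) {
    assert(sample_rate > 0);
    return buffers * buffer_ms(frames, sample_rate);
}
```

At 48 kHz, a 480-frame period lasts 10 ms, so two queued buffers hold 20 ms of audio. Smaller periods lower that figure but leave less time to refill a buffer before the other one runs out.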

class VoiceCallback : public IXAudio2VoiceCallback {
public:
    void OnBufferEnd(void* pBufferContext) override {
        static bool sample_buffer_swap = 0;
        float *data = audio_buffers[sample_buffer_swap];
        for (int i = 0; i < PERIOD_SIZE; ++i) {
            static uint16_t sound_counter = 0;
            float sample = sound_counter < wavelength/2 ? 1 : -1;
            if (++sound_counter >= wavelength) sound_counter = 0;
            *data++ = sample;
        }
        xaudio_source_voice->SubmitSourceBuffer (&xaudio_buffers[sample_buffer_swap], nullptr);
        sample_buffer_swap = !sample_buffer_swap;
    }

    void OnStreamEnd() {}
    void OnVoiceProcessingPassEnd() {}
    void OnVoiceProcessingPassStart(UINT32 SamplesRequired) {}
    void OnBufferStart(void* pBufferContext) {}
    void OnLoopEnd(void* pBufferContext) {}
    void OnVoiceError(void* pBufferContext, HRESULT Error) {
        restart_audio = true;
        LOG ("Critical XAudio2 voice error [%lx] [%s]", Error, HResultToStr(Error));
    }
};

To pass the voice callbacks to XAudio2, we inherit from the IXAudio2VoiceCallback class and implement all the callback functions. We just need to handle errors by restarting the audio, and pass new data in OnBufferEnd. Depending on the complexity of your audio generation, you may not want to generate the data inside this callback, because every moment spent generating delays resubmitting the buffer. For a simple square wave, though, this works just fine.
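If you do want to move generation out of the callback, a first step is to pull the sample loop into a plain function with its state in a struct instead of in function-local statics. This is a sketch of that idea, not the program's actual code. SquareWave and fill_square are names I made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Phase state kept outside the fill function so it survives between buffers.
struct SquareWave {
    std::uint16_t wavelength;   // period length in samples
    std::uint16_t counter = 0;  // position within the current period
};

// Fill `out` with `count` samples of a square wave: +1 for the first half of
// each period, -1 for the second half. The phase carries over between calls,
// so consecutive buffers join without a click.
void fill_square(SquareWave& osc, float* out, std::size_t count) {
    assert(osc.wavelength > 0);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = osc.counter < osc.wavelength / 2 ? 1.0f : -1.0f;
        if (++osc.counter >= osc.wavelength) osc.counter = 0;
    }
}
```

A function like this can be called from OnBufferEnd exactly as before, or from another thread that prepares buffers ahead of time, and it can be tested without any audio device.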

while (!quit && !restart_audio) {

This section just loops to keep the program alive, and provides a console-based input for testing the program - up/down arrow keys to change frequency, and q/k/escape to quit. The actual audio processing won't happen on this thread - XAudio2 will start its own threads for that and use our provided callbacks.
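The callback works in terms of a wavelength in samples, while the keys adjust a frequency, so somewhere the two have to be converted. How the program does this isn't shown here. The sketch below is one way to do it, assuming the 48 kHz sampling rate and a 16-bit wavelength, with wavelength_for and kSampleRate as made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr double kSampleRate = 48000.0;  // assumed to match SAMPLING_RATE

// Convert a frequency in Hz to a square-wave period in samples, rounded to
// the nearest sample and clamped to what a 16-bit counter can represent.
std::uint16_t wavelength_for(double frequency_hz) {
    assert(frequency_hz > 0.0);
    const double samples = kSampleRate / frequency_hz;
    // At least 2 samples per period (one high, one low), at most 65535.
    const double clamped = std::min(std::max(samples, 2.0), 65535.0);
    return static_cast<std::uint16_t>(clamped + 0.5);
}
```

Because the wavelength is a whole number of samples, the frequencies you can actually hit get coarser as the pitch rises: 1000 Hz maps exactly to 48 samples, but the next shorter period, 47 samples, is already about 1021 Hz.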