Dear Sursounders,

in January this year I asked the people on this list whether they think
it's possible to broadcast B-format over WiFi to an Ambisonics speaker
system. I got a lot of replies and many helpful hints and opinions.

My thesis is now written and approved; maybe some of you are interested in
the results.

The scope of the work has changed: there is no longer an audio streaming
solution, and the prototype was built with five Raspberry Pi SoC computers.
The main interest was focused on timing aspects. (Maybe a few readers have
left by now ... :-)

The timing constraints for the Ambisonics decoding are optimal when the
relative latency between the decoders is smaller than 20 µs (shorter than
the duration of a single 48 kHz digital audio sample), derived from the
psychoacoustic localisation blur (Lokalisationsunschärfe).
If you consider wavefield reconstruction at the sweet spot and accept some
audible degradation in the reconstruction, you can go up to 0.5 ms of
relative latency (thanks to Fons Adriaensen for the hint).
Assumptions: timing offsets between the decoders lead to incoherent phase
at the sweet spot and inevitably to level changes. If a 2 dB level change
is at the border of inaudibility, this corresponds to a phase shift of
about 72 degrees, which amounts to 0.5 ms at 380 Hz, provided you accept
that as the crossover frequency of a dual-band decoder, above which you
leave correct wavefield reconstruction anyway.

The prototype looked like this:
One control computer, five Raspberry Pi clients and one access point
together form a closed 802.11n network. BT-Sync (thanks to Sampo Syreeni
for mentioning it here) synchronizes one folder on every computer; this
folder contains all audio data, decoders and control applications. PTPd, a
Precision Time Protocol daemon, synchronizes the realtime clocks on every
computer. A simple flashgun triggers the start/stop/rewind functions of the
Ambisonics decoding and audio playback: a light-sensitive transistor
circuit detects the flash and is connected to GPIO17 of the RasPis. The
Ambisonics decoders are implemented in Csound; I did not use the
bformdec opcode, because I wanted a dual-band decoder. The rest of the code
is written in C, with direct register access capabilities. The Ambisonics
setup was a square, but I needed an extra subwoofer because of the
band-limited speakers (they were too tiny).

Different system design concepts were studied: trigger method (IRQ handling
vs. polling, GPIO direct register access vs. function libraries), scheduler
(realtime vs. non-RT), interprocess communication (TclCsound vs. Csound
API) and playback location (SD card vs. RAM).

In 28 series of measurements, these were the results:
PTPd synchronizes the clocks to an accuracy of 4 ms.
GPIO events are detected within 20 µs (11 µs on average); min-max
differences are around 6 µs.
The response time from GPIO detection to the first audio sample at the
audio output connector is around 13 ms.
The relative latency of two decoders averages 4.8 ms (worst case 7.6 ms).
The relative latency of four decoders averages 6.6 ms (worst case 12.4 ms).
The best system design turned out to be: GPIO direct register access
(hardly news for an embedded engineer!), realtime scheduler (FIFO) and
wavetable playback from RAM. But all series of measurements were relatively
close together; the most significant gain came from establishing GPIO
direct register access and getting rid of userspace IRQ handling and the
GPIO function libraries.

It can be concluded:
PTPd can't sensibly be used for highly accurate clock synchronization over
802.11n, but it helped to avoid drift in the audio playback.
GPIO events are detected within a latency of <20 µs, but the relative
latency of the decoders among themselves is 500 times higher and more.
The timing constraints have not been met.
The more decoders in the system, the greater the relative latencies.

I think the best way to get the relative latencies down is to build a more
efficient decoder. I can't assess the efficiency of the Csound internals,
but maybe you are better off with a native C approach. If I find the time,
I will take a look at FAUST (as Aaron mentions it from time to time).
Using the flash trigger is fine for working in a laboratory environment,
but not for a real-world application: it's error-prone and easy to sabotage.
But as long as you can't get the clocks synchronized to better than some
few microseconds, you don't need to think about establishing a
"presentation time" as in AVB (IEEE 802.1 and IEEE 1722), where you send
a data packet to the decoder clients containing the time at which they
should start their playback (simply put). The reason PTPd synchronizes so
badly is the "Distributed Coordination Function" in 802.11, which leaves
the media access control self-governing: no master tells the clients when
to send their data packets. Even the QoS functions don't help here.

Does anyone here own one of the Raumfeld systems by Teufel? They seem to
manage sample-accurate wireless playback, but only in stereo.
I used the RPi because it is cheap. Sure, the situation would be completely
different with another SoC such as the better Pandaboards or the SabreLite.
But when you think about the cost of ten decoders or more, the price
matters a lot.

To mention the 'not representative' listening tests: after getting
everything under control, and once I realised what I had been doing, the
system sounded not bad at all. Even at the diploma presentation/examination
it was astonishing how well the instruments could be localised. I played
the Toccata for percussion by Ney Rosauro, recorded by Paul Hodges. It
could also be that I simply got lucky with the timing.

Thanks,
Sven Thebert
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound
