Dear Sursounders, in January this year I asked the people on this list whether they think it is possible to broadcast B-format over WiFi to an Ambisonics speaker system. I got a lot of replies and a lot of helpful hints and opinions.
My thesis is now written and approved; maybe some of you are interested in the results. The setting of the work has changed: there is no longer an audio streaming solution, and the prototype was built with five Raspberry Pi SoC computers. The main interest was focused on timing aspects. (Maybe a few readers left ... :-)

The timing constraints for the Ambisonics decoding are optimal when the relative latency between the decoders is smaller than 20 µs (shorter than the duration of one 48 kHz digital audio sample), derived from psychoacoustic localisation blur. If you consider wavefield reconstruction at the sweet spot and accept some audible degradation in wavefield reconstruction, you can go up to 0.5 ms in relative latency (thanks, Fons Adriaensen, for the hint). The assumption: timing violations between decoders lead to incoherent phase at the sweet spot and inevitably to level changes. If a 2 dB level change is at the border of inaudibility, this corresponds to a phase shift of about 72 degrees, which at 380 Hz gives about 0.5 ms. 380 Hz is a reasonable choice if you accept it as the separation frequency of a dual-band decoder, above which you leave correct wavefield reconstruction anyway.

The developed prototype looked like this: one control computer, five Raspberry Pi clients and one access point together form a closed 802.11n network. BT-Sync (thanks to Sampo Syreeni for mentioning it here) synchronizes one folder on every computer; this folder contains all audio data, decoders and control applications. PTPd, a Precision Time Protocol daemon, synchronizes the realtime clocks on every computer. A simple flashgun triggers the start/stop/rewind functions of Ambisonics decoding and audio playback: a simple light-sensitive transistor circuit detects the flash and is connected to GPIO17 of the RasPis. The Ambisonic decoders are implemented in Csound; I did not use the bformdec opcode because I wanted a dual-band decoder. The rest of the code is done in C, with direct register access capabilities.
The Ambisonic setup was a square, but I needed an extra subwoofer because of the band-limited speakers (they were too small). Different system design concepts were studied: trigger method (IRQ handling vs. polling, GPIO direct register access vs. function libraries), scheduler (realtime vs. non-RT), interprocess communication (TclCsound vs. Csound API) and playback location (SD card vs. RAM).

In 28 series of measurements, these were the results: PTPd synchronizes the clocks to an accuracy of 4 ms. GPIO events are detected within 20 µs (11 µs on average); min-max differences are at 6 µs. The response time from GPIO detection to the first audio sample at the audio output connector is around 13 ms. The relative latency of two decoders averages 4.8 ms (worst case 7.6 ms); of four decoders, 6.6 ms (worst case 12.4 ms). The best system design used GPIO direct register access (what a surprise for an embedded engineer!), the realtime scheduler (FIFO) and wavetable playback from RAM. But all measurement series were relatively close together; the most significant difference came from establishing GPIO direct register access and getting rid of user-space IRQ handling and GPIO function libraries.

It follows that PTPd cannot sensibly be used for highly accurate clock synchronization over 802.11n, but it did help to avoid drift in audio playback. GPIO events are detected within a latency of <20 µs, but the relative latency of the decoders among themselves is about 500 times higher or more; the timing constraints were not met. The more decoders in the system, the greater the relative latencies. I think the best way to get the relative latencies down is to build a more efficient decoder. I cannot judge the efficiency of the Csound internals, but maybe you are better off with a native C approach. If I find the time, I will take a look at FAUST (as Aaron mentions it from time to time).
Using the flash trigger is fine in a laboratory environment, but not for a real-world application: it is error-prone and easy to sabotage. But as long as you cannot synchronize the clocks to better than several microseconds, you do not need to think about establishing a "presentation time" as in AVB (IEEE 802.1 and IEEE 1722), where, simply put, you send each decoder client a data packet containing the time at which it should start playback. PTPd synchronizes so poorly because of the Distributed Coordination Function in 802.11, which leaves media access control self-governing: no master tells the clients when to send their packets. Even QoS functions do not help here. Is there anyone who owns one of the Raumfeld systems by Teufel? They seem to manage sample-accurate playback wirelessly, but only in stereo.

I used the RPi computer because it is cheap. Sure, the situation would be completely different with another SoC, such as the more capable Pandaboard or the SabreLite. But if you think about the cost of ten decoders or more, the price matters a lot.

To mention the (not representative) hearing tests: after getting everything under control, and once I realised what I had been doing, the system did not sound bad at all. Even in the diploma presentation/examination it was astonishing how well the instruments could be located. I played the Toccata for percussion by Ney Rosauro, recorded by Paul Hodges. It could also be that I simply got lucky with the timing.

Thanks,
Sven Thebert