On 8/11/21 06:04, Bill Somerville via wsjt-devel wrote:


Another option that requires no code changes is as follows:

1) Create wisdom as above but without the '-T 1' option,
2) Rename the newly created wisdom to wspr_wisdom.dat in the expected directory,
3) Make the wspr_wisdom.dat file non-writable,
4) Symlink or hardlink the file into every location needed by multiple instances of wsprd.

The existing plans will be "upgraded" from their default of FFTW_ESTIMATE to FFTW_PATIENT for the specified FFT types and sizes, and there will be no file-locking requirement, as attempts by wsprd to update the wisdom file on exit will silently fail.
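For reference, the mechanism here is just FFTW's wisdom import/export pair; a minimal sketch, assuming the single-precision (fftwf) API and an illustrative filename, not wsprd's actual code:

    #include <stdio.h>
    #include <fftw3.h>

    int main(void)
    {
        /* Pick up the "wise" FFTW_PATIENT plans, if present. */
        if (!fftwf_import_wisdom_from_filename("wspr_wisdom.dat"))
            fprintf(stderr, "no wisdom found; plans fall back to FFTW_ESTIMATE\n");

        /* ... create plans and do the FFT work ... */

        /* On exit, the export returns 0 on a non-writable file,
           which the caller can simply ignore. */
        if (!fftwf_export_wisdom_to_filename("wspr_wisdom.dat"))
            fprintf(stderr, "could not update wisdom (read-only?); harmless\n");
        return 0;
    }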


Hi Bill, that's certainly a good, simple way to get wsprd to use "wiser" wisdom. But I use FFTW a lot in my own applications (mainly for fast convolution, which I use for frequency shifting, filtering, and downsampling), and it just seems nice and elegant to have a common wisdom file for everything I do. I don't see any downside in having wsprd import system wisdom, as it's a harmless no-op on Windows and wsprd will still import "local" wisdom if you want.
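Concretely, it would be one extra call ahead of the local import; a sketch, again assuming the single-precision API (wisdom accumulates across imports, so the two coexist):

    #include <fftw3.h>

    static void load_wisdom(const char *local_path)
    {
        /* System-wide wisdom (/etc/fftw/wisdomf for single precision);
           harmlessly returns 0 where none exists, e.g. on Windows. */
        fftwf_import_system_wisdom();

        /* Then the per-instance wisdom file, as before. */
        fftwf_import_wisdom_from_filename(local_path);
    }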

The only complication, as I mentioned, is that non-threaded "wisdom" is incompatible with threaded wisdom, even with a single thread. I haven't dug into FFTW to find out why, but it was easy enough to just invoke threading in all my applications and set the number of threads to 1 when I don't really want more. To be honest, adding threads isn't a huge win; speed scales well short of linearly with additional threads. But it can make a difference when you have a real-time deadline on a slower CPU (e.g., a Raspberry Pi).
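The workaround is tiny in code; a sketch of what I mean, using the standard FFTW threads API (link with -lfftw3f_threads):

    #include <fftw3.h>

    fftwf_plan make_plan(int n)
    {
        fftwf_init_threads();        /* one-time setup; call before any planning */
        fftwf_plan_with_nthreads(1); /* threaded planner, but only one thread */

        fftwf_complex *buf = fftwf_alloc_complex(n);
        return fftwf_plan_dft_1d(n, buf, buf, FFTW_FORWARD, FFTW_PATIENT);
    }

Wisdom created this way stays compatible whether I later ask for one thread or several.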

I've begun to look at the wspr_timer.out file. The 69% that my system now spends in my Fano decoder really jumped out. Whoa! But I shouldn't be terribly surprised. One of the reasons sequential decoding (including Fano) isn't used much anymore is that it's a poor match to modern CPUs. Sequential decoding does a lot of data-dependent program branching, and this defeats the branch prediction that modern CPUs rely on to keep their deep pipelines full. Every time a prediction is wrong, the CPU comes to a screeching halt as the pipelines are flushed and reloaded from the correct execution point. I've gotten much better results with my Viterbi decoder, which I hand-optimized to make full use of the vector hardware. And there's no data-dependent branching.
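To make that concrete: the heart of a Viterbi decoder is the add-compare-select, and the select can be written with masks instead of an if(). A sketch in plain C (my real decoders use SIMD intrinsics; the names here are illustrative):

    #include <stdint.h>

    /* Pick the smaller of two candidate path metrics and record which
       one survived, with no data-dependent branch; compilers turn the
       mask select into a conditional move or a vector min. */
    static inline uint32_t select_survivor(uint32_t a, uint32_t b,
                                           unsigned *decision)
    {
        uint32_t mask = (uint32_t)-(int32_t)(b < a); /* all-ones if b wins */
        *decision = mask & 1;      /* survivor bit, saved for traceback */
        return (a & ~mask) | (b & mask);
    }

Do that across all 2^(k-1) states per bit and the pipeline never stalls waiting on the data.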

Hmm. I did a k=24 Viterbi decoder for the ISEE-3 recovery project in 2014. (The complexity of Viterbi decoding increases exponentially with k, and k=7 is a more typical number, so k=24 is huge.) IIRC, I had it running at about 230 b/s on the computer I had then. Maybe, just for fun, I could try one for the k=32 code in WSPR. It has two full minutes to decode only 50 bits...!
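Back of the envelope: k=32 means 2^31, about 2.1 billion, states, and roughly one add-compare-select per state per decoded bit, so 50 bits is on the order of 10^11 operations; spread over 120 seconds, that's about 10^9 per second, which vectorized hardware can at least contemplate.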

There are still ways to speed up Fano decoding, though. The most obvious is multithreading to take advantage of multicore CPUs: create a set of "server" threads, each running a Fano decoder, and have them work in parallel on the various frequency/time hypotheses. To keep a really deep search with a high move limit from bogging things down, a decoder thread could begin each attempt at normal CPU priority and then progressively reduce it (make itself less important) as it spends much more than one decoder move per bit. Transmissions with good SNR would continue to decode quickly, while the system would still try to dig out the really weak ones on an as-available basis, limited only by your tolerance for false decodes.
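A sketch of the thread structure I have in mind, with POSIX threads; the queue, fano_attempt(), the move budget, and the priority steps are all hypothetical, and note that per-thread nice values are Linux behavior, not portable POSIX:

    #include <stddef.h>
    #include <pthread.h>
    #include <sys/resource.h>

    /* Hypothetical work item: one frequency/time hypothesis. */
    struct hypothesis;
    struct hypothesis *next_hypothesis(void);  /* blocking queue pop */
    /* Run the decoder for up to 'budget' moves; returns >0 if the
       budget ran out with more work to do, 0 when the attempt ends. */
    int fano_attempt(struct hypothesis *h, long budget);

    static void *fano_server(void *arg)
    {
        struct hypothesis *h;
        (void)arg;

        while ((h = next_hypothesis()) != NULL) {
            int nice_val = 0;
            setpriority(PRIO_PROCESS, 0, nice_val); /* start at normal priority */

            /* Each time an attempt burns another chunk of moves,
               demote this thread so quick decodes stay responsive. */
            while (fano_attempt(h, 100000L) > 0)
                if (nice_val < 19)
                    setpriority(PRIO_PROCESS, 0, ++nice_val);
        }
        return NULL;
    }

Spawn one of these per core with pthread_create() and feed the queue from the candidate search.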

73, Phil




