On 8/11/21 06:04, Bill Somerville via wsjt-devel wrote:
> another option that requires no code changes is as follows:
> 1) Create wisdom as above but without the '-T 1' option,
> 2) Rename the newly created wisdom to wspr_wisdom.dat in the expected
> directory,
> 3) Make the wspr_wisdom.dat file non-writable,
> 4) Symlink or hardlink the file into every location needed by multiple
> instances of wsprd.
> The existing plans will be "upgraded" from their default of
> FFTW_ESTIMATE to FFTW_PATIENT for the specified FFT types and sizes,
> and there will be no file locking requirement as attempts by wsprd to
> update the wisdom file on exit will silently fail.
Hi Bill, that's certainly one good and simple way to get wsprd to use
"wiser" wisdom. But I use FFTW a lot in my own applications (mainly for
fast convolution, which I use for frequency shifting, filtering and
downsampling) and it just seems nice and elegant to have a common wisdom
file for everything I do. I don't see any downside in having wsprd
import system wisdom, as it's a harmless no-op on Windows and wsprd will
still import "local" wisdom if you want.
The only complication, as I mentioned, is that non-threaded "wisdom" is
incompatible with threaded wisdom, even with a single thread. I haven't dug
into FFTW to find out why, but it was easy enough to just invoke
threading in all my applications and set the number of threads to 1 if I
don't really want more. To be honest, adding threads isn't a huge win;
speed scales much less quickly than linearly with additional threads.
But it can make a difference when you have a realtime deadline on a
slower CPU (e.g., a Raspberry Pi).
I've begun to look at the wspr_timer.out file. The 69% of the time that
my system now spends in my Fano decoder really jumped out. Whoa! But I shouldn't
be terribly surprised. One of the reasons sequential decoding (including
Fano) isn't used much anymore is that it's a poor match to modern CPUs.
Sequential decoding does a lot of data-dependent program branching, and
this defeats the branch prediction that modern CPUs rely on to keep
their deep pipelines full. Every time a prediction is wrong, the CPU
comes to a screeching halt as the pipelines are flushed and reloaded
from the correct execution point. I've gotten much better results with
my Viterbi decoder, which I hand-optimized to make full use of the
vector hardware. And there's no data-dependent branching.
Hmm. I did a k=24 Viterbi decoder for the ISEE-3 recovery project in
2014. (The complexity of Viterbi decoding increases exponentially with
k, and k=7 is a more typical number, so k=24 is huge.) IIRC, I had it
running at about 230 b/s on the computer I had then. Maybe, just for
fun, I could try one for the k=32 code in WSPR. It has two full minutes
to decode only 50 bits...!
There are still ways to speed up Fano decoding, though. The most obvious
would be multithreading to take advantage of multicore CPUs. Create a
set of "server" threads, each running a Fano decoder, and have them work
in parallel on various frequency/time hypotheses. To keep a really deep
search with a high limit from bogging things down, a decoder thread
could begin each attempt at normal CPU priority and then progressively
reduce it (make itself less important) as its effort climbs well past
one decoder move per bit. Transmissions with good SNR would continue to decode
quickly while the system would still try to dig out the really weak ones
on an as-available basis, limited only by your tolerance for false decodes.
73, Phil
_______________________________________________
wsjt-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/wsjt-devel