Hi Brian, On 17.02.21 16:55, Brian Padalino wrote: > On Wed, Feb 17, 2021 at 10:05 AM Marcus Müller <muel...@kit.edu > <mailto:muel...@kit.edu>> > wrote: > > Oh, sorry, didn't mean to imply that! FFT interpolation might work well > (if you can live > with the sinc sidelobes). > > > If the bandwidth is already constrained to 20MHz/23MHz, then there would be > no sidelobes - > correct?
What the FFT resampler does is, in the end, interpolation with 23 sincs, each at the "original" spectral position. These sincs' sidelobes of course extent left and right beyond the 23 original "carriers" (thinking of OFDM here)! The question here is whether that matters: if the bandwidth of interest is -10 to +10 MHz, what do I care about what happens beyond that? The answer to that quesiton lies in Mark's application :) > I do have a question, though: Why do you go for a 4600-FFT? why not > simply a 23 FFT? > > Larger n yields greater efficiencies, right? Doing lots of small calls isn't > necessarily > as efficient as doing less large calls, so long as you can handle the latency > and the > processor can stay fed with data. I figure ~4000 samples is a good > compromise, but maybe > larger ones work out better, too. Well, quick calc, and acting as if O-notation was actually meaningful on hardware (see David's answer, he's got a point about caches: on my CPU, a 32×32 bit multiplication instruction (non-SIMD) takes about 1/400 of getting memory from random positions in RAM (cache misses), on average, IIRC. If we've got M samples to process, and we're doing N_1 or N_2=100·N_1 length transforms, the complexity of the first is a = O( (M/N_1) · N_·log(N_1) ) = O( M · log(N_1) ) of the second, it'd be O( (M/100/N_1) · 100·N_1 log(100·N_1) ) = O( M · log(100·N_1) ) = O( M · log(N_1) + M log(100) ) which is worse, but: > > For reference, here's some performance benchmarks for FFTW (randomly chosen > processor): > > http://www.fftw.org/speed/Pentium4-3.06GHz/ > <http://www.fftw.org/speed/Pentium4-3.06GHz/> > > Single precision, non powers of 2, 1d transforms are some good benchmarks to > look at for > something like this. Obviously some numbers are better than others, but the > general trend > seems to skew towards the mid-thousands as being a decent number to target > for maximizing > throughput. Real-world data, and I fully agree, always trumps O-notation (which simply is a bad model for any real-world problem, usually). Luckily, I've had this same problem before, so... See attachment! You should be able to call it like fft_rate -l 46 to set the FFT length to 46, and add a -n 10 to make the FFT do 10 threads (which usually hurts more than it helps). (Need to multiply the reported rate by the FFT length, it counts transforms, not samples.) The single-threaded 2²⁴-FFT I do runs at ca. 34.5 MS/s throughput. If I run fft_rate -l $(( 23*2*100 )) I get rougly 87.4 MS/s. If I run at fft_rate -l $(( 23*2 )) I get ca 92 MS/s. I'd call this a clear "meh" result! Best regards, Marcus
fft_rate.grc
Description: application/gnuradio-grc
#!/usr/bin/env python3 # -*- coding: utf-8 -*- # # SPDX-License-Identifier: GPL-3.0 # # GNU Radio Python Flow Graph # Title: FFT Rate Tester # Author: Marcus Müller # GNU Radio version: 3.9.0.0 from gnuradio import blocks from gnuradio import fft from gnuradio.fft import window from gnuradio import gr from gnuradio.filter import firdes import sys import signal from argparse import ArgumentParser from gnuradio.eng_arg import eng_float, intx from gnuradio import eng_notation class fft_rate(gr.top_block): def __init__(self, fftlen=2**20, threads=1): gr.top_block.__init__(self, "FFT Rate Tester", catch_exceptions=True) ################################################## # Parameters ################################################## self.fftlen = fftlen self.threads = threads ################################################## # Blocks ################################################## self.fft_vxx_0 = fft.fft_vcc(fftlen, True, [], True, 1) self.blocks_vector_source_x_0 = blocks.vector_source_c(list(range(fftlen))*4, True, fftlen, []) self.blocks_probe_rate_0 = blocks.probe_rate(gr.sizeof_gr_complex*fftlen, 250, 0.025) self.blocks_null_sink_0 = blocks.null_sink(gr.sizeof_gr_complex*fftlen) self.blocks_message_debug_0 = blocks.message_debug(True) ################################################## # Connections ################################################## self.msg_connect((self.blocks_probe_rate_0, 'rate'), (self.blocks_message_debug_0, 'print')) self.connect((self.blocks_vector_source_x_0, 0), (self.blocks_probe_rate_0, 0)) self.connect((self.blocks_vector_source_x_0, 0), (self.fft_vxx_0, 0)) self.connect((self.fft_vxx_0, 0), (self.blocks_null_sink_0, 0)) def get_fftlen(self): return self.fftlen def set_fftlen(self, fftlen): self.fftlen = fftlen self.blocks_vector_source_x_0.set_data(list(range(self.fftlen))*4, []) def get_threads(self): return self.threads def set_threads(self, threads): self.threads = threads def argument_parser(): parser = ArgumentParser() parser.add_argument( "-l", "--fftlen", dest="fftlen", type=intx, default=2**20, help="Set FFT Length [default=%(default)r]") parser.add_argument( "-n", "--threads", dest="threads", type=intx, default=1, help="Set Number of Threads [default=%(default)r]") return parser def main(top_block_cls=fft_rate, options=None): if options is None: options = argument_parser().parse_args() tb = top_block_cls(fftlen=options.fftlen, threads=options.threads) def sig_handler(sig=None, frame=None): tb.stop() tb.wait() sys.exit(0) signal.signal(signal.SIGINT, sig_handler) signal.signal(signal.SIGTERM, sig_handler) tb.start() tb.wait() if __name__ == '__main__': main()
smime.p7s
Description: S/MIME Cryptographic Signature