Hey Nick,
> Yes, that's kind of my fault.
"Fault" is kind of a hard word when it's your achievement that VOLK with a C API got us as
far as it has!
> C++20 finally makes C++ a much less Lovecraftian nightmare
We're going to have template metaprogramming! SFINAE fhtagn!
Seriously, though. Operations as base classes, kernels / hardware specializations as
subclasses, and the actual call being a virtual operator() call...
Think about this: instead of the dispatcher bending the addresses that symbols from shared
libraries point to (as we do now), there could be a dispatcher object (possibly, but not
necessarily, a singleton) with members of the operation class type, which get assigned
instances of the optimal (according to prior volk_profile and/or heuristics) kernel
implementation subclass.
I.e., instead of using our current nice trick to save the address of the correct
&volk_sig_in_kernel_sig_out_arch_alignment in volk_sig_in_kernel_sig_out_alignment, we just
use the standard vtable/C++ polymorphism. Same performance at runtime - one CALL.
Immediate benefit: all these things suddenly become self-aware.
Let's add a self-documenting call that returns a const char* describing what this thing
does. That makes people very happy when they wrap things for Python, because now the type
comes with documentation in your IDE through little effort.
We can tell the user that this kernel prefers but doesn't need aligned memory. Or, much
more bug-relevant, we could communicate the acceptable input multiples, and stop doing the
cute "for the rest, we do the _general approach after the main loop is through in every
single _arch_alignment" implementation.
(Of course, I have far more somber dreams; don't assume the Lovecraftian horrors are too
far from here. If each kernel implementation is a type, we can make these types have the
capability (optional trait) to give us the type encapsulating what the VOLK kernel does in
its "inner loop" (if applicable). Because C++ allows us to pass things like __m256& and
const __m256&, we can then simply compose new inner loops. And of course, instead of
implementing the same loop skeleton 200 times, we could just, for those kernels where the
inner loop is "simple", have one templated
template<typename nucleus_operation>
class loop_kernel {
public:
    using op = nucleus_operation;
    // Walk the input one SIMD width at a time and apply the nucleus to each chunk.
    void operator()(std::span<typename op::in_type> in, ...) {
        for(auto ptr = in.begin(); ptr < in.end(); ptr += op::simd_width)
            op::operate(*ptr, ...);
    }
};
or so.)
Cheers!
Marcus
On 21.12.21 20:25, Nick Foster wrote:
On Tue, Dec 21, 2021 at 3:29 AM Marcus Müller <mmuel...@gnuradio.org> wrote:
Hi Johannes,
I, for one, like it :) Especially since I honestly find void
volk_32fc_x2_s32fc_multiply_conjugate_add_32fc to be a teeny tiny bit clunky and would
rather call a type-safe, overloaded function in a volk namespace called
multiply_conjugate_add.
Yes, that's kind of my fault. It was the best option we could come up with to be
rigorously type-specific in C, kind of a bespoke implementation of name mangling. The
original motivation, of course, was the VOLK dispatcher. C was a hard requirement at the
time, and I confess I don't remember why. I think it came down from namccart's original
donation of vectorized code.
I would be very happy to see VOLK move to C++ (or at least provide wrappers). I strongly
advocate for using C++20 -- std::span, variadic arguments, lambdas etc. seem tailor-made
for VOLK. Runtime dispatching could be positively elegant, compared to how it must be done
in C. And C++20 finally makes C++ a much less Lovecraftian nightmare of a language than
the one I learned from Stroustrup.
Nick
Re: RFC: can we have something like a wiki page (maybe on the VOLK repo?) to collect
these comments?
You mention spans, so C++-VOLK would be >= C++20?
Cheers,
Marcus
On 21.12.21 10:55, Johannes Demel wrote:
> Hi everyone,
>
> today I'd like to propose an idea for the future of VOLK. Currently, VOLK is a C library
> with a C++ interface and tooling that is written in C++.
>
> I propose to make VOLK a C++ library. Similar to e.g. UHD, we can add a C interface if
> the need arises.
>
> This email serves as a request for comments. So go ahead.
>
> Benefits:
> - sane std::complex interface.
> - same compilation mode on all platforms.
> - Better dynamic kernel load management.
> - Option to use std::simd in the future
> - Less manual memory management (think vector, ...).
>
> Drawbacks:
> - It is a major effort.
> - VOLK won't be a C project anymore.
>
> Why do I propose this shift?
> VOLK segfaults on PowerPC architectures. This issue requires a breaking API change to be
> fixable. I tried to update the API to fix this issue.
> https://github.com/gnuradio/volk/pull/488
> It works with GCC and Clang but fails on MSVC.
> One might argue that PowerPC is an obscure architecture at this point but new
> architectures might cause the same issue in the future. Also, VOLK tries to be portable
> and that kind of issue is a serious roadblock.
>
> How did we get into this mess?
> The current API is a workaround to make things work for a specific compiler: MSVC. MSVC
> does not support C `complex.h` at all. The trick to make things work with MSVC is:
> compile VOLK in C++ mode and pretend it is a C++ library anyways.
> In turn `volk_complex.h` defines complex data types differently depending if VOLK is
> included in C or C++. Finally, we just hope that the target platform provides the same
> ABI for C complex and C++ complex. C complex and C++ complex are not compatible.
> However, passing pointers around is.
> Thus, the proposed change does not affect Windows/MSVC users because they were excluded
> from our C API anyways. The bullet point: "same compilation mode on all platforms"
> refers to this issue.
>
> Proposed timeline:
> Together with our re-licensing effort, we aim for a VOLK 3.0 release. VOLK 3.0 is a good
> target for breaking API changes.
>
> Effects:
> I'd like to make the transition to VOLK 3.0 as easy as possible. Thus, I'd like to keep
> an interface that hopefully doesn't require any code changes for VOLK 2.x users. A
> rebuild of your application should be sufficient. However, we'd be able to adopt a
> C++-ic API as well, e.g. use vectors, spans etc.
>
> The current implementation to detect and load the preferred implementation at runtime is
> hard to understand and easy to break. C++ should offer more accessible tools to make
> this part easier.
>
> What about all the current kernels?
> We'd start with a new API and hide the old kernel code behind that interface. We come up
> with a new implementation structure and how to load it. Thus, we can progressively
> convert to "new-style" implementations.
>
> Another bonus: std::simd
> Currently, std::simd is a proposal for C++23. Making VOLK a C++ lib would allow us to
> eventually use std::simd in VOLK and thus make Comms DSP algorithms more optimized on
> more platforms.
>
> Cheers
> Johannes
>