On Sun, 12 Jun 2022 at 16:55, Sumanth Rajkumar <rajkumar.suma...@gmail.com> wrote:
> > > I have provided both approaches. Functional interfaces that operate on > single complex and on Arrays > Thanks for the examples. > <-- SNIP --> > > Here the implementation including iteration/cursors has moved to the > array based Lambda. > For Complex arrays, I think it is beneficial to move the iteration logic > from data storage (ComplexList) to the Lambda (ComplexFunctions::conj) > Possible mix up between lambda and method reference? I certainly would not want to write a lambda function (anonymous method) that is responsible for iteration; I would stick to providing a simple unary operator that can be passed to a list to act on each member. Re: Iteration logic in the array method. I am not sure about this. If the underlying data is not linear (e.g. a 2D double[][]) then performing a range based method on it may not be optimal. It should be up to the implementation to decide the optimal iteration over the underlying data, possibly specialising a spliterator to split based on stripes of the underlying data. Anyway, this set of examples for item-based or range-based approaches confers no benefit to the range based approach. It just does a potentially sub-optimal iteration of a unary operator on each item. But this is due to the example not being a true range based operation. I think the range based API should be based around operations that require access to more than one element in the range in order to function. For example this would be transform operations. All the other examples we have act on single complex numbers. In this case an operator is provided to act on the complex and create a complex result. This can be passed to the container which will optimally iterate over the data, or sub-range of, and apply the operation. One issue is parallelisation using a stream. By its nature the stream acts on single items; the results are then collated at the end. The stream does not hold indices for each item. So you cannot use a terminal foreach() method on each sub-range created by splitting and write back to the original storage. The stream would have to be declared as ordered to maintain the original order for the output and this can be collated at the end to new storage. To use the stream API with a parallel implementation and zero-allocation for the terminal operation would require the stream contains an item that allows read and write. So you can use ComplexDoubleArray: Stream<ComplexDoubleArray> s; Consumer<ComplexDoubleArray> op; s.forEach(op); Here the operator has to have an array that knows its start and length. This would be more like a buffer. Each terminal forEach would require a different instance of ComplexDoubleArray that holds its own start and length. These may be accessed directly or via an iterator that provides read and write to each element in the range. The ComplexDoubleArray can wrap the same underlying data so the only allocation is the object in the stream wrapping the data and a range to act on. This would be N objects allocated equal to the number of threads. Then you write a function that accepts a ComplexDoubleArray and writes back to the same array. I do not see how this is any better than streaming a mutable complex: Stream<ComplexNumber> s; Consumer<ComplexNumber> op; s.forEach(op); ComplexNumber simply has read/write for real and imaginary parts and your operator can do what it requires. Here ComplexNumber is some object allocated once in the forEach method that acts as a cursor over the underlying data. However the stream examples can be misused. The ultimate method can accept the stream objects and use them for something else. For example the ComplexDoubleArray covering part of the range can then be added to (assuming the ComplexDoubleArray is also a List<Complex>). So the sub-range should be a correct subList implementation. The ComplexNumber cursor could be added to a List<ComplexNumber> which would have many entries but ultimately only N actual numbers (N=parallel threads) if the ComplexNumber was a recycled cursor object. So the specialised stream methods should not be public. Instead you could create your operators using factory methods. Here are examples for single-thread and parallel implementation. The parallel implementation uses the stream API to efficiently divide the range. public class ComplexFunctions { public static ComplexDoubleUnaryOperator create(ComplexFunction<Void> operator) { return new ComplexDoubleUnaryOperator() { @Override public ComplexDoubleArray apply(ComplexDoubleArray input, ComplexDoubleArray result) { for (int i = 0; i < input.size(); i++) { final int index = i; ComplexDouble c = input.get(i); operator.apply(c.real(), c.imag(), (re, im) -> { result.setValue(index, re, im); return null; }); } return result; } }; } public static ComplexDoubleUnaryOperator parallel(ComplexFunction<Void> operator) { return new ComplexDoubleUnaryOperator() { @Override public ComplexDoubleArray apply(ComplexDoubleArray input, ComplexDoubleArray result) { IntStream.range(0, input.size()).parallel().forEach(i -> { ComplexDouble c = input.get(i); operator.apply(c.real(), c.imag(), (re, im) -> { result.setValue(i, re, im); return null; }); }); return result; } }; } } All this requires is the ComplexDoubleArray to provide efficient get/set methods with an index. ComplexDoubleArray a; a.apply(ComplexFunctions.create(ComplexFunctions::conj)); a.apply(ComplexFunctions.parallel(ComplexFunctions::conj)); a.apply(ComplexFunctions2.parallel( (r, i, result) -> result.apply(r, 0))); Note that here we can reuse all the ISO c99 functions that operate on a single number. They need not be re-coded for each ComplexDoubleUnaryOperator. So I am seeing a requirement to have the item based API. The range based API needs more concrete implementation examples to solidify how it should work. > > What are you streaming? ComplexDoubleArray sub-arrays? Would the > > ComplexParrallelStreamFunctions class be responsible for creating the > > Spliterator for this to control sub-ranges? > > > > Yes. The parallelization details would be abstracted by > ComplexParrallelFunctions > There may be issues with streams regarding the choice of the ranges to stream in parallel. Ideally the container should control how the stream can be split. In my example above using a simple range split with an index in an IntStream the split may be sub-optimal. > > The developer could choose explicitly to call > list.apply(ComplexFunctions::conj) or > list.apply(ComplexParallelFunctions::conj) > I am not sure we want to provide all the ISO c99 implementations in duplicate. They are defined on Complex and should be implemented once. The list is just a container to allow efficient storage. I do not think it should define the ISO c99 functions in the interface. > > > ComplexList.conj() could choose to apply sequential or parallel etc based > on some config. > > > > https://github.com/sumanth-rajkumar/commons-numbers/blob/1f013bfcf7d92c4daa160284c3d6a2ec26fd6464/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/ComplexList.java#L683 > > If you look at how this can be used then there is a lot of excess object > > overhead here when you wish to write code for complex number computation > in > > an OO way: > > See org.apache.commons.math4.legacy.analysis.solvers.LaguerreSolver > > > > I have included both implementations as explained above. > I have tried to avoid excess object creation in the Array approach, > > Even when using Array approach for single Complex instance, object creation > overhead is same as Single number approach > It is unusual to have setters return a new object. This is what is required for Complex to implement ComplexDoubleArray setters to allow the operation to set the result. Essentially this requires the immutable final class Complex to implement an interface which is for a mutable object. So code reuse here is polluting the API. > Could you please take a quick look and confirm? > The unit test for both approaches for conj operation on ComplexList is here > > > https://github.com/sumanth-rajkumar/commons-numbers/blob/1f013bfcf7d92c4daa160284c3d6a2ec26fd6464/commons-numbers-complex/src/test/java/org/apache/commons/numbers/complex/ComplexListTest.java#L94 > > Bulk operations such as FFT would require the entire data to be operated > > on. So this would not work in parallel streams using sub-ranges. Using a > > 3rd party such as JTransforms would require extracting the data for their > > API. > > > > I see parallelizations in jtransform... > > https://github.com/wendykierp/JTransforms/blob/master/src/main/java/org/jtransforms/fft/DoubleFFT_1D.java#L688 But the DoubleFFT_1D object is created with n = size of data. This uses precomputation for the size of arguments that will be transformed. Note that the object is in fact thread safe and can be reused after construction in concurrent threads. But the data passed to it must be of the correct size. In this case you do not process in parallel using a stream. The sub-ranges are carefully chosen and processed using a common thread pool created by JTransforms. The parallelism is configured based on the size of the data, below a certain size only a single thread is used. > > We can add more methods to ComplexDoubleArray to convert/to from for other > providers as needed such as DoubleBuffer formats etc.. > A utility class to map the ComplexDoubleArray to various output formats, perhaps specified by an enum (since there are not that many possible formats in 1D). But what of ND data? Thus far we have only considered how to operate on a 1D linear range. Conversions would require a way to specify the current format and the output format regarding striped (reals then imaginary), interleaved, paired arrays, etc. for all dimensions. If ND is represented in a linear array, this requires knowledge of the packing of the data, e.g.: index = y * maxx + x. There are typically such data structures in image libraries, often dealing with 5D data, that do this that could form inspiration. Typically data is arranged in a way that is optimal for the application and there are many possible combinations. E.g. image data is often multiples of 2D XY data, so may be a float[][] representing XYZ, Channel and Time. In the case of 2D image a FFT is performed by 2 1D transforms so you extract strips of X data, transform, then extract strips of Y from the partially transformed XY, and repeat. It may be that specifying a generic API for range operations on ND data may not be possible. It would require further reading to narrow down possible scope. > > > > Note: Vector functions on a single complex number is not applicable. > > > > > Yes, in general there won't be any ComplexVectorFunctions for the single > pair functional interface. > We provide ComplexVectorFunctions operations only for List based > operation that would benefit from Vector based SIMD parallelization > However if we pass a List of size 1, it would still work, but possibly > not perform as well. > The Vector API is for more than 1. The javadoc examples note that the number of lanes is typically 4 or 8. Any elements remaining after processing multiples of the lane count will be processed in a sequential loop using non-Vector API code, i.e. the method is coded twice. So a length=1 array is never going to use the Vector API. Alex