On 01/03/2016 02:06 AM, Otmar Ertl wrote:
On 03.01.2016 at 7:49 AM, "Ole Ersoy" <ole.er...@gmail.com> wrote:
Hi,

I ran another test using a single parallel loop for array-based matrix-vector
multiplication.  Throughput almost tripled, roughly 2.7x (test pasted below
the results):
# Run complete. Total time: 00:13:24


Benchmark                                      Mode  Cnt     Score    Error  Units
MultiplyBenchmark.parallelMultiplication      thrpt  200  2221.682 ± 48.689  ops/s
MultiplyBenchmark.singleThreadMultiplication  thrpt  200   818.755 ±  9.782  ops/s
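import java.util.Arrays;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

public class MultiplyBenchmark {

     // Sequential: one dot product per row, rows processed in order.
     public static double[] multiplySingleThreaded(double[][] matrix, double[] vector) {
         return Arrays.stream(matrix)
                 .mapToDouble(row -> IntStream.range(0, row.length)
                         .mapToDouble(col -> row[col] * vector[col])
                         .sum())
                 .toArray();
     }

     // Parallel: same computation, but the rows are split across worker threads.
     public static double[] multiplyConcurrent(double[][] matrix, double[] vector) {
         return Arrays.stream(matrix).parallel()
                 .mapToDouble(row -> IntStream.range(0, row.length)
                         .mapToDouble(col -> row[col] * vector[col])
                         .sum())
                 .toArray();
     }

     // Benchmark input: a 10000 x 10 matrix whose rows are all copies of the vector.
     @State(Scope.Thread)
     public static class Matrix {
         static int size = 10000;
         static double[] vector = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

         public static double[][] matrix = new double[size][10];

         static {
             for (int i = 0; i < size; i++) {
                 matrix[i] = vector.clone();
             }
         }
     }

     // The result is discarded, as in the run reported above. Returning it
     // instead would let JMH consume it and guard against dead-code elimination.
     @Benchmark
     public void singleThreadMultiplication(Matrix m) {
         multiplySingleThreaded(m.matrix, m.vector);
     }

     @Benchmark
     public void parallelMultiplication(Matrix m) {
         multiplyConcurrent(m.matrix, m.vector);
     }
}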

Cheers,
Ole

I am curious to see how this compares to simple for-loops, which I imagine
help the JIT compiler unroll loops and make use of instruction-level
parallelism.

According to the person who initially helped out on Stack Overflow, the
stream-based loop is slightly faster.  In his experiment the for loop took
100 seconds and the stream took 89 seconds.

http://stackoverflow.com/questions/34519952/java-8-matrix-vector-multiplication

Not so sure about that though.  I just read the article below and, as you
say, there are some tricks for making for loops very fast:
https://jaxenter.com/java-performance-tutorial-how-fast-are-the-java-8-streams-118830.html

Looking at the results in the article, sticking to primitives and for loops
can be far faster than streams.  Looks like I'm going to have to follow her
advice and benchmark a lot :).
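
For comparison, a plain for-loop version of the same matrix-vector
multiplication might look like the sketch below.  The names LoopMultiply,
multiplyLoop and multiplyStream are made up for illustration, and this has not
been benchmarked here; it only shows the loop shape being discussed and checks
that it gives the same result as the stream version on a tiny input.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical helper class, not part of the benchmark posted in this thread.
class LoopMultiply {

    // Plain nested for-loops over primitive arrays: no streams, no lambdas.
    static double[] multiplyLoop(double[][] matrix, double[] vector) {
        double[] result = new double[matrix.length];
        for (int i = 0; i < matrix.length; i++) {
            double[] row = matrix[i];
            double sum = 0.0;
            for (int j = 0; j < row.length; j++) {
                sum += row[j] * vector[j];
            }
            result[i] = sum;
        }
        return result;
    }

    // Sequential stream version, same shape as the one in the benchmark.
    static double[] multiplyStream(double[][] matrix, double[] vector) {
        return Arrays.stream(matrix)
                .mapToDouble(row -> IntStream.range(0, row.length)
                        .mapToDouble(col -> row[col] * vector[col])
                        .sum())
                .toArray();
    }

    public static void main(String[] args) {
        double[] vector = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        double[][] matrix = new double[3][];
        for (int i = 0; i < matrix.length; i++) {
            matrix[i] = vector.clone();
        }
        // Each row dotted with the vector is 1^2 + 2^2 + ... + 10^2 = 385.
        System.out.println(Arrays.toString(multiplyLoop(matrix, vector)));
        System.out.println(Arrays.equals(multiplyLoop(matrix, vector),
                multiplyStream(matrix, vector)));
    }
}
```

Dropping multiplyLoop into a third @Benchmark method next to the two stream
variants would put all three on the same footing, which seems like the only
way to settle which one is actually faster on a given JVM and machine.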

Cheers,
Ole
