Hi all, I have an issue with running a job on 1 machine vs. 5 machines. When I execute the following code, the results are not the same, though I would expect them to be, since the input file is the same in both cases. Do you have any suggestions?
Thanks in advance!
Lydia

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    env.getConfig().setGlobalJobParameters(parameters);

    //read input file
    DataSet<Tuple3<Integer, Integer, Double>> matrixA = readMatrix(env, parameters.get("input"));

    //Approximate EigenVector by PowerIteration
    //get initial vector - which equals matrixA * [1, ... , 1]
    DataSet<Tuple3<Integer, Integer, Double>> initial0 = (matrixA.groupBy(0)).sum(2);
    DataSet<Tuple3<Integer, Integer, Double>> maximum = initial0.maxBy(2);

    //normalize by maximum value
    DataSet<Tuple3<Integer, Integer, Double>> initial = (initial0.cross(maximum)).map(new normalizeByMax());

    //BulkIteration to find dominant eigenvector
    IterativeDataSet<Tuple3<Integer, Integer, Double>> iteration = initial.iterate(1);

    DataSet<Tuple3<Integer, Integer, Double>> intermediate =
        ((((((matrixA.join(iteration).where(1).equalTo(0))
            .map(new ProjectJoinResultMapper())).groupBy(0, 1)).sum(2)).groupBy(0)).sum(2)).
        cross(((((((((matrixA.join(iteration).where(1).equalTo(0))
            .map(new ProjectJoinResultMapper())).groupBy(0, 1)).sum(2))).groupBy(0)).sum(2)).sum(2)))
        .map(new normalizeByMax());

    DataSet<Tuple3<Integer, Integer, Double>> diffs =
        (iteration.join(intermediate).where(0).equalTo(0)).with(new deltaFilter());

    DataSet<Tuple3<Integer, Integer, Double>> eigenVector = iteration.closeWith(intermediate, diffs);

    eigenVector.writeAsCsv(parameters.get("output"));
    env.execute("Power Iteration");
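
For comparison, here is a minimal single-machine sketch of the computation the job is meant to perform: start from matrixA * [1, ... , 1], normalize, then repeat multiply-and-normalize. It is only an illustration under assumptions. It uses a dense `double[][]` instead of the (row, column, value) tuples, it assumes every step is normalized by the maximum entry (as the initial vector is), and the names `PowerIterationSketch`, `multiply` and `powerIteration` are hypothetical and not part of the Flink job.

```java
// Hypothetical single-machine reference; none of these names exist in the Flink job.
final class PowerIterationSketch {

    // Dense matrix-vector product: result[i] = sum over j of a[i][j] * v[j].
    static double[] multiply(double[][] a, double[] v) {
        double[] result = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < v.length; j++) {
                sum += a[i][j] * v[j];
            }
            result[i] = sum;
        }
        return result;
    }

    // Divide every entry by the largest entry (assumed here to be non-zero).
    static double[] normalizeByMax(double[] v) {
        if (v.length == 0) {
            throw new IllegalArgumentException("empty vector");
        }
        double max = v[0];
        for (double x : v) {
            if (x > max) {
                max = x;
            }
        }
        double[] result = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            result[i] = v[i] / max;
        }
        return result;
    }

    // Initial vector is a * [1, ..., 1], normalized; then `iterations`
    // rounds of multiply-and-normalize (assumption: normalize by max each round).
    static double[] powerIteration(double[][] a, int iterations) {
        double[] ones = new double[a[0].length];
        java.util.Arrays.fill(ones, 1.0);
        double[] v = normalizeByMax(multiply(a, ones));
        for (int k = 0; k < iterations; k++) {
            v = normalizeByMax(multiply(a, v));
        }
        return v;
    }
}
```

Calling `PowerIterationSketch.powerIteration(matrix, 1)` on the same input gives a deterministic vector that the 1-machine and 5-machine outputs could be compared against, which might help to see which of the two runs deviates.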