Hi Vasia,

I have used a simple job (attached) to generate a file which looks like this:

0 0
1 1
2 2
...
456629 456629
456630 456630

I need the vertices to be generated from a file for my future work.
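
For reference, reading such a file back amounts to splitting each line into a vertex ID and a vertex value. Below is a minimal plain-Java sketch with no Flink dependency. The class and method names are hypothetical and exist only for illustration; in an actual job a Flink CSV source with a space delimiter would take this role.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Flink or Gelly): parses "id value" lines. */
final class VertexListParser {

    /** Parses one line such as "456630 456630" into {id, value}. */
    static long[] parseLine(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 2) {
            throw new IllegalArgumentException("Expected 'id value', got: " + line);
        }
        return new long[] { Long.parseLong(fields[0]), Long.parseLong(fields[1]) };
    }

    /** Parses all non-empty lines, preserving their order. */
    static List<long[]> parseAll(List<String> lines) {
        List<long[]> vertices = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                vertices.add(parseLine(line));
            }
        }
        return vertices;
    }
}
```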

Cheers,
Mihail


On 18.03.2015 17:04, Vasiliki Kalavri wrote:
Hi Mihail, Robert,

I've tried reproducing this, but I couldn't.
I'm using the same Twitter input graph from SNAP that you linked to, and also the Scala IDE. The job finishes without a problem (both the SSSP example from Gelly and the unweighted version).

The only thing I changed to run your version was creating the graph from the edge set only, i.e. like this:

Graph<Long, Long, NullValue> graph = Graph.fromDataSet(edges,
        new MapFunction<Long, Long>() {
            public Long map(Long value) {
                return Long.MAX_VALUE;
            }
        }, env);

Since the Twitter input is an edge list, how do you generate the vertex dataset in your case?
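
To make the difference concrete: when the graph is built from the edge set alone, every ID that occurs as a source or target becomes a vertex, and each vertex receives the value returned by the mapper (Long.MAX_VALUE in the snippet above). The following is only a plain-Java sketch of that idea, assuming edges are given as {source, target} pairs. The class and method names are hypothetical, and Gelly itself does this as a distributed job rather than with an in-memory map.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration (not Gelly code): derive vertices from an edge list. */
final class VerticesFromEdges {

    /**
     * Collects every distinct endpoint ID and assigns it the same
     * initial value, Long.MAX_VALUE, as in the mapper shown above.
     */
    static Map<Long, Long> deriveVertices(long[][] edges) {
        Map<Long, Long> vertices = new TreeMap<>();
        for (long[] edge : edges) {
            vertices.put(edge[0], Long.MAX_VALUE); // source ID
            vertices.put(edge[1], Long.MAX_VALUE); // target ID
        }
        return vertices;
    }
}
```

In this sketch, an ID that never appears in any edge does not become a vertex, whereas a separately generated vertex list can contain such IDs.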

Thanks,
-Vasia.

On 18 March 2015 at 16:54, Mihail Vieru <vi...@informatik.hu-berlin.de <mailto:vi...@informatik.hu-berlin.de>> wrote:

    Hi,

    great! Thanks!

    I really need this bug fixed because I'm laying the groundwork for
    my Diplom thesis and I need to be sure that the Gelly API is
    reliable and can handle large datasets as intended.

    Cheers,
    Mihail


    On 18.03.2015 15:40, Robert Waury wrote:
    Hi,

    I managed to reproduce the behavior and as far as I can tell it
    seems to be a problem with the memory allocation.

    I have filed a bug report in JIRA to get the attention of
    somebody who knows the runtime better than I do.

    https://issues.apache.org/jira/browse/FLINK-1734

    Cheers,
    Robert

    On Tue, Mar 17, 2015 at 3:52 PM, Mihail Vieru
    <vi...@informatik.hu-berlin.de
    <mailto:vi...@informatik.hu-berlin.de>> wrote:

        Hi Robert,

        thank you for your reply.

        I'm starting the job from the Scala IDE. So only one
        JobManager and one TaskManager in the same JVM.
        I've doubled the memory in the eclipse.ini settings but I
        still get the Exception.

        -vmargs
        -Xmx2048m
        -Xms100m
        -XX:MaxPermSize=512m
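
One thing that may be worth checking here: the -vmargs in eclipse.ini size the heap of the IDE's own JVM, while a program launched from a run configuration normally runs in a separate JVM with its own VM arguments. A minimal sketch for printing the heap that is actually available to the JVM running the job (the class name is hypothetical):

```java
/** Hypothetical helper: reports the maximum heap of the current JVM. */
final class HeapCheck {

    /** Maximum heap size the current JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

Calling HeapCheck.maxHeapMiB() at the start of the job's main method would show whether a changed setting reached the job at all.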

        Best,
        Mihail


        On 17.03.2015 10:11, Robert Waury wrote:
        Hi,

        can you tell me how much memory your job has and how many
        workers you are running?

        From the trace it seems the internal hash table allocated
        only 7 MB for the graph data and therefore runs out of
        memory pretty quickly.

        Skewed data could also be an issue, but with a minimum of 5
        pages and a maximum of 8, the data seems to be distributed
        fairly evenly across the different partitions.
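
The figures above can be checked against the exception message in the original report (Overall memory: 20316160, Partition memory: 7208960, numPartitions: 32). The sketch below assumes a memory page size of 32 KiB. That value is not stated anywhere in the trace, but both byte counts divide evenly by it. The class and method names are hypothetical.

```java
/** Hypothetical helper: converts the byte counts from the trace into MiB and pages. */
final class HashTableMemory {

    // Assumption: 32 KiB memory pages; this value does not appear in the trace.
    static final long PAGE_SIZE_BYTES = 32 * 1024;

    /** Number of whole pages that fit into the given number of bytes. */
    static long pages(long bytes) {
        return bytes / PAGE_SIZE_BYTES;
    }

    /** Converts bytes to MiB. */
    static double mib(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long overallMemory = 20316160L;   // "Overall memory" from the trace
        long partitionMemory = 7208960L;  // "Partition memory" from the trace
        int numPartitions = 32;           // "numPartitions" from the trace

        System.out.println("Overall:   " + mib(overallMemory) + " MiB, " + pages(overallMemory) + " pages");
        System.out.println("Partition: " + mib(partitionMemory) + " MiB, " + pages(partitionMemory) + " pages");
        System.out.println("Average pages per partition: "
                + (double) pages(partitionMemory) / numPartitions);
    }
}
```

Under that assumption the partition memory is 6.875 MiB, or 220 pages, which averages 6.875 pages over 32 partitions and fits the reported minimum of 5 and maximum of 8.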

        Cheers,
        Robert

        On Tue, Mar 17, 2015 at 1:25 AM, Mihail Vieru
        <vi...@informatik.hu-berlin.de
        <mailto:vi...@informatik.hu-berlin.de>> wrote:

            And the correct SSSPUnweighted attached.


            On 17.03.2015 01:23, Mihail Vieru wrote:

                Hi,

                I'm getting the following RuntimeException for an
                adaptation of the SingleSourceShortestPaths example
                using the Gelly API (see attachment). It's been
                adapted for unweighted graphs having vertices with
                Long values.

                As an input graph I'm using the social network graph
                (~200MB unpacked) from here:
                https://snap.stanford.edu/data/higgs-twitter.html

                For the small SSSPDataUnweighted graph (also
                attached) it terminates and computes the distances
                correctly.


                03/16/2015 17:18:23 IterationHead(WorksetIteration (Vertex-centric iteration (org.apache.flink.graph.library.SingleSourceShortestPathsUnweighted$VertexDistanceUpdater@dca6fe4 | org.apache.flink.graph.library.SingleSourceShortestPathsUnweighted$MinDistanceMessenger@6577e8ce)))(2/4) switched to FAILED
                java.lang.RuntimeException: Memory ran out. Compaction failed. numPartitions: 32 minPartition: 5 maxPartition: 8 number of overflow segments: 176 bucketSize: 217 Overall memory: 20316160 Partition memory: 7208960 Message: Index: 8, Size: 7
                    at org.apache.flink.runtime.operators.hash.CompactingHashTable.insert(CompactingHashTable.java:390)
                    at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTable(CompactingHashTable.java:337)
                    at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:216)
                    at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:278)
                    at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
                    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:205)
                    at java.lang.Thread.run(Thread.java:745)


                Best,
                Mihail








package graphdistance;

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

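/**
 * Generates a vertex list file with one line per vertex ID from 0 to 456630.
 * Each line contains the ID twice (vertex ID and vertex value), separated by a space.
 */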
public class GenerateVerticesOneToN {

	//
	//	Program
	//

	public static void main(String[] args) throws Exception {
	
		String outputPath = "/home/vieru/dev/flink-experiments/data/social_network.verticeslist";

		// set up the execution environment
		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		List<Tuple2<Long, Long>> src = new ArrayList<Tuple2<Long, Long>>();
		src.add(new Tuple2<Long, Long>(0L, 0L));

		DataSet<Tuple2<Long, Long>> vertices = env.fromCollection(src).flatMap(
				new verticeGen()
				);
				
		// write result
		vertices.writeAsCsv(outputPath, "\n", " ");

		// execute program
		env.execute("GenerateVerticesOneToN");

		
	}
	//
	//	User Functions
	//

	@SuppressWarnings("serial")
	public static final class verticeGen implements FlatMapFunction<Tuple2<Long,Long>, Tuple2<Long,Long>> {

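		@Override
		public void flatMap(Tuple2<Long, Long> value, Collector<Tuple2<Long, Long>> out) {
			// highest vertex ID; the loop bound is inclusive, so IDs 0..nVert are emitted
			long nVert = 456630L;
			for (long i = 0; i <= nVert; i++) {
				// each vertex is emitted as (id, value) with value == id
				out.collect(new Tuple2<Long, Long>(i, i));
			}
		}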
	}
	
////////	
}
