Hi,

I am trying GraphX on the LiveJournal data. I have a cluster of 17
nodes: 1 master and 16 workers. I have a few questions about this.
* I built Spark from the current master branch (to avoid the partitionBy
error in Spark 1.0).
* I am using edgeListFile() to load the data, and I figured I need to
specify the number of partitions I want. The exact call I am using is the
following:
val graph = GraphLoader.edgeListFile(sc, "filepath", true, 64)
  .partitionBy(PartitionStrategy.RandomVertexCut)

-- Is this the correct way to load the file to get the best performance?
-- What should the number of partitions be: equal to the number of
computing nodes, or equal to the total number of cores?
-- I see the following error many times in my logs:

ERROR BlockManagerWorker: Exception handling buffer message
java.io.NotSerializableException:
org.apache.spark.graphx.impl.ShippableVertexPartition

Does it suggest that my graph wasn't partitioned properly? I suspect it
affects performance.
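
Or could it be a serializer issue instead? The GraphX programming guide
describes a Kryo setup that I have not verified is active in my jobs; a
sketch of that configuration, in case it is related:

import org.apache.spark.SparkConf

// Kryo registration as described in the Spark 1.0 GraphX guide. I am not
// sure whether missing this setting explains the NotSerializableException
// above -- that is part of what I am asking.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.apache.spark.graphx.GraphKryoRegistrator")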

Please let me know whether I am following each step correctly.

Thanks in advance,
-Shreyansh


