The stage slow when I have for loop inside (Java)

2015-05-24 Thread allanjie
Hi all, I only have one stage, which is "mapToPair", and inside the function I have a for loop that runs about 133433 times. With that count it becomes slow; when I replace 133433 with just 133, it runs very fast. But I think this is just a simple operation even in plain Java. You can look at th
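For a rough sense of scale, here is a minimal sketch of the arithmetic, assuming the loop body runs once per input record. The record count of 133433 is an assumption used only for illustration (the message above does not state how many records the RDD holds), and the class and method names are hypothetical.

```java
public class LoopCost {
    // Total inner-loop iterations when every record runs the same loop.
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    static long totalIterations(long records, long iterationsPerRecord) {
        return Math.multiplyExact(records, iterationsPerRecord);
    }

    public static void main(String[] args) {
        long records = 133433L; // assumed record count, for illustration only
        System.out.println("loop of 133433: " + totalIterations(records, 133433L));
        System.out.println("loop of 133:    " + totalIterations(records, 133L));
    }
}
```

Under that assumption the full loop costs roughly 1.78 × 10^10 inner iterations against roughly 1.77 × 10^7 for the short one, a factor of about 1000, so a large slowdown would not be surprising even for a cheap loop body.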

Spark dramatically slow when I add "saveAsTextFile"

2015-05-24 Thread allanjie
*Problem Description*: The program runs on a standalone Spark cluster (1 master, 6 workers with 8 GB RAM and 2 cores). Input: a 468 MB file with 133433 records stored in HDFS. Output: just a 2 MB file that will be stored in HDFS. The program has two map operations and one reduceByKey operation. Finally I save
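One possible reading of the subject line: Spark transformations such as map and reduceByKey are lazy, and nothing executes until an action such as saveAsTextFile runs, so the time that appears to belong to the save can include the whole pipeline. The sketch below is not Spark code. It uses plain java.util.stream, which is also lazily evaluated, as a stand-in so that it runs without a cluster. The class, field, and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyDemo {
    // Counts how many times the map function has actually been invoked.
    static final AtomicInteger calls = new AtomicInteger();

    // Declares a map step; no element is processed here
    // (analogous to declaring an RDD transformation).
    static Stream<Integer> buildPipeline(List<Integer> input) {
        return input.stream().map(x -> {
            calls.incrementAndGet();
            return x * 2;
        });
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = buildPipeline(Arrays.asList(1, 2, 3));
        System.out.println("calls before terminal op: " + calls.get());

        // The terminal operation triggers the work (analogous to a Spark action).
        List<Integer> out = pipeline.collect(Collectors.toList());
        System.out.println("calls after terminal op: " + calls.get() + ", result: " + out);
    }
}
```

If the same effect is at play in the Spark job, timing the job with a cheap action such as count() in place of the save would show whether the cost lies in the transformations rather than in writing the 2 MB output.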

Re: java program got Stuck at broadcasting

2015-05-21 Thread allanjie
Sure, the code is very simple. I think you can all follow it from the main function. public class Test1 { public static double[][] createBroadcastPoints(String localPointPath, int row, int col) throws IOException{ BufferedReader br = RAWF.reader(localPointPath);

Re: java program got Stuck at broadcasting

2015-05-21 Thread allanjie
Hi, I just checked the datanode logs; they look like this: * 2015-05-20 11:42:14,605 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.9.0.48:50676, dest: /10.9.0.17:50010, bytes: 134217728, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_804680172_54, offset: 0, srvID: 39fb78

Re: save column values of DataFrame to text file

2015-05-20 Thread allanjie
Sorry, but how does that work? Can you give more detail about the problem? On 20 May 2015 at 21:32, oubrik [via Apache Spark User List] < ml-node+s1001560n2295...@n3.nabble.com> wrote: > hi, > try like this > > DataFrame df = sqlContext.load("com.databricks.spark.csv", options); > df.select("year

java program got Stuck at broadcasting

2015-05-20 Thread allanjie
The variable I need to broadcast is just 468 MB. When broadcasting, it just “stops” here: *15/05/20 11:36:14 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.id is deprecated. Inste

java program Get Stuck at broadcasting

2015-05-19 Thread allanjie
Hi All, The variable I need to broadcast is just 468 MB. When broadcasting, it just “stops” here: * 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.id is deprecated. I