Hi all,
I only have one stage, which is "mapToPair", and inside the function I have a
for loop that runs about 133433 times.
It becomes slow at that size; when I replace 133433 with just 133, it runs very
fast.
But I think this is just a simple operation, even in plain Java.
You can look at th
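
For a rough sense of scale: the function passed to mapToPair is called once per input record, so a loop inside it is multiplied by the record count. The sketch below assumes the job runs over the 133433 records mentioned in the problem description; the class and method names are made up for illustration and are not from the poster's code.

```java
// Back-of-the-envelope count of inner-loop iterations for a per-record map function.
final class LoopCost {
    // Total inner iterations = number of records * loop iterations per record.
    // Uses long arithmetic: 133433 * 133433 does not fit in a 32-bit int.
    static long totalIterations(long records, long loopPerRecord) {
        return Math.multiplyExact(records, loopPerRecord);
    }

    public static void main(String[] args) {
        long records = 133_433L; // assumption: record count taken from the problem description

        System.out.println(totalIterations(records, 133_433L)); // prints 17804365489
        System.out.println(totalIterations(records, 133L));     // prints 17746589
    }
}
```

Under that assumption the full loop does roughly a thousand times more work than the 133-iteration version, which may account for much of the slowdown even though each iteration is simple.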
*Problem Description*:
The program runs on a standalone Spark cluster (1 master, 6 workers with
8 GB RAM and 2 cores).
Input: a 468MB file with 133433 records stored in HDFS.
Output: just a 2MB file, which will be stored in HDFS.
The program has two map operations and one reduceByKey operation.
Finally I save
Sure, the code is very simple. I think you can understand it from the main
function.
public class Test1 {
public static double[][] createBroadcastPoints(String localPointPath,
        int row, int col) throws IOException {
BufferedReader br = RAWF.reader(localPointPath);
Hi,
I just checked the logs of the datanode; they look like this:
*
2015-05-20 11:42:14,605 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/10.9.0.48:50676, dest: /10.9.0.17:50010, bytes: 134217728, op: HDFS_WRITE,
cliID: DFSClient_NONMAPREDUCE_804680172_54, offset: 0, srvID:
39fb78
Sorry, but how does that work?
Can you give more detail about the problem?
On 20 May 2015 at 21:32, oubrik [via Apache Spark User List] <
ml-node+s1001560n2295...@n3.nabble.com> wrote:
> hi,
> try it like this:
>
> DataFrame df = sqlContext.load("com.databricks.spark.csv", options);
> df.select("year
The variable I need to broadcast is just 468 MB.
When broadcasting, it just “stops” here:
*15/05/20 11:36:14 INFO Configuration.deprecation: mapred.tip.id is
deprecated. Instead, use mapreduce.task.id
15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.id is
deprecated. Inste
Hi All,
The variable I need to broadcast is just 468 MB.
When broadcasting, it just “stops” here:
*
15/05/20 11:36:14 INFO Configuration.deprecation: mapred.tip.id is
deprecated. Instead, use mapreduce.task.id
15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.id is
deprecated. I
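
When a large broadcast appears to hang, it can help to estimate how much heap the broadcast value needs and compare that with the memory available to the driver and executors. The sketch below is only an estimate for a double[][] like the one createBroadcastPoints returns: the column count is hypothetical, one row per record is an assumption, and JVM array headers and row references are left out because they vary by JVM.

```java
// Rough heap estimate for a double[rows][cols] that is about to be broadcast.
final class BroadcastSize {
    // Raw payload only: 8 bytes per double element.
    // Cast to long first so the multiplication cannot overflow an int.
    static long payloadBytes(int rows, int cols) {
        return (long) rows * cols * Double.BYTES;
    }

    public static void main(String[] args) {
        int rows = 133_433; // assumption: one row per input record
        int cols = 100;     // hypothetical column count, not taken from the thread

        long bytes = payloadBytes(rows, cols);
        System.out.printf("%d bytes (~%.1f MiB)%n", bytes, bytes / (1024.0 * 1024.0));
    }
}
```

The in-memory size can differ a lot from the 468 MB text file on disk, so the actual row and col values should be plugged in before drawing conclusions.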