Hi, all
With Spark 1.0 RC3, I found that the executor processes are still there even after I
killed the app and the workers.
Has anyone seen the same problem (it may also exist in other versions)?
Best,
--
Nan Zhu
Hi Spark community,
I have a design/algorithm question that I assume is common enough for
someone else to have tackled before. I have an RDD of time-series data
formatted as time-value tuples, RDD[(Double, Double)], and am trying to
extract threshold crossings. In order to do so, I first want to transform the
series into pairs of consecutive samples.
How about ...
val data = sc.parallelize(Array((1, 0.05), (2, 0.10), (3, 0.15)))  // (time, value) samples
val pairs = data.join(data.map(t => (t._1 + 1, t._2)))             // pair each point with its predecessor's value
It's a self-join, but one copy has its ID incremented by 1. I don't
know if it's performant, but it works, although the output looks more like:
(2,(0.1,0.05))
(3,(0.15,0.1))
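If it helps, here is a rough sketch of how the crossings could then be pulled
out of those pairs; the threshold value and variable names are just
placeholders I made up, not something from the original question:

val threshold = 0.12  // placeholder threshold, use whatever the application needs
val crossings = pairs.filter { case (t, (curr, prev)) =>
  (prev < threshold && curr >= threshold) || (prev >= threshold && curr < threshold)
}
crossings.collect().foreach(println)  // with the sample data above this prints (3,(0.15,0.1))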
On
Is there something wrong with the mailing list? Very few people see my thread.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/os-buffer-cache-does-not-cache-shuffle-output-file-tp5478p5521.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi,
For each line that we read as text from HDFS, we have a schema. If there is an
API that takes the schema as a List[Symbol] and maps each token to its Symbol,
that would be helpful.
One solution is to keep the data on HDFS as Avro/Protobuf-serialized objects,
but I'm not sure whether that works with HBase inp
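One way to do that mapping yourself is a plain zip of the schema with each
line's tokens; a rough sketch, where the schema, delimiter, and path are
made-up examples:

val schema = List('id, 'name, 'score)                // your schema as List[Symbol]
val lines = sc.textFile("hdfs:///path/to/data.txt")  // hypothetical path
val records = lines.map { line =>
  schema.zip(line.split(",")).toMap                  // Map('id -> "1", 'name -> "foo", ...)
}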
Hi, all
I'm tuning my app in local mode and found that a lot of time was spent
in local block fetch.
In stage 1, I read the input data and do a repartition.
In stage 2, I do some operations on the repartitioned RDD, so it involves a
local block fetch. I find that
the fetch
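For reference, a rough sketch of the two stages described above (the input
path, partition count, and the stage-2 operation are just placeholders):

val input = sc.textFile("hdfs:///path/to/input")  // stage 1: read input
val repartitioned = input.repartition(8)          // shuffle write of the repartitioned blocks
val result = repartitioned.map(_.length).count()  // stage 2: an operation that fetches the shuffled blocks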
Yes, it seems broken. I got only a few emails in the last few days.
On Fri, May 9, 2014 at 7:24 AM, wxhsdp wrote:
> is there something wrong with the mailing list? very few people see my
> thread
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/os-buffe
I'm running Spark 1.0.0, the newest under-development version.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-broadcast-variable-in-iteration-tp5479p5480.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
It seems the mailing list was broken when you sent your original question, so
I have appended it to the end of this message.
"Buffers" is relatively unimportant in today's Linux kernel; "cache" is
used for both writing and reading [1].
What you are seeing seems to be the expected behavior: the data is wri
For example, this app just reads a 4GB file and writes a copy of it. It
takes 41 seconds to write the file, then 3 more minutes to move all the
temporary files.
I guess this is an issue with the Hadoop/jets3t layer, not Spark.
14/05/06 20:11:41 INFO TaskSetManager: Finished TID 63 in 8688
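For what it's worth, the copy job above boils down to something like this
(bucket and paths are placeholders; the real app isn't shown in the thread):

val in = sc.textFile("s3n://my-bucket/input/big-file.txt")  // ~4GB input
in.saveAsTextFile("s3n://my-bucket/output/copy")            // tasks write temporary files, which are then moved/renamed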
I have a Spark cluster with 3 worker nodes.
- Workers: 3
- Cores: 48 Total, 48 Used
- Memory: 469.8 GB Total, 72.0 GB Used
I want to process a single compressed file (*.gz) on HDFS. The file is 1.5GB
compressed and 11GB uncompressed.
When I try to read the compressed file from HDFS i
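A gzipped file is not splittable, so it will be read by a single task; a rough
sketch of reading it and then spreading the data out (the path and partition
count are placeholders):

val raw = sc.textFile("hdfs:///data/file.gz")  // a .gz file comes back as a single partition
val spread = raw.repartition(48)               // redistribute across the cluster after decompression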