Hi Patrick,
In this particular case, at the end of my tasks I have X different types of
keys. I need to write their values to X different files respectively. For
now I'm writing everything to the driver node's local FS.
While the number of key-value pairs can grow to millions (billions?), X is
mo
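The setup described above (one output file per key type, written from the driver) can be sketched in plain Scala. This is a driver-local illustration only: `WriteByKey`, the file naming, and the sample pairs are all illustrative, not from the thread, and in practice the pairs would be streamed to the driver rather than held in one `Seq`:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object WriteByKey {
  // Group (key, value) pairs by key type and write one file per key.
  // A driver-local sketch; real data would arrive from the cluster.
  def writeByKey(pairs: Seq[(String, String)], outDir: Path): Map[String, Path] =
    pairs.groupBy(_._1).map { case (key, kvs) =>
      val file = outDir.resolve(s"$key.txt")
      // Values for one key land in that key's file, one per line.
      Files.write(file, kvs.map(_._2).mkString("\n").getBytes(StandardCharsets.UTF_8))
      key -> file
    }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("bykey")
    val out = writeByKey(Seq(("a", "1"), ("b", "2"), ("a", "3")), dir)
    println(new String(Files.readAllBytes(out("a")), StandardCharsets.UTF_8)) // 1 and 3, newline-separated
  }
}
```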
Nilesh - out of curiosity - what operation are you doing on the values
for the key?
On Sun, May 25, 2014 at 6:35 PM, Nilesh wrote:
> Hi Andrew,
>
> Thanks for the reply!
>
> It's clearer about the API part now. That's what I wanted to know.
>
> Wow, tuples, why didn't that occur to me. That's a l
Hi Andrew,
Thanks for the reply!
It's clearer about the API part now. That's what I wanted to know.
Wow, tuples, why didn't that occur to me. That's a lovely ugly hack. :) I
also came across something that solved my real problem though - the
RDD.toLocalIterator method from 1.0, the logic of whic
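The appeal of `RDD.toLocalIterator` (added in Spark 1.0) is that it brings partitions to the driver one at a time instead of `collect`-ing everything at once. Since Spark itself can't run here, this is a plain-Scala model of that behavior, with partition fetches represented as thunks; `toLocalIterator` below is a stand-in, not the Spark implementation:

```scala
object LocalIteratorSketch {
  // Model of RDD.toLocalIterator: partitions are fetched lazily, one at
  // a time, so only one partition's records are on the driver at once.
  def toLocalIterator[T](partitions: Seq[() => Seq[T]]): Iterator[T] =
    partitions.iterator.flatMap(fetch => fetch())

  def main(args: Array[String]): Unit = {
    var fetched = 0
    val parts = Seq(
      () => { fetched += 1; Seq(1, 2) }, // "partition" 0
      () => { fetched += 1; Seq(3) })    // "partition" 1
    val it = toLocalIterator(parts)
    println(it.next())  // 1
    println(fetched)    // 1 -- the second partition has not been fetched yet
  }
}
```

Because the iterator is lazy, the second thunk only runs once the first partition is exhausted, which is the memory property that makes streaming per-key output to driver-local files feasible.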
Hi,
I have looked at the code around UGI in Spark. When Spark interacts with a Kerberos-secured HDFS, it applies for a delegation token on the scheduler side and stores it as a credential in the UGI; the credential is then transferred to the Spark executors so that they can authenticate with HDFS. My question is
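The flow described above can be modeled schematically. This is not the Hadoop API (the real classes are `UserGroupInformation` and `org.apache.hadoop.security.Credentials`); all names here are illustrative stand-ins showing the shape of the flow: the scheduler acquires a token, wraps it in serializable credentials, and executors deserialize the shipped bytes before touching HDFS:

```scala
import java.io._

object TokenFlowSketch {
  // Illustrative stand-ins, not Hadoop classes.
  case class DelegationToken(service: String, identifier: String)
  case class Credentials(tokens: Seq[DelegationToken])

  // Scheduler side: credentials are serialized for shipping to executors.
  def serialize(c: Credentials): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(c); oos.close()
    bos.toByteArray
  }

  // Executor side: restore the shipped credentials before accessing HDFS.
  def deserialize(bytes: Array[Byte]): Credentials =
    new ObjectInputStream(new ByteArrayInputStream(bytes))
      .readObject().asInstanceOf[Credentials]

  def main(args: Array[String]): Unit = {
    val creds = Credentials(Seq(DelegationToken("hdfs:nn1", "token-bytes")))
    val restored = deserialize(serialize(creds))
    println(restored.tokens.head.service) // hdfs:nn1
  }
}
```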
Hi Nilesh,
Matei's change from (Key, Seq[Value]) to (Key, Iterable[Value]) was made so that the optimization can land in a future release without breaking the API. Currently, though, all values for a single key are still held in memory on a single machine.
The way I've gotten around this is by i
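The point about the signature change can be sketched in plain Scala: a `Seq` implies the values are already materialized, while an `Iterable` is free to produce them on demand (e.g. streamed back from disk after spilling), which is exactly the door the new return type leaves open. `lazyValues` is an illustrative stand-in, not Spark code:

```scala
object LazyGroupSketch {
  // An Iterable that stores nothing up front: each call to iterator
  // generates the values on demand. A Seq[Value] could not do this,
  // which is why (Key, Iterable[Value]) is the more future-proof type.
  def lazyValues(n: Int): Iterable[Int] = new Iterable[Int] {
    def iterator: Iterator[Int] = Iterator.range(0, n)
  }

  def main(args: Array[String]): Unit = {
    val vs = lazyValues(1000000) // no million-element buffer is allocated here
    println(vs.iterator.take(3).toList) // List(0, 1, 2)
  }
}
```

As Andrew notes, in Spark 1.0 the values for one key are still fully materialized; the lazy implementation is only made possible, not yet provided.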
I would like to clarify something. Matei mentioned that in Spark 1.0 groupBy returns a (Key, Iterable[Value]) instead of a (Key, Seq[Value]). Does this also automatically guarantee that the whole Iterable[Value] is not in fact stored in memory? That is to say, with 1.0, will it be possible to do
gro