i don't need more, per se... i just need to watch the size of the variable;
then, if it's within the size limit, go ahead and broadcast it; if not, then
i won't broadcast...
so, that would be a yes then? (2 GB, or what is it exactly?)
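roughly what i have in mind, as a sketch (assuming SizeEstimator's rough
estimate is good enough and that the limit is Integer.MAX_VALUE bytes; jsc
and myList are placeholders):

    import java.util.List;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.broadcast.Broadcast;
    import org.apache.spark.util.SizeEstimator;

    // estimate the in-memory size of the candidate variable (approximate)
    long estimatedBytes = SizeEstimator.estimate(myList);

    // only broadcast when it fits under the ~2GB limit
    if (estimatedBytes < Integer.MAX_VALUE) {
        Broadcast<List<String>> bc = jsc.broadcast(myList);
        // ... use bc.value() inside tasks ...
    }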
--
is it currently still ~2GB (Integer.MAX_VALUE)?
or am i misinformed? the ~2GB figure is what google-search and scouring this
mailing list seem to say...
Thanks
--
okie, i may have found an alternative/workaround to using .collect() for what
i am trying to achieve...
initially, for the Spark application i am working on, i would call .collect()
on two separate RDDs into a couple of ArrayLists (which was the reason i was
asking what the size limit on the driver is)
i am currently trying to find a workaround for the Spark application i am
working on so that it does not have to use .collect()
but, for now, it is going to have to use .collect()
what is the size limit (i.e., memory on the driver) of an RDD that .collect()
can work with?
i've been scouring google-search and this mailing list...
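for what it's worth, what i've pieced together so far: there is no fixed
per-RDD limit; the collected result just has to fit in the driver's heap
(spark.driver.memory), and spark.driver.maxResultSize (default 1g) caps the
total serialized results sent back to the driver. e.g.:

    import org.apache.spark.SparkConf;

    // raise the cap on serialized results returned to the driver
    // (value is illustrative; the data must still fit in the driver heap)
    SparkConf conf = new SparkConf()
        .setAppName("collect-demo")
        .set("spark.driver.maxResultSize", "2g");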
newb question...
say, memory per node is 16GB for 6 nodes (for a total of 96GB for the
cluster)
is 16GB the max amount of memory that can be allocated to the driver? (since
it is, after all, 16GB per node)
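(my working assumption: the driver is a single JVM on a single node, so its
heap can't exceed what one node can offer, minus OS/YARN overhead; e.g., on a
16GB node, something like the following, with illustrative numbers and a
hypothetical app:)

    spark-submit --master yarn --deploy-mode cluster \
      --driver-memory 12g \
      --class MyApp myapp.jar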
Thanks
--
well... it turns out that the extra part-* file goes away when i limit
--num-executors to 1 or 2 (leaving it at the default maxes it out, which in
turn gives an extra empty part-file)
i guess the test data i'm using only requires that many executors
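(in case it helps anyone: my understanding, which could be wrong, is that the
extra file comes from an empty partition, and default partition counts can
scale with total executor cores, so capping executors shrinks them, e.g.:)

    # MyApp / myapp.jar are placeholders
    spark-submit --master yarn --num-executors 2 --class MyApp myapp.jar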
--
the spark job succeeds (and with correct output), except there is always an
extra part-* file, and it is empty...
i even set the number of partitions to only 2 via spark-submit, but a 3rd,
empty part-file still shows up.
why does it do that? how do i fix it?
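one workaround i'm considering (a sketch; names and the path are made up):
coalesce to the expected number of non-empty partitions before saving, so no
empty partition gets its own part-* file:

    import org.apache.spark.api.java.JavaRDD;

    // collapse empty partitions before writing (2 matches the partition
    // count mentioned above; results is a placeholder RDD)
    JavaRDD<String> output = results.coalesce(2);
    output.saveAsTextFile("hdfs:///tmp/output");  // path is illustrative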
Thank you
--
okie, well...
i'm working with a pair RDD
i need to extract the values and store them somehow (maybe a simple Array??),
which i later parallelize and reuse
since adding to a list is a no-no, what, if any, are the other options?
(Java Spark, btw)
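one option that seems plausible, assuming the values fit in driver memory
(names are made up; jsc is a JavaSparkContext):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import scala.Tuple2;

    // toy stand-in for the pair RDD
    JavaPairRDD<String, Integer> pairs = jsc.parallelizePairs(
        Arrays.asList(new Tuple2<>("a", 1), new Tuple2<>("b", 2)));

    // let collect() build the driver-side List (no manual ArrayList adds)
    List<Integer> values = pairs.values().collect();

    // later: re-distribute the saved values as a fresh RDD
    JavaRDD<Integer> reused = jsc.parallelize(values);

(if the values might not fit on the driver, the other route is to skip the
round-trip entirely: pairs.values().cache() and reuse the RDD directly.)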
thanks
--
it gives a NullPointerException...
is there a workaround for adding to an ArrayList during .foreach on an RDD?
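my understanding of why it fails, for what it's worth: the lambda passed to
.foreach runs on the executors, so the driver's ArrayList either arrives as a
serialized copy (whose adds are lost) or as null, depending on how it is
referenced. the usual fallback is to let spark build the list:

    import java.util.List;

    // instead of: rdd.foreach(x -> myList.add(x));  // mutates executor copies
    // let collect() return a driver-side List (rdd is a placeholder)
    List<String> localList = rdd.collect();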
thank you
--
for clarification...
rdd.saveAsTextFile(path) writes to the local fs, but not to hdfs
anyone?
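what i've gathered so far, in case it's relevant: the destination is decided
by the path URI (and by fs.defaultFS when the path is bare), not by
saveAsTextFile itself. e.g. (host/port are placeholders):

    // a fully-qualified URI makes the target filesystem explicit
    rdd.saveAsTextFile("hdfs://namenode:8020/user/me/output");
    rdd.saveAsTextFile("file:///tmp/output");  // explicitly local

    // a bare path like "output" resolves against fs.defaultFS from
    // core-site.xml, which is the local fs unless hadoop config says otherwise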
--
when i run a job with .setMaster("local[*]"), the output is as expected...
but when i run it using YARN (single-node, pseudo-distributed hdfs) via
spark-submit, the output is garbled: instead of key-value pairs, it shows only
one value preceded by a comma, and the rest are blank
what am i missing?
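one thing i plan to try (a guess, since "one value preceded by a comma" looks
like the keys are coming out empty): format each pair into a string explicitly
before saving, so the on-disk text doesn't depend on how tuples stringify or
on where the job runs (pairs and the path are placeholders):

    import org.apache.spark.api.java.JavaRDD;

    // make the output format explicit instead of relying on Tuple2.toString()
    JavaRDD<String> lines = pairs.map(t -> t._1() + "," + t._2());
    lines.saveAsTextFile("hdfs:///tmp/output");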