i don't need more, per se... i just need to watch the size of the variable;
then, if it's within the size limit, go ahead and broadcast it; if not, then
i won't broadcast...
so, that would be a yes then? (2 GB, or what is it exactly?)
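roughly what i have in mind, as a sketch (assuming SizeEstimator's rough
estimate is good enough and that the limit is Integer.MAX_VALUE bytes; jsc
and myList are placeholders):

    import java.util.List;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.broadcast.Broadcast;
    import org.apache.spark.util.SizeEstimator;

    // estimate the in-memory size of the candidate variable (approximate)
    long estimatedBytes = SizeEstimator.estimate(myList);

    // only broadcast when it fits under the ~2GB limit
    if (estimatedBytes < Integer.MAX_VALUE) {
        Broadcast<List<String>> bc = jsc.broadcast(myList);
        // ... use bc.value() inside tasks ...
    }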
--
is it currently still ~2GB (Integer.MAX_VALUE)?
or am i misinformed? the ~2GB figure is what google-search and scouring this
mailing list seem to say...
Thanks
--
okie, i may have found an alternative/workaround to using .collect() for what
i am trying to achieve...
initially, for the Spark application i am working on, i would call .collect()
on two separate RDDs into a couple of ArrayLists (which was the reason i was
asking what the size limit on the driver is)
i am currently trying to find a workaround for the Spark application i am
working on so that it does not have to use .collect()
but, for now, it is going to have to use .collect()
what is the size limit (i.e., memory on the driver) of an RDD that .collect()
can work with?
i've been scouring google-search and this mailing list...
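for what it's worth, what i've pieced together so far: there is no fixed
per-RDD limit; the collected result just has to fit in the driver's heap
(spark.driver.memory), and spark.driver.maxResultSize (default 1g) caps the
total serialized results sent back to the driver. e.g.:

    import org.apache.spark.SparkConf;

    // raise the cap on serialized results returned to the driver
    // (value is illustrative; the data must still fit in the driver heap)
    SparkConf conf = new SparkConf()
        .setAppName("collect-demo")
        .set("spark.driver.maxResultSize", "2g");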
newb question...
say, memory per node is 16GB for 6 nodes (for a total of 96GB for the
cluster)
is 16GB the max amount of memory that can be allocated to the driver? (since
it is, after all, 16GB per node)
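(my working assumption: the driver is a single JVM on a single node, so its
heap can't exceed what one node can offer, minus OS/YARN overhead; e.g., on a
16GB node, something like the following, with illustrative numbers and a
hypothetical app:)

    spark-submit --master yarn --deploy-mode cluster \
      --driver-memory 12g \
      --class MyApp myapp.jar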
Thanks
--
well... it turns out that the extra part-* file goes away when i limit
--num-executors to 1 or 2 (leaving it at the default maxes it out, which in
turn gives an extra empty part-file)
i guess the test data i'm using only requires that many executors
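(in case it helps anyone: my understanding, which could be wrong, is that the
extra file comes from an empty partition, and default partition counts can
scale with total executor cores, so capping executors shrinks them, e.g.:)

    # MyApp / myapp.jar are placeholders
    spark-submit --master yarn --num-executors 2 --class MyApp myapp.jar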
--
the spark job succeeds (and with correct output), except there is always an
extra part-* file, and it is empty...
i even set the number of partitions to only 2 via spark-submit, but a 3rd,
empty part-file still shows up.
why does it do that? how do i fix it?
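one workaround i'm considering (a sketch; names and the path are made up):
coalesce to the expected number of non-empty partitions before saving, so no
empty partition gets its own part-* file:

    import org.apache.spark.api.java.JavaRDD;

    // collapse empty partitions before writing (2 matches the partition
    // count mentioned above; results is a placeholder RDD)
    JavaRDD<String> output = results.coalesce(2);
    output.saveAsTextFile("hdfs:///tmp/output");  // path is illustrative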
Thank you
--
okie, well...
i'm working with a pair RDD
i need to extract the values and store them somehow (maybe a simple Array??),
which i later parallelize and reuse
since adding to a list is a no-no, what, if any, are the other options?
(Java Spark, btw)
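one option that seems plausible, assuming the values fit in driver memory
(names are made up; jsc is a JavaSparkContext):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import scala.Tuple2;

    // toy stand-in for the pair RDD
    JavaPairRDD<String, Integer> pairs = jsc.parallelizePairs(
        Arrays.asList(new Tuple2<>("a", 1), new Tuple2<>("b", 2)));

    // let collect() build the driver-side List (no manual ArrayList adds)
    List<Integer> values = pairs.values().collect();

    // later: re-distribute the saved values as a fresh RDD
    JavaRDD<Integer> reused = jsc.parallelize(values);

(if the values might not fit on the driver, the other route is to skip the
round-trip entirely: pairs.values().cache() and reuse the RDD directly.)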
thanks
--
it gives a NullPointerException...
is there a workaround for adding to an ArrayList during .foreach on an RDD?
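my understanding of why it fails, for what it's worth: the lambda passed to
.foreach runs on the executors, so the driver's ArrayList either arrives as a
serialized copy (whose adds are lost) or as null, depending on how it is
referenced. the usual fallback is to let spark build the list:

    import java.util.List;

    // instead of: rdd.foreach(x -> myList.add(x));  // mutates executor copies
    // let collect() return a driver-side List (rdd is a placeholder)
    List<String> localList = rdd.collect();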
thank you
--
for clarification...
rdd.saveAsTextFile(path) writes to the local fs, but not to hdfs
anyone?
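what i've gathered so far, in case it's relevant: the destination is decided
by the path URI (and by fs.defaultFS when the path is bare), not by
saveAsTextFile itself. e.g. (host/port are placeholders):

    // a fully-qualified URI makes the target filesystem explicit
    rdd.saveAsTextFile("hdfs://namenode:8020/user/me/output");
    rdd.saveAsTextFile("file:///tmp/output");  // explicitly local

    // a bare path like "output" resolves against fs.defaultFS from
    // core-site.xml, which is the local fs unless hadoop config says otherwise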
--
when i run a job with .setMaster("local[*]"), the output is as expected...
but when i run it using YARN (single-node, pseudo-distributed hdfs) via
spark-submit, the output is garbled: instead of key-value pairs, it shows only
one value preceded by a comma, and the rest are blank
what am i missing?
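one thing i plan to try (a guess, since "one value preceded by a comma" looks
like the keys are coming out empty): format each pair into a string explicitly
before saving, so the on-disk text doesn't depend on how tuples stringify or
on where the job runs (pairs and the path are placeholders):

    import org.apache.spark.api.java.JavaRDD;

    // make the output format explicit instead of relying on Tuple2.toString()
    JavaRDD<String> lines = pairs.map(t -> t._1() + "," + t._2());
    lines.saveAsTextFile("hdfs:///tmp/output");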