Hi Wayne,

Thanks for the reply. I did raise the limit before posting, based on your comment on another thread suggesting ulimit -n 2048. That seems to have helped with the out-of-memory issue.
I'm curious whether this is the standard procedure for scaling a Spark node's resources vertically, or just a quick workaround. I would expect the Spark standalone master to expose these settings in some configuration file.

The second item I mentioned is the trickiest, since it only occurs (empty data!) when I increase the number of worker threads with local[N]. I don't see a real gain from increasing the number of threads; if anything, performance degrades, as threads appear to sit waiting for others to finish before returning processed data. As a general statement, for small RDDs a high number of threads could be a problem. Do you agree?

Thanks,
Rod
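P.S. To make the second point concrete, below is a minimal Scala sketch of the kind of setup I'm describing (the app name, thread count, and partition numbers are just placeholders, not my actual job):

import org.apache.spark.{SparkConf, SparkContext}

object LocalNSketch {
  def main(args: Array[String]): Unit = {
    // local[4] runs 4 worker threads inside a single JVM. In standalone mode the
    // equivalent knobs normally live in conf/spark-env.sh (SPARK_WORKER_CORES,
    // SPARK_WORKER_MEMORY) or conf/spark-defaults.conf, rather than in an
    // OS-level limit like ulimit.
    val conf = new SparkConf()
      .setAppName("local-n-sketch")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)

    // A small RDD spread over many partitions leaves most threads idle or
    // waiting on the others; coalescing down keeps the work per thread
    // meaningful. The numbers here are illustrative only.
    val small = sc.parallelize(1 to 1000, numSlices = 16)
    val result = small.coalesce(4).map(_ * 2).collect()
    println(s"processed ${result.length} elements")

    sc.stop()
  }
}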