Re: Question about Spark best practice when counting records.

2015-02-27 Thread Kostas Sakellis
Hey Darin, Record count metrics are coming in Spark 1.3. Can you wait until it is released? Or do you need a solution in older versions of Spark? Kostas On Friday, February 27, 2015, Darin McBeath wrote: > I have a fairly large Spark job where I'm essentially creating quite a few > RDDs, do se
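For pre-1.3 Spark, a common workaround was to count records with an accumulator as data streams through a transformation, rather than paying for a separate count() action. Below is a plain-Python stand-in for that pattern (it runs locally over ordinary lists; the `Accumulator` class is a toy counterpart of what `sc.accumulator(0)` gives you in real Spark):

```python
# Plain-Python illustration of Spark's accumulator counting pattern:
# count records as a side effect of an existing pass over the data,
# instead of running a second full pass just to count.

class Accumulator:
    """Toy stand-in for a Spark accumulator: add-only from tasks."""
    def __init__(self, initial=0):
        self.value = initial

    def add(self, n):
        self.value += n

def transform_and_count(partitions, acc):
    """Apply a transformation while counting records via the accumulator."""
    for part in partitions:
        for record in part:
            acc.add(1)            # side effect: count the record
            yield record.upper()  # the "real" transformation

acc = Accumulator()
result = list(transform_and_count([["a", "b"], ["c"]], acc))
print(result, acc.value)  # -> ['A', 'B', 'C'] 3
```

In real Spark you would bump the accumulator inside a `map`/`foreach` closure and read `acc.value` on the driver after an action has run.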

Re: textFile partitions

2015-02-09 Thread Kostas Sakellis
The partitions parameter to textFile is the "minPartitions". So there will be at least that level of parallelism. Spark delegates to Hadoop to create the splits for that file (yes, even for a text file on local disk and not HDFS). You can take a look at the code in FileInputFormat - but briefly it will c
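Briefly, Hadoop's FileInputFormat computes a goal size by dividing the total bytes by the requested split count, then clamps it between the configured minimum split size and the block size. A sketch of that arithmetic (the block size and file size below are illustrative; the real logic lives in `FileInputFormat.computeSplitSize`):

```python
def compute_split_size(total_size, num_splits, min_size=1,
                       block_size=128 * 1024 * 1024):
    """Mirror of FileInputFormat's split sizing:
    splitSize = max(minSize, min(goalSize, blockSize))."""
    goal_size = total_size // max(1, num_splits)
    return max(min_size, min(goal_size, block_size))

# A 1 GiB file with minPartitions=10: the goal size (~102 MiB) is under
# the 128 MiB block size, so you get at least 10 splits of that size.
print(compute_split_size(1024 ** 3, 10))  # -> 107374182
```

This is why textFile's argument is a minimum: a small goal size yields more splits, but the block size caps how large any single split can be.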

Re: Whether standalone spark support kerberos?

2015-02-05 Thread Kostas Sakellis
Standalone mode does not support talking to a kerberized HDFS. If you want to talk to a kerberized (secure) HDFS cluster, I suggest you use Spark on YARN. On Wed, Feb 4, 2015 at 2:29 AM, Jander g wrote: > Hope someone helps me. Thanks. > > On Wed, Feb 4, 2015 at 6:14 PM, Jander g wrote: > >> We

Re: How many stages in my application?

2015-02-05 Thread Kostas Sakellis
Yes, there is currently no way to know automatically how many stages a job will generate. Like Mark said, RDD#toDebugString will give you some info about the RDD DAG, and from that you can determine, based on the dependency types (wide vs. narrow), whether there is a stage boundary. On Thu, Feb 5, 2015 at
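As a toy illustration of the rule above: every wide (shuffle) dependency introduces a stage boundary, so for a simple linear lineage the stage count is one plus the number of wide dependencies. A sketch under that simplification (real DAGs can branch, so this is not a general stage counter):

```python
def count_stages(dependencies):
    """dependencies: list of 'narrow' or 'wide' along a linear RDD lineage.
    Each wide (shuffle) dependency starts a new stage."""
    return 1 + sum(1 for d in dependencies if d == "wide")

# e.g. textFile -> map (narrow) -> reduceByKey (wide)
#      -> filter (narrow) -> groupByKey (wide)
print(count_stages(["narrow", "wide", "narrow", "wide"]))  # -> 3
```

This mirrors what you can read off toDebugString: indentation changes in its output correspond to shuffle boundaries.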

Re: Reg Job Server

2015-02-05 Thread Kostas Sakellis
On Thu, Feb 5, 2015 at 9:03 PM, Deep Pradhan wrote: > I read somewhere about Gatling. Can that be used to profile Spark jobs? > > On Fri, Feb 6, 2015 at 10:27 AM, Kostas Sakellis > wrote: > >> Which Spark Job server are you talking about? >> >> On Thu, Feb 5, 20

Re: spark driver behind firewall

2015-02-05 Thread Kostas Sakellis
Yes, the driver has to be able to accept incoming connections. All the executors connect back to the driver, sending heartbeats, map statuses, and metrics. It is critical, and I don't know of a way around it. You could look into using something like the https://github.com/spark-jobserver/spark-jobserver th
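If the firewall can be configured to whitelist specific ports (rather than blocking all inbound traffic), one partial mitigation is to pin the driver's listening ports instead of letting Spark pick random ones. An illustrative spark-defaults.conf fragment (the host/port values are placeholders; `spark.driver.host`, `spark.driver.port`, and `spark.blockManager.port` are real settings, though which port properties exist varies by Spark version):

```properties
# Pin the driver's endpoints so the firewall can whitelist them.
# Values below are placeholders for your environment.
spark.driver.host          driver-host.example.com
spark.driver.port          7077
spark.blockManager.port    7078
```

This does not remove the requirement that executors reach the driver; it only makes the required ports predictable.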

Re: Reg Job Server

2015-02-05 Thread Kostas Sakellis
Which Spark Job server are you talking about? On Thu, Feb 5, 2015 at 8:28 PM, Deep Pradhan wrote: > Hi, > Can Spark Job Server be used for profiling Spark jobs? >

Re: Spark Job running on localhost on yarn cluster

2015-02-05 Thread Kostas Sakellis
Kundan, I think your configuration here is incorrect; we need to adjust the memory and the number of executors. For your case, the cluster setup is: 5 nodes, 16 GB RAM, 8 cores each. The number of executors should be the total number of nodes in your cluster - in your case, 5. As for --executor-cores, it should
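The sizing advice above can be sketched as arithmetic. Assuming one executor per node, one core and 1 GB of RAM reserved for the OS and node daemons, and roughly 7% of memory set aside for YARN overhead (the reserved amounts and the exact overhead formula are assumptions that vary by Spark version and cluster), a hypothetical helper:

```python
def yarn_executor_sizing(nodes, ram_gb_per_node, cores_per_node,
                         reserved_cores=1, reserved_ram_gb=1,
                         overhead_fraction=0.07):
    """Rough YARN sizing: one executor per node, leaving headroom for
    the OS/NodeManager and YARN memory overhead. Illustrative only."""
    num_executors = nodes
    executor_cores = cores_per_node - reserved_cores
    usable_ram = ram_gb_per_node - reserved_ram_gb
    executor_memory_gb = int(usable_ram * (1 - overhead_fraction))
    return num_executors, executor_cores, executor_memory_gb

# The cluster from the thread: 5 nodes, 16 GB RAM, 8 cores each.
print(yarn_executor_sizing(5, 16, 8))  # -> (5, 7, 13)
```

Those three numbers map onto the spark-submit flags --num-executors, --executor-cores, and --executor-memory.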

Re: Yarn Driver OOME (Java heap space) when executors request map output locations

2014-09-09 Thread Kostas Sakellis
Hey, If you are interested in more details there is also a thread about this issue here: http://apache-spark-developers-list.1001551.n3.nabble.com/Eliminate-copy-while-sending-data-any-Akka-experts-here-td7127.html Kostas On Tue, Sep 9, 2014 at 3:01 PM, jbeynon wrote: > Thanks Marcelo, that lo
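For readers hitting this on the Spark versions of that era: map output statuses were served to executors over Akka, so the usual mitigations were to give the driver more heap, keep the serialized map status message under Akka's frame size, and reduce the number of partitions where possible. An illustrative spark-defaults.conf fragment (values are placeholders; `spark.akka.frameSize` was a real setting at the time but was removed along with Akka in Spark 2.x):

```properties
# Give the driver more room for map output statuses.
spark.driver.memory     4g
# Max Akka message size in MB; the serialized map statuses must fit in one frame.
spark.akka.frameSize    128
```

The linked dev-list thread discusses the underlying fix (eliminating a copy when sending this data) rather than these tuning knobs.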