Hey Darin,
Record count metrics are coming in Spark 1.3. Can you wait until it is
released, or do you need a solution for older versions of Spark?
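In the meantime, one common workaround in older releases is to count the
records yourself with an accumulator. A minimal sketch, assuming a plain text
input (the path and names below are placeholders, not from your job):

import org.apache.spark.{SparkConf, SparkContext}

object RecordCountExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("record-count"))

    // Accumulator that the executors bump once per record processed.
    val recordCount = sc.accumulator(0L, "records read")

    val lines = sc.textFile("hdfs:///tmp/input.txt") // placeholder path
    val lengths = lines.map { line =>
      recordCount += 1L
      line.length
    }

    // Accumulators are only updated once an action actually runs the stage.
    lengths.count()
    println(s"records read: ${recordCount.value}")

    sc.stop()
  }
}

Keep in mind that retried tasks can bump the accumulator more than once, so
treat the number as approximate.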
Kostas
On Friday, February 27, 2015, Darin McBeath
wrote:
> I have a fairly large Spark job where I'm essentially creating quite a few
> RDDs, do se
The partitions parameter to textFile is the "minPartitions", so there will
be at least that level of parallelism. Spark delegates to Hadoop to create
the splits for that file (yes, even for a text file on local disk and not HDFS).
You can take a look at the code in FileInputFormat - but briefly it will
c
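As a quick illustration of the minPartitions behaviour (a spark-shell sketch;
the file path is a placeholder):

// spark-shell: sc is already defined
val rdd = sc.textFile("/data/big-file.txt", minPartitions = 8) // placeholder path

// Hadoop's FileInputFormat computes the actual splits, so you can end up
// with more partitions than the minimum you asked for.
println(s"actual partitions: ${rdd.partitions.length}")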
Standalone mode does not support talking to a kerberized HDFS. If you want
to talk to a kerberized (secure) HDFS cluster, I suggest you use Spark on
YARN.
On Wed, Feb 4, 2015 at 2:29 AM, Jander g wrote:
> Hope someone helps me. Thanks.
>
> On Wed, Feb 4, 2015 at 6:14 PM, Jander g wrote:
>
>> We
Yes, right now there is no way to automatically know how many stages a job
will generate. Like Mark said, RDD#toDebugString will give you some info
about the RDD DAG, and from that you can determine, based on the dependency
types (wide vs. narrow), whether there is a stage boundary.
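For example (a spark-shell sketch with made-up data), a map keeps a narrow
dependency while reduceByKey adds a shuffle dependency, and the shuffle shows
up as a ShuffledRDD at a new indentation level in the debug string:

// spark-shell: sc is already defined
val words  = sc.parallelize(Seq("a", "b", "a", "c"))
val pairs  = words.map(w => (w, 1))   // narrow dependency: same stage
val counts = pairs.reduceByKey(_ + _) // wide (shuffle) dependency: new stage

// The ShuffledRDD / indentation change in the output marks the stage boundary.
println(counts.toDebugString)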
On Thu, Feb 5, 2015 at
On Thu, Feb 5, 2015 at 9:03 PM, Deep Pradhan
wrote:
> I read somewhere about Gatling. Can that be used to profile Spark jobs?
>
> On Fri, Feb 6, 2015 at 10:27 AM, Kostas Sakellis
> wrote:
>
>> Which Spark Job server are you talking about?
>>
>> On Thu, Feb 5, 20
Yes, the driver has to be able to accept incoming connections. All the
executors connect back to the driver, sending heartbeats, map statuses, and
metrics. It is critical, and I don't know of a way around it. You could look
into using something like the
https://github.com/spark-jobserver/spark-jobserver th
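If part of the issue is knowing which address and port the executors will try
to reach, you can at least pin those down explicitly. A small sketch with a
hypothetical host name and port (the driver still has to be reachable on them
from every worker node):

import org.apache.spark.{SparkConf, SparkContext}

// Executors open connections back to this host/port, so it must be an
// address that is routable from the worker nodes.
val conf = new SparkConf()
  .setAppName("driver-connectivity-demo")
  .set("spark.driver.host", "driver.example.com") // hypothetical host
  .set("spark.driver.port", "51000")              // hypothetical fixed port

val sc = new SparkContext(conf)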
Which Spark Job server are you talking about?
On Thu, Feb 5, 2015 at 8:28 PM, Deep Pradhan
wrote:
> Hi,
> Can Spark Job Server be used for profiling Spark jobs?
>
Kundan,
So I think your configuration here is incorrect. We need to adjust the memory
and the number of executors. For your case you have:
Cluster setup:
5 nodes
16 GB RAM
8 cores
The number of executors should be the total number of nodes in your cluster
- in your case 5. As for --executor-cores, it should
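For reference, the same knobs can also be set programmatically. A sketch with
placeholder values (only the executor count of 5 comes from the advice above;
the cores and memory numbers are purely illustrative):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("executor-sizing-demo")
  .set("spark.executor.instances", "5") // one executor per node, as above
  .set("spark.executor.cores", "4")     // placeholder, tune for your workload
  .set("spark.executor.memory", "12g")  // placeholder, leave headroom for the OS and YARN overhead

val sc = new SparkContext(conf)

On spark-submit these map to --num-executors, --executor-cores, and
--executor-memory.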
Hey,
If you are interested in more details, there is also a thread about this
issue here:
http://apache-spark-developers-list.1001551.n3.nabble.com/Eliminate-copy-while-sending-data-any-Akka-experts-here-td7127.html
Kostas
On Tue, Sep 9, 2014 at 3:01 PM, jbeynon wrote:
> Thanks Marcelo, that lo