-- Forwarded message --
From: vetal king
Date: Mon, Apr 4, 2016 at 8:59 PM
Subject: Re: All inclusive uber-jar
To: Mich Talebzadeh
Not sure how to create an uber jar using sbt, but this is how you can do it
using Maven:
org.apache.maven.plugins
>> SPARK_WORKER_CORES: Total number of cores to allow Spark applications to
>> use on the machine (default: all available); only on worker
>>
>> bq. sc.getConf().set()
>>
>> I think you should use this pattern (shown in
>> https://spark.apache.org/docs/latest/spark-standalone.html):
>>
>> val conf
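The pattern referenced above hinges on one detail: spark.cores.max has to be
set on the SparkConf before the SparkContext is created, because sc.getConf
returns a copy and the configuration cannot be changed at runtime. A minimal
sketch for a standalone cluster (the app name and master URL are placeholders;
the master is normally supplied by spark-submit):

    import org.apache.spark.{SparkConf, SparkContext}

    // Configure the core limit *before* constructing the SparkContext;
    // calling sc.getConf().set(...) afterwards has no effect on the running app.
    val conf = new SparkConf()
      .setAppName("CoreLimitedApp")          // placeholder app name
      .setMaster("spark://master:7077")      // placeholder standalone master URL
      .set("spark.cores.max", "1")           // cap the total cores the app may take
    val sc = new SparkContext(conf)

On a standalone cluster an application grabs every available core unless
spark.cores.max (or spark.deploy.defaultCores on the cluster) caps it, which
matches the behaviour described in the question below.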
Hi all,
While submitting a Spark job I am specifying the options --executor-cores 1
and --driver-cores 1. However, when the job was submitted, it used all the
available cores. So I tried to limit the cores within my main function with
sc.getConf().set("spark.cores.max", "1"); however, it still used all the
available cores.
Prateek
It's possible to submit a Spark application from another application. If you
are using Java then use ProcessBuilder and execute spark-submit.
There are two other options which I have not used: there is a Spark
submit server, and Spark also provides a REST API to submit jobs, but I don't
have much experience with them.
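A minimal sketch of the ProcessBuilder approach, assuming spark-submit is on
the PATH; the main class, master URL and jar path are placeholders:

    import scala.collection.JavaConverters._

    // Launch spark-submit as an external process and wait for it to finish.
    val cmd = Seq(
      "spark-submit",
      "--class", "com.example.MyJob",        // placeholder main class
      "--master", "spark://master:7077",     // placeholder master URL
      "/path/to/my-job.jar"                  // placeholder application jar
    )
    val process = new ProcessBuilder(cmd.asJava)
      .inheritIO()    // forward spark-submit's stdout/stderr to this process
      .start()
    val exitCode = process.waitFor()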
> The path will be the same on each
> worker for that rdd, and each worker is handling a different partition,
> which will be reflected in the filename, so no data will be overwritten. In
> fact this is what saveAsNewAPIHadoopFiles on a DStream is doing, as far as I
> recall.
>
> On Fri, 25 Mar 2016, 11:22 vetal king wrote:
> You can use TextOutputFormat in
> saveAsNewAPIHadoopFile().
>
> Regards,
> Surendra M
>
> -- Surendra Manchikanti
>
> On Tue, Mar 22, 2016 at 10:26 AM, vetal king wrote:
>
>> We are using Spark 1.4 for Spark Streaming. Kafka is the data source for
>> the Spark stream.
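A minimal sketch of that suggestion at the DStream level, assuming the Kafka
input has already been mapped to a DStream of (String, String) pairs; the
output prefix, suffix and the Text key/value types are placeholder choices:

    import org.apache.hadoop.io.Text
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
    import org.apache.spark.streaming.dstream.DStream

    // `records` stands in for the DStream[(String, String)] built from the Kafka input.
    def saveBatches(records: DStream[(String, String)]): Unit = {
      val pairs = records.map { case (k, v) => (new Text(k), new Text(v)) }
      // One output directory per batch, named <prefix>-<batchTimeMs>.<suffix>;
      // each partition writes its own part file inside it, so nothing is overwritten.
      pairs.saveAsNewAPIHadoopFiles(
        "hdfs:///data/records",   // placeholder path prefix
        "txt",                    // placeholder suffix
        classOf[Text],
        classOf[Text],
        classOf[TextOutputFormat[Text, Text]])
    }

Each batch then gets its own prefix-<batchTimeMs>.suffix directory, with one
part file per partition inside it, which is the no-overwrite behaviour
described above.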
Sebastian Piu wrote:
> As you said, create a folder for each different minute; you can also use
> rdd.time as the timestamp.
>
> Also, you might want to have a look at the window function for the batching.
>
>
> On Tue, 22 Mar 2016, 17:43 vetal king wrote:
>
>> Hi Cod
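A minimal sketch of the rdd.time idea, assuming the Kafka input is a
DStream[String]; the HDFS layout and date format are placeholders, and
saveAsTextFile is used only to keep the example short (saveAsNewAPIHadoopFile
with an explicit output format works the same way):

    import java.text.SimpleDateFormat
    import java.util.Date
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.Time
    import org.apache.spark.streaming.dstream.DStream

    // `lines` stands in for the DStream[String] read from Kafka.
    def saveByMinute(lines: DStream[String]): Unit = {
      val minuteFormat = new SimpleDateFormat("yyyyMMdd-HHmm")
      lines.foreachRDD { (rdd: RDD[String], time: Time) =>
        // Truncate the batch time to the minute for the folder name, and give
        // each five-second batch its own subdirectory: Hadoop output formats
        // refuse to write into a directory that already exists.
        val minute = minuteFormat.format(new Date(time.milliseconds))
        rdd.saveAsTextFile(s"hdfs:///data/records/$minute/${time.milliseconds}")
      }
    }

Alternatively, window(Minutes(1), Minutes(1)) would group the five-second
batches into one-minute batches so that each minute is written exactly once.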
files issue.
>
> On Tue, Mar 22, 2016 at 12:26 PM, vetal king wrote:
> > We are using Spark 1.4 for Spark Streaming. Kafka is the data source for
> > the Spark stream.
> >
> > Records are published on Kafka every second. Our requirement is to store
> > records published on Kafka in a single folder per minute.
We are using Spark 1.4 for Spark Streaming. Kafka is the data source for the
Spark stream.
Records are published on Kafka every second. Our requirement is to store
records published on Kafka in a single folder per minute. The stream will
read records every five seconds. For instance, records published