Fwd: All inclusive uber-jar

2016-04-04 Thread vetal king
-- Forwarded message -- From: vetal king Date: Mon, Apr 4, 2016 at 8:59 PM Subject: Re: All inclusive uber-jar To: Mich Talebzadeh Not sure how to create an uber jar using sbt, but this is how you can do it using Maven: org.apache.maven.plugins
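The message is cut off at the plugin's groupId. One common way to build an uber jar with Maven is the maven-shade-plugin; a minimal pom.xml sketch is below, where the plugin version and mainClass are placeholders and the original thread may just as well have used maven-assembly-plugin instead:

    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <version>2.4.3</version>
          <executions>
            <execution>
              <phase>package</phase>
              <goals>
                <goal>shade</goal>
              </goals>
              <configuration>
                <transformers>
                  <!-- mainClass is a placeholder for the application's entry point -->
                  <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                    <mainClass>com.example.MyApp</mainClass>
                  </transformer>
                </transformers>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>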

Re: Unable to set cores while submitting Spark job

2016-03-31 Thread vetal king
to allow Spark applications to use on the machine (default: all available); only on worker. Regarding "sc.getConf().set()": I think you should use this pattern (shown in https://spark.apache.org/docs/latest/spark-standalone.html): val conf
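The pattern being referenced sets spark.cores.max on a SparkConf before the SparkContext is created, rather than on sc.getConf() after the fact. A minimal Scala sketch, where the application name and master URL are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Build the configuration before creating the context; setting properties on
    // sc.getConf() after the SparkContext exists is too late for the scheduler.
    val conf = new SparkConf()
      .setAppName("core-limit-example")      // placeholder application name
      .setMaster("spark://master:7077")      // placeholder standalone master URL
      .set("spark.cores.max", "1")           // cap on total cores for this application
    val sc = new SparkContext(conf)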

Unable to set cores while submitting Spark job

2016-03-30 Thread vetal king
Hi all, While submitting a Spark job I am specifying the options --executor-cores 1 and --driver-cores 1. However, when the job was submitted, it used all available cores. So I tried to limit the cores within my main function with sc.getConf().set("spark.cores.max", "1"); however it still use

Re: is there any way to submit spark application from outside of spark cluster

2016-03-25 Thread vetal king
Prateek, it's possible to submit a Spark application from an outside application. If you are using Java, then use ProcessBuilder and execute spark-submit. There are two other options which I have not used: there is a spark-submit server, and Spark also provides a REST API to submit jobs, but I don't have m
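For the ProcessBuilder route mentioned here, a rough Scala sketch of launching spark-submit as an external process; the spark-submit path, main class, master URL and jar path are all placeholders:

    // java.lang.ProcessBuilder is available without an import in Scala.
    val pb = new ProcessBuilder(
      "/opt/spark/bin/spark-submit",
      "--class", "com.example.MyApp",
      "--master", "spark://master:7077",
      "/path/to/my-app.jar")
    pb.inheritIO()                            // forward spark-submit's stdout/stderr
    val exitCode = pb.start().waitFor()       // block until spark-submit returns
    println(s"spark-submit exited with code $exitCode")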

Re: Problem using saveAsNewAPIHadoopFile API

2016-03-25 Thread vetal king
ame on each worker for that rdd, and each worker is handling a different partition, which will be reflected in the filename, so no data will be overwritten. In fact this is what saveAsNewHadoopFile on a DStream is doing, as far as I recall. On Fri, 25 Mar 2016, 11:22 vet

Re: Problem using saveAsNewAPIHadoopFile API

2016-03-25 Thread vetal king
TextOutputFormat in saveAsNewAPIHadoopFile(). Regards, Surendra M -- Surendra Manchikanti. On Tue, Mar 22, 2016 at 10:26 AM, vetal king wrote: We are using Spark 1.4 for Spark Streaming. Kafka is the data source for the Spark stream.
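A minimal Scala sketch of calling saveAsNewAPIHadoopFile with the new-API TextOutputFormat on a pair RDD; the output path and the key/value types are assumptions, not taken from the thread:

    import org.apache.hadoop.io.Text
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("save-example"))

    // saveAsNewAPIHadoopFile is only available on pair RDDs, so build (key, value) pairs.
    val pairs = sc.parallelize(Seq(("k1", "v1"), ("k2", "v2")))
      .map { case (k, v) => (new Text(k), new Text(v)) }

    // Each partition is written as its own part-XXXXX file under the output directory,
    // so concurrent workers do not overwrite each other's data.
    pairs.saveAsNewAPIHadoopFile(
      "/tmp/output/example",                  // placeholder output directory
      classOf[Text],
      classOf[Text],
      classOf[TextOutputFormat[Text, Text]])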

Re: Problem using saveAsNewAPIHadoopFile API

2016-03-25 Thread vetal king
, Sebastian Piu wrote: As you said, create a folder for each different minute; you can also use rdd.time as a timestamp. Also you might want to have a look at the window function for the batching. On Tue, 22 Mar 2016, 17:43 vetal king wrote: Hi Cod
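One way to apply the rdd.time suggestion is the foreachRDD overload that also passes the batch Time, using it to derive a per-minute output folder. A rough, self-contained Scala sketch; the socket source, master, base path and date format are placeholders, and the original thread reads from Kafka instead:

    import java.text.SimpleDateFormat
    import java.util.Date
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

    val conf = new SparkConf().setAppName("per-minute-folders").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder source; the thread's stream comes from Kafka.
    val stream = ssc.socketTextStream("localhost", 9999)

    val minuteFormat = new SimpleDateFormat("yyyyMMddHHmm")

    stream.foreachRDD { (rdd, time: Time) =>
      // Truncate the batch time to the minute so several 5-second batches land under
      // the same per-minute folder; each batch still writes its own subdirectory,
      // so nothing is overwritten.
      val minute = minuteFormat.format(new Date(time.milliseconds))
      rdd.saveAsTextFile(s"/data/output/$minute/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()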

Re: Problem using saveAsNewAPIHadoopFile API

2016-03-22 Thread vetal king
files issue. On Tue, Mar 22, 2016 at 12:26 PM, vetal king wrote: We are using Spark 1.4 for Spark Streaming. Kafka is the data source for the Spark stream. Records are published on Kafka every second. Our requirement is to store recor

Problem using saveAsNewAPIHadoopFile API

2016-03-22 Thread vetal king
We are using Spark 1.4 for Spark Streaming. Kafka is the data source for the Spark stream. Records are published on Kafka every second. Our requirement is to store records published on Kafka in a single folder per minute. The stream will read records every five seconds. For instance, records published
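A sketch of the ingestion side described here, assuming the direct Kafka stream API available in the Spark 1.x streaming-kafka module; the master, broker list and topic name are placeholders, and the per-minute output folders are sketched under Sebastian Piu's reply above:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // 5-second batch interval, matching "the stream will read records every five seconds".
    val conf = new SparkConf().setAppName("kafka-per-minute").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Broker list and topic name are placeholders.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("records-topic"))

    // Each 5-second batch of (key, value) records can then be written out grouped
    // by minute, as in the rdd.time sketch further up.
    stream.map { case (_, value) => value }.print()

    ssc.start()
    ssc.awaitTermination()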