submitting dependencies

2018-06-26 Thread amin mohebbi
Could you please help me understand how I should submit my Spark application? I have used this connector (pygmalios/reactiveinflux-spark) to connect Spark with InfluxDB.
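A minimal sketch of a submit command for this case. The class name, master, and jar are placeholders, and the connector's exact Maven coordinates and version should be checked on Maven Central (building a fat jar with sbt-assembly is the usual alternative):

    spark-submit \
      --class com.example.MyInfluxApp \
      --master yarn \
      --packages com.pygmalios:reactiveinflux-spark_2.11:<version> \
      my-app.jar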

Re: RepartitionByKey Behavior

2018-06-26 Thread Chawla,Sumit
Thanks everyone. As Nathan suggested, I ended up collecting the distinct keys first and then assigning IDs to each key explicitly. Regards, Sumit Chawla
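For reference, a minimal sketch of that approach (the sample data and key type are made up for illustration): collect the distinct keys on the driver, index them, broadcast the mapping, and partition by the explicit ID.

    import org.apache.spark.HashPartitioner
    import org.apache.spark.sql.SparkSession

    object DistinctKeyIds {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("distinct-key-ids").master("local[*]").getOrCreate()
        val sc = spark.sparkContext
        val data = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("c", 4)))

        // Collect the distinct keys once and assign each a stable integer ID.
        val keyIds = data.keys.distinct().collect().zipWithIndex.toMap
        val bKeyIds = sc.broadcast(keyIds)

        // Re-key by the explicit ID; with numPartitions == number of keys,
        // hashing an Int ID sends each key to its own partition.
        val byId = data
          .map { case (k, v) => (bKeyIds.value(k), v) }
          .partitionBy(new HashPartitioner(keyIds.size))

        byId.glom().collect().foreach(p => println(p.mkString(", ")))
        spark.stop()
      }
    }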

Re: Emit Custom metrics in Spark Structured Streaming job

2018-06-26 Thread subramgr
I am planning to send these metrics to our KairosDB. Let me know if there are any examples that I can take a look at.

Emit Custom metrics in Spark Structured Streaming job

2018-06-26 Thread subramgr
In our Spark Structured Streaming job we listen to Kafka and we filter out some messages which we feel are malformed. Currently we log that information using the LOGGER. Is there a way to emit some kind of metric each time such a malformed message is seen in Structured Streaming? Thanks
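One hedged sketch of how this could be done: count malformed records in a LongAccumulator inside the existing filter, and read the counter from a StreamingQueryListener on every progress event. The socket source, the isWellFormed check, and the println standing in for a push to KairosDB are all placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._

    object MalformedMetrics {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("malformed-metrics").master("local[*]").getOrCreate()
        import spark.implicits._

        val malformed = spark.sparkContext.longAccumulator("malformedMessages")
        def isWellFormed(s: String): Boolean = s.contains(":") // placeholder check

        val lines = spark.readStream
          .format("socket").option("host", "localhost").option("port", 9999)
          .load().as[String]

        // Count malformed records as a side effect of the existing filter.
        val good = lines.filter { s =>
          val ok = isWellFormed(s)
          if (!ok) malformed.add(1)
          ok
        }

        spark.streams.addListener(new StreamingQueryListener {
          override def onQueryStarted(e: QueryStartedEvent): Unit = ()
          override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()
          override def onQueryProgress(e: QueryProgressEvent): Unit =
            // Replace with a write to KairosDB or any other metrics backend.
            println(s"malformed so far: ${malformed.value}")
        })

        good.writeStream.format("console").start().awaitTermination()
      }
    }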

Re: [Spark Streaming] Spark Streaming with S3 vs Kinesis

2018-06-26 Thread Steve Loughran
On 25 Jun 2018, at 23:59, Farshid Zavareh <fhzava...@gmail.com> wrote: I'm writing a Spark Streaming application where the input data is put into an S3 bucket in small batches (using Database Migration Service - DMS). The Spark application is the only consumer. I'm considering two poss

Re: the best tool to interact with Spark

2018-06-26 Thread ayan guha
Depends on what you are trying to do. I found Zeppelin an excellent option to interactively run queries and code.

Re: Increase no of tasks

2018-06-26 Thread Lalwani, Jayesh
You can use the repartition method of Dataset/DataFrame to change the number of partitions: https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.sql.Dataset@repartition(numPartitions:Int):org.apache.spark.sql.Dataset[T]
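A quick sketch of that call (local example data for illustration):

    import org.apache.spark.sql.SparkSession

    object RepartitionDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("repartition-demo").master("local[*]").getOrCreate()
        import spark.implicits._

        val df = (1 to 1000).toDF("n")
        println(df.rdd.getNumPartitions)   // whatever the default was

        val more = df.repartition(50)      // full shuffle into 50 partitions
        println(more.rdd.getNumPartitions) // 50
        spark.stop()
      }
    }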

the best tool to interact with Spark

2018-06-26 Thread Donni Khan
Hi all, What is the best tool to interact easily with Spark? Thank you, Donni

Re: [Spark Streaming] Measure latency

2018-06-26 Thread Gerard Maas
Hi Daniele, A pragmatic approach to do that would be to execute the computations in the scope of a foreachRDD, surrounded by wall-clock timers. For example: dstream.foreachRDD { rdd => val t0 = System.currentTimeMillis(); val aggregates = rdd. ... // make sure you get a result here, not another lazy RDD (see the completed sketch below)
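A fuller sketch of the pattern. The socket source and 5-second batch interval are placeholders; rdd.stats() is used because it is an action that materializes count, mean, min and max in one pass, so the elapsed wall-clock time covers the real computation:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object LatencyDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("latency-demo").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(5))

        val numbers = ssc.socketTextStream("localhost", 9999).map(_.length.toDouble)

        numbers.foreachRDD { rdd =>
          val t0 = System.currentTimeMillis()
          // stats() forces the computation here, rather than returning
          // another lazy RDD, so the timer measures real work.
          val stats = rdd.stats()
          val t1 = System.currentTimeMillis()
          println(s"count=${stats.count} avg=${stats.mean} min=${stats.min} " +
            s"max=${stats.max} in ${t1 - t0} ms")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }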

[Spark Streaming] Measure latency

2018-06-26 Thread Daniele Foroni
Hi all, I am using Spark Streaming and I need to evaluate the latency of the standard aggregations (avg, min, max, …) provided by the Spark APIs. Is there any way to do it in the code? Thanks in advance, --- Daniele