Could you please help me understand how I should submit my Spark application?
I have used this connector (pygmalios/reactiveinflux-spark) to connect Spark with InfluxDB.
Thanks everyone. As Nathan suggested, I ended up collecting the distinct keys first and then assigning IDs to each key explicitly.
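A minimal sketch of that approach (the record type and its key field are assumptions, not the actual code from this thread): collect the distinct keys on the driver, assign an explicit ID to each, broadcast the mapping, and attach the IDs:

    // records is assumed to be an RDD of a case class with a `key` field
    val distinctKeys = records.map(_.key).distinct().collect()           // pull distinct keys to the driver
    val idByKey      = distinctKeys.zipWithIndex.toMap                   // explicit ID per key
    val idsBroadcast = records.sparkContext.broadcast(idByKey)           // ship the mapping to executors
    val withIds      = records.map(r => (idsBroadcast.value(r.key), r))  // (id, record) pairs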
Regards
Sumit Chawla
On Fri, Jun 22, 2018 at 7:29 AM, Nathan Kronenfeld <
nkronenfeld@uncharted.software> wrote:
I am planning to send these metrics to our KairosDB. Let me know if there are any examples that I can take a look at.
In our Spark Structured Streaming job we listen to Kafka and filter out some messages that we consider malformed.
Currently we log that information using the LOGGER.
Is there a way to emit a metric each time such a malformed message is seen in Structured Streaming?
Thanks
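One way to do that (a sketch under assumed topic and field names, not a drop-in solution) is to count the filtered-out records with a Spark accumulator; the value shows up in the Spark UI and can be read on the driver, e.g. from a StreamingQueryListener, and pushed to an external store such as KairosDB:

    // requires the spark-sql-kafka-0-10 package on the classpath
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("malformed-metrics").getOrCreate()
    import spark.implicits._

    // driver-visible counter for malformed messages
    val malformedCounter = spark.sparkContext.longAccumulator("malformedMessages")

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")   // assumed broker address
      .option("subscribe", "events")                       // assumed topic name
      .load()
      .selectExpr("CAST(value AS STRING) AS value")
      .as[String]

    // isWellFormed is a hypothetical check standing in for your own validation logic
    def isWellFormed(s: String): Boolean = s != null && s.nonEmpty

    val valid = raw.filter { v =>
      val ok = isWellFormed(v)
      if (!ok) malformedCounter.add(1)   // incremented once per malformed record
      ok
    }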
On 25 Jun 2018, at 23:59, Farshid Zavareh <fhzava...@gmail.com> wrote:
I'm writing a Spark Streaming application where the input data is put into an
S3 bucket in small batches (using Database Migration Service - DMS). The Spark
application is the only consumer. I'm considering two poss
Depends on what you are trying to do. I found Zeppelin an excellent option to interactively run queries and code.
On Tue, Jun 26, 2018 at 10:21 PM, Donni Khan <
prince.don...@googlemail.com.invalid> wrote:
> Hi all,
>
> What is the best tool to interact easily with Spark?
>
> Thank you,
> Donni
>
You can use the repartition method of a DataFrame to change the number of partitions:
https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.sql.Dataset@repartition(numPartitions:Int):org.apache.spark.sql.Dataset[T]
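For example (df is assumed to be an existing DataFrame; the column name is made up):

    import org.apache.spark.sql.functions.col

    val evenlySpread = df.repartition(200)                  // round-robin into 200 partitions
    val byKey        = df.repartition(200, col("user_id"))  // hash-partitioned by a column
    println(evenlySpread.rdd.getNumPartitions)              // confirm the new partition count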
On 6/22/18, 3:04 PM, "pratik4891" wrote:
It's default, I have
Hi Daniele,
A pragmatic approach to do that would be to execute the computations in the
scope of a foreachRDD, surrounded by wall-clock timers.
For example:
dstream.foreachRDD { rdd =>
  val t0 = System.currentTimeMillis()
  // make sure you get a result here (an action), not another lazy RDD,
  // otherwise nothing is actually measured
  val aggregates = rdd.count()   // example action; substitute your avg/min/max aggregation
  val elapsedMs = System.currentTimeMillis() - t0
  println(s"aggregation latency: $elapsedMs ms")
}
Hi all,
I am using Spark Streaming and I need to evaluate the latency of the standard aggregations (avg, min, max, …) provided by the Spark APIs.
Is there any way to do this in the code?
Thanks in advance,
---
Daniele