Managing Flink on Yarn programmatically

2016-05-16 Thread John Sherwood
I think this is primarily a shortcoming in my ability to grep through Scala efficiently, but are there any resources on how to programmatically spin up & administer Flink jobs on YARN? The CLI naturally works, but I'd like to build out a service handling the nuances of job management rather than…
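
One option for the "administrate" half, independent of Flink's internal Scala classes, is Hadoop's own YarnClient API. A minimal sketch, assuming the Flink sessions can be recognized by application name (the "Flink" filter below is an assumption about how the session was started):

    import java.util.EnumSet;
    import java.util.List;

    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.api.records.YarnApplicationState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class FlinkYarnAdmin {
        public static void main(String[] args) throws Exception {
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(new YarnConfiguration());
            yarnClient.start();

            // List running applications and pick out Flink sessions by name.
            List<ApplicationReport> apps = yarnClient.getApplications(
                    EnumSet.of(YarnApplicationState.RUNNING));
            for (ApplicationReport app : apps) {
                if (app.getName().contains("Flink")) {
                    System.out.println(app.getApplicationId() + " -> "
                            + app.getTrackingUrl());
                    // Administrative actions are then possible, e.g.:
                    // yarnClient.killApplication(app.getApplicationId());
                }
            }
            yarnClient.stop();
        }
    }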

Re: Flink Kafka Streaming - Source [Bytes/records Received] and Sink [Bytes/records sent] show zero messages

2016-05-16 Thread prateekarora
Thanks for the information. I want to collect performance metrics like throughput, latency, etc.; that's why I am looking for this data. How can I collect performance data in Flink? Regards, Prateek On Mon, May 16, 2016 at 2:32 PM, Fabian Hueske-2 [via Apache…
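
One approach available in this Flink version is to count records with an accumulator and derive throughput from the job runtime. A minimal sketch; the class name and the String pass-through types are illustrative, and depending on the version the accumulator results are visible in the web frontend and/or returned with the job result:

    import org.apache.flink.api.common.accumulators.LongCounter;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    // Pass-through mapper that counts every record it sees.
    public class CountingMapper extends RichMapFunction<String, String> {
        private final LongCounter recordCount = new LongCounter();

        @Override
        public void open(Configuration parameters) {
            // Register the accumulator so its value is reported per job.
            getRuntimeContext().addAccumulator("recordCount", recordCount);
        }

        @Override
        public String map(String value) {
            recordCount.add(1L);
            return value;
        }
    }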

Re: Flink Kafka Streaming - Source [Bytes/records Received] and Sink [Bytes/records sent] show zero messages

2016-05-16 Thread Fabian Hueske
Hi Prateek, the missing numbers are an artifact of how the stats are collected. At the moment, Flink only collects these metrics for data sent over connections *between* Flink operators. Since sources and sinks connect to external systems (and not to other Flink operators), the dashboard does not show…
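
A workaround that follows from this explanation (a sketch, not an official fix): chain a no-op identity map after the source so the traffic crosses a real operator-to-operator connection. Since chained operators may not produce measurable connections, disableChaining() is used here to keep the map in a separate task:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class DashboardCountsExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<String> source = env.fromElements("a", "b", "c");

            // Identity map inserted right after the source. disableChaining()
            // keeps it in a separate task, so the source -> map connection is
            // an inter-operator connection the dashboard can measure.
            source.map(new MapFunction<String, String>() {
                        @Override
                        public String map(String value) {
                            return value;
                        }
                    })
                    .disableChaining()
                    .print();

            env.execute("dashboard counts example");
        }
    }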

rocksdb backend on s3 window operator checkpoint issue

2016-05-16 Thread Chen Qin
Hi there, I have been testing checkpointing on RocksDB backed by S3. Checkpoints seem successful except for the snapshot state of a timeWindow operator on a keyed stream. Here is the env setting I used: env.setStateBackend(new RocksDBStateBackend(new URI("s3://backupdir/"))). The checkpoint always fails…
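
For reference, a minimal sketch of the setup being described, mirroring the env setting from the mail; the checkpoint interval, window size, and key are placeholders:

    import java.net.URI;

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class RocksDbS3Checkpointing {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Snapshot every 30s; RocksDB state goes to S3 on checkpoints.
            env.enableCheckpointing(30_000L);
            env.setStateBackend(
                    new RocksDBStateBackend(new URI("s3://backupdir/")));

            DataStream<Tuple2<String, Long>> counts = env
                    .fromElements(Tuple2.of("key", 1L))
                    .keyBy(0)
                    // The window state on the keyed stream is what the
                    // window operator snapshots at each checkpoint.
                    .timeWindow(Time.seconds(10))
                    .sum(1);

            counts.print();
            env.execute("rocksdb on s3");
        }
    }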

Re: Flink recovery

2016-05-16 Thread Madhire, Naveen
Thanks Fabian. Actually, I don't see a .valid-length suffix file in the output HDFS folder. Can you please tell me how I would debug this issue, or do you suggest anything else to solve this duplicates problem? Thank you. From: Fabian Hueske <fhue...@gmail.com> Reply-To: "user@flink.apach…
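
For context, the .valid-length file comes from the RollingSink's exactly-once mechanism: on recovery the sink either truncates in-progress files or writes a .valid-length companion file marking how many bytes are valid. A minimal sketch of the sink setup under discussion, assuming checkpointing is enabled (paths and sizes are placeholders):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.fs.RollingSink;
    import org.apache.flink.streaming.connectors.fs.StringWriter;

    public class RollingSinkExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Exactly-once output requires checkpointing; without it the
            // sink cannot mark which bytes are valid after a failure.
            env.enableCheckpointing(10_000L);

            DataStream<String> stream = env.fromElements("a", "b", "c");

            RollingSink<String> sink = new RollingSink<String>("hdfs:///output")
                    .setWriter(new StringWriter<String>())
                    .setBatchSize(1024 * 1024 * 128); // roll files at 128 MB

            stream.addSink(sink);
            env.execute("rolling sink example");
        }
    }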

Flink Kafka Streaming - Source [Bytes/records Received] and Sink [Bytes/records sent] show zero messages

2016-05-16 Thread prateekarora
Hi, in my Flink Kafka streaming application I am fetching data from one topic, processing it, and sending it to an output topic. My application is working fine, but the Flink dashboard shows Source [Bytes/records Received] and Sink [Bytes/records sent] as zero…
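
A minimal sketch of the kind of topology described, using the 0.9 Kafka connector of that era; topic names, broker address, group id, and the uppercasing step are placeholders:

    import java.util.Properties;

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    public class KafkaPipeline {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "pipeline");

            // Fetch from the input topic, process, send to the output topic.
            DataStream<String> input = env.addSource(
                    new FlinkKafkaConsumer09<String>(
                            "input-topic", new SimpleStringSchema(), props));

            DataStream<String> processed = input.map(
                    new MapFunction<String, String>() {
                        @Override
                        public String map(String value) {
                            return value.toUpperCase();
                        }
                    });

            processed.addSink(new FlinkKafkaProducer09<String>(
                    "localhost:9092", "output-topic", new SimpleStringSchema()));

            env.execute("kafka pipeline");
        }
    }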

Broadcast and read broadcast variable in DataStream

2016-05-16 Thread subash basnet
Hello all, how could I broadcast a variable in DataStream, or perform a similar operation, so that I could read the value as in DataSet: IterativeDataSet loop = centroids.iterate(numIterations); DataSet newCentroids = points.map(new SelectNearestCenter()).withBroadcastSet(loop, "centroids"…
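
The DataStream API of this version has no direct withBroadcastSet equivalent. A common workaround (a sketch, with illustrative types and values) is to broadcast a control stream and connect it to the data stream, caching the latest value inside a CoFlatMapFunction:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
    import org.apache.flink.util.Collector;

    public class BroadcastVariableWorkaround {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<String> data = env.fromElements("a", "b", "c");
            DataStream<Integer> control = env.fromElements(1, 2, 3);

            // broadcast() replicates every control element to all parallel
            // instances, so each instance sees the same "broadcast variable".
            data.connect(control.broadcast())
                    .flatMap(new CoFlatMapFunction<String, Integer, String>() {
                        private Integer current; // last broadcast value seen

                        @Override
                        public void flatMap1(String value, Collector<String> out) {
                            if (current != null) {
                                out.collect(value + "/" + current);
                            }
                        }

                        @Override
                        public void flatMap2(Integer value, Collector<String> out) {
                            current = value; // update the cached value
                        }
                    })
                    .print();

            env.execute("broadcast workaround");
        }
    }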

Re: Unable to understand datastream error message

2016-05-16 Thread subash basnet
Hello Aljoscha, for DataSet: IterativeDataSet loop = centroids.iterate(numIterations); DataSet newCentroids = points.map(new SelectNearestCenter()).withBroadcastSet(loop, "centroids").map(new CountAppender()).groupBy(0).reduce(new CentroidAccumulator()).map(new CentroidAverager());…
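
The snippet above matches the shape of the standard Flink KMeans example, and in the DataSet API the loop only exists once it is closed. A sketch of the missing closing step, continuing the quoted code and assuming the Centroid type from that example:

    // The iteration only becomes a loop once it is closed: closeWith feeds
    // newCentroids back as the next iteration's "loop" input.
    DataSet<Centroid> finalCentroids = loop.closeWith(newCentroids);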

Re: Unable to understand datastream error message

2016-05-16 Thread Aljoscha Krettek
There you have your explanation. A loop actually has to be a loop for it to work in Flink. On Sat, 14 May 2016 at 16:35 subash basnet wrote: > Hello, > > I had to use > private static IterativeStream loop; > (loop as a global variable) because I cannot broadcast it like that of the DataSet > API in D…
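
To make the point concrete, a minimal sketch of a closed DataStream loop (the decrement logic is illustrative): an IterativeStream must be closed with closeWith, otherwise there is no feedback edge and it is not actually a loop:

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.IterativeStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ClosedLoopExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Start the loop over a sequence of numbers.
            IterativeStream<Long> loop = env.generateSequence(1, 5).iterate();

            // One loop step: decrement each value.
            DataStream<Long> step = loop.map(new MapFunction<Long, Long>() {
                @Override
                public Long map(Long v) {
                    return v - 1;
                }
            });

            // Values still above zero are fed back; this closeWith call is
            // what actually turns the IterativeStream into a loop.
            loop.closeWith(step.filter(new FilterFunction<Long>() {
                @Override
                public boolean filter(Long v) {
                    return v > 0;
                }
            }));

            // Values that reached zero leave the loop.
            step.filter(new FilterFunction<Long>() {
                @Override
                public boolean filter(Long v) {
                    return v <= 0;
                }
            }).print();

            env.execute("closed loop");
        }
    }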

Re: Interesting window behavior with savepoints

2016-05-16 Thread Andrew Whitaker
Thanks for explaining, Ufuk. The reasons the savepoint is restored successfully kind of make sense, but it seems like the window type (count vs. time) should be taken into account when restoring savepoints. I don't actually see anyone doing this, but I would expect Flink to complain…

What / Where / When / How questions in Spark 2.0 ?

2016-05-16 Thread Ovidiu-Cristian MARCU
Hi, we can see in [2] many interesting (and expected!) improvements (promises) like extended SQL support, a unified API (DataFrames, DataSets), an improved engine (Tungsten draws on ideas from modern compilers and MPP databases, similar to Flink [3]), structured streaming, etc. It seems we somehow…

flink snapshotting fault-tolerance

2016-05-16 Thread Stavros Kontopoulos
Hi, I was looking into the details of the Flink snapshotting algorithm, also described here: http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/ https://blog.acolyer.org/2015/08/19/asynchronous-distributed-snapshots-for-distributed-dataflows/ http://m…
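
For intuition, a toy model (not Flink code) of the barrier alignment described in those posts: an operator blocks each input channel once a barrier arrives on it, snapshots when barriers have arrived on all channels, then forwards the barrier downstream:

    import java.util.HashSet;
    import java.util.Set;

    // Toy model of asynchronous barrier snapshotting for one operator.
    public class BarrierAlignment {
        private final int numChannels;
        private final Set<Integer> blockedChannels = new HashSet<>();

        public BarrierAlignment(int numChannels) {
            this.numChannels = numChannels;
        }

        /** Called when a checkpoint barrier arrives on a channel. */
        public void onBarrier(int channel, long checkpointId) {
            // Stop consuming this channel until the snapshot is taken;
            // records arriving on it belong to the *next* checkpoint.
            blockedChannels.add(channel);

            if (blockedChannels.size() == numChannels) {
                // Barriers received on all inputs: state is consistent.
                snapshotState(checkpointId);
                forwardBarrier(checkpointId);
                blockedChannels.clear(); // resume all channels
            }
        }

        private void snapshotState(long checkpointId) {
            System.out.println("snapshot state for checkpoint " + checkpointId);
        }

        private void forwardBarrier(long checkpointId) {
            System.out.println("forward barrier " + checkpointId);
        }

        public static void main(String[] args) {
            BarrierAlignment op = new BarrierAlignment(2);
            op.onBarrier(0, 1L); // channel 0 blocked, waiting for channel 1
            op.onBarrier(1, 1L); // all barriers in: snapshot + forward
        }
    }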

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-16 Thread Stefano Bortoli
Hi Flavio, Till, do you think this could be related to the serialization problem caused by the management of the Kryo serializer buffer when spilling to disk? We are definitely going beyond what is managed in memory with this task. Regards, Stefano 2016-05-16 9:44 GMT+02:00 Flavio Pompermaier…
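
One thing worth trying while narrowing this down (a sketch, not a confirmed fix): register the affected type, or an explicit serializer for it, so Kryo writes compact type ids instead of full class-name strings like "java.util.HashSet" into the stream:

    import java.util.HashSet;

    import org.apache.flink.api.java.ExecutionEnvironment;

    import com.esotericsoftware.kryo.serializers.CollectionSerializer;

    public class KryoRegistration {
        public static void main(String[] args) {
            ExecutionEnvironment env =
                    ExecutionEnvironment.getExecutionEnvironment();

            // Pin an explicit serializer for the type; alternatively,
            // env.getConfig().registerKryoType(HashSet.class) just assigns
            // the type a compact registration id.
            env.getConfig().registerTypeWithKryoSerializer(
                    HashSet.class, CollectionSerializer.class);
        }
    }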

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-16 Thread Flavio Pompermaier
That exception showed up just once, but the following happens randomly (if I re-run the job after stopping and restarting the cluster it usually doesn't show up): Caused by: java.io.IOException: Serializer consumed more bytes than the record had. This indicates broken serialization. If you are using…