Re: TimeWindow overload?

2016-05-03 Thread Stephan Ewen
Hi Elias! There is a feature pending that uses an optimized version for aligned time windows. In that case, elements would go into a single window pane, and the full window would be composed of all panes it spans (in the case of sliding windows). That should help a lot in those cases. The default

Sink - writeAsText problem

2016-05-03 Thread Punit Naik
Hello, I executed my Flink code in Eclipse and it properly generated the output by creating a folder (as specified in the string) and placing output files in it. But when I exported the project as a JAR and ran the same code using ./flink run, it generated the output, but instead of creating a fol

Re: Sink - writeAsText problem

2016-05-03 Thread Fabian Hueske
Did you specify a parallelism? The default parallelism of a Flink instance is 1 [1]. You can set a different default parallelism in ./conf/flink-conf.yaml or pass a job specific parallelism with ./bin/flink using the -p flag [2]. More options to define parallelism are in the docs [3]. [1] https:/
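For illustration, a minimal sketch of the two options Fabian mentions (parallelism.default is the standard key; the value 4 and the jar path are placeholders):

  # ./conf/flink-conf.yaml -- default parallelism for all jobs on this setup
  parallelism.default: 4

  # or per job, via the CLI flag
  ./bin/flink run -p 4 path/to/your-job.jar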

Re: Measuring latency in a DataStream

2016-05-03 Thread Robert Schmidtke
Hi Igor, thanks for your reply. As for your first point I'm not sure I understand correctly. I'm ingesting records at a rate of about 50k records per second, and those records are fairly small. If I add a time stamp to each of them, I will have a lot more data, which is not exactly what I want. In

Re: How to perform this join operation?

2016-05-03 Thread Aljoscha Krettek
Hi Elias, thanks for the long write-up. It's interesting that it actually kinda works right now. You might be interested in a design doc that we're currently working on. I posted it on the dev list but here it is: https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit

Re: Sink - writeAsText problem

2016-05-03 Thread Stephan Ewen
Hi! There is the option to always create a directory: "fs.output.always-create-directory" See https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#file-systems Greetings, Stephan On Tue, May 3, 2016 at 9:26 AM, Punit Naik wrote: > Hello > > I executed my Flink code i
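As a concrete illustration, the setting goes into ./conf/flink-conf.yaml (shown here with the non-default value Stephan describes):

  # ./conf/flink-conf.yaml -- always write output into a directory,
  # even when the sink runs with parallelism 1
  fs.output.always-create-directory: true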

Re: Scala compilation error

2016-05-03 Thread Aljoscha Krettek
There is a Scaladoc but it is not covering all packages, unfortunately. In the Scala API you can call transform without specifying a TypeInformation; it works using implicits/context bounds. On Tue, 3 May 2016 at 01:48 Srikanth wrote: > Sorry for the previous incomplete email. Didn't realize I h

Re: Sink - writeAsText problem

2016-05-03 Thread Punit Naik
Hi Stephen, Fabian setting "fs.output.always-create-directory" to true in flink-config.yml worked! On Tue, May 3, 2016 at 2:27 PM, Stephan Ewen wrote: > Hi! > > There is the option to always create a directory: > "fs.output.always-create-directory" > > See > https://ci.apache.org/projects/flink

Re: Sink - writeAsText problem

2016-05-03 Thread Fabian Hueske
Yes, but be aware that your program runs with parallelism 1 if you do not configure the parallelism. 2016-05-03 11:07 GMT+02:00 Punit Naik : > Hi Stephen, Fabian > > setting "fs.output.always-create-directory" to true in flink-config.yml > worked! > > On Tue, May 3, 2016 at 2:27 PM, Stephan Ewen

Re: TimeWindow overload?

2016-05-03 Thread Aljoscha Krettek
Hi, even with the optimized operator for aligned time windows I would advise against using long sliding windows with a small slide. The system will internally create a lot of "buckets", i.e. each sliding window is treated separately and the element is put into 1,440 buckets in your case (1,440 would correspond, for example, to a 24-hour window sliding by 1 minute). With a mo

Re: Insufficient number of network buffers

2016-05-03 Thread Ufuk Celebi
Hey Tarandeep, I think the failures are unrelated. Regarding the number of network buffers: https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#configuring-the-network-buffers The timeouts might occur, because the task managers are pretty loaded. I would suggest to incr
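For reference, a minimal sketch of the relevant setting in ./conf/flink-conf.yaml; the value is only an example (it matches the 5000 buffers Tarandeep reports using further down):

  # ./conf/flink-conf.yaml -- total number of network buffers per TaskManager
  taskmanager.network.numberOfBuffers: 5000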

How to perform Broadcast and groupBy in DataStream like DataSet

2016-05-03 Thread subash basnet
Hello all, How could we perform *withBroadcastSet* and *groupBy* on a DataStream, like we do with a DataSet in the KMeans code below: DataSet<Centroid> newCentroids = points .map(new SelectNearestCenter()).withBroadcastSet(loop, "centroids") .map(new CountAppender()).groupBy(0).reduce(new CentroidAccumulator()

Re: Creating a custom operator

2016-05-03 Thread Simone Robutti
Hello Fabian, we dug deeper starting from the input you gave us, but a question arose. We always assumed that runtime operators were open for extension without modifying anything inside Flink, but it looks like this is not the case and the documentation assumes that the developer is working to a con

Re: Sink - writeAsText problem

2016-05-03 Thread Punit Naik
Yeah thanks for letting me know. On 03-May-2016 2:40 PM, "Fabian Hueske" wrote: > Yes, but be aware that your program runs with parallelism 1 if you do not > configure the parallelism. > > 2016-05-03 11:07 GMT+02:00 Punit Naik : > >> Hi Stephen, Fabian >> >> setting "fs.output.always-create-direc

Re: Creating a custom operator

2016-05-03 Thread Fabian Hueske
Hi Simone, you are right, the interfaces you extend are not considered to be public, user-facing API. Adding custom operators to the DataSet API touches many parts of the system and is not straightforward. The DataStream API has better support for custom operators. Can you explain what kind of op

Re: Measuring latency in a DataStream

2016-05-03 Thread Robert Schmidtke
After fixing the clock issue on the application level, the latency is as expected. Thanks again! Robert On Tue, May 3, 2016 at 9:54 AM, Robert Schmidtke wrote: > Hi Igor, thanks for your reply. > > As for your first point I'm not sure I understand correctly. I'm ingesting > records at a rate of

Re: TimeWindow overload?

2016-05-03 Thread Stephan Ewen
Just had a quick chat with Aljoscha... The first version of the aligned window code will still duplicate the elements; later versions should be able to get rid of that. On Tue, May 3, 2016 at 11:10 AM, Aljoscha Krettek wrote: > Hi, > even with the optimized operator for aligned time windows I w

Re: Flink Iterations Ordering

2016-05-03 Thread Stephan Ewen
Hi! The order in which the elements arrive in an iteration HEAD is the order in which the last operator in the loop (the TAIL) produces them. If that is a deterministic ordering (because of a sorted reduce, for example), then you should be able to rely on the order. Otherwise, the order of elemen

Flink - start-cluster.sh

2016-05-03 Thread Punit Naik
Hi, I did all the settings required for cluster setup, but when I ran the start-cluster.sh script, it only started one jobmanager on the master node. Logs are written only on the master node; the slaves don't have any logs. And when I ran a program it said: Resources available to scheduler: Number of

Re: Creating a custom operator

2016-05-03 Thread Simone Robutti
I'm not sure this is the right way to do it, but we were exploring all the possibilities and this one is the most obvious. We also spent some time studying how to do it, to gain a better understanding of Flink's internals. What we want to do though is to integrate Flink with another distributed s

Re: Insufficient number of network buffers

2016-05-03 Thread Tarandeep Singh
Yes, you are right, the exception was caused because the task managers were heavily loaded. I checked ganglia metrics and CPU usage was very high. I reduced parallelism and ran with 5000 buffers and didn't get any exception. Thanks, Tarandeep On Tue, May 3, 2016 at 2:19 AM, Ufuk Celebi wrote: > Hey Tara

Re: Scala compilation error

2016-05-03 Thread Srikanth
Yes, I did notice the usage of implicits in ConnectedStreams.scala. Better Scaladoc would be helpful, especially when compiler errors are not clear. Thanks On Tue, May 3, 2016 at 5:02 AM, Aljoscha Krettek wrote: > There is a Scaladoc but it is not covering all packages, unfortunately. In > the Sc

Re: How to perform Broadcast and groupBy in DataStream like DataSet

2016-05-03 Thread Stefano Baghino
I'm not sure about "withBroadcastSet", but in the DataStream API you use "keyBy" instead of "groupBy". On Tue, May 3, 2016 at 12:35 PM, subash basnet wrote: > Hello all, > > How could we perform *withBroadcastSet* and *groupBy* in DataStream like > that of DataSet in the below KMeans code: > > D
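A minimal, self-contained sketch of keyBy as the DataStream counterpart of the DataSet API's groupBy; the tuples and the sum aggregation are made-up stand-ins, not the KMeans code from the question:

  import org.apache.flink.api.java.tuple.Tuple2;
  import org.apache.flink.streaming.api.datastream.DataStream;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

  public class KeyByExample {
      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

          // Hypothetical input: (centroidId, count) pairs.
          DataStream<Tuple2<Integer, Long>> assignments = env.fromElements(
                  new Tuple2<>(0, 1L), new Tuple2<>(1, 1L), new Tuple2<>(0, 1L));

          // keyBy(0) partitions the stream by the first tuple field, which is
          // the DataStream equivalent of the DataSet API's groupBy(0).
          assignments
                  .keyBy(0)
                  .sum(1)
                  .print();

          env.execute("keyBy example");
      }
  }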

Re: Multiple windows with large number of partitions

2016-05-03 Thread Aljoscha Krettek
Yes, please go ahead. That would be helpful. On Mon, 2 May 2016 at 21:56 Christopher Santiago wrote: > Hi Aljoscha, > > Yes, there is still a high partition/window count since I have to keyby > the userid so that I get unique users. I believe what I see happening is > that the second window wit

Re: How to perform Broadcast and groupBy in DataStream like DataSet

2016-05-03 Thread subash basnet
Hello Stefano, Thank you, I found out just some time ago that I could use keyBy, but I couldn't find how to set and getBroadcastVariable in a DataStream like in a DataSet. For example, in the code below we get the collection of *centroids* via broadcast. Eg: In KMeans.java class X extends MapFunction

how to convert datastream to collection

2016-05-03 Thread subash basnet
Hello all, Suppose I have the datastream as: DataStream<Centroid> newCentroids; How do I get a collection of newCentroids to be able to loop over it as below: private Collection<Centroid> centroids; for (Centroid cent : centroids) { } Best Regards, Subash Basnet

Re: how to convert datastream to collection

2016-05-03 Thread Suneel Marthi
DataStream<Centroid> newCentroids = new DataStream<>(...) Iterator<Centroid> iter = DataStreamUtils.collect(newCentroids); List<Centroid> list = Lists.newArrayList(iter); On Tue, May 3, 2016 at 10:26 AM, subash basnet wrote: > Hello all, > > Suppose I have the datastream as: > DataStream<Centroid> newCentroids; > > How
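Spelled out as a runnable sketch, assuming the DataStreamUtils helper from the flink-streaming-contrib module and a bounded example stream (an unbounded stream may block the iterator forever, as Aljoscha notes below):

  import java.util.ArrayList;
  import java.util.Iterator;
  import java.util.List;

  import org.apache.flink.contrib.streaming.DataStreamUtils;
  import org.apache.flink.streaming.api.datastream.DataStream;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

  public class CollectExample {
      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

          // A bounded stand-in for the centroid stream from the question.
          DataStream<Long> newCentroids = env.generateSequence(0, 9);

          // collect() triggers execution of the pipeline in the background,
          // so no explicit env.execute() call is made here.
          Iterator<Long> iter = DataStreamUtils.collect(newCentroids);
          List<Long> list = new ArrayList<>();
          while (iter.hasNext()) {
              list.add(iter.next());
          }
          System.out.println(list);
      }
  }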

Re: how to convert datastream to collection

2016-05-03 Thread Aljoscha Krettek
Hi, please keep in mind that we're dealing with streams. The Iterator might never finish. Cheers, Aljoscha On Tue, 3 May 2016 at 16:35 Suneel Marthi wrote: > DataStream> *newCentroids = new DataStream<>.()* > > *Iterator> iter = > DataStreamUtils.collect(newCentroids);* > > *List> list = Li

Re: Flink + Kafka + Scalabuff issue

2016-05-03 Thread Alexander Gryzlov
Hello, Just to follow up on this issue: after collecting some data and setting up additional tests, we have managed to pinpoint the issue to the ScalaBuff-generated code that decodes enumerations. After switching to the ScalaPB generator instead, the problem was gone. One thing peculiar about

Re: how to convert datastream to collection

2016-05-03 Thread Srikanth
Why do you want to collect and iterate? Why not iterate on the DataStream itself? Maybe I didn't understand your use case completely. Srikanth On Tue, May 3, 2016 at 10:55 AM, Aljoscha Krettek wrote: > Hi, > please keep in mind that we're dealing with streams. The Iterator might > never finish. >

Re: How to perform this join operation?

2016-05-03 Thread Elias Levy
Till, Thanks again for putting this together. It is certainly along the lines of what I want to accomplish, but I see some problem with it. In your code you use a ValueStore to store the priority queue. If you are expecting to store a lot of values in the queue, then you are likely to be using

PLC/Scada/Sensor anomaly detection

2016-05-03 Thread Ivan
Hello! Has anyone used Flink in "production" for PLC anomaly detection? Any pointers/docs to check? Best regards, Iván Venzor C.

s3 checkpointing issue

2016-05-03 Thread Chen Qin
Hi there, I ran a test job with the filestatebackend and saved checkpoints on S3 (via s3a). The job crashed when a checkpoint was triggered. Looking into the S3 directory and listing objects, I found the directory was created successfully but all checkpoint directories are empty. The host running the task manager show

reading from latest kafka offset when flink starts

2016-05-03 Thread Balaji Rajagopalan
I am using the flink connector to read from a kafka stream. I ran into a problem where the flink job went down due to some application error; it was down for some time, and meanwhile the kafka queue was growing as expected, with no consumer consuming from the given group, and when I started the flink it s
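A minimal sketch of one way to start from the latest offset with the Kafka 0.8 connector, assuming no offsets have been committed yet for the consumer group; the topic name, group id, and addresses are placeholders:

  import java.util.Properties;

  import org.apache.flink.streaming.api.datastream.DataStream;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
  import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
  import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

  public class LatestOffsetExample {
      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

          Properties props = new Properties();
          props.setProperty("bootstrap.servers", "localhost:9092");  // placeholder broker
          props.setProperty("zookeeper.connect", "localhost:2181");  // required by the 0.8 consumer
          props.setProperty("group.id", "my-fresh-group");           // a group with no committed offsets
          // With no committed offsets for the group, "largest" makes the consumer
          // start from the latest offset instead of replaying the backlog.
          props.setProperty("auto.offset.reset", "largest");

          DataStream<String> stream = env.addSource(
                  new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props));
          stream.print();

          env.execute("read from latest Kafka offset");
      }
  }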

Re: Flink - start-cluster.sh

2016-05-03 Thread Balaji Rajagopalan
Which Flink documentation were you following to set up your cluster? Can you point to it? On Tue, May 3, 2016 at 6:21 PM, Punit Naik wrote: > Hi > > I did all the settings required for cluster setup. but when I ran the > start-cluster.sh script, it only started one jobmanager on the ma
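For comparison, a minimal sketch of the standalone setup from the Flink 1.0 cluster docs (hostnames are placeholders): the JobManager address goes into flink-conf.yaml on all nodes, every worker is listed in conf/slaves on the master, and the master needs passwordless SSH to each worker.

  # ./conf/flink-conf.yaml (on all nodes)
  jobmanager.rpc.address: master-host

  # ./conf/slaves (on the master) -- one TaskManager host per line
  worker-host-1
  worker-host-2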