Ability to partition logs per pipeline

2016-07-13 Thread Chawla,Sumit
Hi All, Does Flink provide any ability to streamline logs being generated from a pipeline? How can we keep the logs from two pipelines separate so that it's easy to debug the pipeline execution (something dynamic to automatically partition the logs per pipeline)? Regards, Sumit Chawla

Re: Parameters to Control Intra-node Parallelism

2016-07-13 Thread Ovidiu-Cristian MARCU
Hi, I would pay attention to the memory settings such that heap+off-heap+network buffers can be served from your node’s RAM for both TMs. Also, there is some correlation between the number of buffers, parallelism and your workflow’s operators. The suggestion to be used for the numberOfBuffers d

Re: Issue with running Flink Python jobs on cluster

2016-07-13 Thread Geoffrey Mon
Hello, Here is the TaskManager log on pastebin: http://pastebin.com/XAJ56gn4 I will look into whether the files were created. By the way, the cluster is made with virtual machines running on BlueData EPIC. I don't know if that might be related to the problem. Thanks, Geoffrey On Wed, Jul 13, 2

how to get rid of null pointer exception in collection in DataStream

2016-07-13 Thread subash basnet
Hello all, I need to collect the centroids to find out the nearest center for each point. DataStream points = newDataStream.map(new getPoints()); DataStream *centroids* = newCentroidDataStream.map(new TupleCentroidConverter()); ConnectedIterativeStreams loop = points.iterate().withFeedbackType(Cen

Re: Data point goes missing within iteration

2016-07-13 Thread Biplob Biswas
Hi, Sorry for the late reply, I was trying different things in my code. From what I observed, it's very weird to me. After experimentation, I found out that when I increase the number of centroids, the number of data points forwarded decreases; when I lower the number of centroids, the datapo

Re: HDFS to Kafka

2016-07-13 Thread Aljoscha Krettek
Hi, this does not work right now because FileInputFormat does not allow setting the "enumerateNestedFiles" field directly and the Configuration is completely ignored in Flink streaming jobs. Cheers, Aljoscha On Wed, 13 Jul 2016 at 11:06 Robert Metzger wrote: > Hi Dominique, > > In Flink 1.1 we'

How large a Flink cluster can have?

2016-07-13 Thread Yan Chou Chen
I couldn't find related information in the FAQ [1], the mailing list [2], or the Powered By page [3]. Just out of curiosity, what is the current largest Flink cluster size running in production? For instance, a long time ago Yahoo [4] ran 4k Hadoop nodes in production. Thanks [1]. https://flink.apache.org/faq.html

Re: Writing in flink clusters

2016-07-13 Thread Chesnay Schepler
Hello, Is that the complete error message? I'm a bit surprised it does not explicitly name any file name. If it really doesn't, we should change that. Regards, Chesnay Schepler On 13.07.2016 15:35, Alexis Gendronneau wrote: Hi Roy, Have you looked on the nodes in charge of sink tasks? You s

Re: Writing in flink clusters

2016-07-13 Thread Debaditya Roy
Thanks. Will check. :-) On Wed, Jul 13, 2016 at 3:35 PM, Alexis Gendronneau wrote: > Hi Roy, > > Have you looked on the nodes in charge of sink tasks? You should be able > to find them in the Flink web interface by clicking on the sink tasks. If you > get the OVERWRITE error, your output is certain

Re: Writing in flink clusters

2016-07-13 Thread Alexis Gendronneau
Hi Roy, Have you looked on the nodes in charge of sink tasks? You should be able to find them in the Flink web interface by clicking on the sink tasks. If you get the OVERWRITE error, your output is certainly somewhere. By the way, when using distributed mode it is easier to use an output like HDFS. T

Re: countWindow custom WindowFunction

2016-07-13 Thread Ciar, David B.
Hi Vishnu, Thank you for the pointers/modified example, that was really helpful and it is working as expected now. I took another look through the documentation and found in the "Window" section for streaming data, the "Recipes for building windows" sub-section, where it shows the countWindo

Writing in flink clusters

2016-07-13 Thread Debaditya Roy
Hello users, I have written and executed a Flink program in a cluster. The program was supposed to write to some text file as a sink; however, I cannot find the text files in the target directory of the cluster nodes, but when I re-execute the program a second time, it gives me the predictable error:

Re: countWindow custom WindowFunction

2016-07-13 Thread Vishnu Viswanath
Hi David, countWindow(size,slide) creates a GlobalWindow, not a TimeWindow. Also you have to use Tuple instead of Tuple2. class SequentialDeltaCheck extends WindowFunction[RawObservation, String, Tuple, GlobalWindow]{ def apply(key: Tuple, window: GlobalWindow, input: Iterable[RawObservation],

countWindow custom WindowFunction

2016-07-13 Thread Ciar, David B.
Hello everyone, I'm relatively new to using Apache Flink and Scala, and am just getting to grips with some of the basic functionality both provide. I've hit a wall trying to implement a custom WindowFunction over a keyed countWindow however, and hoped someone may have a pointer. The full cod

Re: Data point goes missing within iteration

2016-07-13 Thread Ufuk Celebi
Any update? On Wed, Jul 6, 2016 at 5:55 PM, Ufuk Celebi wrote: > I couldn't tell anything from the code. I would suggest to reduce it > to a minimal example with Integers where you do the same thing flow > structure wise (with simple types) and let's check that again. > > On Wed, Jul 6, 2016 at 9

Re: Issue with running Flink Python jobs on cluster

2016-07-13 Thread Chesnay Schepler
Hello Geoffrey, How often does this occur? Flink distributes the user-code and the python library using the Distributed Cache. Either the file is deleted right after being created for some reason, or the DC returns a file name before the file was created (which shouldn't happen, it should b

Re: custom scheduler in Flink?

2016-07-13 Thread Aljoscha Krettek
Hi, I'm afraid there is no documentation about schedulers, especially at this low level. Maybe this new design proposal would be of interest to you, though: https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures In there is a link to the mailing list dis

Re: Apache Flink 1.0.3 IllegalStateException Partiction State on chaining

2016-07-13 Thread ANDREA SPINA
Hi everybody, increasing the akka.ask.timeout solved the second issue. Anyway, that was a warning about a congested network, so I worked to improve the algorithm. Increasing the numberOfBuffers and the corresponding size solved the first issue, thus now I can run with the full DOP. In my case ena

Re: HDFS to Kafka

2016-07-13 Thread Robert Metzger
Hi Dominique, In Flink 1.1 we've reworked the reading of static files in the DataStream API. There is now a method for passing any FileInputFormat: readFile(fileInputFormat, path, watchType, interval, pathFilter, typeInfo). I guess you can pass a FileInputFormat with the recursive enumeration enab

Re: sampling function

2016-07-13 Thread Le Quoc Do
Hi Till, I have created the JIRA: https://issues.apache.org/jira/browse/FLINK-4205 Thank you, Do On Tue, Jul 12, 2016 at 6:05 PM, Till Rohrmann wrote: > Stratified sampling would also be beneficial for the DataSet API. I think > it would be best if this method is also added to DataSetUtils or