Re: RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
There are only 3 nodes in the HDFS cluster and when running fsck it shows the filesystem as healthy. $ hdfs fsck /user/hadoop/flink/checkpoints/dc2aee563bebce76e420029525c37892/chk-43/ 17/04/28 16:24:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using bui

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
The top-level exception is similar to the one in this JIRA issue, but the root exception is different. The issue says it was fixed in 1.2.0, which is what I'm using: https://issues.apache.org/jira/browse/FLINK-5663 -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
This is the stacktrace I'm getting when checkpointing to HDFS. It happens roughly once every 3 checkpoints, and I don't see it without parallelism. AsynchronousException{java.lang.Exception: Could not materialize checkpoint 6 for operator KeyedCEPPatternOperator -> Flat Map -> Map -> Sink: Unna

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread Marcus Clendenin
I changed the max number of open files and got past this error, but now I'm seeing errors that it's unable to flush the file. I am checkpointing using HDFS; should I be using the local file system? Is there any better way to use CEP with multiple patterns, or are you suggesting creating my

Re: hadoopcompatibility not in dist

2017-04-28 Thread Till Rohrmann
Hi Flavio, I think that the /opt folder only contains optional packages, which you can move into /lib in order to have them loaded with your Flink cluster. What Fabian was referring to is making it easier for the user to find this package so that he doesn't have to download it himself. Cheers, Till

Re: Writing the results of the stream onto a CSV File

2017-04-28 Thread Till Rohrmann
Hi Abdul, the DataStream#writeAsText does not support a TextFormatter as argument. You either have to implement your own OutputFormat and call DataStream#writeUsingOutputFormat, or, as Fábio recommended, simply use DataStream#writeAsCsv. Cheers, Till On Fri, Apr 28, 2017 at 11:46 AM, Fábio Dia
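For context, what a CSV writer does per record is conceptually just joining the record's fields with a delimiter, one line per record. A minimal stdlib sketch of that idea (this is not Flink code; the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch of what a CSV row writer does conceptually:
// render an object's fields joined by a delimiter, one record per line.
public class CsvSketch {

    // Render one record (an array of fields) as a single CSV line.
    static String toCsvLine(Object[] fields, String delimiter) {
        return Arrays.stream(fields)
                .map(String::valueOf)
                .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(new Object[]{"2017-04-28", 42, 3.14}, ","));
    }
}
```

In Flink itself, writeAsCsv handles this per-field serialization for tuple streams, which is why it needs no TextFormatter.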

Re: Problems reading Parquet input from HDFS

2017-04-28 Thread Flavio Pompermaier
Hi Lukas, a colleague of mine issued a PR to update the code of that example ( https://github.com/FelixNeutatz/parquet-flinktacular/pull/2). We updated Avro to 1.8.1 and the example worked fine (we didn't test on the cluster yet). Let me know if it helps. Best, Flavio On Tue, Apr 25, 2017 a

Re: gelly scatter/gather

2017-04-28 Thread Till Rohrmann
Hi Alieh, where do you see the AskTimeoutException exactly? Maybe you can share the complete stack trace and the logs with us. Moreover, which version of Flink are you running? Cheers, Till On Fri, Apr 28, 2017 at 2:13 PM, Kaepke, Marc wrote: > Hi Alieh, > > I can't solve your problem yet. B

Re: Multiple CEP Patterns

2017-04-28 Thread Kostas Kloudas
Perfect! And let us know how it goes! Kostas > On Apr 28, 2017, at 5:04 PM, mclendenin wrote: > > Ok, I will try using Flink 1.3 > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Multiple-CEP-Patterns-tp12871p12896.html > Sent f

Re: Fault tolerance & idempotency on window functions

2017-04-28 Thread Aljoscha Krettek
Hi, Yes, your analysis is correct: Flink will not retry for individual elements but will restore from the latest consistent checkpoint in case of failure. This also means that you can get different window results based on which element arrives first, i.e. you have a different timestamp on your o

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread Aljoscha Krettek
The problem here is that this will try to open 300 RocksDB instances on each of the TMs (depending on how the parallelism is spread between the machines this could be more or less). As the exception says, this will open too many files because each RocksDB instance has a directory with several fi
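A rough mitigation sketch for the open-files limit Aljoscha mentions (the value 65536 is only an example; the right limit depends on your OS, deployment, and how many RocksDB instances land on each TaskManager):

```shell
#!/bin/sh
# Inspect the current per-process open-file (file descriptor) soft limit.
ulimit -n

# Raise it for this shell session before launching a TaskManager.
# Raising it above the hard limit may require root or a PAM limits entry
# (e.g. in /etc/security/limits.conf), so tolerate failure here.
ulimit -n 65536 2>/dev/null || echo "could not raise limit in this shell"
ulimit -n
```

Each RocksDB instance keeps several SST and log files open, so with hundreds of operator instances per TaskManager the default limit (often 1024) is exhausted quickly.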

Re: Behavior of the cancel command

2017-04-28 Thread Aljoscha Krettek
Hi Jürgen, Is there missing data with respect to what should have been written at the time of the cancel, or at the time the last checkpoint (or in that case, the savepoint) was performed? I’m asking because the cancel command is only sent out once the savepoint has been completed, as can be seen at [1]

RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
Starting ~300 CEP patterns with parallelism of 6, since there are 6 partitions on a Kafka topic. Checkpointing with RocksDB to Hadoop on an interval of 50 seconds. Cluster is HA with 2 JMs and 5 TMs. Getting the following exception: java.io.IOException: Error creating ColumnFamilyHandle. at org.a

Re: Multiple CEP Patterns

2017-04-28 Thread mclendenin
Ok, I will try using Flink 1.3 -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Multiple-CEP-Patterns-tp12871p12896.html Sent from the Apache Flink User Mailing List archive at Nabble.com.

Re: Multiple CEP Patterns

2017-04-28 Thread Kostas Kloudas
Yes, this is the master branch. We have not yet forked the 1.3 branch. And I do not think there is a better way, and I am not sure if there can be. Apart from the memory leak that is described in the JIRA, the different NFAs cannot share any state, so for each one the associated memory overhea

Re: Queryable State

2017-04-28 Thread Chet Masterson
Any insight here? I've got a situation where a key-value state on a task manager is being registered with the job manager, but when I try to query it, the job manager responds that it doesn't know the location of the key-value state... 26.04.2017, 12:11, "Chet Masterson": After setting the logging to

Re: Behaviour of the BucketingSink when checkpoints fail

2017-04-28 Thread Aljoscha Krettek
Hi, Yes, basically all the exactly-once/at-least-once guarantees are not given if checkpointing does not work correctly. For example, this will also be the case when reading from Kafka and writing to Kafka. Best, Aljoscha > On 28. Apr 2017, at 15:53, Yassine MARZOUGUI > wrote: > > Hi Aljosch

Re: Multiple CEP Patterns

2017-04-28 Thread mclendenin
I do have a within clause on all the patterns, and I am calling CEP.pattern on each one. On the output I am adding a Kafka sink. Since all the patterns are going to the same sink, I was wondering if there was a better way to do it rather than having that overhead. For the memory issues with 1.2, I do

Re: ElasticsearchSink Serialization Error

2017-04-28 Thread Aljoscha Krettek
Hi, ResultIndexRequestBuilder is a non-static inner class. This means it has a pointer to the enclosing instance. If you make it a static inner class, your code should work. Best, Aljoscha > On 28. Apr 2017, at 04:57, Vijay Srinivasaraghavan > wrote: > > Hello, > > I am seeing the below error whe
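The failure mode Aljoscha describes can be reproduced with plain Java serialization. The class names below are illustrative, not Flink's: a non-static inner class carries a hidden reference to its enclosing instance, so serializing it drags the (non-serializable) outer object along.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class InnerClassSerialization {

    // Non-serializable enclosing class, standing in for the user's function class.
    static class Enclosing {
        // Non-static inner class: holds a hidden Enclosing.this reference,
        // so serializing it also tries to serialize the outer instance.
        class NonStaticBuilder implements Serializable {}
    }

    // Static nested class: no hidden outer reference, serializes on its own.
    static class StaticBuilder implements Serializable {}

    // Attempt Java serialization and report whether it succeeded.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Enclosing enc = new Enclosing();
        System.out.println(serializes(enc.new NonStaticBuilder())); // fails: outer not serializable
        System.out.println(serializes(new StaticBuilder()));        // succeeds
    }
}
```

Declaring the builder `static` (or moving it to a top-level class) removes the hidden outer pointer, which is exactly the fix suggested above.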

Re: Behaviour of the BucketingSink when checkpoints fail

2017-04-28 Thread Yassine MARZOUGUI
Hi Aljoscha, Thank you for your response. I guess then I will manually rename the pending files. Does this however mean that the BucketingSink is not exactly-once as it is described in the docs, since in this case (failure of the job and failure of checkpoints) there will be duplicates? Or am I mi

Re: Behaviour of the BucketingSink when checkpoints fail

2017-04-28 Thread Aljoscha Krettek
Hi, Yes, your analysis is correct. The pending files are not recognised as such because they were never in any checkpointed state that could be restored. I’m afraid it’s not possible to build the sink state just from the files existing in the output folder. The reason we have state in the first

Re: Graph iteration with triplets or access to edges

2017-04-28 Thread Vasiliki Kalavri
Hi Marc, you can access the edge values inside the ScatterFunction using the getEdges() method. For an example look at SingleSourceShortestPaths [1] which sums up edge values to compute distances. I hope that helps! -Vasia. [1]: https://github.com/apache/flink/blob/master/flink-libraries/flink-g
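The pattern Vasia points at in SingleSourceShortestPaths (scatter: propose your distance plus the edge value along each out-edge; gather: each target keeps the minimum proposal) can be sketched outside Gelly in plain Java. All names here are illustrative, not Gelly's API:

```java
import java.util.Arrays;

public class SsspSketch {

    // One scatter/gather round over a graph given as edges[i] = {src, dst, weight}.
    // Scatter: each vertex proposes dist[src] + weight along its out-edges.
    // Gather: each target keeps the minimum proposal seen so far.
    static double[] relaxOnce(double[] dist, int[][] edges) {
        double[] next = Arrays.copyOf(dist, dist.length);
        for (int[] e : edges) {
            double candidate = dist[e[0]] + e[2];   // scatter: distance + edge value
            if (candidate < next[e[1]]) {           // gather: keep the minimum
                next[e[1]] = candidate;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        double[] dist = {0, inf, inf};              // vertex 0 is the source
        int[][] edges = {{0, 1, 5}, {1, 2, 2}, {0, 2, 9}};
        dist = relaxOnce(dist, edges);              // after round 1: {0, 5, 9}
        dist = relaxOnce(dist, edges);              // after round 2: {0, 5, 7}
        System.out.println(Arrays.toString(dist));
    }
}
```

In Gelly the scatter side of this is where getEdges() comes in: the ScatterFunction iterates those edges to read each edge's value when producing messages.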

Re: gelly scatter/gather

2017-04-28 Thread Kaepke, Marc
Hi Alieh, I can't solve your problem yet, but I work with Gelly and scatter/gather, and later GSA too. Could you publish or show me your algorithm? Best from Hamburg, Marc Sent from my iPhone > On 28. Apr 2017, at 13:58, Alieh wrote: > > Hi all > > I have an iterative algorithm implemented us

gelly scatter/gather

2017-04-28 Thread Alieh
Hi all, I have an iterative algorithm implemented using Gelly scatter/gather. Using 8 workers of a cluster, I encounter the error "akka.pattern.AskTimeoutException", which I think is caused by the heap size. Surprisingly, using 4 workers of the same cluster, my program executes!!! It seems tha

Re: Graph iteration with triplets or access to edges

2017-04-28 Thread Kaepke, Marc
to summarize my question: Does Flink or Gelly offer access to the edges of a single vertex? Or: I need a VertexTriplet, not an EdgeTriplet (graph.getTriplets()). Thanks! Best, Marc > Am 27.04.2017 um 20:20 schrieb Kaepke, Marc : > > Hi everyone, > > in Gelly I use the Scatter-Gather Itera

Re: Writing the results of the stream onto a CSV File

2017-04-28 Thread Fábio Dias
Hi, Instead of using writeAsText you have writeAsCsv: https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/datastream/DataStream.html You can use it just with the string path (like you have), or with the overwrite flag if it suits your needs. Best Regar

Behaviour of the BucketingSink when checkpoints fail

2017-04-28 Thread Yassine MARZOUGUI
Hi all, I have a failed job containing a BucketingSink. The last successful checkpoint was before the source started emitting data. The following checkpoints all failed due to the long timeout, as I mentioned here: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-v

Writing the results of the stream onto a CSV File

2017-04-28 Thread Abdul Salam Shaikh
Hi, I am trying to write the results of my stream into CSV format using the following code, and it has compilation issues: DataStream objectStream = windowedStream.flatMap(new WindowObjectStreamTransformer()); objectStream.writeAsText("H:\\data.csv", new TextFormatter() { pub

Re: hadoopcompatibility not in dist

2017-04-28 Thread Flavio Pompermaier
I faced this problem yesterday, and putting flink-hadoop-compatibility under the flink/lib folder solved the problem for me. But what is the official recommendation? Should I put it into the lib or opt folder? Is there any difference from a class-loading point of view? Best, Flavio On Fri, Apr 7, 2017 at

Re: Join two kafka topics to do CEP

2017-04-28 Thread tarek26
Hi Gordon, Thank you for your help; maybe I have not explained it well. I have two Kafka topics (tracking and rules), and I would like to join the "tracking" datastream with the "rules" datastream as the data arrives in the "tracking" datastream. Here is the result that I expect, but without restarting the Jo

Re: Iterating over keys in state backend

2017-04-28 Thread Kostas Kloudas
Hi Ken, So you have a queue where elements are sorted by timestamp and score, and when the time (event time, I suppose) passes that of the timestamp of an element, you want to fetch the element and: if the score is too low, you archive it; if the score is OK, you emit it. If I get it right, then
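The behavior Kostas restates can be sketched with a plain priority queue, independent of any Flink state backend. All names and the threshold parameter are illustrative assumptions, not Ken's actual code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TimedScoreQueue {

    // Illustrative element type: ordered by timestamp, then score.
    static class Item {
        final long timestamp;
        final double score;
        final String payload;
        Item(long timestamp, double score, String payload) {
            this.timestamp = timestamp;
            this.score = score;
            this.payload = payload;
        }
    }

    private final PriorityQueue<Item> queue = new PriorityQueue<>(
        Comparator.comparingLong((Item i) -> i.timestamp)
                  .thenComparingDouble(i -> i.score));
    private final double minScore;
    final List<Item> emitted = new ArrayList<>();
    final List<Item> archived = new ArrayList<>();

    TimedScoreQueue(double minScore) {
        this.minScore = minScore;
    }

    void add(Item item) {
        queue.add(item);
    }

    // When (event) time passes an element's timestamp, pop it:
    // archive it if the score is too low, emit it otherwise.
    void advanceTo(long watermark) {
        while (!queue.isEmpty() && queue.peek().timestamp <= watermark) {
            Item item = queue.poll();
            (item.score < minScore ? archived : emitted).add(item);
        }
    }

    public static void main(String[] args) {
        TimedScoreQueue q = new TimedScoreQueue(0.5);
        q.add(new Item(10, 0.9, "good"));
        q.add(new Item(10, 0.1, "weak"));
        q.add(new Item(20, 0.7, "later"));
        q.advanceTo(15); // releases the two timestamp-10 items only
        System.out.println(q.emitted.size() + " emitted, "
                           + q.archived.size() + " archived");
    }
}
```

In a keyed Flink operator the same shape would typically live in keyed state, with `advanceTo` driven by an event-time timer rather than called directly.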

Re: CEP join across events

2017-04-28 Thread Kostas Kloudas
Hi Elias, I think this is a really interesting suggestion for the case where you do not have an “accumulating” value. Because imagine that you want to accept the “next” element, if the sum of all the previous is less than Y. To have a similar syntax with an accumulator, we should add more met

Re: Multiple CEP Patterns

2017-04-28 Thread Kostas Kloudas
Sorry for the quick followup, but another question: in case the JIRA I sent you is not what affects your job, do your patterns have a timeout (the within() clause)? If not, then other parts of the system (e.g. the internal state of your NFA) may grow indefinitely. Kostas > On Apr 28, 20

Re: Multiple CEP Patterns

2017-04-28 Thread Kostas Kloudas
Hi! I suppose that by memory errors you mean you run out of memory, right? Are you using Flink 1.2 or the current master (the upcoming Flink 1.3)? The reason I am asking is that Flink 1.2 suffered from this: https://issues.apache.org/jira/browse/FLINK-5174