Re: Flink batch processing fault tolerance

2017-02-16 Thread wangzhijiang999
Yes, it is really a critical problem for large batch jobs because unexpected failures are a common case. And we are already focusing on realizing the ideas mentioned in FLIP-1; we hope to contribute them to Flink in the coming months. Best, Zhijiang

Re: "Job submission to the JobManager timed out" on EMR YARN cluster with multiple jobs

2017-02-16 Thread Tzu-Li (Gordon) Tai
Hi Geoffrey, Thanks for investigating and updating on this. Good to know that it is working! Just to clarify, was your series of jobs submitted to a “yarn session + regular bin/flink run”, or “per job yarn cluster”? I’m asking just to make sure of the limitations Robert mentioned. Cheers, Gordo

Re: Flink batch processing fault tolerance

2017-02-16 Thread Si-li Liu
Hi, this is the reason why I gave up on using Flink for my current project and picked up the traditional Hadoop framework again. 2017-02-17 10:56 GMT+08:00 Renjie Liu : > https://cwiki.apache.org/confluence/display/FLINK/FLIP- > 1+%3A+Fine+Grained+Recovery+from+Task+Failures > This FLIP may help. > > On Thu,

Re: Flink batch processing fault tolerance

2017-02-16 Thread Renjie Liu
https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures This FLIP may help. On Thu, Feb 16, 2017 at 7:34 PM Anton Solovev wrote: > Hi Aljoscha, > > Could you share your plans of resolving it? > > > > Best, > > Anton > > > > > > *From:* Aljoscha Krett

Re: Reliable Distributed FS support (HCFS)

2017-02-16 Thread Vijay Srinivasaraghavan
Following up on my question regarding the backing filesystem (HCFS) requirements. Appreciate any inputs. --- Regarding the Filesystem abstraction support, we are planning to use a distributed file system which complies with the Hadoop Compatible File System (HCFS) standard in place of standard HDFS. Accor

Re: "Job submission to the JobManager timed out" on EMR YARN cluster with multiple jobs

2017-02-16 Thread Geoffrey Mon
Hi Robert, Thanks for your reply. I've done some further testing and (hopefully) solved the issue; this turned out to be a red herring. After discovering that the same issue manifested itself when testing on my local machine, I found that multiple jobs can be submitted from a main() function for

Re: Akka 2.4

2017-02-16 Thread Ted Yu
Please see FLINK-3662 On Thu, Feb 16, 2017 at 9:01 AM, Dmitry Golubets wrote: > Hi, > > Can I force Flink to use Akka 2.4 (recompile if needed)? > Is it going to misbehave in a subtle way? > > > Best regards, > Dmitry >

Akka 2.4

2017-02-16 Thread Dmitry Golubets
Hi, Can I force Flink to use Akka 2.4 (recompile if needed)? Is it going to misbehave in a subtle way? Best regards, Dmitry

Re: Resource under-utilization when using RocksDb state backend [SOLVED]

2017-02-16 Thread Clifford Resnick
Hi Vinay, We found that our problems were not with RocksDb, but rather what we were throwing at it. We were working with more complex data types (e.g. Collections) and found that nearly 80% of the time was spent in serialization, so optimizing that helped a lot. But if your state is more primit

Re: Reading compressed XML data

2017-02-16 Thread Sebastian Neef
Hi Robert, sorry for the long delay. > I wonder why the decompression with the XmlInputFormat doesn't work. Did > you get any exception? I didn't get any exception; it just seems to read nothing (or at least doesn't match any opening/closing tags). I dug a bit into the code and found out that

Re: Streaming data from MongoDB using Flink

2017-02-16 Thread Tzu-Li (Gordon) Tai
Good to know! On February 16, 2017 at 10:13:28 PM, Pedro Monteiro (pedro.mlmonte...@gmail.com) wrote: Dear Gordon, Thanks for your help, I think I am on the right track as of now. On the other hand, I have another question: is it possible to add sources to environments that are already execu

Re: Flink jdbc

2017-02-16 Thread Punit Tandel
Thanks for the info. At the moment I use flink-jdbc to write the streaming data coming from Kafka, which I can process and write to a Postgres or MySQL database configured on a cluster or sandbox. However, when trying to write integration tests I am using an in-memory H2 database which s
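A common gotcha with in-memory H2 in integration tests (a plausible cause of the silent behavior described here, though the thread does not confirm it): each plain `jdbc:h2:mem:` connection gets its own private database, and a named in-memory database is dropped as soon as its last connection closes, so rows written by one connection can be invisible to another. A shared, long-lived in-memory database needs a named URL with `DB_CLOSE_DELAY` (the database name `testdb` is illustrative):

```
jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1
```

With `DB_CLOSE_DELAY=-1` the database survives until the JVM exits, so the Kafka-fed writer and the test's verification query can use separate connections to the same URL and still see the same data.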

Re: Streaming data from MongoDB using Flink

2017-02-16 Thread Pedro Monteiro
Dear Gordon, Thanks for your help, I think I am on the right track as of now. On the other hand, I have another question: is it possible to add sources to environments that are already executing? In what I am currently developing, I need to add new sources as they arrive to my system. I will wai

Re: Flink jdbc

2017-02-16 Thread Fabian Hueske
The JdbcOutputFormat was originally meant for batch jobs. It should be possible to use it for streaming jobs as well; however, you should be aware that it is not integrated with Flink's checkpointing mechanism. So, you might have duplicate data in case of failures. I also don't know if or how well i
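Fabian's caveat suggests one mitigation: since JdbcOutputFormat sits outside checkpointing, a replay after failure can rewrite the same rows, so making the write idempotent with an upsert keyed on a natural id avoids duplicates. A sketch, with hypothetical table and column names (`MERGE` syntax shown for H2; MySQL equivalent in the comment):

```sql
-- H2: MERGE keyed on the id column makes retried writes idempotent
MERGE INTO events (id, payload) KEY (id) VALUES (?, ?);

-- MySQL equivalent:
-- INSERT INTO events (id, payload) VALUES (?, ?)
--   ON DUPLICATE KEY UPDATE payload = VALUES(payload);
```

This trades exactly-once guarantees for effectively-once results at the sink, provided each record carries a stable unique id.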

Re: Resource under-utilization when using RocksDb state backend [SOLVED]

2017-02-16 Thread vinay patil
Hi Cliff, It would be really helpful if you could share your RocksDB configuration. I am also running on c3.4xlarge EC2 instances backed by SSDs. I had tried the FLASH_SSD_OPTIMIZED option, which works great, but somehow the pipeline stops in between and the overall processing time increases, I

Re: Clarification: use of AllWindowedStream.apply() function

2017-02-16 Thread nsengupta
Thanks, Aljoscha for the clarification. I understand that instead of using a flatMap() in the way I am using, I am better off using : * a fold (init, fold_func, window_func) first and then * map to a different type of my choice, inside the window_func, parameterised above I hope I am correct. If

Cartesian product over windows

2017-02-16 Thread Ioannis Kontopoulos
Hello everyone, Given a stream of events (each event has a timestamp and a key), I want to create all possible combinations of the keys in a window (sliding, event time) and then process those combinations in parallel. For example, if the stream contains events with keys 1,2,3,4 in a given window
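One way to approach this in Flink is to collect the distinct keys of a window (e.g. in an AllWindowFunction over a sliding event-time window), emit every unordered key pair, and then keyBy the pair so downstream processing of the combinations runs in parallel. The pair-generation step itself is plain Java; the sketch below shows it self-contained (the class and method names are illustrative, not Flink API):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class Pairs {
    // All unordered pairs {k_i, k_j} with i < j from the distinct keys seen
    // in one window: keys 1,2,3,4 yield 6 combinations. Inside a window
    // function, each pair would be emitted as one element so a downstream
    // keyBy on the pair can process combinations in parallel.
    public static <K> List<Map.Entry<K, K>> allPairs(List<K> keys) {
        List<Map.Entry<K, K>> out = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            for (int j = i + 1; j < keys.size(); j++) {
                out.add(new SimpleEntry<>(keys.get(i), keys.get(j)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Keys 1,2,3,4 in a given window -> 6 combinations
        System.out.println(allPairs(List.of(1, 2, 3, 4)).size()); // 6
    }
}
```

For n keys this emits n·(n−1)/2 pairs per window, so for large windows the combination count, not the event count, dominates the downstream load.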

Re: Checkpointing with RocksDB as statebackend

2017-02-16 Thread vinay patil
I think it's more related to RocksDB. I am also not very familiar with RocksDB, but I am reading the tuning guide to understand the important values that can be set. Regards, Vinay Patil On Thu, Feb 16, 2017 at 5:48 PM, Stefan Richter [via Apache Flink User Mailing List archive.] wrote: > What kind of prob

Re: Checkpointing with RocksDB as statebackend

2017-02-16 Thread Stefan Richter
What kind of problem are we talking about, S3-related or RocksDB-related? I am not aware of problems with RocksDB per se. I think seeing logs for this would be very helpful. > On 16.02.2017 at 11:56, Aljoscha Krettek wrote: > > +Stefan Richter and +Stephan

Re: Checkpointing with RocksDB as statebackend

2017-02-16 Thread vinay patil
Hi Aljoscha, Which problem are you referring to? I am seeing unexpected stalls in between for a long time. Also, one thing I have observed with the FLASH_SSD_OPTIMIZED option is that it is using a larger amount of physical memory and not flushing the data to storage. I am trying to figure out the best

RE: Flink batch processing fault tolerance

2017-02-16 Thread Anton Solovev
Hi Aljoscha, Could you share your plans of resolving it? Best, Anton From: Aljoscha Krettek [mailto:aljos...@apache.org] Sent: Thursday, February 16, 2017 2:48 PM To: user@flink.apache.org Subject: Re: Flink batch processing fault tolerance Hi, yes, this is indeed true. We had some plans for ho

Re: Streaming data from MongoDB using Flink

2017-02-16 Thread Pedro Monteiro
Thank you again for your prompt response. I will give it a try and will come back to you. *Pedro Lima Monteiro* On 16 February 2017 at 10:20, Tzu-Li (Gordon) Tai wrote: > I would recommend checking out the Flink RabbitMQ Source for examples: > https://github.com/apache/flink/blob/master/flink-

Re: Log4J

2017-02-16 Thread Robert Metzger
I've also (successfully) tried running Flink with log4j2 to connect it to greylog2. If I remember correctly, the biggest problem was "injecting" the log4j2 properties file into the classpath (when running Flink on YARN). Maybe you need to put the file into the lib/ folder, so that it is shipped to

Re: Checkpointing with RocksDB as statebackend

2017-02-16 Thread Aljoscha Krettek
+Stefan Richter and +Stephan Ewen could this be the same problem that you recently saw when working with other people? On Wed, 15 Feb 2017 at 17:23 Vinay Patil wrote: > Hi Guys, > > Can anyone please help me with this issue > > Regards, > Vinay Patil > > On Wed, Feb 15, 2017 at 6:17 PM, Vinay

Re: Log4J

2017-02-16 Thread Stephan Ewen
Hi! The bundled log4j version (1.x) does not support that. But you can replace the logging jars with those of a different framework (like log4j 2.x), which supports changing the configuration without stopping the application. You don't need to rebuild flink, simply replace two jars in the "lib" f

Re: Flink batch processing fault tolerance

2017-02-16 Thread Aljoscha Krettek
Hi, yes, this is indeed true. We had some plans for how to resolve this but they never materialised because of the focus on Stream Processing. We might unite the two in the future and then you will get fault-tolerant batch/stream processing in the same API. Best, Aljoscha On Wed, 15 Feb 2017 at 0

Re: Clarification: use of AllWindowedStream.apply() function

2017-02-16 Thread Aljoscha Krettek
Hi, you would indeed use apply(), or better fold(initialValue, foldFunction, windowFunction), to map the result of folding your window to some other data type. If you will, a WindowFunction allows "mapping" the result of your windowing to a different type. Best, Aljoscha On Wed, 15 Feb 2017 at 06:14 nsengupta wrote: > I have gone th

Re: How important is 'registerType'?

2017-02-16 Thread Aljoscha Krettek
Hi, are you changing anything on your job between performing the savepoint and restoring the savepoint? Flink upgrade, Job upgrade, changing Kryo version, changing order in which you register Kryo serialisers? Best, Aljoscha On Fri, 10 Feb 2017 at 18:26 Dmitry Golubets wrote: > The docs say tha

Re: Flink Job Exception

2017-02-16 Thread Aljoscha Krettek
Hi Govindarajan, the Jira issue that you linked to and which Till is currently fixing will only fix the obvious type mismatch in the Akka messages. There is also an underlying problem that causes this message to be sent in the first place. In the case of the user who originally created the Jira iss

Re: Streaming data from MongoDB using Flink

2017-02-16 Thread Tzu-Li (Gordon) Tai
I would recommend checking out the Flink RabbitMQ Source for examples: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-rabbitmq/src/main/java/org/apache/flink/streaming/connectors/rabbitmq/RMQSource.java For your case, you should extend the `RichSourceFunction` which p
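The cancel-flag pattern that RMQSource uses can be sketched as follows. This is a self-contained illustration under stated assumptions: the two nested interfaces are minimal stand-ins for Flink's `SourceFunction`/`SourceContext` (the real ones live in `org.apache.flink.streaming.api.functions.source`), and the hard-coded list stands in for a MongoDB cursor (e.g. iterating `collection.find()` with the MongoDB Java driver); all class and field names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MongoSourceSketch {
    // Minimal stand-ins for Flink's SourceFunction/SourceContext, defined
    // locally only so this sketch compiles on its own.
    interface SourceContext<T> { void collect(T element); }
    interface SourceFunction<T> {
        void run(SourceContext<T> ctx) throws Exception;
        void cancel();
    }

    // The pattern from RMQSource: emit elements in a loop guarded by a
    // volatile "running" flag that cancel() flips from another thread.
    // A real implementation would iterate a MongoDB cursor here instead
    // of this fixed list.
    static class MongoSource implements SourceFunction<String> {
        private volatile boolean running = true;
        private final List<String> cursorStandIn =
                Arrays.asList("doc1", "doc2", "doc3");

        @Override
        public void run(SourceContext<String> ctx) {
            for (String doc : cursorStandIn) {
                if (!running) {
                    break; // stop promptly when the job is cancelled
                }
                ctx.collect(doc); // emit each document downstream
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> collected = new ArrayList<>();
        new MongoSource().run(collected::add);
        System.out.println(collected); // [doc1, doc2, doc3]
    }
}
```

The `volatile` flag matters because Flink calls cancel() from a different thread than run(); extending `RichSourceFunction` additionally gives open()/close() hooks for creating and releasing the MongoDB client.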

Re: Flink jdbc

2017-02-16 Thread Punit Tandel
Yes, I have been following the tutorials, and reading from H2 and writing to H2 works fine. But the problem here is that writing the data coming from Kafka to the H2 engine does not seem to work, and I can't see any error thrown while writing into the in-memory H2 database. So I couldn't say what the error is and w

Re: Streaming data from MongoDB using Flink

2017-02-16 Thread Pedro Monteiro
Dear Tzu-Li, Thank you so much for your prompt response. Let's assume I have a variable, in Java, env, which is my StreamExecutionEnvironment. When I go ahead and attempt to do: > env.addSource(); > it requests an implementation of the SourceFunction interface: > env.addSource(new SourceFunct

Re: Streaming data from MongoDB using Flink

2017-02-16 Thread Tzu-Li (Gordon) Tai
Hi Pedro! This is definitely possible, by simply writing a Flink `SourceFunction` that uses MongoDB clients to fetch the data. It should be straightforward and works well with MongoDB’s cursor APIs. Could you explain a bit which part in particular you were stuck with? Cheers, Gordon On Februa

Streaming data from MongoDB using Flink

2017-02-16 Thread Pedro Monteiro
Good morning, I am trying to get data from MongoDB to be analysed in Flink. I would like to know if it is possible to stream data from MongoDB into Flink. I have looked into Flink's source function to add in the addSource method of the StreamExecutionEnvironment but I had no luck. Can anyone help

Re: Flink Job Exception

2017-02-16 Thread Till Rohrmann
Hi Govindarajan, there is a pending PR for this issue. I think I can merge it today. Cheers, Till On Thu, Feb 16, 2017 at 12:50 AM, Govindarajan Srinivasaraghavan < govindragh...@gmail.com> wrote: > Hi All, > > I'm trying to run a streaming job with flink 1.2 version and there are 3 > task mana