Return of Flink shading problems in 1.2.0

2017-03-16 Thread Foster, Craig
Hi: A few months ago, I was building Flink and ran into shading issues for flink-dist as described in your docs. We resolved this in BigTop by adding the correct way to build flink-dist in the do-component-build script and everything was fine after that. Now, I’m running into issues doing the s

Re: Data+control stream from kafka + window function - not working

2017-03-16 Thread Tzu-Li (Gordon) Tai
Hi Tarandeep, Thanks for clarifying. For the next step, I would recommend taking a look at  https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/debugging_event_time.html  and try to find out what exactly is wrong with the watermark progression. Flink 1.2 exposes watermarks as

Re: Data+control stream from kafka + window function - not working

2017-03-16 Thread Tarandeep Singh
Anyone? Any suggestions what could be going wrong or what I am doing wrong? Thanks, Tarandeep On Thu, Mar 16, 2017 at 7:34 AM, Tarandeep Singh wrote: > Data is read from Kafka and yes I use different group id every time I run > the code. I have put break points and print statements to verify t

Re: Appropriate State to use to buffer events in ProcessFunction

2017-03-16 Thread Yassine MARZOUGUI
Hi Xiaogang, Indeed, the MapState is what I was looking for in order to have efficient sorted state, as it would faciliate many use cases like this one, or joining streams, etc. I searched a bit and found your contribution of MapState for the next 1.3 re

Re: Batch stream Sink delay ?

2017-03-16 Thread Paul Smith
Due to the slight out of sequence of the log timestamps, I tried switching to a “BoundedOutOfOrdernessTimestampExtractor” and used a minute as the threshold, but I still couldn’t get the watermarks to fire. Setting break points and trying to follow the code, I Can’t see hwere the getCurrentWate

Re: Batch stream Sink delay ?

2017-03-16 Thread Fabian Hueske
Actually, I think the program you shared in the first mail of this thread looks fine for the purpose you describe. Timestamp and watermark assignment works as follows: - For each records, a long timestamp is extracted (UNIX/epoch timestamp). - A watermark is a timestamp which says that no more rec

Re: Batch stream Sink delay ?

2017-03-16 Thread Paul Smith
I have managed to discover that my presumption of log4j log file being a _guaranteed_ sequential order of time is incorrect (race conditions). So some logs are out of sequence, and I was getting _some_ Monotonic Timestamp violations. I did not discover this because somehow my local flink was n

Re: Appropriate State to use to buffer events in ProcessFunction

2017-03-16 Thread SHI Xiaogang
Hi Yassine, If I understand correctly, you are needing sorted states which unfortunately are not supported in Flink now. We have some ideas to provide such sorted states to facilitate the development of user applications. But it is still under discussion due to the concerns on back compatibility.

Re: Flink 1.2 and Cassandra Connector

2017-03-16 Thread Robert Metzger
I've created a pull request for the fix: https://github.com/apache/flink/pull/3556 It would be nice if one of the issue reporters could validate that the cassandra connector works after the fix. If it is a valid fix, I would like to include it into the upcoming 1.2.1 release. On Thu, Mar 16, 2017

Re: Flink 1.2 and Cassandra Connector

2017-03-16 Thread Robert Metzger
Yep, this is definitively a bug / misconfiguration in the build system. The cassandra client defines metrics-core as a dependency, but the shading is dropping the dependency when building the dependency reduced pom. To resolve the issue, we need to add the following line into the shading config of

Re: Data+control stream from kafka + window function - not working

2017-03-16 Thread Tarandeep Singh
Data is read from Kafka and yes I use different group id every time I run the code. I have put break points and print statements to verify that. Also, if I don't connect with control stream the window function works. - Tarandeep > On Mar 16, 2017, at 1:12 AM, Tzu-Li (Gordon) Tai wrote: > > H

Re: Checkpointing with RocksDB as statebackend

2017-03-16 Thread vinay patil
@ Stephan, I am not using explicit Evictor in my code. I will try using the Fold function if it does not break my existing functionality :) @Robert : Thank you for your answer, yes I have already tried to set G1GC this morning using env.java.opts, it works. Which is the recommended GC for Stream

Re: Checkpointing with RocksDB as statebackend

2017-03-16 Thread Robert Metzger
Yes, you can change the GC using the env.java.opts parameter. We are not setting any GC on YARN. On Thu, Mar 16, 2017 at 1:50 PM, Stephan Ewen wrote: > The only immediate workaround is to use windows with "reduce" or "fold" or > "aggregate" and not "apply". And to not use an evictor. > > The goo

Re: Checkpointing with RocksDB as statebackend

2017-03-16 Thread Stephan Ewen
The only immediate workaround is to use windows with "reduce" or "fold" or "aggregate" and not "apply". And to not use an evictor. The good news is that I think we have a good way of fixing this soon, making an adjustment in RocksDB. For the Yarn / g1gc question: Not 100% sure about that - you ca

Re: Flink 1.2 and Cassandra Connector

2017-03-16 Thread Stephan Ewen
Can we improve the Flink experience here by adding this dependency directly to the cassandra connector pom.xml (so that user jars always pull it in via transitivity)? On Wed, Mar 15, 2017 at 4:09 PM, Nico wrote: > Hi @all, > > I came back to this issue today... > > @Robert: > "com/codahale/metri

Re: Batch stream Sink delay ?

2017-03-16 Thread Fabian Hueske
What kind of timestamp and watermark extractor are you using? Can you share your implementation? You can have a look at the example programs (for example [1]). These can be started and debugged inside an IDE by executing the main method. If you run it in an external process, you should be able to

Re: Batch stream Sink delay ?

2017-03-16 Thread Paul Smith
Thanks again for your reply. I've tried with both Parallel=1 through to 3. Same behavior. The log file is monotonically increasing time stamps generated through an application using log4j. Each log line is distinctly incrementing time stamps it is an 8GB file I'm using as a test case and has ab

Re: SQL + flatten (or .*) quality docs location?

2017-03-16 Thread Fabian Hueske
Hi Stu, there is only one page of documentation for the Table API and SQL [1]. I agree the structure could be improved and split into multiple pages. Regarding the flatting of a Pojo have a look at the "Built-In Functions" section [2]. If you select "SQL" and head to the "Value access functions",

Re: Questions regarding queryable state

2017-03-16 Thread Ufuk Celebi
On Thu, Mar 16, 2017 at 10:00 AM, Kathleen Sharp wrote: > Hi, > > I have some questions regarding the Queryable State feature: > > Is it possible to use the QueryClient to get a list of keys for a given State? No, this is not possible at the moment. You would have to trigger a query for each key

Re: Batch stream Sink delay ?

2017-03-16 Thread Fabian Hueske
Hi Paul, since each operator uses the minimum watermark of all its inputs, you must ensure that each parallel task is producing data. If a source does not produce data, it will not increase the timestamps of its watermarks. Another challenge, that you might run into is that you need to make sure

Questions regarding queryable state

2017-03-16 Thread Kathleen Sharp
Hi, I have some questions regarding the Queryable State feature: Is it possible to use the QueryClient to get a list of keys for a given State? At the moment it is not possible to use ListState - will this ever be introduced? My first impression is that I would need one of these 2 to be able to

Re: Data+control stream from kafka + window function - not working

2017-03-16 Thread Tzu-Li (Gordon) Tai
Hi Tarandeep, I haven’t looked at the rest of the code yet, but my first guess is that you might not be reading any data from Kafka at all: private static DataStream readKafkaStream(String topic, StreamExecutionEnvironment env) throws IOException { Properties properties = new Propertie

Re: Checkpointing with RocksDB as statebackend

2017-03-16 Thread vinay patil
Hi Stephan, What can be the workaround for this ? Also need one confirmation : Is G1 GC used by default when running the pipeline on YARN. (I see a thread of 2015 where G1 is used by default for JAVA8) Regards, Vinay Patil On Wed, Mar 15, 2017 at 10:32 PM, Stephan Ewen [via Apache Flink User

Re: [POLL] Who still uses Java 7 with Flink ?

2017-03-16 Thread Bowen Li
There's always a tradeoff we need to make. I'm in favor of upgrading to Java 8 to bring in all new Java features. The common way I've seen (and I agree) other software upgrading major things like this is 1) upgrade for next big release without backward compatibility and notify everyone 2) maintain