Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-05-03 Thread Fabian Hueske
Hi, this should be covered here: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time.html#watermarks-in-parallel-streams Best, Fabian On Thu, May 2, 2019 at 17:48, an0 wrote: > This explanation is exactly what I'm looking for, thanks! Is such an > important rule doc
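The rule the linked doc describes can be sketched outside Flink: an operator with several parallel input channels tracks the last watermark seen on each channel, and its own event time is the minimum across all channels. The class and method names below are illustrative, not Flink API.

```java
// Minimal sketch of watermark propagation across parallel inputs:
// an operator's event time only advances once *every* input channel
// has advanced, i.e. it is the minimum of the per-channel watermarks.
class WatermarkMerger {
    private final long[] channelWatermarks;

    WatermarkMerger(int numChannels) {
        channelWatermarks = new long[numChannels];
        java.util.Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    /** Record a watermark from one input channel and return the
     *  operator's current event time (the minimum over all channels). */
    long onWatermark(int channel, long watermark) {
        channelWatermarks[channel] =
                Math.max(channelWatermarks[channel], watermark);
        long min = Long.MAX_VALUE;
        for (long w : channelWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }
}
```

This is why reassigning timestamps after a keyed shuffle behaves unintuitively: one slow parallel source holds back the merged watermark everywhere downstream.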

Re: Timestamp and key preservation over operators

2019-05-03 Thread Fabian Hueske
Hi Averell, Yes, timestamps and watermarks do not (completely) move together. The watermark should always be lower than the timestamps of the currently processed records. Otherwise, the records might be processed as late records (depending on the logic). The easiest way to check the timestamp of

Re: Filter push-down not working for a custom BatchTableSource

2019-05-03 Thread Fabian Hueske
Hi Josh, The code looks good to me. This seems to be a bug then. It's strange that it works for ORC. Would you mind opening a Jira ticket and maybe adding a simple reproducible code example? Thank you, Fabian On Thu, May 2, 2019 at 18:23, Josh Bradt wrote: > Hi Fabian, > > Thanks for your reply

Re: Timestamp and key preservation over operators

2019-05-03 Thread Averell
Thank you Fabian. One more question from me on this topic: as I send out early messages in my window function, the timestamp assigned by the window function (the end-time of the window) is not what I expect. I want it to be the time of the (last) message that triggered the output. Is there a

Re: Timestamp and key preservation over operators

2019-05-03 Thread Fabian Hueske
The window operator cannot be configured to use the max timestamp of the events in the window as the timestamp of the output record. The reason is that such a behavior can produce late records. If you want to do that, you have to track the max timestamp and assign it yourself with a timestamp assigner
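The tracking part of Fabian's suggestion can be sketched without Flink: keep the maximum element timestamp alongside the aggregate, emit it with the window result, and re-assign it downstream with a timestamp assigner. This helper only shows the tracking; the names are illustrative.

```java
// Tracks the maximum event timestamp seen so far, e.g. inside a window
// aggregate, so it can be carried in the output record and re-assigned
// downstream with assignTimestampsAndWatermarks.
class MaxTimestampTracker {
    private long maxTimestamp = Long.MIN_VALUE;

    void onElement(long elementTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, elementTimestamp);
    }

    long currentMax() {
        return maxTimestamp;
    }
}
```

Downstream, a timestamp assigner would simply read this carried-along value back out of the emitted record.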

RocksDB native checkpoint time

2019-05-03 Thread Gyula Fóra
Hi! Does anyone know what parameters might affect the RocksDB native checkpoint time? (Basically the sync part of the RocksDB incremental snapshots.) It seems to take 60-70 secs in some cases for larger state sizes, and I wonder if there is anything we could tune to reduce this. Maybe it's only a m

Re: ClassNotFoundException on remote cluster

2019-05-03 Thread Chesnay Schepler
Historically, Spring applications have not interacted well with Flink; we've had people run into these issues multiple times on the mailing lists. Unfortunately, I don't believe any committer has any real experience in this area; I'm afraid I can neither help you myself nor refer you to someone th

Re: RocksDB native checkpoint time

2019-05-03 Thread Piotr Nowojski
Hi Gyula, Have you read our tuning guide? https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#tuning-rocksdb Synchronous part is mostly about flushing d

update the existing Keyed value state

2019-05-03 Thread Selvaraj chennappan
Hi Users, We want to have real-time aggregation (KPIs). We are maintaining aggregation counters in the keyed value state. The key could be customer activation date and type. A lot of counters are maintained against that key. If we want to add one more counter for the existing keys which is in the s

Re: RocksDB native checkpoint time

2019-05-03 Thread Stefan Richter
Hi, out of curiosity, does it happen with jobs that have a large number of states (column groups) or also for jobs with few column groups and just “big state”? Best, Stefan > On 3. May 2019, at 11:04, Piotr Nowojski wrote: > > Hi Gyula, > > Have you read our tuning guide? > https://ci.apache

Re: DateTimeBucketAssigner using Element Timestamp

2019-05-03 Thread Piotr Nowojski
Hi Peter, It sounds like this should work, however my question would be: do you want exactly-once processing? If yes, then you would have to somehow know which exact events need re-processing, or deduplicate them somehow. Keep in mind that in case of an outage in the original job, you probably w

Re: update the existing Keyed value state

2019-05-03 Thread Piotr Nowojski
Hi, This might be tricky. There are some on-going efforts [1] and a 3rd party project [2] that allow you to read a savepoint, modify it, and write back the modified savepoint from which you can restore. Besides those, you might be able to modify the code of your aggregators, to initialise the
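The second option Piotr mentions, initialising the new counter lazily instead of migrating the state up front, can be sketched without Flink as a plain map of counters: keys restored from old state simply never contain the new counter name, and a default kicks in on first access. Counter names and the default are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Per-key KPI counters where a newly introduced counter name is
// transparently initialised to 0 the first time it is touched, so state
// restored from before the counter existed needs no migration.
class KpiCounters {
    private final Map<String, Long> counters = new HashMap<>();

    long increment(String counterName, long delta) {
        // getOrDefault covers restored keys that never saw this counter.
        long updated = counters.getOrDefault(counterName, 0L) + delta;
        counters.put(counterName, updated);
        return updated;
    }
}
```

In an actual Flink job the map would live in keyed state (e.g. a value or map state) rather than on the heap, but the lazy-default idea is the same.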

Re: RocksDB native checkpoint time

2019-05-03 Thread Gyula Fóra
Thanks Piotr for the tips we will play around with some settings. @Stefan It is a few columns but a lot of rows Gyula On Fri, May 3, 2019 at 11:43 AM Stefan Richter wrote: > Hi, > > out of curiosity, does it happen with jobs that have a large number of > states (column groups) or also for jobs

Re: RocksDB native checkpoint time

2019-05-03 Thread Konstantin Knauf
Hi Gyula, I looked into this a bit recently as well and did some experiments (on my local machine). The only parameter that significantly changed anything in this setup was reducing the total size of the write buffers (the number or size of memtables). I was not able to find any online resources on the p
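For completeness: in later Flink releases (1.10+) the write-buffer knobs Konstantin mentions are exposed in flink-conf.yaml, while on 1.8 they have to be set in code via an options factory on the RocksDB state backend. The key names below are from the newer configurable options and the values are illustrative; verify both against the deployed version.

```yaml
# Shrink the RocksDB write buffers so the synchronous flush during a
# native checkpoint has less data to write out (values are illustrative).
state.backend.rocksdb.writebuffer.size: 32mb
state.backend.rocksdb.writebuffer.count: 2
```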

Re: Timestamp and key preservation over operators

2019-05-03 Thread Averell
Thank you Fabian.

Avro SerializationSchema for Confluent Registry

2019-05-03 Thread PoolakkalMukkath, Shakir
Hi Till, Is there a serialization schema for a Kafka producer when using the Confluent Registry? I am trying to publish to a topic which uses the Confluent Registry and an Avro schema. If there is one, please point me to it. Otherwise, what are the alternatives to do this? Thanks Shakir

Re: Avro SerializationSchema for Confluent Registry

2019-05-03 Thread Dawid Wysakowicz
Hi Shakir, There is no out-of-the-box serialization schema that uses the Confluent Registry. There is an open PR [1] that tries to implement it. It is a work in progress though. Recently the issue was picked up by another contributor (Dominik) who wanted to take it over. You may try to reuse some o

CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

2019-05-03 Thread Averell
Hi, Back to my story about enriching two different streams with data from one (slow) stream using Flink's low-level functions like CoProcessFunction (mentioned in this thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CoFlatMapFunction-with-more-than-two-input-streams-td2

Re: CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

2019-05-03 Thread Piotr Nowojski
Hi Averell, I will be referring to your original two options: 1 (duplicating stream_C) and 2 (multiplexing stream_A and stream_B). Both of them could be expressed using Temporal Table Join. You could multiplex stream_A and stream_B in Table API, temporal table join them with stream_C and then
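For reference, the temporal table join Piotr describes looks roughly like this in Flink SQL. All table, function, and column names here are made up for illustration; `EnrichmentData` stands for a temporal table function registered over the slow stream_C, and `Events` for the multiplexed stream_A/stream_B.

```sql
-- Join each event against the version of the enrichment data that was
-- valid at the event's own event time.
SELECT e.*, d.extra_field
FROM Events AS e,
     LATERAL TABLE (EnrichmentData(e.event_time)) AS d
WHERE e.join_key = d.join_key
```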

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-05-03 Thread an0
Thanks, but it doesn't seem to cover this rule: --- Quote Watermarks are generated at, or directly after, source functions. Each parallel subtask of a source function usually generates its watermarks independently. These watermarks define the event time at that particular parallel source. As the

Ask about submitting multiple stream jobs to Flink

2019-05-03 Thread Rad Rad
Hi, I have a jar file which aims to submit multiple Flink stream jobs. When the program submits the first one successfully to Flink, it can't submit the second one; execution never even reaches the second code line. How can I fix this problem to submit different stream jobs at regular interv
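A likely cause is that `env.execute()` blocks until the submitted streaming job terminates, so the code after the first submission is never reached. One workaround, sketched below, is to run each blocking submission from its own thread. The `Runnable` stands in for a call that builds a pipeline and invokes execute(); this is a sketch, not Flink API.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Submits each (blocking) job-launching call from a dedicated thread so
// one long-running streaming job does not stop later submissions.
class MultiJobSubmitter {
    static void submitAll(List<Runnable> jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(jobs.size());
        for (Runnable job : jobs) {
            pool.submit(job); // each blocking call gets its own thread
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

For submissions at regular intervals, the same idea works with a `ScheduledExecutorService` instead of a fixed pool.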

Re: [EXTERNAL] Re: Avro SerializationSchema for Confluent Registry

2019-05-03 Thread PoolakkalMukkath, Shakir
Thanks Dawid. That helps From: Dawid Wysakowicz Date: Friday, May 3, 2019 at 9:26 AM To: "PoolakkalMukkath, Shakir" , Till Rohrmann , "user@flink.apache.org" , Dominik Wosiński Subject: [EXTERNAL] Re: Avro SerializationSchema for Confluent Registry Hi Shakir, There is no out of the box Seri

Re: Filter push-down not working for a custom BatchTableSource

2019-05-03 Thread Josh Bradt
Hi Fabian, Thanks for taking a look. I've filed this ticket: https://issues.apache.org/jira/browse/FLINK-12399 Thanks, Josh On Fri, May 3, 2019 at 3:41 AM Fabian Hueske wrote: > Hi Josh, > > The code looks good to me. > This seems to be a bug then. > It's strange that it works for ORC. > > Wo

Re: DateTimeBucketAssigner using Element Timestamp

2019-05-03 Thread Peter Groesbeck
Thanks for the quick response Piotr, I feel like I have everything working but no files are getting written to disk. I've implemented my own BucketAssigner like so: class BackFillBucketAssigner[IN] extends BucketAssigner[IN, String] { override def getBucketId(element: IN, context: BucketAssigne

Re: CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

2019-05-03 Thread Averell
Thank you Piotr for the thorough answer. So you mean that an implementation in the DataStream API with cutting corners would, generally, be shorter than a Table join. I thought that using Tables would be more intuitive and shorter, hence my initial question :) Regarding all the limitations with the Table API that you