Re: Understand Broadcast State in Node Failure Case

2018-10-22 Thread Chengzhi Zhao
On Mon., 22 Oct. 2018 at 19:26, Chengzhi Zhao <w.zhaocheng...@gmail.com> wrote: Hey folks, we are trying to use the broadcast state as a "Shared Rule" state to filter test data in our stream pipeline; the broadcast will

Understand Broadcast State in Node Failure Case

2018-10-22 Thread Chengzhi Zhao
Hey folks, we are trying to use the broadcast state as a "Shared Rule" state to filter test data in our stream pipeline; the broadcast will be connected with other streams in the pipeline. I noticed that the broadcast state [1] important considerations page mentions *No RocksDB state backend* and st
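The "shared rule" filtering pattern described in this thread can be illustrated with a small stand-alone simulation (plain Python, not the Flink API; the rule names and record shape are made up for illustration): every parallel task keeps its own full copy of the broadcast rules and filters the data stream against it.

```python
# Stand-alone simulation of the broadcast-state pattern: each task holds a
# full copy of the broadcast "rules" and filters data records against it.
class RuleFilterTask:
    def __init__(self):
        self.rules = set()  # local copy of the broadcast state

    def on_broadcast(self, rule):
        # Broadcast elements reach every parallel task, so each
        # copy of the state receives the same update.
        self.rules.add(rule)

    def on_data(self, record):
        # Keep the record only if its key matches a broadcast rule.
        return record if record["key"] in self.rules else None

task = RuleFilterTask()
task.on_broadcast("test")
kept = task.on_data({"key": "test", "value": 1})
dropped = task.on_data({"key": "prod", "value": 2})
print(kept, dropped)
```

Because every task holds the full rule set, on recovery each restored task can resume filtering without coordinating with the others.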

Re: Writer has already been opened on BucketingSink to S3

2018-09-20 Thread Chengzhi Zhao
lem and your analysis so that we can discuss how to proceed with it? Best, Stefan On 18.09.2018 at 23:16, Chengzhi Zhao wrote: After checking the code, I think the issue might be related to AvroKeyValueSinkWriter.java and led to the writer not being c

Re: Writer has already been opened on BucketingSink to S3

2018-09-18 Thread Chengzhi Zhao
similar issue; I'd appreciate it if anyone has insights on it. Thanks! Best, Chengzhi On Mon, Sep 17, 2018 at 12:01 PM Chengzhi Zhao wrote: Hi Flink Community, I am using Flink 1.6.0 and I am using BucketingSink to S3. After the application runs for a while, ~20 mins, I g

Avro writer has already been opened

2018-07-24 Thread Chengzhi Zhao
Hi there, I am using the Avro format to write data to S3. I recently upgraded from Flink 1.3.2 to 1.5 and noticed the errors below. I am using RocksDB and checkpointDataUri is an S3 location. My writer looks something like this: val writer = new AvroKeyValueSinkWriter[String, R](proper
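The "writer has already been opened" failure mode can be reproduced with a minimal stand-alone sketch (plain Python, not the actual AvroKeyValueSinkWriter; the class and method names are illustrative): a writer that tracks an open/closed flag raises exactly when `open()` is called twice without an intervening `close()`.

```python
class SketchWriter:
    """Minimal stand-in for a bucketing-sink writer with an open/closed flag."""
    def __init__(self):
        self._open = False

    def open(self):
        if self._open:
            # Mirrors the error in the thread: reopening without closing first.
            raise RuntimeError("Writer has already been opened")
        self._open = True

    def close(self):
        self._open = False

    def duplicate(self):
        # A safe duplicate returns a fresh, closed instance; returning
        # `self` instead would trigger the error above on the next open().
        return SketchWriter()

w = SketchWriter()
w.open()
fresh = w.duplicate()
fresh.open()  # fine: the duplicate starts closed
try:
    w.open()   # reopening the same instance fails
    reopened = True
except RuntimeError:
    reopened = False
print(reopened)
```

This is only a model of the symptom; whether a shared or shallowly duplicated writer instance is the actual cause in the thread above would need the Flink source to confirm.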

Re: Better way to clean up state when connect

2018-05-15 Thread Chengzhi Zhao
Best, Xingcan On May 16, 2018, at 3:25 AM, Chengzhi Zhao wrote: Hi Xingcan, thanks a lot for providing your inputs on the possible solutions here. Can you please clarify how to broadcast in Flink? Appreciate your help!! Best,

Re: Better way to clean up state when connect

2018-05-15 Thread Chengzhi Zhao
ects/flink/flink-docs-master/dev/datastream_api.html#data-sources [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/process_function.html#timers [3] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/process_function.ht

Re: Better way to clean up state when connect

2018-05-14 Thread Chengzhi Zhao
Xingcan On May 12, 2018, at 12:48 AM, Chengzhi Zhao wrote: Hi there, I have a use case to check for active IDs. There are two streams and I connect them: one has the actual data (Stream A) and the other one is for lookup purposes (S

Better way to clean up state when connect

2018-05-11 Thread Chengzhi Zhao
Hi there, I have a use case to check for active IDs. There are two streams and I connect them: one has the actual data (Stream A) and the other one is for lookup purposes (Stream B). I am getting Stream B as a file which includes all active IDs, so inactive IDs would not show up on this list. I tried t
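One common way to expire entries that stop appearing in the lookup stream (Stream B) is to stamp each ID with the time it was last seen and purge stale ones periodically, e.g. from a timer. A stand-alone sketch of that idea (plain Python, not Flink's timer or state API; the timestamps and TTL are illustrative):

```python
def purge_inactive(last_seen, now, ttl):
    """Drop IDs whose last lookup-stream update is older than `ttl`."""
    return {k: t for k, t in last_seen.items() if now - t <= ttl}

# Stream B delivered "a" and "b" at t=0, then only "a" again at t=50.
last_seen = {"a": 50, "b": 0}
# At t=100 with a TTL of 60, "b" has not been refreshed and is purged.
active = purge_inactive(last_seen, now=100, ttl=60)
print(sorted(active))
```

In Flink terms, the `last_seen` map would live in keyed state and the purge would run inside a registered timer callback, but the bookkeeping is the same.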

Re: Multiple Streams Connect Watermark

2018-04-27 Thread Chengzhi Zhao
so you'd have to implement a custom TypeInformation and TypeSerializer to make that work. Best, Fabian 2018-04-26 20:44 GMT+02:00 Chengzhi Zhao: Thanks Fabian for the explanation. If I have data with different schemas, it seems the only o

Re: Multiple Streams Connect Watermark

2018-04-26 Thread Chengzhi Zhao
st, Fabian 2018-04-25 20:20 GMT+02:00 Chengzhi Zhao: Hi everyone, I am trying to build a join-like pipeline using the Flink connect operator and CoProcessFunction; I have a use case where I need to connect 3+ streams. So I am having somethin

Multiple Streams Connect Watermark

2018-04-25 Thread Chengzhi Zhao
Hi everyone, I am trying to build a join-like pipeline using the Flink connect operator and CoProcessFunction; I have a use case where I need to connect 3+ streams. So I have something like this: A ===> C B ==> E D So two streams A and B connect at first with 3 h
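When streams are connected, the operator's event-time clock advances with the slowest input: its watermark is the minimum of the current watermarks of all inputs. A stand-alone illustration of that rule (plain Python, not Flink internals; the timestamps are illustrative):

```python
def operator_watermark(input_watermarks):
    # A multi-input operator can only advance to the minimum of its
    # inputs' watermarks; otherwise data that is still in flight on
    # the slower input could be treated as late and dropped.
    return min(input_watermarks)

# Stream A is at event time 1000, stream B lags behind at 200:
wm = operator_watermark([1000, 200])
print(wm)  # the connected operator is held back by the slower stream
```

With 3+ chained connects, this minimum propagates through every stage, so one lagging source can hold back event-time progress for the whole pipeline.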

Re: Slow watermark advances

2018-04-13 Thread Chengzhi Zhao
an make them more "close" to each other. Best, Xingcan On 13 Apr 2018, at 11:48 PM, Chengzhi Zhao wrote: Hi, Flink community, I had an issue with slow watermark advances and need some help here. So here is what happened: I have two stream

Slow watermark advances

2018-04-13 Thread Chengzhi Zhao
Hi, Flink community, I had an issue with slow watermark advances and need some help here. So here is what happened: I have two streams, A and B, and they perform co-process to join together, and A has another stream as output. A --> Output B --> (Connect A) --> Output I used BoundedOutOfOrderne

Re: Watermark Question on Failed Process

2018-04-04 Thread Chengzhi Zhao
/stream/operators/process_function.html [2] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/state.html#operator-state 2018-04-03 19:39 GMT+02:00 Chengzhi Zhao: Thanks Timo for your response and the references. I

Re: Watermark Question on Failed Process

2018-04-03 Thread Chengzhi Zhao
t allow idempotent updates. I hope this helps. Regards, Timo [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/windows.html#windows [2] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/event_timest

Watermark Question on Failed Process

2018-04-02 Thread Chengzhi Zhao
Hello, Flink community, I am using a periodic watermark and extracting the event time from each record from files in S3. I am using the TimeLagWatermarkGenerator as mentioned in the Flink documentation. Currently, a new watermark will be generated using processing time by a fixed amount override de
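The TimeLagWatermarkGenerator pattern mentioned here emits a watermark that trails the current *processing* time by a fixed lag, regardless of the event timestamps actually observed. A stand-alone sketch of that logic (plain Python, not the Flink class itself; the lag value is illustrative):

```python
def time_lag_watermark(current_processing_time_ms, max_lag_ms):
    # The watermark trails processing time by a fixed lag; the event
    # timestamps of individual records do not move it, so it keeps
    # advancing even when no records arrive.
    return current_processing_time_ms - max_lag_ms

# With processing time at 100s and a 5s lag, the watermark sits at 95s:
wm = time_lag_watermark(current_processing_time_ms=100_000, max_lag_ms=5_000)
print(wm)
```

The trade-off is that a watermark driven by processing time can overtake genuinely delayed events (e.g. records still being read from a failed task's files), which is the failure mode this thread asks about.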

Suggested way to backfill for datastream

2018-02-26 Thread Chengzhi Zhao
Hey, Flink community, I have a question on backfilling data and want to get some ideas on what people think. I have a stream of data using BucketingSink to S3, then to Redshift. If something changed with the logic in Flink and I need to backfill some dates, for example, we are streaming data for today

Re: Question about flink checkpoint

2018-02-07 Thread Chengzhi Zhao
Thanks, Fabian, I opened a JIRA ticket and I'd like to work on it if people think this would be an improvement: https://issues.apache.org/jira/browse/FLINK-8599 Best, Chengzhi On Wed, Feb 7, 2018 at 4:17 AM, Fabian Hueske wrote: Hi Chengzhi Zhao, I think this is rather an

Fwd: Question about flink checkpoint

2018-02-06 Thread Chengzhi Zhao
w the following error when no file is found. java.io.IOException: Error opening the Input Split s3a://myfile [0,904]: No such file or directory: s3a://myfile Is there a way to skip this bad file and move on? Thanks in advance. Best, Chengzhi Zhao
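The skip-and-continue behavior asked for here can be sketched outside Flink like this (plain Python; the function name and file layout are illustrative): catch the not-found error per input split instead of letting it fail the whole job.

```python
import pathlib
import tempfile

def read_splits(paths):
    """Read each input split, skipping paths that no longer exist."""
    contents = []
    for path in paths:
        try:
            with open(path) as f:
                contents.append(f.read())
        except FileNotFoundError:
            # Log-and-skip instead of failing the whole job.
            continue
    return contents

# Simulate a file that disappeared between directory listing and reading:
tmp = tempfile.mkdtemp()
good = pathlib.Path(tmp, "good.txt")
good.write_text("data")
out = read_splits([str(good), str(pathlib.Path(tmp, "gone.txt"))])
print(out)
```

In a real Flink source the equivalent would be an opt-in flag on the input format, since silently skipping data is not always acceptable.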