Accumulators storage path

2020-12-10 Thread Hanan Yehudai
I am having all the accumulators store their data on /tmp, as this is the default. When running on Docker, this is mapped to my VM's "/" partition. A lot of accumulators cause low free disk => pods are evicted. Is there a way to set the accumulators' persistence to a different path than
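If the data that fills /tmp is going through Flink's temporary directories, the usual fix is to repoint them in flink-conf.yaml and mount a volume there. A hedged sketch — the mount path /data/flink-tmp is an assumption, and the exact keys should be checked against your Flink version's configuration reference:

```yaml
# flink-conf.yaml -- redirect Flink's temp/spill files away from /tmp.
# /data/flink-tmp is a placeholder for a volume with enough space.
io.tmp.dirs: /data/flink-tmp
blob.storage.directory: /data/flink-tmp/blobs
```

In a Docker deployment you would also mount that path as a volume so the files do not land on the container's root filesystem.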

Flink behavior as a slow consumer - out of Heap MEM

2019-11-25 Thread Hanan Yehudai
Hi, I am trying to do some performance tests on my Flink deployment. I am implementing an extremely simplistic use case. I built a ZMQ Source. The topology is ZMQ Source -> Mapper -> DiscardingSink (a sink that does nothing). Data is pushed via ZMQ at a very high rate. When the incoming rate f
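A common cause of heap exhaustion in this setup is a custom source that buffers incoming ZMQ messages in an unbounded queue instead of letting backpressure slow the producer down. A minimal plain-Java sketch of the principle (not Flink code): with a bounded buffer, the fast producer blocks instead of growing the heap.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class BackpressureSketch {
    /** Pushes {@code count} messages through a bounded buffer. put() blocks
     *  whenever the slow consumer lags, so memory stays bounded at capacity. */
    static int run(int count, int capacity) throws InterruptedException {
        ArrayBlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                try {
                    buffer.put(i); // backpressure: blocks while the buffer is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        producer.start();
        int consumed = 0;
        while (consumed < count) {
            buffer.take(); // the slow consumer
            consumed++;
        }
        producer.join();
        return consumed;
    }
}
```

With an unbounded queue in the source, the same high-rate producer would instead accumulate messages on the heap until the JVM runs out of memory.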

RE: SQL for Avro GenericRecords on Parquet

2019-11-18 Thread Hanan Yehudai
recordType_ from ParquetTable where id > 22 "); DataSet result = batchTableEnvironment.toDataSet(tab, Row.class); result.print(); } } From: Peter Huang Sent: Monday, November 18, 2019 7:22 PM To: dev Cc: user@flink.apache.org Subject: Re: SQL for Avro Gener

SQL for Avro GenericRecords on Parquet

2019-11-18 Thread Hanan Yehudai
I have tried to persist generic Avro records in a Parquet file and then read it via ParquetTableSource, using SQL. It seems that the SQL is not executed properly! The persisted records are: Id, type 333,Type1 22,Type2 333,Type1 22,Type2 333,Type1 22,Type2 333,Type1 2

is Flink a database ?

2019-11-04 Thread Hanan Yehudai
This seems like a controversial subject.. on purpose 😊 I have my data lake in Parquet files – should I use Flink batch mode to run ad hoc queries over the historical data? Or should I use a dedicated "database", e.g. Drill / Dremio / Hive and their likes? What advantage will Flink give me f

RE: Join with slow changing dimensions/ streams

2019-09-05 Thread Hanan Yehudai
Thanks Fabian. Is there any advantage to using broadcast state vs. just a CoMap function on 2 connected streams? From: Fabian Hueske Sent: Thursday, September 5, 2019 12:59 PM To: Hanan Yehudai Cc: flink-u...@apache.org Subject: Re: Join with slow changing dimensions/ streams Hi, Flink
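One concrete advantage of broadcast state: with two plainly connected streams the dimension stream is partitioned across the operator's parallel instances, so an event can land on an instance that never received its dimension row, whereas broadcasting replicates the full table to every instance. A plain-Java sketch of the partitioning effect (illustrative, not Flink API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionSketch {
    /** Hash-partitions dimension rows (key -> value) across {@code parallelism}
     *  instances, the way a keyed connected stream would. Each row ends up on
     *  exactly one instance; lookups on the other instances miss. */
    static List<Map<String, String>> partition(Map<String, String> dim, int parallelism) {
        List<Map<String, String>> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new HashMap<>());
        }
        for (Map.Entry<String, String> e : dim.entrySet()) {
            int target = Math.abs(e.getKey().hashCode() % parallelism);
            instances.get(target).put(e.getKey(), e.getValue());
        }
        return instances;
    }
}
```

Broadcast state avoids the misses by copying the whole dimension table to every parallel instance, at the cost of holding it once per instance.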

Join with slow changing dimensions/ streams

2019-09-02 Thread Hanan Yehudai
I have a very common use case: enriching the stream with some dimension tables. E.g. the events stream has a SERVER_ID, and another file has the LOCATION associated with the SERVER_ID (a dimension table CSV file). In SQL I would simply join, but when using the Flink stream API, as far
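The streaming equivalent of that SQL join is to keep the dimension rows in operator state and look each event up as it arrives. A minimal plain-Java sketch of the lookup step (names are illustrative; in Flink this state would live inside a CoProcessFunction or a broadcast-state operator):

```java
import java.util.HashMap;
import java.util.Map;

public class EnrichmentSketch {
    // SERVER_ID -> LOCATION, loaded from the dimension CSV.
    private final Map<String, String> locationByServer = new HashMap<>();

    /** Called for each record of the dimension stream. */
    void addDimensionRow(String serverId, String location) {
        locationByServer.put(serverId, location);
    }

    /** Enrich an event with its server's location; "unknown" if the
     *  dimension row has not arrived yet. */
    String enrich(String serverId) {
        return locationByServer.getOrDefault(serverId, "unknown");
    }
}
```

The "unknown" branch is the part SQL hides from you: in a stream, an event can arrive before its dimension row, so the operator must decide whether to emit a default, buffer the event, or drop it.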

RE: tumbling event time window , parallel

2019-09-02 Thread Hanan Yehudai
I'm not sure what you mean by "use a process function and not a window process function", as the window operator takes in a window process function.. From: Fabian Hueske Sent: Monday, August 26, 2019 1:33 PM To: Hanan Yehudai Cc: user@flink.apache.org Subject: Re: tumbling event time window

RE: tumbling event time window , parallel

2019-08-26 Thread Hanan Yehudai
WM will be the highest EVENT_TIME on my set of files. Thanks. From: Fabian Hueske Sent: Monday, August 26, 2019 12:38 PM To: Hanan Yehudai Cc: user@flink.apache.org Subject: Re: tumbling event time window , parallel Hi, The paths of the files to read are distributed across all reader

RE: tumbling event time window , parallel

2019-08-26 Thread Hanan Yehudai
use a ContinuousEventTimeTrigger to make sure the window is calculated? And got the processing to trigger multiple times, so I'm not sure exactly how this type of trigger works.. Thanks From: Fabian Hueske Sent: Monday, August 26, 2019 11:06 AM To: Hanan Yehudai Cc: user
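As far as I understand it (hedged; verify against the Trigger documentation for your Flink version), ContinuousEventTimeTrigger registers an event-time timer at the next interval-aligned timestamp after each element, so the window fires every `interval` of event time as the watermark advances, plus a final firing at the window end. That explains the multiple firings. The alignment arithmetic is just:

```java
public class TriggerSketch {
    /** Next interval-aligned event time at which a periodic event-time
     *  trigger would fire for an element with this timestamp. */
    static long nextAlignedFireTime(long elementTimestamp, long interval) {
        return elementTimestamp - (elementTimestamp % interval) + interval;
    }
}
```

So an element at event time 1250 with a 1-second interval schedules a firing at 2000, the next at 3000, and so on, each one emitting the window's current (partial) result.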

tumbling event time window , parallel

2019-08-25 Thread Hanan Yehudai
I have an issue with tumbling windows running in parallel. I run a job on a set of CSV files. When the parallelism is set to 1, I get the proper results; when it runs in parallel, I get no output. Is it due to the fact that the parallel streams take the MAX(watermark) from all the parallel sou
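For reference: a downstream operator's watermark is the minimum over all its parallel input channels, not the maximum, so a single reader that gets no files (and therefore never advances its watermark) holds every window back. A plain-Java sketch of the combination rule (illustrative, not Flink internals):

```java
public class WatermarkSketch {
    /** Combined watermark seen downstream = min over all parallel inputs;
     *  Long.MIN_VALUE models a channel that has emitted no watermark yet. */
    static long combinedWatermark(long[] channelWatermarks) {
        long min = Long.MAX_VALUE;
        for (long wm : channelWatermarks) {
            min = Math.min(min, wm);
        }
        return min;
    }
}
```

An idle parallel source pins the combined watermark at Long.MIN_VALUE, the event-time clock never advances, and the tumbling windows never close — which matches the "no output when parallel" symptom.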

RE: monitor finished files on a Continues Reader

2019-05-20 Thread Hanan Yehudai
It helps! Thank you 😊 From: Aljoscha Krettek Sent: 20 May 2019 12:45 To: Hanan Yehudai Cc: user@flink.apache.org Subject: Re: monitor finished files on a Continues Reader Hi, I think what you're trying to achieve is not possible with the out-of-box file source. The problem is that it is

monitor finished files on a Continues Reader

2019-05-20 Thread Hanan Yehudai
Hi, I'm looking for a way to delete / rename files that are done loading.. I'm using env.readFile, monitoring a directory for all new files; once a file is done with, I would like to delete it. Is there a way to monitor the closed splits in the continuous reader? Is there a different way t
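Since the out-of-the-box continuous file source does not expose a per-split "done" callback (as the reply above explains), one workaround is an external cleanup pass that removes files the job is known to have consumed. A hedged plain-Java sketch — the "already consumed" criterion here is simply "every regular file in the directory"; in practice you would gate this on something safer, such as file age relative to the job's progress:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CleanupSketch {
    /** Deletes every regular file in dir and returns the number removed.
     *  Caution: only safe for files the job has fully consumed. */
    static int deleteProcessed(Path dir) throws IOException {
        int removed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Files.delete(f);
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

Run as a periodic job outside Flink, this keeps the monitored directory from growing without needing hooks into the reader's split lifecycle.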