Re: Streaming data to parquet

2020-09-11 Thread Ayush Verma
Hi, Looking at the problem broadly, file size is directly tied to how often you commit. No matter which system you use, this variable will always be there. If you commit frequently, you will be close to realtime, but you will have numerous small files. If you commit after long intervals, you
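To make the trade-off above concrete, here is a rough back-of-the-envelope sketch. It assumes that every commit closes exactly one file per parallel writer and that data arrives at a steady rate; both are simplifications, and the function names and numbers are hypothetical, not taken from the thread.

```python
SECONDS_PER_DAY = 86_400


def files_per_day(commit_interval_s: float, parallelism: int = 1) -> int:
    """Files produced per day, assuming one file per writer per commit."""
    commits_per_day = SECONDS_PER_DAY / commit_interval_s
    return int(commits_per_day * parallelism)


def avg_file_size_mb(throughput_mb_s: float,
                     commit_interval_s: float,
                     parallelism: int = 1) -> float:
    """Average file size, assuming input is spread evenly across writers."""
    return throughput_mb_s * commit_interval_s / parallelism


if __name__ == "__main__":
    # Hypothetical workload: 2 MB/s of input written by 4 parallel writers.
    for interval_s in (60, 900):
        print(
            f"commit every {interval_s}s: "
            f"{files_per_day(interval_s, 4)} files/day, "
            f"{avg_file_size_mb(2.0, interval_s, 4)} MB per file"
        )
```

Under these assumptions, a longer commit interval reduces the file count and increases the file size by the same factor, at the price of higher end-to-end latency.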

Re: Using S3 as a streaming File source

2020-09-01 Thread Ayush Verma
Word of caution. Streaming from S3 is really cost-prohibitive, as the only way to detect new files is to continuously spam the S3 List API. On Tue, Sep 1, 2020 at 4:50 PM Jörn Franke wrote: > Why don’t you get an S3 notification on SQS and do the actions from there? > > You will probably need to
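A rough way to see why polling adds up: a single S3 list call returns at most 1,000 keys, so each poll of a large prefix needs several requests. The sketch below only counts requests; the poll interval, object count, and the per-1,000-requests price are hypothetical inputs that you would replace with your own figures, and the function names are illustrative.

```python
import math

SECONDS_PER_DAY = 86_400


def list_requests_per_day(poll_interval_s: float,
                          objects_under_prefix: int,
                          page_size: int = 1000) -> int:
    """List requests per day when every poll pages through the whole prefix."""
    pages_per_poll = max(1, math.ceil(objects_under_prefix / page_size))
    polls_per_day = SECONDS_PER_DAY / poll_interval_s
    return int(polls_per_day * pages_per_poll)


def daily_list_cost(requests: int, price_per_1000_requests: float) -> float:
    """Daily cost for a given request count; the price is supplied by the caller."""
    return requests / 1000 * price_per_1000_requests


if __name__ == "__main__":
    # Hypothetical: poll every 10 s over a prefix holding 250,000 objects.
    requests = list_requests_per_day(10, 250_000)
    print(requests, "list requests per day")
    # The price here is an assumed placeholder, not a quoted rate.
    print(daily_list_cost(requests, price_per_1000_requests=0.005))
```

The request count grows with both the polling frequency and the number of objects under the prefix, which is why an event-driven approach such as the SQS notification suggested in the reply avoids most of this cost.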

Re: Using S3 as a sink (StreamingFileSink)

2019-08-18 Thread Ayush Verma
Hi, I would suggest you upgrade Flink to 1.7.x and flink-s3-fs-hadoop to 1.7.2. You might be facing this issue: - https://issues.apache.org/jira/browse/FLINK-11496 - https://issues.apache.org/jira/browse/FLINK-11302 Kind regards Ayush Verma On Sun, Aug 18, 2019 at 6:02 PM taher koitawala

Re: Using S3 as a sink (StreamingFileSink)

2019-08-18 Thread Ayush Verma
Hello, could you tell us the version of the flink-s3-fs-hadoop library that you are using? On Sun 18 Aug 2019 at 16:24, taher koitawala wrote: > Hi Swapnil, > We faced this problem once, I think changing checkpoint dir to hdfs > and keeping sink dir to s3 with EMRFS s3 consistency enabled so

Issue using Flink on EMR

2019-06-03 Thread Ayush Verma
Hello, We have a Flink on EMR setup following this guide. YARN apparently changes the io.tmp.dirs property to /mnt/yarn & /mnt1/yarn. When using these directories, the Flink job gets the following error. 2019-05-22 12:23:12,515 INF

Limitations in StreamingFileSink BulkFormat

2019-05-31 Thread Ayush Verma
erts on this. And if there are any potential workarounds to get the desired behaviour. Kind regards Ayush Verma

Re: Heap Problem with Checkpoints

2018-08-09 Thread Ayush Verma
Hello Piotr, I work with Fabian and have been investigating the memory leak associated with the issues mentioned in this thread. I took a heap dump of our master node and noticed that there was >1 GB (and growing) worth of entries in the set, /files/, in class *java.io.DeleteOnExitHook*. Almost all the