subject:"Streaming data to parquet"

Re: Streaming data to parquet

2020-09-14 Thread Senthil Kumar

Arvid, Jan and Ayush, Thanks for the ideas! -Kumar From: Jan Lukavský Date: Monday, September 14, 2020 at 6:23 AM To: "user@flink.apache.org" Subject: Re: Streaming data to parquet Hi, I'd like to mention another approach, which might not be as "flinkish", but rem

Re: Streaming data to parquet

2020-09-14 Thread Jan Lukavský

pache.org> *Cc:* Marek Maj <marekm...@gmail.com>, user <user@flink.apache.org> *Subject:* Re: Streaming data to parquet Hi, Looking at the problem broadly, file size is directly tied up with how often you commit. No matter which system you use, this

Re: Streaming data to parquet

2020-09-14 Thread Arvid Heise

> *From:* Ayush Verma
> *Date:* Friday, September 11, 2020 at 8:14 AM
> *To:* Robert Metzger
> *Cc:* Marek Maj, user
> *Subject:* Re: Streaming data to parquet
>
> Hi,
>
> Looking at the problem broadly, file size is directly tied up with

Re: Streaming data to parquet

2020-09-11 Thread Senthil Kumar

appreciate any ideas etc. Cheers Kumar From: Ayush Verma Date: Friday, September 11, 2020 at 8:14 AM To: Robert Metzger Cc: Marek Maj, user Subject: Re: Streaming data to parquet Hi, Looking at the problem broadly, file size is directly tied up with how often you commit. No matter which

Re: Streaming data to parquet

2020-09-11 Thread Ayush Verma

Hi, Looking at the problem broadly, file size is directly tied up with how often you commit. No matter which system you use, this variable will always be there. If you commit frequently, you will be close to realtime, but you will have numerous small files. If you commit after long intervals, you
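The trade-off described in this message can be put into rough numbers. The following is a minimal sketch, not taken from the thread: it assumes each parallel writer rolls exactly one file per commit, ignores compression and partitioning, and the function names and sample figures (10 MB/s, 4 writers) are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def files_per_day(commit_interval_s: float, parallelism: int) -> int:
    """Files produced per day, assuming one file per writer per commit."""
    commits_per_day = SECONDS_PER_DAY / commit_interval_s
    return int(commits_per_day * parallelism)


def avg_file_size_mb(throughput_mb_s: float,
                     commit_interval_s: float,
                     parallelism: int) -> float:
    """Average uncompressed file size when input is spread evenly over writers."""
    return throughput_mb_s * commit_interval_s / parallelism


if __name__ == "__main__":
    # Hypothetical workload: 10 MB/s of input, 4 parallel writers.
    for interval_s in (60, 3600):
        count = files_per_day(interval_s, parallelism=4)
        size = avg_file_size_mb(10.0, interval_s, parallelism=4)
        print(f"commit every {interval_s}s: {count} files/day, ~{size:.0f} MB each")
```

Under these assumptions, committing every minute yields many more and much smaller files than committing every hour, at the price of higher latency before data becomes visible.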

Re: Streaming data to parquet

2020-09-11 Thread Robert Metzger

Hi Marek, what you are describing is a known problem in Flink. There are some thoughts on how to address this in https://issues.apache.org/jira/browse/FLINK-11499 and https://issues.apache.org/jira/browse/FLINK-17505. Maybe some of the ideas there already help with your current problem (use long checkp

Streaming data to parquet

2020-09-10 Thread Marek Maj

Hello Flink Community, When designing our data pipelines, we very often encounter the requirement to stream traffic (usually from Kafka) to an external distributed file system (usually HDFS or S3). This data is typically meant to be queried from Hive/Presto or similar tools. Preferably data sits in c


7 matches
