Re: Flink parquet read.write performance

2018-09-05 Thread clay4444
Hi Regina, I've only just started using Flink, and recently I've been asked to store Flink data on HDFS in Parquet format. I've found several examples on GitHub and in the community, but they always have bugs. I see your storage directory, and that's what I want, so I'd like to ask you to reply to me for a sl

RE: Flink parquet read.write performance

2017-08-24 Thread Newport, Billy
: Newport, Billy [Tech] Cc: Chan, Regina [Tech]; user@flink.apache.org Subject: Re: Flink parquet read.write performance Hi, The reason is that there are two (or more) different Threads doing the reading. As an illustration, consider first this case: DataSet input = ... input.map(new MapA()).map(new

RE: Flink parquet read.write performance

2017-08-24 Thread Newport, Billy
12:21 PM To: Aljoscha Krettek Cc: Newport, Billy [Tech]; Chan, Regina [Tech]; user@flink.apache.org Subject: Re: Flink parquet read.write performance Hi! The sink is merely a union of the result of the co-group and the one data source. Can't you just make two distinct pipelines out of that?

Re: Flink parquet read.write performance

2017-08-23 Thread Stephan Ewen
ly From: Aljoscha Krettek [mailto:aljos...@apache.org] Sent: Saturday, August 19, 2017 1:45 AM To: Chan, Regina [Tech] Cc: Newport, Billy [Tech]; user@flink.apache.org Subject: Re: Flink parquet read.write performance Hi, The

Re: Flink parquet read.write performance

2017-08-23 Thread Aljoscha Krettek
e datasink? Thanks, Regina From: Aljoscha Krettek [mailto:aljos...@apache.org] Sent: Friday, August 18, 2017 12:14 PM To: Newport, Billy [Tech] Cc: user@flink.apache.org

RE: Flink parquet read.write performance

2017-08-23 Thread Newport, Billy
[Tech] Cc: Newport, Billy [Tech]; user@flink.apache.org Subject: Re: Flink parquet read.write performance Hi, The sink cannot be chained to the previous two operations because it has two predecessors. Chaining only works if there is exactly one predecessor operation. Data transfer should still be
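
The chaining rule described in this message can be illustrated with a toy model. The following is a minimal sketch in plain Java, not Flink code: `ChainingSketch`, `Operator`, and `canChainToPredecessor` are hypothetical names, and the check covers only the single condition stated in the message (exactly one predecessor). Flink's actual chaining decision may depend on further conditions that are not modeled here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of the rule quoted above. All names are hypothetical; this is not Flink's API.
public class ChainingSketch {

    /** A node in a simplified job graph: a name plus its direct predecessors. */
    public static final class Operator {
        public final String name;
        public final List<Operator> inputs;

        public Operator(String name, List<Operator> inputs) {
            this.name = name;
            this.inputs = inputs;
        }
    }

    /** The rule as stated in the message: chaining needs exactly one predecessor. */
    public static boolean canChainToPredecessor(Operator op) {
        return op.inputs.size() == 1;
    }

    public static void main(String[] args) {
        Operator sourceA = new Operator("DataSource A", Collections.<Operator>emptyList());
        Operator sourceB = new Operator("DataSource B", Collections.<Operator>emptyList());
        Operator sourceC = new Operator("DataSource C", Collections.<Operator>emptyList());

        // A co-group reads two inputs.
        Operator coGroup = new Operator("CoGroup", Arrays.asList(sourceA, sourceB));

        // Sink fed by the union of the co-group result and one data source: two predecessors.
        Operator unionSink = new Operator("DataSink (union)", Arrays.asList(coGroup, sourceC));

        // Sink fed by a single map: one predecessor.
        Operator map = new Operator("Map", Arrays.asList(sourceC));
        Operator mapSink = new Operator("DataSink (map)", Arrays.asList(map));

        System.out.println(unionSink.name + " chainable: " + canChainToPredecessor(unionSink));
        System.out.println(mapSink.name + " chainable: " + canChainToPredecessor(mapSink));
    }
}
```

In this model the sink fed by the union has two inputs and so cannot be chained, while the sink behind a single map can.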

Re: Flink parquet read.write performance

2017-08-18 Thread Aljoscha Krettek
To: Newport, Billy [Tech] Cc: user@flink.apache.org Subject: Re: Flink parquet read.write performance Hi Billy, Do you also have the data (picture) from the "Timeline" tab of the completed job? This would give some hints about how long that o

Re: Flink parquet read.write performance

2017-08-18 Thread Aljoscha Krettek
Hi Billy, Do you also have the data (picture) from the "Timeline" tab of the completed job? This would give some hints about how long that other DataSource (with chain) was active. It might be that the sink is waiting for the other input to come online. Best, Aljoscha > On 18. Aug 2017, at