Hi Regina,
I've only just started using Flink, and I was recently asked to store Flink data
on HDFS in Parquet format. I've found several examples on GitHub and in the
community, but they always have bugs. I saw your storage directory, and
that's what I want, so I'd like to ask you to reply to me for a sl
: Newport, Billy [Tech]
Cc: Chan, Regina [Tech]; user@flink.apache.org
Subject: Re: Flink parquet read.write performance
Hi,
The reason is that there are two (or more) different threads doing the reading.
As an illustration, consider first this case:
DataSet input = ...
input.map(new MapA()).map(new
12:21 PM
To: Aljoscha Krettek
Cc: Newport, Billy [Tech]; Chan, Regina [Tech]; user@flink.apache.org
Subject: Re: Flink parquet read.write performance
Hi!
The sink is merely a union of the result of the co-group and the one data
source.
Can't you just make two distinct pipelines out of that?
ly
>
>
>
> *From:* Aljoscha Krettek [mailto:aljos...@apache.org]
> *Sent:* Saturday, August 19, 2017 1:45 AM
> *To:* Chan, Regina [Tech]
> *Cc:* Newport, Billy [Tech]; user@flink.apache.org
> *Subject:* Re: Flink parquet read.write performance
>
> Hi,
>
> The
e datasink?
>
>
>
> Thanks,
> Regina
>
> From: Aljoscha Krettek [mailto:aljos...@apache.org]
> Sent: Friday, August 18, 2017 12:14 PM
> To: Newport, Billy [Tech]
> Cc: user@flink.apache.org <mailto:user@flink.apache.org
[Tech]
Cc: Newport, Billy [Tech]; user@flink.apache.org
Subject: Re: Flink parquet read.write performance
Hi,
The sink cannot be chained to the previous two operations because there are two
of them. Chaining only works if there is exactly one predecessor operation. Data
transfer should still be
> To: Newport, Billy [Tech]
> Cc: user@flink.apache.org
> Subject: Re: Flink parquet read.write performance
>
> Hi Billy,
>
> Do you also have the data (picture) from the "Timeline" tab of the completed
> job? This would give some hints about how long that o
Hi Billy,
Do you also have the data (picture) from the "Timeline" tab of the completed
job? This would give some hints about how long that other DataSource (with
chain) was active. It might be that the sink is waiting for the other input to
become online.
Best,
Aljoscha
> On 18. Aug 2017, at