Hi Ken,
as far as I understand, you are using the format to overcome some
shortcomings in Flink. There would be no need to even look at the data, or
even to create it, if the join worked decently.
If so, then it would make sense to keep the format, as I expect similar
issues to keep appearing, and pro…
Hi Arvid,
Thanks for following up…
> On Sep 2, 2019, at 3:09 AM, Arvid Heise wrote:
>
> Hi Ken,
>
> that's indeed a very odd issue that you found. I had a hard time connecting
> block size with S3 at first and had to dig into the code. I still
> cannot fully understand why you got two…
Hi Ken,
that's indeed a very odd issue that you found. I had a hard time connecting
block size with S3 at first and had to dig into the code. I still
cannot fully understand why you got two different block size values from
the S3 FileSystem. Looking into the Hadoop code, I found the following s…
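For context on why two different block size values would be a problem for a block-based format, here is a hedged, self-contained illustration (a toy format, not Flink's actual implementation): if the writer pads each block to size B so that every block starts at a multiple of B, a reader that derives a different B from the file system will seek to offsets that fall mid-record and read garbage.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Illustration only: a toy block-padded format, not Flink's BinaryOutputFormat.
public class BlockPaddingDemo {
    // Write records into fixed-size blocks, zero-padding so no record
    // crosses a block boundary. Each record gets a 1-byte length prefix (toy).
    static byte[] writeBlocks(List<byte[]> records, int blockSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int used = 0;
        for (byte[] r : records) {
            if (used + r.length + 1 > blockSize) {            // record would cross the boundary:
                for (; used < blockSize; used++) out.write(0); // pad to next block start
                used = 0;
            }
            out.write(r.length);
            out.write(r, 0, r.length);
            used += r.length + 1;
        }
        for (; used < blockSize; used++) out.write(0);        // pad the final block
        return out.toByteArray();
    }

    // A reader that believes `blockSize` seeks to every multiple of it
    // and decodes the first record there.
    static List<String> readBlockStarts(byte[] data, int blockSize) {
        List<String> firstRecords = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            int len = data[off];
            if (len <= 0 || off + 1 + len > data.length) {
                firstRecords.add("<garbage>");                // mis-aligned read
            } else {
                firstRecords.add(new String(data, off + 1, len));
            }
        }
        return firstRecords;
    }

    public static void main(String[] args) {
        List<byte[]> records =
            List.of("alpha".getBytes(), "beta".getBytes(), "gamma".getBytes());
        byte[] file = writeBlocks(records, 16);
        // Reader using the writer's block size finds valid records at block starts:
        System.out.println(readBlockStarts(file, 16));
        // Reader using a different block size (e.g. another value reported by the
        // file system) lands mid-record:
        System.out.println(readBlockStarts(file, 24));
    }
}
```

The sketch shows why the writer-side and reader-side block size values must agree exactly; if S3 reports different values at write and read time, the reader's computed block starts are silently wrong.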
Sounds reasonable.
I am adding Arvid to the thread - IIRC he authored that tool in his
Stratosphere days. And by a stroke of luck, he is now working on Flink
again.
@Arvid - what are your thoughts on Ken's suggestions?
On Fri, Aug 30, 2019 at 8:56 PM Ken Krugler
wrote:
Hi Stephan (switching to dev list),
> On Aug 29, 2019, at 2:52 AM, Stephan Ewen wrote:
>
> That is a good point.
>
> Which way would you suggest to go? Not relying on the FS block size at all,
> but using a fixed (configurable) block size?
There’s value in not requiring a fixed block size, as t…
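One way to avoid depending on either the FS block size or a fixed configured value (a sketch of one possible design, not what Flink actually does) is to record the block size the writer used in a small file header, so the reader recovers it from the file itself:

```java
import java.nio.ByteBuffer;

// Sketch: store the writer's block size in a 4-byte header so readers
// never need to consult the file system or a configuration value.
public class BlockSizeHeaderDemo {
    static final int HEADER_BYTES = 4;

    // Prepend the block size the writer actually used to the payload.
    static byte[] withHeader(byte[] payload, int blockSize) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        buf.putInt(blockSize);
        buf.put(payload);
        return buf.array();
    }

    // The reader recovers the exact block size used at write time.
    static int readBlockSize(byte[] file) {
        return ByteBuffer.wrap(file).getInt();
    }

    public static void main(String[] args) {
        byte[] file = withHeader(new byte[]{1, 2, 3}, 64 * 1024 * 1024);
        System.out.println(readBlockSize(file));
    }
}
```

With the block size carried in the file, writer and reader always agree, regardless of what any file system reports.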