Thanks Arvid!
Will try to increase the property you recommended and will post the update.
On Sat, Jun 6, 2020, 7:33 AM Arvid Heise wrote:
Hi Venkata,
you can find them on the Hadoop AWS page (we are just using it as a
library) [1].
[1] https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#General_S3A_Client_configuration
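For example, raising the S3A connection pool limit can be done from flink-conf.yaml (a sketch; the flink-s3-fs-hadoop plugin mirrors keys with the s3./fs.s3a. prefixes into the S3A client configuration, and 200 is only an illustrative value, not a recommendation):

    # flink-conf.yaml -- forwarded to the Hadoop S3A client by the
    # flink-s3-fs-hadoop filesystem plugin; 200 is an illustrative value
    fs.s3a.connection.maximum: 200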
On Sat, Jun 6, 2020 at 1:26 AM venkata sateesh` kolluru <vkollur...@gmail.com> wrote:
Hi Kostas and Arvid,
Thanks for your suggestions.
The small files were already created, and I am trying to roll a few of them into a bigger file while sinking. But due to the custom bucket assigner, it is hard to get more files within the same prefix in the specified checkpointing time.
For example:
/prefix1/
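The assigner is roughly of the following shape (a minimal sketch; MyEvent and getPrefix() are hypothetical stand-ins for the actual record type):

    // Minimal sketch of a custom BucketAssigner that routes each record
    // to a prefix derived from the record itself.
    import org.apache.flink.core.io.SimpleVersionedSerializer;
    import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
    import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

    public class PrefixBucketAssigner implements BucketAssigner<MyEvent, String> {

        @Override
        public String getBucketId(MyEvent element, Context context) {
            // e.g. returns "prefix1", "prefix2", ... per record;
            // MyEvent and getPrefix() are hypothetical placeholders
            return element.getPrefix();
        }

        @Override
        public SimpleVersionedSerializer<String> getSerializer() {
            return SimpleVersionedStringSerializer.INSTANCE;
        }
    }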
Hi all,
@Venkata, do you have many small files being created, as Arvid suggested? If yes, then I tend to agree that S3 is probably not the best sink, although I did not get that from your description.
In addition, instead of PrintStream you can have a look at the code of the
SimpleStringEncoder in
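For reference, a row-format sink wired up with SimpleStringEncoder looks roughly like this (a sketch; the bucket path and the stream variable are placeholders):

    // Sketch: StreamingFileSink writing strings through SimpleStringEncoder
    // instead of a hand-rolled PrintStream; the path is a placeholder.
    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("s3a://my-bucket/output"),
                      new SimpleStringEncoder<String>("UTF-8"))
        .build();

    // given some DataStream<String> stream
    stream.addSink(sink);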
Hi Venkata,
are the many small files intended, or are they rather an artifact of our commit-on-checkpoint behavior? If the latter, FLINK-11499 [1] should help you. The design is close to done; unfortunately the implementation will not make it into 1.11.
In any case, I'd look at the parameter fs.s3a.connection.maximum, as
I think S3 is the wrong storage backend for this volume of small messages. Try to use a NoSQL database, or write multiple messages into one file in S3 (1 or 10).
If you still want to go with your scenario, then try a network-optimized instance, use s3a in Flink, and configure S3 entropy.
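For the entropy part, these are the relevant flink-conf.yaml keys (a sketch; note that Flink injects entropy only into checkpoint data paths and strips the key from all other paths, so this mainly helps checkpoint traffic):

    # flink-conf.yaml -- S3 entropy injection
    s3.entropy.key: _entropy_
    s3.entropy.length: 4

    # the marker is then placed in the target path, e.g.
    # s3://my-bucket/checkpoints/_entropy_/...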
Hi David,
The avg size of each file is around 30 KB and I have a checkpoint interval of 5 minutes. Some files are even 1 KB; because of checkpointing, some files are merged into one big file of around 300 MB.
With 120 million files and 4 TB, if the rate of transfer is 300 per minute, it is taking weeks to write.
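For the math: 120,000,000 files at 300 files per minute is 400,000 minutes, i.e. roughly 278 days at that rate.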
Hi Venkata.
300 requests per minute works out to about 200 ms per request, which should be a normal response time to send a file if there isn't any speed limitation (how big are the files?).
Have you changed the parallelism to be higher than 1? I also recommend limiting the source parallelism, be
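For example, the sink's parallelism can be set explicitly (a sketch; the value 8 is arbitrary and needs tuning for the workload):

    // Sketch: run several sink subtasks so S3 uploads proceed concurrently;
    // 8 is an arbitrary example value, not a recommendation.
    stream.addSink(sink).setParallelism(8);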
Hello,
I have posted the same question on Stack Overflow but didn't get any response, so I am posting it here for help.
https://stackoverflow.com/questions/62068787/flink-s3-write-performance-optimization?noredirect=1#comment109814428_62068787
Details:
I am working on a Flink application on Kubernetes (EKS) whi