[ 
https://issues.apache.org/jira/browse/FLINK-25200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478648#comment-17478648
 ] 

Anton Kalashnikov commented on FLINK-25200:
-------------------------------------------

I have done some tests comparing `upload files` vs `copy files` on S3; the results are below. I don't actually see a big difference between the upload and the copy, but it is worth noting that for the upload I didn't read anything from the local disk: the data was already prepared in memory, so in real upload cases the time will be even worse.
For my test, I used `FSDataOutputStream` (as I understand it, it uses putObject under the hood) to upload the data and `AmazonS3#copyObject` for the copy.
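The benchmark code itself is not included here, so the following is only a hypothetical sketch of how per-run timings like the ones below could be collected. The S3 call is abstracted as a `Runnable`. In the setup described above it would wrap either a write of an in-memory buffer through `FSDataOutputStream` or a single `AmazonS3#copyObject` call, neither of which is shown. `S3OpTimer` and `timeRuns` are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not the harness behind the numbers in this comment.
final class S3OpTimer {
    private S3OpTimer() {}

    // Runs `op` sequentially `runs` times and returns each wall-clock duration in ms.
    // In a real benchmark `op` would wrap one upload (an in-memory byte[] written through
    // FSDataOutputStream) or one AmazonS3#copyObject call; both are assumptions here.
    static List<Long> timeRuns(Runnable op, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        List<Long> durationsMs = new ArrayList<>(runs);
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
            op.run();
            durationsMs.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
        }
        return durationsMs;
    }
}
```

Timing each run separately, rather than only the total, is what makes the per-run raw lists and the median/min/max summaries below possible.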
I also noticed that `copyObject` is more sensitive to `socketTimeout`, since the request waits for the operation to finish on the S3 side, which can take a while. So if we decide to implement copy for S3, we need to take this into account and configure the timeout properly.
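To make the timeout concern concrete, here is a hypothetical illustration that treats each copy as one blocking wait and counts how many measured runs lasted longer than a candidate timeout. This is a simplification: a real socket timeout bounds idle time on the connection rather than the total request time. `CopyTimeoutCheck` is an illustrative name, and the timeout has to be given in the same unit as the reported times.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of any S3 client: counts runs whose duration is
// strictly greater than `timeout`. Treating a whole copy as one blocking wait is a
// simplification; a real socket timeout measures idle time on the connection.
final class CopyTimeoutCheck {
    private CopyTimeoutCheck() {}

    // `timeout` must use the same unit as the samples.
    static long runsExceeding(long[] durations, long timeout) {
        return Arrays.stream(durations).filter(d -> d > timeout).count();
    }
}
```

Under this simplification, the 1536MB copy samples reported below would exceed a limit of 30000 in 3 of the 20 runs and a limit of 50000 in 1 run (the 80669 outlier), while none of the 1536MB uploads (max 15073) would come close to either limit.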

I didn't do it, but it might also make sense to check the case where we upload/copy from many machines at once. As I understand it, that is exactly our case.
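For the many-machines case, one hypothetical first approximation is to start several operations concurrently from one JVM and time each one. This is a sketch under stated assumptions: threads in a single process share one network interface, so it would not fully reproduce many separate machines. `ParallelS3OpTimer` and `timeConcurrent` are illustrative names, and the S3 call is again left abstract as a `Runnable`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: approximates "many clients" with one thread per simulated client.
final class ParallelS3OpTimer {
    private ParallelS3OpTimer() {}

    // Submits `clients` copies of `op` to a fixed pool of `clients` threads, waits for all
    // of them, and returns each task's wall-clock duration in ms. `clients` must be > 0.
    static List<Long> timeConcurrent(Runnable op, int clients)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        try {
            List<Callable<Long>> tasks = new ArrayList<>(clients);
            for (int i = 0; i < clients; i++) {
                tasks.add(() -> {
                    long start = System.nanoTime();
                    op.run(); // e.g. one upload or one copy (assumed, not shown)
                    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                });
            }
            List<Long> durationsMs = new ArrayList<>(clients);
            for (Future<Long> f : pool.invokeAll(tasks)) { // invokeAll blocks until all finish
                durationsMs.add(f.get());
            }
            return durationsMs;
        } finally {
            pool.shutdown();
        }
    }
}
```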
----

*512MB* :

Median(upload | copy) :: 4537 | 4739
Mean(upload | copy) :: 5175 | 4571
Min(upload | copy) :: 4365 | 3209
Max(upload | copy) :: 16679 | 7223

Raw upload :: [16679, 4687, 4554, 4675, 4469, 4708, 4666, 4953, 4392, 4505, 4469, 4483, 4600, 4641, 4365, 4508, 4444, 4521, 4395, 4800]
Raw copy :: [4882, 4893, 4717, 5443, 5643, 3755, 3500, 3411, 4923, 5678, 3334, 5346, 4364, 3209, 7223, 4930, 4762, 4631, 3212, 3572]
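The summary lines can be recomputed from the raw lists. In this small sketch (the class name is hypothetical), with 20 runs the median is the mean of the two middle sorted values. All the reported medians and means are consistent with truncating the exact value to an integer; for example, the 512MB upload mean is 103514 / 20 = 5175.7, reported as 5175.

```java
import java.util.Arrays;

// Hypothetical helper that reproduces the Median / Mean / Min / Max lines.
// All methods expect a non-empty array of non-negative durations.
final class RunSummary {
    private RunSummary() {}

    static long median(long[] samples) {
        long[] s = samples.clone(); // sort a copy, leave the caller's array untouched
        Arrays.sort(s);
        int n = s.length;
        // Odd n: the middle element. Even n: mean of the two middle elements, truncated.
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2;
    }

    static long mean(long[] samples) {
        return Arrays.stream(samples).sum() / samples.length; // integer division truncates
    }

    static long min(long[] samples) {
        return Arrays.stream(samples).min().getAsLong();
    }

    static long max(long[] samples) {
        return Arrays.stream(samples).max().getAsLong();
    }
}
```

Fed the 512MB raw lists, this gives 4537 | 4739 for the median, 5175 | 4571 for the mean, 4365 | 3209 for the min and 16679 | 7223 for the max, matching the reported summary.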
----
*1024MB* :

Median(upload | copy) :: 9227 | 8003
Mean(upload | copy) :: 9161 | 8143
Min(upload | copy) :: 8597 | 6150
Max(upload | copy) :: 9769 | 12075

Raw upload :: [9719, 9577, 9471, 9156, 9415, 9372, 9769, 8631, 9530, 9256, 9278, 9422, 8690, 8718, 8597, 9198, 8636, 9076, 8995, 8723]
Raw copy :: [9975, 9338, 10134, 6655, 6351, 6150, 6715, 6403, 9591, 12075, 9391, 9336, 6570, 6598, 6459, 9552, 9292, 9427, 6310, 6552]
----
*1536MB* :

Median(upload | copy) :: 13432 | 14243
Mean(upload | copy) :: 13474 | 18221
Min(upload | copy) :: 12590 | 9184
Max(upload | copy) :: 15073 | 80669

Raw upload :: [14362, 13249, 13547, 13117, 13496, 14310, 13615, 13448, 13253, 15073, 13598, 12905, 13367, 12590, 13076, 13275, 12676, 13577, 13416, 13537]
Raw copy :: [9593, 14258, 16861, 15293, 9399, 14349, 14297, 38705, 9509, 38107, 9184, 9343, 14229, 10011, 9747, 80669, 10264, 9704, 14516, 16395]
----
*2048MB* :

Median(upload | copy) :: 17905 | 13381
Mean(upload | copy) :: 18133 | 15410
Min(upload | copy) :: 16714 | 11990
Max(upload | copy) :: 22116 | 20242

Raw upload :: [17859, 18576, 17697, 18226, 18620, 17108, 17881, 22116, 18486, 17573, 18444, 17785, 18088, 17653, 16714, 18182, 19455, 17149, 17129, 17929]
Raw copy :: [19397, 20242, 18637, 19174, 19832, 12752, 12954, 13136, 17303, 12760, 13685, 13609, 13153, 12921, 11990, 19949, 18535, 13046, 12127, 13007]
----
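To put a number on how big the difference is, one option (not part of the original analysis) is to divide the copy median by the upload median for each object size; a ratio below 1.0 means copy had the lower median time. The medians in this sketch are copied from the summaries above, and `MedianComparison` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: {upload median, copy median} per object size in MB,
// taken from the Median lines of the results above.
final class MedianComparison {
    private MedianComparison() {}

    static final Map<Integer, long[]> MEDIANS = new LinkedHashMap<>();

    static {
        MEDIANS.put(512, new long[] {4537, 4739});
        MEDIANS.put(1024, new long[] {9227, 8003});
        MEDIANS.put(1536, new long[] {13432, 14243});
        MEDIANS.put(2048, new long[] {17905, 13381});
    }

    // Copy median divided by upload median; below 1.0 means copy had the lower median.
    static double copyToUploadRatio(int sizeMb) {
        long[] m = MEDIANS.get(sizeMb);
        if (m == null) {
            throw new IllegalArgumentException("no data for " + sizeMb + "MB");
        }
        return (double) m[1] / m[0];
    }
}
```

With these medians the ratios come out at about 1.045 (512MB), 0.867 (1024MB), 1.060 (1536MB) and 0.747 (2048MB): copy's median is roughly 4% and 6% higher at 512MB and 1536MB, and roughly 13% and 25% lower at 1024MB and 2048MB. Medians hide the spread, though; the 1536MB copy runs include outliers of 38705, 38107 and 80669.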
 

CC: [~danny.cranmer], [~pnowojski]

> Implement duplicating for s3 filesystem
> ---------------------------------------
>
>                 Key: FLINK-25200
>                 URL: https://issues.apache.org/jira/browse/FLINK-25200
>             Project: Flink
>          Issue Type: Sub-task
>          Components: FileSystems
>            Reporter: Dawid Wysakowicz
>            Priority: Major
>             Fix For: 1.15.0
>
>
> We can use https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
