Hi Christian,

It is not a leak but a current limitation of s3cmd.
To perform the put / sync, s3cmd first gets the complete list of files on
both the source and the destination into in-memory dicts, and then merges
them into new dicts holding the operations that will have to be done:
"transfer", "copy" and "delete".

So, at the moment, it is expected that this "prepare" phase can take a long
time and use a lot of memory.
Doing a quick estimate with 6 KB file sizes, I guess you can have at least
10,000,000 files.
Just for the local list itself, I think it is safe to guess that each
"entry" will consume at least around (80 [avg path size] * 2 + 16 [hash] +
10 [a few more bytes]) bytes, resulting in around 1.8-2 GB of RAM for that
list alone.
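
If you want to redo that back-of-the-envelope calculation yourself (the
6 KB average file size and the per-entry bytes are assumptions, not
measured values):

files = 60 * 1024**3 // (6 * 1024)        # ~10.5 million 6 KB files in 60 GB
bytes_per_entry = 80 * 2 + 16 + 10        # avg path stored twice + hash + a few more bytes
print(files, files * bytes_per_entry / 1024**3)   # -> 10485760 files, ~1.8 GB for the list alone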

FYI, you can use the "-v" and "--progress" flags to have more details about
what is going on.

To fix your situation, I would advise partitioning the task: say there are
10 big subfolders at the root of the dataset; run s3cmd on each subfolder
instead of on the parent.
Example:
s3cmd sync root/a s3://bucket/mydest/a
s3cmd sync root/b s3://bucket/mydest/b
...
s3cmd sync root/g s3://bucket/mydest/g

instead of:
s3cmd sync root s3://bucket/mydest

The added value of such a partition is that, provided you have enough RAM,
you could run multiple syncs in parallel to speed things up.
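
As an untested sketch of what that could look like, assuming the subfolders
sit directly under "root" and your machine has room for a few file lists at
once (the paths and the worker count are placeholders to adjust):

import os, subprocess
from concurrent.futures import ThreadPoolExecutor

SRC, DST = "root", "s3://bucket/mydest"            # adjust to your real paths

def sync_one(name):
    # one s3cmd sync per top-level subfolder, mirroring the commands above
    subprocess.run(["s3cmd", "sync", os.path.join(SRC, name), f"{DST}/{name}"], check=True)

subfolders = sorted(d for d in os.listdir(SRC)
                    if os.path.isdir(os.path.join(SRC, d)))
with ThreadPoolExecutor(max_workers=4) as pool:    # 4 syncs at a time; tune to your RAM
    list(pool.map(sync_one, subfolders))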

Regards,


--
Florent

On Wed, Mar 1, 2017 at 7:50 AM, Christian Bjørnbak <c...@touristonline.dk>
wrote:

> Hi,
>
> I am trying to upload a directory containing 60 GB of JPEGs of 3-6 KB each
> to a Ceph storage.
>
> First I tried using sync:
>
> s3cmd sync -P /path-to-src/directory s3://bucket
>
> It takes 24+ hours and at some point the process is killed. I tried a
> couple of times and noticed that while it is running it uses all of the
> source server's memory and swap.
>
> I'm syncing from a 16 GB RAM / 16 GB swap server.
>
> I thought maybe sync keeps the files in memory to compare or something and
> changed to put:
>
> s3cmd put -P --recursive /path-to-src/directory s3://bucket
>
> But I still experience the same: s3cmd uses all the memory.
>
> Is there a memory leak in s3cmd so that it does not remove files from
> memory after they have been uploaded?
>
>
> Med venlig hilsen / Kind regards,
>
> Christian Bjørnbak
>
> Chefudvikler / Lead Developer
> TouristOnline A/S
> Islands Brygge 43
> 2300 København S
> Denmark
> TLF: +45 32888230
> Dir. TLF: +45 32888235
>
