Thanks for the responses, folks. I appreciate your feedback!
Mike
On Mar 6, 2014, at 7:55 PM, Matt Domsch <m...@domsch.com> wrote:
> Thanks for the kudos. Unfortunately, memory consumption scales with the
> number of objects in the trees being synchronized. On a 32-bit system, s3cmd
> tends to hit a Python MemoryError when syncing trees of ~1M files. You are
> hitting a kernel OOM well before that, though. You have two options
> available:
> 1) run on a 64-bit VM with 8+GB RAM (64-bit Python is a huge memory hog
> compared to 32-bit Python; you need 2x the RAM on 64-bit Python to hold the
> same number of objects as on 32-bit Python).
> 2) split your sync into multiple subtrees (as you have surmised)
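Option 2 can be scripted. The following is a minimal Python sketch (not part of s3cmd) that runs one `s3cmd sync` per immediate subdirectory, so each s3cmd process only builds file lists for one subtree. It assumes rsync-like trailing-slash semantics (a source path ending in `/` syncs that directory's contents into the destination prefix), which is worth confirming with `--dry-run` before a real run. The helper names `subtree_sync_commands` and `sync_subtrees` are hypothetical, and files sitting directly in the top-level directory are not covered by this sketch.

```python
import os
import subprocess


def subtree_sync_commands(local_root, s3_prefix, extra_args=()):
    """Build one `s3cmd sync` argv list per immediate subdirectory of local_root.

    Assumption: a source path ending in "/" syncs the directory's *contents*
    into the destination prefix (rsync-like semantics). Check with --dry-run.
    Files directly inside local_root are NOT included.
    """
    root = os.path.abspath(local_root)
    prefix = s3_prefix.rstrip("/")
    with os.scandir(root) as it:
        subdirs = sorted(e.name for e in it if e.is_dir(follow_symlinks=False))
    return [
        ["s3cmd", "sync", *extra_args, os.path.join(root, name) + "/", f"{prefix}/{name}/"]
        for name in subdirs
    ]


def sync_subtrees(local_root, s3_prefix, dry_run=True):
    """Run the per-subtree syncs one after another, stopping on the first failure."""
    extra = ["--verbose"] + (["--dry-run"] if dry_run else [])
    for cmd in subtree_sync_commands(local_root, s3_prefix, extra):
        print("running:", " ".join(cmd), flush=True)
        # Each s3cmd process only lists one subtree, so peak memory is bounded
        # by the largest subtree rather than by the whole tree.
        subprocess.run(cmd, check=True)
```

For example, `sync_subtrees("content", "s3://somewhere/content")` prints and dry-runs each per-subtree sync; pass `dry_run=False` once the output looks right (the destination prefix here is only illustrative). The syncs run sequentially on purpose: running them in parallel would add their memory use back together.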
>
>
> There are no significant efforts under way to find a better way to handle
> this in s3cmd itself, given how Python operates. One option would be to add
> an on-disk or in-memory SQLite database for transient storage and comparison
> of the local and remote file lists, but that's a fairly heavy undertaking and
> not one anyone has chosen to take on.
>
> Thanks,
> Matt
>
> On Thu, Mar 6, 2014 at 4:18 PM, WagnerOne <wag...@wagnerone.com> wrote:
> Hi,
>
> I was recently charged with moving a lot of data (TBs) into S3 and discovered
> the great tool that is s3cmd. It's working well, and I like the familiar
> rsync-like interactions.
>
> I'm attempting to use s3cmd to copy a directory containing tons of small
> files, about 700GB in total, to S3.
>
> During my tests with ~1GB transfers, things went well. When I got to this
> larger test set, s3cmd worked on the local data for upwards of 40 minutes
> (gathering MD5 data, I assume) before the kernel killed the process for
> excessive RAM consumption.
>
> I was using an EC2 t1.micro with a NAS NFS-mounted on it to transfer data
> from said NAS to S3. The t1.micro had only 500MB of RAM, so I bumped it to an
> m3.medium, which has 4GB of RAM.
>
> When I retried the copy on the m3.medium, s3cmd ran about 3x longer before
> being killed in the same way.
>
> I was hoping for a painless, big single sync job, but it's looking like I
> might have to write a wrapper to iterate over the big directories I need to
> copy to get them to a more manageable size for s3cmd.
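Before writing such a wrapper, it can help to see how the file count is spread across the top-level directories, since the memory use described in this thread tracks the number of files rather than their total size. A minimal sketch, with hypothetical function names, that counts files per immediate subdirectory without keeping file names in memory (only the list of directories still to visit is held):

```python
import os


def count_files(path):
    """Count regular files under path iteratively, holding only pending directory paths."""
    total = 0
    pending = [path]
    while pending:
        with os.scandir(pending.pop()) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += 1
    return total


def subtree_file_counts(root):
    """Return (subdirectory name, file count) pairs, largest count first."""
    with os.scandir(root) as it:
        names = [e.name for e in it if e.is_dir(follow_symlinks=False)]
    counts = [(name, count_files(os.path.join(root, name))) for name in names]
    return sorted(counts, key=lambda item: (-item[1], item[0]))
```

For example, `subtree_file_counts("content")` lists the largest subtrees first. Any subtree that is still too large for the available RAM can be split one level further; the safe threshold has to be found by experiment, since the 4GB instance ran out of memory before s3cmd finished compiling the list.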
>
> I'm guessing I've hit a limitation of the current implementation, but I
> wondered whether anyone has suggestions in terms of s3cmd itself.
>
> Thanks and thanks for a great tool!
>
> Mike
>
> # s3cmd --version
> s3cmd version 1.5.0-beta1
>
> # time s3cmd sync --verbose --progress content s3://somewhere
> INFO: Compiling list of local files... Killed
>
> real 214m53.181s
> user 8m34.448s
> sys 4m5.803s
>
> # tail /var/log/messages
> xxxx Out of memory: Kill process 1680 (s3cmd) score 948 or sacrifice child
> xxxx Killed process 1680 (s3cmd) total-vm:3942604kB, anon-rss:3755584kB,
> file-rss:0kB
>
>
> --
> wag...@wagnerone.com
> "Linux supports the notion of a command line for the same reason that only
> children read books with only pictures in them."
>
>
>
> _______________________________________________
> S3tools-general mailing list
> S3tools-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/s3tools-general
--
wag...@wagnerone.com
"An inglorious peace is better than a dishonorable war."- Mark Twain