Michal Ludvig wrote 2010-10-30 20:45:
>    On 10/30/2010 12:52 AM, Martin Wickman wrote:
>> Using 's3cmd sync' eats up lots of memory.
>>
>> I'm trying to sync some 500,000 files. It gets to about 30% of RAM
>> before it starts "Retrieving list of remote files", which consumes more
>> memory until it starts to swap and I have to kill it.
> Hi Martin,
>
> you're absolutely right, s3cmd is "a bit inefficient" when it comes to
> memory management.
>
> With the current versions, the only thing you can do to reduce the
> memory footprint is to split your upload into chunks, e.g. sync
> s3://bucket/dir1/ first, then s3://bucket/dir2/, etc.
>
> I'm going to work on this after 1.0.0 is released.

Thanks, that sounds nice!
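
In the meantime I'm working around it by splitting the sync per
top-level directory, along the lines you describe. Roughly this (a
quick sketch; the local root and bucket name are placeholders, not our
real layout):

import os
import subprocess

# Run one 's3cmd sync' per top-level subdirectory so that only one
# directory's file list is held in memory at a time.
LOCAL_ROOT = "/data/files"   # placeholder
BUCKET = "s3://bucket"       # placeholder

for entry in sorted(os.listdir(LOCAL_ROOT)):
    path = os.path.join(LOCAL_ROOT, entry)
    if os.path.isdir(path):
        subprocess.check_call(["s3cmd", "sync",
                               path + "/",
                               "%s/%s/" % (BUCKET, entry)])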

Some suggestions:

* You may want to include the parallel patch 
(http://blog.50projects.com/p/s3cmd-modifications.html). It speeds up 
s3cmd _considerably_. We're talking roughly 10x here.

* The 'put' command should have a flag for checking MD5/file size before 
uploading, like 'sync' does. I hacked it in like this:

def do_put_work(item, seq, total):
    cfg = Config()
    s3 = S3(cfg)
    uri_final = S3Uri(item['remote_uri'])

    try:
        if cfg.skip_existing:
            # Compare size and MD5 with the remote object before uploading
            info = s3.object_info(uri_final)
            remote_size = int(info['headers']['content-length'])
            local_size = int(item['size'])
            src_md5 = Utils.hash_file_md5(item['full_name'])
            dst_md5 = info['headers']['etag'].strip('"')

            if remote_size == local_size and src_md5 == dst_md5:
                output(u"Skipping %s (already exists remotely)" % uri_final)
                return
    except S3Error:
        # object_info() raises when the object does not exist yet;
        # fall through and upload
        pass
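
(One caveat with the comparison: as far as I know the ETag is only
guaranteed to equal the MD5 for objects uploaded in a single plain PUT,
so for anything else the check may miss even when the content matches.)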

* There is a bug in S3.send_request() which causes a timeout/hang if there 
is no body in the server response (e.g. after a HEAD request). Fix like this:

def send_request(self, request, body = None, retries = _max_retries):
...
        if method_string != "HEAD":
            # don't try to read a body for HEAD responses
            response["data"] = http_response.read()
...
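
(Responses to HEAD must not include a message body per the HTTP spec, so
http_response.read() has nothing to return and just blocks until the
connection times out; skipping the read for HEAD avoids that.)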


/Cheers Martin
