Hi Swift Developers,
We have been using Swift as an IaaS provider for more than two years now, but
this mail is about feedback on the API side. I think it would be great to
include some of these ideas in future revisions of the API.
I’ve been developing a few Swift clients: in HTML (in the Cloudwatt Dashboard)
with CORS, in Java with a Swing GUI
(https://github.com/pierresouchay/swiftbrowser) and in Go for Swift to
filesystem sync (https://github.com/pierresouchay/swiftsync/), so I now have a
few ideas about how to improve the API a bit.
The API is quite straightforward and intuitive to use, and writing a client is
not that difficult, but unfortunately, the Large Object support is not easy at
all to deal with.
The biggest issue is that there is no way to know whether an object is a large
object when performing listings in JSON format since, AFAIK, a large object is
listed as an object with 0 bytes (so its size in bytes is 0), and its hash is
the hash of a zero-byte file.
For instance, the listing entry of such an object is:
{"hash": "d41d8cd98f00b204e9800998ecf8427e", "last_modified":
"2015-06-04T10:23:57.618760", "bytes": 0, "name": "5G", "content_type":
"octet/stream"}
which is exactly the hash of a 0-byte file:
$ echo -n | md5
d41d8cd98f00b204e9800998ecf8427e
OK, now let’s try HEAD:
$ curl -vv -XHEAD -H "X-Auth-Token: $TOKEN" \
'https://storage.fr1.cloudwatt.com/v1/AUTH_61b8fe6dfd0a4ce69f6622ea74444e0f/large_files/5G'
…
< HTTP/1.1 200 OK
< Date: Fri, 09 Oct 2015 19:43:09 GMT
< Content-Length: 5000000000
< Accept-Ranges: bytes
< X-Object-Manifest: large_files/5G/.part-5000000000-
< Last-Modified: Thu, 04 Jun 2015 10:16:33 GMT
< Etag: "479517ec4767ca08ed0547dca003d116"
< X-Timestamp: 1433413437.61876
< Content-Type: octet/stream
< X-Trans-Id: txba36522b0b7743d683a5d-00561818cd
WTF? While regular objects have the same value for the ETag and for the hash
in the listing, this is not the case for large objects…
Furthermore, the ETag is not the MD5 of the whole file, but the MD5 of the
concatenated hashes of all the parts referenced by the manifest (as described
somewhere deep in the documentation).
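To make the difference concrete, here is a small Python sketch of how such an
ETag is derived, assuming (this is my reading of the documentation) that it is
the MD5 of the concatenated hex hashes of the parts, in order. The function
name and the part contents are made up for the illustration.

```python
import hashlib


def manifest_etag(segment_etags):
    """MD5 of the concatenated hex ETags of the parts, in manifest order."""
    joined = "".join(segment_etags)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


# Hypothetical parts, for illustration only.
segments = [b"a" * 10, b"b" * 10]
etags = [hashlib.md5(s).hexdigest() for s in segments]

whole_file_md5 = hashlib.md5(b"".join(segments)).hexdigest()
print(manifest_etag(etags) == whole_file_md5)  # False
```

So a client holding the complete file locally cannot obtain this value with a
plain MD5 of the file.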
Why is this a problem?
-------------------------------
Imagine a "naive" client using the API which performs some kind of sync.
The client downloads each file and, when it syncs, compares the local MD5 to
the MD5 in the listing… of course, that hash is the hash of a zero-byte file…
so it downloads the file again… and again… and again. Unfortunately for our
naive client, this is exactly the kind of file we don’t want to download
twice… since the file is probably huge (after all, it has been split for a
reason, no?)
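A minimal Python sketch of that naive rule, assuming the client simply
compares MD5 values. The function name and the stand-in local content are made
up; the listing entry is the one shown above.

```python
import hashlib


def naive_needs_download(local_bytes, entry):
    """Naive rule: download whenever the local MD5 differs from the listing hash."""
    return hashlib.md5(local_bytes).hexdigest() != entry["hash"]


# Listing entry of the large object "5G" (hash of a zero-byte file, 0 bytes).
large = {"hash": "d41d8cd98f00b204e9800998ecf8427e", "bytes": 0, "name": "5G"}

# Hypothetical stand-in for the 5 GB that were already downloaded.
local_copy = b"stand-in for the real content"

for run in range(3):
    # The local file is never empty, so the hashes never match.
    print(run, naive_needs_download(local_copy, large))  # True on every run
```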
I think this is really a design flaw, since you need to know everything about
the Swift API and its extensions to get proper behavior. The minimum would be
for the listing to at least return the same value as the ETag header.
OK, let’s continue…
We are not so naive… our Swift sync client knows that 0-byte files need more
work.
* First issue: we have to know whether the file is a "real" 0-byte file or
not. You may think most people do not create 0-byte files after all… this is
wrong. Actually, I have seen two Object Storage middlewares using many 0-byte
files (for instance to store metadata or to set up some kind of directory-like
structure). So, in this case, we need to perform a HEAD request on each 0-byte
file. If you have 1000 files like this, you have to perform 1000 HEAD requests
to finally learn that there is not a single large file. Not very efficient.
Your Swift sync client took 1 second to sync 20G of data with the naive
approach; now you need 5 minutes… returning the hash of 0 bytes is not a good
idea at all.
* Second issue: since the hash is the hash of all parts (I have an idea about
why this decision was made, probably for performance reasons), your client
cannot work on local files, since the hash of the local file is not the hash
of the Swift aggregated file (which is the hash of the hashes of all the parts
referenced by the manifest). So you cannot work on existing data; you have to
either:
- split all the files in the same way as the manifest, compute the MD5 of each
part, then compute the MD5 of the hashes and compare it to the MD5 on the
server… (ok… doable, but I gave up on such a system)
- have a local database in your client (when you download, store the REAL hash
of the file, and record that it is in fact the hash returned by the server
that you have to compare against)
- perform some kind of crappy heuristic (size + grab the first bytes of each
part, or something like that…)
* Third issue:
- If you don’t want to store the parts of your large object locally, you have
to wait for all your HEAD requests to finish, since it is the only way to find
out which files are referenced by your manifest headers.
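The three issues above can be sketched in Python as follows. This is only an
illustration under assumptions: `head` is a hypothetical callable returning the
response headers of a HEAD request, the part names are made up, and the ETag
rule (MD5 of the concatenated hex hashes of the parts) is my reading of the
documentation.

```python
import hashlib
import io

EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def find_manifests(listing, head):
    """Issue 1: one HEAD per entry that looks empty; returns {name: manifest}."""
    manifests = {}
    for entry in listing:
        if entry["bytes"] == 0 and entry["hash"] == EMPTY_MD5:
            headers = {k.lower(): v for k, v in head(entry["name"]).items()}
            if "x-object-manifest" in headers:
                manifests[entry["name"]] = headers["x-object-manifest"]
    return manifests


def local_manifest_etag(stream, segment_sizes, chunk=1 << 20):
    """Issue 2: hash a local file part by part, then MD5 the joined hex hashes."""
    etags = []
    for size in segment_sizes:
        md5, remaining = hashlib.md5(), size
        while remaining > 0:
            block = stream.read(min(chunk, remaining))
            if not block:
                break  # local file is shorter than the manifest says
            md5.update(block)
            remaining -= len(block)
        etags.append(md5.hexdigest())
    return hashlib.md5("".join(etags).encode("ascii")).hexdigest()


def hide_segments(listing, manifests, container):
    """Issue 3: drop entries that are parts of an already known manifest."""
    prefixes = []
    for value in manifests.values():
        seg_container, _, prefix = value.partition("/")
        if seg_container == container:
            prefixes.append(prefix)
    return [e for e in listing
            if not any(e["name"].startswith(p) for p in prefixes)]


# Made-up listing of the container "large_files": one manifest, two parts,
# and one genuinely empty object.
listing = [
    {"name": "5G", "bytes": 0, "hash": EMPTY_MD5},
    {"name": "5G/.part-5000000000-00000000", "bytes": 4,
     "hash": hashlib.md5(b"abcd").hexdigest()},
    {"name": "5G/.part-5000000000-00000001", "bytes": 2,
     "hash": hashlib.md5(b"ef").hexdigest()},
    {"name": "empty.marker", "bytes": 0, "hash": EMPTY_MD5},
]
fake_heads = {
    "5G": {"X-Object-Manifest": "large_files/5G/.part-5000000000-"},
    "empty.marker": {},
}
calls = []


def fake_head(name):
    """Stand-in for a real HEAD request; records how many are needed."""
    calls.append(name)
    return fake_heads[name]


manifests = find_manifests(listing, fake_head)
print(len(calls))  # 2
visible = hide_segments(listing, manifests, "large_files")
print([e["name"] for e in visible])  # ['5G', 'empty.marker']
print(local_manifest_etag(io.BytesIO(b"abcdef"), [4, 2]))
```

Note that nothing can be filtered or compared before every HEAD request has
returned, which is exactly the cost described above.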
To summarize, I think the current API really needs some refinement of the
listings, since a competent developer may trust the bytes value and the hash
value and create an algorithm that does not behave nicely. So the API looks
easy but is in fact much more complicated than expected.
A few ideas to improve it:
In listings, if an object is a large object:
- either put the real MD5 of the file if it is technically doable… or remove
it (so naive programs will work nicely)… same thing for bytes.
- add an optional field in the JSON to tell that the object is in fact a large
object. A nice value for this field would be the X-Object-Manifest header
value. That way a client could know whether the object is a large object or
simply a zero-byte object, and also know which objects are in fact parts of a
larger one (without waiting for thousands of HEAD requests to finish)
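To illustrate the second idea, here is a Python sketch of what a client could
do with such a listing. The field name "object_manifest" is purely
hypothetical (it does not exist in the current API), and "empty.marker" is a
made-up object name.

```python
import json

# HYPOTHETICAL listing: "object_manifest" is the field proposed above,
# not something Swift returns today.
listing_json = """[
  {"hash": "d41d8cd98f00b204e9800998ecf8427e", "bytes": 0, "name": "5G",
   "content_type": "octet/stream",
   "object_manifest": "large_files/5G/.part-5000000000-"},
  {"hash": "d41d8cd98f00b204e9800998ecf8427e", "bytes": 0,
   "name": "empty.marker", "content_type": "octet/stream"}
]"""


def split_listing(entries):
    """Separate large objects from plain ones without any HEAD request."""
    large = [e for e in entries if "object_manifest" in e]
    plain = [e for e in entries if "object_manifest" not in e]
    return large, plain


large, plain = split_listing(json.loads(listing_json))
print([e["name"] for e in large])  # ['5G']
print([e["name"] for e in plain])  # ['empty.marker']
```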
Finally, to help people create interfaces quickly, add an option to enable
CORS for all containers of an account. At our cloud provider, we added a REST
call in another web service, itself CORS-enabled, that ensures a given
container has CORS set up. So browsing Swift with HTML5 interfaces is easy. By
doing so, it would - I think - greatly increase Swift usage (by not needing
any specific software to browse Swift).
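Until such an option exists, the per-container setup can be sketched in Python
as below. It only builds the requests, one POST per container, setting the
X-Container-Meta-Access-Control-Allow-Origin metadata header; the storage URL,
token and container names are placeholders, and this is not the web service we
run at Cloudwatt.

```python
import urllib.parse
import urllib.request


def cors_requests(storage_url, token, containers, origin="*"):
    """Build one POST per container that sets its CORS allowed origin."""
    requests = []
    for name in containers:
        url = f"{storage_url}/{urllib.parse.quote(name)}"
        requests.append(urllib.request.Request(
            url,
            method="POST",
            headers={
                "X-Auth-Token": token,
                "X-Container-Meta-Access-Control-Allow-Origin": origin,
            },
        ))
    return requests


# Placeholder account URL, token and container names.
reqs = cors_requests("https://swift.example.com/v1/AUTH_test", "TOKEN",
                     ["photos", "my docs"])
for req in reqs:
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

One request per container is exactly what an account-wide option would avoid.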
Best Regards
--
Pierre Souchay <pierre.souc...@cloudwatt.com>
Software Architect @ CloudWatt
Address: ETIK 892, Rue Yves Kermen 92100 Boulogne-Billancourt
Switchboard: +33 1 84 01 04 04
Fax: +33 1 84 01 04 05
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev