[ correct the URL ]
On 2014-08-02 00:42, Osier Yang wrote:
Hi, list,

Over the past several days I set up radosgw in a testing environment to
see whether it's stable/mature enough for production use. Meanwhile, I
read the radosgw source code to understand how it actually manages the
underlying storage.

The test results show that write performance to a bucket is poor. As
far as I understand from the code, this is because there is only *one*
bucket index object per bucket, which is not good in principle.
Moreover, requests to the whole bucket could be blocked if the
corresponding bucket index object happens to be in recovery or
backfill. This is not acceptable in production.

Although I saw that Guang Yang did some work (the prototype patches [1])
to resolve the problem with bucket index sharding, I'm not confident it
solves the problem at the root. Since radosgw apparently tries to manage
millions or billions of objects in one bucket through the index, I'm
still a bit worried even with index sharding supported.

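To make the concern concrete, here is a rough sketch of what index
sharding amounts to: each object name is hashed to one of N index
objects, so index updates spread across N objects instead of one. This
is hypothetical, not radosgw's code; CRC32 and the names `shard_for` and
`shard_load` are my own assumptions, and radosgw's real hash and index
object naming may differ.

```python
import zlib
from collections import Counter


def shard_for(object_name: str, num_shards: int) -> int:
    """Map an object name to an index shard (hypothetical hash: CRC32 mod N)."""
    if num_shards <= 1:
        return 0  # unsharded: every entry lands on the single index object
    return zlib.crc32(object_name.encode("utf-8")) % num_shards


def shard_load(names, num_shards: int) -> Counter:
    """Count how many index entries each shard would hold."""
    return Counter(shard_for(name, num_shards) for name in names)


if __name__ == "__main__":
    names = [f"photos/img_{i:06d}.jpg" for i in range(100000)]
    print(shard_load(names, 1))  # one index object holds everything
    print(shard_load(names, 8))  # entries spread over 8 index objects
```

Sharding spreads the write load, but each shard still holds roughly
total/N entries, so with billions of objects every shard can still be
huge, and a shard in recovery still blocks the requests that hash to it.
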
Another problem I encountered: after I upgraded radosgw to the latest
version (Firefly), radosgw-admin worked and read requests worked too,
but all write requests failed. Note that I didn't change any config
files, so this points to a compatibility problem (a client of the new
version failing to talk to a cluster of the old version). The errors
look like:
2014-07-31 10:13:10.045921 7fdb40ddd700 0 ERROR: can't read user header: ret=-95
2014-07-31 10:13:10.045930 7fdb40ddd700 0 ERROR: sync_user() failed, user=osier ret=-95
2014-07-31 17:00:56.075066 7fe514fe6780 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 19974
2014-07-31 17:00:56.197659 7fe514fe6780 0 framework: fastcgi
2014-07-31 17:00:56.197666 7fe514fe6780 0 starting handler: fastcgi
2014-07-31 17:00:56.198941 7fe4f8ff9700 0 ERROR: FCGX_Accept_r returned -9
2014-07-31 17:00:56.211176 7fe4f9ffb700 0 ERROR: can't read user header: ret=-95
2014-07-31 17:00:56.211197 7fe4f9ffb700 0 ERROR: sync_user() failed, user=Bob Dylon ret=-95
2014-07-31 17:00:56.212306 7fe4f9ffb700 0 ERROR: can't read user header: ret=-95
2014-07-31 17:00:56.212325 7fe4f9ffb700 0 ERROR: sync_user() failed, user=osier ret=-95

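For reference, the ret=-95 in these lines is a negated errno value. The
small helper below (hypothetical, just for reading logs like the above)
pulls it out and decodes it with os.strerror. On Linux, errno 95 is
EOPNOTSUPP ("Operation not supported"), which fits my guess that the old
cluster does not support an operation the new radosgw asks it to do.

```python
import os
import re


def decode_ret(line: str):
    """Return (errno, message) for the 'ret=-N' field of a log line, or None."""
    match = re.search(r"ret=(-?\d+)", line)
    if match is None:
        return None
    code = abs(int(match.group(1)))  # radosgw logs errors as negative errno values
    return code, os.strerror(code)


if __name__ == "__main__":
    for line in [
        "2014-07-31 17:00:56.211176 7fe4f9ffb700 0 ERROR: can't read user header: ret=-95",
        "2014-07-31 17:00:56.211197 7fe4f9ffb700 0 ERROR: sync_user() failed, user=osier ret=-95",
    ]:
        print(decode_ret(line))
```
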
With these two experiences, I started to wonder whether radosgw is
stable/mature enough yet. It seems DreamHost is the only one using
radosgw to run a service, though searching Google turns up some use
cases in private environments. I have no way to demonstrate whether
it's stable and mature enough for production use other than trying to
understand how it works, and I guess everybody knows how hard it is to
go back once a distributed system is in production. So I'm asking here
for advice, thoughts, and suggestions from anyone who has already set
up radosgw for production use.

In case this mail is too long/boring, here is a summary of my questions:
1) Is radosgw stable/mature enough for production use?
2) How does it perform (especially on writes) in practice?
3) Are there potential problems caused by addressing millions or
billions of objects with index objects (even with sharding supported)?
4) As far as I understand, it's better not to enable the cache in a
deployment with multiple radosgw instances; are there other ways to
work around this?
5) Are there any other potential traps?
Much appreciated in advance.
[1] http://news.gmane.org/gmane.comp.file-systems.ceph.devel

Never mind, the correct link for [1] is:
http://article.gmane.org/gmane.comp.file-systems.ceph.devel/20428

Regards,
Osier
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com