Re: [ceph-users] Ceph RBD map debug: error -22 on auth protocol 2 init

2014-10-07 Thread Ilya Dryomov
On Tue, Oct 7, 2014 at 9:46 AM, Christopher Armstrong
 wrote:
> Hi folks,
>
> I'm trying to gather additional information surrounding
> http://tracker.ceph.com/issues/9355 so we can hopefully find the root of
> what's preventing us from successfully mapping RBD volumes inside a Linux
> container.
>
> With the RBD kernel module debugging enabled (and cephx authentication
> disabled so I can echo to the RBD bus) as instructed by joshd, I notice this
> error in my dmesg:
>
> [ 1005.143340] libceph: error -22 on auth protocol 2 init
>
> Not sure this is the root of the issues, but it's certainly a lead. This may
> just be caused by the fact that we've disabled authentication in ceph.conf
> so we can debug this, but was hoping someone from the list could shed some
> light.

Hi Christopher,

I'll try to set up docker and have a look.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network hardware recommendations

2014-10-07 Thread Carl-Johan Schenström

On 2014-10-07 03:58, Ariel Silooy wrote:


I'm sorry, but I just have to ask: what kind of 10GbE NIC do you use? If
you don't mind, I/we would like to know the exact model/number. Thank you
in advance.


They're Intel X540-AT2's. That's a dual-port card. They weren't that 
much more expensive than the single-port cards when we bought them, and 
might come in handy in the future if/when we get more 10 GbE ports.


--
Carl-Johan Schenström
Driftansvarig / System Administrator
Språkbanken & Svensk nationell datatjänst /
The Swedish Language Bank & Swedish National Data Service
Göteborgs universitet / University of Gothenburg
carl-johan.schenst...@gu.se / +46 709 116769
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD MTBF

2014-10-07 Thread Martin B Nielsen
A bit late getting back on this one.

On Wed, Oct 1, 2014 at 5:05 PM, Christian Balzer  wrote:

> > smartctl states something like
> > Wear = 092%, Hours = 12883, Datawritten = 15321.83 TB avg on those. I
> > think that is ~30TB/day if I'm doing the calc right.
> >
> Something very much does not add up there.
> Either you've written 15321.83 GB on those drives, making it about
> 30GB/day and well within the Samsung specs, or you've written 10-20 times
> the expected TBW level of those drives...
>

My bad, I forgot to say the Wear indicator here (92%) is sort of backwards -
it means the drive still has 92% of its rated endurance left before reaching
the expected TBW limit.

I agree with what Massimiliano Cuttini wrote later as well - if your I/O
volume stays well within the expected TBW over the drive's lifetime, I see no
reason to go for more expensive disks. Just monitor for wear and keep a few in
stock ready for replacement.

Regarding the table of ssd and vendors:
Brand    Model           TBW     €   €/TB
Intel    S3500 120Go       70   122   1,74
Intel    S3500 240Go      140   225   1,60
Intel    S3700 100Go     1873   220   0,11
Intel    S3700 200Go     3737   400   0,10
Samsung  840 Pro 120Go     70   120   1,71

I don't disagree with the above - but the table assumes you'll wear out
your SSD. Adjust the wear level and the price will change proportionally -
if you're only writing 50-100TB/year per SSD then the value will heavily
swing in favor of the cheaper consumer grade SSDs. It is all about your
estimated usage pattern and whether they're 'good enough' for your scenario
or not (and/or you trust that vendor).
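
As a rough worked example (a back-of-the-envelope sketch, assuming a drive
only ever sees ~50 TB of writes before the server is retired, i.e. well within
every TBW figure above), the effective cost is simply price / TB actually
written:

  S3500 120Go: 122 € / 50 TB ≈ 2,4 €/TB written
  S3700 100Go: 220 € / 50 TB ≈ 4,4 €/TB written

The endurance headroom of the S3700 never gets used, so on that metric the
cheaper drive wins.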

In my experience ceph seldom (if ever) maxes out the I/O of an SSD - it is much
more likely to be CPU or network bound before coming to that.

Cheers,
Martin


>
> In the article I mentioned previously:
>
> http://www.anandtech.com/show/8239/update-on-samsung-850-pro-endurance-vnand-die-size
>
> The author clearly comes with a relationship of durability versus SSD
> size, as one would expect. But the Samsung homepage just stated 150TBW,
> for all those models...
>
> Christian
>
> > Not to advertise or say every samsung 840 ssd is like this:
> > http://www.vojcik.net/samsung-ssd-840-endurance-destruct-test/
> >
> Seen it before, but I have a feeling that this test doesn't quite put the
> same strain on the poor NANDs as Emmanuel's environment.
>
> Christian
>
> > Cheers,
> > Martin
> >
> >
> > On Wed, Oct 1, 2014 at 10:18 AM, Christian Balzer  wrote:
> >
> > > On Wed, 1 Oct 2014 09:28:12 +0200 Kasper Dieter wrote:
> > >
> > > > On Tue, Sep 30, 2014 at 04:38:41PM +0200, Mark Nelson wrote:
> > > > > On 09/29/2014 03:58 AM, Dan Van Der Ster wrote:
> > > > > > Hi Emmanuel,
> > > > > > This is interesting, because we've had sales guys telling us that
> > > > > > those Samsung drives are definitely the best for a Ceph journal
> > > > > > O_o !
> > > > >
> > > > > Our sales guys or Samsung sales guys?  :)  If it was ours, let me
> > > > > know.
> > > > >
> > > > > > The conventional wisdom has been to use the Intel DC S3700
> > > > > > because of its massive durability.
> > > > >
> > > > > The S3700 is definitely one of the better drives on the market for
> > > > > Ceph journals.  Some of the higher end PCIE SSDs have pretty high
> > > > > durability (and performance) as well, but cost more (though you can
> > > > > save SAS bay space, so it's a trade-off).
> > > > Intel P3700 could be an alternative with 10 Drive-Writes/Day for 5
> > > > years (see attachment)
> > > >
> > > They're certainly nice and competitively priced (TBW/$ wise at least).
> > > However as I said in another thread, once your SSDs start to outlive
> > > your planned server deployment time (in our case 5 years) that's
> > > probably good enough.
> > >
> > > It's all about finding the balance between cost, speed (BW and IOPS),
> > > durability and space.
> > >
> > > For example I'm currently building a cluster based on 2U, 12 hotswap
> > > bay servers (because I already had 2 floating around) and am using 4
> > > 100GB DC S3700 (at US$200 each) and 8 HDDs in them.
> > > Putting in a 400GB DC P3700 (US$1200) instead and 4 more HDDs would
> > > have pushed me over the budget and left me with a less than 30% "used"
> > > SSD 5 years later, at a time when we clearly can expect these things
> > > to be massively faster and cheaper.
> > >
> > > Now if you're actually having a cluster that would wear out a P3700 in
> > > 5 years (or you're planning to run your machines until they burst into
> > > flames), then that's another story. ^.^
> > >
> > > Christian
> > >
> > > > -Dieter
> > > >
> > > > >
> > > > > >
> > > > > > Anyway, I'm curious - what do the SMART counters say on your SSDs?
> > > > > > Are they really failing due to worn out P/E cycles or is it
> > > > > > something else?
> > > > > >
> > > > > > Cheers, Dan
> > > > > >
> > > > > >
> > > > > >> On 29 Sep 2014, at 10:31, Emmanuel Lacour
> > > > > >>  wrote:
> > > > > >>
> > > > > >>
> > > > > >> Dear ceph users,
> > > > > >>
> > > > > >>
> > > > > >> we

Re: [ceph-users] SSD MTBF

2014-10-07 Thread Emmanuel Lacour
On Tue, Oct 07, 2014 at 05:24:40PM +0200, Martin B Nielsen wrote:
> 
>I don't disagree with the above - but the table assumes you'll wear out
>your SSD. Adjust the wear level and the price will change proportionally -
>if you're only writing 50-100TB/year pr ssd then the value will heavily
>swing in the cheaper consumer grade ssd favor. It is all about your
>estimated usage pattern and whether they're 'good enough' for your
>scenario or not (and/or you trust that vendor).
>In my experience ceph seldom (ever) maxes out io of a ssd - it is much
>more likely to be cpu or network before coming to that.
> 

I agree with this. In our case, the answer is the Intel S3700 100Go,
without any doubt :)


-- 
Easter-eggs  Spécialiste GNU/Linux
44-46 rue de l'Ouest  -  75014 Paris  -  France -  Métro Gaité
Phone: +33 (0) 1 43 35 00 37-   Fax: +33 (0) 1 43 35 00 76
mailto:elac...@easter-eggs.com  -   http://www.easter-eggs.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph RBD map debug: error -22 on auth protocol 2 init

2014-10-07 Thread Christopher Armstrong
Thank you Ilya! Please let me know if I can help. To give you some
background, I'm one of the core maintainers of Deis, an open-source PaaS
built on Docker and CoreOS. We have Ceph running quite successfully as
implemented in https://github.com/deis/deis/pull/1910 based on Seán
McCord's containerized Ceph work: https://github.com/ulexus/docker-ceph

We are currently only using radosgw. We really need shared volume support,
which is why we're interested in getting RBD mapping working.

Thanks for helping with this!


*Chris Armstrong*
Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Tue, Oct 7, 2014 at 4:05 AM, Ilya Dryomov 
wrote:

> On Tue, Oct 7, 2014 at 9:46 AM, Christopher Armstrong
>  wrote:
> > Hi folks,
> >
> > I'm trying to gather additional information surrounding
> > http://tracker.ceph.com/issues/9355 so we can hopefully find the root of
> > what's preventing us from successfully mapping RBD volumes inside a Linux
> > container.
> >
> > With the RBD kernel module debugging enabled (and cephx authentication
> > disabled so I can echo to the RBD bus) as instructed by joshd, I notice
> this
> > error in my dmesg:
> >
> > [ 1005.143340] libceph: error -22 on auth protocol 2 init
> >
> > Not sure this is the root of the issues, but it's certainly a lead. This
> may
> > just be caused by the fact that we've disabled authentication in
> ceph.conf
> > so we can debug this, but was hoping someone from the list could shed
> some
> > light.
>
> Hi Christopher,
>
> I'll try to setup docker and have a look.
>
> Thanks,
>
> Ilya
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-07 Thread Gregory Farnum
Sorry; I guess this fell off my radar.

The issue here is not that it's waiting for an osdmap; it got the
requested map and went into replay mode almost immediately. In fact
the log looks good except that it seems to finish replaying the log
and then simply fail to transition into active. Generate a new log,
adding in "debug journaler = 20" and "debug filer = 20", and we can
probably figure out how to fix it.
(This diagnosis is much easier in the upcoming Giant!)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
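
(For reference, a sketch of the combined MDS debug settings mentioned in this
thread - to be added to the [mds] section of ceph.conf on the MDS host before
restarting it; 20/1 are just the usual verbose levels:)

    [mds]
        debug mds = 20
        debug journaler = 20
        debug filer = 20
        debug objecter = 20
        debug ms = 1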


On Tue, Oct 7, 2014 at 7:55 AM, Jasper Siero
 wrote:
> Hello Gregory,
>
> We still have the same problems with our test ceph cluster and didn't receive 
> a reply from you after I sent you the requested log files. Do you know if
> it's possible to get our cephfs filesystem working again or is it better to 
> give up the files on cephfs and start over again?
>
> We restarted the cluster several times but it's still degraded:
> [root@th1-mon001 ~]# ceph -w
> cluster c78209f5-55ea-4c70-8968-2231d2b05560
>  health HEALTH_WARN mds cluster is degraded
>  monmap e3: 3 mons at 
> {th1-mon001=10.1.2.21:6789/0,th1-mon002=10.1.2.22:6789/0,th1-mon003=10.1.2.23:6789/0},
>  election epoch 432, quorum 0,1,2 th1-mon001,th1-mon002,th1-mon003
>  mdsmap e190: 1/1/1 up {0=th1-mon001=up:replay}, 1 up:standby
>  osdmap e2248: 12 osds: 12 up, 12 in
>   pgmap v197548: 492 pgs, 4 pools, 60297 MB data, 470 kobjects
> 124 GB used, 175 GB / 299 GB avail
>  491 active+clean
>1 active+clean+scrubbing+deep
>
> One placement group stays in the deep scrubbing phase.
>
> Kind regards,
>
> Jasper Siero
>
>
> 
> Van: Jasper Siero
> Verzonden: donderdag 21 augustus 2014 16:43
> Aan: Gregory Farnum
> Onderwerp: RE: [ceph-users] mds isn't working anymore after osd's running full
>
> I did restart it but you are right about the epoch number which has changed 
> but the situation looks the same.
> 2014-08-21 16:33:06.032366 7f9b5f3cd700  1 mds.0.27  need osdmap epoch 1994, 
> have 1993
> 2014-08-21 16:33:06.032368 7f9b5f3cd700  1 mds.0.27  waiting for osdmap 1994 
> (which blacklists
> prior instance)
> I started the mds with the debug options and attached the log.
>
> Thanks,
>
> Jasper
> 
> Van: Gregory Farnum [g...@inktank.com]
> Verzonden: woensdag 20 augustus 2014 18:38
> Aan: Jasper Siero
> CC: ceph-users@lists.ceph.com
> Onderwerp: Re: [ceph-users] mds isn't working anymore after osd's running full
>
> After restarting your MDS, it still says it has epoch 1832 and needs
> epoch 1833? I think you didn't really restart it.
> If the epoch numbers have changed, can you restart it with "debug mds
> = 20", "debug objecter = 20", "debug ms = 1" in the ceph.conf and post
> the resulting log file somewhere?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Wed, Aug 20, 2014 at 12:49 AM, Jasper Siero
>  wrote:
>> Unfortunately that doesn't help. I restarted both the active and standby mds 
>> but that doesn't change the state of the mds. Is there a way to force the 
>> mds to look at the 1832 epoch (or earlier) instead of 1833 (need osdmap 
>> epoch 1833, have 1832)?
>>
>> Thanks,
>>
>> Jasper
>> 
>> Van: Gregory Farnum [g...@inktank.com]
>> Verzonden: dinsdag 19 augustus 2014 19:49
>> Aan: Jasper Siero
>> CC: ceph-users@lists.ceph.com
>> Onderwerp: Re: [ceph-users] mds isn't working anymore after osd's running 
>> full
>>
>> On Mon, Aug 18, 2014 at 6:56 AM, Jasper Siero
>>  wrote:
>>> Hi all,
>>>
>>> We have a small ceph cluster running version 0.80.1 with cephfs on five
>>> nodes.
>>> Last week some osd's were full and shut themselves down. To help the osd's start
>>> again I added some extra osd's and moved some placement group directories on
>>> the full osd's (which has a copy on another osd) to another place on the
>>> node (as mentioned in
>>> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/)
>>> After clearing some space on the full osd's I started them again. After a
>>> lot of deep scrubbing and two pg inconsistencies which needed to be repaired
>>> everything looked fine except the mds which still is in the replay state and
>>> it stays that way.
>>> The log below says that mds need osdmap epoch 1833 and have 1832.
>>>
>>> 2014-08-18 12:29:22.268248 7fa786182700  1 mds.-1.0 handle_mds_map standby
>>> 2014-08-18 12:29:22.273995 7fa786182700  1 mds.0.25 handle_mds_map i am now
>>> mds.0.25
>>> 2014-08-18 12:29:22.273998 7fa786182700  1 mds.0.25 handle_mds_map state
>>> change up:standby --> up:replay
>>> 2014-08-18 12:29:22.274000 7fa786182700  1 mds.0.25 replay_start
>>> 2014-08-18 12:29:22.274014 7fa786182700  1 mds.0.25  recovery set is
>>> 2014-08-18 12:29:22.274016 7fa786182700  1 mds.0.25  need osdmap epoch 1833,
>>> have 1832
>>> 2014-08-18 12:29:22.274017 7fa786182700  1

Re: [ceph-users] Network hardware recommendations

2014-10-07 Thread Massimiliano Cuttini

Hi Christian,

When you say "10 gig infiniband", do you mean QDRx4 Infiniband (usually
flogged as 40Gb/s even though it is 32Gb/s, but who's counting), which
tends to be the same basic hardware as the 10Gb/s Ethernet offerings from
Mellanox?

A brand new 18 port switch of that caliber will only cost about 180$ per
port, too.



I investigated Infiniband but I didn't find affordable prices at all.
Moreover, how do you connect your *legacy node servers* to your *brand 
new storages* if you have Infiniband only on storages & switches?
Is there any mixed switch that allows you to connect with both Infiniband 
and Ethernet?


If there is, please send specs because I cannot find it just by googling.

Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.86 released (Giant release candidate)

2014-10-07 Thread Sage Weil
This is a release candidate for Giant, which will hopefully be out in 
another week or two (as v0.87).  We did a feature freeze about a month ago 
and since then have been doing only stabilization and bug fixing (and a 
handful of low-risk enhancements).  A fair bit of new functionality went 
into the final sprint, but it has been baking for quite a while now and we're 
feeling pretty good about it.

Major items include:

* librados locking refactor to improve scaling and client performance
* local recovery code (LRC) erasure code plugin to trade some
  additional storage overhead for improved recovery performance
* LTTNG tracing framework, with initial tracepoints in librados,
  librbd, and the OSD FileStore backend
* separate monitor audit log for all administrative commands
* asynchronous monitor transaction commits to reduce the impact on
  monitor read requests while processing updates
* low-level tool for working with individual OSD data stores for
  debugging, recovery, and testing
* many MDS improvements (bug fixes, health reporting)

There are still a handful of known bugs in this release, but nothing
severe enough to prevent a release.  By and large we are pretty
pleased with the stability and expect the final Giant release to be
quite reliable.

Please try this out on your non-production clusters for a preview.

Notable Changes
---

* buffer: improve rebuild_page_aligned (Ma Jianpeng)
* build: fix CentOS 5 (Gerben Meijer)
* build: fix build on alpha (Michael Cree, Dmitry Smirnov)
* build: fix yasm check for x32 (Daniel Schepler, Sage Weil)
* ceph-disk: add Scientific Linux support (Dan van der Ster)
* ceph-fuse, libcephfs: fix crash in trim_caps (John Spray)
* ceph-fuse, libcephfs: improve cap trimming (John Spray)
* ceph-fuse, libcephfs: virtual xattrs for rstat (Yan, Zheng)
* ceph.conf: update sample (Sebastien Han)
* ceph.spec: many fixes (Erik Logtenberg, Boris Ranto, Dan Mick, Sandon Van 
Ness)
* ceph_objectstore_tool: vastly improved and extended tool for working offline 
with OSD data stores (David Zafman)
* common: add config diff admin socket command (Joao Eduardo Luis)
* common: add rwlock assertion checks (Yehuda Sadeh)
* crush: clean up CrushWrapper interface (Xiaoxi Chen)
* crush: make ruleset ids unique (Xiaoxi Chen, Loic Dachary)
* doc: improve manual install docs (Francois Lafont)
* doc: misc updates (John Wilkins, Loic Dachary, David Moreau Simard, Wido den 
Hollander, Volker Voigt, Alfredo Deza, Stephen Jahl, Dan van der Ster)
* global: write pid file even when running in foreground (Alexandre Oliva)
* hadoop: improve tests (Huamin Chen, Greg Farnum, John Spray)
* journaler: fix locking (Zheng, Yan)
* librados, osd: return ETIMEDOUT on failed notify (Sage Weil)
* librados: fix crash on read op timeout (#9362 Matthias Kiefer, Sage Weil)
* librados: fix shutdown race (#9130 Sage Weil)
* librados: fix watch reregistration on acting set change (#9220 Samuel Just)
* librados: fix watch/notify test (#7934 David Zafman)
* librados: give Objecter fine-grained locks (Yehuda Sadeh, Sage Weil, John 
Spray)
* librados: lttng tracepoints (Adam Crume)
* librados: pybind: fix reads when \0 is present (#9547 Mohammad Salehe)
* librbd: enforce cache size on read requests (Jason Dillaman)
* librbd: handle blacklisting during shutdown (#9105 John Spray)
* librbd: lttng tracepoints (Adam Crume)
* lttng: tracing infrastructure (Noah Watkins, Adam Crume)
* mailmap: updates (Loic Dachary, Abhishek Lekshmanan, M Ranga Swami Reddy)
* many many coverity fixes, cleanups (Danny Al-Gaaf)
* mds: adapt to new Objecter locking, give types to all Contexts (John Spray)
* mds: add internal health checks (John Spray)
* mds: avoid tight mon reconnect loop (#9428 Sage Weil)
* mds: fix crash killing sessions (#9173 John Spray)
* mds: fix ctime updates (#9514 Greg Farnum)
* mds: fix replay locking (Yan, Zheng)
* mds: fix standby-replay cache trimming (#8648 Zheng, Yan)
* mds: give perfcounters meaningful names (Sage Weil)
* mds: improve health reporting to monitor (John Spray)
* mds: improve journal locking (Zheng, Yan)
* mds: make max file recoveries tunable (Sage Weil)
* mds: prioritize file recovery when appropriate (Sage Weil)
* mds: refactor beacon, improve reliability (John Spray)
* mds: restart on EBLACKLISTED (John Spray)
* mds: track RECALL progress, report failure (#9284 John Spray)
* mds: update segment references during journal write (John Spray, Greg Farnum)
* mds: use meaningful names for clients (John Spray)
* mds: warn clients which aren't revoking caps (Zheng, Yan, John Spray)
* mon: add 'osd reweight-by-pg' command (Sage Weil, Guang Yang)
* mon: add audit log for all admin commands (Joao Eduardo Luis)
* mon: add cluster fingerprint (Sage Weil)
* mon: avoid creating unnecessary rule on pool create (#9304 Loic Dachary)
* mon: do not spam log (Aanchal Agrawal, Sage Weil)
* mon: fix 'osd perf' reported latency (#9269 Samuel Just)
* mon: fix double-free of old MOSDBoot (Sage Weil)
* mon: fix op write l

Re: [ceph-users] max_bucket limit -- safe to disable?

2014-10-07 Thread Daniel Schneller
Hi!

I have re-run our test as follows:

* 4 Rados Gateways, on 4 baremetal machines which have
  a total of 48 spinning rust OSDs.

* Benchmark run on a virtual machine talking to HAProxy
  which balances the requests across the 4 Rados GWs.

* Three instances of the benchmark run in parallel. Each
  instance creates 1000 containers, puts 11 objects into
  each container. Once they have all been created, each
  instance deletes its own containers again.

I configured one of the radosgws with the debug levels
you requested. The tests produced quite an amount of data
(approx. 1GB of text), so I took the liberty of 
pre-processing it a bit.

In this run we landed at around 1.2s per container
created (including the objects in them) on average.
This was slightly better than the 1.6s we witnessed
before, but that test ran for much longer, so there might
have been some queue-up effect. 

Interestingly enough the average is actually somewhat
misleading. The logs below show the creation of one
object in a container each, one being the fastest of this
benchmark (at least on the debug-enabled radosgw), one
being the slowest.

The fastest one was completed in 0.013s, the longest one
took 4.93s(!).

The attached logs are cleaned up so that they each show
just a single request and replaced longish, but constant
information with placeholders. Our container names are
of the form “stresstest-xxx” which I shortened
to “” for brevity. Also, I removed the redundant
prefix (date, hour, minute of day). 

The column before the log level looked like a thread-id.
As I focused on a single request, I removed all the lines
that did not match the same id, replacing the actual value
with “”. That makes the logs much easier to read and
understand.
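
(The pre-processing was essentially along these lines - a sketch with a
made-up thread id; the real ids and bucket names were substituted as
described above:)

    grep '7f038c019e90' radosgw.log \
      | sed -e 's/stresstest-[^ ]*/BUCKET/g' \
            -e 's/^2014-10-07 2[0-9]:[0-9]*://' \
            -e 's/7f038c019e90/TID/g'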

Just in case I might have removed too much information
for the logs to be useful, the complete log is available
in BZIP2 compressed form for download. Just let me know
if you need it, then I will provide a link via direct email.

To me it seems like there might indeed be a contention
issue. It would be interesting to know, if this is correct
and if there are any settings that we could adjust to 
alleviate the issue.

Daniel

==

➜  ~  cat rados_shortest.txt 
21.431185  20 QUERY_STRING=page=swift&params=/v1//version
21.431187  20 REMOTE_ADDR=10.102.8.140
21.431188  20 REMOTE_PORT=44007
21.431189  20 REQUEST_METHOD=PUT
21.431190  20 REQUEST_SCHEME=https
21.431191  20 REQUEST_URI=/swift/v1//version
21.431192  20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
21.431193  20 SCRIPT_NAME=/swift/v1//version
21.431194  20 SCRIPT_URI=https://localhost:8405/swift/v1//version
21.431195  20 SCRIPT_URL=/swift/v1//version
21.431196  20 SERVER_ADDR=10.102.9.11
21.431197  20 SERVER_ADMIN=[no address given]
21.431198  20 SERVER_NAME=localhost
21.431199  20 SERVER_PORT=8405
21.431200  20 SERVER_PORT_SECURE=443
21.431201  20 SERVER_PROTOCOL=HTTP/1.1
21.431202  20 SERVER_SIGNATURE=
21.431203  20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
21.431205   1 == starting new request req=0x7f038c019e90 =
21.431219   2 req 980641:0.15::PUT 
/swift/v1//version::initializing
21.431259  10 ver=v1 first= req=version
21.431265  10 s->object=version s->bucket=
21.431269   2 req 980641:0.65:swift:PUT 
/swift/v1//version::getting op
21.431274   2 req 980641:0.70:swift:PUT 
/swift/v1//version:put_obj:authorizing
21.431321  20 get_obj_state: rctx=0x7f030800b720 
obj=.users.swift:documentstore:swift state=0x7f03080f31e8 s->prefetch_data=0
21.431332  10 cache get: name=.users.swift+documentstore:swift : hit
21.431338  20 get_obj_state: s->obj_tag was set empty
21.431344  10 cache get: name=.users.swift+documentstore:swift : hit
21.431369  20 get_obj_state: rctx=0x7f030800b720 
obj=.users.uid:documentstore state=0x7f03080f31e8 s->prefetch_data=0
21.431374  10 cache get: name=.users.uid+documentstore : hit
21.431378  20 get_obj_state: s->obj_tag was set empty
21.431382  10 cache get: name=.users.uid+documentstore : hit
21.431401  10 swift_user=documentstore:swift
21.431416  20 build_token 
token=1300646f63756d656e7473746f72653a737769667406a4b2ba3999f8a84f45355438d8ff17
21.431467   2 req 980641:0.000262:swift:PUT 
/swift/v1//version:put_obj:reading permissions
21.431493  20 get_obj_state: rctx=0x7f03837ed250 obj=.rgw: 
state=0x7f03080f31e8 s->prefetch_data=0
21.431508  10 cache get: name=.rgw+ : type miss (requested=22, 
cached=19)
21.433081  10 cache put: name=.rgw+
21.433106  10 removing entry: 
name=.rgw+stresstest-ab9ee3e2-dcf5-4a5b-ab40-931d94c7784038242 from cache LRU
21.433114  10 moving .rgw+ to cache LRU end
21.433120  20 get_obj_state: s->obj_tag was set empty
21.433122  20 Read xattr: user.rgw.idtag
21.433124  20 Read xattr: user.rgw.manifest
21.433129  10 cache get: name=.rgw+ : hit
21.433141  20 rgw_get_bucket_info: bucket instance: 
(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.118911])
21.433148  20 reading from 
.rgw:.bucket.meta.:default.78418684.118911
21.433169  20 get_ob

Re: [ceph-users] Network hardware recommendations

2014-10-07 Thread Scott Laird
I've done this two ways in the past.  Either I'll give each machine an
Infiniband network link and a 1000baseT link and use the Infiniband one as
the private network for Ceph, or I'll throw an Infiniband card into a PC
and run something like Vyatta/VyOS on it and make it a router, so IP
traffic can get out of the IB network.  Of course, those have both been for
test labs.  YMMV.
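
(Config-wise, the first approach is just the usual public/cluster split in
ceph.conf - a sketch with made-up subnets, where the IPoIB segment carries
the OSD replication traffic:)

    [global]
        public network  = 192.168.10.0/24   # 1000baseT side, clients and mons
        cluster network = 10.20.0.0/24      # IPoIB side, OSD replication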

On Tue Oct 07 2014 at 11:05:23 AM Massimiliano Cuttini 
wrote:

>  Hi Christian,
>
>  When you say "10 gig infiniband", do you mean QDRx4 Infiniband (usually
> flogged as 40Gb/s even though it is 32Gb/s, but who's counting), which
> tends to be the same basic hardware as the 10Gb/s Ethernet offerings from
> Mellanox?
>
> A brand new 18 port switch of that caliber will only cost about 180$ per
> port, too.
>
>
>
> I investigate about infiniband but i didn't found affordable prices at all
> Moreover how do you connect your *legacy node servers* to your *brand
> new storages* if you have Infiniband only on storages & switches?
> Is there any mixed switch that allow you both to connect with Infiniband
> and Ethernet?
>
> If there is, please send specs because i cannot find just by google it.
>
> Thanks,
> Max
>
>  ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] max_bucket limit -- safe to disable?

2014-10-07 Thread Yehuda Sadeh
The logs here don't include the messenger (debug ms = 1). It's hard to
tell what's going on from looking at the outliers. Also, in your
previous mail you described a different benchmark, you tested writing
large number of objects into a single bucket, whereas in this test
you're testing multiple bucket creations, which have completely
different characteristics.


On Tue, Oct 7, 2014 at 1:26 PM, Daniel Schneller
 wrote:
> Hi!
>
> I have re-run our test as follows:
>
> * 4 Rados Gateways, on 4 baremetal machines which have
>   a total of 48 spinning rust OSDs.
>
> * Benchmark run on a virtual machine talking to HAProxy
>   which balances the requests across the 4 Rados GWs.
>
> * Three instances of the benchmark run in parallel. Each
>   instance creates 1000 containers, puts 11 objects into
>   each container. Once they have all been created, each
>   instances deletes its own containers again.
>
> I configured one of the radosgws with the debug levels
> you requested. The tests produced quite an amount of data
> (approx. 1GB of text), so I took the liberty to
> pre-process that a bit.
>
> In this run we landed at around 1.2s per container
> created (including the objects in them) on average.
> This was slightly better than the 1.6s we witnessed
> before, but that test ran for much longer, so there might
> have been some queue-up effect.
>
> Interestingly enough the average is actually somewhat
> misleading. The logs below show the creation of one
> object in a container each, one being the fastest of this
> benchmark (at least on the debug-enabled radosgw), one
> being the slowest.
>
> The fastest one was completed in 0.013s, the longest one
> took 4.93s(!).
>
> The attached logs are cleaned up so that they each show
> just a single request and replaced longish, but constant
> information with placeholders. Our container names are
> of the form “stresstest-xxx” which I shortened
> to “” for brevity. Also, I removed the redundant
> prefix (date, hour, minute of day).
>
> The column before the log level looked like a thread-id.
> As I focused on a single request, I removed all the lines
> that did not match the same id, replacing the actual value
> with “”. That makes the logs much easier to read and
> understand.
>
> Just in case I might have removed too much information
> for the logs to be useful, the complete log is available
> in BZIP2 compressed form for download. Just let me know
> if you need it, then I will provide a link via direct email.
>
> To me it seems like there might indeed be a contention
> issue. It would be interesting to know, if this is correct
> and if there are any settings that we could adjust to
> alleviate the issue.
>
> Daniel
>
> ==
>
> ➜  ~  cat rados_shortest.txt
> 21.431185  20 QUERY_STRING=page=swift&params=/v1//version
> 21.431187  20 REMOTE_ADDR=10.102.8.140
> 21.431188  20 REMOTE_PORT=44007
> 21.431189  20 REQUEST_METHOD=PUT
> 21.431190  20 REQUEST_SCHEME=https
> 21.431191  20 REQUEST_URI=/swift/v1//version
> 21.431192  20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
> 21.431193  20 SCRIPT_NAME=/swift/v1//version
> 21.431194  20
> SCRIPT_URI=https://localhost:8405/swift/v1//version
> 21.431195  20 SCRIPT_URL=/swift/v1//version
> 21.431196  20 SERVER_ADDR=10.102.9.11
> 21.431197  20 SERVER_ADMIN=[no address given]
> 21.431198  20 SERVER_NAME=localhost
> 21.431199  20 SERVER_PORT=8405
> 21.431200  20 SERVER_PORT_SECURE=443
> 21.431201  20 SERVER_PROTOCOL=HTTP/1.1
> 21.431202  20 SERVER_SIGNATURE=
> 21.431203  20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
> 21.431205   1 == starting new request req=0x7f038c019e90 =
> 21.431219   2 req 980641:0.15::PUT
> /swift/v1//version::initializing
> 21.431259  10 ver=v1 first= req=version
> 21.431265  10 s->object=version s->bucket=
> 21.431269   2 req 980641:0.65:swift:PUT
> /swift/v1//version::getting op
> 21.431274   2 req 980641:0.70:swift:PUT
> /swift/v1//version:put_obj:authorizing
> 21.431321  20 get_obj_state: rctx=0x7f030800b720
> obj=.users.swift:documentstore:swift state=0x7f03080f31e8 s->prefetch_data=0
> 21.431332  10 cache get: name=.users.swift+documentstore:swift : hit
> 21.431338  20 get_obj_state: s->obj_tag was set empty
> 21.431344  10 cache get: name=.users.swift+documentstore:swift : hit
> 21.431369  20 get_obj_state: rctx=0x7f030800b720
> obj=.users.uid:documentstore state=0x7f03080f31e8 s->prefetch_data=0
> 21.431374  10 cache get: name=.users.uid+documentstore : hit
> 21.431378  20 get_obj_state: s->obj_tag was set empty
> 21.431382  10 cache get: name=.users.uid+documentstore : hit
> 21.431401  10 swift_user=documentstore:swift
> 21.431416  20 build_token
> token=1300646f63756d656e7473746f72653a737769667406a4b2ba3999f8a84f45355438d8ff17
> 21.431467   2 req 980641:0.000262:swift:PUT
> /swift/v1//version:put_obj:reading permissions
> 21.431493  20 get_obj_state: rctx=0x7f03837ed250 obj=.rgw:
> state=0x7f03080f31e8 s->prefetch_data=0
> 21.431508  10 cache get: name=.rgw+ : t

Re: [ceph-users] Multi node dev environment

2014-10-07 Thread Johnu George (johnugeo)
Even when I try ceph-deploy install --dev , I
am seeing that it is getting installed from the official ceph repo. How can I
install ceph from my github repo or my local repo on all ceph nodes? (Or
any other possibility?) Can someone help me with setting this up?

Johnu



On 10/2/14, 1:55 PM, "Somnath Roy"  wrote:

>I think you should just skip 'ceph-deploy install' command and install
>your version of the ceph package in all the nodes manually.
>Otherwise there is ceph-deploy install --dev  you can try out.
>
>Thanks & Regards
>Somnath
>
>-Original Message-
>From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>Johnu George (johnugeo)
>Sent: Thursday, October 02, 2014 1:08 PM
>To: Loic Dachary
>Cc: ceph-users@lists.ceph.com
>Subject: Re: [ceph-users] Multi node dev environment
>
>How do I use ceph-deploy in this case?. How do I get ceph-deploy to use
>my privately built ceph package (with my changes) and install them in all
>ceph nodes?
>
>
>Johnu
>
>On 10/2/14, 7:22 AM, "Loic Dachary"  wrote:
>
>>Hi,
>>
>>I would use ceph-deploy
>>http://ceph.com/docs/master/start/quick-start-preflight/#ceph-deploy-se
>>tup  but ... I've only done tests a few times and other people may have
>>a more elaborate answer to this question ;-)
>>
>>Cheers
>>
>>On 02/10/2014 15:44, Johnu George (johnugeo) wrote:> Hi,
>>> I was trying to set up a multi node dev environment. Till now,  I was
>>>building ceph by executing ./configure and make. I then used to test
>>>the features by using vstart in a single node. Instead of it, if I
>>>still need to use the multi node cluster for testing, what is the
>>>proper way to do?.  If I need to run benchmarks(using rados bench or
>>>other benchmarking tools) after any code change, what is the right
>>>practice to test some change in a multi node dev setup? ( Multi node
>>>setup is needed as part of getting right performance results in
>>>benchmark tests)
>>>
>>>
>>> Thanks,
>>> Johnu
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>--
>>Loïc Dachary, Artisan Logiciel Libre
>>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>PLEASE NOTE: The information contained in this electronic mail message is
>intended only for the use of the designated recipient(s) named above. If
>the reader of this message is not the intended recipient, you are hereby
>notified that you have received this message in error and that any
>review, dissemination, distribution, or copying of this message is
>strictly prohibited. If you have received this communication in error,
>please notify the sender by telephone or e-mail (as shown above)
>immediately and destroy any and all copies of this message in your
>possession (whether hard copies or electronically stored copies).
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multi node dev environment

2014-10-07 Thread Alfredo Deza
On Tue, Oct 7, 2014 at 5:05 PM, Johnu George (johnugeo)
 wrote:
> Even when I try ceph-deploy install --dev , I
> am seeing that it is getting installed from official ceph repo. How can I
> install ceph from my github repo or my local repo in all ceph nodes? (Or
> any other possibility? ). Someone can help me in setting this?

That is just not possible. Only branches that are pushed to the Ceph
repo are available through the
`--dev` flag because they rely on a URL structure and repo that we maintain.
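
One workaround (a sketch, assuming a Debian/Ubuntu build host and that your
branch builds cleanly with the in-tree debian/ packaging) is to build the
packages yourself, push them to the nodes, and skip `ceph-deploy install`
entirely:

    # on the build host, from your ceph git checkout
    sudo apt-get build-dep ceph       # or install the build deps by hand
    dpkg-buildpackage -us -uc -j8     # produces ../*.deb
    # copy and install on every node (node1 as a placeholder)
    scp ../*.deb node1:
    ssh node1 'sudo dpkg -i *.deb || sudo apt-get -f install -y'
    # then run the rest of ceph-deploy (new, mon create-initial, osd ...) as usual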


>
> Johnu
>
>
>
> On 10/2/14, 1:55 PM, "Somnath Roy"  wrote:
>
>>I think you should just skip 'ceph-deploy install' command and install
>>your version of the ceph package in all the nodes manually.
>>Otherwise there is ceph-deploy install --dev  you can try out.
>>
>>Thanks & Regards
>>Somnath
>>
>>-Original Message-
>>From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>Johnu George (johnugeo)
>>Sent: Thursday, October 02, 2014 1:08 PM
>>To: Loic Dachary
>>Cc: ceph-users@lists.ceph.com
>>Subject: Re: [ceph-users] Multi node dev environment
>>
>>How do I use ceph-deploy in this case?. How do I get ceph-deploy to use
>>my privately built ceph package (with my changes) and install them in all
>>ceph nodes?
>>
>>
>>Johnu
>>
>>On 10/2/14, 7:22 AM, "Loic Dachary"  wrote:
>>
>>>Hi,
>>>
>>>I would use ceph-deploy
>>>http://ceph.com/docs/master/start/quick-start-preflight/#ceph-deploy-se
>>>tup  but ... I've only done tests a few times and other people may have
>>>a more elaborate answer to this question ;-)
>>>
>>>Cheers
>>>
>>>On 02/10/2014 15:44, Johnu George (johnugeo) wrote:> Hi,
 I was trying to set up a multi node dev environment. Till now,  I was
building ceph by executing ./configure and make. I then used to test
the features by using vstart in a single node. Instead of it, if I
still need to use the multi node cluster for testing, what is the
proper way to do?.  If I need to run benchmarks(using rados bench or
other benchmarking tools) after any code change, what is the right
practice to test some change in a multi node dev setup? ( Multi node
setup is needed as part of getting right performance results in
benchmark tests)


 Thanks,
 Johnu


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>>
>>>--
>>>Loïc Dachary, Artisan Logiciel Libre
>>>
>>
>>___
>>ceph-users mailing list
>>ceph-users@lists.ceph.com
>>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>>PLEASE NOTE: The information contained in this electronic mail message is
>>intended only for the use of the designated recipient(s) named above. If
>>the reader of this message is not the intended recipient, you are hereby
>>notified that you have received this message in error and that any
>>review, dissemination, distribution, or copying of this message is
>>strictly prohibited. If you have received this communication in error,
>>please notify the sender by telephone or e-mail (as shown above)
>>immediately and destroy any and all copies of this message in your
>>possession (whether hard copies or electronically stored copies).
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multi node dev environment

2014-10-07 Thread Johnu George (johnugeo)
Thanks Alfredo. Is there any other possible way that will work for my
situation? Anything would be helpful

Johnu

On 10/7/14, 2:25 PM, "Alfredo Deza"  wrote:

>On Tue, Oct 7, 2014 at 5:05 PM, Johnu George (johnugeo)
> wrote:
>> Even when I try ceph-deploy install --dev , I
>> am seeing that it is getting installed from official ceph repo. How can
>>I
>> install ceph from my github repo or my local repo in all ceph nodes? (Or
>> any other possibility? ). Someone can help me in setting this?
>
>That is just not possible. Only branches that are pushed to the Ceph
>repo are available through the
>`--dev` flag because they rely on a URL structure and repo that we
>maintain.
>
>
>>
>> Johnu
>>
>>
>>
>> On 10/2/14, 1:55 PM, "Somnath Roy"  wrote:
>>
>>>I think you should just skip 'ceph-deploy install' command and install
>>>your version of the ceph package in all the nodes manually.
>>>Otherwise there is ceph-deploy install --dev  you can try
>>>out.
>>>
>>>Thanks & Regards
>>>Somnath
>>>
>>>-Original Message-
>>>From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>>Johnu George (johnugeo)
>>>Sent: Thursday, October 02, 2014 1:08 PM
>>>To: Loic Dachary
>>>Cc: ceph-users@lists.ceph.com
>>>Subject: Re: [ceph-users] Multi node dev environment
>>>
>>>How do I use ceph-deploy in this case?. How do I get ceph-deploy to use
>>>my privately built ceph package (with my changes) and install them in
>>>all
>>>ceph nodes?
>>>
>>>
>>>Johnu
>>>
>>>On 10/2/14, 7:22 AM, "Loic Dachary"  wrote:
>>>
Hi,

I would use ceph-deploy
http://ceph.com/docs/master/start/quick-start-preflight/#ceph-deploy-se
tup  but ... I've only done tests a few times and other people may have
a more elaborate answer to this question ;-)

Cheers

On 02/10/2014 15:44, Johnu George (johnugeo) wrote:> Hi,
> I was trying to set up a multi node dev environment. Till now,  I was
>building ceph by executing ./configure and make. I then used to test
>the features by using vstart in a single node. Instead of it, if I
>still need to use the multi node cluster for testing, what is the
>proper way to do?.  If I need to run benchmarks(using rados bench or
>other benchmarking tools) after any code change, what is the right
>practice to test some change in a multi node dev setup? ( Multi node
>setup is needed as part of getting right performance results in
>benchmark tests)
>
>
> Thanks,
> Johnu
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Loïc Dachary, Artisan Logiciel Libre

>>>
>>>___
>>>ceph-users mailing list
>>>ceph-users@lists.ceph.com
>>>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>>PLEASE NOTE: The information contained in this electronic mail message
>>>is
>>>intended only for the use of the designated recipient(s) named above. If
>>>the reader of this message is not the intended recipient, you are hereby
>>>notified that you have received this message in error and that any
>>>review, dissemination, distribution, or copying of this message is
>>>strictly prohibited. If you have received this communication in error,
>>>please notify the sender by telephone or e-mail (as shown above)
>>>immediately and destroy any and all copies of this message in your
>>>possession (whether hard copies or electronically stored copies).
>>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD on openstack glance+cinder CoW?

2014-10-07 Thread Jonathan Proulx
Hi All,

We're running Firefly on the ceph side and Icehouse on the OpenStack
side & I've pulled the recommended nova branch from
https://github.com/angdraug/nova/tree/rbd-ephemeral-clone-stable-icehouse

according to 
http://ceph.com/docs/master/rbd/rbd-openstack/#booting-from-a-block-device:

"When Glance and Cinder are both using Ceph block devices, the image
is a copy-on-write clone, so it can create a new volume quickly"

I'm not seeing this, even though I have glance set up in such a way that
nova does create copy-on-write clones when booting ephemeral instances
of the same image.  Cinder downloads the glance RBD and then pushes it
back up as a full copy.

Since Glance -> Nova is working (has the show_image_direct_url=True
etc...) I suspect a problem with my Cinder config. This is what I
added for rbd support:

[rbd]
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_pool=volumes
rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot=false
rbd_max_clone_depth=5
glance_api_version=2
rbd_user=
rbd_secret_uuid=
volume_backend_name=rbd
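
(For reference, the glance side referred to above - a sketch of the usual
glance-api.conf bits from the rbd-openstack guide, with the standard example
pool/user names; cinder only sees the image location if glance exposes it:)

[DEFAULT]
default_store = rbd
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_user = glance
rbd_store_pool = images
show_image_direct_url = True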

Note it does *work* just not doing CoW.  Am I missing something here?

Thanks,
-Jon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Openstack keystone with Radosgw

2014-10-07 Thread lakshmi k s
I am trying to integrate OpenStack Keystone with the Ceph Object
Store using the link - http://ceph.com/docs/master/radosgw/keystone.  Swift
v1.0 (without keystone) works quite fine. But for some reason, Swift v2.0
keystone calls to the Ceph Object Store always result in a 401 Unauthorized
message. I have tried to get a new token by contacting keystone and used
that token for making Swift calls, but no luck. Please note that all other
services like nova list and cinder list work, which means Keystone is set up
correctly. Only the Swift service fails. The only step I did not execute is
installing the nss db, as I ran into package dependency issues, but I have
commented out that flag in ceph.conf. My ceph.conf looks like this below.
 
[global]
fsid = b35e8496-e809-416a-bd66-aba761d78fac
mon_initial_members = node1
mon_host = 192.0.2.211
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
 
[client.admin]
keyring = /etc/ceph/ceph.client.admin.keyring
 
[client.radosgw.gateway]
rgw keystone url = http://192.0.8.2:5000
rgw keystone admin token = 9c2ef11a69044defb9dbfa0f8ab73d86
rgw keystone accepted roles = admin, Member, swiftoperator
rgw keystone token cache size = 100
rgw keystone revocation interval = 600
rgw s3 auth use keystone = false
#nss db path = /var/ceph/nss
host = gateway
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw dns name = gateway
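
(For reference, the nss db step I skipped - as given in the radosgw/keystone
guide, assuming the usual Keystone certificate paths - would have been
roughly:)

mkdir -p /var/ceph/nss
openssl x509 -in /etc/keystone/ssl/certs/ca.pem -pubkey | \
    certutil -d /var/ceph/nss -A -n ca -t "TCu,Cu,Tuw"
openssl x509 -in /etc/keystone/ssl/certs/signing_cert.pem -pubkey | \
    certutil -A -d /var/ceph/nss -n signing_cert -t "P,P,P"

That step matters when Keystone issues PKI-signed tokens, since radosgw uses
the NSS db to verify them.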


Output of Swift list
root@overcloud-controller0-fjvtpqjip2hl:~# swift --debug -V 2.0 -A http://192.0.8.2:5000/v2.0 -U ceph:cephUser -K "ceph123" list

DEBUG:keystoneclient.session:REQ:
curl -i -X POST http://192.0.8.2:5000/v2.0/tokens -H "Content-Type:
application/json" -H "Accept: application/json" -H
"User-Agent: python-keystoneclient" -d '{"auth":
{"tenantName": "ceph", "passwordCredentials":
{"username": "cephUser", "password":
"ceph123"}}}'
INFO:requests.packages.urllib3.connectionpool:Starting
new HTTP connection (1): 192.0.8.2
DEBUG:requests.packages.urllib3.connectionpool:"POST
/v2.0/tokens HTTP/1.1" 200 3910
DEBUG:keystoneclient.session:RESP:
[200] {'date': 'Tue, 07 Oct 2014 20:05:20 GMT', 'content-type':
'application/json', 'content-length': '3910', 'vary': 'X-Auth-Token'}
RESP
BODY: {"access": {"token": {"issued_at": "2014-10-07T20:05:20.480562",
"expires": "2014-10-08T00:05:20Z", "id":
"45e14981c41f4c8c8055849b39bd4c23", "tenant":
{"description": "", "enabled": true,
"id": "bad9e2232b304f89acb03436635b80cc", "name":
"ceph"}}, "serviceCatalog": [{"endpoints":
[{"adminURL":
"http://192.0.8.2:8774/v2/bad9e2232b304f89acb03436635b80cc";,
"region": "regionOne", "internalURL":
"http://192.0.8.2:8774/v2/bad9e2232b304f89acb03436635b80cc";,
"id": "40e53124619d479ab0c34a99c7619bcc",
"publicURL": "http://192.0.8.2:8774/v2/bad9e2232b304f89acb03436635b80cc"}],
"endpoints_links": [], "type": "compute",
"name": "nova"}, {"endpoints":
[{"adminURL": "http://192.0.8.2:9696/";, "region":
"regionOne", "internalURL":
"http://192.0.8.2:9696/";, "id":
"4e5fb12504024554a762b46391b46309", "publicURL":
"http://192.0.8.2:9696/"}], "endpoints_links": [],
"type": "network", "name": "neutron"},
{"endpoints": [{"adminURL":
"http://192.0.8.2:8774/v3";, "region":
"regionOne", "internalURL":
"http://192.0.8.2:8774/v3";, "id":
"4e9f7514c3d94bd4b505207cfa52c306", "publicURL":
"http://192.0.8.2:8774/v3"}], "endpoints_links": [],
"type": "computev3", "name": "nova"},
{"endpoints": [{"adminURL": "http://192.0.8.2:9292/";,
"region": "regionOne", "internalURL":
"http://192.0.8.2:9292/";, "id":
"3305668e44fc43f4bb57b45aa599d454", "publicURL":
"http://192.0.8.2:9292/"}], "endpoints_links": [],
"type": "image", "name": "glance"},
{"endpoints": [{"adminURL": "http://192.0.8.2:21131/v1";,
"region": "regionOne", "internalURL":
"http://192.0.8.2:21131/v1";, "id": "7b4ac2efaeba4074988e397bee403caa",
"publicURL": "http://192.0.8.2:21131/v1"}],
"endpoints_links": [], "type": "hp-catalog",
"name": "sherpa"}, {"endpoints":
[{"adminURL": "http://192.0.8.2:8777/";, "region":
"regionOne", "internalURL":
"http://192.0.8.2:8777/";, "id": "2f1de9c2e81049e99cd4da266931780b",
"publicURL": "http://192.0.8.2:8777/"}],
"endpoints_links": [], "type": "metering",
"name": "ceilometer"}, {"endpoints":
[{"adminURL":
"http://192.0.8.2:8776/v1/bad9e2232b304f89acb03436635b80cc";,
"region": "regionOne", "internalURL":
"http://192.0.8.2:8776/v1/bad9e2232b304f89acb03436635b80cc";,
"id": "0bbc1c8d91574c2083b6b28b237c7004",
"publicURL":
"http://192.0.8.2:8776/v1/bad9e2232b304f89acb03436635b80cc"}],
"endpoints_links": [], "type": "volume",
"name": "cinder"}, {"endpoints":
[{"adminURL": "http://192.0.8.2:8773/services/Admin";,
"region": "regionOne", "internalURL":
"http://192.0.8.2:8773/services/Cloud";, "id":
"b15e7b43c7a44831a036f6f01479a6b1", "publicURL":
"http://192.0.8.2:8773/services/Cloud"}],
"endpoints_links": [], "type": "ec2",
"name": "ec2"}, {"endpoints":

[ceph-users] rbd and libceph kernel api

2014-10-07 Thread Shawn Edwards
Are there any docs on what is possible by writing/reading from the rbd
driver's sysfs paths?  Is it documented anywhere?

I've seen at least one blog post:
http://www.sebastien-han.fr/blog/2012/06/24/use-rbd-on-a-client/ about how
you can attach to an rbd using the sysfs interface, but I haven't found
much else.
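
(For reference, the sysfs interface from that post boils down to something
like this - a sketch assuming a monitor at MON_IP, cephx with client.admin,
a pool "rbd" and an image "foo":)

    # map: mon address(es), options, pool, image
    echo "MON_IP:6789 name=admin,secret=$(ceph auth get-key client.admin) rbd foo" \
        > /sys/bus/rbd/add
    # the device appears as /dev/rbdN, with attributes under /sys/bus/rbd/devices/N/
    # unmap, where N is the device id
    echo N > /sys/bus/rbd/remove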

A broader question: what is possible as far as communicating with ceph
using just the libceph and rbd kernel drivers?

-- 
 Shawn Edwards
 Beware programmers with screwdrivers.  They tend to spill them on their
keyboards.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] max_bucket limit -- safe to disable?

2014-10-07 Thread Yehuda Sadeh
This operation stalled quite a bit; it seems it was waiting for the osd:

2.547155 7f036ffc7700  1 -- 10.102.4.11:0/1009401 -->
10.102.4.14:6809/7428 -- osd_op(client.78418684.0:27514711
.bucket.meta.:default.78418684.122043 [call
version.read,getxattrs,stat] 5.3b7d1197 ack+read e16034) v4 -- ?+0
0x7f026802e2c0 con 0x7f040c055ca0
...
7.619750 7f041ddf4700  1 -- 10.102.4.11:0/1009401 <== osd.32
10.102.4.14:6809/7428 208252  osd_op_reply(27514711
.bucket.meta.:default.78418684.122043
[call,getxattrs,stat] v0'0 uv6371 ondisk = 0) v6  338+0+336
(3685145659 0 4232894755) 0x7f00e430f540 con 0x7f040c055ca0

Looking at these logs it seems that there are only 8 PGs on the
.rgw pool; if this is correct then you may want to change that,
considering your workload.
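
(A sketch of how to check and bump that - 128 is picked purely as an
illustrative target, and pgp_num should follow pg_num:)

    ceph osd pool get .rgw pg_num
    ceph osd pool set .rgw pg_num 128
    ceph osd pool set .rgw pgp_num 128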

Yehuda


On Tue, Oct 7, 2014 at 3:46 PM, Daniel Schneller
 wrote:
> Hi!
>
> Sorry, I must have missed the enabling of that debug module.
> However, the test setup has been the same all the time -
> I only have the one test-application :)
>
> But maybe I phrased it a bit ambiguously when I wrote
>
>> It then continuously created containers - both empty
>> and such with 10 objects of around 100k random data in them.
>
> 100 kilobytes is the size of a single object, of which we create 10
> per container. The container gets created first, without any
> objects, naturally, then 10 objects are added. One of these objects
> is called “version”, the rest have generated names with a fixed
> prefix and appended 1-9. The version object is the one I picked
> for the example logs I sent earlier.
>
> I hope this makes the setup clearer.
>
> Attached you will find the (now more extensive) logs for the outliers
> again. As you did not say that I garbled the logs, I assume the
> pre-processing was OK, so I have prepared the new data in a similar
> fashion, marking the relevant request with .
>
> I have not removed any lines in between the beginning of the
> “interesting” request and its completion to keep all the network
> traffic log intact. Due to the increased verbosity, I will not post
> the logs inline, but only attach them gzipped.
>
> As before, should the full data set be needed, I can provide
> an archived version.
>
>
>
>
> Thanks for your support!
> Daniel
>
>
>
>
>> On 07 Oct 2014, at 22:45, Yehuda Sadeh  wrote:
>>
>> The logs here don't include the messenger (debug ms = 1). It's hard to
>> tell what going on from looking at the outliers. Also, in your
>> previous mail you described a different benchmark, you tested writing
>> large number of objects into a single bucket, whereas in this test
>> you're testing multiple bucket creations, which have a completely
>> different characteristics.
>>
>>
>> On Tue, Oct 7, 2014 at 1:26 PM, Daniel Schneller
>>  wrote:
>>> Hi!
>>>
>>> I have re-run our test as follows:
>>>
>>> * 4 Rados Gateways, on 4 baremetal machines which have
>>>  a total of 48 spinning rust OSDs.
>>>
>>> * Benchmark run on a virtual machine talking to HAProxy
>>>  which balances the requests across the 4 Rados GWs.
>>>
>>> * Three instances of the benchmark run in parallel. Each
>>>  instance creates 1000 containers, puts 11 objects into
>>>  each container. Once they have all been created, each
>>>  instances deletes its own containers again.
>>>
>>> I configured one of the radosgws with the debug levels
>>> you requested. The tests produced quite an amount of data
>>> (approx. 1GB of text), so I took the liberty to
>>> pre-process that a bit.
>>>
>>> In this run we landed at around 1.2s per container
>>> created (including the objects in them) on average.
>>> This was slightly better than the 1.6s we witnessed
>>> before, but that test ran for much longer, so there might
>>> have been some queue-up effect.
>>>
>>> Interestingly enough the average is actually somewhat
>>> misleading. The logs below show the creation of one
>>> object in a container each, one being the fastest of this
>>> benchmark (at least on the debug-enabled radosgw), one
>>> being the slowest.
>>>
>>> The fastest one was completed in 0.013s, the longest one
>>> took 4.93s(!).
>>>
>>> The attached logs are cleaned up so that they each show
>>> just a single request and replaced longish, but constant
>>> information with placeholders. Our container names are
>>> of the form “stresstest-xxx” which I shortened
>>> to “” for brevity. Also, I removed the redundant
>>> prefix (date, hour, minute of day).
>>>
>>> The column before the log level looked like a thread-id.
>>> As I focused on a single request, I removed all the lines
>>> that did not match the same id, replacing the actual value
>>> with “”. That makes the logs much easier to read and
>>> understand.
>>>
>>> Just in case I might have removed too much information
>>> for the logs to be useful, the complete log is available
>>> in BZIP2 compressed form for download. Just let me know
>>> if you need it, then I will provide a link via direct email.
>>>
>>> To me it seems like there might indeed be a contentio

Re: [ceph-users] Network hardware recommendations

2014-10-07 Thread Christian Balzer
On Tue, 07 Oct 2014 20:40:31 + Scott Laird wrote:

> I've done this two ways in the past.  Either I'll give each machine an
> Infiniband network link and a 1000baseT link and use the Infiniband one
> as the private network for Ceph, or I'll throw an Infiniband card into a
> PC and run something like Vyatta/VyOS on it and make it a router, so IP
> traffic can get out of the IB network.  Of course, those have both been
> for test labs.  YMMV.
> 

That. 

Of course in a production environment you would want something with 2
routers in a failover configuration.
And there are switches/gateways that combine IB and Ethernet, but they
tend to be not so cheap. ^^

More below.

> On Tue Oct 07 2014 at 11:05:23 AM Massimiliano Cuttini
>  wrote:
> 
> >  Hi Christian,
> >
> >  When you say "10 gig infiniband", do you mean QDRx4 Infiniband
> > (usually flogged as 40Gb/s even though it is 32Gb/s, but who's
> > counting), which tends to be the same basic hardware as the 10Gb/s
> > Ethernet offerings from Mellanox?
> >
> > A brand new 18 port switch of that caliber will only cost about 180$
> > per port, too.
> >
> >
> >
> > I investigate about infiniband but i didn't found affordable prices at
> > all.

Then you're doing it wrong or comparing apples to oranges (you of course
need to compare IB switches to similar 10GbE ones).
And the prices of HCA (aka network cards in the servers) and cabling.

> > Moreover how do you connect your *legacy node servers* to your
> > *brand new storages* if you have Infiniband only on storages &
> > switches? Is there any mixed switch that allow you both to connect
> > with Infiniband and Ethernet?
> >
> > If there is, please send specs because i cannot find just by google it.
> >
The moment you type in "infiniband et" google will already predict amongst
other pertinent things "infiniband ethernet gateway" and "infiniband
ethernet bridge". 
But even "infiniband ethernet switch" has a link telling you pretty much
what was said here now at the 6th position:
http://www.tomshardware.com/forum/44997-42-connect-infiniband-switch-ethernet

Christian
> > Thanks,
> > Max
> >
> >  ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network hardware recommendations

2014-10-07 Thread Scott Laird
IIRC, one thing to look out for is that there are two ways to do IP over
Infiniband.  You can either do IP over Infiniband directly (IPoIB), or
encapsulate Ethernet in Infiniband (EoIB), and then do IP over the fake
Ethernet network.

IPoIB is more common, but I'd assume that IB<->Ethernet bridges really only
bridge EoIB.
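
For what it's worth, plain IPoIB on the Ceph nodes themselves is
straightforward: load the ib_ipoib module and the HCA port shows up as a
regular network interface that can be addressed like any other. A rough
sketch (module options aside, the interface name, MTU and addresses below
are examples, not taken from this thread):

# Sketch only - interface name and addresses are examples.
modprobe ib_ipoib                          # HCA port appears as an ibX netdev
echo connected > /sys/class/net/ib0/mode   # connected mode allows a much larger MTU
ip link set ib0 mtu 65520
ip addr add 10.0.0.11/24 dev ib0           # e.g. the Ceph cluster network address
ip link set ib0 up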

On Tue Oct 07 2014 at 5:34:57 PM Christian Balzer  wrote:

> On Tue, 07 Oct 2014 20:40:31 + Scott Laird wrote:
>
> > I've done this two ways in the past.  Either I'll give each machine an
> > Infiniband network link and a 1000baseT link and use the Infiniband one
> > as the private network for Ceph, or I'll throw an Infiniband card into a
> > PC and run something like Vyatta/VyOS on it and make it a router, so IP
> > traffic can get out of the IB network.  Of course, those have both been
> > for test labs.  YMMV.
> >
>
> That.
>
> Of course in a production environment you would want something with 2
> routers in a failover configuration.
> And there are switches/gateways that combine IB and Ethernet, but they
> tend to be not so cheap. ^^
>
> More below.
>
> > On Tue Oct 07 2014 at 11:05:23 AM Massimiliano Cuttini
> >  wrote:
> >
> > >  Hi Christian,
> > >
> > >  When you say "10 gig infiniband", do you mean QDRx4 Infiniband
> > > (usually flogged as 40Gb/s even though it is 32Gb/s, but who's
> > > counting), which tends to be the same basic hardware as the 10Gb/s
> > > Ethernet offerings from Mellanox?
> > >
> > > A brand new 18 port switch of that caliber will only cost about 180$
> > > per port, too.
> > >
> > >
> > >
> > > I investigated Infiniband but I didn't find affordable prices at
> > > all.
>
> Then you're doing it wrong or comparing apples to oranges (you of course
> need to compare IB switches to similar 10GbE ones).
> And the prices of HCA (aka network cards in the servers) and cabling.
>
> > > Moreover, how do you connect your *legacy node servers* to your
> > > *brand new storage* if you have Infiniband only on the storage nodes &
> > > switches? Is there any mixed switch that allows you to connect both
> > > with Infiniband and Ethernet?
> > >
> > > If there is, please send specs, because I cannot find any just by googling.
> > >
> The moment you type in "infiniband et" google will already predict amongst
> other pertinent things "infiniband ethernet gateway" and "infiniband
> ethernet bridge".
> But even "infiniband ethernet switch" has a link telling you pretty much
> what was said here now at the 6th position:
> http://www.tomshardware.com/forum/44997-42-connect-
> infiniband-switch-ethernet
>
> Christian
> > > Thanks,
> > > Max
> > >
> > >  ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Fusion Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Basic Ceph questions

2014-10-07 Thread Marcus White
Hello,
Some basic Ceph questions, would appreciate your help:) Sorry about
the number and detail in advance!

a. Ceph RADOS is strongly consistent and different from the usual object
stores. Does that mean all metadata (container, account, etc.) is also
consistent, and that everything is updated in the path of the client
operation itself, for a single site?


b. If it is strongly consistent, is that the case across sites also?
How can it be performant across geo sites if that is the case, i.e. if it's
choosing consistency over partition tolerance and availability? For object,
I read somewhere that it is now eventually consistent (locally CP,
remotely AP) via DR. It gets a bit confusing with all the literature out
there. If it is DR, isn't that slightly different from the Swift case?

c. For block, is it CP on a single site and then usual DR to another
site using snapshotting?

d. For block, is it just a Linux block device or is it SCSI? Is it a
custom device driver running within Linux which hooks into the block
layer? Trying to understand the layering diagram.

e. Do the snapshot and compression features come from the underlying file system?

f. What is the plan for deduplication? If that comes from the local
file system, how would it deduplicate across nodes to achieve the best
dedup ratio?


TIA,
MW
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] max_bucket limit -- safe to disable?

2014-10-07 Thread Daniel Schneller
Hi!

> By looking at these logs it seems that there are only 8 pgs on the
> .rgw pool, if this is correct then you may want to change that
> considering your workload.


Thanks. See our pg_num configuration below. We had already suspected
that the 1600 we had previously (48 OSDs * 100 / triple redundancy)
were not ideal, so we increased the .rgw.buckets pool to 2048.
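
For reference, the usual rule of thumb is roughly (OSDs * 100) / replica
count placement groups in total across all pools, rounded to a power of two;
pg_num can only ever be increased, and pgp_num has to be raised along with it
before placement actually changes. A minimal sketch of bumping a pool's PG
count, with the pool name and target as examples only:

# Sketch only - pool and numbers are examples, not a recommendation.
# (48 OSDs * 100) / 3 replicas = 1600, next power of two = 2048
ceph osd pool get .rgw.buckets size          # confirm the replica count first
ceph osd pool set .rgw.buckets pg_num 2048   # can only ever be increased
ceph osd pool set .rgw.buckets pgp_num 2048  # must follow pg_num for placement to change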

The number of objects and their sizes were given in an earlier email, but
for completeness I will include them once again.

Any other ideas where to look?

==
for i in $(rados df | awk '{ print $1 }' | grep '^\.'); do
   echo $i; echo -n " - ";
   ceph osd pool get $i pg_num;
   echo -n " - ";
   ceph osd pool get $i pgp_num;
done

.intent-log
 - pg_num: 1600
 - pgp_num: 1600
.log
 - pg_num: 1600
 - pgp_num: 1600
.rgw
 - pg_num: 1600
 - pgp_num: 1600
.rgw.buckets
 - pg_num: 2048
 - pgp_num: 2048
.rgw.buckets.index
 - pg_num: 1600
 - pgp_num: 1600
.rgw.control
 - pg_num: 1600
 - pgp_num: 1600
.rgw.gc
 - pg_num: 1600
 - pgp_num: 1600
.rgw.root
 - pg_num: 100
 - pgp_num: 100
.usage
 - pg_num: 1600
 - pgp_num: 1600
.users
 - pg_num: 1600
 - pgp_num: 1600
.users.email
 - pg_num: 1600
 - pgp_num: 1600
.users.swift
 - pg_num: 1600
 - pgp_num: 1600
.users.uid
 - pg_num: 1600
 - pgp_num: 1600
===


> .rgw
> =
> KB: 1,966,932
> objects:9,094,552
> rd:   195,747,645
>  rd KB:   153,585,472
> wr:30,191,844
>  wr KB:10,751,065
> 
> .rgw.buckets
> =
> KB: 2,038,313,855
> objects:   22,088,103
> rd: 5,455,123
>  rd KB:   408,416,317
> wr:   149,377,728
>  wr KB: 1,882,517,472
> 
> .rgw.buckets.index
> =
> KB: 0
> objects:5,374,376
> rd:   267,996,778
>  rd KB:   262,626,106
> wr:   107,142,891
>  wr KB: 0
> 
> .rgw.control
> =
> KB: 0
> objects:8
> rd: 0
>  rd KB: 0
> wr: 0
>  wr KB: 0
> 
> .rgw.gc
> =
> KB: 0
> objects:   32
> rd: 5,554,407
>  rd KB: 5,713,942
> wr: 8,355,934
>  wr KB: 0
> 
> .rgw.root
> =
> KB: 1
> objects:3
> rd:   524
>  rd KB:   346
> wr: 3
>  wr KB: 3


Daniel

> On 08 Oct 2014, at 01:03, Yehuda Sadeh  wrote:
> 
> This operation stalled quite a bit, seems that it was waiting for the osd:
> 
> 2.547155 7f036ffc7700  1 -- 10.102.4.11:0/1009401 -->
> 10.102.4.14:6809/7428 -- osd_op(client.78418684.0:27514711
> .bucket.meta.:default.78418684.122043 [call
> version.read,getxattrs,stat] 5.3b7d1197 ack+read e16034) v4 -- ?+0
> 0x7f026802e2c0 con 0x7f040c055ca0
> ...
> 7.619750 7f041ddf4700  1 -- 10.102.4.11:0/1009401 <== osd.32
> 10.102.4.14:6809/7428 208252  osd_op_reply(27514711
> .bucket.meta.:default.78418684.122043
> [call,getxattrs,stat] v0'0 uv6371 ondisk = 0) v6  338+0+336
> (3685145659 0 4232894755) 0x7f00e430f540 con 0x7f040c055ca0
> 
> By looking at these logs it seems that there are only 8 pgs on the
> .rgw pool, if this is correct then you may want to change that
> considering your workload.
> 
> Yehuda
> 
> 
> On Tue, Oct 7, 2014 at 3:46 PM, Daniel Schneller
>  wrote:
>> Hi!
>> 
>> Sorry, I must have missed the enabling of that debug module.
>> However, the test setup has been the same all the time -
>> I only have the one test-application :)
>> 
>> But maybe I phrased it a bit ambiguously when I wrote
>> 
>>> It then continuously created containers - both empty
>>> and such with 10 objects of around 100k random data in them.
>> 
>> 100 kilobytes is the size of a single object, of which we create 10
>> per container. The container gets created first, without any
>> objects, naturally, then 10 objects are added. One of these objects
>> is called “version”, the rest have generated names with a fixed
>> prefix and appended 1-9. The version object is the one I picked
>> for the example logs I sent earlier.
>> 
>> I hope this makes the setup clearer.
>> 
>> Attached you will find the (now more extensive) logs for the outliers
>> again. As you did not say that I garbled the logs, I assume the
>> pre-processing was OK, so I have prepared the new data in a similar
>> fashion, marking the relevant request with .
>> 
>> I have not removed any lines in between the beginning of the
>> “interesting” request and its completion to keep all the network
>> traffic log intact. Due to the increased verbosity, I will not post
>> the logs inline, but only attach them gzipped.
>> 
>> As before, should the full data set be needed, I can provide
>> an archived version.
>> 
>> 
>> 
>> 
>> Thanks for your support!
>> Daniel
>> 
>> 
>> 
>> 
>>> On 07 Oct 2014, at 22:45, Yehuda Sadeh  wrote:
>>> 
>>> The logs here don't include the messenger (debug ms = 1). It's hard to
>>> tell what goin

Re: [ceph-users] Openstack keystone with Radosgw

2014-10-07 Thread Mark Kirkwood

On 08/10/14 11:02, lakshmi k s wrote:

I am trying to integrate OpenStack Keystone with the Ceph Object Store using
the link http://ceph.com/docs/master/radosgw/keystone. Swift v1.0 (without
Keystone) works quite fine, but for some reason Swift v2.0 Keystone calls to
the Ceph Object Store always result in a 401 Unauthorized message.
I have tried to get a new token by contacting Keystone and used that
token for making Swift calls, but no luck. Please note that all other
services like nova list and cinder list work, which means Keystone is set up
correctly. Only the Swift service fails. The only step I did not execute was
installing the NSS DB, as I ran into package dependency issues, but I have
commented that flag out in ceph.conf. My ceph.conf looks like this:
[global]
fsid = b35e8496-e809-416a-bd66-aba761d78fac
mon_initial_members = node1
mon_host = 192.0.2.211
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
[client.admin]
keyring = /etc/ceph/ceph.client.admin.keyring
[client.radosgw.gateway]
rgw keystone url = http://192.0.8.2:5000
rgw keystone admin token = 9c2ef11a69044defb9dbfa0f8ab73d86
rgw keystone accepted roles = admin, Member, swiftoperator
rgw keystone token cache size = 100
rgw keystone revocation interval = 600
rgw s3 auth use keystone = false
#nss db path = /var/ceph/nss
host = gateway
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw dns name = gateway


*Output of Swift list*
root@overcloud-controller0-fjvtpqjip2hl:~# swift --debug -V 2.0 -A
http://192.0.8.2:5000/v2.0 -U ceph:cephUser -K "ceph123" list

DEBUG:keystoneclient.session:REQ: curl -i -X POST
http://192.0.8.2:5000/v2.0/tokens -H "Content-Type: application/json" -H
"Accept: application/json" -H "User-Agent: python-keystoneclient" -d
'{"auth": {"tenantName": "ceph", "passwordCredentials": {"username":
"cephUser", "password": "ceph123"}}}'
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP
connection (1): 192.0.8.2
DEBUG:requests.packages.urllib3.connectionpool:"POST /v2.0/tokens
HTTP/1.1" 200 3910
DEBUG:keystoneclient.session:RESP: [200] {'date': 'Tue, 07 Oct 2014
20:05:20 GMT', 'content-type': 'application/json', 'content-length':
'3910', 'vary': 'X-Auth-Token'}
RESP BODY: {"access": {"token": {"issued_at":
"2014-10-07T20:05:20.480562", "expires": "2014-10-08T00:05:20Z", "id":
"45e14981c41f4c8c8055849b39bd4c23", "tenant": {"description": "",
"enabled": true, "id": "bad9e2232b304f89acb03436635b80cc", "name":
"ceph"}}, "serviceCatalog": [{"endpoints": [{"adminURL":
"http://192.0.8.2:8774/v2/bad9e2232b304f89acb03436635b80cc";, "region":
"regionOne", "internalURL":
"http://192.0.8.2:8774/v2/bad9e2232b304f89acb03436635b80cc";, "id":
"40e53124619d479ab0c34a99c7619bcc", "publicURL":
"http://192.0.8.2:8774/v2/bad9e2232b304f89acb03436635b80cc"}],
"endpoints_links": [], "type": "compute", "name": "nova"}, {"endpoints":
[{"adminURL": "http://192.0.8.2:9696/";, "region": "regionOne",
"internalURL": "http://192.0.8.2:9696/";, "id":
"4e5fb12504024554a762b46391b46309", "publicURL":
"http://192.0.8.2:9696/"}], "endpoints_links": [], "type": "network",
"name": "neutron"}, {"endpoints": [{"adminURL":
"http://192.0.8.2:8774/v3";, "region": "regionOne", "internalURL":
"http://192.0.8.2:8774/v3";, "id": "4e9f7514c3d94bd4b505207cfa52c306",
"publicURL": "http://192.0.8.2:8774/v3"}], "endpoints_links": [],
"type": "computev3", "name": "nova"}, {"endpoints": [{"adminURL":
"http://192.0.8.2:9292/";, "region": "regionOne", "internalURL":
"http://192.0.8.2:9292/";, "id": "3305668e44fc43f4bb57b45aa599d454",
"publicURL": "http://192.0.8.2:9292/"}], "endpoints_links": [], "type":
"image", "name": "glance"}, {"endpoints": [{"adminURL":
"http://192.0.8.2:21131/v1";, "region": "regionOne", "internalURL":
"http://192.0.8.2:21131/v1";, "id": "7b4ac2efaeba4074988e397bee403caa",
"publicURL": "http://192.0.8.2:21131/v1"}], "endpoints_links": [],
"type": "hp-catalog", "name": "sherpa"}, {"endpoints": [{"adminURL":
"http://192.0.8.2:8777/";, "region": "regionOne", "internalURL":
"http://192.0.8.2:8777/";, "id": "2f1de9c2e81049e99cd4da266931780b",
"publicURL": "http://192.0.8.2:8777/"}], "endpoints_links": [], "type":
"metering", "name": "ceilometer"}, {"endpoints": [{"adminURL":
"http://192.0.8.2:8776/v1/bad9e2232b304f89acb03436635b80cc";, "region":
"regionOne", "internalURL":
"http://192.0.8.2:8776/v1/bad9e2232b304f89acb03436635b80cc";, "id":
"0bbc1c8d91574c2083b6b28b237c7004", "publicURL":
"http://192.0.8.2:8776/v1/bad9e2232b304f89acb03436635b80cc"}],
"endpoints_links": [], "type": "volume", "name": "cinder"},
{"endpoints": [{"adminURL": "http://192.0.8.2:8773/services/Admin";,
"region": "regionOne", "internalURL":
"http://192.0.8.2:8773/services/Cloud";, "id":
"b15e7b43c7a44831a036f6f01479a6b1", "publicURL":
"http://192.0.8.2:8773/services/

[ceph-users] Rados Gateway and Swift create containers/buckets that cannot be opened

2014-10-07 Thread Mark Kirkwood
I have a recent ceph (0.85-1109-g73d7be0) configured to use keystone for 
authentication:


$ cat ceph.conf
...
[client.radosgw.gateway]
host = ceph4
keyring = /etc/ceph/ceph.rados.gateway.keyring
rgw_socket_path = /var/run/ceph/$name.sock
log_file = /var/log/ceph/radosgw.log
rgw_data = /var/lib/ceph/radosgw/$cluster-$id
rgw_dns_name = ceph4
rgw print continue = false
debug rgw = 20
rgw keystone url = http://stack1:35357
rgw keystone admin token = tokentoken
rgw keystone accepted roles = admin Member _member_
rgw keystone token cache size = 500
rgw keystone revocation interval = 500
rgw s3 auth use keystone = true
nss db path = /var/ceph/nss/

So ceph4 is the rgw and stack1 is a devstack setup with keystone 
endpoints for S3 and Swift pointing to the ceph4 host:


$ keystone endpoint-list
...
| b884053b2c6f4217ad643c25c001217b | RegionOne | http://ceph4           | http://ceph4           | http://ceph4           | be62ab8531d143a7bce5ae6020d13918 |
| d7a8338dd5684f5d8dfde406b0780462 | RegionOne | http://ceph4/swift/v1/ | http://ceph4/swift/v1/ | http://ceph4/swift/v1/ | c2d4550d71e94a6a966af810c9ad0568 |


When I create some buckets and keys using the S3 api (Boto) then I can 
list them and their contents (see attached)


demo-bucket0    2014-10-08T05:02:03.000Z
hello.txt   12  2014-10-08T05:02:06.000Z

When I try a similar thing via swift:
$ swift upload container0 file
Object PUT failed: http://ceph4/swift/v1/container0/local.conf 404 Not 
Found   NoSuchBucket


Hmm - using swift to list containers shows:

$ swift list
/container0
demo-bucket0

So a new bucket has been created, but note a leading '/' has been added 
to the name. Now retrying my simple s3 list gets:


/container0 2014-10-08T05:02:19.000Z
Traceback (most recent call last):
  File "./s3-test-ls.py", line 24, in 
for key in bucket.list():
  File 
"/usr/lib/python2.7/dist-packages/boto/s3/bucketlistresultset.py", line 
30, in bucket_lister

delimiter=delimiter, headers=headers)
  File "/usr/lib/python2.7/dist-packages/boto/s3/bucket.py", line 392, 
in get_all_keys

'', headers, **params)
  File "/usr/lib/python2.7/dist-packages/boto/s3/bucket.py", line 343, 
in _get_all

response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 404 Not Found
encoding="UTF-8"?>NoSuchBucket



I'm guessing the leading '/' is the culprit.

The rgw logs (below) seem to show that the leading '/' is stripped off 
and then the bucket cannot be opened or listed - as it does not exist:


2014-10-08 18:39:24.764328 7f195bfd7700  1 == starting new request 
req=0x1284270 =
2014-10-08 18:39:24.764337 7f195bfd7700  2 req 17:0.10::GET 
/container0/::initializing

2014-10-08 18:39:24.764340 7f195bfd7700 10 host=ceph4 rgw_dns_name=ceph4
2014-10-08 18:39:24.764361 7f195bfd7700 10 s->object= 
s->bucket=container0
2014-10-08 18:39:24.764366 7f195bfd7700  2 req 17:0.38:s3:GET 
/container0/::getting op
2014-10-08 18:39:24.764369 7f195bfd7700  2 req 17:0.42:s3:GET 
/container0/:list_bucket:authorizing

2014-10-08 18:39:24.764372 7f195bfd7700 20 s3 keystone: trying keystone auth
2014-10-08 18:39:24.764390 7f195bfd7700 10 get_canon_resource(): 
dest=/container0/
2014-10-08 18:39:24.764420 7f195bfd7700 20 sending request to 
http://stack1:35357/v2.0/s3tokens
2014-10-08 18:39:24.835591 7f195bfd7700  5 s3 keystone: validated token: 
demo:demo expires: 1412750365
2014-10-08 18:39:24.835671 7f195bfd7700 20 get_obj_state: rctx=0x1285820 
obj=.users.uid:f535ae4f66654326807c556acff2697e state=0x12c3348 
s->prefetch_data=0
2014-10-08 18:39:24.835686 7f195bfd7700 10 cache get: 
name=.users.uid+f535ae4f66654326807c556acff2697e : hit
2014-10-08 18:39:24.835694 7f195bfd7700 20 get_obj_state: s->obj_tag was 
set empty
2014-10-08 18:39:24.835700 7f195bfd7700 10 cache get: 
name=.users.uid+f535ae4f66654326807c556acff2697e : hit
2014-10-08 18:39:24.835731 7f195bfd7700  2 req 17:0.071403:s3:GET 
/container0/:list_bucket:reading permissions
2014-10-08 18:39:24.835756 7f195bfd7700 20 get_obj_state: 
rctx=0x7f195bfd61d0 obj=.rgw:container0 state=0x12901c8 s->prefetch_data=0
2014-10-08 18:39:24.835763 7f195bfd7700 10 cache get: 
name=.rgw+container0 : type miss (requested=22, cached=0)

2014-10-08 18:39:24.837125 7f195bfd7700 10 cache put: name=.rgw+container0
2014-10-08 18:39:24.837160 7f195bfd7700 10 moving .rgw+container0 to 
cache LRU end
2014-10-08 18:39:24.837180 7f195bfd7700 10 read_permissions on 
container0(@[]): only_bucket=0 ret=-2002
2014-10-08 18:39:24.837231 7f195bfd7700  2 req 17:0.072903:s3:GET 
/container0/:list_bucket:http status=404
2014-10-08 18:39:24.837239 7f195bfd7700  1 == req done req=0x1284270 
http_status=404 ==

2014-10-08 18:39:24.837253 7f195bfd7700 20 process_request() returned -2002




#!/usr/bin/python

import boto
import boto.s3.connection

access_key = 'redacted'
secret_key

Re: [ceph-users] Rados Gateway and Swift create containers/buckets that cannot be opened

2014-10-07 Thread Mark Kirkwood

On 08/10/14 18:46, Mark Kirkwood wrote:

I have a recent ceph (0.85-1109-g73d7be0) configured to use keystone for
authentication:

$ cat ceph.conf
...
[client.radosgw.gateway]
host = ceph4
keyring = /etc/ceph/ceph.rados.gateway.keyring
rgw_socket_path = /var/run/ceph/$name.sock
log_file = /var/log/ceph/radosgw.log
rgw_data = /var/lib/ceph/radosgw/$cluster-$id
rgw_dns_name = ceph4
rgw print continue = false
debug rgw = 20
rgw keystone url = http://stack1:35357
rgw keystone admin token = tokentoken
rgw keystone accepted roles = admin Member _member_
rgw keystone token cache size = 500
rgw keystone revocation interval = 500
rgw s3 auth use keystone = true
nss db path = /var/ceph/nss/

So ceph4 is the rgw and stack1 is a devstack setup with keystone
endpoints for S3 and Swift pointing to the ceph4 host:

$ keystone endpoint-list
...
| b884053b2c6f4217ad643c25c001217b | RegionOne | http://ceph4           | http://ceph4           | http://ceph4           | be62ab8531d143a7bce5ae6020d13918 |
| d7a8338dd5684f5d8dfde406b0780462 | RegionOne | http://ceph4/swift/v1/ | http://ceph4/swift/v1/ | http://ceph4/swift/v1/ | c2d4550d71e94a6a966af810c9ad0568 |

When I create some buckets and keys using the S3 api (Boto) then I can
list them and their contents (see attached)

demo-bucket0    2014-10-08T05:02:03.000Z
 hello.txt      12    2014-10-08T05:02:06.000Z

When I try a similar thing via swift:
$ swift upload container0 file
Object PUT failed: http://ceph4/swift/v1/container0/local.conf 404 Not
Found   NoSuchBucket

Hmm - using swift to list containers shows:

$ swift list
/container0
demo-bucket0

So a new bucket has been created, but note a leading '/' has been added
to the name. Now retrying my simple s3 list gets:

/container0    2014-10-08T05:02:19.000Z
Traceback (most recent call last):
   File "./s3-test-ls.py", line 24, in 
 for key in bucket.list():
   File
"/usr/lib/python2.7/dist-packages/boto/s3/bucketlistresultset.py", line
30, in bucket_lister
 delimiter=delimiter, headers=headers)
   File "/usr/lib/python2.7/dist-packages/boto/s3/bucket.py", line 392,
in get_all_keys
 '', headers, **params)
   File "/usr/lib/python2.7/dist-packages/boto/s3/bucket.py", line 343,
in _get_all
 response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 404 Not Found
NoSuchBucket


I'm guessing the leading '/' is the culprit.




Well it certainly is. I wondered if the issue might be the apache2
fastcgi module, so I replaced it with the ceph one from
http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-trusty-x86_64-basic/ref/master/
No difference.


The light dawned. My swift endpoint urls all have trailing '/' on them, viz:

| d7a8338dd5684f5d8dfde406b0780462 | RegionOne | http://ceph4/swift/v1/ | http://ceph4/swift/v1/ | http://ceph4/swift/v1/ | c2d4550d71e94a6a966af810c9ad0568 |


Changing it to:

| d7a8338dd5684f5d8dfde406b0780462 | RegionOne | http://ceph4/swift/v1 | http://ceph4/swift/v1 | http://ceph4/swift/v1 | c2d4550d71e94a6a966af810c9ad0568 |



...and redoing a swift upload:
$ swift upload container1 stackrc
stackrc
$ swift list
/container0
container1

Excellent, how about listing the container:

$ swift list container1
stackrc

Ok, so it's looking good now. Checking the current docs at
http://docs.ceph.com/docs/master/radosgw/keystone/ they do *not* have
the trailing '/'s, so unless I was looking at some older ones that *did*
have 'em... I managed to typo the URLs myself. Hmm. Unfortunate.
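
For anyone else hitting this, the fix is simply to recreate the swift
endpoint without the trailing '/'. With the old keystone CLI that should look
roughly like the sketch below (the endpoint/service IDs and host are the ones
from the listing above; the old CLI has no endpoint-update, so delete and
recreate):

# Sketch only - IDs and host taken from the keystone endpoint-list output above.
keystone endpoint-delete d7a8338dd5684f5d8dfde406b0780462
keystone endpoint-create --region RegionOne \
    --service-id  c2d4550d71e94a6a966af810c9ad0568 \
    --publicurl   http://ceph4/swift/v1 \
    --internalurl http://ceph4/swift/v1 \
    --adminurl    http://ceph4/swift/v1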


Regards

Mark


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com