Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Stefan Priebe - Profihost AG
Hi Sage,
Am 11.03.2015 um 04:14 schrieb Sage Weil:
> On Wed, 11 Mar 2015, Christian Balzer wrote:
>> On Tue, 10 Mar 2015 12:34:14 -0700 (PDT) Sage Weil wrote:
>>
>>
>>> Adjusting CRUSH maps
>>> 
>>>
>>> * This point release fixes several issues with CRUSH that trigger
>>>   excessive data migration when adjusting OSD weights.  These are most
>>>   obvious when a very small weight change (e.g., a change from 0 to
>>>   .01) triggers a large amount of movement, but the same set of bugs
>>>   can also lead to excessive (though less noticeable) movement in
>>>   other cases.
>>>
>>>   However, because the bug may already have affected your cluster,
>>>   fixing it may trigger movement *back* to the more correct location.
>>>   For this reason, you must manually opt-in to the fixed behavior.
>>>
>> It would be nice to know at what version of Ceph those bugs were
>> introduced.
> 
> This bug has been present in CRUSH since the beginning.

So people upgrading from dumpling have to do the same?

1.) They need to set tunables to optimal (to get firefly tunables)
2.) They have to set those options you mention?

Greets,
Stefan


> sage
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-11 Thread Steffen W Sørensen
On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub  wrote:

>>> What kind of application is that?
>> Commercial Email platform from Openwave.com
> 
> Maybe it could be worked around using an apache rewrite rule. In any case, I 
> opened issue #11091.
Okay, how, by rewriting the response?
Thanks, where can tickets be followed/viewed?

Asked my vendor what confuses their App about the reply. Would be nice if they 
could work against Ceph S3 :)

 2. at every create-bucket OP the GW creates what looks like new containers
 for ACLs in the .rgw pool. Is this normal,
 or how do I avoid such multiple objects cluttering the GW pools?
 Is there something wrong, since I get multiple ACL objects for this bucket
 every time my App tries to recreate the same bucket, or
 is this a "feature/bug" in radosGW?
>>> 
>>> That's a bug.
>> Ok, any resolution/work-around to this?
>> 
> Not at the moment. There's already issue #6961, I bumped its priority higher, 
> and we'll take a look at it.
Thanks!

/Steffen


signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Gabri Mate
Hi,

May I assume this fix will be in Hammer? So can I use this to fix my
cluster after upgrading Giant to Hammer?

Best regards,
Mate

On 12:34 Tue 10 Mar , Sage Weil wrote:
> This is a bugfix release for firefly.  It fixes a performance regression 
> in librbd, an important CRUSH misbehavior (see below), and several RGW 
> bugs.  We have also backported support for flock/fcntl locks to ceph-fuse 
> and libcephfs.
> 
> We recommend that all Firefly users upgrade.
> 
> For more detailed information, see
>   http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt
> 
> Adjusting CRUSH maps
> 
> 
> * This point release fixes several issues with CRUSH that trigger
>   excessive data migration when adjusting OSD weights.  These are most
>   obvious when a very small weight change (e.g., a change from 0 to
>   .01) triggers a large amount of movement, but the same set of bugs
>   can also lead to excessive (though less noticeable) movement in
>   other cases.
> 
>   However, because the bug may already have affected your cluster,
>   fixing it may trigger movement *back* to the more correct location.
>   For this reason, you must manually opt-in to the fixed behavior.
> 
>   In order to set the new tunable to correct the behavior::
> 
>  ceph osd crush set-tunable straw_calc_version 1
> 
>   Note that this change will have no immediate effect.  However, from
>   this point forward, any 'straw' bucket in your CRUSH map that is
>   adjusted will get non-buggy internal weights, and that transition
>   may trigger some rebalancing.
> 
>   You can estimate how much rebalancing will eventually be necessary
>   on your cluster with::
> 
>  ceph osd getcrushmap -o /tmp/cm
>  crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
>  crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
>  crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
>  crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
>  wc -l /tmp/a  # num total mappings
>  diff -u /tmp/a /tmp/b | grep -c ^+# num changed mappings
> 
>Divide the number of changed mappings by the total number of mappings
>in /tmp/a to get the fraction that will move.  We've found that most
>clusters are under 10%.
> 
>You can force all of this rebalancing to happen at once with::
> 
>  ceph osd crush reweight-all
> 
>Otherwise, it will happen at some unknown point in the future when
>CRUSH weights are next adjusted.
> 
> Notable Changes
> ---
> 
> * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
> * crush: fix straw bucket weight calculation, add straw_calc_version 
>   tunable (#10095 Sage Weil)
> * crush: fix tree bucket (Rongzu Zhu)
> * crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
> * crushtool: add --reweight (Sage Weil)
> * librbd: complete pending operations before losing image (#10299 Jason 
>   Dillaman)
> * librbd: fix read caching performance regression (#9854 Jason Dillaman)
> * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
> * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
> * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
> * osd: handle no-op write with snapshot (#10262 Sage Weil)
> * radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh)
> * rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis, 
>   Yehuda Sadeh)
> * rgw: don't overwrite bucket/object owner when setting ACLs (#10978 
>   Yehuda Sadeh)
> * rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh)
> * rgw: fix partial swift GET (#10553 Yehuda Sadeh)
> * rgw: fix quota disable (#9907 Dong Lei)
> * rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh)
> * rgw: make setattrs update bucket index (#5595 Yehuda Sadeh)
> * rgw: pass civetweb configurables (#10907 Yehuda Sadeh)
> * rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda 
>   Sadeh)
> * rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh)
> * rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh)
> * rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh)
> * rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh)
> * rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil)
> * rgw: update swift subuser permission masks when authenticating (#9918 
>   Yehuda Sadeh)
> * rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis, 
>   Yehuda Sadeh)
> * rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh)
> * rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh)
> 
> Getting Ceph
> 
> 
> * Git at git://github.com/ceph/ceph.git
> * Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz
> * For packages, see http://ceph.com/docs/master/install/get-packages
> * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
> ___
> ceph-users mailing 

Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Dan van der Ster
Hi Sage,

On Tue, Mar 10, 2015 at 8:34 PM, Sage Weil  wrote:
> Adjusting CRUSH maps
> 
>
> * This point release fixes several issues with CRUSH that trigger
>   excessive data migration when adjusting OSD weights.  These are most
>   obvious when a very small weight change (e.g., a change from 0 to
>   .01) triggers a large amount of movement, but the same set of bugs
>   can also lead to excessive (though less noticeable) movement in
>   other cases.
>
>   However, because the bug may already have affected your cluster,
>   fixing it may trigger movement *back* to the more correct location.
>   For this reason, you must manually opt-in to the fixed behavior.
>
>   In order to set the new tunable to correct the behavior::
>
>  ceph osd crush set-tunable straw_calc_version 1
>

Since it's not obvious in this case, does setting straw_calc_version =
1 still allow older firefly clients to connect?

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Firefly Tiering

2015-03-11 Thread Stefan Priebe - Profihost AG
Hi,

Has anybody successfully tested tiering while using firefly? How much
does it impact performance vs. a normal pool? I mean, is there any
difference between a full SSD pool and a tiered SSD pool with a SATA backend?

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-11 Thread Steffen W Sørensen

On 11/03/2015, at 08.19, Steffen W Sørensen  wrote:

> On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub  wrote:
> 
 What kind of application is that?
>>> Commercial Email platform from Openwave.com
>> 
>> Maybe it could be worked around using an apache rewrite rule. In any case, I 
>> opened issue #11091.
> Okay, how, by rewriting the response?
> Thanks, where can tickets be followed/viewed?
> 
> Asked my vendor what confuses their App about the reply. Would be nice if 
> they could work against Ceph S3 :)
> 
> 2. at every create-bucket OP the GW creates what looks like new containers
> for ACLs in the .rgw pool. Is this normal,
> or how do I avoid such multiple objects cluttering the GW pools?
> Is there something wrong, since I get multiple ACL objects for this bucket
> every time my App tries to recreate the same bucket, or
> is this a "feature/bug" in radosGW?
 
 That's a bug.
>>> Ok, any resolution/work-around to this?
>>> 
>> Not at the moment. There's already issue #6961, I bumped its priority 
>> higher, and we'll take a look at it.
> Thanks!
BTW running Giant:

[root@rgw ~]# rpm -qa| grep -i ceph
httpd-tools-2.2.22-1.ceph.el6.x86_64
ceph-common-0.87.1-0.el6.x86_64
mod_fastcgi-2.4.7-1.ceph.el6.x86_64
libcephfs1-0.87.1-0.el6.x86_64
xfsprogs-3.1.1-14_ceph.el6.x86_64
ceph-radosgw-0.87.1-0.el6.x86_64
httpd-2.2.22-1.ceph.el6.x86_64
python-ceph-0.87.1-0.el6.x86_64
ceph-0.87.1-0.el6.x86_64

[root@rgw ~]# uname -a
Linux rgw.sprawl.dk 2.6.32-504.8.1.el6.x86_64 #1 SMP Wed Jan 28 21:11:36 UTC 
2015 x86_64 x86_64 x86_64 GNU/Linux

[root@rgw ~]# cat /etc/redhat-release 
CentOS release 6.6 (Final)



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Valery Tschopp

Where can I find the debian trusty source package for v0.80.9?

Cheers,
Valery

On 10/03/15 20:34 , Sage Weil wrote:

This is a bugfix release for firefly.  It fixes a performance regression
in librbd, an important CRUSH misbehavior (see below), and several RGW
bugs.  We have also backported support for flock/fcntl locks to ceph-fuse
and libcephfs.

We recommend that all Firefly users upgrade.

For more detailed information, see
   http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt

Adjusting CRUSH maps


* This point release fixes several issues with CRUSH that trigger
   excessive data migration when adjusting OSD weights.  These are most
   obvious when a very small weight change (e.g., a change from 0 to
   .01) triggers a large amount of movement, but the same set of bugs
   can also lead to excessive (though less noticeable) movement in
   other cases.

   However, because the bug may already have affected your cluster,
   fixing it may trigger movement *back* to the more correct location.
   For this reason, you must manually opt-in to the fixed behavior.

   In order to set the new tunable to correct the behavior::

  ceph osd crush set-tunable straw_calc_version 1

   Note that this change will have no immediate effect.  However, from
   this point forward, any 'straw' bucket in your CRUSH map that is
   adjusted will get non-buggy internal weights, and that transition
   may trigger some rebalancing.

   You can estimate how much rebalancing will eventually be necessary
   on your cluster with::

  ceph osd getcrushmap -o /tmp/cm
  crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
  crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
  crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
  crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
  wc -l /tmp/a  # num total mappings
  diff -u /tmp/a /tmp/b | grep -c ^+# num changed mappings

Divide the number of changed mappings by the total number of mappings
in /tmp/a to get the fraction that will move.  We've found that most
clusters are under 10%.

You can force all of this rebalancing to happen at once with::

  ceph osd crush reweight-all

Otherwise, it will happen at some unknown point in the future when
CRUSH weights are next adjusted.

Notable Changes
---

* ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
* crush: fix straw bucket weight calculation, add straw_calc_version
   tunable (#10095 Sage Weil)
* crush: fix tree bucket (Rongzu Zhu)
* crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
* crushtool: add --reweight (Sage Weil)
* librbd: complete pending operations before losing image (#10299 Jason
   Dillaman)
* librbd: fix read caching performance regression (#9854 Jason Dillaman)
* librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
* mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
* osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
* osd: handle no-op write with snapshot (#10262 Sage Weil)
* radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh)
* rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis,
   Yehuda Sadeh)
* rgw: don't overwrite bucket/object owner when setting ACLs (#10978
   Yehuda Sadeh)
* rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh)
* rgw: fix partial swift GET (#10553 Yehuda Sadeh)
* rgw: fix quota disable (#9907 Dong Lei)
* rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh)
* rgw: make setattrs update bucket index (#5595 Yehuda Sadeh)
* rgw: pass civetweb configurables (#10907 Yehuda Sadeh)
* rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda
   Sadeh)
* rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh)
* rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh)
* rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh)
* rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh)
* rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil)
* rgw: update swift subuser permission masks when authenticating (#9918
   Yehuda Sadeh)
* rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis,
   Yehuda Sadeh)
* rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh)
* rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh)

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
SWITCH
--
Valery Tschopp, Software Engineer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
email: valery.t

Re: [ceph-users] Firefly Tiering

2015-03-11 Thread Nick Fisk
Hi Stefan,

If the majority of your hot data fits on the cache tier you will see quite a
marked improvement in read performance and similar write performance
(assuming you would have had your hdds backed by SSD journals).

However for data that is not in the cache tier you will get 10-20% less read
performance and anything up to 10x less write performance. This is because a
cache write miss has to read the entire object from the backing store into
the cache and then modify it.

The read performance degradation will probably be fixed in Hammer with proxy
reads, but writes will most likely still be an issue.
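
For reference, a minimal writeback cache-tier setup looks roughly like the
following; the pool names are placeholders and the values are purely
illustrative, not recommendations:

   ceph osd pool create cache-ssd 512 512          # SSD-backed pool (use a CRUSH rule mapping to your SSDs)
   ceph osd tier add rbd-sata cache-ssd            # attach the cache pool to the backing pool
   ceph osd tier cache-mode cache-ssd writeback    # writeback caching
   ceph osd tier set-overlay rbd-sata cache-ssd    # route client I/O through the cache tier
   ceph osd pool set cache-ssd hit_set_type bloom  # needed for promotion/flush decisions
   ceph osd pool set cache-ssd target_max_bytes 500000000000  # example flush/evict threshold (~500 GB)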

Nick


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Stefan Priebe - Profihost AG
> Sent: 11 March 2015 07:27
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Firefly Tiering
> 
> Hi,
> 
> has anybody successfully tested tiering while using firefly? How much does
it
> impact performance vs. a normal pool? I mean is there any difference
> between a full SSD pool und a tiering SSD pool with SATA Backend?
> 
> Greets,
> Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Firefly Tiering

2015-03-11 Thread Stefan Priebe - Profihost AG
Hi Nick,

Am 11.03.2015 um 10:52 schrieb Nick Fisk:
> Hi Stefan,
> 
> If the majority of your hot data fits on the cache tier you will see quite a
> marked improvement in read performance
I don't have reads ;-) just around 5%. 95% are writes.

> and similar write performance
> (assuming you would have had your hdds backed by SSD journals).

similar write performance of SSD cache tier or HDD "backend" tier?

I'm mainly interested in a writeback mode.

> However for data that is not in the cache tier you will get 10-20% less read
> performance and anything up to 10x less write performance. This is because a
> cache write miss has to read the entire object from the backing store into
> the cache and then modify it.
> 
> The read performance degradation will probably be fixed in Hammer with proxy
> reads, but writes will most likely still be an issue.

Why is writing to the HOT part so slow?

Stefan

> Nick
> 
> 
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Stefan Priebe - Profihost AG
>> Sent: 11 March 2015 07:27
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] Firefly Tiering
>>
>> Hi,
>>
>> has anybody successfully tested tiering while using firefly? How much does
> it
>> impact performance vs. a normal pool? I mean is there any difference
>> between a full SSD pool und a tiering SSD pool with SATA Backend?
>>
>> Greets,
>> Stefan
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph days

2015-03-11 Thread Karan Singh
Check out the Ceph YouTube page.

- Karan -

> On 11 Mar 2015, at 00:45, Tom Deneau  wrote:
> 
> Are the slides or videos from ceph days presentations made available
> somewhere?  I noticed some links in the Frankfurt Ceph day, but not for the
> other Ceph Days.
> 
> -- Tom
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: stripe_unit=65536 + object_size=1310720 => pipe.fault, server, going to standby

2015-03-11 Thread LOPEZ Jean-Charles
Hi Florent

What are the « rules » for stripe_unit & object_size ? -> stripe_unit * 
stripe_count = object_size

So in your case set stripe_unit = 2

JC


> On 11 Mar 2015, at 19:59, Florent B  wrote:
> 
> Hi all,
> 
> I'm testing CephFS with Giant and I have a problem when I set these attrs :
> 
> setfattr -n ceph.dir.layout.stripe_unit -v "65536" pool_cephfs01/
> setfattr -n ceph.dir.layout.stripe_count -v "1" pool_cephfs01/
> setfattr -n ceph.dir.layout.object_size -v "1310720" pool_cephfs01/
> setfattr -n ceph.dir.layout.pool -v "cephfs01" pool_cephfs01/ 
> 
> When a client writes files in pool_cephfs01/, It got "failed: Transport 
> endpoint is not connected (107)" and these errors on MDS :
> 
> 10.111.0.6:6801/41706 >> 10.111.17.118:0/9384 pipe(0x5e3a580 sd=27 :6801 s=2 
> pgs=2 cs=1 l=0 c=0x6a8d1e0).fault, server, going to standby
> 
> When I set stripe_unit=1048576 & object_size=1048576, it seems working.
> 
> What are the "rules" for stripe_unit & object_size ?
> 
> Thank you.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-11 Thread Karan Singh
Thanks Sage

I will create a “new feature” request on tracker.ceph.com
so that this discussion does not get buried in the mailing list.

Developers can implement this as per their convenience.
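
For anyone who lands on this thread later: what resolved the failed thread
creation for us was raising kernel.pid_max. A minimal, illustrative way to
apply and persist it (classic sysctl.conf approach; the value is the one
discussed below):

   sysctl -w kernel.pid_max=4194303                      # apply immediately
   echo "kernel.pid_max = 4194303" >> /etc/sysctl.conf   # persist across reboots
   sysctl kernel.pid_max                                 # verify the running value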



Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/


> On 10 Mar 2015, at 14:26, Sage Weil  wrote:
> 
> On Tue, 10 Mar 2015, Christian Eichelmann wrote:
>> Hi Sage,
>> 
>> we hit this problem a few months ago as well and it took us quite a while to
>> figure out what's wrong.
>> 
>> As a system administrator I don't like the idea of daemons or even init
>> scripts changing system-wide configuration parameters, so I wouldn't like
>> to see the OSDs do it themselves.
> 
> This is my general feeling as well.  As we move to systemd, I'd like to 
> have the ceph unit file get away from this entirely and have the admin set 
> these values in /etc/security/limits.conf or /etc/sysctl.d.  The main 
> thing making this problematic right now is that the daemons run as root 
> instead of a 'ceph' user.
> 
>> The idea of the warning is a good hint on the one hand; on the other hand it
>> may also confuse people, since changing this setting is not required for
>> common hardware.
> 
> If we make it warn only if it reaches > 50% of the threshold that is 
> probably safe...
> 
> sage
> 
> 
>> 
>> Regards,
>> Christian
>> 
>> On 03/09/2015 08:01 PM, Sage Weil wrote:
>>> On Mon, 9 Mar 2015, Karan Singh wrote:
 Thanks Guys kernel.pid_max=4194303 did the trick.
>>> Great to hear!  Sorry we missed that you only had it at 65536.
>>> 
>>> This is a really common problem that people hit when their clusters start
>>> to grow.  Is there somewhere in the docs we can put this to catch more
>>> users?  Or maybe a warning issued by the osds themselves or something if
>>> they see limits that are low?
>>> 
>>> sage
>>> 
 - Karan -
 
   On 09 Mar 2015, at 14:48, Christian Eichelmann
wrote:
 
 Hi Karan,
 
 as you are actually writing in your own book, the problem is the
 sysctl
 setting "kernel.pid_max". I've seen in your bug report that you were
 setting it to 65536, which is still too low for high-density hardware.
 
 In our cluster, one OSD server has in an idle situation about 66.000
 Threads (60 OSDs per Server). The number of threads increases when you
 increase the number of placement groups in the cluster, which I think
 has triggered your problem.
 
 Set the "kernel.pid_max" setting to 4194303 (the maximum) like Azad
 Aliyar suggested, and the problem should be gone.
 
 Regards,
 Christian
 
 Am 09.03.2015 11:41, schrieb Karan Singh:
    Hello Community, I need help to fix a long-standing Ceph problem.

    The cluster is unhealthy and multiple OSDs are DOWN. When I try to
    restart the OSDs I get this error:

    2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function
    'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
    common/Thread.cc: 129: FAILED assert(ret == 0)

    *Environment*: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5,
    3.17.2-1.el6.elrepo.x86_64

    Tried upgrading from 0.80.7 to 0.80.8, but no luck

    Tried the CentOS stock kernel 2.6.32, but no luck

    Memory is not a problem, more than 150+GB is free

    Did anyone ever face this problem??

    *Cluster status*

    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
    health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete;
    1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs
    stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects
    degraded (19.501%); 111/196 in osds are down; clock skew detected on
    mon.pouta-s02, mon.pouta-s03
    monmap e3: 3 mons at
    {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0},
    election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
    osdmap e26633: 239 osds: 85 up, 196 in
    pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
    4699 GB used, 707 TB / 711 TB avail
    6061/31080 objects degraded (19.501%)
 

Re: [ceph-users] Firefly Tiering

2015-03-11 Thread Stefan Priebe - Profihost AG
Am 11.03.2015 um 11:17 schrieb Nick Fisk:
> 
> 
>> Hi Nick,
>>
>> Am 11.03.2015 um 10:52 schrieb Nick Fisk:
>>> Hi Stefan,
>>>
>>> If the majority of your hot data fits on the cache tier you will see
>>> quite a marked improvement in read performance
>> I don't have reads ;-) just around 5%. 95% are writes.
>>
>>> and similar write performance
>>> (assuming you would have had your hdds backed by SSD journals).
>>
>> similar write performance of SSD cache tier or HDD "backend" tier?
>>
>> I'm mainly interested in a writeback mode.
> 
> Writes on Cache tiering are the same speed as a non cache tiering solution
> (with SSD journals), if the blocks are in the cache. 
> 
> 
>>
>>> However for data that is not in the cache tier you will get 10-20%
>>> less read performance and anything up to 10x less write performance.
>>> This is because a cache write miss has to read the entire object from
>>> the backing store into the cache and then modify it.
>>>
>>> The read performance degradation will probably be fixed in Hammer with
>>> proxy reads, but writes will most likely still be an issue.
>>
>> Why is writing to the HOT part so slow?
>>
> 
> If the object is in the cache tier or currently doesn't exist, then writes
> are fast as it just has to write directly to the cache tier SSD's. However
> if the object is in the slow tier and you write to it, then its very slow.
> This is because it has to read it off the slow tier (~12ms), write it on to
> the cache tier(~.5ms) and then update it (~.5ms).

Mhm, sounds correct. So it's better to stick with journals instead of
using a cache tier.

Stefan

> 
> With a non caching solution, you would have just written straight to the
> journal (~.5ms)
> 
>> Stefan
>>
>>> Nick
>>>
>>>
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
 Of Stefan Priebe - Profihost AG
 Sent: 11 March 2015 07:27
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] Firefly Tiering

 Hi,

 has anybody successfully tested tiering while using firefly? How much
 does
>>> it
 impact performance vs. a normal pool? I mean is there any difference
 between a full SSD pool und a tiering SSD pool with SATA Backend?

 Greets,
 Stefan
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Firefly Tiering

2015-03-11 Thread Nick Fisk


> Am 11.03.2015 um 11:17 schrieb Nick Fisk:
> >
> >
> >> Hi Nick,
> >>
> >> Am 11.03.2015 um 10:52 schrieb Nick Fisk:
> >>> Hi Stefan,
> >>>
> >>> If the majority of your hot data fits on the cache tier you will see
> >>> quite a marked improvement in read performance
> >> I don't have reads ;-) just around 5%. 95% are writes.
> >>
> >>> and similar write performance
> >>> (assuming you would have had your hdds backed by SSD journals).
> >>
> >> similar write performance of SSD cache tier or HDD "backend" tier?
> >>
> >> I'm mainly interested in a writeback mode.
> >
> > Writes on Cache tiering are the same speed as a non cache tiering
> > solution (with SSD journals), if the blocks are in the cache.
> >
> >
> >>
> >>> However for data that is not in the cache tier you will get 10-20%
> >>> less read performance and anything up to 10x less write performance.
> >>> This is because a cache write miss has to read the entire object
> >>> from the backing store into the cache and then modify it.
> >>>
> >>> The read performance degradation will probably be fixed in Hammer
> >>> with proxy reads, but writes will most likely still be an issue.
> >>
> >> Why is writing to the HOT part so slow?
> >>
> >
> > If the object is in the cache tier or currently doesn't exist, then
> > writes are fast as it just has to write directly to the cache tier
> > SSD's. However if the object is in the slow tier and you write to it,
then its
> very slow.
> > This is because it has to read it off the slow tier (~12ms), write it
> > on to the cache tier(~.5ms) and then update it (~.5ms).
> 
> Mhm, sounds correct. So it's better to stick with journals instead of
> using a cache tier.

That's purely down to your workload, but in general if you are doing lots of
writes, a cache tier will probably slow you down at the moment.
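
As a rough back-of-envelope using the example latencies from earlier in the
thread (illustrative figures, not measurements):

   cache-tier write miss ~= 12ms (read object from slow tier)
                          + 0.5ms (write it into the cache tier)
                          + 0.5ms (apply the update)         ~= 13ms
   journal-backed write  ~= 0.5ms (straight to the SSD journal)

So a miss-heavy write workload can be an order of magnitude or more slower
per operation; real workloads mix hits and misses, which is where the "up to
10x" figure above comes from.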


> 
> Stefan
> 
> >
> > With a non caching solution, you would have just written straight to
> > the journal (~.5ms)
> >
> >> Stefan
> >>
> >>> Nick
> >>>
> >>>
>  -Original Message-
>  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
>  Behalf Of Stefan Priebe - Profihost AG
>  Sent: 11 March 2015 07:27
>  To: ceph-users@lists.ceph.com
>  Subject: [ceph-users] Firefly Tiering
> 
>  Hi,
> 
>  has anybody successfully tested tiering while using firefly? How
>  much does
> >>> it
>  impact performance vs. a normal pool? I mean is there any
>  difference between a full SSD pool und a tiering SSD pool with SATA
> Backend?
> 
>  Greets,
>  Stefan
>  ___
>  ceph-users mailing list
>  ceph-users@lists.ceph.com
>  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>>
> >>>
> >>>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread joel.merr...@gmail.com
For clarity too, I've tried dropping the min_size before as suggested;
it doesn't make a difference, unfortunately.

On Wed, Mar 11, 2015 at 9:50 AM, joel.merr...@gmail.com
 wrote:
> Sure thing, n.b. I increased pg count to see if it would help. Alas not. :)
>
> Thanks again!
>
> health_detail
> https://gist.github.com/199bab6d3a9fe30fbcae
>
> osd_dump
> https://gist.github.com/499178c542fa08cc33bb
>
> osd_tree
> https://gist.github.com/02b62b2501cbd684f9b2
>
> Random selected queries:
> queries/0.19.query
> https://gist.github.com/f45fea7c85d6e665edf8
> queries/1.a1.query
> https://gist.github.com/dd68fbd5e862f94eb3be
> queries/7.100.query
> https://gist.github.com/d4fd1fb030c6f2b5e678
> queries/7.467.query
> https://gist.github.com/05dbcdc9ee089bd52d0c
>
> On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just  wrote:
>> Yeah, get a ceph pg query on one of the stuck ones.
>> -Sam
>>
>> On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote:
>>> Stuck unclean and stuck inactive. I can fire up a full query and
>>> health dump somewhere useful if you want (full pg query info on ones
>>> listed in health detail, tree, osd dump etc). There were blocked_by
>>> operations that no longer exist after doing the OSD addition.
>>>
>>> Side note, spent some time yesterday writing some bash to do this
>>> programatically (might be useful to others, will throw on github)
>>>
>>> On Tue, Mar 10, 2015 at 1:41 PM, Samuel Just  wrote:
>>> > What do you mean by "unblocked" but still "stuck"?
>>> > -Sam
>>> >
>>> > On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote:
>>> >> On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just  wrote:
>>> >> > You'll probably have to recreate osds with the same ids (empty ones),
>>> >> > let them boot, stop them, and mark them lost.  There is a feature in 
>>> >> > the
>>> >> > tracker to improve this behavior: http://tracker.ceph.com/issues/10976
>>> >> > -Sam
>>> >>
>>> >> Thanks Sam, I've readded the OSDs, they became unblocked but there are
>>> >> still the same number of pgs stuck. I looked at them in some more
>>> >> detail and it seems they all have num_bytes='0'. Tried a repair too,
>>> >> for good measure. Still nothing I'm afraid.
>>> >>
>>> >> Does this mean some underlying catastrophe has happened and they are
>>> >> never going to recover? Following on, would that cause data loss.
>>> >> There are no missing objects and I'm hoping there's appropriate
>>> >> checksumming / replicas to balance that out, but now I'm not so sure.
>>> >>
>>> >> Thanks again,
>>> >> Joel
>>> >
>>> >
>>>
>>>
>>>
>>
>>
>
>
>
> --
> $ echo "kpfmAdpoofdufevq/dp/vl" | perl -pe 's/(.)/chr(ord($1)-1)/ge'



-- 
$ echo "kpfmAdpoofdufevq/dp/vl" | perl -pe 's/(.)/chr(ord($1)-1)/ge'
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: stripe_unit=65536 + object_size=1310720 => pipe.fault, server, going to standby

2015-03-11 Thread Ilya Dryomov
On Wed, Mar 11, 2015 at 1:21 PM, LOPEZ Jean-Charles  wrote:
> Hi Florent
>
> What are the « rules » for stripe_unit & object_size ? -> stripe_unit *
> stripe_count = object_size
>
> So in your case set stripe_unit = 2
>
> JC
>
>
> On 11 Mar 2015, at 19:59, Florent B  wrote:
>
> Hi all,
>
> I'm testing CephFS with Giant and I have a problem when I set these attrs :
>
> setfattr -n ceph.dir.layout.stripe_unit -v "65536" pool_cephfs01/
> setfattr -n ceph.dir.layout.stripe_count -v "1" pool_cephfs01/
> setfattr -n ceph.dir.layout.object_size -v "1310720" pool_cephfs01/
> setfattr -n ceph.dir.layout.pool -v "cephfs01" pool_cephfs01/
>
> When a client writes files in pool_cephfs01/, It got "failed: Transport
> endpoint is not connected (107)" and these errors on MDS :
>
> 10.111.0.6:6801/41706 >> 10.111.17.118:0/9384 pipe(0x5e3a580 sd=27 :6801 s=2
> pgs=2 cs=1 l=0 c=0x6a8d1e0).fault, server, going to standby
>
> When I set stripe_unit=1048576 & object_size=1048576, it seems working.
>
> What are the "rules" for stripe_unit & object_size ?

"stripe_unit * stripe_count = object_size" is definitely not correct.
The current rules are:

- object_size is a multiple of stripe_unit
- stripe_unit (and consequently object_size) is 64k-aligned
- stripe_count is at least 1 (i.e. at least 1 object in an object set)

However, the above layout is pretty bogus - there is basically no
striping going on, so it's probably a bug in the way it's handled.
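
For what it's worth, a layout that actually stripes and satisfies those rules
would look something like this (directory and pool names taken from your test;
the sizes are only illustrative):

   setfattr -n ceph.dir.layout.stripe_unit  -v "1048576" pool_cephfs01/   # 1 MB, 64k-aligned
   setfattr -n ceph.dir.layout.stripe_count -v "4"       pool_cephfs01/   # 4 objects per object set
   setfattr -n ceph.dir.layout.object_size  -v "4194304" pool_cephfs01/   # 4 MB, a multiple of stripe_unit
   setfattr -n ceph.dir.layout.pool         -v "cephfs01" pool_cephfs01/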

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Sage Weil
On Wed, 11 Mar 2015, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
> Am 11.03.2015 um 04:14 schrieb Sage Weil:
> > On Wed, 11 Mar 2015, Christian Balzer wrote:
> >> On Tue, 10 Mar 2015 12:34:14 -0700 (PDT) Sage Weil wrote:
> >>
> >>
> >>> Adjusting CRUSH maps
> >>> 
> >>>
> >>> * This point release fixes several issues with CRUSH that trigger
> >>>   excessive data migration when adjusting OSD weights.  These are most
> >>>   obvious when a very small weight change (e.g., a change from 0 to
> >>>   .01) triggers a large amount of movement, but the same set of bugs
> >>>   can also lead to excessive (though less noticeable) movement in
> >>>   other cases.
> >>>
> >>>   However, because the bug may already have affected your cluster,
> >>>   fixing it may trigger movement *back* to the more correct location.
> >>>   For this reason, you must manually opt-in to the fixed behavior.
> >>>
> >> It would be nice to know at what version of Ceph those bugs were
> >> introduced.
> > 
> > This bug has been present in CRUSH since the beginning.
> 
> So people upgrading from dumpling have to do the same?
> 
> 1.) They need to set tunables to optimal (to get firefly tunables)
> 2.) They have to set those options you mention?

Nothing has to (or probably should be) done as part of the upgrade process 
itself.

This tunable can be set without changing to firefly tunables.  It affects 
the monitor-side generation of internal weight values only, and has no 
dependency or compatibility issue with clients or OSDs.  And the bug only 
triggers when a weight is changed.
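
If and when you do decide to opt in, the commands are just the ones from the
release notes, independent of the tunables profile (the reweight-all step is
optional and only needed if you want to take the rebalancing all at once):

   ceph osd crush set-tunable straw_calc_version 1   # opt in; no immediate data movement
   ceph osd crush reweight-all                       # optional: trigger the rebalancing now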

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Sage Weil
On Wed, 11 Mar 2015, Dan van der Ster wrote:
> Hi Sage,
> 
> On Tue, Mar 10, 2015 at 8:34 PM, Sage Weil  wrote:
> > Adjusting CRUSH maps
> > 
> >
> > * This point release fixes several issues with CRUSH that trigger
> >   excessive data migration when adjusting OSD weights.  These are most
> >   obvious when a very small weight change (e.g., a change from 0 to
> >   .01) triggers a large amount of movement, but the same set of bugs
> >   can also lead to excessive (though less noticeable) movement in
> >   other cases.
> >
> >   However, because the bug may already have affected your cluster,
> >   fixing it may trigger movement *back* to the more correct location.
> >   For this reason, you must manually opt-in to the fixed behavior.
> >
> >   In order to set the new tunable to correct the behavior::
> >
> >  ceph osd crush set-tunable straw_calc_version 1
> >
> 
> Since it's not obvious in this case, does setting straw_calc_version =
> 1 still allow older firefly clients to connect?

Correct.  The bug only affects the generation of internal weight values 
that are stored in the crush map itself (crush_calc_straw()).  Setting the 
tunable makes the *monitors* behave properly (if adjusting weights via the 
ceph cli) or *crushtool* calculate weights properly if you are compiling 
the crush map via 'crushtool -c ...'.  There is no dependency or 
compatibility issue with clients, and no need to set tunables to 'firefly' 
to set straw_calc_version.
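
If you want to sanity-check the impact offline before touching anything, the
estimate from the release notes runs entirely against a local copy of the
crushmap (the scratch paths are arbitrary):

   ceph osd getcrushmap -o /tmp/cm
   crushtool -i /tmp/cm  --num-rep 3 --test --show-mappings > /tmp/a 2>&1
   crushtool -i /tmp/cm  --set-straw-calc-version 1 -o /tmp/cm2
   crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
   crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
   echo "changed $(diff -u /tmp/a /tmp/b | grep -c '^+') of $(wc -l < /tmp/a) mappings"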

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Sage Weil
On Wed, 11 Mar 2015, Gabri Mate wrote:
> May I assume this fix will be in Hammer? So can I use this to fix my
> cluster after upgrading Giant to Hammer?

Yes, the fix is also in Hammer, but the same procedure should be followed 
to opt-in to the new behavior.

sage

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Loic Dachary
Hi Valery,

They should be here http://ceph.com/debian-testing/

Cheers

On 11/03/2015 10:07, Valery Tschopp wrote:
> Where can I find the debian trusty source package for v0.80.9?
> 
> Cheers,
> Valery
> 
> On 10/03/15 20:34 , Sage Weil wrote:
>> This is a bugfix release for firefly.  It fixes a performance regression
>> in librbd, an important CRUSH misbehavior (see below), and several RGW
>> bugs.  We have also backported support for flock/fcntl locks to ceph-fuse
>> and libcephfs.
>>
>> We recommend that all Firefly users upgrade.
>>
>> For more detailed information, see
>>http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt
>>
>> Adjusting CRUSH maps
>> 
>>
>> * This point release fixes several issues with CRUSH that trigger
>>excessive data migration when adjusting OSD weights.  These are most
>>obvious when a very small weight change (e.g., a change from 0 to
>>.01) triggers a large amount of movement, but the same set of bugs
>>can also lead to excessive (though less noticeable) movement in
>>other cases.
>>
>>However, because the bug may already have affected your cluster,
>>fixing it may trigger movement *back* to the more correct location.
>>For this reason, you must manually opt-in to the fixed behavior.
>>
>>In order to set the new tunable to correct the behavior::
>>
>>   ceph osd crush set-tunable straw_calc_version 1
>>
>>Note that this change will have no immediate effect.  However, from
>>this point forward, any 'straw' bucket in your CRUSH map that is
>>adjusted will get non-buggy internal weights, and that transition
>>may trigger some rebalancing.
>>
>>You can estimate how much rebalancing will eventually be necessary
>>on your cluster with::
>>
>>   ceph osd getcrushmap -o /tmp/cm
>>   crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
>>   crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
>>   crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
>>   crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
>>   wc -l /tmp/a  # num total mappings
>>   diff -u /tmp/a /tmp/b | grep -c ^+# num changed mappings
>>
>> Divide the number of changed mappings by the total number of mappings
>> in /tmp/a to get the fraction that will move.  We've found that most
>> clusters are under 10%.
>>
>> You can force all of this rebalancing to happen at once with::
>>
>>   ceph osd crush reweight-all
>>
>> Otherwise, it will happen at some unknown point in the future when
>> CRUSH weights are next adjusted.
>>
>> Notable Changes
>> ---
>>
>> * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
>> * crush: fix straw bucket weight calculation, add straw_calc_version
>>tunable (#10095 Sage Weil)
>> * crush: fix tree bucket (Rongzu Zhu)
>> * crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
>> * crushtool: add --reweight (Sage Weil)
>> * librbd: complete pending operations before losing image (#10299 Jason
>>Dillaman)
>> * librbd: fix read caching performance regression (#9854 Jason Dillaman)
>> * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
>> * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
>> * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
>> * osd: handle no-op write with snapshot (#10262 Sage Weil)
>> * radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh)
>> * rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis,
>>Yehuda Sadeh)
>> * rgw: don't overwrite bucket/object owner when setting ACLs (#10978
>>Yehuda Sadeh)
>> * rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh)
>> * rgw: fix partial swift GET (#10553 Yehuda Sadeh)
>> * rgw: fix quota disable (#9907 Dong Lei)
>> * rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh)
>> * rgw: make setattrs update bucket index (#5595 Yehuda Sadeh)
>> * rgw: pass civetweb configurables (#10907 Yehuda Sadeh)
>> * rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda
>>Sadeh)
>> * rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh)
>> * rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh)
>> * rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh)
>> * rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh)
>> * rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil)
>> * rgw: update swift subuser permission masks when authenticating (#9918
>>Yehuda Sadeh)
>> * rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis,
>>Yehuda Sadeh)
>> * rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh)
>> * rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh)
>>
>> Getting Ceph
>> 
>>
>> * Git at git://github.com/ceph/ceph.git
>> * Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz
>> * For packages, see http://ceph.com/docs/mast

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread Samuel Just
Ok, you lost all copies from an interval where the pgs went active. The 
recovery from this is going to be complicated and fragile.  Are the 
pools valuable?

-Sam

On 03/11/2015 03:35 AM, joel.merr...@gmail.com wrote:

For clarity too, I've tried to drop the min_size before as suggested,
doesn't make a difference unfortunately

On Wed, Mar 11, 2015 at 9:50 AM, joel.merr...@gmail.com
 wrote:

Sure thing, n.b. I increased pg count to see if it would help. Alas not. :)

Thanks again!

health_detail
https://gist.github.com/199bab6d3a9fe30fbcae

osd_dump
https://gist.github.com/499178c542fa08cc33bb

osd_tree
https://gist.github.com/02b62b2501cbd684f9b2

Random selected queries:
queries/0.19.query
https://gist.github.com/f45fea7c85d6e665edf8
queries/1.a1.query
https://gist.github.com/dd68fbd5e862f94eb3be
queries/7.100.query
https://gist.github.com/d4fd1fb030c6f2b5e678
queries/7.467.query
https://gist.github.com/05dbcdc9ee089bd52d0c

On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just  wrote:

Yeah, get a ceph pg query on one of the stuck ones.
-Sam

On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote:

Stuck unclean and stuck inactive. I can fire up a full query and
health dump somewhere useful if you want (full pg query info on ones
listed in health detail, tree, osd dump etc). There were blocked_by
operations that no longer exist after doing the OSD addition.

Side note, spent some time yesterday writing some bash to do this
programatically (might be useful to others, will throw on github)

On Tue, Mar 10, 2015 at 1:41 PM, Samuel Just  wrote:

What do you mean by "unblocked" but still "stuck"?
-Sam

On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote:

On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just  wrote:

You'll probably have to recreate osds with the same ids (empty ones),
let them boot, stop them, and mark them lost.  There is a feature in the
tracker to improve this behavior: http://tracker.ceph.com/issues/10976
-Sam

Thanks Sam, I've readded the OSDs, they became unblocked but there are
still the same number of pgs stuck. I looked at them in some more
detail and it seems they all have num_bytes='0'. Tried a repair too,
for good measure. Still nothing I'm afraid.

Does this mean some underlying catastrophe has happened and they are
never going to recover? Following on, would that cause data loss.
There are no missing objects and I'm hoping there's appropriate
checksumming / replicas to balance that out, but now I'm not so sure.

Thanks again,
Joel










--
$ echo "kpfmAdpoofdufevq/dp/vl" | perl -pe 's/(.)/chr(ord($1)-1)/ge'





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados utility is hung forever

2015-03-11 Thread shylesh kumar
Hi All,

I am trying to create a cluster. The monitor and OSDs are all up and
running.

root@localhost ceph-config]# ceph osd lspools
0 data,1 metadata,2 rbd,3 mypool,

I created "mypool" and then trying the command  "rados put test-1
testfile.txt  --pool=mypool" but this will never returns.

How to debug this issue, where can i find logs related to rados utility.

Any links appreciated.
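
A hanging "rados put" is usually the client waiting for PGs that are not
active+clean or for unreachable OSDs, so a few illustrative first steps
(standard ceph/rados client options; adjust the pool and object names, and
note that where the debug output lands depends on your logging configuration):

   ceph -s                                    # overall health: are all PGs active+clean?
   ceph health detail                         # which PGs/OSDs are the problem?
   ceph osd map mypool test-1                 # which PG and OSDs the object maps to
   rados put test-1 testfile.txt --pool=mypool --debug-ms 1 --debug-objecter 20 2> /tmp/rados.log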


-- 
Thanks
Shylesh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-11 Thread Malcolm Haak
So grep 2.31 and above are 'broken' 

https://bugzilla.novell.com/show_bug.cgi?id=921714

I'm rebuilding with an older grep binary in place. 

Just a heads up, as this does break the init scripts. 
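
If you want to check whether your grep is affected before swapping the binary,
a quick illustrative test against the kind of pattern the init script uses (a
healthy grep should report exit status 0 here):

   grep --version | head -1
   printf 'ceph-mon -i b\n' | grep -qwe -i.b ; echo "exit: $?"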

-Original Message-
From: Malcolm Haak 
Sent: Wednesday, 11 March 2015 9:07 PM
To: Malcolm Haak; Samuel Just; jl...@redhat.com
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Hi all,

So the init script issue is sorted.. my grep binary is not working correctly.  
I've replaced it and everything seems to be fine. 

Which now has me wondering if the binaries I generated are any good... the bad 
grep might have caused issues with the build...

I'm going to recompile after some more sanity testing..

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Malcolm Haak
Sent: Wednesday, 11 March 2015 8:56 PM
To: Samuel Just; jl...@redhat.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

I ran ceph-osd via the command line...

It's not really given me much more to go off...  Well except that it's hitting 
an early end of buffer for some reason.

Also I've hit another issue... 

The /etc/init.d/ceph script is not seeing my new mon (I decided to add more 
mon's to see if it would help since the mon map looks like it is the issue)

The script starts the mon fine. And the new mon (on the same host as this 
problem osd) appears to be good. 

The issue is when you do /etc/init.d/ceph status 

It tells you that mon.b is dead. It seems to be one of the greps that is
failing; specifically
grep -qwe -i.$daemon_id /proc/\$pid/cmdline
returns 1.

What's odd is that the same grep works on the other node for mon.a; it just
doesn't work on this node for mon.b.

I'm wondering if there is something odd happening. 

Anyway here is the output of the manual start of ceph-osd


# /usr/bin/ceph-osd -i 3 --pid-file /var/run/ceph/osd.3.pid -c 
/etc/ceph/ceph.conf --cluster ceph -f
starting osd.3 at :/0 osd_data /var/lib/ceph/osd/ceph-3 
/var/lib/ceph/osd/ceph-3/journal
2015-03-11 20:38:56.401205 7f04221e6880 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2015-03-11 20:38:56.418747 7f04221e6880 -1 osd.3 2757 log_to_monitors 
{default=true}
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
  what():  buffer::end_of_buffer
*** Caught signal (Aborted) **
 in thread 7f041192a700
 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
 1: /usr/bin/ceph-osd() [0xac7cea]
 2: (()+0x10050) [0x7f04210f1050]
 3: (gsignal()+0x37) [0x7f041f5c40f7]
 4: (abort()+0x13a) [0x7f041f5c54ca]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f041fea9fe5]
 6: (()+0x63186) [0x7f041fea8186]
 7: (()+0x631b3) [0x7f041fea81b3]
 8: (()+0x633d2) [0x7f041fea83d2]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137) [0xc2cea7]
 10: (OSDMap::decode_classic(ceph::buffer::list::iterator&)+0x605) [0xb7b7b5]
 11: (OSDMap::decode(ceph::buffer::list::iterator&)+0x8c) [0xb7bebc]
 12: (OSDMap::decode(ceph::buffer::list&)+0x3f) [0xb7dfbf]
 13: (OSD::handle_osd_map(MOSDMap*)+0xd37) [0x6cd9a7]
 14: (OSD::_dispatch(Message*)+0x3eb) [0x6d0afb]
 15: (OSD::ms_dispatch(Message*)+0x257) [0x6d1007]
 16: (DispatchQueue::entry()+0x649) [0xc6fe09]
 17: (DispatchQueue::DispatchThread::entry()+0xd) [0xb9dd7d]
 18: (()+0x83a4) [0x7f04210e93a4]
 19: (clone()+0x6d) [0x7f041f673a4d]
2015-03-11 20:38:56.471624 7f041192a700 -1 *** Caught signal (Aborted) **
 in thread 7f041192a700

 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
 1: /usr/bin/ceph-osd() [0xac7cea]
 2: (()+0x10050) [0x7f04210f1050]
 3: (gsignal()+0x37) [0x7f041f5c40f7]
 4: (abort()+0x13a) [0x7f041f5c54ca]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f041fea9fe5]
 6: (()+0x63186) [0x7f041fea8186]
 7: (()+0x631b3) [0x7f041fea81b3]
 8: (()+0x633d2) [0x7f041fea83d2]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137) [0xc2cea7]
 10: (OSDMap::decode_classic(ceph::buffer::list::iterator&)+0x605) [0xb7b7b5]
 11: (OSDMap::decode(ceph::buffer::list::iterator&)+0x8c) [0xb7bebc]
 12: (OSDMap::decode(ceph::buffer::list&)+0x3f) [0xb7dfbf]
 13: (OSD::handle_osd_map(MOSDMap*)+0xd37) [0x6cd9a7]
 14: (OSD::_dispatch(Message*)+0x3eb) [0x6d0afb]
 15: (OSD::ms_dispatch(Message*)+0x257) [0x6d1007]
 16: (DispatchQueue::entry()+0x649) [0xc6fe09]
 17: (DispatchQueue::DispatchThread::entry()+0xd) [0xb9dd7d]
 18: (()+0x83a4) [0x7f04210e93a4]
 19: (clone()+0x6d) [0x7f041f673a4d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

  -308> 2015-03-11 20:38:56.401205 7f04221e6880 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_for

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread joel.merr...@gmail.com
I'd like to not have to null them if possible; there's nothing
outlandishly valuable, it's more the time to reprovision (users have
stuff on there, mainly testing, but I have a nasty feeling some users
won't have backed up their test instances). When you say complicated
and fragile, could you expand?

Thanks again!
Joel

On Wed, Mar 11, 2015 at 1:21 PM, Samuel Just  wrote:
> Ok, you lost all copies from an interval where the pgs went active. The
> recovery from this is going to be complicated and fragile.  Are the pools
> valuable?
> -Sam
>
>
> On 03/11/2015 03:35 AM, joel.merr...@gmail.com wrote:
>>
>> For clarity too, I've tried to drop the min_size before as suggested,
>> doesn't make a difference unfortunately
>>
>> On Wed, Mar 11, 2015 at 9:50 AM, joel.merr...@gmail.com
>>  wrote:
>>>
>>> Sure thing, n.b. I increased pg count to see if it would help. Alas not.
>>> :)
>>>
>>> Thanks again!
>>>
>>> health_detail
>>> https://gist.github.com/199bab6d3a9fe30fbcae
>>>
>>> osd_dump
>>> https://gist.github.com/499178c542fa08cc33bb
>>>
>>> osd_tree
>>> https://gist.github.com/02b62b2501cbd684f9b2
>>>
>>> Random selected queries:
>>> queries/0.19.query
>>> https://gist.github.com/f45fea7c85d6e665edf8
>>> queries/1.a1.query
>>> https://gist.github.com/dd68fbd5e862f94eb3be
>>> queries/7.100.query
>>> https://gist.github.com/d4fd1fb030c6f2b5e678
>>> queries/7.467.query
>>> https://gist.github.com/05dbcdc9ee089bd52d0c
>>>
>>> On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just  wrote:

 Yeah, get a ceph pg query on one of the stuck ones.
 -Sam

 On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote:
>
> Stuck unclean and stuck inactive. I can fire up a full query and
> health dump somewhere useful if you want (full pg query info on ones
> listed in health detail, tree, osd dump etc). There were blocked_by
> operations that no longer exist after doing the OSD addition.
>
> Side note, spent some time yesterday writing some bash to do this
> programatically (might be useful to others, will throw on github)
>
> On Tue, Mar 10, 2015 at 1:41 PM, Samuel Just  wrote:
>>
>> What do you mean by "unblocked" but still "stuck"?
>> -Sam
>>
>> On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote:
>>>
>>> On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just  wrote:

 You'll probably have to recreate osds with the same ids (empty
 ones),
 let them boot, stop them, and mark them lost.  There is a feature in
 the
 tracker to improve this behavior:
 http://tracker.ceph.com/issues/10976
 -Sam
>>>
>>> Thanks Sam, I've readded the OSDs, they became unblocked but there
>>> are
>>> still the same number of pgs stuck. I looked at them in some more
>>> detail and it seems they all have num_bytes='0'. Tried a repair too,
>>> for good measure. Still nothing I'm afraid.
>>>
>>> Does this mean some underlying catastrophe has happened and they are
>>> never going to recover? Following on, would that cause data loss.
>>> There are no missing objects and I'm hoping there's appropriate
>>> checksumming / replicas to balance that out, but now I'm not so sure.
>>>
>>> Thanks again,
>>> Joel
>>
>>
>
>

>>>
>>>
>>> --
>>> $ echo "kpfmAdpoofdufevq/dp/vl" | perl -pe 's/(.)/chr(ord($1)-1)/ge'
>>
>>
>>
>



-- 
$ echo "kpfmAdpoofdufevq/dp/vl" | perl -pe 's/(.)/chr(ord($1)-1)/ge'
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Duplication name Container

2015-03-11 Thread Wido den Hollander
On 03/11/2015 03:23 PM, Jimmy Goffaux wrote:
> Hello All,
> 
> I have been using Ceph in production for several months, but I am hitting
> an error with the Ceph RADOS Gateway for multiple users.
> 
> I am faced with the following error:
> 
> Error trying to create container 'xs02': 409 Conflict: BucketAlreadyExists
> 
> Which corresponds to the documentation :
> http://ceph.com/docs/master/radosgw/s3/bucketops/
> 
> How can I avoid this kind of problem?
> 

You can not. Bucket names are unique inside the RADOS Gateway. Just as
with Amazon S3.

> Here are my versions used:
> 
> radosgw-agent  => 1.2-1precise
> ceph   => 0.87-1precise
> 
> Thank you for your help
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd laggy algorithm

2015-03-11 Thread Artem Savinov
hello.
By default ceph marks an osd node down after receiving 3 reports about the
failed node. Reports are sent every "osd heartbeat grace" seconds, but with
"mon_osd_adjust_heartbeat_grace = true" and "mon_osd_adjust_down_out_interval
= true" the timeouts used to mark nodes down/out may vary. Please tell me:
what algorithm adjusts the timeouts for marking nodes down/out, and which
parameters are affected?
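
For reference, the relevant options can be inspected on a running monitor with
something like the following (a sketch; the monitor id "a" is a placeholder and
the exact option list may differ between releases):

    ceph daemon mon.a config show | egrep 'heartbeat_grace|laggy|down_out|min_down'
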
thanks.

--
Artem
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Valery Tschopp

Hi Loic,

Nope, only the versions from 0.81-trusty to 0.93-1trusty are available 
in http://ceph.com/debian-testing/pool/main/c/ceph/


But the firefly deb source package for 0.80.9-1trusty is not available :(

Cheers,
Valery

On 11/03/15 14:11 , Loic Dachary wrote:

Hi Valery,

They should be here http://ceph.com/debian-testing/

Cheers

On 11/03/2015 10:07, Valery Tschopp wrote:

Where can I find the debian trusty source package for v0.80.9?

Cheers,
Valery

On 10/03/15 20:34 , Sage Weil wrote:

This is a bugfix release for firefly.  It fixes a performance regression
in librbd, an important CRUSH misbehavior (see below), and several RGW
bugs.  We have also backported support for flock/fcntl locks to ceph-fuse
and libcephfs.

We recommend that all Firefly users upgrade.

For more detailed information, see
http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt

Adjusting CRUSH maps


* This point release fixes several issues with CRUSH that trigger
excessive data migration when adjusting OSD weights.  These are most
obvious when a very small weight change (e.g., a change from 0 to
.01) triggers a large amount of movement, but the same set of bugs
can also lead to excessive (though less noticeable) movement in
other cases.

However, because the bug may already have affected your cluster,
fixing it may trigger movement *back* to the more correct location.
For this reason, you must manually opt-in to the fixed behavior.

In order to set the new tunable to correct the behavior::

   ceph osd crush set-tunable straw_calc_version 1

Note that this change will have no immediate effect.  However, from
this point forward, any 'straw' bucket in your CRUSH map that is
adjusted will get non-buggy internal weights, and that transition
may trigger some rebalancing.

You can estimate how much rebalancing will eventually be necessary
on your cluster with::

   ceph osd getcrushmap -o /tmp/cm
   crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
   crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
   crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
   crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
   wc -l /tmp/a  # num total mappings
   diff -u /tmp/a /tmp/b | grep -c ^+# num changed mappings

 Divide the total number of lines in /tmp/a with the number of lines
 changed.  We've found that most clusters are under 10%.

 You can force all of this rebalancing to happen at once with::

   ceph osd crush reweight-all

 Otherwise, it will happen at some unknown point in the future when
 CRUSH weights are next adjusted.

Notable Changes
---

* ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
* crush: fix straw bucket weight calculation, add straw_calc_version
tunable (#10095 Sage Weil)
* crush: fix tree bucket (Rongzu Zhu)
* crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
* crushtool: add --reweight (Sage Weil)
* librbd: complete pending operations before losing image (#10299 Jason
Dillaman)
* librbd: fix read caching performance regression (#9854 Jason Dillaman)
* librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
* mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
* osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
* osd: handle no-op write with snapshot (#10262 Sage Weil)
* radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh)
* rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis,
Yehuda Sadeh)
* rgw: don't overwrite bucket/object owner when setting ACLs (#10978
Yehuda Sadeh)
* rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh)
* rgw: fix partial swift GET (#10553 Yehuda Sadeh)
* rgw: fix quota disable (#9907 Dong Lei)
* rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh)
* rgw: make setattrs update bucket index (#5595 Yehuda Sadeh)
* rgw: pass civetweb configurables (#10907 Yehuda Sadeh)
* rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda
Sadeh)
* rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh)
* rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh)
* rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh)
* rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh)
* rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil)
* rgw: update swift subuser permission masks when authenticating (#9918
Yehuda Sadeh)
* rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis,
Yehuda Sadeh)
* rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh)
* rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh)

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz
* For packages, see http://ceph.c

Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Loic Dachary


On 11/03/2015 16:37, Valery Tschopp wrote:
> Hi Loic,
> 
> Nope, only the versions from 0.81-trusty to 0.93-1trusty are available in 
> http://ceph.com/debian-testing/pool/main/c/ceph/
> 
> But the firefly deb source packages for 0.80.9-1trusty is not available :(

My bad, it's a firefly stable point release and therefore not in *testing*. It should 
be at http://ceph.com/debian-firefly/

> 
> Cheers,
> Valery
> 
> On 11/03/15 14:11 , Loic Dachary wrote:
>> Hi Valery,
>>
>> They should be here http://ceph.com/debian-testing/
>>
>> Cheers
>>
>> On 11/03/2015 10:07, Valery Tschopp wrote:
>>> Where can I find the debian trusty source package for v0.80.9?
>>>
>>> Cheers,
>>> Valery
>>>
>>> On 10/03/15 20:34 , Sage Weil wrote:
 This is a bugfix release for firefly.  It fixes a performance regression
 in librbd, an important CRUSH misbehavior (see below), and several RGW
 bugs.  We have also backported support for flock/fcntl locks to ceph-fuse
 and libcephfs.

 We recommend that all Firefly users upgrade.

 For more detailed information, see
 http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt

 Adjusting CRUSH maps
 

 * This point release fixes several issues with CRUSH that trigger
 excessive data migration when adjusting OSD weights.  These are most
 obvious when a very small weight change (e.g., a change from 0 to
 .01) triggers a large amount of movement, but the same set of bugs
 can also lead to excessive (though less noticeable) movement in
 other cases.

 However, because the bug may already have affected your cluster,
 fixing it may trigger movement *back* to the more correct location.
 For this reason, you must manually opt-in to the fixed behavior.

 In order to set the new tunable to correct the behavior::

ceph osd crush set-tunable straw_calc_version 1

 Note that this change will have no immediate effect.  However, from
 this point forward, any 'straw' bucket in your CRUSH map that is
 adjusted will get non-buggy internal weights, and that transition
 may trigger some rebalancing.

 You can estimate how much rebalancing will eventually be necessary
 on your cluster with::

ceph osd getcrushmap -o /tmp/cm
crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 
 2>&1
crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 
 2>&1
wc -l /tmp/a  # num total mappings
diff -u /tmp/a /tmp/b | grep -c ^+# num changed mappings

  Divide the total number of lines in /tmp/a with the number of lines
  changed.  We've found that most clusters are under 10%.

  You can force all of this rebalancing to happen at once with::

ceph osd crush reweight-all

  Otherwise, it will happen at some unknown point in the future when
  CRUSH weights are next adjusted.

 Notable Changes
 ---

 * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
 * crush: fix straw bucket weight calculation, add straw_calc_version
 tunable (#10095 Sage Weil)
 * crush: fix tree bucket (Rongzu Zhu)
 * crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
 * crushtool: add --reweight (Sage Weil)
 * librbd: complete pending operations before losing image (#10299 Jason
 Dillaman)
 * librbd: fix read caching performance regression (#9854 Jason Dillaman)
 * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
 * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
 * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
 * osd: handle no-op write with snapshot (#10262 Sage Weil)
 * radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh)
 * rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis,
 Yehuda Sadeh)
 * rgw: don't overwrite bucket/object owner when setting ACLs (#10978
 Yehuda Sadeh)
 * rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh)
 * rgw: fix partial swift GET (#10553 Yehuda Sadeh)
 * rgw: fix quota disable (#9907 Dong Lei)
 * rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh)
 * rgw: make setattrs update bucket index (#5595 Yehuda Sadeh)
 * rgw: pass civetweb configurables (#10907 Yehuda Sadeh)
 * rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda
 Sadeh)
 * rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh)
 * rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh)
 * rgw: send ETag on S3 

[ceph-users] client crashed when osd gets restarted - hammer 0.93

2015-03-11 Thread kevin parrikar
Hi,
 I am trying hammer 0.93 on Ubuntu 14.04.
rbd is mapped in a client, which is also Ubuntu 14.04.
When I did a stop ceph-osd-all and then a start, the client machine crashed and
the attached pic was in the console. Not sure if it's related to ceph.

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] client crashed when osd gets restarted - hammer 0.93

2015-03-11 Thread Somnath Roy
Kevin,
This is a known issue and should be fixed in the latest krbd. The problem is, 
it is not backported to 14.04 krbd yet. You need to build it from latest krbd 
source if you want to stick with 14.04.
The workaround is, you need to unmap your clients before restarting osds.
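
A minimal sketch of that workaround (assuming any filesystems on the mapped
devices have been unmounted first):

    rbd showmapped
    rbd showmapped | awk 'NR>1 {print $5}' | xargs -r -n1 rbd unmap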

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of kevin 
parrikar
Sent: Wednesday, March 11, 2015 11:44 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] client crashed when osd gets restarted - hammer 0.93

Hi,
 I am trying hammer 0.93 on Ubuntu 14.04.
rbd is mapped in client ,which is also ubuntu 14.04 .
When i did a stop ceph-osd-all and then a start,client machine crashed and 
attached pic was in the console.Not sure if its related to ceph.

Thanks




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] client crashed when osd gets restarted - hammer 0.93

2015-03-11 Thread kevin parrikar
thanks i will follow this work around.

On Thu, Mar 12, 2015 at 12:18 AM, Somnath Roy 
wrote:

>  Kevin,
>
> This is a known issue and should be fixed in the latest krbd. The problem
> is, it is not backported to 14.04 krbd yet. You need to build it from
> latest krbd source if you want to stick with 14.04.
>
> The workaround is, you need to unmap your clients before restarting osds.
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *kevin parrikar
> *Sent:* Wednesday, March 11, 2015 11:44 AM
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] client crashed when osd gets restarted - hammer
> 0.93
>
>
>
> Hi,
>
>  I am trying hammer 0.93 on Ubuntu 14.04.
>
> rbd is mapped in client ,which is also ubuntu 14.04 .
>
> When i did a stop ceph-osd-all and then a start,client machine crashed and
> attached pic was in the console.Not sure if its related to ceph.
>
>
>
> Thanks
>
> --
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread Samuel Just
For each of those pgs, you'll need to identify the pg copy you want to 
be the winner and either
1) Remove all of the other ones using ceph-objectstore-tool and 
hopefully the winner you left alone will allow the pg to recover and go 
active.
2) Export the winner using ceph-objectstore-tool, use 
ceph-objectstore-tool to delete *all* copies of the pg, use 
force_create_pg to recreate the pg empty, use ceph-objectstore-tool to 
do a rados import on the exported pg copy.


Also, the pgs which are still down still have replicas which need to be 
brought back or marked lost.
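
As a rough illustration of option 2 for a single pg (here a hypothetical pg
7.100 with osd.12 holding the chosen winner; ids and paths are placeholders,
and each OSD must be stopped before the tool is run against it):

    # export the surviving copy from the stopped OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 7.100 --op export --file /tmp/pg7.100.export

    # remove every remaining copy of the pg (repeat on each OSD that has one)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 7.100 --op remove

    # recreate the pg empty, then import the exported copy into one of its OSDs
    ceph pg force_create_pg 7.100
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --op import --file /tmp/pg7.100.export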

-Sam

On 03/11/2015 07:29 AM, joel.merr...@gmail.com wrote:

I'd like to not have to null them if possible; there's nothing
outlandishly valuable, it's more the time to reprovision (users have
stuff on there, mainly testing, but I have a nasty feeling some users
won't have backed up their test instances). When you say complicated
and fragile, could you expand?

Thanks again!
Joel

On Wed, Mar 11, 2015 at 1:21 PM, Samuel Just  wrote:

Ok, you lost all copies from an interval where the pgs went active. The
recovery from this is going to be complicated and fragile.  Are the pools
valuable?
-Sam


On 03/11/2015 03:35 AM, joel.merr...@gmail.com wrote:

For clarity too, I've tried to drop the min_size before as suggested,
doesn't make a difference unfortunately

On Wed, Mar 11, 2015 at 9:50 AM, joel.merr...@gmail.com
 wrote:

Sure thing, n.b. I increased pg count to see if it would help. Alas not.
:)

Thanks again!

health_detail
https://gist.github.com/199bab6d3a9fe30fbcae

osd_dump
https://gist.github.com/499178c542fa08cc33bb

osd_tree
https://gist.github.com/02b62b2501cbd684f9b2

Random selected queries:
queries/0.19.query
https://gist.github.com/f45fea7c85d6e665edf8
queries/1.a1.query
https://gist.github.com/dd68fbd5e862f94eb3be
queries/7.100.query
https://gist.github.com/d4fd1fb030c6f2b5e678
queries/7.467.query
https://gist.github.com/05dbcdc9ee089bd52d0c

On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just  wrote:

Yeah, get a ceph pg query on one of the stuck ones.
-Sam

On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote:

Stuck unclean and stuck inactive. I can fire up a full query and
health dump somewhere useful if you want (full pg query info on ones
listed in health detail, tree, osd dump etc). There were blocked_by
operations that no longer exist after doing the OSD addition.

Side note, spent some time yesterday writing some bash to do this
programmatically (might be useful to others, will throw on github)

On Tue, Mar 10, 2015 at 1:41 PM, Samuel Just  wrote:

What do you mean by "unblocked" but still "stuck"?
-Sam

On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote:

On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just  wrote:

You'll probably have to recreate osds with the same ids (empty
ones),
let them boot, stop them, and mark them lost.  There is a feature in
the
tracker to improve this behavior:
http://tracker.ceph.com/issues/10976
-Sam

Thanks Sam, I've readded the OSDs, they became unblocked but there
are
still the same number of pgs stuck. I looked at them in some more
detail and it seems they all have num_bytes='0'. Tried a repair too,
for good measure. Still nothing I'm afraid.

Does this mean some underlying catastrophe has happened and they are
never going to recover? Following on, would that cause data loss.
There are no missing objects and I'm hoping there's appropriate
checksumming / replicas to balance that out, but now I'm not so sure.

Thanks again,
Joel






--
$ echo "kpfmAdpoofdufevq/dp/vl" | perl -pe 's/(.)/chr(ord($1)-1)/ge'








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Duplication name Container

2015-03-11 Thread Steffen W Sørensen
On 11/03/2015, at 15.31, Wido den Hollander  wrote:

> On 03/11/2015 03:23 PM, Jimmy Goffaux wrote:
>> Hello All,
>> 
>> I use Ceph in production for several months. but i have an errors with
>> Ceph Rados Gateway for multiple users.
>> 
>> I am faced with the following error:
>> 
>> Error trying to create container 'xs02': 409 Conflict: BucketAlreadyExists
>> 
>> Which corresponds to the documentation :
>> http://ceph.com/docs/master/radosgw/s3/bucketops/
>> 
>> By which means I can avoid this kind of problem?
>> 
> You can not. Bucket names are unique inside the RADOS Gateway. Just as
> with Amazon S3.
Well it can be avoided but not at the Ceph level but at your Application level 
:)
Either ignore already exist errors in your App or try to verify bucket exists 
before creating buckets... 

/Steffen


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-11 Thread Steffen W Sørensen
On 11/03/2015, at 08.19, Steffen W Sørensen  wrote:

> On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub  wrote:
> 
 What kind of application is that?
>>> Commercial Email platform from Openwave.com
>> 
>> Maybe it could be worked around using an apache rewrite rule. In any case, I 
>> opened issue #11091.
> Okay, how, by rewriting the response?
> Thanks, where can tickets be followed/viewed?
Ah here: http://tracker.ceph.com/projects/rgw/issues

>> Not at the moment. There's already issue #6961, I bumped its priority 
>> higher, and we'll take a look at it.
Please also backport to Giant if possible :)

/Steffen




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cache Tier Flush = immediate base tier journal sync?

2015-03-11 Thread Nick Fisk
I'm not sure if it's something I'm doing wrong or just an oddity, but when my
cache tier flushes dirty blocks out to the base tier, the writes seem to hit
the OSDs straight away instead of coalescing in the journals. Is this correct?

For example, if I create an RBD on a standard 3-way replica pool and run fio
via librbd with 128k writes, I see the journals take all the IOs until I hit my
filestore_min_sync_interval, and then I see writes start hitting the
underlying disks.
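
For reference, the kind of fio job used for that test might look roughly like
this (a sketch; pool and image names are placeholders, and it assumes fio was
built with rbd support):

    fio --name=cache-flush-test --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=test-image --rw=write --bs=128k \
        --iodepth=32 --numjobs=1 --size=10G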

Doing the same on a full cache tier (to force flushing), I immediately see
the base disks at a very high utilisation. The journals also have some write
IO at the same time. The only other odd thing I can see via iostat is that,
most of the time whilst I'm running fio, the underlying disks are doing very
small write IOs of around 16kb with an occasional big burst of activity.
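
In case it is useful for comparison, the sync interval settings and journal
counters can be checked per OSD via the admin socket, e.g. (osd id is a
placeholder):

    ceph daemon osd.0 config get filestore_min_sync_interval
    ceph daemon osd.0 config get filestore_max_sync_interval
    ceph daemon osd.0 perf dump | python -m json.tool | grep journal_wr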

I know erasure coding+cache tier is slower than just plain replicated pools,
but even with various high queue depths I'm struggling to get much above
100-150 iops compared to a 3 way replica pool which can easily achieve
1000-1500. The base tier is comprised of 40 disks. It seems quite a marked
difference and I'm wondering if this strange journal behaviour is the cause.

Does anyone have any ideas?

Nick


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck unclean "active+remapped" after an osd marked out

2015-03-11 Thread Francois Lafont
Hi,

I was always in the same situation: I couldn't remove an OSD without
having some PGs definitely stuck in the "active+remapped" state.

But I remembered reading on IRC that, before marking an OSD out, it
can sometimes be a good idea to reweight it to 0. So, instead of
doing [1]:

ceph osd out 3

I have tried [2]:

ceph osd crush reweight osd.3 0 # waiting for the rebalancing...
ceph osd out 3

and it worked. Then I could remove my osd with the online documentation:
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

Now, the osd is removed and my cluster is HEALTH_OK. \o/
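
For reference, combining the reweight trick above with the steps from that page
gives roughly the following (shown for osd.3; the stop command depends on the
init system):

    ceph osd crush reweight osd.3 0   # drain it first, wait for rebalancing
    ceph osd out 3
    stop ceph-osd id=3                # or: service ceph stop osd.3
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3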

Now, my question is: why was my cluster definitely stuck in "active+remapped"
with [1] but not with [2]? Personally, I have absolutely no explanation.
If you have one, I'd love to know it.

Should the "reweight" command be present in the online documentation?
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual
If yes, I can make a pull request on the doc with pleasure. ;)

Regards.

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-osd pegging CPU on giant, no snapshots involved this time

2015-03-11 Thread Adolfo R. Brandes
On Wed, Feb 18, 2015 at 9:19 PM, Florian Haas wrote:
>> Hey everyone,
>>
>> I must confess I'm still not fully understanding this problem and
>> don't exactly know where to start digging deeper, but perhaps other
>> users have seen this and/or it rings a bell.
>>
>> System info: Ceph giant on CentOS 7; approx. 240 OSDs, 6 pools using 2
>> different rulesets where the problem applies to hosts and PGs using a
>> bog-standard default crushmap.
>>
>> Symptom: out of the blue, ceph-osd processes on a single OSD node
>> start going to 100% CPU utilization. The problems turns so bad that
>> the machine is effectively becoming CPU bound and can't cope with any
>> client requests anymore. Stopping and restarting all OSDs brings the
>> problem right back, as does rebooting the machine — right after
>> ceph-osd processes start, CPU utilization shoots up again. Stopping
>> and marking out several OSDs on the machine makes the problem go away
>> but obviously causes massive backfilling. All the logs show while CPU
>> utilization is implausibly high are slow requests (which would be
>> expected in a system that can barely do anything).
>>
>> Now I've seen issues like this before on dumpling and firefly, but
>> besides the fact that they have all been addressed and should now be
>> fixed, they always involved the prior mass removal of RBD snapshots.
>> This system only used a handful of snapshots in testing, and is
>> presently not using any snapshots at all.
>>
>> I'll be spending some time looking for clues in the log files of the
>> OSDs that were shut down which caused the problem to go away, but if
>> this sounds familiar to anyone willing to offer clues, I'd be more
>> than interested. :) Thanks!
>>
>> Cheers,
>> Florian
>
>Dan vd Ster was kind enough to pitch in an incredibly helpful off-list
>reply, which I am taking the liberty to paraphrase here:
>
>That "mysterious" OSD madness seems to be caused by NUMA zone reclaim,
>which is enabled by default on Intel machines with recent kernels. It
>can be disabled as follows:
>
>echo 0 > /proc/sys/vm/zone_reclaim_mode
>
>or of course, "sysctl -w vm.zone_reclaim_mode=0" or the corresponding
>sysctl.conf entry.
>
>On the machines affected, that seems to have removed the CPU pegging
>issue, at least it has not reappeared for several days now.
>
>Dan and Sage have discussed the issue recently in this thread:
>http://www.spinics.net/lists/ceph-users/msg14914.html
>
>Thanks a million to Dan.

I'm looking into the original issue Florian describes above.  It seems
that unsetting zone_reclaim_mode wasn't the magical fix we hoped for.  After
a couple of weeks, we're seeing pegged CPUs again, but this time we
managed to get a perf top snapshot of it happening.  These are the topmost
(ahem) lines:

8.33% [kernel] [k] _raw_spin_lock
3.14% perf [.] 0x000da124
2.58% [unknown] [.] 0x7f8a2901042d
1.85% libpython2.7.so.1.0 [.] 0x0006dac2
1.61% libc-2.17.so [.] __memcpy_ssse3_back
1.54% perf [.] dso__find_symbol
1.44% libc-2.17.so [.] __strcmp_sse42
1.41% libpython2.7.so.1.0 [.] PyEval_EvalFrameEx
1.25% [kernel] [k] native_write_msr_safe
1.24% perf [.] hists__output_resort
1.11% libleveldb.so.1.0.7 [.] 0x0003cde8
0.86% perf [.] perf_evsel__parse_sample
0.81% libtcmalloc.so.4.1.2 [.] operator new(unsigned long)
0.76% libpython2.7.so.1.0 [.] PyEval_EvalFrameEx
0.73% [kernel] [k] apic_timer_interrupt
0.71% [kernel] [k] page_fault
0.71% [kernel] [k] _raw_spin_lock_irqsave
0.62% libpthread-2.17.so [.] pthread_mutex_unlock
0.62% libc-2.17.so [.] __memcmp_sse4_1
0.61% libc-2.17.so [.] _int_malloc
0.60% perf [.] rb_next
0.58% [kernel] [k] clear_page_c_e
0.56% [kernel] [k] tg_load_down

The server in question was booted without any OSDs.  A few were started after
invoking 'perf top', during which run the CPUs were saturated.

Any ideas?
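
In case it helps to narrow down where the spin lock time is going, a call-graph
capture along these lines (a sketch) usually gives more context than perf top
alone:

    pid=$(pidof ceph-osd | awk '{print $1}')   # one of the busy ceph-osd processes
    perf record -g -p "$pid" -- sleep 30
    perf report --stdio | head -50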

Cheers!
Adolfo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Add monitor unsuccesful

2015-03-11 Thread Jesus Chavez (jeschave)
can anybody tell me a good blog link that explains how to add a monitor? I have 
tried manually and also with ceph-deploy, without success =(

Help


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433


Cisco.com










___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] hang osd --zap-disk

2015-03-11 Thread Jesus Chavez (jeschave)

I don’t know what is going on =( the system hangs with the message below after 
the command "ceph-deploy osd --zap-disk create tauro:sdb”

[tauro][WARNING] No data was received after 300 seconds, disconnecting...
[ceph_deploy.osd][DEBUG ] Host tauro is now ready for osd use.
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy osd activate 
tauro:sdb1
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks tauro:/dev/sdb1:
[tauro][DEBUG ] connection detected need for sudo
[tauro][DEBUG ] connected to host: tauro
[tauro][DEBUG ] detect platform information from remote host
[tauro][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Red Hat Enterprise Linux Server 7.1 Maipo
[ceph_deploy.osd][DEBUG ] activating host tauro disk /dev/sdb1
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[tauro][INFO  ] Running command: sudo ceph-disk -v activate --mark-init 
sysvinit --mount /dev/sdb1
[tauro][WARNING] INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue 
-- /dev/sdb1
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[tauro][WARNING] DEBUG:ceph-disk:Mounting /dev/sdb1 on 
/var/lib/ceph/tmp/mnt.lNpFro with options noatime,inode64
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o 
noatime,inode64 -- /dev/sdb1 /var/lib/ceph/tmp/mnt.lNpFro
[tauro][WARNING] DEBUG:ceph-disk:Cluster uuid is 
fc72a252-15be-40e9-9de1-34593be5668a
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
--cluster=ceph --show-config-value=fsid
[tauro][WARNING] DEBUG:ceph-disk:Cluster name is ceph
[tauro][WARNING] DEBUG:ceph-disk:OSD uuid is 
bf192166-86e9-4c68-9bff-7ced1c9ba8ee
[tauro][WARNING] DEBUG:ceph-disk:Allocating OSD id...
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph 
--name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring 
osd create --concise bf192166-86e9-4c68-9bff-7ced1c9ba8ee
[tauro][WARNING] 2015-03-11 17:49:31.782184 7f9cf05a8700  0 -- :/1015927 >> 
192.168.4.35:6789/0 pipe(0x7f9cec0253f0 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9cec025680).fault
[tauro][WARNING] 2015-03-11 17:49:35.782524 7f9cf04a7700  0 -- :/1015927 >> 
192.168.4.35:6789/0 pipe(0x7f9cec00 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9cee90).fault
[tauro][WARNING] 2015-03-11 17:49:37.781846 7f9cf05a8700  0 -- :/1015927 >> 
192.168.4.35:6789/0 pipe(0x7f9ce00030e0 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9ce0003370).fault
[tauro][WARNING] 2015-03-11 17:49:41.782566 7f9cf04a7700  0 -- :/1015927 >> 
192.168.4.35:6789/0 pipe(0x7f9cec00 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9cee90).fault
[tauro][WARNING] 2015-03-11 17:49:43.782303 7f9cf05a8700  0 -- :/1015927 >> 
192.168.4.35:6789/0 pipe(0x7f9ce00031b0 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9ce00025d0).fault
[tauro][WARNING] 2015-03-11 17:49:47.784627 7f9cf04a7700  0 -- :/1015927 >> 
192.168.4.35:6789/0 pipe(0x7f9cec00 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9cee90).fault
[tauro][WARNING] 2015-03-11 17:49:49.782712 7f9cf05a8700  0 -- :/1015927 >> 
192.168.4.35:6789/0 pipe(0x7f9ce00031b0 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9ce0002c60).fault
[tauro][WARNING] 2015-03-11 17:49:53.784690 7f9cf04a7700  0 -- :/1015927 >> 
192.168.4.35:6789/0 pipe(0x7f9ce0003fb0 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9ce0004240).fault
[tauro][WARNING] 2015-03-11 17:49:55.783248 7f9cf05a8700  0 -- :/1015927 >> 
192.168.4.35:6789/0 pipe(0x7f9ce0004930 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9ce0004bc0)


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433


Cisco.com










___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Add monitor unsuccesful

2015-03-11 Thread Steffen W Sørensen

On 12/03/2015, at 00.55, Jesus Chavez (jeschave)  wrote:

> can anybody tell me a good blog link that explain how to add monitor? I have 
> tried manually and also with ceph-deploy without success =(
Dunno if these might help U:

http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#adding-a-monitor-manual

http://cephnotes.ksperis.com/blog/2013/08/29/mon-failed-to-start
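
In short, the manual procedure from the first link boils down to roughly the
following, run on the new monitor host (the id "mon-b" and the address are
placeholders; the mon. keyring is cluster-wide and fetched from the running
cluster, not generated per host):

    mkdir -p /var/lib/ceph/mon/ceph-mon-b
    ceph auth get mon. -o /tmp/mon.keyring
    ceph mon getmap -o /tmp/monmap
    ceph-mon -i mon-b --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    ceph-mon -i mon-b --public-addr 192.168.0.11:6789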

/Steffen


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can not list objects in large bucket

2015-03-11 Thread Sean Sullivan
I have a single radosgw user with 2 s3 keys and 1 swift key. I have created a 
few buckets and I can list all of the contents of bucket A and C but not B with 
either S3 (boto) or python-swiftclient. I am able to list the first 1000 
entries using radosgw-admin 'bucket list --bucket=bucketB' without any issues 
but this doesn't really help.

The odd thing is I can still upload and download objects in the bucket. I just 
can't list them. I tried setting the bucket canned_acl to private and public 
but I still can't list the objects inside.

I'm using ceph .87 (Giant) Here is some info about the cluster::
http://pastebin.com/LvQYnXem -- ceph.conf
http://pastebin.com/efBBPCwa -- ceph -s
http://pastebin.com/tF62WMU9 -- radosgw-admin bucket list
http://pastebin.com/CZ8TkyNG -- python list bucket objects script
http://pastebin.com/TUCyxhMD -- radosgw-admin bucket stats --bucketB
http://pastebin.com/uHbEtGHs -- rados -p .rgw.buckets ls | grep default.20283.2 
(bucketB marker)
http://pastebin.com/WYwfQndV -- Python Error when trying to list BucketB via 
boto

I have no idea why this could be happening outside of the acl. Has anyone seen 
this before? Any idea on how I can get access to this bucket again via 
s3/swift? Also, is there a way to list the full contents of a bucket via 
radosgw-admin and not just the first 9000 lines / 1000 entries, or a way to page 
through them?

EDIT:: I just fixed it (I hope) but the fix doesn't make any sense:

radosgw-admin bucket unlink --uid=user --bucket=bucketB
radosgw-admin bucket link --uid=user --bucket=bucketB 
--bucket-id=default.20283.2

Now with swift or s3 (boto) I am able to list the bucket contents without issue 
^_^

Can someone elaborate on why this works and how it broke in the first place 
when ceph was health_ok the entire time? With 3 replicas, how did this happen? 
Could this be a bug? Sorry for the rambling; I am confused and tired ;p



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Add monitor unsuccesful

2015-03-11 Thread Jesus Chavez (jeschave)
Thanks Steffen, I have followed everything but I'm not sure what is going on. Are the mon 
keyring and client admin keyring individual, per mon host? Or do I need to copy 
them from the first initial mon node?

Thanks again!


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433

On Mar 11, 2015, at 6:28 PM, Steffen W Sørensen 
mailto:ste...@me.com>> wrote:


On 12/03/2015, at 00.55, Jesus Chavez (jeschave) 
mailto:jesch...@cisco.com>> wrote:

can anybody tell me a good blog link that explain how to add monitor? I have 
tried manually and also with ceph-deploy without success =(
Dunno if these might help U:

http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#adding-a-monitor-manual

http://cephnotes.ksperis.com/blog/2013/08/29/mon-failed-to-start

/Steffen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shadow files

2015-03-11 Thread Ben

Anyone got any info on this?

Is it safe to delete shadow files?

On 2015-03-11 10:03, Ben wrote:

We have a large number of shadow files in our cluster that aren't
being deleted automatically as data is deleted.

Is it safe to delete these files?
Is there something we need to be aware of when deleting them?
Is there a script that we can run that will delete these safely?

Is there something wrong with our cluster that it isn't deleting these
files when it should be?

We are using civetweb with radosgw, with tengine ssl proxy infront of 
it


Any advice please
Thanks

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-11 Thread Malcolm Haak
Sorry about all the unrelated grep issues..

So I've rebuilt and reinstalled and it's still broken. 

On the working node, even with the new packages, everything works.
On the new broken node, I've added a mon and it works. But I still cannot start 
an OSD on the new node.

What else do you need from me? I'll get logs and run any number of tests.

I've got data in this cluster already, and it's full so I need to expand it;
I've already got the hardware.

Thanks in advance for even having a look


-Original Message-
From: Samuel Just [mailto:sj...@redhat.com] 
Sent: Wednesday, 11 March 2015 1:41 AM
To: Malcolm Haak; jl...@redhat.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Joao, it looks like map 2759 is causing trouble, how would he get the
full and incremental maps for that out of the mons?
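
For the full map of a given epoch, something along these lines should work (a
sketch); as far as I know the incrementals are not exposed through a normal CLI
command and would have to come out of the mon store:

    ceph osd getmap 2759 -o /tmp/osdmap.2759
    osdmaptool --print /tmp/osdmap.2759
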
-Sam

On Tue, 2015-03-10 at 14:12 +, Malcolm Haak wrote:
> Hi Samuel,
> 
> The sha1? I'm going to admit ignorance as to what you are looking for. They 
> are all running the same release if that is what you are asking. 
> Same tarball built into rpms using rpmbuild on both nodes... 
> Only difference being that the other node has been upgraded and the problem 
> node is fresh.
> 
> added the requested config here is the command line output
> 
> microserver-1:/etc # /etc/init.d/ceph start osd.3
> === osd.3 === 
> Mounting xfs on microserver-1:/var/lib/ceph/osd/ceph-3
> 2015-03-11 01:00:13.492279 7f05b2f72700  1 -- :/0 messenger.start
> 2015-03-11 01:00:13.492823 7f05b2f72700  1 -- :/1002795 --> 
> 192.168.0.10:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 
> 0x7f05ac0290b0 con 0x7f05ac027c40
> 2015-03-11 01:00:13.510814 7f05b07ef700  1 -- 192.168.0.250:0/1002795 learned 
> my addr 192.168.0.250:0/1002795
> 2015-03-11 01:00:13.527653 7f05abfff700  1 -- 192.168.0.250:0/1002795 <== 
> mon.0 192.168.0.10:6789/0 1  mon_map magic: 0 v1  191+0+0 (1112175541 
> 0 0) 0x7f05aab0 con 0x7f05ac027c40
> 2015-03-11 01:00:13.527899 7f05abfff700  1 -- 192.168.0.250:0/1002795 <== 
> mon.0 192.168.0.10:6789/0 2  auth_reply(proto 1 0 (0) Success) v1  
> 24+0+0 (3859410672 0 0) 0x7f05ae70 con 0x7f05ac027c40
> 2015-03-11 01:00:13.527973 7f05abfff700  1 -- 192.168.0.250:0/1002795 --> 
> 192.168.0.10:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7f05ac029730 
> con 0x7f05ac027c40
> 2015-03-11 01:00:13.528124 7f05b2f72700  1 -- 192.168.0.250:0/1002795 --> 
> 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
> 0x7f05ac029a50 con 0x7f05ac027c40
> 2015-03-11 01:00:13.528265 7f05b2f72700  1 -- 192.168.0.250:0/1002795 --> 
> 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
> 0x7f05ac029f20 con 0x7f05ac027c40
> 2015-03-11 01:00:13.530359 7f05abfff700  1 -- 192.168.0.250:0/1002795 <== 
> mon.0 192.168.0.10:6789/0 3  mon_map magic: 0 v1  191+0+0 (1112175541 
> 0 0) 0x7f05aab0 con 0x7f05ac027c40
> 2015-03-11 01:00:13.530548 7f05abfff700  1 -- 192.168.0.250:0/1002795 <== 
> mon.0 192.168.0.10:6789/0 4  mon_subscribe_ack(300s) v1  20+0+0 
> (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
> 2015-03-11 01:00:13.531114 7f05abfff700  1 -- 192.168.0.250:0/1002795 <== 
> mon.0 192.168.0.10:6789/0 5  osd_map(3277..3277 src has 2757..3277) v3 
>  5366+0+0 (3110999244 0 0) 0x7f05a0002800 con 0x7f05ac027c40
> 2015-03-11 01:00:13.531772 7f05abfff700  1 -- 192.168.0.250:0/1002795 <== 
> mon.0 192.168.0.10:6789/0 6  mon_subscribe_ack(300s) v1  20+0+0 
> (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
> 2015-03-11 01:00:13.532186 7f05abfff700  1 -- 192.168.0.250:0/1002795 <== 
> mon.0 192.168.0.10:6789/0 7  osd_map(3277..3277 src has 2757..3277) v3 
>  5366+0+0 (3110999244 0 0) 0x7f05a0001250 con 0x7f05ac027c40
> 2015-03-11 01:00:13.532260 7f05abfff700  1 -- 192.168.0.250:0/1002795 <== 
> mon.0 192.168.0.10:6789/0 8  mon_subscribe_ack(300s) v1  20+0+0 
> (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
> 2015-03-11 01:00:13.556748 7f05b2f72700  1 -- 192.168.0.250:0/1002795 --> 
> 192.168.0.10:6789/0 -- mon_command({"prefix": "get_command_descriptions"} v 
> 0) v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40
> 2015-03-11 01:00:13.564968 7f05abfff700  1 -- 192.168.0.250:0/1002795 <== 
> mon.0 192.168.0.10:6789/0 9  mon_command_ack([{"prefix": 
> "get_command_descriptions"}]=0  v0) v1  72+0+34995 (1092875540 0 
> 1727986498) 0x7f05aa70 con 0x7f05ac027c40
> 2015-03-11 01:00:13.770122 7f05b2f72700  1 -- 192.168.0.250:0/1002795 --> 
> 192.168.0.10:6789/0 -- mon_command({"prefix": "osd crush create-or-move", 
> "args": ["host=microserver-1", "root=default"], "id": 3, "weight": 1.81} v 0) 
> v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40
> 2015-03-11 01:00:13.772299 7f05abfff700  1 -- 192.168.0.250:0/1002795 <== 
> mon.0 192.168.0.10:6789/0 10  mon_command_ack([{"prefix": "osd crush 
> create-or-move", "args": ["host=microserver-1", "root=default"], "id": 3, 
> "w