Re: [ceph-users] osds fails to start with mismatch in id

2014-11-11 Thread Ramakrishna Nishtala (rnishtal)
Hi
It appears that, in the case of pre-created partitions, ceph-deploy create is unable 
to change the partition GUIDs; the GUIDs set by parted remain as they are.

I ran sgdisk manually on each partition, as
sgdisk --change-name="2:ceph data" --partition-guid="2:${osd_uuid}" --typecode="2:${ptype2}" /dev/${i}
The type codes for the journal and data partitions were picked up from ceph-disk-udev.
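A fuller sketch of that per-partition fix, for reference (the device name and partition layout below are placeholders, ${osd_uuid} should be the uuid of the OSD already living on the data partition, and the two type codes are the non-dmcrypt journal/data GUIDs listed in the udev rules quoted further down; double-check them against your ceph-disk-udev):

osd_uuid=$(cat /var/lib/ceph/osd/ceph-3/fsid)        # uuid of the OSD on this data partition (example path)
ptype_journal=45b0969e-9b03-4f30-b4c6-b4b80ceff106   # "ceph journal" type code (non-dmcrypt)
ptype_data=4fbd7e29-9d25-41b8-afd0-062c0ceff05d      # "ceph data" type code (non-dmcrypt)

# example layout: partition 1 = journal, partition 2 = data on /dev/sdb
sgdisk --change-name="1:ceph journal" --typecode="1:${ptype_journal}" /dev/sdb
sgdisk --change-name="2:ceph data" --partition-guid="2:${osd_uuid}" --typecode="2:${ptype_data}" /dev/sdb
partprobe /dev/sdb   # re-read the partition table so udev sees the new type codes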

Udev is working fine now after a reboot, and no changes to fstab are required. All 
OSDs are up too.
ceph -s
cluster 9c6cd1ae-66bf-45ce-b7ba-0256b572a8b7
 health HEALTH_OK
 osdmap e358: 60 osds: 60 up, 60 in
  pgmap v1258: 4096 pgs, 1 pools, 0 bytes data, 0 objects
2802 MB used, 217 TB / 217 TB avail
4096 active+clean

Thanks to all who responded.

Regards,

Rama

From: Daniel Schwager [mailto:daniel.schwa...@dtnet.de]
Sent: Monday, November 10, 2014 10:39 PM
To: 'Irek Fasikhov'; Ramakrishna Nishtala (rnishtal); 'Gregory Farnum'
Cc: 'ceph-us...@ceph.com'
Subject: RE: [ceph-users] osds fails to start with mismatch in id

Hi Ramakrishna,

we use the physical path (containing the serial number) to a disk to avoid 
complexity and wrong mappings. This path will never change:

/etc/ceph/ceph.conf
[osd.16]
devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z0SDCY-part1
osd_journal = /dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
...
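To check which stable by-id name currently maps to which sdX device, a quick listing is enough (purely illustrative):

ls -l /dev/disk/by-id/ | grep part1
udevadm info --query=symlink --name=/dev/sda1   # all persistent symlinks for one partition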

regards
Danny



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Irek 
Fasikhov
Sent: Tuesday, November 11, 2014 6:36 AM
To: Ramakrishna Nishtala (rnishtal); Gregory Farnum
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osds fails to start with mismatch in id

Hi, Ramakrishna.
I think you understand what the problem is:
[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-56/whoami
56
[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-57/whoami
57


Tue Nov 11 2014 at 6:01:40, Ramakrishna Nishtala (rnishtal) 
mailto:rnish...@cisco.com>>:

Hi Greg,

Thanks for the pointer. I think you are right. The full story is like this.



After installation, everything works fine until I reboot. I do see udevadm 
getting triggered in the logs, but the devices do not come up after reboot. It is the 
exact issue described in http://tracker.ceph.com/issues/5194, but that has been fixed 
a while back per the case details.

As a workaround, I copied the contents of /proc/mounts to fstab, and that’s 
where I ran into the issue.



After your suggestion, I defined the mounts by UUID in fstab, but hit a similar problem.
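(For reference, an fstab-by-UUID entry of that kind looks like the line below; the UUID is one from the listing further down, and the mount point and options are only an example.)

UUID=c57541a1-6820-44a8-943f-94d68b4b03d4  /var/lib/ceph/osd/ceph-2  xfs  noatime,inode64  0 0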

blkid.tab has now moved to tmpfs and also isn’t consistent, even after issuing blkid 
explicitly to get the UUIDs. This is in line with the comments in ceph-disk.



I decided to reinstall, dd the partitions, zap the disks, etc. That did not help. Very 
weirdly, the links below change in /dev/disk/by-uuid, /dev/disk/by-partuuid, etc.



Before reboot

lrwxrwxrwx 1 root root 10 Nov 10 06:31 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> 
../../sdd2

lrwxrwxrwx 1 root root 10 Nov 10 06:31 89594989-90cb-4144-ac99-0ffd6a04146e -> 
../../sde2

lrwxrwxrwx 1 root root 10 Nov 10 06:31 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> 
../../sda2

lrwxrwxrwx 1 root root 10 Nov 10 06:31 c57541a1-6820-44a8-943f-94d68b4b03d4 -> 
../../sdc2

lrwxrwxrwx 1 root root 10 Nov 10 06:31 da7030dd-712e-45e4-8d89-6e795d9f8011 -> 
../../sdb2



After reboot

lrwxrwxrwx 1 root root 10 Nov 10 09:50 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> 
../../sdd2

lrwxrwxrwx 1 root root 10 Nov 10 09:50 89594989-90cb-4144-ac99-0ffd6a04146e -> 
../../sde2

lrwxrwxrwx 1 root root 10 Nov 10 09:50 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> 
../../sda2

lrwxrwxrwx 1 root root 10 Nov 10 09:50 c57541a1-6820-44a8-943f-94d68b4b03d4 -> 
../../sdb2

lrwxrwxrwx 1 root root 10 Nov 10 09:50 da7030dd-712e-45e4-8d89-6e795d9f8011 -> 
../../sdh2



Essentially, the transformation here is sdb2 -> sdh2 and sdc2 -> sdb2. In fact I 
hadn’t partitioned sdh at all before the test. The only difference from the standard 
procedure is probably that I pre-created the partitions for the journal and data 
with parted.



The OSD rules in /lib/udev/rules.d list four different partition type GUIDs:

"45b0969e-9b03-4f30-b4c6-5ec00ceff106",

"45b0969e-9b03-4f30-b4c6-b4b80ceff106",

"4fbd7e29-9d25-41b8-afd0-062c0ceff05d",

"4fbd7e29-9d25-41b8-afd0-5ec00ceff05d",



But all my journal/data partitions have 
ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 (the generic basic-data partition GUID) as their partition type GUID.
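The type GUID of a given partition can be checked with sgdisk, e.g. (device and partition number are examples):

sgdisk --info=2 /dev/sdb | grep 'Partition GUID code'
# a partition pre-created with parted shows the generic
# ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 code here instead of the ceph type codes above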



Appreciate any help.



Regards,



Rama

=

-Original Message-
From: Gregory Farnum [mailto:g...@gregs42.com]
Sent: Sunday, November 09, 2014 3:36 PM
To: Ramakrishna Nishtala (rnishtal)
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osds fails to start with mismatch in id



On Sun, Nov 9, 2014 at 3:21 PM, Ramakrishna Nishtala (rnishtal) 
mailto:rnish...@cisco.com>> wrote:

> Hi

>

> I am on ceph 0.87, RHEL 7

>

> Out of 60 few osd’s start and the rest complain about mismatch about

> i

Re: [ceph-users] PG's incomplete after OSD failure

2014-11-11 Thread Matthew Anderson
Thanks for your reply Sage!

I've tested with 8.6ae and no luck I'm afraid. Steps taken were -
Stop osd.117
Export 8.6ae from osd.117
Remove 8.6ae from osd.117
start osd.117
restart osd.190 after still showing incomplete

After this the PG was still showing incomplete and ceph pg dump_stuck
inactive shows -
pg_stat objects mip degr misp unf bytes log disklog state state_stamp
v reported up up_primary acting acting_primary last_scrub scrub_stamp
last_deep_scrub deep_scrub_stamp
8.6ae 0 0 0 0 0 0 0 0 incomplete 2014-11-11 17:34:27.168078 0'0
161425:40 [117,190] 117 [117,190] 117 86424'389748 2013-09-09
16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650

I then tried an export from OSD 190 to OSD 117 by doing -
Stop osd.190 and osd.117
Export pg 8.6ae from osd.190
Import from file generated in previous step into osd.117
Boot both osd.190 and osd.117

When osd.117 attempts to start it generates a failed assert; the full log
is here http://pastebin.com/S4CXrTAL
-1> 2014-11-11 17:25:15.130509 7f9f44512900  0 osd.117 161404 load_pgs
 0> 2014-11-11 17:25:18.604696 7f9f44512900 -1 osd/OSD.h: In
function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9f44512900
time 2014-11-11 17:25:18.602626
osd/OSD.h: 715: FAILED assert(ret)

 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0xb8231b]
 2: (OSDService::get_map(unsigned int)+0x3f) [0x6eea2f]
 3: (OSD::load_pgs()+0x1b78) [0x6aae18]
 4: (OSD::init()+0x71f) [0x6abf5f]
 5: (main()+0x252c) [0x638cfc]
 6: (__libc_start_main()+0xf5) [0x7f9f41650ec5]
 7: /usr/bin/ceph-osd() [0x651027]

I also attempted the same steps with 8.ca and got the same results.
The below is the current state of the pg with it removed from osd.111
-
pg_stat objects mip degr misp unf bytes log disklog state state_stamp
v reported up up_primary acting acting_primary last_scrub scrub_stamp
last_deep_scrub deep_scrub_stamp
8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11
17:39:28.570675 160435'959618 161425:6071759 [190,111] 190 [190,111]
190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
12:57:58.162789

Any idea of where I can go from here?
One thought I had was setting osd.111 and osd.117 out of the cluster
and once the data is moved I can shut them down and mark them as lost
which would make osd.190 the only replica available for those PG's.
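For reference, that would be roughly:

ceph osd out 111
ceph osd out 117
# wait for the data to move and stop the two daemons, then:
ceph osd lost 111 --yes-i-really-mean-it
ceph osd lost 117 --yes-i-really-mean-it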

Thanks again

On Tue, Nov 11, 2014 at 1:10 PM, Sage Weil  wrote:
> On Tue, 11 Nov 2014, Matthew Anderson wrote:
>> Just an update, it appears that no data actually exists for those PG's
>> on osd.117 and osd.111 but it's showing as incomplete anyway.
>>
>> So for the 8.ca PG, osd.111 has only an empty directory but osd 190 is
>> filled with data.
>> For 8.6ae, osd.117 has no data in the pg directory and osd.190 is
>> filled with data as before.
>>
>> Since all of the required data is on OSD.190, would there be a way to
>> make osd.111 and osd.117 forget they have ever seen the two incomplete
>> PG's and therefore restart backfilling?
>
> Ah, that's good news.  You should know that the copy on osd.190 is
> slightly out of date, but it is much better than losing the entire
> contents of the PG.  More specifically, for 8.6ae the latest version was
> 1935986 but the osd.190 is 1935747, about 200 writes in the past.  You'll
> need to fsck the RBD images after this is all done.
>
> I don't think we've tested this recovery scenario, but I think you'll be
> able to recovery with ceph_objectstore_tool, which has an import/export
> function and a delete function.  First, try removing the newer version of
> the pg on osd.117.  First export it for good measure (even tho it's
> empty):
>
> stop the osd
>
> ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117  \
> --journal-path /var/lib/ceph/osd/ceph-117/journal \
> --op export --pgid 8.6ae --file osd.117.8.7ae
>
> ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117  \
> --journal-path /var/lib/ceph/osd/ceph-117/journal \
> --op remove --pgid 8.6ae
>
> and restart.  If that doesn't peer, you can also try exporting the pg from
> osd.190 and importing it into osd.117.  I think just removing the
> newer empty pg on osd.117 will do the trick, though...
>
> sage
>
>
>
>>
>>
>> On Tue, Nov 11, 2014 at 10:37 AM, Matthew Anderson
>>  wrote:
>> > Hi All,
>> >
>> > We've had a string of very unfortunate failures and need a hand fixing
>> > the incomplete PG's that we're now left with. We're configured with 3
>> > replicas over different hosts with 5 in total.
>> >
>> > The timeline goes -
>> > -1 week  :: A full server goes offline with a failed backplane. Still
>> > not working
>> > -1 day  ::  OSD 190 fails
>> > -1 day + 3 minutes :: OSD 121 fails in a different server fails taking
>> > out several PG's and blocking IO
>> > Today  :: The first failed osd (osd.190) was cloned to a good drive
>> > with xfs_dump | xfs_restore and now boots fine. The last failed osd
>> > (osd.121) is comple

[ceph-users] Stackforge Puppet Module

2014-11-11 Thread Nick Fisk
Hi,

I'm just looking through the different methods of deploying Ceph and I
particularly liked the idea the stackforge puppet module advertises of
using discovery to automatically add new disks. I understand the principle of
how it should work (using ceph-disk list to find unknown disks), but I would
like to see in a little more detail how it's been implemented.

I've been looking through the puppet module on GitHub, but I can't see
anywhere where this discovery is carried out.

Could anyone confirm whether this puppet module currently supports the auto
discovery, and where in the code it's carried out?

Many Thanks,
Nick




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Weight field in osd dump & osd tree

2014-11-11 Thread Mallikarjun Biradar
Hi all

When Issued ceph osd dump it displays weight for that osd as 1 and when
issued osd tree it displays 0.35

output from osd dump:
{ "osd": 20,
  "uuid": "b2a97a29-1b8a-43e4-a4b0-fd9ee351086e",
  "up": 1,
  "in": 1,
  "weight": "1.00",
  "primary_affinity": "1.00",
  "last_clean_begin": 0,
  "last_clean_end": 0,
  "up_from": 103,
  "up_thru": 106,
  "down_at": 0,
  "lost_at": 0,
  "public_addr": "10.242.43.116:6820\/27623",
  "cluster_addr": "10.242.43.116:6821\/27623",
  "heartbeat_back_addr": "10.242.43.116:6822\/27623",
  "heartbeat_front_addr": "10.242.43.116:6823\/27623",
  "state": [
"exists",
"up"]}],

output from osd tree:
# idweight  type name   up/down reweight
-1  7.35root default
-2  2.8 host rack6-storage-5
0   0.35osd.0   up  1
1   0.35osd.1   up  1
2   0.35osd.2   up  1
3   0.35osd.3   up  1
4   0.35osd.4   up  1
5   0.35osd.5   up  1
6   0.35osd.6   up  1
7   0.35osd.7   up  1
-3  2.8 host rack6-storage-4
8   0.35osd.8   up  1
9   0.35osd.9   up  1
10  0.35osd.10  up  1
11  0.35osd.11  up  1
12  0.35osd.12  up  1
13  0.35osd.13  up  1
14  0.35osd.14  up  1
15  0.35osd.15  up  1
-4  1.75host rack6-storage-6
16  0.35osd.16  up  1
17  0.35osd.17  up  1
18  0.35osd.18  up  1
19  0.35osd.19  up  1
20  0.35osd.20  up  1

Please help me to understand this

-regards,
Mallikarjun Biradar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stackforge Puppet Module

2014-11-11 Thread David Moreau Simard
Hi Nick,

The great thing about puppet-ceph's implementation on Stackforge is that it is 
both unit and integration tested.
You can see the integration tests here: 
https://github.com/ceph/puppet-ceph/tree/master/spec/system

What I'm getting at is that the tests let you see, to a certain extent, how you can 
use the module.
For example, in the OSD integration tests:
- 
https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb#L24
 and then:
- 
https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb#L82-L110

There's no auto-discovery mechanism built into the module right now. It's kind of 
dangerous; you don't want to format the wrong disks.

Now, this doesn't mean you can't "discover" the disks yourself and pass them to 
the module from your site.pp or from a composition layer.
Here's something I have for my CI environment that uses the $::blockdevices 
fact to discover all devices, split that fact into a list of the devices and 
then reject the drives I don't want (such as the OS disk):

# Assume OS is installed on xvda/sda/vda.
# On an Openstack VM, vdb is ephemeral, we don't want to use vdc.
# WARNING: ALL OTHER DISKS WILL BE FORMATTED/PARTITIONED BY CEPH!
$block_devices = reject(split($::blockdevices, ','), 
'(xvda|sda|vda|vdc|sr0)')
$devices = prefix($block_devices, '/dev/')

And then you can pass $devices to the module.

Let me know if you have any questions !
--
David Moreau Simard

> On Nov 11, 2014, at 6:23 AM, Nick Fisk  wrote:
> 
> Hi,
> 
> I'm just looking through the different methods of deploying Ceph and I
> particularly liked the idea that the stackforge puppet module advertises of
> using discover to automatically add new disks. I understand the principle of
> how it should work; using ceph-disk list to find unknown disks, but I would
> like to see in a little more detail on how it's been implemented.
> 
> I've been looking through the puppet module on Github, but I can't see
> anyway where this discovery is carried out.
> 
> Could anyone confirm if this puppet modules does currently support the auto
> discovery and where  in the code its carried out?
> 
> Many Thanks,
> Nick
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weight field in osd dump & osd tree

2014-11-11 Thread Christian Balzer
On Tue, 11 Nov 2014 17:14:49 +0530 Mallikarjun Biradar wrote:

> Hi all
> 
> When Issued ceph osd dump it displays weight for that osd as 1 and when
> issued osd tree it displays 0.35
> 

There are many threads about this, google is your friend. For example:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11010.html

In short, one is the CRUSH weight (usually based on the capacity of the
OSD), the other is the OSD weight (or reweight in the tree display). 

For example think about a cluster with 100 2TB OSDs and you're planning to
replace them (bit by bit) with 4TB OSDs. But the hard disks are the same
speed, so if you would just replace things, more and more data would
migrate to your bigger OSDs, making the whole cluster actually slower.
Setting the OSD weight (reweight) to 0.5 for the 4TB OSDs (until the
replacement is complete) will result in them getting the same allocation as
the 2TB ones, keeping things even.
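The two values are changed with different commands, roughly as follows (osd id and numbers are only examples):

ceph osd crush reweight osd.20 2.0   # CRUSH weight, usually the disk capacity in TB
ceph osd reweight 20 0.5             # OSD weight, the "reweight" column, a value between 0 and 1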

Christian

> output from osd dump:
> { "osd": 20,
>   "uuid": "b2a97a29-1b8a-43e4-a4b0-fd9ee351086e",
>   "up": 1,
>   "in": 1,
>   "weight": "1.00",
>   "primary_affinity": "1.00",
>   "last_clean_begin": 0,
>   "last_clean_end": 0,
>   "up_from": 103,
>   "up_thru": 106,
>   "down_at": 0,
>   "lost_at": 0,
>   "public_addr": "10.242.43.116:6820\/27623",
>   "cluster_addr": "10.242.43.116:6821\/27623",
>   "heartbeat_back_addr": "10.242.43.116:6822\/27623",
>   "heartbeat_front_addr": "10.242.43.116:6823\/27623",
>   "state": [
> "exists",
> "up"]}],
> 
> output from osd tree:
> # idweight  type name   up/down reweight
> -1  7.35root default
> -2  2.8 host rack6-storage-5
> 0   0.35osd.0   up  1
> 1   0.35osd.1   up  1
> 2   0.35osd.2   up  1
> 3   0.35osd.3   up  1
> 4   0.35osd.4   up  1
> 5   0.35osd.5   up  1
> 6   0.35osd.6   up  1
> 7   0.35osd.7   up  1
> -3  2.8 host rack6-storage-4
> 8   0.35osd.8   up  1
> 9   0.35osd.9   up  1
> 10  0.35osd.10  up  1
> 11  0.35osd.11  up  1
> 12  0.35osd.12  up  1
> 13  0.35osd.13  up  1
> 14  0.35osd.14  up  1
> 15  0.35osd.15  up  1
> -4  1.75host rack6-storage-6
> 16  0.35osd.16  up  1
> 17  0.35osd.17  up  1
> 18  0.35osd.18  up  1
> 19  0.35osd.19  up  1
> 20  0.35osd.20  up  1
> 
> Please help me to understand this
> 
> -regards,
> Mallikarjun Biradar


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weight field in osd dump & osd tree

2014-11-11 Thread Loic Dachary
Hi Christian,

On 11/11/2014 13:09, Christian Balzer wrote:
> On Tue, 11 Nov 2014 17:14:49 +0530 Mallikarjun Biradar wrote:
> 
>> Hi all
>>
>> When Issued ceph osd dump it displays weight for that osd as 1 and when
>> issued osd tree it displays 0.35
>>
> 
> There are many threads about this, google is your friend. For example:
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11010.html
> 
> In short, one is the CRUSH weight (usually based on the capacity of the
> OSD), the other is the OSD weight (or reweight in the tree display). 
> 
> For example think about a cluster with 100 2TB OSDs and you're planning to
> replace them (bit by bit) with 4TB OSDs. But the hard disks are the same
> speed, so if you would just replace things, more and more data would
> migrate to your bigger OSDs, making the whole cluster actually slower.
> Setting the OSD weight (reweight) to 0.5 for the 4TB OSDs (untiil the
> replacement is complete) will result in them getting the same allocation as
> the 2TB ones, keeping things even.

It is a great example. Would you like to add it to 
http://ceph.com/docs/giant/rados/operations/control/#osd-subsystem ? If you do 
not have time, I volunteer to do it :-)

Cheers

> 
> Christian
> 
>> output from osd dump:
>> { "osd": 20,
>>   "uuid": "b2a97a29-1b8a-43e4-a4b0-fd9ee351086e",
>>   "up": 1,
>>   "in": 1,
>>   "weight": "1.00",
>>   "primary_affinity": "1.00",
>>   "last_clean_begin": 0,
>>   "last_clean_end": 0,
>>   "up_from": 103,
>>   "up_thru": 106,
>>   "down_at": 0,
>>   "lost_at": 0,
>>   "public_addr": "10.242.43.116:6820\/27623",
>>   "cluster_addr": "10.242.43.116:6821\/27623",
>>   "heartbeat_back_addr": "10.242.43.116:6822\/27623",
>>   "heartbeat_front_addr": "10.242.43.116:6823\/27623",
>>   "state": [
>> "exists",
>> "up"]}],
>>
>> output from osd tree:
>> # idweight  type name   up/down reweight
>> -1  7.35root default
>> -2  2.8 host rack6-storage-5
>> 0   0.35osd.0   up  1
>> 1   0.35osd.1   up  1
>> 2   0.35osd.2   up  1
>> 3   0.35osd.3   up  1
>> 4   0.35osd.4   up  1
>> 5   0.35osd.5   up  1
>> 6   0.35osd.6   up  1
>> 7   0.35osd.7   up  1
>> -3  2.8 host rack6-storage-4
>> 8   0.35osd.8   up  1
>> 9   0.35osd.9   up  1
>> 10  0.35osd.10  up  1
>> 11  0.35osd.11  up  1
>> 12  0.35osd.12  up  1
>> 13  0.35osd.13  up  1
>> 14  0.35osd.14  up  1
>> 15  0.35osd.15  up  1
>> -4  1.75host rack6-storage-6
>> 16  0.35osd.16  up  1
>> 17  0.35osd.17  up  1
>> 18  0.35osd.18  up  1
>> 19  0.35osd.19  up  1
>> 20  0.35osd.20  up  1
>>
>> Please help me to understand this
>>
>> -regards,
>> Mallikarjun Biradar
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Configuring swift user for ceph Rados Gateway - 403 Access Denied

2014-11-11 Thread ವಿನೋದ್ Vinod H I
Hi,
I am having problems accessing rados gateway using swift interface.
I am using ceph firefly version and have configured a "us" region as
explained in the docs.
There are two zones "us-east" and "us-west".
us-east gateway is running on host ceph-node-1 and us-west gateway is
running on host ceph-node-2.

Here is the output when I try to connect with the swift interface.

user1@ceph-node-4:~$ swift -A http://ceph-node-1/auth -U "useast:swift" -K
"FmQYYbzly4RH+PmNlrWA3ynN+eJrayYXzeISGDSw" --debug stat
INFO:urllib3.connectionpool:Starting new HTTP connection (1): ceph-node-1
DEBUG:urllib3.connectionpool:Setting read timeout to 
DEBUG:urllib3.connectionpool:"GET /auth HTTP/1.1" 403 23
INFO:swiftclient:REQ: curl -i http://ceph-node-1/auth -X GET
INFO:swiftclient:RESP STATUS: 403 Forbidden
INFO:swiftclient:RESP HEADERS: [('date', 'Tue, 11 Nov 2014 12:30:58 GMT'),
('accept-ranges', 'bytes'), ('content-type', 'application/json'),
('content-length', '23'), ('server', 'Apache/2.2.22 (Ubuntu)')]
INFO:swiftclient:RESP BODY: {"Code":"AccessDenied"}
ERROR:swiftclient:Auth GET failed: http://ceph-node-1/auth 403 Forbidden
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1181,
in _retry
self.url, self.token = self.get_auth()
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1155,
in get_auth
insecure=self.insecure)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 318,
in get_auth
insecure=insecure)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 241,
in get_auth_1_0
http_reason=resp.reason)
ClientException: Auth GET failed: http://ceph-node-1/auth 403 Forbidden
Auth GET failed: http://ceph-node-1/auth 403 Forbidden

The region map is as follows.

vinod@ceph-node-1:~$ radosgw-admin region get
--name=client.radosgw.us-east-1

{ "name": "us",
  "api_name": "us",
  "is_master": "true",
  "endpoints": [],
  "master_zone": "us-east",
  "zones": [
{ "name": "us-east",
  "endpoints": [
"http:\/\/ceph-node-1:80\/"],
  "log_meta": "true",
  "log_data": "true"},
{ "name": "us-west",
  "endpoints": [
"http:\/\/ceph-node-2:80\/"],
  "log_meta": "true",
  "log_data": "true"}],
  "placement_targets": [
{ "name": "default-placement",
  "tags": []}],
  "default_placement": "default-placement"}

The user info is follows.
vinod@ceph-node-1:~$ radosgw-admin user info --uid=useast
--name=client.radosgw.us-east-1
{ "user_id": "useast",
  "display_name": "Region-US Zone-East",
  "email": "",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [
{ "id": "useast:swift",
  "permissions": "full-control"}],
  "keys": [
{ "user": "useast",
  "access_key": "45BEF1XQ3Z94B0LIBTLX",
  "secret_key": "123"},
{ "user": "useast:swift",
  "access_key": "WF2QYTY0LDN66CHJ8JSE",
  "secret_key": ""}],
  "swift_keys": [
{ "user": "useast:swift",
  "secret_key": "FmQYYbzly4RH+PmNlrWA3ynN+eJrayYXzeISGDSw"}],
  "caps": [],
  "op_mask": "read, write, delete",
  "system": "true",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
  "max_size_kb": -1,
  "max_objects": -1},
  "user_quota": { "enabled": false,
  "max_size_kb": -1,
  "max_objects": -1},
  "temp_url_keys": []}

Contents of rgw-us-east.conf file is as follows.

vinod@ceph-node-1:~$ cat /etc/apache2/sites-enabled/rgw-us-east.conf
FastCgiExternalServer /var/www/s3gw.fcgi -socket
/var/run/ceph/client.radosgw.us-east-1.sock



ServerName ceph-node-1
ServerAdmin vinvi...@gmail.com
DocumentRoot /var/www
RewriteEngine On
RewriteRule  ^/(.*) /s3gw.fcgi?%{QUERY_STRING}
[E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]



Options +ExecCGI
AllowOverride All
SetHandler fastcgi-script
Order allow,deny
Allow from all
AuthBasicAuthoritative Off



AllowEncodedSlashes On
ErrorLog /var/log/apache2/error.log
CustomLog /var/log/apache2/access.log combined
ServerSignature Off



Can someone point out where I am going wrong?

-- 
Vinod
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-11 Thread Jasper Siero
No problem thanks for helping. 
I don't want to disable the deep scrubbing process itself because it's very 
useful, but one placement group (3.30) is continuously deep scrubbing; it 
should finish after some time, but it won't.

Jasper

Van: Gregory Farnum [g...@gregs42.com]
Verzonden: maandag 10 november 2014 18:24
Aan: Jasper Siero
CC: ceph-users; John Spray
Onderwerp: Re: [ceph-users] mds isn't working anymore after osd's running full

It's supposed to do that; deep scrubbing is an ongoing
consistency-check mechanism. If you really want to disable it you can
set an osdmap flag to prevent it, but you'll have to check the docs
for exactly what that is as I can't recall.
Glad things are working for you; sorry it took so long!
-Greg

On Mon, Nov 10, 2014 at 8:49 AM, Jasper Siero
 wrote:
> Hello John and Greg,
>
> I used the new patch and now the undump succeeded and the mds is working fine 
> and I can mount cephfs again!
>
> I still have one placement group which keeps deep scrubbing even after 
> restarting the ceph cluster:
> dumped all in format plain
> 3.30  0  0  0  0  0  0  active+clean+scrubbing+deep  2014-11-10 17:21:15.866965
> 0'0  2414:418  [1,9]  1  [1,9]  1  631'3463  2014-08-21 15:14:45.430926
> 602'3131  2014-08-18 15:14:37.494913
>
> I there a way to solve this?
>
> Kind regards,
>
> Jasper
> 
> Van: Gregory Farnum [g...@gregs42.com]
> Verzonden: vrijdag 7 november 2014 22:42
> Aan: Jasper Siero
> CC: ceph-users; John Spray
> Onderwerp: Re: [ceph-users] mds isn't working anymore after osd's running full
>
> On Thu, Nov 6, 2014 at 11:49 AM, John Spray  wrote:
>> This is still an issue on master, so a fix will be coming soon.
>> Follow the ticket for updates:
>> http://tracker.ceph.com/issues/10025
>>
>> Thanks for finding the bug!
>
> John is off for a vacation, but he pushed a branch wip-10025-firefly
> that if you install that (similar address to the other one) should
> work for you. You'll need to reset and undump again (I presume you
> still have the journal-as-a-file). I'll be merging them in to the
> stable branches pretty shortly as well.
> -Greg
>
>>
>> John
>>
>> On Thu, Nov 6, 2014 at 6:21 PM, John Spray  wrote:
>>> Jasper,
>>>
>>> Thanks for this -- I've reproduced this issue in a development
>>> environment.  We'll see if this is also an issue on giant, and
>>> backport a fix if appropriate.  I'll update this thread soon.
>>>
>>> Cheers,
>>> John
>>>
>>> On Mon, Nov 3, 2014 at 8:49 AM, Jasper Siero
>>>  wrote:
 Hello Greg,

 I saw that the site of the previous link of the logs uses a very short 
 expiring time so I uploaded it to another one:

 http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz

 Thanks,

 Jasper

 
 Van: gregory.far...@inktank.com [gregory.far...@inktank.com] namens 
 Gregory Farnum [gfar...@redhat.com]
 Verzonden: donderdag 30 oktober 2014 1:03
 Aan: Jasper Siero
 CC: John Spray; ceph-users
 Onderwerp: Re: [ceph-users] mds isn't working anymore after osd's running 
 full

 On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
  wrote:
> Hello Greg,
>
> I added the debug options which you mentioned and started the process 
> again:
>
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
> /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
> --reset-journal 0
> old journal was 9483323613~134233517
> new journal start will be 9621733376 (4176246 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c 
> /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 
> journaldumptgho-mon001
> undump journaldumptgho-mon001
> start 9483323613 len 134213311
> writing header 200.
>  writing 9483323613~1048576
>  writing 9484372189~1048576
>  writing 9485420765~1048576
>  writing 9486469341~1048576
>  writing 9487517917~1048576
>  writing 9488566493~1048576
>  writing 9489615069~1048576
>  writing 9490663645~1048576
>  writing 9491712221~1048576
>  writing 9492760797~1048576
>  writing 9493809373~1048576
>  writing 9494857949~1048576
>  writing 9495906525~1048576
>  writing 9496955101~1048576
>  writing 9498003677~1048576
>  writing 9499052253~1048576
>  writing 9500100829~1048576
>  writing 9501149405~1048576
>  writing 9502197981~1048576
>  writing 9503246557~1048576
>  writing 9504295133~1048576
>  writing 9505343709~1048576
>  writing 9506392285~1048576
>  writing 9507440861~1048576
>  writing 9508489437~1048576
>  writing 9509538013~1048576
>  writing 951

Re: [ceph-users] Configuring swift user for ceph Rados Gateway - 403 Access Denied

2014-11-11 Thread Daniel Schneller

On 2014-11-11 13:12:32 +, ವಿನೋದ್ Vinod H I said:


Hi,
I am having problems accessing rados gateway using swift interface.
I am using ceph firefly version and have configured a "us" region as 
explained in the docs.

There are two zones "us-east" and "us-west".
us-east gateway is running on host ceph-node-1 and us-west gateway is 
running on host ceph-node-2.


[...]



Auth GET failed: http://ceph-node-1/auth 403 Forbidden
[...]



  "swift_keys": [
        { "user": "useast:swift",
          "secret_key": "FmQYYbzly4RH+PmNlrWA3ynN+eJrayYXzeISGDSw"}],


We have seen problems when the secret_key has special characters. I am 
not sure if "+" is one of them, but the manual states this somewhere. 
Try setting the key explicitly, or re-generate one until you get one 
without any special chars.
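A swift key can be regenerated with radosgw-admin until one without special characters comes out, e.g. (user and gateway instance names as used earlier in this thread):

radosgw-admin key create --subuser=useast:swift --key-type=swift --gen-secret \
    --name=client.radosgw.us-east-1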


Drove me nuts.

Daniel


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weight field in osd dump & osd tree

2014-11-11 Thread Mallikarjun Biradar
Thanks Christian, the concept is clear to me now. Thanks very much :)

On Tue, Nov 11, 2014 at 5:47 PM, Loic Dachary  wrote:

> Hi Christian,
>
> On 11/11/2014 13:09, Christian Balzer wrote:
> > On Tue, 11 Nov 2014 17:14:49 +0530 Mallikarjun Biradar wrote:
> >
> >> Hi all
> >>
> >> When Issued ceph osd dump it displays weight for that osd as 1 and when
> >> issued osd tree it displays 0.35
> >>
> >
> > There are many threads about this, google is your friend. For example:
> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11010.html
> >
> > In short, one is the CRUSH weight (usually based on the capacity of the
> > OSD), the other is the OSD weight (or reweight in the tree display).
> >
> > For example think about a cluster with 100 2TB OSDs and you're planning
> to
> > replace them (bit by bit) with 4TB OSDs. But the hard disks are the same
> > speed, so if you would just replace things, more and more data would
> > migrate to your bigger OSDs, making the whole cluster actually slower.
> > Setting the OSD weight (reweight) to 0.5 for the 4TB OSDs (untiil the
> > replacement is complete) will result in them getting the same allocation
> as
> > the 2TB ones, keeping things even.
>
> It is a great example. Would you like to add it to
> http://ceph.com/docs/giant/rados/operations/control/#osd-subsystem ? If
> you do not have time, I volunteer to do it :-)
>
> Cheers
>
> >
> > Christian
> >
> >> output from osd dump:
> >> { "osd": 20,
> >>   "uuid": "b2a97a29-1b8a-43e4-a4b0-fd9ee351086e",
> >>   "up": 1,
> >>   "in": 1,
> >>   "weight": "1.00",
> >>   "primary_affinity": "1.00",
> >>   "last_clean_begin": 0,
> >>   "last_clean_end": 0,
> >>   "up_from": 103,
> >>   "up_thru": 106,
> >>   "down_at": 0,
> >>   "lost_at": 0,
> >>   "public_addr": "10.242.43.116:6820\/27623",
> >>   "cluster_addr": "10.242.43.116:6821\/27623",
> >>   "heartbeat_back_addr": "10.242.43.116:6822\/27623",
> >>   "heartbeat_front_addr": "10.242.43.116:6823\/27623",
> >>   "state": [
> >> "exists",
> >> "up"]}],
> >>
> >> output from osd tree:
> >> # idweight  type name   up/down reweight
> >> -1  7.35root default
> >> -2  2.8 host rack6-storage-5
> >> 0   0.35osd.0   up  1
> >> 1   0.35osd.1   up  1
> >> 2   0.35osd.2   up  1
> >> 3   0.35osd.3   up  1
> >> 4   0.35osd.4   up  1
> >> 5   0.35osd.5   up  1
> >> 6   0.35osd.6   up  1
> >> 7   0.35osd.7   up  1
> >> -3  2.8 host rack6-storage-4
> >> 8   0.35osd.8   up  1
> >> 9   0.35osd.9   up  1
> >> 10  0.35osd.10  up  1
> >> 11  0.35osd.11  up  1
> >> 12  0.35osd.12  up  1
> >> 13  0.35osd.13  up  1
> >> 14  0.35osd.14  up  1
> >> 15  0.35osd.15  up  1
> >> -4  1.75host rack6-storage-6
> >> 16  0.35osd.16  up  1
> >> 17  0.35osd.17  up  1
> >> 18  0.35osd.18  up  1
> >> 19  0.35osd.19  up  1
> >> 20  0.35osd.20  up  1
> >>
> >> Please help me to understand this
> >>
> >> -regards,
> >> Mallikarjun Biradar
> >
> >
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Federated gateways

2014-11-11 Thread Aaron Bassett
Ok I believe I’ve made some progress here. I have everything syncing *except* 
data. The data is getting 500s when it tries to sync to the backup zone. I have 
a log from the radosgw with debug cranked up to 20:

2014-11-11 14:37:06.688331 7f54447f0700  1 == starting new request 
req=0x7f546800f3b0 =
2014-11-11 14:37:06.688978 7f54447f0700  0 WARNING: couldn't find acl header 
for bucket, generating default
2014-11-11 14:37:06.689358 7f54447f0700  1 -- 172.16.10.103:0/1007381 --> 
172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783 
statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) 
v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381 
submit_message osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call 
statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote, 
172.16.10.103:6934/14875, have pipe.
2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer encoding 48 features 17592186044415 0x7f534800d770 
osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 
193.1cf20a5a ondisk+write e47531) v4
2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer signed seq # 48): sig = 206599450695048354
2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer sending 48 0x7f534800d770
2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got ACK
2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got ack seq 48
2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695079 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got MSG
2014-11-11 14:37:06.695093 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190 data=0 off 0
2014-11-11 14:37:06.695108 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader wants 190 from dispatch throttler 0/104857600
2014-11-11 14:37:06.695135 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got front 190
2014-11-11 14:37:06.695150 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).aborted = 0
2014-11-11 14:37:06.695158 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
2014-11-11 14:37:06.695284 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader got message 48 0x7f51b4001950 osd_op_reply(1783 
statelog.obj_opstate.97 [call] v47531'13 uv13 ondisk = 0) v6
2014-11-11 14:37:06.695313 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 queue 
0x7f51b4001950 prio 127
2014-11-11 14:37:06.695374 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695384 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.695426 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6

[ceph-users] long term support version?

2014-11-11 Thread Chad Seys
Hi all,

Did I notice correctly that firefly is going to be supported "long term" 
whereas Giant is not going to be supported as long?

http://ceph.com/releases/v0-80-firefly-released/
This release will form the basis for our long-term supported release Firefly, 
v0.80.x.

http://ceph.com/uncategorized/v0-87-giant-released/
This release will form the basis for the stable release Giant, v0.87.x.

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-11 Thread Chad Seys
Thanks Craig,

I'll jiggle the OSDs around to see if that helps.

Otherwise, I'm almost certain removing the pool will work. :/

Have a good one,
Chad.

> I had the same experience with force_create_pg too.
> 
> I ran it, and the PGs sat there in creating state.  I left the cluster
> overnight, and sometime in the middle of the night, they created.  The
> actual transition from creating to active+clean happened during the
> recovery after a single OSD was kicked out.  I don't recall if that single
> OSD was responsible for the creating PGs.  I really can't say what
> un-jammed my creating.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] InInstalling ceph on a single machine with cephdeploy ubuntu 14.04 64 bit

2014-11-11 Thread tejaksjy

Hi,

I am unable to figure out how to install and deploy Ceph on a single machine with 
ceph-deploy. I have Ubuntu 14.04 64-bit installed in a virtual machine (on Windows 
8.1 through VMware Player) and have installed devstack on Ubuntu. I am trying to 
install Ceph on the same machine and interface it with OpenStack. I tried the steps 
below, but it says that mkcephfs does not exist, and I read that it is deprecated and 
ceph-deploy replaces it. However, the documentation talks about multiple nodes, and I 
am lost as to how to use ceph-deploy to install and set up Ceph on a single machine. 
Please guide me. The steps I tried earlier, which were given for mkcephfs, are:

<<( reference http://eu.ceph.com/docs/wip-6919/start/quick-start/
(1) sudo apt-get update && sudo apt-get install ceph
(2) Execute hostname -s on the command line to retrieve the name of your host. Then, 
replace {hostname} in the sample configuration file with your host name. Execute 
ifconfig on the command line to retrieve the IP address of your host. Then, replace 
{ip-address} with the IP address of your host. Finally, copy the contents of the 
modified configuration file and save it to /etc/ceph/ceph.conf. This file will 
configure Ceph to operate a monitor, two OSD daemons and one metadata server on your 
local machine.

[osd]
osd journal size = 1000
filestore xattr use omap = true
# Execute $ hostname to retrieve the name of your host,
# and replace {hostname} with the name of your host.
# For the monitor, replace {ip-address} with the IP
# address of your host.

[mon.a]
host = {hostname}
mon addr = {ip-address}:6789

[osd.0]
host = {hostname}

[osd.1]
host = {hostname}

[mds.a]
host = {hostname}

sudo mkdir /var/lib/ceph/osd/ceph-0
sudo mkdir /var/lib/ceph/osd/ceph-1
sudo mkdir /var/lib/ceph/mon/ceph-a
sudo mkdir /var/lib/ceph/mds/ceph-a

cd /etc/ceph
sudo mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring

sudo service ceph start
ceph health



>
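For comparison, a rough single-node ceph-deploy outline (a sketch only; the hostname and disks are placeholders, and the chooseleaf setting is there so a one-host cluster can reach a clean state):

ceph-deploy new mynode
echo "osd crush chooseleaf type = 0" >> ceph.conf   # allow replicas on a single host
ceph-deploy install mynode
ceph-deploy mon create-initial
ceph-deploy osd create mynode:sdb mynode:sdc        # or prepare/activate a directory instead of whole disks
ceph-deploy admin mynode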

Regards






Sent from Windows Mail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] long term support version?

2014-11-11 Thread Gregory Farnum
Yep! Every other stable release gets the LTS treatment. We're still fixing
bugs and backporting some minor features to Dumpling, but haven't done any
serious updates to Emperor since Firefly came out. Giant will be superseded
by Hammer in the February timeframe, if I have my dates right.
-Greg
On Tue, Nov 11, 2014 at 8:54 AM Chad Seys  wrote:

> Hi all,
>
> Did I notice correctly that firefly is going to be supported "long term"
> whereas Giant is not going to be supported as long?
>
> http://ceph.com/releases/v0-80-firefly-released/
> This release will form the basis for our long-term supported release
> Firefly,
> v0.80.x.
>
> http://ceph.com/uncategorized/v0-87-giant-released/
> This release will form the basis for the stable release Giant, v0.87.x.
>
> Thanks!
> Chad.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deep scrub, cache pools, replica 1

2014-11-11 Thread Gregory Farnum
On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer  wrote:
>
> Hello,
>
> One of my clusters has become busy enough (I'm looking at you, evil Window
> VMs that I shall banish elsewhere soon) to experience client noticeable
> performance impacts during deep scrub.
> Before this I instructed all OSDs to deep scrub in parallel at Saturday
> night and that finished before Sunday morning.
> So for now I'll fire them off one by one to reduce the load.
>
> Looking forward, that cluster doesn't need more space so instead of adding
> more hosts and OSDs I was thinking of a cache pool instead.
>
> I suppose that will keep the clients happy while the slow pool gets
> scrubbed.
> Is there anybody who tested cache pools with Firefly and compared the
> performance to Giant?
>
> For testing I'm currently playing with a single storage node and 8 SSD
> backed OSDs.
> Now what very much blew my mind is that a pool with a replication of 1
> still does quite the impressive read orgy, clearly reading all the data in
> the PGs.
> Why? And what is it comparing that data with, the cosmic background
> radiation?

Yeah, cache pools currently do full-object promotions whenever an
object is accessed. There are some ideas and projects to improve this
or reduce its effects, but they're mostly just getting started.
At least, I assume that's what you mean by a read orgy; perhaps you
are seeing something else entirely?

Also, even on cache pools you don't really want to run with 1x
replication as they hold the only copy of whatever data is dirty...
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-11 Thread Gregory Farnum
On Tue, Nov 11, 2014 at 5:06 AM, Jasper Siero
 wrote:
> No problem thanks for helping.
> I don't want to disable the deep scrubbing process itself because its very 
> useful but one placement group (3.30) is continuously deep scrubbing and it 
> should finish after some time but it won't.

Hmm, how are you determining that this one PG won't stop scrubbing?
This doesn't sound like any issues familiar to me.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread JIten Shah
Hi Guys,

We ran into this issue after we nearly maxed out the OSDs. Since then, we 
have cleaned up a lot of data on the OSDs, but the PGs seem to be stuck for the last 
4 to 5 days. I have run "ceph osd reweight-by-utilization" and that did not seem to 
work.

Any suggestions? 


ceph -s
cluster 909c7fe9-0012-4c27-8087-01497c661511
 health HEALTH_WARN 224 pgs backfill; 130 pgs backfill_toofull; 86 pgs 
backfilling; 4 pgs degraded; 14 pgs recovery_wait; 324 pgs stuck unclean; 
recovery -11922/573322 objects degraded (-2.079%)
 monmap e5: 5 mons at 
{Lab-mon001=x.x.96.12:6789/0,Lab-mon002=x.x.96.13:6789/0,Lab-mon003=x.x.96.14:6789/0,Lab-mon004=x.x.96.15:6789/0,Lab-mon005=x.x.96.16:6789/0},
 election epoch 28, quorum 0,1,2,3,4 
Lab-mon001,Lab-mon002,Lab-mon003,Lab-mon004,Lab-mon005
 mdsmap e6: 1/1/1 up {0=Lab-mon001=up:active}
 osdmap e10598: 495 osds: 492 up, 492 in
  pgmap v1827231: 21568 pgs, 3 pools, 221 GB data, 184 kobjects
4142 GB used, 4982 GB / 9624 GB avail
-11922/573322 objects degraded (-2.079%)
   9 active+recovery_wait
   21244 active+clean
  90 active+remapped+wait_backfill
   5 active+recovery_wait+remapped
   4 active+degraded+remapped+wait_backfill
 130 active+remapped+wait_backfill+backfill_toofull
  86 active+remapped+backfilling
  client io 0 B/s rd, 0 op/s

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-11 Thread Alexandre DERUMIER
I don't have 10GbE yet, but here is my result with simple LACP over 2 gigabit links and 
a Cisco 6500:

rtt min/avg/max/mdev = 0.179/0.202/0.221/0.019 ms


(Seems to be lower than your 10GbE Nexus.)


- Mail original - 

De: "Wido den Hollander"  
À: ceph-users@lists.ceph.com 
Envoyé: Lundi 10 Novembre 2014 17:22:04 
Objet: Re: [ceph-users] Typical 10GbE latency 

On 08-11-14 02:42, Gary M wrote: 
> Wido, 
> 
> Take the switch out of the path between nodes and remeasure.. ICMP-echo 
> requests are very low priority traffic for switches and network stacks. 
> 

I tried with a direct TwinAx and fiber cable. No difference. 

> If you really want to know, place a network analyzer between the nodes 
> to measure the request packet to response packet latency.. The ICMP 
> traffic to the "ping application" is not accurate in the sub-millisecond 
> range. And should only be used as a rough estimate. 
> 

True, I fully agree with you. But why is everybody showing a lower 
latency here? My latencies are about 40% higher than what I see in this 
setup and other setups. 

> You also may want to install the high resolution timer patch, sometimes 
> called HRT, to the kernel which may give you different results. 
> 
> ICMP traffic takes a different path than the TCP traffic and should not 
> be considered an indicator of defect. 
> 

Yes, I'm aware. But it still doesn't explain why the latency on other 
systems, which are in production, is lower than on this idle system. 

> I believe the ping app calls the sendto system call.(sorry its been a 
> while since I last looked) Systems calls can take between .1us and .2us 
> each. However, the ping application makes several of these calls and 
> waits for a signal from the kernel. The wait for a signal means the ping 
> application must wait to be rescheduled to report the time.Rescheduling 
> will depend on a lot of other factors in the os. eg, timers, card 
> interrupts other tasks with higher priorities. Reporting the time must 
> add a few more systems calls for this to happen. As the ping application 
> loops to post the next ping request which again requires a few systems 
> calls which may cause a task switch while in each system call. 
> 
> For the above factors, the ping application is not a good representation 
> of network performance due to factors in the application and network 
> traffic shaping performed at the switch and the tcp stacks. 
> 

I think that netperf is probably a better tool, but that also does TCP 
latencies. 

I want the real IP latency, so I assumed that ICMP would be the most 
simple one. 

The other setups I have access to are in production and do not have any 
special tuning, yet their latency is still lower than on this new 
deployment. 

That's what gets me confused. 

Wido 

> cheers, 
> gary 
> 
> 
> On Fri, Nov 7, 2014 at 4:32 PM, Łukasz Jagiełło 
> mailto:jagiello.luk...@gmail.com>> wrote: 
> 
> Hi, 
> 
> rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms 
> 
> 04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit 
> SFI/SFP+ Network Connection (rev 01) 
> 
> at both hosts and Arista 7050S-64 between. 
> 
> Both hosts were part of active ceph cluster. 
> 
> 
> On Thu, Nov 6, 2014 at 5:18 AM, Wido den Hollander  > wrote: 
> 
> Hello, 
> 
> While working at a customer I've ran into a 10GbE latency which 
> seems 
> high to me. 
> 
> I have access to a couple of Ceph cluster and I ran a simple 
> ping test: 
> 
> $ ping -s 8192 -c 100 -n  
> 
> Two results I got: 
> 
> rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms 
> rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms 
> 
> Both these environment are running with Intel 82599ES 10Gbit 
> cards in 
> LACP. One with Extreme Networks switches, the other with Arista. 
> 
> Now, on a environment with Cisco Nexus 3000 and Nexus 7000 
> switches I'm 
> seeing: 
> 
> rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms 
> 
> As you can see, the Cisco Nexus network has high latency 
> compared to the 
> other setup. 
> 
> You would say the switches are to blame, but we also tried with 
> a direct 
> TwinAx connection, but that didn't help. 
> 
> This setup also uses the Intel 82599ES cards, so the cards don't 
> seem to 
> be the problem. 
> 
> The MTU is set to 9000 on all these networks and cards. 
> 
> I was wondering, others with a Ceph cluster running on 10GbE, 
> could you 
> perform a simple network latency test like this? I'd like to 
> compare the 
> results. 
> 
> -- 
> Wido den Hollander 
> 42on B.V. 
> Ceph trainer and consultant 
> 
> Phone: +31 (0)20 700 9902  
> Skype: contact42on 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com  
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> 
> 
> 
> -- 
> Łukasz Jagiełło 
> lukaszjagielloorg 
> 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 

Re: [ceph-users] PG's incomplete after OSD failure

2014-11-11 Thread Matthew Anderson
I've done a bit more work tonight and managed to get some more data
back. Osd.121, which was previously completely dead, has made it
through an XFS repair with a more fault tolerant HBA firmware and I
was able to export both of the placement groups required using
ceph_objectstore_tool. The osd would probably boot if I hadn't already
marked it as lost :(

I've basically got it down to two options.

I can import the exported data from osd.121 into osd.190 which would
complete the PG but this fails with a filestore feature mismatch
because the sharded objects feature is missing on the target osd.
Export has incompatible features set
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints}

The second one would be to run a ceph pg force_create_pg on each of
the problem PG's to reset them back to empty and them import the data
using ceph_objectstore_tool import-rados. Unfortunately this has
failed as well when I tested ceph pg force_create_pg on an incomplete
PG in another pool. The PG gets set to creating but then goes back to
incomplete after a few minutes.

I've trawled the mailing list for solutions but have come up empty,
neither problem appears to have been resolved before.

On Tue, Nov 11, 2014 at 5:54 PM, Matthew Anderson
 wrote:
> Thanks for your reply Sage!
>
> I've tested with 8.6ae and no luck I'm afraid. Steps taken were -
> Stop osd.117
> Export 8.6ae from osd.117
> Remove 8.6ae from osd.117
> start osd.117
> restart osd.190 after still showing incomplete
>
> After this the PG was still showing incomplete and ceph pg dump_stuck
> inactive shows -
> pg_stat objects mip degr misp unf bytes log disklog state state_stamp
> v reported up up_primary acting acting_primary last_scrub scrub_stamp
> last_deep_scrub deep_scrub_stamp
> 8.6ae 0 0 0 0 0 0 0 0 incomplete 2014-11-11 17:34:27.168078 0'0
> 161425:40 [117,190] 117 [117,190] 117 86424'389748 2013-09-09
> 16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650
>
> I then tried an export from OSD 190 to OSD 117 by doing -
> Stop osd.190 and osd.117
> Export pg 8.6ae from osd.190
> Import from file generated in previous step into osd.117
> Boot both osd.190 and osd.117
>
> When osd.117 attempts to start it generates an failed assert, full log
> is here http://pastebin.com/S4CXrTAL
> -1> 2014-11-11 17:25:15.130509 7f9f44512900  0 osd.117 161404 load_pgs
>  0> 2014-11-11 17:25:18.604696 7f9f44512900 -1 osd/OSD.h: In
> function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9f44512900
> time 2014-11-11 17:25:18.602626
> osd/OSD.h: 715: FAILED assert(ret)
>
>  ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x8b) [0xb8231b]
>  2: (OSDService::get_map(unsigned int)+0x3f) [0x6eea2f]
>  3: (OSD::load_pgs()+0x1b78) [0x6aae18]
>  4: (OSD::init()+0x71f) [0x6abf5f]
>  5: (main()+0x252c) [0x638cfc]
>  6: (__libc_start_main()+0xf5) [0x7f9f41650ec5]
>  7: /usr/bin/ceph-osd() [0x651027]
>
> I also attempted the same steps with 8.ca and got the same results.
> The below is the current state of the pg with it removed from osd.111
> -
> pg_stat objects mip degr misp unf bytes log disklog state state_stamp
> v reported up up_primary acting acting_primary last_scrub scrub_stamp
> last_deep_scrub deep_scrub_stamp
> 8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11
> 17:39:28.570675 160435'959618 161425:6071759 [190,111] 190 [190,111]
> 190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
> 12:57:58.162789
>
> Any idea of where I can go from here?
> One thought I had was setting osd.111 and osd.117 out of the cluster
> and once the data is moved I can shut them down and mark them as lost
> which would make osd.190 the only replica available for those PG's.
>
> Thanks again
>
> On Tue, Nov 11, 2014 at 1:10 PM, Sage Weil  wrote:
>> On Tue, 11 Nov 2014, Matthew Anderson wrote:
>>> Just an update, it appears that no data actually exists for those PG's
>>> on osd.117 and osd.111 but it's showing as incomplete anyway.
>>>
>>> So for the 8.ca PG, osd.111 has only an empty directory but osd 190 is
>>> filled with data.
>>> For 8.6ae, osd.117 has no data in the pg directory and osd.190 is
>>> filled with data as before.
>>>
>>> Since all of the required data is on OSD.190, would there be a way to
>>> make osd.111 and osd.117 forget they have ever seen the two incomplete
>>> PG's and therefore restart backfilling?
>>
>> Ah, that's good news.  You should know that the copy on osd.190 is
>> slightly out of date, but it is much better than losing the entire
>> contents of the PG.  More specifically, for 8.6ae the latest version was
>> 1935986 but the osd.190 is 1935747, about 200 writes in the past.  You'll
>> need to fsck the RBD images after this is all done.
>>
>> I don't think we've tested

[ceph-users] Not finding systemd files in Giant CentOS7 packages

2014-11-11 Thread Robert LeBlanc
I was trying to get systemd to bring up the monitor using the new systemd
files in Giant. However, I'm not finding the systemd files included in the
CentOS 7 packages. Are they missing or am I confused about how it should
work?
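
(For what it's worth, a quick way to check whether any unit files made it into the RPMs at all, plus the sysvinit fallback; paths assume the stock CentOS 7 layout:)

rpm -ql ceph ceph-common | grep -i systemd
ls /usr/lib/systemd/system/ceph*          # where unit files would normally be installed
sudo /etc/init.d/ceph start mon.$(hostname -s)   # the sysvinit script in the ceph package should still work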

ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
Installed Packages
ceph.x86_64          1:0.87-0.el7.centos   @Ceph
ceph-common.x86_64   1:0.87-0.el7.centos   @Ceph
ceph-deploy.noarch   1.5.19-0              @Ceph-noarch
ceph-release.noarch  1-0.el7               installed
libcephfs1.x86_64    1:0.87-0.el7.centos   @Ceph
python-ceph.x86_64   1:0.87-0.el7.centos   @Ceph

Thanks,
Robert LeBlanc


Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread Chad Seys
Find out which OSD it is:

ceph health detail

Squeeze blocks off the affected OSD:

ceph osd reweight OSDNUM 0.8

Repeat with any OSD which becomes toofull.
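
(If there are several of them, a rough loop like this saves typing; the awk assumes the firefly/giant "osd.N is near full at X%" wording in ceph health detail, so double-check the field layout on your version:)

for id in $(ceph health detail | awk '/is near full/ {sub(/^osd\./,"",$1); print $1}'); do
    ceph osd reweight $id 0.8      # temporary override weight, not the crush weight
done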

Your cluster is only about 50% used, so I think this will be enough.

Then when it finishes, allow data back on OSD:

ceph osd reweight OSDNUM 1

Hopefully ceph will someday be taught to move PGs in a better order!
Chad.


Re: [ceph-users] Federated gateways

2014-11-11 Thread Craig Lewis
Is that radosgw log from the primary or the secondary zone?  Nothing in
that log jumps out at me.

I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known
issue with Apache 2.4 on the primary and replication.  It's fixed, just
waiting for the next firefly release.  Although, that causes 40x errors
with Apache 2.4, not 500 errors.

Have you verified that both system users can read and write to both
clusters?  (Just make sure you clean up the writes to the slave cluster).
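
(One hedged way to check that, assuming s3cmd with one config file per zone endpoint; the config file names, bucket, and object names below are placeholders, with host_base/host_bucket and the system user's access/secret keys set in each config:)

s3cmd -c ~/.s3cfg-primary   put /etc/hostname s3://repl-test/hostname
s3cmd -c ~/.s3cfg-secondary put /etc/hostname s3://repl-test/hostname-secondary
s3cmd -c ~/.s3cfg-secondary del s3://repl-test/hostname-secondary   # clean up the write on the slave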




On Tue, Nov 11, 2014 at 6:51 AM, Aaron Bassett 
wrote:

> Ok I believe I’ve made some progress here. I have everything syncing
> *except* data. The data is getting 500s when it tries to sync to the backup
> zone. I have a log from the radosgw with debug cranked up to 20:
>
> 2014-11-11 14:37:06.688331 7f54447f0700  1 == starting new request
> req=0x7f546800f3b0 =
> 2014-11-11 14:37:06.688978 7f54447f0700  0 WARNING: couldn't find acl
> header for bucket, generating default
> 2014-11-11 14:37:06.689358 7f54447f0700  1 -- 172.16.10.103:0/1007381 -->
> 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783
> statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write
> e47531) v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
> 2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381
> submit_message osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call
> statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote,
> 172.16.10.103:6934/14875, have pipe.
> 2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
> 2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer encoding 48 features 17592186044415
> 0x7f534800d770 osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call
> statelog.add] 193.1cf20a5a ondisk+write e47531) v4
> 2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 48): sig = 206599450695048354
> 2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer sending 48 0x7f534800d770
> 2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
> 2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer sleeping
> 2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got ACK
> 2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 48
> 2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
> 2014-11-11 14:37:06.695079 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got MSG
> 2014-11-11 14:37:06.695093 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190
> data=0 off 0
> 2014-11-11 14:37:06.695108 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader wants 190 from dispatch throttler
> 0/104857600
> 2014-11-11 14:37:06.695135 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got front 190
> 2014-11-11 14:37:06.695150 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).aborted = 0
> 2014-11-11 14:37:06.695158 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
> 2014-11-11 14:37:06.695284 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got message 48 0x7f51b4001950
> osd_op_reply(1783 statelog.ob

Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread JIten Shah
Thanks Chad. It seems to be working.

—Jiten

On Nov 11, 2014, at 12:47 PM, Chad Seys  wrote:

> Find out which OSD it is:
> 
> ceph health detail
> 
> Squeeze blocks off the affected OSD:
> 
> ceph osd reweight OSDNUM 0.8
> 
> Repeat with any OSD which becomes toofull.
> 
> Your cluster is only about 50% used, so I think this will be enough.
> 
> Then when it finishes, allow data back on OSD:
> 
> ceph osd reweight OSDNUM 1
> 
> Hopefully ceph will someday be taught to move PGs in a better order!
> Chad.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread Craig Lewis
How many OSDs are nearfull?

I've seen Ceph want two toofull OSDs to swap PGs.  In that case, I
dynamically raised mon_osd_nearfull_ratio and osd_backfill_full_ratio a
bit, then put it back to normal once the scheduling deadlock finished.

Keep in mind that ceph osd reweight is temporary.  If you mark an osd OUT
then IN, the weight will be set to 1.0.  If you need something that's
persistent, you can use ceph osd crush reweight osd.NUM <weight>.
Look at ceph osd tree to get the current weight.

I also recommend stepping towards your goal.  Changing either weight can
cause a lot of unrelated migrations, and the crush weight seems to cause
more than the osd weight.  I step osd weight by 0.125, and crush weight by
0.05.
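
(Roughly what the stepping looks like in practice; osd.12 and the target weights are just placeholders:)

ceph osd tree | grep osd.12             # note the current crush weight and reweight
ceph osd reweight 12 0.875              # override weight, stepped down by 0.125
ceph osd crush reweight osd.12 1.75     # crush weight, stepped by 0.05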


On Tue, Nov 11, 2014 at 12:47 PM, Chad Seys  wrote:

> Find out which OSD it is:
>
> ceph health detail
>
> Squeeze blocks off the affected OSD:
>
> ceph osd reweight OSDNUM 0.8
>
> Repeat with any OSD which becomes toofull.
>
> Your cluster is only about 50% used, so I think this will be enough.
>
> Then when it finishes, allow data back on OSD:
>
> ceph osd reweight OSDNUM 1
>
> Hopefully ceph will someday be taught to move PGs in a better order!
> Chad.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] Federated gateways

2014-11-11 Thread Aaron Bassett

> On Nov 11, 2014, at 4:21 PM, Craig Lewis  wrote:
> 
> Is that radosgw log from the primary or the secondary zone?  Nothing in that 
> log jumps out at me.
This is the log from the secondary zone. That HTTP 500 response code coming 
back is the only problem I can find. There are a bunch of 404s from other 
requests to logs and stuff, but I assume those are normal because there’s no 
activity going on. I guess it’s just that cryptic "WARNING: set_req_state_err
err_no=5 resorting to 500" line that’s the problem. I think I need to get a
stack trace from that somehow.
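
(One hedged option for getting more detail out of the gateway without a restart is to crank the debug levels via the admin socket; the socket path depends on your radosgw client name, so the one below is a placeholder:)

ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok config set debug_rgw 20
ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok config set debug_ms 1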

> I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known 
> issue with Apache 2.4 on the primary and replication.  It's fixed, just 
> waiting for the next firefly release.  Although, that causes 40x errors with 
> Apache 2.4, not 500 errors.
It is Apache 2.4, but I’m actually running 0.80.7, so I probably have that bug 
fix?

> 
> Have you verified that both system users can read and write to both clusters? 
>  (Just make sure you clean up the writes to the slave cluster).
Yes I can write everywhere and radosgw-agent isn’t getting any 403s like it was 
earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is 
syncing properly, as are the users. It seems like really the only thing that 
isn’t syncing is the .zone.rgw.buckets pool.

Thanks, Aaron 
> 
> 
> 
> 
> On Tue, Nov 11, 2014 at 6:51 AM, Aaron Bassett wrote:
> Ok I believe I’ve made some progress here. I have everything syncing *except* 
> data. The data is getting 500s when it tries to sync to the backup zone. I 
> have a log from the radosgw with debug cranked up to 20:
> 
> [log snipped; it is the same radosgw log already quoted earlier in this thread]

Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread JIten Shah
Actually there were hundreds that were too full. We manually set the OSD weights 
to 0.5 and it seems to be recovering.

Thanks of the tips on crush reweight. I will look into it.

—Jiten

On Nov 11, 2014, at 1:37 PM, Craig Lewis  wrote:

> How many OSDs are nearfull?
> 
> I've seen Ceph want two toofull OSDs to swap PGs.  In that case, I 
> dynamically raised mon_osd_nearfull_ratio and osd_backfill_full_ratio a bit, 
> then put it back to normal once the scheduling deadlock finished. 
> 
> Keep in mind that ceph osd reweight is temporary.  If you mark an osd OUT 
> then IN, the weight will be set to 1.0.  If you need something that's 
> persistent, you can use ceph osd crush reweight osd.NUM <weight>.  Look 
> at ceph osd tree to get the current weight.
> 
> I also recommend stepping towards your goal.  Changing either weight can 
> cause a lot of unrelated migrations, and the crush weight seems to cause more 
> than the osd weight.  I step osd weight by 0.125, and crush weight by 0.05.
> 
> 
> On Tue, Nov 11, 2014 at 12:47 PM, Chad Seys  wrote:
> Find out which OSD it is:
> 
> ceph health detail
> 
> Squeeze blocks off the affected OSD:
> 
> ceph osd reweight OSDNUM 0.8
> 
> Repeat with any OSD which becomes toofull.
> 
> Your cluster is only about 50% used, so I think this will be enough.
> 
> Then when it finishes, allow data back on OSD:
> 
> ceph osd reweight OSDNUM 1
> 
> Hopefully ceph will someday be taught to move PGs in a better order!
> Chad.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread cwseys
0.5 might be too much.  All the PGs squeezed off of one OSD will need to 
be stored on another.  The fewer you move, the less likely a different 
OSD will become toofull.


Better to adjust in small increments as Craig suggested.

Chad.


Re: [ceph-users] Typical 10GbE latency

2014-11-11 Thread Robert LeBlanc
Is this with an 8192-byte payload? The theoretical transfer time at 1 Gbps (you
are only sending one packet, so LACP won't help) is 0.061 ms one direction;
double that and you are at 0.122 ms of bits in flight, then there is
context switching, switch latency (store and forward assumed for 1 Gbps),
etc., which I'm not sure would fit in the rest of the 0.057 ms of your min time.
If it is an 8192-byte payload, then I'm really impressed!
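
(For reference, the arithmetic behind that figure, treating "1 Gbps" as 2^30 bit/s; with a flat 10^9 bit/s it comes out nearer 0.066 ms:)

8192 bytes * 8 bits/byte   = 65,536 bits
65,536 bits / 2^30 bit/s  ~= 0.061 ms one way, so ~0.122 ms for both directions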

On Tue, Nov 11, 2014 at 11:56 AM, Alexandre DERUMIER 
wrote:

> Don't have 10GbE yet, but here is my result with simple lacp on 2 gigabit links
> with a cisco 6500
>
> rtt min/avg/max/mdev = 0.179/0.202/0.221/0.019 ms
>
>
> (Seems to be lower than your 10GbE Nexus)
>
>
> - Original Message -
>
> From: "Wido den Hollander" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, 10 November 2014 17:22:04
> Subject: Re: [ceph-users] Typical 10GbE latency
>
> On 08-11-14 02:42, Gary M wrote:
> > Wido,
> >
> > Take the switch out of the path between nodes and remeasure.. ICMP-echo
> > requests are very low priority traffic for switches and network stacks.
> >
>
> I tried with a direct TwinAx and fiber cable. No difference.
>
> > If you really want to know, place a network analyzer between the nodes
> > to measure the request packet to response packet latency.. The ICMP
> > traffic to the "ping application" is not accurate in the sub-millisecond
> > range. And should only be used as a rough estimate.
> >
>
> True, I fully agree with you. But, why is everybody showing a lower
> latency here? My latencies are about 40% higher than what I see in this
> setup and other setups.
>
> > You also may want to install the high resolution timer patch, sometimes
> > called HRT, to the kernel which may give you different results.
> >
> > ICMP traffic takes a different path than the TCP traffic and should not
> > be considered an indicator of defect.
> >
>
> Yes, I'm aware. But it still doesn't explain why the latency on other
> systems, which are in production, is lower than on this idle system.
>
> > I believe the ping app calls the sendto system call (sorry, it's been a
> > while since I last looked). System calls can take between .1us and .2us
> > each. However, the ping application makes several of these calls and
> > waits for a signal from the kernel. The wait for a signal means the ping
> > application must wait to be rescheduled to report the time. Rescheduling
> > will depend on a lot of other factors in the os. eg, timers, card
> > interrupts other tasks with higher priorities. Reporting the time must
> > add a few more systems calls for this to happen. As the ping application
> > loops to post the next ping request which again requires a few systems
> > calls which may cause a task switch while in each system call.
> >
> > For the above factors, the ping application is not a good representation
> > of network performance due to factors in the application and network
> > traffic shaping performed at the switch and the tcp stacks.
> >
>
> I think that netperf is probably a better tool, but that also does TCP
> latencies.
>
> I want the real IP latency, so I assumed that ICMP would be the most
> simple one.
>
> The other setups I have access to are in production and do not have any
> special tuning, yet their latency is still lower than on this new
> deployment.
>
> That's what gets me confused.
>
> Wido
>
> > cheers,
> > gary
> >
> >
> > On Fri, Nov 7, 2014 at 4:32 PM, Łukasz Jagiełło
> > <jagiello.luk...@gmail.com> wrote:
> >
> > Hi,
> >
> > rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms
> >
> > 04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit
> > SFI/SFP+ Network Connection (rev 01)
> >
> > at both hosts and Arista 7050S-64 between.
> >
> > Both hosts were part of active ceph cluster.
> >
> >
> > On Thu, Nov 6, 2014 at 5:18 AM, Wido den Hollander wrote:
> >
> > Hello,
> >
> > While working at a customer I've ran into a 10GbE latency which
> > seems
> > high to me.
> >
> > I have access to a couple of Ceph cluster and I ran a simple
> > ping test:
> >
> > $ ping -s 8192 -c 100 -n 
> >
> > Two results I got:
> >
> > rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
> > rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms
> >
> > Both these environment are running with Intel 82599ES 10Gbit
> > cards in
> > LACP. One with Extreme Networks switches, the other with Arista.
> >
> > Now, on a environment with Cisco Nexus 3000 and Nexus 7000
> > switches I'm
> > seeing:
> >
> > rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms
> >
> > As you can see, the Cisco Nexus network has high latency
> > compared to the
> > other setup.
> >
> > You would say the switches are to blame, but we also tried with
> > a direct
> > TwinAx connection, but that didn't help.
> >
> > This setup also uses the Intel 82599ES cards, so the cards don't
> > seem to
> > be the problem.
> >
> > The MTU is set to 9000 on all these networks and cards.
> >
> > I was wondering, others with a Ceph clust

Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread JIten Shah
I agree. This was just our brute-force method on our test cluster. We won't do 
this on the production cluster.

--Jiten

On Nov 11, 2014, at 2:11 PM, cwseys  wrote:

> 0.5 might be too much.  All the PGs squeezed off of one OSD will need to be 
> stored on another.  The fewer you move the less likely a different OSD will 
> become toofull.
> 
> Better to adjust in small increments as Craig suggested.
> 
> Chad.



Re: [ceph-users] Deep scrub, cache pools, replica 1

2014-11-11 Thread Christian Balzer
On Tue, 11 Nov 2014 10:21:49 -0800 Gregory Farnum wrote:

> On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer  wrote:
> >
> > Hello,
> >
> > One of my clusters has become busy enough (I'm looking at you, evil
> > Window VMs that I shall banish elsewhere soon) to experience client
> > noticeable performance impacts during deep scrub.
> > Before this I instructed all OSDs to deep scrub in parallel at Saturday
> > night and that finished before Sunday morning.
> > So for now I'll fire them off one by one to reduce the load.
> >
> > Looking forward, that cluster doesn't need more space so instead of
> > adding more hosts and OSDs I was thinking of a cache pool instead.
> >
> > I suppose that will keep the clients happy while the slow pool gets
> > scrubbed.
> > Is there anybody who tested cache pools with Firefly and compared the
> > performance to Giant?
> >
> > For testing I'm currently playing with a single storage node and 8 SSD
> > backed OSDs.
> > Now what very much blew my mind is that a pool with a replication of 1
> > still does quite the impressive read orgy, clearly reading all the
> > data in the PGs.
> > Why? And what is it comparing that data with, the cosmic background
> > radiation?
> 
> Yeah, cache pools currently do full-object promotions whenever an
> object is accessed. There are some ideas and projects to improve this
> or reduce its effects, but they're mostly just getting started.
Thanks for confirming that; so probably not much better than Firefly,
_aside_ from the fact that SSD pools should be quite a bit faster in and
of themselves in Giant.
Guess there is no other way to find out than to test things, I have a
feeling that determining the "hot" working set otherwise will be rather
difficult.
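
(For reference, a minimal sketch of the cache-tier setup such a test would involve, assuming a base pool "rbd" and an SSD-backed pool "rbd-cache" as placeholder names:)

ceph osd tier add rbd rbd-cache
ceph osd tier cache-mode rbd-cache writeback
ceph osd tier set-overlay rbd rbd-cache
ceph osd pool set rbd-cache hit_set_type bloom
ceph osd pool set rbd-cache hit_set_count 4
ceph osd pool set rbd-cache hit_set_period 3600
ceph osd pool set rbd-cache target_max_bytes 500000000000   # ~500 GB; size this to the SSDs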

> At least, I assume that's what you mean by a read orgy; perhaps you
> are seeing something else entirely?
> 
Indeed I did, this was just an observation that any pool with a replica of
1 will still read ALL the data during a deep-scrub. What good would that
do?

> Also, even on cache pools you don't really want to run with 1x
> replication as they hold the only copy of whatever data is dirty...
>
Oh, I agree, this is for testing only. 
Also a replica of 1 doesn't have to mean that the data is unsafe (the OSDs
could be RAIDed). Even so, in production the loss of a single node
shouldn't impact things. And once you go there, a replica of 2 comes
naturally.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] Federated gateways

2014-11-11 Thread Craig Lewis
>
> I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known
> issue with Apache 2.4 on the primary and replication.  It's fixed, just
> waiting for the next firefly release.  Although, that causes 40x errors
> with Apache 2.4, not 500 errors.
>
> It is apache 2.4, but I’m actually running 0.80.7 so I probably have that
> bug fix?
>
>
No, the unreleased 0.80.8 has the fix.



>
> Have you verified that both system users can read and write to both
> clusters?  (Just make sure you clean up the writes to the slave cluster).
>
> Yes I can write everywhere and radosgw-agent isn’t getting any 403s like
> it was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index
> pool is syncing properly, as are the users. It seems like really the only
> thing that isn’t syncing is the .zone.rgw.buckets pool.
>

That's pretty much the same behavior I was seeing with Apache 2.4.

Try downgrading the primary cluster to Apache 2.2.  In my testing, the
secondary cluster could run 2.2 or 2.4.


[ceph-users] v0.88 released

2014-11-11 Thread Sage Weil
This is the first development release after Giant.  The two main
features merged this round are the new AsyncMessenger (an alternative
implementation of the network layer) from Haomai Wang at UnitedStack,
and support for POSIX file locks in ceph-fuse and libcephfs from Yan,
Zheng.  There is also a big pile of smaller items that were merged while
we were stabilizing Giant, including a range of smaller performance
and bug fixes and some new tracepoints for LTTNG.

Notable Changes
---

* ceph-disk: Scientific Linux support (Dan van der Ster)
* ceph-disk: respect --statedir for keyring (Loic Dachary)
* ceph-fuse, libcephfs: POSIX file lock support (Yan, Zheng)
* ceph-fuse, libcephfs: fix cap flush overflow (Greg Farnum, Yan, Zheng)
* ceph-fuse, libcephfs: fix root inode xattrs (Yan, Zheng)
* ceph-fuse, libcephfs: preserve dir ordering (#9178 Yan, Zheng)
* ceph-fuse, libcephfs: trim inodes before reconnecting to MDS (Yan, 
  Zheng)
* ceph: do not parse injectargs twice (Loic Dachary)
* ceph: make 'ceph -s' output more readable (Sage Weil)
* ceph: new 'ceph tell mds.$name_or_rank_or_gid' (John Spray)
* ceph: test robustness (Joao Eduardo Luis)
* ceph_objectstore_tool: behave with sharded flag (#9661 David Zafman)
* cephfs-journal-tool: fix journal import (#10025 John Spray)
* cephfs-journal-tool: skip up to expire_pos (#9977 John Spray)
* cleanup rados.h definitions with macros (Ilya Dryomov)
* common: shared_cache unit tests (Cheng Cheng)
* config: add $cctid meta variable (Adam Crume)
* crush: fix buffer overrun for poorly formed rules (#9492 Johnu George)
* crush: improve constness (Loic Dachary)
* crushtool: add --location  command (Sage Weil, Loic Dachary)
* default to libnss instead of crypto++ (Federico Gimenez)
* doc: ceph osd reweight vs crush weight (Laurent Guerby)
* doc: document the LRC per-layer plugin configuration (Yuan Zhou)
* doc: erasure code doc updates (Loic Dachary)
* doc: misc updates (Alfredo Deza, VRan Liu)
* doc: preflight doc fixes (John Wilkins)
* doc: update PG count guide (Gerben Meijer, Laurent Guerby, Loic Dachary)
* keyvaluestore: misc fixes (Haomai Wang)
* keyvaluestore: performance improvements (Haomai Wang)
* librados: add rados_pool_get_base_tier() call (Adam Crume)
* librados: cap buffer length (Loic Dachary)
* librados: fix objecter races (#9617 Josh Durgin)
* libradosstriper: misc fixes (Sebastien Ponce)
* librbd: add missing python docstrings (Jason Dillaman)
* librbd: add readahead (Adam Crume)
* librbd: fix cache tiers in list_children and snap_unprotect (Adam Crume)
* librbd: fix performance regression in ObjectCacher (#9513 Adam Crume)
* librbd: lttng tracepoints (Adam Crume)
* librbd: misc fixes (Xinxin Shu, Jason Dillaman)
* mds: fix sessionmap lifecycle bugs (Yan, Zheng)
* mds: initialize root inode xattr version (Yan, Zheng)
* mds: introduce auth caps (John Spray)
* mds: misc bugs (Greg Farnum, John Spray, Yan, Zheng, Henry Change)
* misc coverity fixes (Danny Al-Gaaf)
* mon: add 'ceph osd rename-bucket ...' command (Loic Dachary)
* mon: clean up auth list output (Loic Dachary)
* mon: fix 'osd crush link' id resolution (John Spray)
* mon: fix misc error paths (Joao Eduardo Luis)
* mon: fix paxos off-by-one corner case (#9301 Sage Weil)
* mon: new 'ceph pool ls [detail]' command (Sage Weil)
* mon: wait for writeable before cross-proposing (#9794 Joao Eduardo Luis)
* msgr: avoid useless new/delete (Haomai Wang)
* msgr: fix delay injection bug (#9910 Sage Weil, Greg Farnum)
* msgr: new AsyncMessenger alternative implementation (Haomai Wang)
* msgr: prefetch data when doing recv (Yehuda Sadeh)
* osd: add erasure code corpus (Loic Dachary)
* osd: add misc tests (Loic Dachary, Danny Al-Gaaf)
* osd: cleanup boost optionals (William Kennington)
* osd: expose non-journal backends via ceph-osd CLI (Haomai Wang)
* osd: fix JSON output for stray OSDs (Loic Dachary)
* osd: fix ioprio options (Loic Dachary)
* osd: fix transaction accounting (Jianpeng Ma)
* osd: misc optimizations (Xinxin Shu, Zhiqiang Wang, Xinze Chi)
* osd: use FIEMAP_FLAGS_SYNC instead of fsync (Jianpeng Ma)
* rados: fix put of /dev/null (Loic Dachary)
* rados: parse command-line arguments more strictly (#8983 Adam Crume)
* rbd-fuse: fix memory leak (Adam Crume)
* rbd-replay-many (Adam Crume)
* rbd-replay: --anonymize flag to rbd-replay-prep (Adam Crume)
* rbd: fix 'rbd diff' for non-existent objects (Adam Crume)
* rbd: fix error when striping with format 1 (Sebastien Han)
* rbd: fix export for image sizes over 2GB (Vicente Cheng)
* rbd: use rolling average for rbd bench-write throughput (Jason Dillaman)
* rgw: send explicit HTTP status string (Yehuda Sadeh)
* rgw: set length for keystone token validation request (#7796 Yehuda 
  Sadeh, Mark Kirkwood)
* udev: fix rules for CentOS7/RHEL7 (Loic Dachary)
* use clock_gettime instead of gettimeofday (Jianpeng Ma)
* vstart.sh: set up environment for s3-tests (Luis Pabon)

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http

[ceph-users] Log reading/how do I tell what an OSD is trying to connect to

2014-11-11 Thread Scott Laird
I'm having a problem with my cluster.  It's running 0.87 right now, but I
saw the same behavior with 0.80.5 and 0.80.7.

The problem is that my logs are filling up with "replacing existing (lossy)
channel" log lines (see below), to the point where I'm filling drives to
100% almost daily just with logs.

It doesn't appear to be network related, because it happens even when
talking to other OSDs on the same host.  The logs pretty much all point to
port 0 on the remote end.  Is this an indicator that it's failing to
resolve port numbers somehow, or is this normal at this point in connection
setup?

The systems that are causing this problem are somewhat unusual; they're
running OSDs in Docker containers, but they *should* be configured to run
as root and have full access to the host's network stack.  They manage to
work, mostly, but things are still really flaky.

Also, is there documentation on what the various fields mean, short of
digging through the source?  And how does Ceph resolve OSD numbers into
host/port addresses?


2014-11-12 01:50:40.802604 7f7828db8700  0 -- 10.2.0.36:6819/1 >>
10.2.0.36:0/1 pipe(0x1ce31c80 sd=135 :6819 s=0 pgs=0 cs=0 l=1
c=0x1e070580).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.802708 7f7816538700  0 -- 10.2.0.36:6830/1 >>
10.2.0.36:0/1 pipe(0x1ff61080 sd=120 :6830 s=0 pgs=0 cs=0 l=1
c=0x1f3db2e0).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.803346 7f781ba8d700  0 -- 10.2.0.36:6819/1 >>
10.2.0.36:0/1 pipe(0x1ce31180 sd=125 :6819 s=0 pgs=0 cs=0 l=1
c=0x1e070420).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.803944 7f781996c700  0 -- 10.2.0.36:6830/1 >>
10.2.0.36:0/1 pipe(0x1ff618c0 sd=107 :6830 s=0 pgs=0 cs=0 l=1
c=0x1f3d8420).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.804185 7f7816538700  0 -- 10.2.0.36:6819/1 >>
10.2.0.36:0/1 pipe(0x1ffd1e40 sd=20 :6819 s=0 pgs=0 cs=0 l=1
c=0x1e070840).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.805235 7f7813407700  0 -- 10.2.0.36:6819/1 >>
10.2.0.36:0/1 pipe(0x1ffd1340 sd=60 :6819 s=0 pgs=0 cs=0 l=1
c=0x1b2d6260).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.806364 7f781bc8f700  0 -- 10.2.0.36:6819/1 >>
10.2.0.36:0/1 pipe(0x1ffd0b00 sd=162 :6819 s=0 pgs=0 cs=0 l=1
c=0x675c580).accept replacing existing (lossy) channel (new one lossy=1)

2014-11-12 01:50:40.806425 7f781aa7d700  0 -- 10.2.0.36:6830/1 >>
10.2.0.36:0/1 pipe(0x1db29600 sd=143 :6830 s=0 pgs=0 cs=0 l=1
c=0x1f3d9600).accept replacing existing (lossy) channel (new one lossy=1)
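
(On the question of how OSD numbers map to host/port addresses: the OSDMap holds that, so something like the following should show what a given OSD registered; osd.36 is just a placeholder:)

ceph osd find 36                  # host, IP:port and crush location for osd.36
ceph osd dump | grep '^osd.36 '   # the same addresses straight out of the osdmap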


Re: [ceph-users] Triggering shallow scrub on OSD where scrub is already in progress

2014-11-11 Thread Mallikarjun Biradar
Hi Greg,

I am using 0.86

I am referring to the osd logs to check the scrub behaviour. Please have a look
at this snippet from the osd log:
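
(For reference, the scrub was triggered from the CLI, presumably with something like the following, osd.10 being the OSD in question:)

ceph osd scrub 10     # shallow scrub of the PGs for which osd.10 is primary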

##Triggered scrub on osd.10--->
2014-11-12 16:24:21.393135 7f5026f31700  0 log_channel(default) log [INF] :
0.4 scrub ok
2014-11-12 16:24:24.393586 7f5026f31700  0 log_channel(default) log [INF] :
0.20 scrub ok
2014-11-12 16:24:30.393989 7f5026f31700  0 log_channel(default) log [INF] :
0.21 scrub ok
2014-11-12 16:24:33.394764 7f5026f31700  0 log_channel(default) log [INF] :
0.23 scrub ok
2014-11-12 16:24:34.395293 7f5026f31700  0 log_channel(default) log [INF] :
0.36 scrub ok
2014-11-12 16:24:35.941704 7f5026f31700  0 log_channel(default) log [INF] :
1.1 scrub ok
2014-11-12 16:24:39.533780 7f5026f31700  0 log_channel(default) log [INF] :
1.d scrub ok
2014-11-12 16:24:41.811185 7f5026f31700  0 log_channel(default) log [INF] :
1.44 scrub ok
2014-11-12 16:24:54.257384 7f5026f31700  0 log_channel(default) log [INF] :
1.5b scrub ok
2014-11-12 16:25:02.973101 7f5026f31700  0 log_channel(default) log [INF] :
1.67 scrub ok
2014-11-12 16:25:17.597546 7f5026f31700  0 log_channel(default) log [INF] :
1.6b scrub ok
##Previous scrub is still in progress, triggered scrub on osd.10 again-->
CEPH re-started scrub operation
2014-11-12 16:25:19.394029 7f5026f31700  0 log_channel(default) log [INF]
: 0.4 scrub ok
2014-11-12 16:25:22.402630 7f5026f31700  0 log_channel(default) log [INF] :
0.20 scrub ok
2014-11-12 16:25:24.695565 7f5026f31700  0 log_channel(default) log [INF] :
0.21 scrub ok
2014-11-12 16:25:25.408821 7f5026f31700  0 log_channel(default) log [INF] :
0.23 scrub ok
2014-11-12 16:25:29.467527 7f5026f31700  0 log_channel(default) log [INF] :
0.36 scrub ok
2014-11-12 16:25:32.558838 7f5026f31700  0 log_channel(default) log [INF] :
1.1 scrub ok
2014-11-12 16:25:35.763056 7f5026f31700  0 log_channel(default) log [INF] :
1.d scrub ok
2014-11-12 16:25:38.166853 7f5026f31700  0 log_channel(default) log [INF] :
1.44 scrub ok
2014-11-12 16:25:40.602758 7f5026f31700  0 log_channel(default) log [INF] :
1.5b scrub ok
2014-11-12 16:25:42.169788 7f5026f31700  0 log_channel(default) log [INF] :
1.67 scrub ok
2014-11-12 16:25:45.851419 7f5026f31700  0 log_channel(default) log [INF] :
1.6b scrub ok
2014-11-12 16:25:51.259453 7f5026f31700  0 log_channel(default) log [INF] :
1.a8 scrub ok
2014-11-12 16:25:53.012220 7f5026f31700  0 log_channel(default) log [INF] :
1.a9 scrub ok
2014-11-12 16:25:54.009265 7f5026f31700  0 log_channel(default) log [INF] :
1.cb scrub ok
2014-11-12 16:25:56.516569 7f5026f31700  0 log_channel(default) log [INF] :
1.e2 scrub ok


 -Thanks & regards,
Mallikarjun Biradar

On Tue, Nov 11, 2014 at 12:18 PM, Gregory Farnum  wrote:

> On Sun, Nov 9, 2014 at 9:29 PM, Mallikarjun Biradar
>  wrote:
> > Hi all,
> >
> > Triggering shallow scrub on OSD where scrub is already in progress,
> restarts
> > scrub from beginning on that OSD.
> >
> >
> > Steps:
> > Triggered shallow scrub on an OSD (Cluster is running heavy IO)
> > While scrub is in progress, triggered shallow scrub again on that OSD.
> >
> > Observed behavior, is scrub restarted from beginning on that OSD.
> >
> > Please let me know, whether its expected behaviour?
>
> What version of Ceph are you seeing this on? How are you identifying
> that scrub is restarting from the beginning? It sounds sort of
> familiar to me, but I thought this was fixed so it was a no-op if you
> issue another scrub. (That's not authoritative though; I might just be
> missing a reason we want to restart it.)
> -Greg
>


[ceph-users] rados mkpool fails, but not ceph osd pool create

2014-11-11 Thread Gauvain Pocentek

Hi all,

I'm facing a problem on a ceph deployment. rados mkpool always fails:

# rados -n client.admin mkpool test
error creating pool test: (2) No such file or directory

The rados lspools and rmpool commands work just fine, and the following also 
works:


# ceph osd pool create test 128 128
pool 'test' created

I've enabled rados debugging but it really didn't help much. Should I look 
at the mon or osd debug logs?


Any idea about what could be happening?
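
(A hedged way to dig further from the client side; the debug flags are the standard global options, and since rados mkpool can also take an optional auid and crush rule argument it may be worth checking that the default rule exists:)

rados -n client.admin --debug-ms 1 --debug-rados 20 mkpool test 2>&1 | tail -n 40
ceph osd crush rule dump      # confirm the rule a new pool would use is actually there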

Thanks,
Gauvain Pocentek

Objectif Libre - Infrastructure et Formations Linux
http://www.objectif-libre.com