[ceph-users] How to configure placement_targets?

2016-01-11 Thread Yang Honggang


The parameter passed to create_bucket was wrong. The right way is:

# Create bucket 'mmm-1' in placement target 'fast-placement'
# 'bj' is my region name, 'fast-placement' is my placement target name.
bucket = conn.create_bucket('mmm-1', location='bj:fast-placement')
# Create bucket 'mmm-2' in placement target 'default-placement'
bucket = conn.create_bucket('mmm-2', location='bj:default-placement')
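
To double-check where a bucket actually landed, something like the
following should work (a rough sketch; --name/--cluster match my setup
below, and the exact output fields may vary by radosgw version):

# show the bucket's stats; the reported pool should be the data_pool of
# the chosen placement target (e.g. .bj-dz.rgw.buckets.hot)
radosgw-admin bucket stats --bucket=mmm-1 --name client.radosgw.bj-dz-1 --cluster gang2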

thx

joseph

On 01/07/2016 09:04 PM, Yang Honggang wrote:

Hello,
How to configure placement_targets?
Which step is wrong in my following steps?
I want to use different pools to hold users' buckets. Two pools are created:
one is '.bj-dz.rgw.buckets', the other is '.bj-dz.rgw.buckets.hot'.

1. Two placement targets are added to the region map. Their tags are
'hdd' and 'ssd'.


-- begin --

    "placement_targets": [
        {
            "name": "default-placement",
            "tags": [
                "hdd"
            ]
        },
        {
            "name": "fast-placement",
            "tags": [
                "ssd"
            ]
        }
    ],
    "default_placement": "default-placement"
}

-- end --
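
For reference, this is roughly how I applied the region edit (a sketch;
the options match my setup and the exact workflow may differ in yours):

# dump the current region, add the 'fast-placement' target to
# placement_targets in the JSON, then load it back and refresh the map
radosgw-admin region get --name client.radosgw.bj-dz-1 --cluster gang2 > region.json
# ... edit region.json ...
radosgw-admin region set --infile region.json --name client.radosgw.bj-dz-1 --cluster gang2
radosgw-admin regionmap update --name client.radosgw.bj-dz-1 --cluster gang2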

2. Map the placement targets to the ceph backend pools.

-- begin --

    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": ".bj-dz.rgw.buckets.index",
                "data_pool": ".bj-dz.rgw.buckets",
                "data_extra_pool": ""
            }
        },
        {
            "key": "fast-placement",
            "val": {
                "index_pool": ".bj-dz.rgw.buckets.index",
                "data_pool": ".bj-dz.rgw.buckets.hot",
                "data_extra_pool": ""
            }
        }
    ]

-- end --
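
The zone edit was applied the same way (again a sketch; you may need to
pass --rgw-zone=bj-dz explicitly depending on your defaults):

# dump the zone, edit placement_pools so each target maps to its
# index/data pools, then load it back
radosgw-admin zone get --name client.radosgw.bj-dz-1 --cluster gang2 > zone.json
# ... edit zone.json ...
radosgw-admin zone set --infile zone.json --name client.radosgw.bj-dz-1 --cluster gang2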

3. In order to enable the user to create buckets in these two
   placement targets/Ceph pools, I add two placement_tags to my user.

-- begin --

    "keys": [
        {
            "user": "testuser",
            "access_key": "6R2MJWR863EREUDD0KTZ",
            "secret_key": "74eHNNQa1oLBlvZfO2CC2hIU8cobSYxTgeRDtXtH"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "default-placement",
    "placement_tags": [
        "ssd",
        "hdd"
    ],

-- end --
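
One way to add the placement_tags is to edit the user's metadata directly
(a sketch; there may be a cleaner way in newer radosgw-admin versions):

# dump the user record, add "ssd" and "hdd" to placement_tags, put it back
radosgw-admin metadata get user:testuser --name client.radosgw.bj-dz-1 --cluster gang2 > user.json
# ... edit user.json ...
radosgw-admin metadata put user:testuser --name client.radosgw.bj-dz-1 --cluster gang2 < user.json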

4. Test if it works.

- Test if we can create a bucket 'y-ssd' in placement target
'fast-placement' / pool '.bj-dz.rgw.buckets.hot'.

-- begin --

bucket = conn.create_bucket('y-ssd', location='fast-placement')

-- end --

- Test if we can create a bucket 'y-normal' in placement target
'default-placement' / pool '.bj-dz.rgw.buckets'.

-- begin --

bucket = conn.create_bucket('y-normal', location='default-placement')

-- end --

Result: both buckets are created in pool '.bj-dz.rgw.buckets'.


5. In order to test if my 'fast-placement' target works, I change the user's
   'default_placement' to 'fast-placement' and repeat the test in step 4.

Result: both buckets are created in pool '.bj-dz.rgw.buckets.hot'.


6. Conclusion

   The user's default placement setting works fine, but the placement
tags seem not to work.

   Why?

   I don't know if this is the right way to use different placement targets.


thx

joseph

-- Complete region map / zone map / user info --


-- Region map --

# radosgw-admin regionmap get --name client.radosgw.bj-dz-1 --cluster gang2

{
    "regions": [
        {
            "key": "bj",
            "val": {
                "name": "bj",
                "api_name": "bj",
                "is_master": "true",
                "endpoints": [
                    "http:\/\/s3.mydomain.com:80\/"
                ],
                "hostnames": [],
                "master_zone": "bj-dz",
                "zones": [
                    {
                        "name": "bj-dz",
                        "endpoints": [
                            "http:\/\/s3.mydomain.com:80\/"
                        ],
                        "log_meta": "true",
                        "log_data":

Re: [ceph-users] krdb vDisk best practice ?

2016-01-11 Thread Wolf F.
Just in case anyone comes up with the same question in the future:

I ran the following test case:

3 identical Debian VMs. 4GB RAM, 4 vCores. Virtio for vDisks, all on the same
pool. vDisks mounted at /home/test.

1x 120GB 
12x 10GB JBOD via LVM
12x 10GB Raid 0

Then, separately, I wrote 100GB of data using dd to /home/test/testfile.
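
For reference, the write was along these lines (a sketch; I did not tune
block size or flags scientifically):

# write ~100GB of zeros to the mounted vDisk and flush it at the end
dd if=/dev/zero of=/home/test/testfile bs=1M count=102400 conv=fdatasync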

All 3 benchmarks had statistically the same write speeds. However, CPU
consumption was about 5% higher with LVM and about 35% higher when using
mdadm RAID-0 across the 12 vDisks.



Question: Does Ceph have an upper limit on how big I can make virtio-based
vDisks on an EC pool?


- Original Message -
> From: "Wolf F." 
> To: ceph-users@lists.ceph.com
> Sent: Saturday, January 2, 2016 9:21:46 PM
> Subject: krdb vDisk best practice ?

> Running a single node Proxmox "cluster", with Ceph on top. 1 Mon. Same node.
> I have 24 HDD (no dedicated journal) and 8 SSD split via "custom crush 
> location
> hook".
> Cache-Tier (SSD-OSD) for a EC-pool (HDD-OSD) providing access for proxmox via
> krdb.
> 15 TB Capacity (Assortment of Disk sizes/speeds). Vdisks are Virtio and XFS.
> OSD's are XFS as well.
> 
> While setting up a virtual OpenmediaVault (VM) the following Question arose
> regarding vDisks (virtio) and their best practice.
> 
> 
> Q1: How does the amount and size of vDisks affect Write/Read performance? Do i
> bottleneck myself with overhead (single Mon)? Or does it maybe not matter at
> all?
> 
> Values are academic examples.
> 120x 100GB vDisks - In OMV as Raid0
> 120x 100GB vDisks - In OMV as JBOD
> 
> 12x 1TB vDisk - In OMV as Raid0
> 12x 1TB vDisk - In OMV as JBOD
> 
> 2x 6TB vDisk - In OMV as Raid0
> 2x 6TB vDisk - In OMV as JBOD
> 
> Q2: How does this best practice change if i add 2 more nodes (same config) and
> by implication 2 more mons?
> 
> 
> Not been able to find much on this topic.
> 
> kind regards,
> Wolf F.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] double rebalance when removing osd

2016-01-11 Thread Henrik Korkuc

On 16-01-11 04:10, Rafael Lopez wrote:

Thanks for the replies guys.

@Steve, even when you remove due to failing, have you noticed that the 
cluster rebalances twice using the documented steps? You may not if 
you don't wait for the initial recovery after 'ceph osd out'. If you 
do 'ceph osd out' and immediately 'ceph osd crush remove', RH support 
has told me that this effectively 'cancels' the original move 
triggered from 'ceph osd out' and starts permanently remapping... 
which still doesn't really explain why we have to do the ceph osd out 
in the first place..


It needs to be tested, but I think it may not allow a crush remove 
before doing osd out (i.e. you shouldn't be removing osds from the crush 
map while they are still in the cluster). At least that was the case with 
up OSDs when I was doing some testing.


@Dan, good to hear it works, I will try that method next time and see 
how it goes!



On 8 January 2016 at 03:08, Steve Taylor <steve.tay...@storagecraft.com> wrote:


If I’m not mistaken, marking an osd out will remap its placement
groups temporarily, while removing it from the crush map will
remap the placement groups permanently. Additionally, other
placement groups from other osds could get remapped permanently
when an osd is removed from the crush map. I would think the only
benefit to marking an osd out before stopping it would be a
cleaner redirection of client I/O before the osd disappears, which
may be worthwhile if you’re removing a healthy osd.

As for reweighting to 0 prior to removing an osd, it seems like
that would give the osd the ability to participate in the recovery
essentially in read-only fashion (plus deletes) until it’s empty,
so objects wouldn’t become degraded as placement groups are
backfilling onto other osds. Again, this would really only be
useful if you’re removing a healthy osd. If you’re removing an osd
where other osds in different failure domains are known to be
unhealthy, it seems like this would be a really good idea.

I usually follow the documented steps you’ve outlined myself, but
I’m typically removing osds due to failed/failing drives while the
rest of the cluster is healthy.



Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705




From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Rafael Lopez
Sent: Wednesday, January 06, 2016 4:53 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] double rebalance when removing osd

Hi all,

I am curious what practices other people follow when removing OSDs
from a cluster. According to the docs, you are supposed to (see the
command sketch after this list):

1. ceph osd out

2. stop daemon

3. ceph osd crush remove

4. ceph auth del

5. ceph osd rm
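
As a concrete sketch (osd.X is a placeholder id, and the stop command
depends on your init system):

ceph osd out osd.X
service ceph stop osd.X        # or: systemctl stop ceph-osd@X
ceph osd crush remove osd.X
ceph auth del osd.X
ceph osd rm osd.X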

What value does ceph osd out (1) add to the removal process, and
why is it in the docs? We have found (as have others) that by
outing (1) and then crush removing (3), the cluster has to do two
recoveries. Is it necessary? Can you just do a crush remove
without step 1?

I found this earlier message from GregF in which he seems to affirm
that just doing the crush remove is fine:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-January/007227.html

This recent blog post from Sebastien suggests reweighting to
0 first, but I haven't tested it:

http://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/

I thought that marking it out sets the reweight to 0
anyway, so I'm not sure how this would make a difference in terms of
two rebalances, but maybe there is a subtle difference?

Thanks,

Raf

-- 


Senior Storage Engineer - Automation and Delivery
Infrastructure Services - eSolutions




--
Senior Storage Engineer - Automation and Delivery
Infrastructure Services - eSolutions
738 Blackburn Rd, Clayton
Monash University 3800
Telephone: +61 3 9905 9118 
Mobile:   +61 4 27 682 670
Email rafael.lo...@monash.edu 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.c

Re: [ceph-users] double rebalance when removing osd

2016-01-11 Thread Andy Allan
On 11 January 2016 at 02:10, Rafael Lopez  wrote:

> @Steve, even when you remove due to failing, have you noticed that the 
> cluster rebalances twice using the documented steps? You may not if you don't 
> wait for the initial recovery after 'ceph osd out'. If you do 'ceph osd out' 
> and immediately 'ceph osd crush remove', RH support has told me that this 
> effectively 'cancels' the original move triggered from 'ceph osd out' and 
> starts permanently remapping... which still doesn't really explain why we 
> have to do the ceph osd out in the first place..

This topic was last discussed in December - the documentation for
removing an OSD from the cluster is not helpful. Unfortunately it
doesn't look like anyone is going to fix the documentation.

http://comments.gmane.org/gmane.comp.file-systems.ceph.user/25627

Basically, when you want to remove an OSD, there's an alternative
sequence of commands that avoids the double-rebalance.

The better approach is to reweight the OSD to zero first, then wait
for the (one and only) rebalance, then mark out and remove. Here's
more details from the previous thread:

http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/25629
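
As a sketch, the sequence looks roughly like this (osd.X is a placeholder
and the stop command depends on your init system):

# drain the OSD first: this is the one and only rebalance
ceph osd crush reweight osd.X 0
# wait for recovery to finish, then remove it
ceph osd out osd.X
service ceph stop osd.X        # or: systemctl stop ceph-osd@X
ceph osd crush remove osd.X
ceph auth del osd.X
ceph osd rm osd.X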

Thanks,
Andy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] using cache-tier with writeback mode, raods bench result degrade

2016-01-11 Thread Nick Fisk
Looks like it has been done

https://github.com/zhouyuan/ceph/commit/f352b8b908e8788d053cbe15fa3632b226a6758d
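
For anyone following along, the settings being discussed are the per-pool
recency values on the cache pool. A rough sketch, using the pool name from
the test further down this thread (note that min_write_recency_for_promote
only exists on newer releases):

# require an object to appear in N recent HitSets before it gets promoted
ceph osd pool set hotstorage min_read_recency_for_promote 2
ceph osd pool set hotstorage min_write_recency_for_promote 2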


> -Original Message-
> From: Robert LeBlanc [mailto:rob...@leblancnet.us]
> Sent: 08 January 2016 18:23
> To: Nick Fisk 
> Cc: Wade Holler ; hnuzhoulin
> ; Ceph-User 
> Subject: Re: [ceph-users] using cache-tier with writeback mode, raods bench
> result degrade
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
> 
> Are you backporting that to hammer? We'd love it.
> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Fri, Jan 8, 2016 at 9:28 AM, Nick Fisk  wrote:
> > There was/is a bug in Infernalis and older, where objects will always get
> promoted on the 2nd read/write regardless of what you set the
> min_recency_promote settings to. This can have a dramatic effect on
> performance. I wonder if this is what you are experiencing?
> >
> > This has been fixed in Jewel https://github.com/ceph/ceph/pull/6702 .
> >
> > You can compile the changes above to see if it helps or I have a .deb for
> Infernalis where this is fixed if it's easier.
> >
> > Nick
> >
> >> -Original Message-
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> >> Of Wade Holler
> >> Sent: 08 January 2016 16:14
> >> To: hnuzhoulin ; ceph-de...@vger.kernel.org
> >> Cc: ceph-us...@ceph.com
> >> Subject: Re: [ceph-users] using cache-tier with writeback mode, raods
> >> bench result degrade
> >>
> >> My experience is performance degrades dramatically when dirty objects
> >> are flushed.
> >>
> >> Best Regards,
> >> Wade
> >>
> >>
> >> On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin  wrote:
> >> Hi,guyes
> >> Recentlly,I am testing  cache-tier using writeback mode.but I found a
> >> strange things.
> >> the performance  using rados bench degrade.Is it correct?
> >> If so,how to explain.following some info about my test:
> >>
> >> storage node:4 machine,two INTEL SSDSC2BB120G4(one for systaem,the
> >> other one used as OSD),four sata as OSD.
> >>
> >> before using cache-tier:
> >> root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup
> >> 
> >> Total time run: 301.236355
> >> Total writes made:  6041
> >> Write size: 4194304
> >> Bandwidth (MB/sec): 80.216
> >>
> >> Stddev Bandwidth:   10.5358
> >> Max bandwidth (MB/sec): 104
> >> Min bandwidth (MB/sec): 0
> >> Average Latency:0.797838
> >> Stddev Latency: 0.619098
> >> Max latency:4.89823
> >> Min latency:0.158543
> >>
> >> root@ceph1:/root/cluster# rados bench -p coldstorage  300 seq
> >> Total time run:133.563980
> >> Total reads made: 6041
> >> Read size:4194304
> >> Bandwidth (MB/sec):180.917
> >>
> >> Average Latency:   0.353559
> >> Max latency:   1.83356
> >> Min latency:   0.027878
> >>
> >> after configure cache-tier:
> >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add
> coldstorage
> >> hotstorage pool 'hotstorage' is now (or already was) a tier of
> >> 'coldstorage'
> >>
> >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode
> >> hotstorage writeback set cache-mode for pool 'hotstorage' to
> >> writeback
> >>
> >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay
> >> coldstorage hotstorage overlay for 'coldstorage' is now (or already
> >> was) 'hotstorage'
> >>
> >> oot@ubuntu:~# ceph osd dump|grep storage pool 6 'coldstorage'
> >> replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins
> >> pg_num 512 pgp_num 512 last_change 216 lfor 216 flags hashpspool
> >> tiers 7 read_tier 7 write_tier 7 stripe_width 0 pool 7 'hotstorage'
> >> replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins
> >> pg_num 128 pgp_num 128 last_change 228 flags
> >> hashpspool,incomplete_clones tier_of 6 cache_mode writeback
> >> target_bytes
> >> 1000 hit_set bloom{false_positive_probability: 0.05, target_size:
> >> 0, seed: 0} 3600s x6 stripe_width 0
> >> -
> >> rados bench -p coldstorage 300 write --no-cleanup Total time run:
> >> 302.207573 Total writes made: 4315 Write size: 4194304 Bandwidth
> >> (MB/sec): 57.113
> >>
> >> Stddev Bandwidth: 23.9375
> >> Max bandwidth (MB/sec): 104
> >> Min bandwidth (MB/sec): 0
> >> Average Latency: 1.1204
> >> Stddev Latency: 0.717092
> >> Max latency: 6.97288
> >> Min latency: 0.158371
> >>
> >> root@ubuntu:/# rados bench -p coldstorage 300 seq Total time run:
> >> 153.869741 Total reads made: 4315 Read size: 4194304 Bandwidth
> >> (MB/sec): 112.173
> >>
> >> Average Latency: 0.570487
> >> Max latency: 1.75137
> >> Min latency: 0.039635
> >>
> >>
> >> ceph.conf:
> >> 
> >> [global]
> >> fsid = 4ec1eb64-226c-4d90-8c5c-b6b6644be831
> >> mon_initial_members = ceph2, ceph3, ceph4 mon_host =
> >> 1

Re: [ceph-users] Infernalis upgrade breaks when journal on separate partition

2016-01-11 Thread Stillwell, Bryan
On 1/10/16, 2:26 PM, "ceph-users on behalf of Stuart Longland"
 wrote:

>On 05/01/16 07:52, Stuart Longland wrote:
>>> I ran into this same issue, and found that a reboot ended up setting
>>>the
>>> > ownership correctly.  If you look at
>>>/lib/udev/rules.d/95-ceph-osd.rules
>>> > you'll see the magic that makes it happen
>> Ahh okay, good-o, so a reboot should be fine.  I guess adding chown-ing
>> of journal files would be a good idea (maybe it's version specific, but
>> chown -R did not follow the symlink and change ownership for me).
>
>Well, it seems I spoke to soon.  Not sure what logic the udev rules use
>to identify ceph journals, but it doesn't seem to pick up on the
>journals in our case as after a reboot, those partitions are owned by
>root:disk with permissions 0660.

This is handled by the UUIDs of the GPT partitions, and since you're using
MS-DOS partition tables it won't work correctly.  I would recommend
switching to GPT partition tables if you can.
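
Once the disks are GPT, the journal partitions can be tagged with the type
GUID that the udev rule matches on. A sketch (partition number and device
are placeholders; double-check the GUID against your
/lib/udev/rules.d/95-ceph-osd.rules):

# mark partition 2 of /dev/sdX as a Ceph journal
sgdisk --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdX
# re-run the udev rules and verify the journal device ownership
udevadm trigger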

Bryan




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] double rebalance when removing osd

2016-01-11 Thread Steve Taylor
Rafael,

Yes, the cluster still rebalances twice when removing a failed osd. An osd that 
is marked out for any reason but still exists in the crush map gets its 
placement groups remapped to different osds until it comes back in, at which 
point those pgs are remapped back. When an osd is removed from the crush map, 
its pgs get mapped to new osds permanently. The mappings may be completely 
different for these two cases, which is why you get double rebalancing even 
when those two operations happen without the osd coming back in in between.

In the case of a failed osd, I usually don't worry about it and just follow the 
documented steps because I'm marking an osd out and then removing it from the 
crush map immediately, so the first rebalance does almost nothing by the time 
the second overrides it, which matches what you were told by support. If this 
is a problem for you or if you're removing an osd that's still functional to 
some degree, then reweighting to 0, waiting for the single rebalance, then 
following the removal steps is probably your best bet.


Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705


-Original Message-
From: Andy Allan [mailto:gravityst...@gmail.com] 
Sent: Monday, January 11, 2016 4:09 AM
To: Rafael Lopez 
Cc: Steve Taylor ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] double rebalance when removing osd

On 11 January 2016 at 02:10, Rafael Lopez  wrote:

> @Steve, even when you remove due to failing, have you noticed that the 
> cluster rebalances twice using the documented steps? You may not if you don't 
> wait for the initial recovery after 'ceph osd out'. If you do 'ceph osd out' 
> and immediately 'ceph osd crush remove', RH support has told me that this 
> effectively 'cancels' the original move triggered from 'ceph osd out' and 
> starts permanently remapping... which still doesn't really explain why we 
> have to do the ceph osd out in the first place..

This topic was last discussed in December - the documentation for removing an 
OSD from the cluster is not helpful. Unfortunately it doesn't look like anyone 
is going to fix the documentation.

http://comments.gmane.org/gmane.comp.file-systems.ceph.user/25627

Basically, when you want to remove an OSD, there's an alternative sequence of 
commands that avoids the double-rebalance.

The better approach is to reweight the OSD to zero first, then wait for the 
(one and only) rebalance, then mark out and remove. Here's more details from 
the previous thread:

http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/25629

Thanks,
Andy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] where is the fsid field coming from in ceph -s ?

2016-01-11 Thread Gregory Farnum
On Sat, Jan 9, 2016 at 1:58 AM, Oliver Dzombic  wrote:
> Hi,
>
> fighting to add a new mon it somehow happend by mistake, that a new
> cluster id got generated.
>
> So the output of "ceph -s" show a new cluster id.
>
> But the osd/mon are still running on the old cluster id.
>
> Changing the osd/mon to the new cluster id makes them got refused by the
> cluster.
>
> Even restarting the cluster with the old cluster id does not help.
>
> "ceph -s" is still showing the new cluster id.
>
> So where is this cluster id exactly coming from ?
>
> I could not find it up until now, where this new cluster id is
> mentioned, that we can change it back to the old one.
>
> Thank you !

FSIDs are stored in the osd and monitor data stores. If you generated
a new mon with a different one, the best thing to do is just destroy
it and then create a new monitor in the cluster properly, as you
presumably also don't have matching monitor keys, etc.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] where is the fsid field coming from in ceph -s ?

2016-01-11 Thread Oliver Dzombic
Hi Greg,

thank you for your time !

In my situation, I overwrote the old ID with the new one. I don't know
how. I thought that was impossible, but a running cluster with 4 mons
suddenly just changed its ID.

So the cluster now has the new ID. As far as I can see, I can't change
the ID by running some command.

A command to change the cluster fsid would be great :-)

So I know that, for now, I have to adjust everything to the new ID.

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Address:

IP Interactive UG (haftungsbeschraenkt)
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, district court of Hanau
Managing director: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107


Am 11.01.2016 um 18:59 schrieb Gregory Farnum:
> On Sat, Jan 9, 2016 at 1:58 AM, Oliver Dzombic  wrote:
>> Hi,
>>
>> fighting to add a new mon it somehow happend by mistake, that a new
>> cluster id got generated.
>>
>> So the output of "ceph -s" show a new cluster id.
>>
>> But the osd/mon are still running on the old cluster id.
>>
>> Changing the osd/mon to the new cluster id makes them got refused by the
>> cluster.
>>
>> Even restarting the cluster with the old cluster id does not help.
>>
>> "ceph -s" is still showing the new cluster id.
>>
>> So where is this cluster id exactly coming from ?
>>
>> I could not find it up until now, where this new cluster id is
>> mentioned, that we can change it back to the old one.
>>
>> Thank you !
> 
> FSIDs are stored in the osd and monitor data stores. If you generated
> a new mon with a different one, the best thing to do is just destroy
> it and then create a new monitor the cluster properly as you
> presumably also don't have matching monitor keys, etc
> -Greg
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] using cache-tier with writeback mode, raods bench result degrade

2016-01-11 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Currently set as DNM. :( I guess the author has not updated the PR as
requested. If needed, I can probably submit a new PR as we would
really like to see this in the next Hammer release. I just need to
know if I need to get involved. I don't want to take credit for Nick's
work, so I've been waiting.

Thanks
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Jan 11, 2016 at 4:29 AM, Nick Fisk  wrote:
> Looks like it has been done
>
> https://github.com/zhouyuan/ceph/commit/f352b8b908e8788d053cbe15fa3632b226a6758d
>
>
>> -Original Message-
>> From: Robert LeBlanc [mailto:rob...@leblancnet.us]
>> Sent: 08 January 2016 18:23
>> To: Nick Fisk
>> Cc: Wade Holler ; hnuzhoulin
>> ; Ceph-User
>> Subject: Re: [ceph-users] using cache-tier with writeback mode, raods bench
>> result degrade
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> Are you backporting that to hammer? We'd love it.
>> - 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Fri, Jan 8, 2016 at 9:28 AM, Nick Fisk  wrote:
>> > There was/is a bug in Infernalis and older, where objects will always get
>> promoted on the 2nd read/write regardless of what you set the
>> min_recency_promote settings to. This can have a dramatic effect on
>> performance. I wonder if this is what you are experiencing?
>> >
>> > This has been fixed in Jewel https://github.com/ceph/ceph/pull/6702 .
>> >
>> > You can compile the changes above to see if it helps or I have a .deb for
>> Infernalis where this is fixed if it's easier.
>> >
>> > Nick
>> >
>> >> -Original Message-
>> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>> >> Of Wade Holler
>> >> Sent: 08 January 2016 16:14
>> >> To: hnuzhoulin ; ceph-de...@vger.kernel.org
>> >> Cc: ceph-us...@ceph.com
>> >> Subject: Re: [ceph-users] using cache-tier with writeback mode, raods
>> >> bench result degrade
>> >>
>> >> My experience is performance degrades dramatically when dirty objects
>> >> are flushed.
>> >>
>> >> Best Regards,
>> >> Wade
>> >>
>> >>
>> >> On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin  wrote:
>> >> Hi,guyes
>> >> Recentlly,I am testing  cache-tier using writeback mode.but I found a
>> >> strange things.
>> >> the performance  using rados bench degrade.Is it correct?
>> >> If so,how to explain.following some info about my test:
>> >>
>> >> storage node:4 machine,two INTEL SSDSC2BB120G4(one for systaem,the
>> >> other one used as OSD),four sata as OSD.
>> >>
>> >> before using cache-tier:
>> >> root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup
>> >> 
>> >> Total time run: 301.236355
>> >> Total writes made:  6041
>> >> Write size: 4194304
>> >> Bandwidth (MB/sec): 80.216
>> >>
>> >> Stddev Bandwidth:   10.5358
>> >> Max bandwidth (MB/sec): 104
>> >> Min bandwidth (MB/sec): 0
>> >> Average Latency:0.797838
>> >> Stddev Latency: 0.619098
>> >> Max latency:4.89823
>> >> Min latency:0.158543
>> >>
>> >> root@ceph1:/root/cluster# rados bench -p coldstorage  300 seq
>> >> Total time run:133.563980
>> >> Total reads made: 6041
>> >> Read size:4194304
>> >> Bandwidth (MB/sec):180.917
>> >>
>> >> Average Latency:   0.353559
>> >> Max latency:   1.83356
>> >> Min latency:   0.027878
>> >>
>> >> after configure cache-tier:
>> >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add
>> coldstorage
>> >> hotstorage pool 'hotstorage' is now (or already was) a tier of
>> >> 'coldstorage'
>> >>
>> >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode
>> >> hotstorage writeback set cache-mode for pool 'hotstorage' to
>> >> writeback
>> >>
>> >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay
>> >> coldstorage hotstorage overlay for 'coldstorage' is now (or already
>> >> was) 'hotstorage'
>> >>
>> >> oot@ubuntu:~# ceph osd dump|grep storage pool 6 'coldstorage'
>> >> replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins
>> >> pg_num 512 pgp_num 512 last_change 216 lfor 216 flags hashpspool
>> >> tiers 7 read_tier 7 write_tier 7 stripe_width 0 pool 7 'hotstorage'
>> >> replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins
>> >> pg_num 128 pgp_num 128 last_change 228 flags
>> >> hashpspool,incomplete_clones tier_of 6 cache_mode writeback
>> >> target_bytes
>> >> 1000 hit_set bloom{false_positive_probability: 0.05, target_size:
>> >> 0, seed: 0} 3600s x6 stripe_width 0
>> >> -
>> >> rados bench -p coldstorage 300 write --no-cleanup Total time run:
>> >> 302.207573 Total writes made: 4315 Write size: 4194304 Bandwidth
>> >> (MB/sec): 57.113
>> >>
>> >> Stddev Bandwidth: 23.9375
>> >> Max

Re: [ceph-users] using cache-tier with writeback mode, raods bench result degrade

2016-01-11 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

https://github.com/ceph/ceph/pull/7024
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Jan 11, 2016 at 1:47 PM, Robert LeBlanc  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Currently set as DNM. :( I guess the author has not updated the PR as
> requested. If needed, I can probably submit a new PR as we would
> really like to see this in the next Hammer release. I just need to
> know if I need to get involved. I don't want to take credit for Nick's
> work, so I've been waiting.
>
> Thanks
> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Mon, Jan 11, 2016 at 4:29 AM, Nick Fisk  wrote:
>> Looks like it has been done
>>
>> https://github.com/zhouyuan/ceph/commit/f352b8b908e8788d053cbe15fa3632b226a6758d
>>
>>
>>> -Original Message-
>>> From: Robert LeBlanc [mailto:rob...@leblancnet.us]
>>> Sent: 08 January 2016 18:23
>>> To: Nick Fisk
>>> Cc: Wade Holler ; hnuzhoulin
>>> ; Ceph-User
>>> Subject: Re: [ceph-users] using cache-tier with writeback mode, raods bench
>>> result degrade
>>>
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA256
>>>
>>> Are you backporting that to hammer? We'd love it.
>>> - 
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>
>>>
>>> On Fri, Jan 8, 2016 at 9:28 AM, Nick Fisk  wrote:
>>> > There was/is a bug in Infernalis and older, where objects will always get
>>> promoted on the 2nd read/write regardless of what you set the
>>> min_recency_promote settings to. This can have a dramatic effect on
>>> performance. I wonder if this is what you are experiencing?
>>> >
>>> > This has been fixed in Jewel https://github.com/ceph/ceph/pull/6702 .
>>> >
>>> > You can compile the changes above to see if it helps or I have a .deb for
>>> Infernalis where this is fixed if it's easier.
>>> >
>>> > Nick
>>> >
>>> >> -Original Message-
>>> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>>> >> Of Wade Holler
>>> >> Sent: 08 January 2016 16:14
>>> >> To: hnuzhoulin ; ceph-de...@vger.kernel.org
>>> >> Cc: ceph-us...@ceph.com
>>> >> Subject: Re: [ceph-users] using cache-tier with writeback mode, raods
>>> >> bench result degrade
>>> >>
>>> >> My experience is performance degrades dramatically when dirty objects
>>> >> are flushed.
>>> >>
>>> >> Best Regards,
>>> >> Wade
>>> >>
>>> >>
>>> >> On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin  wrote:
>>> >> Hi,guyes
>>> >> Recentlly,I am testing  cache-tier using writeback mode.but I found a
>>> >> strange things.
>>> >> the performance  using rados bench degrade.Is it correct?
>>> >> If so,how to explain.following some info about my test:
>>> >>
>>> >> storage node:4 machine,two INTEL SSDSC2BB120G4(one for systaem,the
>>> >> other one used as OSD),four sata as OSD.
>>> >>
>>> >> before using cache-tier:
>>> >> root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup
>>> >> 
>>> >> Total time run: 301.236355
>>> >> Total writes made:  6041
>>> >> Write size: 4194304
>>> >> Bandwidth (MB/sec): 80.216
>>> >>
>>> >> Stddev Bandwidth:   10.5358
>>> >> Max bandwidth (MB/sec): 104
>>> >> Min bandwidth (MB/sec): 0
>>> >> Average Latency:0.797838
>>> >> Stddev Latency: 0.619098
>>> >> Max latency:4.89823
>>> >> Min latency:0.158543
>>> >>
>>> >> root@ceph1:/root/cluster# rados bench -p coldstorage  300 seq
>>> >> Total time run:133.563980
>>> >> Total reads made: 6041
>>> >> Read size:4194304
>>> >> Bandwidth (MB/sec):180.917
>>> >>
>>> >> Average Latency:   0.353559
>>> >> Max latency:   1.83356
>>> >> Min latency:   0.027878
>>> >>
>>> >> after configure cache-tier:
>>> >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add
>>> coldstorage
>>> >> hotstorage pool 'hotstorage' is now (or already was) a tier of
>>> >> 'coldstorage'
>>> >>
>>> >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode
>>> >> hotstorage writeback set cache-mode for pool 'hotstorage' to
>>> >> writeback
>>> >>
>>> >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay
>>> >> coldstorage hotstorage overlay for 'coldstorage' is now (or already
>>> >> was) 'hotstorage'
>>> >>
>>> >> oot@ubuntu:~# ceph osd dump|grep storage pool 6 'coldstorage'
>>> >> replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins
>>> >> pg_num 512 pgp_num 512 last_change 216 lfor 216 flags hashpspool
>>> >> tiers 7 read_tier 7 write_tier 7 stripe_width 0 pool 7 'hotstorage'
>>> >> replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins
>>> >> pg_num 128 pgp_num 128 last_change 228 flags
>>> >> hashpspool,incomplete_clones tier_of 6 cache_mode writeback
>>> >> target_bytes
>>> >> 100

Re: [ceph-users] double rebalance when removing osd

2016-01-11 Thread Shinobu Kinjo
Based on my research, 0.2 is better than 0.0.
Probably it depends though.

 > ceph osd crush reweight osd.X 0.0

Rgds,
Shinobu

- Original Message -
From: "Andy Allan" 
To: "Rafael Lopez" 
Cc: ceph-users@lists.ceph.com
Sent: Monday, January 11, 2016 8:08:38 PM
Subject: Re: [ceph-users] double rebalance when removing osd

On 11 January 2016 at 02:10, Rafael Lopez  wrote:

> @Steve, even when you remove due to failing, have you noticed that the 
> cluster rebalances twice using the documented steps? You may not if you don't 
> wait for the initial recovery after 'ceph osd out'. If you do 'ceph osd out' 
> and immediately 'ceph osd crush remove', RH support has told me that this 
> effectively 'cancels' the original move triggered from 'ceph osd out' and 
> starts permanently remapping... which still doesn't really explain why we 
> have to do the ceph osd out in the first place..

This topic was last discussed in December - the documentation for
removing an OSD from the cluster is not helpful. Unfortunately it
doesn't look like anyone is going to fix the documentation.

http://comments.gmane.org/gmane.comp.file-systems.ceph.user/25627

Basically, when you want to remove an OSD, there's an alternative
sequence of commands that avoids the double-rebalance.

The better approach is to reweight the OSD to zero first, then wait
for the (one and only) rebalance, then mark out and remove. Here's
more details from the previous thread:

http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/25629

Thanks,
Andy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] double rebalance when removing osd

2016-01-11 Thread Rafael Lopez
I removed some osds from a host yesterday using the reweight method and it
worked well. There was only one rebalance and then I could perform the rest
of the documented removal steps immediately with no further recovery. I
reweighted to 0.0.

Shinobu, can you explain why you have found 0.2 is better than 0.0? What
happens when you use 0.2 and what happens when you use 0.0 ?

Rafael



On 12 January 2016 at 09:13, Shinobu Kinjo  wrote:

> Based on my research, 0.2 is better than 0.0.
> Probably it depends though.
>
>  > ceph osd crush reweight osd.X 0.0
>
> Rgds,
> Shinobu
>
> - Original Message -
> From: "Andy Allan" 
> To: "Rafael Lopez" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, January 11, 2016 8:08:38 PM
> Subject: Re: [ceph-users] double rebalance when removing osd
>
> On 11 January 2016 at 02:10, Rafael Lopez  wrote:
>
> > @Steve, even when you remove due to failing, have you noticed that the
> cluster rebalances twice using the documented steps? You may not if you
> don't wait for the initial recovery after 'ceph osd out'. If you do 'ceph
> osd out' and immediately 'ceph osd crush remove', RH support has told me
> that this effectively 'cancels' the original move triggered from 'ceph osd
> out' and starts permanently remapping... which still doesn't really explain
> why we have to do the ceph osd out in the first place..
>
> This topic was last discussed in December - the documentation for
> removing an OSD from the cluster is not helpful. Unfortunately it
> doesn't look like anyone is going to fix the documentation.
>
> http://comments.gmane.org/gmane.comp.file-systems.ceph.user/25627
>
> Basically, when you want to remove an OSD, there's an alternative
> sequence of commands that avoids the double-rebalance.
>
> The better approach is to reweight the OSD to zero first, then wait
> for the (one and only) rebalance, then mark out and remove. Here's
> more details from the previous thread:
>
> http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/25629
>
> Thanks,
> Andy
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Senior Storage Engineer - Automation and Delivery
Infrastructure Services - eSolutions
738 Blackburn Rd, Clayton
Monash University 3800
Telephone: +61 3 9905 9118
Mobile:   +61 4 27 682 670
Email rafael.lo...@monash.edu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] double rebalance when removing osd

2016-01-11 Thread Shinobu Kinjo
I'm not quite sure how it works internally.
But if 0.0 works fine for you, that's good.

Rgds,
Shinobu

- Original Message -
From: "Rafael Lopez" 
To: "Shinobu Kinjo" 
Cc: "Andy Allan" , ceph-users@lists.ceph.com
Sent: Tuesday, January 12, 2016 7:20:37 AM
Subject: Re: [ceph-users] double rebalance when removing osd

I removed some osds from a host yesterday using the reweight method and it
worked well. There was only one rebalance and then I could perform the rest
of the documented removal steps immediately with no further recovery. I
reweighted to 0.0.

Shinobu, can you explain why you have found 0.2 is better than 0.0? What
happens when you use 0.2 and what happens when you use 0.0 ?

Rafael



On 12 January 2016 at 09:13, Shinobu Kinjo  wrote:

> Based on my research, 0.2 is better than 0.0.
> Probably it depends though.
>
>  > ceph osd crush reweight osd.X 0.0
>
> Rgds,
> Shinobu
>
> - Original Message -
> From: "Andy Allan" 
> To: "Rafael Lopez" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, January 11, 2016 8:08:38 PM
> Subject: Re: [ceph-users] double rebalance when removing osd
>
> On 11 January 2016 at 02:10, Rafael Lopez  wrote:
>
> > @Steve, even when you remove due to failing, have you noticed that the
> cluster rebalances twice using the documented steps? You may not if you
> don't wait for the initial recovery after 'ceph osd out'. If you do 'ceph
> osd out' and immediately 'ceph osd crush remove', RH support has told me
> that this effectively 'cancels' the original move triggered from 'ceph osd
> out' and starts permanently remapping... which still doesn't really explain
> why we have to do the ceph osd out in the first place..
>
> This topic was last discussed in December - the documentation for
> removing an OSD from the cluster is not helpful. Unfortunately it
> doesn't look like anyone is going to fix the documentation.
>
> http://comments.gmane.org/gmane.comp.file-systems.ceph.user/25627
>
> Basically, when you want to remove an OSD, there's an alternative
> sequence of commands that avoids the double-rebalance.
>
> The better approach is to reweight the OSD to zero first, then wait
> for the (one and only) rebalance, then mark out and remove. Here's
> more details from the previous thread:
>
> http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/25629
>
> Thanks,
> Andy
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Senior Storage Engineer - Automation and Delivery
Infrastructure Services - eSolutions
738 Blackburn Rd, Clayton
Monash University 3800
Telephone: +61 3 9905 9118
Mobile:   +61 4 27 682 670
Email rafael.lo...@monash.edu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd tree output

2016-01-11 Thread Wade Holler
Does anyone else have any suggestions here? I am increasingly concerned
about my config if other folks aren't seeing this.

I could change to a manual crushmap but otherwise have no need to.

I emailed the Ceph-dev list but have not had a response yet.

Best Regards
Wade
On Fri, Jan 8, 2016 at 11:12 AM Wade Holler  wrote:

> It is not set in the conf file.  So why do I still have this behavior ?
>
> On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin  wrote:
>
>> Yeah,this setting can not see in asok config.
>> You just set it in ceph.conf and restart mon and osd service(sorry I
>> forget if these restart is necessary)
>>
>> what I use this config is when I changed crushmap manually,and I do not
>> want the service init script to rebuild crushmap as default way.
>>
>> maybe this is not siut for your problem.just have a try.
>>
On Fri, 08 Jan 2016 21:51:32 +0800, Wade Holler wrote:
>>
>> That is not set as far as I can tell.  Actually it is strange that I
>> don't see that setting at all.
>>
>> [root@cpn1 ~]# ceph daemon osd.0 config show | grep update | grep
>> crush
>>
>> [root@cpn1 ~]# grep update /etc/ceph/ceph.conf
>>
>> [root@cpn1 ~]#
>>
>> On Fri, Jan 8, 2016 at 1:50 AM Mart van Santen  wrote:
>>
>>>
>>>
>>> Hi,
>>>
>>> Do you have by any chance disabled automatic crushmap updates in your
>>> ceph config?
>>>
>>> osd crush update on start = false
>>>
>>> If this is the case, and you move disks around hosts, they won't update
>>> their position/host in the crushmap, even if the crushmap does not reflect
>>> reality.
>>>
>>> Regards,
>>>
>>> Mart
>>>
>>>
>>>
>>>
>>>
>>> On 01/08/2016 02:16 AM, Wade Holler wrote:
>>>
>>> Sure.  Apologies for all the text: We have 12 Nodes for OSDs, 15 OSDs
>>> per node,  but I will only include a sample:
>>>
>>> ceph osd tree | head -35
>>>
>>> ID  WEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>>
>>>  -1 130.98450 root default
>>>
>>>  -2   5.82153 host cpn1
>>>
>>>   4   0.72769 osd.4  up  1.0  1.0
>>>
>>>  14   0.72769 osd.14 up  1.0  1.0
>>>
>>>   3   0.72769 osd.3  up  1.0  1.0
>>>
>>>  24   0.72769 osd.24 up  1.0  1.0
>>>
>>>   5   0.72769 osd.5  up  1.0  1.0
>>>
>>>   2   0.72769 osd.2  up  1.0  1.0
>>>
>>>  17   0.72769 osd.17 up  1.0  1.0
>>>
>>>  69   0.72769 osd.69 up  1.0  1.0
>>>
>>>  -3   6.54922 host cpn3
>>>
>>>   7   0.72769 osd.7  up  1.0  1.0
>>>
>>>   8   0.72769 osd.8  up  1.0  1.0
>>>
>>>   9   0.72769 osd.9  up  1.0  1.0
>>>
>>>   0   0.72769 osd.0  up  1.0  1.0
>>>
>>>  28   0.72769 osd.28 up  1.0  1.0
>>>
>>>  10   0.72769 osd.10 up  1.0  1.0
>>>
>>>   1   0.72769 osd.1  up  1.0  1.0
>>>
>>>   6   0.72769 osd.6  up  1.0  1.0
>>>
>>>  29   0.72769 osd.29 up  1.0  1.0
>>>
>>>  -4   2.91077 host cpn4
>>>
>>>
>>> Compared with the actual processes that are running:
>>>
>>>
>>> [root@cpx1 ~]# ssh cpn1 ps -ef | grep ceph\-osd
>>>
>>> ceph   92638   1 26 16:19 ?01:00:55 /usr/bin/ceph-osd
>>> -f --cluster ceph --id 6 --setuser ceph --setgroup ceph
>>>
>>> ceph   92667   1 20 16:19 ?00:48:04 /usr/bin/ceph-osd
>>> -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
>>>
>>> ceph   92673   1 18 16:19 ?00:42:48 /usr/bin/ceph-osd
>>> -f --cluster ceph --id 8 --setuser ceph --setgroup ceph
>>>
>>> ceph   92681   1 19 16:19 ?00:45:52 /usr/bin/ceph-osd
>>> -f --cluster ceph --id 7 --setuser ceph --setgroup ceph
>>>
>>> ceph   92701   1 15 16:19 ?00:36:05 /usr/bin/ceph-osd
>>> -f --cluster ceph --id 12 --setuser ceph --setgroup ceph
>>>
>>> ceph   92748   1 14 16:19 ?00:34:07 /usr/bin/ceph-osd
>>> -f --cluster ceph --id 10 --setuser ceph --setgroup ceph
>>>
>>> ceph   92756   1 16 16:19 ?00:38:40 /usr/bin/ceph-osd
>>> -f --cluster ceph --id 9 --setuser ceph --setgroup ceph
>>>
>>> ceph   92758   1 17 16:19 ?00:39:28 /usr/bin/ceph-osd
>>> -f --cluster ceph --id 13 --setuser ceph --setgroup ceph
>>>
>>> ceph   92777   1 19 16:19 ?00:46:17 /usr/bin/ceph-osd
>>> -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
>>>
>>> ceph   92988   1 18 16:19 ?00:42:47 /usr/bin/ceph-osd
>>> -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
>>>
>>> ceph   93058   1 18 16:19 ?00:43:18 /usr/bin/ceph-osd
>>> -f --cluster ceph --id 11 --setuser ceph --setgroup ceph
>>>
>>> ceph   93078   1 17 16:19 ?

Re: [ceph-users] ceph osd tree output

2016-01-11 Thread John Spray
On Mon, Jan 11, 2016 at 10:32 PM, Wade Holler  wrote:
> Does anyone else have any suggestions here? I am increasingly concerned
> about my config if other folks aren't seeing this.
>
> I could change to a manual crushmap but otherwise have no need to.

What did you use to deploy ceph?  What init system are you using to
start the ceph daemons?  Also, usual questions, what distro, what ceph
version?

The script that updates the OSD host locations in the crush map is
meant to be invoked by the init system before the ceph-osd process,
but if you're starting it in some unconventional way then that might
not be happening.
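
As a workaround you can also put an OSD under the right host by hand; a
sketch using values from your tree output (id, weight and host must match
your actual OSD):

# manually place osd.6 under host cpn1 in the crush map
ceph osd crush create-or-move osd.6 0.72769 root=default host=cpn1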

John

>
> I emailed the Ceph-dev list but have not had a response yet.
>
> Best Regards
> Wade
>
> On Fri, Jan 8, 2016 at 11:12 AM Wade Holler  wrote:
>>
>> It is not set in the conf file.  So why do I still have this behavior ?
>>
>> On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin  wrote:
>>>
>>> Yeah,this setting can not see in asok config.
>>> You just set it in ceph.conf and restart mon and osd service(sorry I
>>> forget if these restart is necessary)
>>>
>>> what I use this config is when I changed crushmap manually,and I do not
>>> want the service init script to rebuild crushmap as default way.
>>>
>>> maybe this is not siut for your problem.just have a try.
>>>
>>> On Fri, 08 Jan 2016 21:51:32 +0800, Wade Holler wrote:
>>>
>>> That is not set as far as I can tell.  Actually it is strange that I
>>> don't see that setting at all.
>>>
>>> [root@cpn1 ~]# ceph daemon osd.0 config show | grep update | grep
>>> crush
>>>
>>> [root@cpn1 ~]# grep update /etc/ceph/ceph.conf
>>>
>>> [root@cpn1 ~]#
>>>
>>>
>>> On Fri, Jan 8, 2016 at 1:50 AM Mart van Santen  wrote:



 Hi,

 Do you have by any chance disabled automatic crushmap updates in your
 ceph config?

 osd crush update on start = false

 If this is the case, and you move disks around hosts, they won't update
 their position/host in the crushmap, even if the crushmap does not reflect
 reality.

 Regards,

 Mart





 On 01/08/2016 02:16 AM, Wade Holler wrote:

 Sure.  Apologies for all the text: We have 12 Nodes for OSDs, 15 OSDs
 per node,  but I will only include a sample:

 ceph osd tree | head -35

 ID  WEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY

  -1 130.98450 root default

  -2   5.82153 host cpn1

   4   0.72769 osd.4  up  1.0  1.0

  14   0.72769 osd.14 up  1.0  1.0

   3   0.72769 osd.3  up  1.0  1.0

  24   0.72769 osd.24 up  1.0  1.0

   5   0.72769 osd.5  up  1.0  1.0

   2   0.72769 osd.2  up  1.0  1.0

  17   0.72769 osd.17 up  1.0  1.0

  69   0.72769 osd.69 up  1.0  1.0

  -3   6.54922 host cpn3

   7   0.72769 osd.7  up  1.0  1.0

   8   0.72769 osd.8  up  1.0  1.0

   9   0.72769 osd.9  up  1.0  1.0

   0   0.72769 osd.0  up  1.0  1.0

  28   0.72769 osd.28 up  1.0  1.0

  10   0.72769 osd.10 up  1.0  1.0

   1   0.72769 osd.1  up  1.0  1.0

   6   0.72769 osd.6  up  1.0  1.0

  29   0.72769 osd.29 up  1.0  1.0

  -4   2.91077 host cpn4


 Compared with the actual processes that are running:


 [root@cpx1 ~]# ssh cpn1 ps -ef | grep ceph\-osd

 ceph   92638   1 26 16:19 ?01:00:55 /usr/bin/ceph-osd -f
 --cluster ceph --id 6 --setuser ceph --setgroup ceph

 ceph   92667   1 20 16:19 ?00:48:04 /usr/bin/ceph-osd -f
 --cluster ceph --id 0 --setuser ceph --setgroup ceph

 ceph   92673   1 18 16:19 ?00:42:48 /usr/bin/ceph-osd -f
 --cluster ceph --id 8 --setuser ceph --setgroup ceph

 ceph   92681   1 19 16:19 ?00:45:52 /usr/bin/ceph-osd -f
 --cluster ceph --id 7 --setuser ceph --setgroup ceph

 ceph   92701   1 15 16:19 ?00:36:05 /usr/bin/ceph-osd -f
 --cluster ceph --id 12 --setuser ceph --setgroup ceph

 ceph   92748   1 14 16:19 ?00:34:07 /usr/bin/ceph-osd -f
 --cluster ceph --id 10 --setuser ceph --setgroup ceph

 ceph   92756   1 16 16:19 ?00:38:40 /usr/bin/ceph-osd -f
 --cluster ceph --id 9 --setuser ceph --setgroup ceph

 ceph  

Re: [ceph-users] ceph osd tree output

2016-01-11 Thread Wade Holler
Deployment method: ceph-deploy
CentOS 7.2, systemctl
Infernalis.  This also happened when I was testing Jewel.

I am restarting (or stopping/starting) the ceph osd processes (after they die
or something) with:

systemctl stop|start|restart ceph.target

Is there another, more appropriate way?


Thank you ahead of time for your help!


Best Regards,
Wade

On Mon, Jan 11, 2016 at 5:43 PM John Spray  wrote:

> On Mon, Jan 11, 2016 at 10:32 PM, Wade Holler 
> wrote:
> > Does anyone else have any suggestions here? I am increasingly concerned
> > about my config if other folks aren't seeing this.
> >
> > I could change to a manual crushmap but otherwise have no need to.
>
> What did you use to deploy ceph?  What init system are you using to
> start the ceph daemons?  Also, usual questions, what distro, what ceph
> version?
>
> The script that updates the OSD host locations in the crush map is
> meant to be invoked by the init system before the ceph-osd process,
> but if you're starting it in some unconventional way then that might
> not be happening.
>
> John
>
> >
> > I emailed the Ceph-dev list but have not had a response yet.
> >
> > Best Regards
> > Wade
> >
> > On Fri, Jan 8, 2016 at 11:12 AM Wade Holler 
> wrote:
> >>
> >> It is not set in the conf file.  So why do I still have this behavior ?
> >>
> >> On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin 
> wrote:
> >>>
> >>> Yeah,this setting can not see in asok config.
> >>> You just set it in ceph.conf and restart mon and osd service(sorry I
> >>> forget if these restart is necessary)
> >>>
> >>> what I use this config is when I changed crushmap manually,and I do not
> >>> want the service init script to rebuild crushmap as default way.
> >>>
> >>> maybe this is not siut for your problem.just have a try.
> >>>
> >>> On Fri, 08 Jan 2016 21:51:32 +0800, Wade Holler
> wrote:
> >>>
> >>> That is not set as far as I can tell.  Actually it is strange that I
> >>> don't see that setting at all.
> >>>
> >>> [root@cpn1 ~]# ceph daemon osd.0 config show | grep update | grep
> >>> crush
> >>>
> >>> [root@cpn1 ~]# grep update /etc/ceph/ceph.conf
> >>>
> >>> [root@cpn1 ~]#
> >>>
> >>>
> >>> On Fri, Jan 8, 2016 at 1:50 AM Mart van Santen 
> wrote:
> 
> 
> 
>  Hi,
> 
>  Do you have by any chance disabled automatic crushmap updates in your
>  ceph config?
> 
>  osd crush update on start = false
> 
>  If this is the case, and you move disks around hosts, they won't
> update
>  their position/host in the crushmap, even if the crushmap does not
> reflect
>  reality.
> 
>  Regards,
> 
>  Mart
> 
> 
> 
> 
> 
>  On 01/08/2016 02:16 AM, Wade Holler wrote:
> 
>  Sure.  Apologies for all the text: We have 12 Nodes for OSDs, 15 OSDs
>  per node,  but I will only include a sample:
> 
>  ceph osd tree | head -35
> 
>  ID  WEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> 
>   -1 130.98450 root default
> 
>   -2   5.82153 host cpn1
> 
>    4   0.72769 osd.4  up  1.0  1.0
> 
>   14   0.72769 osd.14 up  1.0  1.0
> 
>    3   0.72769 osd.3  up  1.0  1.0
> 
>   24   0.72769 osd.24 up  1.0  1.0
> 
>    5   0.72769 osd.5  up  1.0  1.0
> 
>    2   0.72769 osd.2  up  1.0  1.0
> 
>   17   0.72769 osd.17 up  1.0  1.0
> 
>   69   0.72769 osd.69 up  1.0  1.0
> 
>   -3   6.54922 host cpn3
> 
>    7   0.72769 osd.7  up  1.0  1.0
> 
>    8   0.72769 osd.8  up  1.0  1.0
> 
>    9   0.72769 osd.9  up  1.0  1.0
> 
>    0   0.72769 osd.0  up  1.0  1.0
> 
>   28   0.72769 osd.28 up  1.0  1.0
> 
>   10   0.72769 osd.10 up  1.0  1.0
> 
>    1   0.72769 osd.1  up  1.0  1.0
> 
>    6   0.72769 osd.6  up  1.0  1.0
> 
>   29   0.72769 osd.29 up  1.0  1.0
> 
>   -4   2.91077 host cpn4
> 
> 
>  Compared with the actual processes that are running:
> 
> 
>  [root@cpx1 ~]# ssh cpn1 ps -ef | grep ceph\-osd
> 
>  ceph   92638   1 26 16:19 ?01:00:55 /usr/bin/ceph-osd
> -f
>  --cluster ceph --id 6 --setuser ceph --setgroup ceph
> 
>  ceph   92667   1 20 16:19 ?00:48:04 /usr/bin/ceph-osd
> -f
>  --cluster ceph --id 0 --setuser ceph --setgroup ceph
> 
>  ceph   

[ceph-users] ceph instability problem

2016-01-11 Thread Csaba Tóth
Dear Ceph Developers!

First of all, I would like to tell you that I love this software! I am still
a beginner with Ceph, but I like it very much, and I see the potential in it,
so I would like to use it in the future too, if I can.

A little background before I describe my problem:
I have a small bunch of servers, 5 for Ceph (2 mon, 4 osd, 2 mds; yes,
there is overlap between them) and 5 for workload. I use Ceph for rbd
and CephFS too. All of my work servers are diskless; I use PXE boot to
load the kernel, then a small script in the initrd mounts the rbd image.
And I use CephFS for common data, like session files, etc. All of the
servers run Ubuntu Linux, trusty for the Ceph machines and wily for the
work machines. The CephFS carries a big amount of load, because PostgreSQL
writes into it too. I use Hammer at the moment, the latest version; my plan
is to move to Infernalis when the first bugfix release comes out.

My problem is that the kernel rbd driver is not production stable. All of
my work servers crash randomly; they stay alive for a week at most. I have
other servers which use only CephFS (so they have a real hdd and do not use
the rbd-based virtual disk), and they are rock stable. Only the 5 work
servers which use rbd as the root disk crash. But they do so all the time.
And this gives me a very big headache; I need to restart everything every
few days, and this is insane. I have tried everything else (networking,
etc.), but now I am quite sure the problem is with the kernel rbd driver.

Please tell me how I can file a bug report, to help in fixing this bug. I
don't know how to make a core dump, but if anyone tells me how, I will do
anything! Really, this is very annoying to me.
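
In the meantime, what I understand I can collect is the kernel log around a
crash (a sketch; this assumes debugfs is mounted and the kernel was built
with dynamic debug support):

# turn on verbose logging for the rbd and libceph modules
echo 'module rbd +p' > /sys/kernel/debug/dynamic_debug/control
echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
# keep dmesg output somewhere that survives the crash (e.g. remote syslog)
# and attach it to a ticket on tracker.ceph.com
dmesg -T > /tmp/dmesg-before-crash.txt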

Except for this (and the lack of real MDS redundancy), Ceph works very well!
:)

Thanks for advance,
Csaba
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] an osd feign death,but ceph health is ok

2016-01-11 Thread hnuzhoulin

Hi, guys.
Right now I face a problem in my OpenStack + Ceph setup.
Some VMs cannot start and some show a blue screen.

The output of ceph -s says the cluster is OK.

So I used the following command to check the volumes first:
rbd ls -p volumes | while read line; do rbd info $line -p volumes; done

Then quickly I got an error:
2016-01-11 15:23:13.574314 7f8d95d8a700  0 -- 10.1.41.52:0/1003136 >>
192.168.1.5:6805/18353 pipe(0x22d3bc0 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x22d3e50).fault

So I restarted the osd whose pid is 18353 on host 192.168.1.5.

Then it spent some time recovering; when it became OK again, I could restart
the VMs as normal.

So my problem is: how can I know whether an osd is in such a "feign death"
state, which ceph -s cannot detect?
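
One thing I can try is to poke every osd directly and see which ones do not
answer (a sketch; the timeout value is arbitrary):

# a hung osd that is still marked 'up' will usually not answer a tell
for i in $(ceph osd ls); do
    echo -n "osd.$i: "
    timeout 10 ceph tell osd.$i version || echo "NO RESPONSE"
done
# per-osd commit/apply latency can also show outliers
ceph osd perf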

My Ceph is firefly 0.80.7.

Thanks


-
hnuzhou...@gmail.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com