Re: [ceph-users] using cache-tier with writeback mode, rados bench result degrades

2016-01-10 Thread hnuzhoulin

Thanks.
The version I am using is Firefly 0.80.11.
I will check and keep tracking this bug.

On Sat, 09 Jan 2016 00:28:13 +0800, Nick Fisk wrote:

There was/is a bug in Infernalis and older, where objects will always  
get promoted on the 2nd read/write regardless of what you set the  
min_recency_promote settings to. This can have a dramatic effect on  
performance. I wonder if this is what you are experiencing?
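
The setting in question is min_read_recency_for_promote on the cache pool
(Hammer and later, if I remember right, so it won't be available on Firefly).
A minimal sketch of checking and raising it, using the 'hotstorage' cache
pool name from the test below:

ceph osd pool get hotstorage min_read_recency_for_promote
ceph osd pool set hotstorage min_read_recency_for_promote 1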


This has been fixed in Jewel: https://github.com/ceph/ceph/pull/6702

You can compile the changes above to see if it helps, or I have a .deb
for Infernalis with the fix if that's easier.


Nick


-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Wade Holler
Sent: 08 January 2016 16:14
To: hnuzhoulin ; ceph-de...@vger.kernel.org
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] using cache-tier with writeback mode, rados bench result degrades

My experience is that performance degrades dramatically when dirty objects are flushed.
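
The flush behaviour is driven by the cache pool's dirty/full ratios; a
minimal sketch of tuning them, assuming the 'hotstorage' pool from the test
below (the values are only illustrative):

ceph osd pool set hotstorage cache_target_dirty_ratio 0.4
ceph osd pool set hotstorage cache_target_full_ratio 0.8

Lowering cache_target_dirty_ratio makes flushing start earlier and in smaller
bursts, which can soften the drop when a big flush kicks in.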

Best Regards,
Wade


On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin wrote:

Hi guys,
Recently I have been testing a cache tier in writeback mode, but I found
something strange: the rados bench performance degrades. Is that expected,
and if so, how is it explained? Some info about my test follows.

Storage nodes: 4 machines, with two Intel SSDSC2BB120G4 SSDs (one for the
system, the other used as an OSD) and four SATA disks as OSDs.

before using cache-tier:
root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup

Total time run: 301.236355
Total writes made: 6041
Write size: 4194304
Bandwidth (MB/sec): 80.216

Stddev Bandwidth: 10.5358
Max bandwidth (MB/sec): 104
Min bandwidth (MB/sec): 0
Average Latency: 0.797838
Stddev Latency: 0.619098
Max latency: 4.89823
Min latency: 0.158543

root@ceph1:/root/cluster# rados bench -p coldstorage 300 seq
Total time run: 133.563980
Total reads made: 6041
Read size: 4194304
Bandwidth (MB/sec): 180.917

Average Latency: 0.353559
Max latency: 1.83356
Min latency: 0.027878

after configuring the cache tier:
root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add coldstorage
hotstorage
pool 'hotstorage' is now (or already was) a tier of 'coldstorage'

root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode
hotstorage writeback
set cache-mode for pool 'hotstorage' to writeback

root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay
coldstorage hotstorage
overlay for 'coldstorage' is now (or already was) 'hotstorage'

root@ubuntu:~# ceph osd dump | grep storage
pool 6 'coldstorage' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 216 lfor 216 flags hashpspool tiers 7 read_tier 7 write_tier 7 stripe_width 0
pool 7 'hotstorage' replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 228 flags hashpspool,incomplete_clones tier_of 6 cache_mode writeback target_bytes 1000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x6 stripe_width 0
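
For completeness, the hit_set and target settings shown in that dump would
typically be applied with 'ceph osd pool set'; a sketch of the equivalent
commands (the target_max_bytes value is left as a placeholder because it is
truncated in the dump above):

ceph osd pool set hotstorage hit_set_type bloom
ceph osd pool set hotstorage hit_set_count 6
ceph osd pool set hotstorage hit_set_period 3600
ceph osd pool set hotstorage target_max_bytes <bytes>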
-
rados bench -p coldstorage 300 write --no-cleanup
Total time run: 302.207573
Total writes made: 4315
Write size: 4194304
Bandwidth (MB/sec): 57.113

Stddev Bandwidth: 23.9375
Max bandwidth (MB/sec): 104
Min bandwidth (MB/sec): 0
Average Latency: 1.1204
Stddev Latency: 0.717092
Max latency: 6.97288
Min latency: 0.158371

root@ubuntu:/# rados bench -p coldstorage 300 seq
Total time run: 153.869741
Total reads made: 4315
Read size: 4194304
Bandwidth (MB/sec): 112.173

Average Latency: 0.570487
Max latency: 1.75137
Min latency: 0.039635


ceph.conf:

[global]
fsid = 4ec1eb64-226c-4d90-8c5c-b6b6644be831
mon_initial_members = ceph2, ceph3, ceph4
mon_host = 10.**.**.241,10.**.**.242,10.**.**.243
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 3
osd_pool_default_min_size = 1
auth_supported = cephx
osd_journal_size = 10240
osd_mkfs_type = xfs
osd crush update on start = false

[client]
rbd_cache = true
rbd_cache_writethrough_until_flush = false
rbd_cache_size = 33554432
rbd_cache_max_dirty = 25165824
rbd_cache_target_dirty = 16777216
rbd_cache_max_dirty_age = 1
rbd_cache_block_writes_upfront = false
[osd]
filestore_omap_header_cache_size = 4
filestore_fd_cache_size = 4
filestore_fiemap = true
client_readahead_min = 2097152
client_readahead_max_bytes = 0
client_readahead_max_periods = 4
filestore_journal_writeahead = false
filestore_max_sync_interval = 10
filestore_queue_max_ops = 500
filestore_queue_max_bytes = 1048576000
filestore_queue_committing_max_ops = 5000
filestore_queue_committing_max_bytes = 1048576000
keyvaluestore_queue_max_ops = 500
keyvaluest

Re: [ceph-users] Infernalis upgrade breaks when journal on separate partition

2016-01-10 Thread Stuart Longland
On 05/01/16 07:52, Stuart Longland wrote:
>> I ran into this same issue, and found that a reboot ended up setting the
>> > ownership correctly.  If you look at /lib/udev/rules.d/95-ceph-osd.rules
>> > you'll see the magic that makes it happen
> Ahh okay, good-o, so a reboot should be fine.  I guess adding chown-ing
> of journal files would be a good idea (maybe it's version specific, but
> chown -R did not follow the symlink and change ownership for me).

Well, it seems I spoke too soon. I'm not sure what logic the udev rules use
to identify ceph journals, but they don't seem to pick up the journals in our
case: after a reboot, those partitions are owned by root:disk with
permissions 0660.

Adding the 'ceph' user to the 'disk' group, oddly enough, isn't sufficient.
For the record:
> root@bneprdsn0:~# fdisk -l /dev/sdc
> 
> Disk /dev/sdc: 60.0 GB, 60022480896 bytes
> 255 heads, 63 sectors/track, 7297 cylinders, total 117231408 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x0001c576
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdc1   *         2048    17006591     8502272   83  Linux
> /dev/sdc2         17006592   117231407    50112408    5  Extended
> /dev/sdc5         17008640    58951679    20971520   83  Linux
> /dev/sdc6         58953728   100896767    20971520   83  Linux
> /dev/sdc7        100898816   117229567     8165376   82  Linux swap / Solaris

sdc5 and sdc6 are the journals for sda1 and sdb1. The journal disk here is
an Intel 520-series 60GB SSD, shared with the host OS and formatted with an
MS-DOS disklabel. I'm not sure what partition type the Ceph udev rules
expect.
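
As far as I can tell, the stock /lib/udev/rules.d/95-ceph-osd.rules matches
journals by their GPT partition-type GUID, which a partition inside an MS-DOS
disklabel cannot carry, so the rules never fire here. One workaround is a
small local override rule that fixes the ownership explicitly; a sketch only,
with the device names taken from the fdisk output above:

# /etc/udev/rules.d/99-ceph-journal.rules (local workaround, not shipped by Ceph)
KERNEL=="sdc5", OWNER="ceph", GROUP="ceph", MODE="0660"
KERNEL=="sdc6", OWNER="ceph", GROUP="ceph", MODE="0660"

followed by 'udevadm control --reload && udevadm trigger' (or a reboot) to
apply it.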
-- 
 _ ___ Stuart Longland - Systems Engineer
\  /|_) |   T: +61 7 3535 9619
 \/ | \ | 38b Douglas StreetF: +61 7 3535 9699
   SYSTEMSMilton QLD 4064   http://www.vrt.com.au


Re: [ceph-users] double rebalance when removing osd

2016-01-10 Thread Rafael Lopez
Thanks for the replies guys.

@Steve, even when you remove an OSD due to failure, have you noticed that the
cluster rebalances twice using the documented steps? You may not notice it if
you don't wait for the initial recovery after 'ceph osd out'. If you do 'ceph
osd out' and immediately 'ceph osd crush remove', RH support has told me that
this effectively 'cancels' the original move triggered by 'ceph osd out' and
starts remapping permanently... which still doesn't really explain why we
have to do the 'ceph osd out' in the first place.

@Dan, good to hear it works, I will try that method next time and see how
it goes!
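
For reference, the reweight-first sequence from Sebastien's post quoted below
would look roughly like this (a sketch only; osd.<id> is a placeholder and the
stop command depends on the init system in use):

ceph osd crush reweight osd.<id> 0
# wait for backfill to finish (watch 'ceph -s'), then:
ceph osd out <id>
stop ceph-osd id=<id>          # upstart; or: systemctl stop ceph-osd@<id>
ceph osd crush remove osd.<id>
ceph auth del osd.<id>
ceph osd rm <id>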


On 8 January 2016 at 03:08, Steve Taylor wrote:

> If I’m not mistaken, marking an osd out will remap its placement groups
> temporarily, while removing it from the crush map will remap the placement
> groups permanently. Additionally, other placement groups from other osds
> could get remapped permanently when an osd is removed from the crush map. I
> would think the only benefit to marking an osd out before stopping it would
> be a cleaner redirection of client I/O before the osd disappears, which may
> be worthwhile if you’re removing a healthy osd.
>
>
>
> As for reweighting to 0 prior to removing an osd, it seems like that would
> give the osd the ability to participate in the recovery essentially in
> read-only fashion (plus deletes) until it’s empty, so objects wouldn’t
> become degraded as placement groups are backfilling onto other osds. Again,
> this would really only be useful if you’re removing a healthy osd. If
> you’re removing an osd where other osds in different failure domains are
> known to be unhealthy, it seems like this would be a really good idea.
>
>
>
> I usually follow the documented steps you’ve outlined myself, but I’m
> typically removing osds due to failed/failing drives while the rest of the
> cluster is healthy.
> --
>
> *Steve Taylor* | Senior Software Engineer | StorageCraft Technology
> Corporation 
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> *Office: *801.871.2799 | *Fax: *801.545.4705
> --
>
> If you are not the intended recipient of this message, be advised that any
> dissemination or copying of this message is prohibited.
> If you received this message erroneously, please notify the sender and
> delete it, together with any attachments.
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Rafael Lopez
> *Sent:* Wednesday, January 06, 2016 4:53 PM
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] double rebalance when removing osd
>
>
>
> Hi all,
>
>
>
> I am curious what practices other people follow when removing OSDs from a
> cluster. According to the docs, you are supposed to:
>
>
>
> 1. ceph osd out
>
> 2. stop daemon
>
> 3. ceph osd crush remove
>
> 4. ceph auth del
>
> 5. ceph osd rm
>
>
>
> What value does 'ceph osd out' (1) add to the removal process, and why is it
> in the docs? We have found (as have others) that by outing (1) and then
> crush removing (3), the cluster has to do two recoveries. Is it necessary?
> Can you just do a crush remove without step 1?
>
>
>
> I found this earlier message from GregF in which he seems to affirm that
> just doing the crush remove is fine:
>
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-January/007227.html
>
>
>
> This recent blog post from Sebastien suggests reweighting to 0 first, but I
> haven't tested it:
>
> http://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/
>
>
>
> I thought that marking it out sets the reweight to 0 anyway, so I'm not sure
> how this would make a difference in terms of two rebalances, but maybe there
> is a subtle difference?
>
>
>
> Thanks,
>
> Raf
>
>
>
> --
>
> Senior Storage Engineer - Automation and Delivery
> Infrastructure Services - eSolutions
>
>


-- 
Senior Storage Engineer - Automation and Delivery
Infrastructure Services - eSolutions
738 Blackburn Rd, Clayton
Monash University 3800
Telephone: +61 3 9905 9118
Mobile:   +61 4 27 682 670
Email rafael.lo...@monash.edu