Re: [ceph-users] rbd format v2 support

2015-06-07 Thread Ilya Dryomov
On Fri, Jun 5, 2015 at 6:47 AM, David Z  wrote:
> Hi Ceph folks,
>
> We want to use rbd format v2, but found it is not supported on kernel 3.10.0
> of CentOS 7:
>
> [ceph@ ~]$ sudo rbd map zhi_rbd_test_1
> rbd: sysfs write failed
> rbd: map failed: (22) Invalid argument
> [ceph@ ~]$ dmesg | tail
> [662453.664746] rbd: image zhi_rbd_test_1: unsupported stripe unit (got 8192 
> want 4194304)
>
> As described in the Ceph docs, it should be available from kernel 3.11. But I
> checked the code of kernels 3.12, 3.14 and even 4.1, and this piece of code is
> still there, see below links. Am I missing some code or info?

What you are referring to is called "fancy striping" and it is
unsupported (work is underway but it's been slow going).  However,
because v2 images with *default* striping parameters are, disk-format
wise, the same as v1 images, you can map a v2 image provided you didn't
specify a custom --stripe-unit or --stripe-count on rbd create.
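
For example, something along these lines should work (pool and image names
here are just placeholders):

    rbd create --image-format 2 --size 10240 rbd/test-img    # default striping
    sudo rbd map rbd/test-img                                 # maps fine on 3.10+
    rbd create --image-format 2 --size 10240 --stripe-unit 8192 --stripe-count 4 rbd/fancy-img
    sudo rbd map rbd/fancy-img                                # krbd will refuse this one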

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Orphan PG

2015-06-07 Thread Marek Dohojda
Thank you.  Unfortunately this won't work because 0.21 is already being
created:
~# ceph pg force_create_pg 0.21
pg 0.21 already creating


I think, and I am guessing here since I don't know the internals that well,
that 0.21 started to be created, but since its OSDs disappeared it never
finished and it keeps trying.

On Sun, Jun 7, 2015 at 12:18 AM, Alex Muntada  wrote:

> Marek Dohojda:
>
> One of the Stuck Inactive is 0.21 and here is the output of ceph pg map
>>
>> #ceph pg map 0.21
>> osdmap e579 pg 0.21 (0.21) -> up [] acting []
>>
>> #ceph pg dump_stuck stale
>> ok
>> pg_stat state   up  up_primary  acting  acting_primary
>> 0.22stale+active+clean  [5,1,6] 5   [5,1,6] 5
>> 0.1fstale+active+clean  [2,0,4] 2   [2,0,4] 2
>> 
>>
>> # ceph osd stat
>>  osdmap e579: 14 osds: 14 up, 14 in
>>
>> If I do
>> #ceph pg 0.21 query
>>
>> The command freezes and never returns any output.
>>
>> I suspect that the problem is that these PGs were created but the OSD
>> that they were initially created under disappeared.  So I believe that I
>> should just remove these PGs, but honestly I don’t see how.
>>
>> Does anybody have any ideas as to what to do next?
>>
>
> ceph pg force_create_pg 0.21
>
> We've been playing last week with this same scenario: we stopped the 3
> OSDs holding the replicas of one PG on purpose to find out how it affected
> the cluster, and we ended up with a stale PG and 400 requests blocked for
> a long time. After trying several commands to get the cluster back, the one
> that made the difference was force_create_pg and later moving the OSD with
> blocked requests out of the cluster.
>
> Hope that helps,
> Alex
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Orphan PG

2015-06-07 Thread Alex Muntada
That also happened to us, but after moving the OSDs with blocked requests
out of the cluster it eventually regained HEALTH_OK.

Running ceph health detail should list those OSDs. Do you have any?
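
For example, something like this should show them (the exact wording of the
warnings varies a bit between releases):

    ceph health detail | grep -i blocked
    ceph health detail | grep 'osd\.'    # which OSDs the blocked requests sit on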
On 07/06/2015 at 16:16, "Marek Dohojda"  wrote:

> Thank you.  Unfortunately this won't work because 0.21 is already being
> creating:
> ~# ceph pg force_create_pg 0.21
> pg 0.21 already creating
>
>
> I think, and I am guessing here since I don't know internals that well,
> that 0.21 started to be created but since its OSD disappear it never
> finished and it keeps trying.
>
> On Sun, Jun 7, 2015 at 12:18 AM, Alex Muntada  wrote:
>
>> Marek Dohojda:
>>
>> One of the Stuck Inactive is 0.21 and here is the output of ceph pg map
>>>
>>> #ceph pg map 0.21
>>> osdmap e579 pg 0.21 (0.21) -> up [] acting []
>>>
>>> #ceph pg dump_stuck stale
>>> ok
>>> pg_stat state   up  up_primary  acting  acting_primary
>>> 0.22stale+active+clean  [5,1,6] 5   [5,1,6] 5
>>> 0.1fstale+active+clean  [2,0,4] 2   [2,0,4] 2
>>> 
>>>
>>> # ceph osd stat
>>>  osdmap e579: 14 osds: 14 up, 14 in
>>>
>>> If I do
>>> #ceph pg 0.21 query
>>>
>>> The command freezes and never returns any output.
>>>
>>> I suspect that the problem is that these PGs were created but the OSD
>>> that they were initially created under disappeared.  So I believe that I
>>> should just remove these PGs, but honestly I don’t see how.
>>>
>>> Does anybody have any ideas as to what to do next?
>>>
>>
>> ceph pg force_create_pg 0.21
>>
>> We've been playing last week with this same scenario: we stopped on
>> purpose the 3 OSD with the replicas of one PG to find out how it affected
>> to the cluster and we ended up with a stale PG and 400 requests blocked for
>> a long time. After trying several commands to get the cluster back the one
>> that made the difference was force_create_pg and later moving the OSD with
>> blocked requests out of the cluster.
>>
>> Hope that helps,
>> Alex
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Orphan PG

2015-06-07 Thread Marek Dohojda
I think this is the issue.  If you look at ceph health detail you will see
that 0.21 and the others are orphans:
HEALTH_WARN 65 pgs stale; 22 pgs stuck inactive; 65 pgs stuck stale; 22 pgs
stuck unclean; too many PGs per OSD (456 > max 300)
pg 0.21 is stuck inactive since forever, current state creating, last
acting []
pg 0.7 is stuck inactive since forever, current state creating, last acting
[]
pg 5.2 is stuck inactive since forever, current state creating, last acting
[]
pg 1.7 is stuck inactive since forever, current state creating, last acting
[]
pg 0.34 is stuck inactive since forever, current state creating, last
acting []
pg 0.33 is stuck inactive since forever, current state creating, last
acting []
pg 5.1 is stuck inactive since forever, current state creating, last acting
[]
pg 0.1b is stuck inactive since forever, current state creating, last
acting []
pg 0.32 is stuck inactive since forever, current state creating, last
acting []
pg 1.2 is stuck inactive since forever, current state creating, last acting
[]
pg 0.31 is stuck inactive since forever, current state creating, last
acting []
pg 2.0 is stuck inactive since forever, current state creating, last acting
[]
pg 5.7 is stuck inactive since forever, current state creating, last acting
[]
pg 1.0 is stuck inactive since forever, current state creating, last acting
[]
pg 2.2 is stuck inactive since forever, current state creating, last acting
[]
pg 0.16 is stuck inactive since forever, current state creating, last
acting []
pg 0.15 is stuck inactive since forever, current state creating, last
acting []
pg 0.2b is stuck inactive since forever, current state creating, last
acting []
pg 0.3f is stuck inactive since forever, current state creating, last
acting []
pg 0.27 is stuck inactive since forever, current state creating, last
acting []
pg 0.3c is stuck inactive since forever, current state creating, last
acting []
pg 0.3a is stuck inactive since forever, current state creating, last
acting []
pg 0.21 is stuck unclean since forever, current state creating, last acting
[]
pg 0.7 is stuck unclean since forever, current state creating, last acting
[]
pg 5.2 is stuck unclean since forever, current state creating, last acting
[]
pg 1.7 is stuck unclean since forever, current state creating, last acting
[]
pg 0.34 is stuck unclean since forever, current state creating, last acting
[]
pg 0.33 is stuck unclean since forever, current state creating, last acting
[]
pg 5.1 is stuck unclean since forever, current state creating, last acting
[]
pg 0.1b is stuck unclean since forever, current state creating, last acting
[]
pg 0.32 is stuck unclean since forever, current state creating, last acting
[]
pg 1.2 is stuck unclean since forever, current state creating, last acting
[]
pg 0.31 is stuck unclean since forever, current state creating, last acting
[]
pg 2.0 is stuck unclean since forever, current state creating, last acting
[]
pg 5.7 is stuck unclean since forever, current state creating, last acting
[]
pg 1.0 is stuck unclean since forever, current state creating, last acting
[]
pg 2.2 is stuck unclean since forever, current state creating, last acting
[]
pg 0.16 is stuck unclean since forever, current state creating, last acting
[]
pg 0.15 is stuck unclean since forever, current state creating, last acting
[]
pg 0.2b is stuck unclean since forever, current state creating, last acting
[]
pg 0.3f is stuck unclean since forever, current state creating, last acting
[]
pg 0.27 is stuck unclean since forever, current state creating, last acting
[]
pg 0.3c is stuck unclean since forever, current state creating, last acting
[]
pg 0.3a is stuck unclean since forever, current state creating, last acting
[]


On Sun, Jun 7, 2015 at 8:39 AM, Alex Muntada  wrote:

> That happened also to us, but after moving the OSDs with blocked requests
> out of the cluster it eventually regained health OK.
>
> Running ceph health details should list those OSDs. Do you have any?
> On 07/06/2015 at 16:16, "Marek Dohojda" 
> wrote:
>
> Thank you.  Unfortunately this won't work because 0.21 is already being
>> creating:
>> ~# ceph pg force_create_pg 0.21
>> pg 0.21 already creating
>>
>>
>> I think, and I am guessing here since I don't know internals that well,
>> that 0.21 started to be created but since its OSD disappear it never
>> finished and it keeps trying.
>>
>> On Sun, Jun 7, 2015 at 12:18 AM, Alex Muntada  wrote:
>>
>>> Marek Dohojda:
>>>
>>> One of the Stuck Inactive is 0.21 and here is the output of ceph pg map

 #ceph pg map 0.21
 osdmap e579 pg 0.21 (0.21) -> up [] acting []

 #ceph pg dump_stuck stale
 ok
 pg_stat state   up  up_primary  acting  acting_primary
 0.22stale+active+clean  [5,1,6] 5   [5,1,6] 5
 0.1fstale+active+clean  [2,0,4] 2   [2,0,4] 2
 

 # ceph osd stat
  osdmap e579: 14 osds: 14 up, 14 in

 If I do
 #ceph pg 0.21 query

 The command freezes and 

Re: [ceph-users] Orphan PG

2015-06-07 Thread Marek Dohojda
Incidentally, I am having similar issues with other PGs:

For instance:
pg 0.23 is stuck stale for 302497.994355, current state stale+active+clean,
last acting [5,2,4]


when I do:
# ceph pg 0.23 query

or
# ceph pg 5.5 query

It also freezes.  I can't see anything unusual in the log files, or
in any of the OSD status output.

On Sun, Jun 7, 2015 at 8:41 AM, Marek Dohojda 
wrote:

> I think this is the issue.  look at ceph health detail you will see that
> 0.21 and others are orphan:
> HEALTH_WARN 65 pgs stale; 22 pgs stuck inactive; 65 pgs stuck stale; 22
> pgs stuck unclean; too many PGs per OSD (456 > max 300)
> pg 0.21 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.7 is stuck inactive since forever, current state creating, last
> acting []
> pg 5.2 is stuck inactive since forever, current state creating, last
> acting []
> pg 1.7 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.34 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.33 is stuck inactive since forever, current state creating, last
> acting []
> pg 5.1 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.1b is stuck inactive since forever, current state creating, last
> acting []
> pg 0.32 is stuck inactive since forever, current state creating, last
> acting []
> pg 1.2 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.31 is stuck inactive since forever, current state creating, last
> acting []
> pg 2.0 is stuck inactive since forever, current state creating, last
> acting []
> pg 5.7 is stuck inactive since forever, current state creating, last
> acting []
> pg 1.0 is stuck inactive since forever, current state creating, last
> acting []
> pg 2.2 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.16 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.15 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.2b is stuck inactive since forever, current state creating, last
> acting []
> pg 0.3f is stuck inactive since forever, current state creating, last
> acting []
> pg 0.27 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.3c is stuck inactive since forever, current state creating, last
> acting []
> pg 0.3a is stuck inactive since forever, current state creating, last
> acting []
> pg 0.21 is stuck unclean since forever, current state creating, last
> acting []
> pg 0.7 is stuck unclean since forever, current state creating, last acting
> []
> pg 5.2 is stuck unclean since forever, current state creating, last acting
> []
> pg 1.7 is stuck unclean since forever, current state creating, last acting
> []
> pg 0.34 is stuck unclean since forever, current state creating, last
> acting []
> pg 0.33 is stuck unclean since forever, current state creating, last
> acting []
> pg 5.1 is stuck unclean since forever, current state creating, last acting
> []
> pg 0.1b is stuck unclean since forever, current state creating, last
> acting []
> pg 0.32 is stuck unclean since forever, current state creating, last
> acting []
> pg 1.2 is stuck unclean since forever, current state creating, last acting
> []
> pg 0.31 is stuck unclean since forever, current state creating, last
> acting []
> pg 2.0 is stuck unclean since forever, current state creating, last acting
> []
> pg 5.7 is stuck unclean since forever, current state creating, last acting
> []
> pg 1.0 is stuck unclean since forever, current state creating, last acting
> []
> pg 2.2 is stuck unclean since forever, current state creating, last acting
> []
> pg 0.16 is stuck unclean since forever, current state creating, last
> acting []
> pg 0.15 is stuck unclean since forever, current state creating, last
> acting []
> pg 0.2b is stuck unclean since forever, current state creating, last
> acting []
> pg 0.3f is stuck unclean since forever, current state creating, last
> acting []
> pg 0.27 is stuck unclean since forever, current state creating, last
> acting []
> pg 0.3c is stuck unclean since forever, current state creating, last
> acting []
> pg 0.3a is stuck unclean since forever, current state creating, last
> acting []
>
>
> On Sun, Jun 7, 2015 at 8:39 AM, Alex Muntada  wrote:
>
>> That happened also to us, but after moving the OSDs with blocked requests
>> out of the cluster it eventually regained health OK.
>>
>> Running ceph health details should list those OSDs. Do you have any?
>> On 07/06/2015 at 16:16, "Marek Dohojda" 
>> wrote:
>>
>> Thank you.  Unfortunately this won't work because 0.21 is already being
>>> creating:
>>> ~# ceph pg force_create_pg 0.21
>>> pg 0.21 already creating
>>>
>>>
>>> I think, and I am guessing here since I don't know internals that well,
>>> that 0.21 started to be created but since its OSD disappear it never
>>> finished and it keeps trying.
>>>
>>> On Sun, Jun 7, 2015 at 12:18 AM, Alex M

Re: [ceph-users] Orphan PG

2015-06-07 Thread Alex Muntada
You can try moving osd.5 out and see what happens next.
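
That would be something along the lines of:

    ceph osd out 5     # mark osd.5 out so its PGs get remapped
    ceph -w            # watch what the cluster does with it
    ceph osd in 5      # put it back in if it makes no difference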
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Orphan PG

2015-06-07 Thread Marek Dohojda
Unfortunately nothing.  It did its thing, re-balanced, and we were left with
the same thing in the end.  BTW, thank you very much for the time and
suggestion, I really appreciate it.

ceph health detail
HEALTH_WARN 65 pgs stale; 22 pgs stuck inactive; 65 pgs stuck stale; 22 pgs
stuck unclean; too many PGs per OSD (456 > max 300)
pg 0.21 is stuck inactive since forever, current state creating, last
acting []
pg 0.7 is stuck inactive since forever, current state creating, last acting
[]
pg 5.2 is stuck inactive since forever, current state creating, last acting
[]
pg 1.7 is stuck inactive since forever, current state creating, last acting
[]
pg 0.34 is stuck inactive since forever, current state creating, last
acting []
pg 0.33 is stuck inactive since forever, current state creating, last
acting []
pg 5.1 is stuck inactive since forever, current state creating, last acting
[]
pg 0.1b is stuck inactive since forever, current state creating, last
acting []
pg 0.32 is stuck inactive since forever, current state creating, last
acting []
pg 1.2 is stuck inactive since forever, current state creating, last acting
[]
pg 0.31 is stuck inactive since forever, current state creating, last
acting []
pg 2.0 is stuck inactive since forever, current state creating, last acting
[]
pg 5.7 is stuck inactive since forever, current state creating, last acting
[]
pg 1.0 is stuck inactive since forever, current state creating, last acting
[]
pg 2.2 is stuck inactive since forever, current state creating, last acting
[]
pg 0.16 is stuck inactive since forever, current state creating, last
acting []
pg 0.15 is stuck inactive since forever, current state creating, last
acting []
pg 0.2b is stuck inactive since forever, current state creating, last
acting []
pg 0.3f is stuck inactive since forever, current state creating, last
acting []
pg 0.27 is stuck inactive since forever, current state creating, last
acting []
pg 0.3c is stuck inactive since forever, current state creating, last
acting []
pg 0.3a is stuck inactive since forever, current state creating, last
acting []
pg 0.21 is stuck unclean since forever, current state creating, last acting
[]
pg 0.7 is stuck unclean since forever, current state creating, last acting
[]
pg 5.2 is stuck unclean since forever, current state creating, last acting
[]
pg 1.7 is stuck unclean since forever, current state creating, last acting
[]
pg 0.34 is stuck unclean since forever, current state creating, last acting
[]
pg 0.33 is stuck unclean since forever, current state creating, last acting
[]
pg 5.1 is stuck unclean since forever, current state creating, last acting
[]
pg 0.1b is stuck unclean since forever, current state creating, last acting
[]
pg 0.32 is stuck unclean since forever, current state creating, last acting
[]
pg 1.2 is stuck unclean since forever, current state creating, last acting
[]
pg 0.31 is stuck unclean since forever, current state creating, last acting
[]
pg 2.0 is stuck unclean since forever, current state creating, last acting
[]
pg 5.7 is stuck unclean since forever, current state creating, last acting
[]
pg 1.0 is stuck unclean since forever, current state creating, last acting
[]
pg 2.2 is stuck unclean since forever, current state creating, last acting
[]
pg 0.16 is stuck unclean since forever, current state creating, last acting
[]
pg 0.15 is stuck unclean since forever, current state creating, last acting
[]
pg 0.2b is stuck unclean since forever, current state creating, last acting
[]
pg 0.3f is stuck unclean since forever, current state creating, last acting
[]
pg 0.27 is stuck unclean since forever, current state creating, last acting
[]
pg 0.3c is stuck unclean since forever, current state creating, last acting
[]
pg 0.3a is stuck unclean since forever, current state creating, last acting
[]
pg 0.22 is stuck stale for 321498.474316, current state stale+active+clean,
last acting [5,1,6]
pg 0.1f is stuck stale for 322077.806428, current state stale+active+clean,
last acting [2,0,4]
pg 0.1e is stuck stale for 321075.078151, current state stale+active+clean,
last acting [6,5,4]
pg 0.1d is stuck stale for 1473417.920791, current state
stale+active+clean, last acting [4,5,2]
pg 0.19 is stuck stale for 322454.627532, current state stale+active+clean,
last acting [0,2,6]
pg 0.18 is stuck stale for 322077.806431, current state stale+active+clean,
last acting [2,0,3]
pg 0.17 is stuck stale for 1473417.920792, current state
stale+active+clean, last acting [4,3,6]
pg 0.14 is stuck stale for 1473417.920791, current state
stale+active+clean, last acting [4,3,2]
pg 0.13 is stuck stale for 321075.078158, current state stale+active+clean,
last acting [6,4,0]
pg 0.12 is stuck stale for 321075.078159, current state stale+active+clean,
last acting [6,4,5]
pg 0.10 is stuck stale for 321075.078159, current state stale+active+clean,
last acting [6,0,2]
pg 0.f is stuck stale for 321498.474329, current state stale+active+clean,
last acting [5,6,3]
pg 0.e is stuck stale for 321075.078160, cu

[ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Cameron . Scrace
We are setting up a Ceph cluster and want the journals for our spinning disks
to be on SSDs, but all of our SSDs are 1TB. We were planning on putting 3
journals on each SSD, but that leaves 900+GB unused on the drive. Is it
possible to use the leftover space as another OSD, or will it affect
performance too much?

Thanks,

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz
Attention:
This email may contain information intended for the sole use of
the original recipient. Please respect this when sharing or
disclosing this email's contents with any third party. If you
believe you have received this email in error, please delete it
and notify the sender or postmas...@solnetsolutions.co.nz as
soon as possible. The content of this email does not necessarily
reflect the views of Solnet Solutions Ltd.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Orphan PG

2015-06-07 Thread Marek Dohojda
Well I think I got it! The issue was with pools that were created but then
had their OSDs pulled out from under them via CRUSH rules (I created SSD and
regular disk pools and moved the OSDs into these).  After I deleted those
pools, all the bad PGs disappeared, which made perfect sense, since they
were referencing non-existent OSDs.
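
In case it is useful to anyone else: deleting a pool (which is what resolved
it here) is done with something like the following - note it irreversibly
destroys whatever data the pool still holds, and the pool name below is just
a placeholder:

    ceph osd pool delete ssd-pool ssd-pool --yes-i-really-really-mean-it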



On Sun, Jun 7, 2015 at 2:00 PM, Marek Dohojda 
wrote:

> Unfortunately nothing.  It done its thing, re-balanced it, and left with
> same thing in the end.  BTW Thank you very much for the time and
> suggestion, I really appreciate it.
>
> ceph health detail
> HEALTH_WARN 65 pgs stale; 22 pgs stuck inactive; 65 pgs stuck stale; 22
> pgs stuck unclean; too many PGs per OSD (456 > max 300)
> pg 0.21 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.7 is stuck inactive since forever, current state creating, last
> acting []
> pg 5.2 is stuck inactive since forever, current state creating, last
> acting []
> pg 1.7 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.34 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.33 is stuck inactive since forever, current state creating, last
> acting []
> pg 5.1 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.1b is stuck inactive since forever, current state creating, last
> acting []
> pg 0.32 is stuck inactive since forever, current state creating, last
> acting []
> pg 1.2 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.31 is stuck inactive since forever, current state creating, last
> acting []
> pg 2.0 is stuck inactive since forever, current state creating, last
> acting []
> pg 5.7 is stuck inactive since forever, current state creating, last
> acting []
> pg 1.0 is stuck inactive since forever, current state creating, last
> acting []
> pg 2.2 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.16 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.15 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.2b is stuck inactive since forever, current state creating, last
> acting []
> pg 0.3f is stuck inactive since forever, current state creating, last
> acting []
> pg 0.27 is stuck inactive since forever, current state creating, last
> acting []
> pg 0.3c is stuck inactive since forever, current state creating, last
> acting []
> pg 0.3a is stuck inactive since forever, current state creating, last
> acting []
> pg 0.21 is stuck unclean since forever, current state creating, last
> acting []
> pg 0.7 is stuck unclean since forever, current state creating, last acting
> []
> pg 5.2 is stuck unclean since forever, current state creating, last acting
> []
> pg 1.7 is stuck unclean since forever, current state creating, last acting
> []
> pg 0.34 is stuck unclean since forever, current state creating, last
> acting []
> pg 0.33 is stuck unclean since forever, current state creating, last
> acting []
> pg 5.1 is stuck unclean since forever, current state creating, last acting
> []
> pg 0.1b is stuck unclean since forever, current state creating, last
> acting []
> pg 0.32 is stuck unclean since forever, current state creating, last
> acting []
> pg 1.2 is stuck unclean since forever, current state creating, last acting
> []
> pg 0.31 is stuck unclean since forever, current state creating, last
> acting []
> pg 2.0 is stuck unclean since forever, current state creating, last acting
> []
> pg 5.7 is stuck unclean since forever, current state creating, last acting
> []
> pg 1.0 is stuck unclean since forever, current state creating, last acting
> []
> pg 2.2 is stuck unclean since forever, current state creating, last acting
> []
> pg 0.16 is stuck unclean since forever, current state creating, last
> acting []
> pg 0.15 is stuck unclean since forever, current state creating, last
> acting []
> pg 0.2b is stuck unclean since forever, current state creating, last
> acting []
> pg 0.3f is stuck unclean since forever, current state creating, last
> acting []
> pg 0.27 is stuck unclean since forever, current state creating, last
> acting []
> pg 0.3c is stuck unclean since forever, current state creating, last
> acting []
> pg 0.3a is stuck unclean since forever, current state creating, last
> acting []
> pg 0.22 is stuck stale for 321498.474316, current state
> stale+active+clean, last acting [5,1,6]
> pg 0.1f is stuck stale for 322077.806428, current state
> stale+active+clean, last acting [2,0,4]
> pg 0.1e is stuck stale for 321075.078151, current state
> stale+active+clean, last acting [6,5,4]
> pg 0.1d is stuck stale for 1473417.920791, current state
> stale+active+clean, last acting [4,5,2]
> pg 0.19 is stuck stale for 322454.627532, current state
> stale+active+clean, last acting [0,2,6]
> pg 0.18 is stuck stale for 322077.806431, current state
> stale+active+clean, last acting [2,0,3]
> pg 0.17 is stuck s

Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Somnath Roy
Cameron,
Generally, it's not a good idea.
You want to protect the SSDs you use as journals. If anything goes wrong with
that disk, you will lose all of the OSDs that depend on it.
I don't think a bigger journal will gain you much performance, so the default 5
GB journal size should be good enough. If you want to reduce the fault domain
and want to put 3 journals on an SSD, go for minimum-size, high-endurance
SSDs for that.
Now, if you want to use the rest of the space on the 1 TB SSDs, creating just
OSDs there will not gain you much (you may rather get some burst performance).
You may want to consider the following.

1. If your spindle OSDs are much bigger than 900 GB, you don't want to make
all OSDs of similar sizes; a cache pool could be one of your options. But,
remember, a cache pool can wear out your SSDs faster, as presently I believe it
is not optimizing away the extra writes. Sorry, I don't have exact data as I
have yet to test that out.

2. If you want to make all the OSDs of similar sizes and you will be able to
create a substantial number of OSDs from your unused SSD space (depending on
how big the cluster is), you may want to put all of your primary OSDs on SSD
and gain a significant performance boost for reads. Also, in this case, I don't
think you will be getting any burst performance.
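
(One way to steer primaries onto the SSD OSDs - my guess at the mechanics, not
necessarily what Somnath has in mind - is primary affinity, e.g.:

    ceph tell mon.\* injectargs '--mon_osd_allow_primary_affinity=true'
    ceph osd primary-affinity osd.7 0    # osd.7 stands in for one of the HDD OSDs

so that the HDD OSDs are never picked as primaries for a PG.)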

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
cameron.scr...@solnet.co.nz
Sent: Sunday, June 07, 2015 1:49 PM
To: ceph-us...@ceph.com
Subject: [ceph-users] Multiple journals and an OSD on one SSD doable?

Setting up a Ceph cluster and we want the journals for our spinning disks to be 
on SSDs but all of our SSDs are 1TB. We were planning on putting 3 journals on 
each SSD, but that leaves 900+GB unused on the drive, is it possible to use the 
leftover space as another OSD or will it affect performance too much?

Thanks,

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz
Attention: This email may contain information intended for the 
sole use of the original recipient. Please respect this when sharing or 
disclosing this email's contents with any third party. If you believe you have 
received this email in error, please delete it and notify the sender or 
postmas...@solnetsolutions.co.nz as 
soon as possible. The content of this email does not necessarily reflect the 
views of Solnet Solutions Ltd.



PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Cameron . Scrace
The other option we were considering was putting the journals on the OS
SSDs; they are only 250GB and the rest would be for the OS. Is that a
decent option?

Thanks!

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:   Somnath Roy 
To: "cameron.scr...@solnet.co.nz" , 
"ceph-us...@ceph.com" 
Date:   08/06/2015 09:34 a.m.
Subject:RE: [ceph-users] Multiple journals and an OSD on one SSD 
doable?



Cameron,
Generally, it’s not a good idea. 
You want to protect your SSDs used as journal.If any problem on that disk, 
you will be losing all of your dependent OSDs.
I don’t think a bigger journal will gain you much performance , so, 
default 5 GB journal size should be good enough. If you want to reduce the 
fault domain and want to put 3 journals on a SSD , go for minimum size and 
high endurance SSDs for that.
Now, if you want to use your rest of space of 1 TB ssd, creating just OSDs 
will not gain you much (rather may get some burst performance). You may 
want to consider the following.
 
1. If your spindle OSD size is much bigger than 900 GB , you don’t want to 
make all OSDs of similar sizes, cache pool could be one of your option. 
But, remember, cache pool can wear out your SSDs faster as presently I 
guess it is not optimizing the extra writes. Sorry, I don’t have exact 
data as I am yet to test that out.
 
2. If you want to make all the OSDs of similar sizes and you will be able 
to create a substantial number of OSDs with your unused SSDs (depends on 
how big the cluster is), you may want to put all of your primary OSDs to 
SSD and gain significant performance boost for read. Also, in this case, I 
don’t think you will be getting any burst performance.
 
Thanks & Regards
Somnath
 
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
cameron.scr...@solnet.co.nz
Sent: Sunday, June 07, 2015 1:49 PM
To: ceph-us...@ceph.com
Subject: [ceph-users] Multiple journals and an OSD on one SSD doable?
 
Setting up a Ceph cluster and we want the journals for our spinning disks 
to be on SSDs but all of our SSDs are 1TB. We were planning on putting 3 
journals on each SSD, but that leaves 900+GB unused on the drive, is it 
possible to use the leftover space as another OSD or will it affect 
performance too much? 

Thanks, 

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz
Attention: This email may contain information intended for 
the sole use of the original recipient. Please respect this when sharing 
or disclosing this email's contents with any third party. If you believe 
you have received this email in error, please delete it and notify the 
sender or postmas...@solnetsolutions.co.nz as soon as possible. The 
content of this email does not necessarily reflect the views of Solnet 
Solutions Ltd. 


PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If 
the reader of this message is not the intended recipient, you are hereby 
notified that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly 
prohibited. If you have received this communication in error, please 
notify the sender by telephone or e-mail (as shown above) immediately and 
destroy any and all copies of this message in your possession (whether 
hard copies or electronically stored copies).



Attention:
This email may contain information intended for the sole use of
the original recipient. Please respect this when sharing or
disclosing this email's contents with any third party. If you
believe you have received this email in error, please delete it
and notify the sender or postmas...@solnetsolutions.co.nz as
soon as possible. The content of this email does not necessarily
reflect the views of Solnet Solutions Ltd.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Paul Evans
Cameron,  Somnath already covered most of these points, but I’ll add my $.02…

The key question to me is this: will these 1TB SSDs perform well as a journal
target for Ceph?  They'll need to be fast at synchronous writes to fill that
role, and if they aren't, I would use them for other OSD-related tasks and get
the right SSDs for the journal workload.  For more thoughts on the matter, read
below…


  *   1TB-capacity SSDs for journals is certainly overkill... unless the
underlying SSD controller is able to extend the life span of the SSD by using
the unallocated portions.  I would normally put the extra 950G of capacity to
use, either as a cache tier or an isolated pool depending on the workload… but
both of those efforts have their own considerations too, especially regarding
performance and fault domains, which brings us to...
  *   Performance is going to vary depending on the SSD you have: is it PCIe,
NVMe, SATA, or SAS?  The connection type and SSD characteristics need to
sustain the amount of bandwidth and IOPS you need for your workload, especially
as you'll be doing double writes if you use them as both journals and some
kind of OSD storage (either cache tier or dedicated pool).  Also, do you *know*
if these SSDs handle writes effectively?  Many SSDs don't perform well for the
types of journal writes that Ceph performs.  Somnath already mentioned placing
the primary OSDs on the spare space - a good way to get a boost in read
performance if your Ceph architecture will support it.
  *   Fault domain is another consideration: the more journals you put on one
SSD, the larger your fault domain will be.  If you have non-enterprise SSDs
this is an important point, as the wrong SSD will die quickly in a busy cluster.


--
Paul


On Jun 7, 2015, at 1:48 PM, 
cameron.scr...@solnet.co.nz wrote:

Setting up a Ceph cluster and we want the journals for our spinning disks to be 
on SSDs but all of our SSDs are 1TB. We were planning on putting 3 journals on 
each SSD, but that leaves 900+GB unused on the drive, is it possible to use the 
leftover space as another OSD or will it affect performance too much?

Thanks,

Cameron Scrace
Infrastructure Engineer

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Somnath Roy
Probably not, but if your SSD can sustain high endurance and high BW it may be ☺
Also, the amount of data written to the Ceph journal partitions will be much,
much higher than on your OS partition, and that could be a problem for the
SSD's wear leveling.
Again, I doubt anybody has tried out this scenario before, so everything I
said is theoretical only.
In any case, you want to make sure the partitions are SSD page aligned
(generally 4K).
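
A quick sketch of how that can be done and checked with parted (device name is
just an example):

    parted -s -a optimal /dev/sdb mklabel gpt
    parted -s -a optimal /dev/sdb mkpart primary 0% 5GiB   # 0% lets parted pick an aligned start
    parted /dev/sdb align-check optimal 1                  # verify partition 1 is aligned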


Thanks & Regards
Somnath

From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz]
Sent: Sunday, June 07, 2015 2:56 PM
To: Somnath Roy
Cc: ceph-us...@ceph.com
Subject: RE: [ceph-users] Multiple journals and an OSD on one SSD doable?

The other option we were considering was putting the journals on the OS SSDs, 
they are only 250GB and the rest would be for the OS. Is that a decent option?

Thanks!

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:    Somnath Roy <somnath@sandisk.com>
To:      "cameron.scr...@solnet.co.nz" <cameron.scr...@solnet.co.nz>,
         "ceph-us...@ceph.com" <ceph-us...@ceph.com>
Date:    08/06/2015 09:34 a.m.
Subject: RE: [ceph-users] Multiple journals and an OSD on one SSD doable?




Cameron,
Generally, it’s not a good idea.
You want to protect your SSDs used as journal.If any problem on that disk, you 
will be losing all of your dependent OSDs.
I don’t think a bigger journal will gain you much performance , so, default 5 
GB journal size should be good enough. If you want to reduce the fault domain 
and want to put 3 journals on a SSD , go for minimum size and high endurance 
SSDs for that.
Now, if you want to use your rest of space of 1 TB ssd, creating just OSDs will 
not gain you much (rather may get some burst performance). You may want to 
consider the following.

1. If your spindle OSD size is much bigger than 900 GB , you don’t want to make 
all OSDs of similar sizes, cache pool could be one of your option. But, 
remember, cache pool can wear out your SSDs faster as presently I guess it is 
not optimizing the extra writes. Sorry, I don’t have exact data as I am yet to 
test that out.

2. If you want to make all the OSDs of similar sizes and you will be able to 
create a substantial number of OSDs with your unused SSDs (depends on how big 
the cluster is), you may want to put all of your primary OSDs to SSD and gain 
significant performance boost for read. Also, in this case, I don’t think you 
will be getting any burst performance.

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
cameron.scr...@solnet.co.nz
Sent: Sunday, June 07, 2015 1:49 PM
To: ceph-us...@ceph.com
Subject: [ceph-users] Multiple journals and an OSD on one SSD doable?

Setting up a Ceph cluster and we want the journals for our spinning disks to be 
on SSDs but all of our SSDs are 1TB. We were planning on putting 3 journals on 
each SSD, but that leaves 900+GB unused on the drive, is it possible to use the 
leftover space as another OSD or will it affect performance too much?

Thanks,

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz
Attention: This email may contain information intended for the 
sole use of the original recipient. Please respect this when sharing or 
disclosing this email's contents with any third party. If you believe you have 
received this email in error, please delete it and notify the sender or 
postmas...@solnetsolutions.co.nz as 
soon as possible. The content of this email does not necessarily reflect the 
views of Solnet Solutions Ltd.





PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).

Attention: This email may contain information intended for the sole use of the 
original recipi

Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Christian Balzer

Hello,


On Mon, 8 Jun 2015 09:55:56 +1200 cameron.scr...@solnet.co.nz wrote:

> The other option we were considering was putting the journals on the OS 
> SSDs, they are only 250GB and the rest would be for the OS. Is that a 
> decent option?
>
You'll get a LOT better advice if you tell us more details.

For starters, have you bought the hardware yet?
Tell us about your design, how many initial storage nodes, how many
HDDs/SSDs per node, what CPUs/RAM/network?

What SSDs are we talking about, exact models please.
(Both the sizes you mentioned do not ring a bell for DC level SSDs I'm
aware of)

That said, I'm using Intel DC S3700s for mixed OS and journal use with good
results. 
In your average Ceph storage node, normal OS (logging mostly) activity is a
minute drop in the bucket for any decent SSD, so nearly all of its
resources are available to journals.

You want to match the number of journals per SSD according to the
capabilities of your SSD, HDDs and network.

For example, 8 HDD OSDs with 2x 200GB DC S3700s and a 10Gb/s network is a
decent match.
The two SSDs at ~900MB/s combined would appear to be the bottleneck, but in
reality I'd expect the HDDs to be it.
Never mind that you'd be more likely to be IOPS bound than bandwidth bound.
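
To put some very rough numbers on that (ballpark figures, not measurements):

    8 HDDs x ~100-150 MB/s sequential writes  ~ 0.8-1.2 GB/s in theory
    2 journal SSDs                            ~ 0.9 GB/s combined
    10Gb/s network                            ~ 1.25 GB/s raw

but spinning disks rarely sustain their sequential figures under real OSD
workloads, and small sync writes run out of IOPS long before bandwidth.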
 
Regards,

Christian

> Thanks!
> 
> Cameron Scrace
> Infrastructure Engineer
> 
> Mobile +64 22 610 4629
> Phone  +64 4 462 5085 
> Email  cameron.scr...@solnet.co.nz
> Solnet Solutions Limited
> Level 12, Solnet House
> 70 The Terrace, Wellington 6011
> PO Box 397, Wellington 6140
> 
> www.solnet.co.nz
> 
> 
> 
> From:   Somnath Roy 
> To: "cameron.scr...@solnet.co.nz" , 
> "ceph-us...@ceph.com" 
> Date:   08/06/2015 09:34 a.m.
> Subject:RE: [ceph-users] Multiple journals and an OSD on one SSD 
> doable?
> 
> 
> 
> Cameron,
> Generally, it’s not a good idea. 
> You want to protect your SSDs used as journal.If any problem on that
> disk, you will be losing all of your dependent OSDs.
> I don’t think a bigger journal will gain you much performance , so, 
> default 5 GB journal size should be good enough. If you want to reduce
> the fault domain and want to put 3 journals on a SSD , go for minimum
> size and high endurance SSDs for that.
> Now, if you want to use your rest of space of 1 TB ssd, creating just
> OSDs will not gain you much (rather may get some burst performance). You
> may want to consider the following.
>  
> 1. If your spindle OSD size is much bigger than 900 GB , you don’t want
> to make all OSDs of similar sizes, cache pool could be one of your
> option. But, remember, cache pool can wear out your SSDs faster as
> presently I guess it is not optimizing the extra writes. Sorry, I don’t
> have exact data as I am yet to test that out.
>  
> 2. If you want to make all the OSDs of similar sizes and you will be
> able to create a substantial number of OSDs with your unused SSDs
> (depends on how big the cluster is), you may want to put all of your
> primary OSDs to SSD and gain significant performance boost for read.
> Also, in this case, I don’t think you will be getting any burst
> performance. 
> Thanks & Regards
> Somnath
>  
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> cameron.scr...@solnet.co.nz
> Sent: Sunday, June 07, 2015 1:49 PM
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Multiple journals and an OSD on one SSD doable?
>  
> Setting up a Ceph cluster and we want the journals for our spinning
> disks to be on SSDs but all of our SSDs are 1TB. We were planning on
> putting 3 journals on each SSD, but that leaves 900+GB unused on the
> drive, is it possible to use the leftover space as another OSD or will
> it affect performance too much? 
> 
> Thanks, 
> 
> Cameron Scrace
> Infrastructure Engineer
> 
> Mobile +64 22 610 4629
> Phone  +64 4 462 5085 
> Email  cameron.scr...@solnet.co.nz
> Solnet Solutions Limited
> Level 12, Solnet House
> 70 The Terrace, Wellington 6011
> PO Box 397, Wellington 6140
> 
> www.solnet.co.nz
> Attention: This email may contain information intended
> for the sole use of the original recipient. Please respect this when
> sharing or disclosing this email's contents with any third party. If you
> believe you have received this email in error, please delete it and
> notify the sender or postmas...@solnetsolutions.co.nz as soon as
> possible. The content of this email does not necessarily reflect the
> views of Solnet Solutions Ltd. 
> 
> 
> PLEASE NOTE: The information contained in this electronic mail message
> is intended only for the use of the designated recipient(s) named above.
> If the reader of this message is not the intended recipient, you are
> hereby notified that you have received this message in error and that
> any review, dissemination, distribution, or copying of this message is
> strictly prohibited. If you have received this communication in error,
> please notify the sender by telephone or e-mail (as shown above)
> immediately and destroy any and all copies of thi

[ceph-users] ceph-deploy | Hammer | RHEL 7.1

2015-06-07 Thread Jerico Revote
Hello,

When trying to deploy Ceph mons on our RHEL 7 cluster, I get the following
error:

ceph-deploy mon create-initial
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.25): /usr/bin/ceph-deploy mon 
create-initial
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts gorness
[ceph_deploy.mon][DEBUG ] detecting platform for host gorness ...
[gorness][DEBUG ] connected to host: gorness 
[gorness][DEBUG ] detect platform information from remote host
[gorness][DEBUG ] detect machine type
[ceph_deploy.mon][INFO  ] distro info: Red Hat Enterprise Linux Server 7.1 Maipo
[gorness][DEBUG ] determining if provided host has same hostname in remote
[gorness][DEBUG ] get remote short hostname
[gorness][DEBUG ] deploying mon to gorness
[gorness][DEBUG ] get remote short hostname
[gorness][DEBUG ] remote hostname: gorness
[gorness][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[gorness][DEBUG ] create the mon path if it does not exist
[gorness][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-gorness/done
[gorness][DEBUG ] create a done file to avoid re-doing the mon deployment
[gorness][DEBUG ] create the init path if it does not exist
[gorness][DEBUG ] locating the `service` executable...
[gorness][INFO  ] Running command: /usr/sbin/service ceph -c 
/etc/ceph/ceph.conf start mon.gorness
[gorness][WARNIN] The service command supports only basic LSB actions (start, 
stop, restart, try-restart, reload, force-reload, status). For other actions, 
please try to use systemctl.
[gorness][ERROR ] RuntimeError: command returned non-zero exit status: 2
[ceph_deploy.mon][ERROR ] Failed to execute command: /usr/sbin/service ceph -c 
/etc/ceph/ceph.conf start mon.gorness
[ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors

--

ceph-deploy --version
1.5.25

rpm -qa | grep -i ceph
ceph-radosgw-0.94.1-0.el7.x86_64
python-cephfs-0.94.1-0.el7.x86_64
libcephfs1-0.94.1-0.el7.x86_64
ceph-deploy-1.5.25-0.noarch
ceph-common-0.94.1-0.el7.x86_64

cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.1 (Maipo)
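
(One way to narrow this down - assuming the Hammer packages installed the
sysvinit script, which is what ceph-deploy is trying to drive through the
`service` wrapper - is to run the script and check systemd directly:

    ls /etc/init.d/ceph
    /etc/init.d/ceph -c /etc/ceph/ceph.conf start mon.gorness
    systemctl list-unit-files | grep -i ceph

On RHEL 7 the `service` wrapper only passes basic LSB actions through, which
is what the warning above is complaining about.)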

Regards,

Jerico___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Cameron . Scrace
Hi Christian,

Yes we have purchased all our hardware, was very hard to convince 
management/finance to approve it, so some of the stuff we have is a bit 
cheap.

We have four storage nodes, each with 6 x 6TB Western Digital Red SATA
Drives (WD60EFRX-68M), 6 x 1TB Samsung 850 EVO SSDs, and 2 x 250GB
Samsung 850 EVOs (for OS RAID).
CPUs are Intel Atom C2750 @ 2.40GHz (8 cores) with 32 GB of RAM.
We have a 10Gig Network.

The two options we are considering are:

1) Use two of the 1TB SSDs for the spinning disk journals (3 each) and 
then use the remaining 900+GB of each drive as an OSD to be part of the 
cache pool.

2) Put the spinning disk journals on the OS SSDs and use the 2 1TB SSDs 
for the cache pool.

In both cases the other 4 1TB SSDs will be part of their own tier.

Thanks a lot!

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:   Christian Balzer 
To: "ceph-us...@ceph.com" 
Cc: cameron.scr...@solnet.co.nz
Date:   08/06/2015 12:18 p.m.
Subject:Re: [ceph-users] Multiple journals and an OSD on one SSD 
doable?




Hello,


On Mon, 8 Jun 2015 09:55:56 +1200 cameron.scr...@solnet.co.nz wrote:

> The other option we were considering was putting the journals on the OS 
> SSDs, they are only 250GB and the rest would be for the OS. Is that a 
> decent option?
>
You'll be getting a LOT better advice if you're telling us more details.

For starters, have you bought the hardware yet?
Tell us about your design, how many initial storage nodes, how many
HDDs/SSDs per node, what CPUs/RAM/network?

What SSDs are we talking about, exact models please.
(Both the sizes you mentioned do not ring a bell for DC level SSDs I'm
aware of)

That said, I'm using Intel DC S3700s for mixed OS and journal use with 
good
results. 
In your average Ceph storage node, normal OS (logging mostly) activity is 
a
minute drop in the bucket for any decent SSD, so nearly all of it's
resources are available to journals.

You want to match the number of journals per SSD according to the
capabilities of your SSD, HDDs and network.

For example 8 HDD OSDs with 2 200GB DC S3700 and a 10Gb/s network is a
decent match. 
The two SSDs at 900MB/s would appear to be the bottleneck, but in reality
I'd expect the HDDs to be it.
Never mind that you'd be more likely to be IOPS than bandwidth bound.
 
Regards,

Christian

> Thanks!
> 
> Cameron Scrace
> Infrastructure Engineer
> 
> Mobile +64 22 610 4629
> Phone  +64 4 462 5085 
> Email  cameron.scr...@solnet.co.nz
> Solnet Solutions Limited
> Level 12, Solnet House
> 70 The Terrace, Wellington 6011
> PO Box 397, Wellington 6140
> 
> www.solnet.co.nz
> 
> 
> 
> From:   Somnath Roy 
> To: "cameron.scr...@solnet.co.nz" , 
> "ceph-us...@ceph.com" 
> Date:   08/06/2015 09:34 a.m.
> Subject:RE: [ceph-users] Multiple journals and an OSD on one SSD 

> doable?
> 
> 
> 
> Cameron,
> Generally, it’s not a good idea. 
> You want to protect your SSDs used as journal.If any problem on that
> disk, you will be losing all of your dependent OSDs.
> I don’t think a bigger journal will gain you much performance , so, 
> default 5 GB journal size should be good enough. If you want to reduce
> the fault domain and want to put 3 journals on a SSD , go for minimum
> size and high endurance SSDs for that.
> Now, if you want to use your rest of space of 1 TB ssd, creating just
> OSDs will not gain you much (rather may get some burst performance). You
> may want to consider the following.
> 
> 1. If your spindle OSD size is much bigger than 900 GB , you don’t want
> to make all OSDs of similar sizes, cache pool could be one of your
> option. But, remember, cache pool can wear out your SSDs faster as
> presently I guess it is not optimizing the extra writes. Sorry, I don’t
> have exact data as I am yet to test that out.
> 
> 2. If you want to make all the OSDs of similar sizes and you will be
> able to create a substantial number of OSDs with your unused SSDs
> (depends on how big the cluster is), you may want to put all of your
> primary OSDs to SSD and gain significant performance boost for read.
> Also, in this case, I don’t think you will be getting any burst
> performance. 
> Thanks & Regards
> Somnath
> 
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 

> cameron.scr...@solnet.co.nz
> Sent: Sunday, June 07, 2015 1:49 PM
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Multiple journals and an OSD on one SSD doable?
> 
> Setting up a Ceph cluster and we want the journals for our spinning
> disks to be on SSDs but all of our SSDs are 1TB. We were planning on
> putting 3 journals on each SSD, but that leaves 900+GB unused on the
> drive, is it possible to use the leftover space as another OSD or will
> it affect performance too much? 
> 
> Thanks, 
> 
> Cameron Sc

Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Christian Balzer

Hello Cameron,

On Mon, 8 Jun 2015 13:13:33 +1200 cameron.scr...@solnet.co.nz wrote:

> Hi Christian,
> 
> Yes we have purchased all our hardware, was very hard to convince 
> management/finance to approve it, so some of the stuff we have is a bit 
> cheap.
> 
Unfortunate. Both the done deal and the cheapness. 

> We have four storage nodes each with 6 x 6TB Western Digital Red SATA 
> Drives (WD60EFRX-68M) and 6 x 1TB Samsung EVO 850s SSDs and 2x250GB 
> Samsung EVO 850s (for OS raid).
> CPUs are Intel Atom C2750  @ 2.40GHz (8 Cores) with 32 GB of RAM. 
> We have a 10Gig Network.
>
I wish there was a nice way to say this, but it unfortunately boils down to
a "You're fooked".

There have been many discussions about which SSDs are usable with Ceph,
very recently as well.
Samsung EVOs (the non DC type for sure) are basically unusable for
journals. See the recent thread:
 Possible improvements for a slow write speed (excluding independent SSD 
journals)
and:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
for reference.
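
The gist of that test is measuring small O_DSYNC writes directly against the
device, roughly along these lines (device name is a placeholder, and it is
destructive to any data on that device):

    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=journal-test

An SSD that is fit for journal duty keeps its 4k sync-write IOPS up under that
load; consumer drives tend to collapse, which is the point above.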

I presume your intention for the 1TB SSDs is an SSD-backed pool?
Note that the EVOs have a pretty low (guaranteed) endurance, so aside from
needing journal SSDs that can actually do the job, you're looking at
wearing them out rather quickly (depending on your use case, of course).

Now, with SSD-based OSDs or even HDD-based OSDs with SSD journals, that CPU
looks a bit anemic.

More below:
> The two options we are considering are:
> 
> 1) Use two of the 1TB SSDs for the spinning disk journals (3 each) and 
> then use the remaining 900+GB of each drive as an OSD to be part of the 
> cache pool.
> 
> 2) Put the spinning disk journals on the OS SSDs and use the 2 1TB SSDs 
> for the cache pool.
> 
Cache pools aren't all that speedy currently (research the ML archives),
even less so with the SSDs you have.

Christian

> In both cases the other 4 1TB SSDs will be part of their own tier.
> 
> Thanks a lot!
> 
> Cameron Scrace
> Infrastructure Engineer
> 
> Mobile +64 22 610 4629
> Phone  +64 4 462 5085 
> Email  cameron.scr...@solnet.co.nz
> Solnet Solutions Limited
> Level 12, Solnet House
> 70 The Terrace, Wellington 6011
> PO Box 397, Wellington 6140
> 
> www.solnet.co.nz
> 
> 
> 
> From:   Christian Balzer 
> To: "ceph-us...@ceph.com" 
> Cc: cameron.scr...@solnet.co.nz
> Date:   08/06/2015 12:18 p.m.
> Subject:Re: [ceph-users] Multiple journals and an OSD on one SSD 
> doable?
> 
> 
> 
> 
> Hello,
> 
> 
> On Mon, 8 Jun 2015 09:55:56 +1200 cameron.scr...@solnet.co.nz wrote:
> 
> > The other option we were considering was putting the journals on the
> > OS SSDs; they are only 250GB, and the rest would be for the OS. Is that
> > a decent option?
> >
> You'll get a LOT better advice if you tell us more details.
> 
> For starters, have you bought the hardware yet?
> Tell us about your design, how many initial storage nodes, how many
> HDDs/SSDs per node, what CPUs/RAM/network?
> 
> What SSDs are we talking about, exact models please.
> (Both the sizes you mentioned do not ring a bell for DC level SSDs I'm
> aware of)
> 
> That said, I'm using Intel DC S3700s for mixed OS and journal use with 
> good
> results. 
> In your average Ceph storage node, normal OS (logging mostly) activity
> is a
> minute drop in the bucket for any decent SSD, so nearly all of its
> resources are available to journals.
> 
> You want to match the number of journals per SSD according to the
> capabilities of your SSD, HDDs and network.
> 
> For example, 8 HDD OSDs with two 200GB DC S3700s and a 10Gb/s network is a
> decent match. 
> The two SSDs at 900MB/s would appear to be the bottleneck, but in reality
> I'd expect the HDDs to be it.
> Never mind that you'd be more likely to be IOPS than bandwidth bound.
>  
> Regards,
> 
> Christian
> 
> > Thanks!
> > 
> > Cameron Scrace
> > Infrastructure Engineer
> > 
> > Mobile +64 22 610 4629
> > Phone  +64 4 462 5085 
> > Email  cameron.scr...@solnet.co.nz
> > Solnet Solutions Limited
> > Level 12, Solnet House
> > 70 The Terrace, Wellington 6011
> > PO Box 397, Wellington 6140
> > 
> > www.solnet.co.nz
> > 
> > 
> > 
> > From:   Somnath Roy 
> > To: "cameron.scr...@solnet.co.nz" , 
> > "ceph-us...@ceph.com" 
> > Date:   08/06/2015 09:34 a.m.
> > Subject:RE: [ceph-users] Multiple journals and an OSD on one
> > SSD 
> 
> > doable?
> > 
> > 
> > 
> > Cameron,
> > Generally, it’s not a good idea. 
> > You want to protect the SSDs you use as journals. If there is any problem
> > with that disk, you will lose all of the OSDs that depend on it.
> > I don’t think a bigger journal will gain you much performance, so the 
> > default 5 GB journal size should be good enough. If you want to reduce
> > the fault domain and put 3 journals on an SSD, go for minimum-size,
> > high-endurance SSDs for that.
> > Now, if you want to use the rest of the space on your 1 TB SSD, creating just
> > OSDs will not gain you much (r
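
(For what it's worth, the 5 GB default journal Somnath mentions corresponds
to the "osd journal size" setting, which is expressed in MB; a minimal
ceph.conf sketch:

  [osd]
  # journal size in MB; 5120 MB = the 5 GB default
  osd journal size = 5120

ceph-disk also uses this value when it creates journal partitions.)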

Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Christian Balzer

Cameron,

To offer at least some constructive advice here instead of just all doom
and gloom, here's what I'd do:

Replace the OS SSDs with 2 400GB Intel DC S3700s (or S3710s).
They have enough BW to nearly saturate your network.

Put all your journals on them (3 SSD OSDs and 3 HDD OSDs per journal SSD). 
While that's a bad move from a failure domain perspective, your budget
probably won't allow for anything better, and those are VERY reliable and,
just as important, durable SSDs. 

This will give you the speed your current setup is capable of, probably
limited by the CPU when it comes to SSD pool operations.

Christian
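
(In case it helps, that layout can be set up with the stock ceph-disk tool;
device names below are only placeholders, with /dev/sda standing in for one
of the DC S3700s and /dev/sdc for one of the data disks:

  # ceph-disk carves a journal partition out of /dev/sda and
  # points the new OSD on /dev/sdc at it
  ceph-disk prepare /dev/sdc /dev/sda
  ceph-disk activate /dev/sdc1

Repeat per OSD, so each S3700 ends up holding six small journal partitions.)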

On Mon, 8 Jun 2015 10:44:06 +0900 Christian Balzer wrote:

> 
> Hello Cameron,
> 
> On Mon, 8 Jun 2015 13:13:33 +1200 cameron.scr...@solnet.co.nz wrote:
> 
> > Hi Christian,
> > 
> > Yes we have purchased all our hardware, was very hard to convince 
> > management/finance to approve it, so some of the stuff we have is a
> > bit cheap.
> > 
> Unfortunate. Both the done deal and the cheapness. 
> 
> > We have four storage nodes each with 6 x 6TB Western Digital Red SATA 
> > Drives (WD60EFRX-68M) and 6 x 1TB Samsung EVO 850s SSDs and 2x250GB 
> > Samsung EVO 850s (for OS raid).
> > CPUs are Intel Atom C2750  @ 2.40GHz (8 Cores) with 32 GB of RAM. 
> > We have a 10Gig Network.
> >
> I wish there was a nice way to say this, but it unfortunately boils down
> to a "You're fooked".
> 
> There have been many discussions about which SSDs are usable with Ceph,
> very recently as well.
> Samsung EVOs (the non DC type for sure) are basically unusable for
> journals. See the recent thread:
>  Possible improvements for a slow write speed (excluding independent SSD
> journals) and:
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> for reference.
> 
> I presume your intention for the 1TB SSDs is for a SSD backed pool? 
> Note that the EVOs have a pretty low (guaranteed) endurance, so aside
> from needing journal SSDs that actually can do the job, you're looking at
> wearing them out rather quickly (depending on your use case of course).
> 
> Now with SSD based OSDs or even HDD based OSDs with SSD journals that CPU
> looks a bit anemic.
> 
> More below:
> > The two options we are considering are:
> > 
> > 1) Use two of the 1TB SSDs for the spinning disk journals (3 each) and 
> > then use the remaining 900+GB of each drive as an OSD to be part of
> > the cache pool.
> > 
> > 2) Put the spinning disk journals on the OS SSDs and use the 2 1TB
> > SSDs for the cache pool.
> > 
> Cache pools aren't all that speedy currently (research the ML archives),
> even less so with the SSDs you have.
> 
> Christian
> 
> > In both cases the other 4 1TB SSDs will be part of their own tier.
> > 
> > Thanks a lot!
> > 
> > Cameron Scrace
> > Infrastructure Engineer
> > 
> > Mobile +64 22 610 4629
> > Phone  +64 4 462 5085 
> > Email  cameron.scr...@solnet.co.nz
> > Solnet Solutions Limited
> > Level 12, Solnet House
> > 70 The Terrace, Wellington 6011
> > PO Box 397, Wellington 6140
> > 
> > www.solnet.co.nz
> > 
> > 
> > 
> > From:   Christian Balzer 
> > To: "ceph-us...@ceph.com" 
> > Cc: cameron.scr...@solnet.co.nz
> > Date:   08/06/2015 12:18 p.m.
> > Subject:Re: [ceph-users] Multiple journals and an OSD on one
> > SSD doable?
> > 
> > 
> > 
> > 
> > Hello,
> > 
> > 
> > On Mon, 8 Jun 2015 09:55:56 +1200 cameron.scr...@solnet.co.nz wrote:
> > 
> > > The other option we were considering was putting the journals on the
> > > OS SSDs, they are only 250GB and the rest would be for the OS. Is
> > > that a decent option?
> > >
> > You'll be getting a LOT better advice if you're telling us more
> > details.
> > 
> > For starters, have you bought the hardware yet?
> > Tell us about your design, how many initial storage nodes, how many
> > HDDs/SSDs per node, what CPUs/RAM/network?
> > 
> > What SSDs are we talking about, exact models please.
> > (Both the sizes you mentioned do not ring a bell for DC level SSDs I'm
> > aware of)
> > 
> > That said, I'm using Intel DC S3700s for mixed OS and journal use with 
> > good
> > results. 
> > In your average Ceph storage node, normal OS (logging mostly) activity
> > is a
> > minute drop in the bucket for any decent SSD, so nearly all of it's
> > resources are available to journals.
> > 
> > You want to match the number of journals per SSD according to the
> > capabilities of your SSD, HDDs and network.
> > 
> > For example 8 HDD OSDs with 2 200GB DC S3700 and a 10Gb/s network is a
> > decent match. 
> > The two SSDs at 900MB/s would appear to be the bottleneck, but in
> > reality I'd expect the HDDs to be it.
> > Never mind that you'd be more likely to be IOPS than bandwidth bound.
> >  
> > Regards,
> > 
> > Christian
> > 
> > > Thanks!
> > > 
> > > Cameron Scrace
> > > Infrastructure Engineer
> > > 
> > > Mobile +64 22 610 4629
> > > Phone  +64 4 462 5085 
> > > Email  cameron.scr...@solnet.co.nz
> > > Solnet Solutions Limite

Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Cameron . Scrace
Thanks for all the feedback. 

What makes the EVOs unusable? They should have plenty of speed, but your 
link has them at 1.9MB/s. Is it just the way they handle O_DIRECT and 
D_SYNC? 

Not sure if we will be able to spend any more; we may just have to take the 
performance hit until we can get more money for the project.

Thanks,

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:   Christian Balzer 
To: "ceph-us...@ceph.com" 
Cc: cameron.scr...@solnet.co.nz
Date:   08/06/2015 02:00 p.m.
Subject:Re: [ceph-users] Multiple journals and an OSD on one SSD 
doable?




Cameron,

To offer at least some constructive advice here instead of just all doom
and gloom, here's what I'd do:

Replace the OS SSDs with 2 400GB Intel DC S3700s (or S3710s).
They have enough BW to nearly saturate your network.

Put all your journals on them (3 SSD OSD and 3 HDD OSD per). 
While that's a bad move from a failure domain perspective, your budget
probably won't allow for anything better and those are VERY reliable and
just as important durable SSDs. 

This will give you the speed your current setup is capable of, probably
limited by the CPU when it comes to SSD pool operations.

Christian

On Mon, 8 Jun 2015 10:44:06 +0900 Christian Balzer wrote:

> 
> Hello Cameron,
> 
> On Mon, 8 Jun 2015 13:13:33 +1200 cameron.scr...@solnet.co.nz wrote:
> 
> > Hi Christian,
> > 
> > Yes we have purchased all our hardware, was very hard to convince 
> > management/finance to approve it, so some of the stuff we have is a
> > bit cheap.
> > 
> Unfortunate. Both the done deal and the cheapness. 
> 
> > We have four storage nodes each with 6 x 6TB Western Digital Red SATA 
> > Drives (WD60EFRX-68M) and 6 x 1TB Samsung EVO 850s SSDs and 2x250GB 
> > Samsung EVO 850s (for OS raid).
> > CPUs are Intel Atom C2750  @ 2.40GHz (8 Cores) with 32 GB of RAM. 
> > We have a 10Gig Network.
> >
> I wish there was a nice way to say this, but it unfortunately boils down
> to a "You're fooked".
> 
> There have been many discussions about which SSDs are usable with Ceph,
> very recently as well.
> Samsung EVOs (the non DC type for sure) are basically unusable for
> journals. See the recent thread:
>  Possible improvements for a slow write speed (excluding independent SSD
> journals) and:
> 
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

> for reference.
> 
> I presume your intention for the 1TB SSDs is for a SSD backed pool? 
> Note that the EVOs have a pretty low (guaranteed) endurance, so aside
> from needing journal SSDs that actually can do the job, you're looking 
at
> wearing them out rather quickly (depending on your use case of course).
> 
> Now with SSD based OSDs or even HDD based OSDs with SSD journals that 
CPU
> looks a bit anemic.
> 
> More below:
> > The two options we are considering are:
> > 
> > 1) Use two of the 1TB SSDs for the spinning disk journals (3 each) and 

> > then use the remaining 900+GB of each drive as an OSD to be part of
> > the cache pool.
> > 
> > 2) Put the spinning disk journals on the OS SSDs and use the 2 1TB
> > SSDs for the cache pool.
> > 
> Cache pools aren't all that speedy currently (research the ML archives),
> even less so with the SSDs you have.
> 
> Christian
> 
> > In both cases the other 4 1TB SSDs will be part of their own tier.
> > 
> > Thanks a lot!
> > 
> > Cameron Scrace
> > Infrastructure Engineer
> > 
> > Mobile +64 22 610 4629
> > Phone  +64 4 462 5085 
> > Email  cameron.scr...@solnet.co.nz
> > Solnet Solutions Limited
> > Level 12, Solnet House
> > 70 The Terrace, Wellington 6011
> > PO Box 397, Wellington 6140
> > 
> > www.solnet.co.nz
> > 
> > 
> > 
> > From:   Christian Balzer 
> > To: "ceph-us...@ceph.com" 
> > Cc: cameron.scr...@solnet.co.nz
> > Date:   08/06/2015 12:18 p.m.
> > Subject:Re: [ceph-users] Multiple journals and an OSD on one
> > SSD doable?
> > 
> > 
> > 
> > 
> > Hello,
> > 
> > 
> > On Mon, 8 Jun 2015 09:55:56 +1200 cameron.scr...@solnet.co.nz wrote:
> > 
> > > The other option we were considering was putting the journals on the
> > > OS SSDs, they are only 250GB and the rest would be for the OS. Is
> > > that a decent option?
> > >
> > You'll be getting a LOT better advice if you're telling us more
> > details.
> > 
> > For starters, have you bought the hardware yet?
> > Tell us about your design, how many initial storage nodes, how many
> > HDDs/SSDs per node, what CPUs/RAM/network?
> > 
> > What SSDs are we talking about, exact models please.
> > (Both the sizes you mentioned do not ring a bell for DC level SSDs I'm
> > aware of)
> > 
> > That said, I'm using Intel DC S3700s for mixed OS and journal use with 

> > good
> > results. 
> > In your average Ceph storage node, normal OS (logging m

Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Christian Balzer
On Mon, 8 Jun 2015 14:30:17 +1200 cameron.scr...@solnet.co.nz wrote:

> Thanks for all the feedback. 
> 
> What makes the EVOs unusable? They should have plenty of speed but your 
> link has them at 1.9MB/s, is it just the way they handle O_DIRECT and 
> D_SYNC? 
> 
Precisely. 
Read that ML thread for details.

And once more: they are also not very endurable.
So depending on your usage pattern and the write amplification from Ceph
(Ceph itself plus the underlying FS), their TBW/$ will be horrible, costing
you more in the end than more expensive but an order of magnitude more
endurable DC SSDs. 
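
(One way to keep an eye on that is the drives' SMART counters, e.g. with
smartctl; attribute names vary by vendor, the ones below are the usual
Samsung ones:

  smartctl -A /dev/sdX | egrep 'Wear_Leveling_Count|Total_LBAs_Written'

Watching Total_LBAs_Written grow over a week of real load gives you a rough
idea of how fast you are eating through the rated TBW.)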

> Not sure if we will be able to spend anymore, we may just have to take
> the performance hit until we can get more money for the project.
>
You could cheap out with 200GB DC S3700s (half the price), but they will
definitely become the bottleneck at a combined max speed of about 700MB/s,
as opposed to the 400GB ones at 900MB/s combined.
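
(Those combined figures are just the spec-sheet sequential write numbers
doubled: roughly 365MB/s per 200GB S3700, so ~730MB/s for a pair, versus
about 460MB/s per 400GB drive, so ~920MB/s for two.)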
 
Christian

> Thanks,
> 
> Cameron Scrace
> Infrastructure Engineer
> 
> Mobile +64 22 610 4629
> Phone  +64 4 462 5085 
> Email  cameron.scr...@solnet.co.nz
> Solnet Solutions Limited
> Level 12, Solnet House
> 70 The Terrace, Wellington 6011
> PO Box 397, Wellington 6140
> 
> www.solnet.co.nz
> 
> 
> 
> From:   Christian Balzer 
> To: "ceph-us...@ceph.com" 
> Cc: cameron.scr...@solnet.co.nz
> Date:   08/06/2015 02:00 p.m.
> Subject:Re: [ceph-users] Multiple journals and an OSD on one SSD 
> doable?
> 
> 
> 
> 
> Cameron,
> 
> To offer at least some constructive advice here instead of just all doom
> and gloom, here's what I'd do:
> 
> Replace the OS SSDs with 2 400GB Intel DC S3700s (or S3710s).
> They have enough BW to nearly saturate your network.
> 
> Put all your journals on them (3 SSD OSD and 3 HDD OSD per). 
> While that's a bad move from a failure domain perspective, your budget
> probably won't allow for anything better and those are VERY reliable and
> just as important durable SSDs. 
> 
> This will give you the speed your current setup is capable of, probably
> limited by the CPU when it comes to SSD pool operations.
> 
> Christian
> 
> On Mon, 8 Jun 2015 10:44:06 +0900 Christian Balzer wrote:
> 
> > 
> > Hello Cameron,
> > 
> > On Mon, 8 Jun 2015 13:13:33 +1200 cameron.scr...@solnet.co.nz wrote:
> > 
> > > Hi Christian,
> > > 
> > > Yes we have purchased all our hardware, was very hard to convince 
> > > management/finance to approve it, so some of the stuff we have is a
> > > bit cheap.
> > > 
> > Unfortunate. Both the done deal and the cheapness. 
> > 
> > > We have four storage nodes each with 6 x 6TB Western Digital Red
> > > SATA Drives (WD60EFRX-68M) and 6 x 1TB Samsung EVO 850s SSDs and
> > > 2x250GB Samsung EVO 850s (for OS raid).
> > > CPUs are Intel Atom C2750  @ 2.40GHz (8 Cores) with 32 GB of RAM. 
> > > We have a 10Gig Network.
> > >
> > I wish there was a nice way to say this, but it unfortunately boils
> > down to a "You're fooked".
> > 
> > There have been many discussions about which SSDs are usable with Ceph,
> > very recently as well.
> > Samsung EVOs (the non DC type for sure) are basically unusable for
> > journals. See the recent thread:
> >  Possible improvements for a slow write speed (excluding independent
> > SSD journals) and:
> > 
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> 
> > for reference.
> > 
> > I presume your intention for the 1TB SSDs is for a SSD backed pool? 
> > Note that the EVOs have a pretty low (guaranteed) endurance, so aside
> > from needing journal SSDs that actually can do the job, you're looking 
> at
> > wearing them out rather quickly (depending on your use case of course).
> > 
> > Now with SSD based OSDs or even HDD based OSDs with SSD journals that 
> CPU
> > looks a bit anemic.
> > 
> > More below:
> > > The two options we are considering are:
> > > 
> > > 1) Use two of the 1TB SSDs for the spinning disk journals (3 each)
> > > and 
> 
> > > then use the remaining 900+GB of each drive as an OSD to be part of
> > > the cache pool.
> > > 
> > > 2) Put the spinning disk journals on the OS SSDs and use the 2 1TB
> > > SSDs for the cache pool.
> > > 
> > Cache pools aren't all that speedy currently (research the ML
> > archives), even less so with the SSDs you have.
> > 
> > Christian
> > 
> > > In both cases the other 4 1TB SSDs will be part of their own tier.
> > > 
> > > Thanks a lot!
> > > 
> > > Cameron Scrace
> > > Infrastructure Engineer
> > > 
> > > Mobile +64 22 610 4629
> > > Phone  +64 4 462 5085 
> > > Email  cameron.scr...@solnet.co.nz
> > > Solnet Solutions Limited
> > > Level 12, Solnet House
> > > 70 The Terrace, Wellington 6011
> > > PO Box 397, Wellington 6140
> > > 
> > > www.solnet.co.nz
> > > 
> > > 
> > > 
> > > From:   Christian Balzer 
> > > To: "ceph-us...@ceph.com" 
> > > Cc: cameron.scr...@solnet.co.nz
> > > Date:   08/06/2015 12:18 p.m.
> > > Subject:Re: [ceph-users] Multiple journals and an OSD on one
>

[ceph-users] radosgw sync agent against aws s3

2015-06-07 Thread Blair Bethwaite
Has anyone had any luck using the radosgw-sync-agent to push or pull
to/from "real" S3?

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-07 Thread Cameron . Scrace
Just used the method in the link you sent me to test one of the EVO 850s. 
With one job it reached a speed of around 2.5MB/s, but it didn't max out 
until around 32 jobs, at roughly 24MB/s: 

sudo fio --filename=/dev/sdh --direct=1 --sync=1 --rw=write --bs=4k 
--numjobs=32 --iodepth=1 --runtime=60 --time_based --group_reporting 
--name=journal-test
write: io=1507.4MB, bw=25723KB/s, iops=6430, runt= 60007msec

Also tested a Micron 550 we had sitting around, and it maxed out at 
2.5MB/s. Both results conflict with the chart.
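
(Presumably the ~2.5MB/s single-job figure above came from the same command
with a single worker, which is the run that lines up with the 1.9MB/s in the
chart, i.e. something like:

  sudo fio --filename=/dev/sdh --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting \
    --name=journal-test

)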

Regards,

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:   Christian Balzer 
To: "ceph-us...@ceph.com" 
Cc: cameron.scr...@solnet.co.nz
Date:   08/06/2015 02:40 p.m.
Subject:Re: [ceph-users] Multiple journals and an OSD on one SSD 
doable?



On Mon, 8 Jun 2015 14:30:17 +1200 cameron.scr...@solnet.co.nz wrote:

> Thanks for all the feedback. 
> 
> What makes the EVOs unusable? They should have plenty of speed but your 
> link has them at 1.9MB/s, is it just the way they handle O_DIRECT and 
> D_SYNC? 
> 
Precisely. 
Read that ML thread for details.

And once more, they also are not very endurable.
So depending on your usage pattern and Ceph (Ceph itself and the
underlying FS) write amplification their TBW/$ will be horrible, costing
you more in the end than more expensive, but an order of magnitude more
endurable DC SSDs. 

> Not sure if we will be able to spend anymore, we may just have to take
> the performance hit until we can get more money for the project.
>
You could cheap out with 200GB DC S3700s (half the price), but they will
definitely become the bottleneck at a combined max speed of about 700MB/s,
as opposed to the 400GB ones at 900MB/s combined.
 
Christian

> Thanks,
> 
> Cameron Scrace
> Infrastructure Engineer
> 
> Mobile +64 22 610 4629
> Phone  +64 4 462 5085 
> Email  cameron.scr...@solnet.co.nz
> Solnet Solutions Limited
> Level 12, Solnet House
> 70 The Terrace, Wellington 6011
> PO Box 397, Wellington 6140
> 
> www.solnet.co.nz
> 
> 
> 
> From:   Christian Balzer 
> To: "ceph-us...@ceph.com" 
> Cc: cameron.scr...@solnet.co.nz
> Date:   08/06/2015 02:00 p.m.
> Subject:Re: [ceph-users] Multiple journals and an OSD on one SSD 

> doable?
> 
> 
> 
> 
> Cameron,
> 
> To offer at least some constructive advice here instead of just all doom
> and gloom, here's what I'd do:
> 
> Replace the OS SSDs with 2 400GB Intel DC S3700s (or S3710s).
> They have enough BW to nearly saturate your network.
> 
> Put all your journals on them (3 SSD OSD and 3 HDD OSD per). 
> While that's a bad move from a failure domain perspective, your budget
> probably won't allow for anything better and those are VERY reliable and
> just as important durable SSDs. 
> 
> This will give you the speed your current setup is capable of, probably
> limited by the CPU when it comes to SSD pool operations.
> 
> Christian
> 
> On Mon, 8 Jun 2015 10:44:06 +0900 Christian Balzer wrote:
> 
> > 
> > Hello Cameron,
> > 
> > On Mon, 8 Jun 2015 13:13:33 +1200 cameron.scr...@solnet.co.nz wrote:
> > 
> > > Hi Christian,
> > > 
> > > Yes we have purchased all our hardware, was very hard to convince 
> > > management/finance to approve it, so some of the stuff we have is a
> > > bit cheap.
> > > 
> > Unfortunate. Both the done deal and the cheapness. 
> > 
> > > We have four storage nodes each with 6 x 6TB Western Digital Red
> > > SATA Drives (WD60EFRX-68M) and 6 x 1TB Samsung EVO 850s SSDs and
> > > 2x250GB Samsung EVO 850s (for OS raid).
> > > CPUs are Intel Atom C2750  @ 2.40GHz (8 Cores) with 32 GB of RAM. 
> > > We have a 10Gig Network.
> > >
> > I wish there was a nice way to say this, but it unfortunately boils
> > down to a "You're fooked".
> > 
> > There have been many discussions about which SSDs are usable with 
Ceph,
> > very recently as well.
> > Samsung EVOs (the non DC type for sure) are basically unusable for
> > journals. See the recent thread:
> >  Possible improvements for a slow write speed (excluding independent
> > SSD journals) and:
> > 
> 
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

> 
> > for reference.
> > 
> > I presume your intention for the 1TB SSDs is for a SSD backed pool? 
> > Note that the EVOs have a pretty low (guaranteed) endurance, so aside
> > from needing journal SSDs that actually can do the job, you're looking 

> at
> > wearing them out rather quickly (depending on your use case of 
course).
> > 
> > Now with SSD based OSDs or even HDD based OSDs with SSD journals that 
> CPU
> > looks a bit anemic.
> > 
> > More below:
> > > The two options we are considering are:
> > > 
> > > 1) Use two of the 1TB SSDs for the spinning disk journals (3 each)
> > > and 
> 
> > > then use the remaining 900+G