Re: [ceph-users] v0.67.9 Dumpling released

2014-06-04 Thread Dan Van Der Ster
Hi Sage, all,

On 21 May 2014, at 22:02, Sage Weil  wrote:

> * osd: allow snap trim throttling with simple delay (#6278, Sage Weil)

Do you have some advice about how to use the snap trim throttle? I saw 
osd_snap_trim_sleep, which is still 0 by default. But I didn't manage to follow 
the original ticket, since it started out as a question about deep scrub 
contending with client IOs, but then at some point you renamed the ticket to 
throttling snap trim. What exactly does snap trim do in the context of an RBD 
client? And can you suggest a good starting point for osd_snap_trim_sleep = … ?

Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --





[ceph-users] Storage

2014-06-04 Thread yalla.gnan.kumar
Hi All,

I have a Ceph storage cluster with four nodes. I have created block storage 
using Cinder in OpenStack, with Ceph as its storage backend.
So I see a volume has been created in one of the Ceph pools. But how do I get 
information such as which OSDs and PGs the volume was created on?


Thanks
Kumar





Re: [ceph-users] Storage

2014-06-04 Thread Smart Weblications GmbH - Florian Wiessner
Hi,

On 04.06.2014 14:51, yalla.gnan.ku...@accenture.com wrote:
> Hi All,
> 
>  
> 
> I have a Ceph storage cluster with four nodes. I have created block storage
> using Cinder in OpenStack, with Ceph as its storage backend.
> 
> So I see a volume has been created in one of the Ceph pools. But how do I get
> information such as which OSDs and PGs the volume was created on?
> 

Check rbd ls and rbd info <pool>/<image> to get the block_name_prefix.

rados ls -p <pool> to see the objects used.

Normally, Ceph stripes RBD images across different objects on different OSDs, so
the volume is not created in only one OSD or one PG.
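
For concreteness, a sketch of tracing one object all the way down (the pool,
volume and prefix names here are made-up examples):

  rbd info volumes/volume-0815 | grep block_name_prefix
      block_name_prefix: rb.0.1fa2.74b0dc51
  rados ls -p volumes | grep rb.0.1fa2.74b0dc51
  ceph osd map volumes rb.0.1fa2.74b0dc51.000000000000
      ... pg 4.c5034eb8 (4.b8) -> up [3,12] acting [3,12]

ceph osd map answers the "which PG, which OSDs" question per object; every
object of the volume can land in a different PG.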

-- 

Kind regards,

Florian Wiessner

Smart Weblications GmbH
Martinsberger Str. 1
D-95119 Naila

fon.: +49 9282 9638 200
fax.: +49 9282 9638 205
24/7: +49 900 144 000 00 - 0,99 EUR/Min*
http://www.smart-weblications.de

--
Registered office: Naila
Managing director: Florian Wiessner
Commercial register no.: HRB 3840, Amtsgericht Hof
*from a German landline; prices from mobile networks may differ


Re: [ceph-users] OSD server alternatives to choose

2014-06-04 Thread Cedric Lemarchand

On 04/06/2014 03:23, Christian Balzer wrote:
> On Tue, 03 Jun 2014 18:52:00 +0200 Cedric Lemarchand wrote:
>> On 03/06/2014 12:14, Christian Balzer wrote:
>>> A simple way to make 1) and 2) cheaper is to use AMD CPUs, they will do
>>> just fine at half the price with these loads. 
>>> If you're that tight on budget, 64GB RAM will do fine, too.
>> I am interested in this specific thought; could you elaborate on how you
>> determined that such hardware (CPU and RAM) will handle well the cases
>> where the cluster goes into rebalancing mode when a node or some OSDs go
>> down?
> Well, firstly we both read:
> https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
I was not aware of this doc; it clears up a lot of the questions I was
having about CPU/RAM considerations.

Thanks for your very exhaustive explanations ;-)

Cheers
>
> And looking at those values a single Opteron 4386 would be more
> than sufficient for both 1) and 2). 
> I'm saying and suggesting a single CPU here to keep things all in one NUMA
> node. 
> AFAIK (I haven't used anything Intel for years) some Intel boards require
> both CPUs in place to use all available interfaces (PCIe buses), so the
> above advice is only for AMD.
> As for RAM, it would be totally overspec'ed with 64GB, but a huge
> pagecache is an immense help for reads and RAM is fairly cheap these days,
> so the more you can afford, the better. 
>
> Secondly experience.
> The above document is pretty much spot on when it comes to CPU suggestions in
> combination with OSDs backed by a single HDD (SSD journal or not).
> I think it is overly optimistic when it comes to purely SSD based storage
> nodes or something like my HW RAID backed OSD.
> Remember, when using the 4k fio I could get Ceph to use about 2 cores
> per OSD and then stall on whatever locking contention or other things that
> are going on inside it before actually exhausting all available CPU
> resources. 
> OSDs (journal and backing storage) as well as the network were nowhere
> near getting exhausted.
>
> Compared to that fio run a cluster rebalancing is a breeze, at least when
> it comes to CPU resources needed. 
> It comes in a much more CEPH friendly IO block size and thus exhausts
> either network or disk bandwidth first.
>
>> Because, as Robert stated (and I totally agree with that!), designing a
>> cluster is about the expected performance in optimal conditions, and the
>> expected recovery time and node loads in non-optimal conditions
>> (typically rebalancing), and I found this last point hard to consider
>> and anticipate.
>>
> This is why one builds test clusters and then builds production HW
> clusters with the expectation that it will be twice as bad as anticipated
> from what you saw on the test cluster. ^o^
>
>> As a quick exercise (without taking into consideration FS size overhead
>> etc. ...), based on config "1.NG" from Christian (SSD/HDD ratio of 1:3,
>> thus 9x4TB HDDs per node, 24 nodes) and a replication ratio of 2:
> I would never use a replication of 2 unless I were VERY confident in my
> backing storage devices (either high end and well monitored SSDs or RAIDs).
>
>> - each node: ~36TB RAW / ~18TB NET
>> - the whole cluster: 864TB RAW / ~432TB NET
>>
>> If a node goes down, ~36TB has to be rebalanced between the 23 remaining
>> nodes, so ~1.6TB has to be read and written on each node. I think
>> this is the expected workload of the cluster in rebalancing mode.
>>
>> So, two questions:
>>
>> * is my math good so far?
> Math is hard, let's go shopping. ^o^
> But yes, given your parameters that looks correct.
>> * where will the main bottleneck be with such a configuration and workload
>> (CPU/IO/RAM/NET), and how do I calculate it?
>>
> See above. 
> In the configurations suggested by Benjamin disk IO will be the
> bottleneck, as the network bandwidth is higher than write capacity of the
> SSDs and HDDs. CPU and RAM will not be an issue.
>
> The other thing to consider are the backfilling and/or recovery settings
> in CEPH, these will of course influence how much of an impact a node
> failure (and potential recovery of it) will have.
> Depending on those settings and the cluster load (as in client side) at
> the time of failure the most optimistic number for full recovery of
> redundancy I can come up with is about an hour, in reality it is probably
> going to be substantially longer. 
> And during that time any further disk failure (with over 200 in the
> cluster a pretty decent probability) can result in irrecoverable data loss.
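
A quick back-of-envelope sketch of that best case (the ~500 MB/s of usable
recovery bandwidth per node is an assumption):

  echo 'scale=2; 36/23' | bc              # ~1.56 TB to re-replicate per survivor
  echo 'scale=2; 1560000/500/3600' | bc   # ~0.86 h at 500 MB/s per node

Anything that lowers the per-node recovery rate (throttled backfill, client
load) stretches this accordingly.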
>
> Christian
>> --
>> Cédric
>

-- 
Cédric



Re: [ceph-users] Storage

2014-06-04 Thread Mārtiņš Jakubovičs
Hello,

How can I check a Ceph client session on the client side? For example, when
mounting iSCSI or NFS you can check it (NFS: just look at the mount; iSCSI:
iscsiadm -m session), but how can I do that with Ceph? And is there more
detailed documentation about OpenStack and Ceph than
http://ceph.com/docs/master/rbd/rbd-openstack/?
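
For kernel clients there is at least a rough analogue (a sketch; note that VMs
attached via librbd/libvirt have no iscsiadm-style session listing):

  rbd showmapped          # kernel-mapped RBD images on this client
  grep ceph /proc/mounts  # CephFS kernel mounts, if any

For libvirt guests the attachment is only visible in the domain definition,
e.g. virsh dumpxml <domain>, not as a session on the client host.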

On 2014.06.04. 16:29, Smart Weblications GmbH - Florian Wiessner wrote:
> Hi,
> 
> On 04.06.2014 14:51, yalla.gnan.ku...@accenture.com wrote:
>> Hi All,
>>
>>  
>>
>> I have a Ceph storage cluster with four nodes. I have created block storage
>> using Cinder in OpenStack, with Ceph as its storage backend.
>>
>> So I see a volume has been created in one of the Ceph pools. But how do I get
>> information such as which OSDs and PGs the volume was created on?
>>
> 
> Check rbd ls and rbd info <pool>/<image> to get the block_name_prefix.
> 
> rados ls -p <pool> to see the objects used.
> 
> Normally, Ceph stripes RBD images across different objects on different OSDs,
> so the volume is not created in only one OSD or one PG.
> 



Re: [ceph-users] v0.67.9 Dumpling released

2014-06-04 Thread Sage Weil
On Wed, 4 Jun 2014, Dan Van Der Ster wrote:
> Hi Sage, all,
> 
> On 21 May 2014, at 22:02, Sage Weil  wrote:
> 
> > * osd: allow snap trim throttling with simple delay (#6278, Sage Weil)
> 
> Do you have some advice about how to use the snap trim throttle? I saw 
> osd_snap_trim_sleep, which is still 0 by default. But I didn't manage to 
> follow the original ticket, since it started out as a question about 
> deep scrub contending with client IOs, but then at some point you 
> renamed the ticket to throttling snap trim. What exactly does snap trim 
> do in the context of an RBD client? And can you suggest a good starting 
> point for osd_snap_trim_sleep = … ?

This is a coarse hack to make the snap trimming slow down and let client 
IO run by simply sleeping between work.  I would start with something 
smallish (.01 = 10ms) after deleting some snapshots and see what effect it 
has on request latency.  Unfortunately it's not a very intuitive knob to 
adjust, but it is an interim solution until we figure out how to better 
prioritize this (and other) background work.

In short, if you do see a performance degradation after removing snaps, 
adjust this up or down and see how it changes that.  If you don't see a 
degradation, then you're lucky and don't need to do anything.  :)

You can adjust this on running OSDs with something like 'ceph daemon 
osd.NN config set osd_snap_trim_sleep .01' or with 'ceph tell osd.* 
injectargs -- --osd-snap-trim-sleep .01'.
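
For example, a sketch of both paths (the OSD id is illustrative):

  ceph daemon osd.12 config set osd_snap_trim_sleep .01    # one OSD, on its host
  ceph tell osd.* injectargs -- --osd-snap-trim-sleep .01  # whole cluster

and, to make it persistent, in the [osd] section of ceph.conf:

  osd snap trim sleep = .01

(Note that later in this thread a value written as 0.01 in ceph.conf is
reported to parse as 0 on this version, while .01 works.)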

sage



Re: [ceph-users] Experiences with Ceph at the June'14 issue of USENIX ; login:

2014-06-04 Thread Filippos Giannakos
Hello Ian,

Thanks for your interest.

On Mon, Jun 02, 2014 at 06:37:48PM -0400, Ian Colle wrote:
> Thanks, Filippos! Very interesting reading.
> 
> Are you comfortable enough yet to remove the RAID-1 from your architecture and
> get all that space back?

Actually, we are not ready to do that yet. There are three major things to
consider.

First, to be able to get rid of the RAID-1 setup, we need to increase the
replication level to at least 3x. So the space gain is not that great to begin
with.

Second, this operation can take about a month for our scale according to our
calculations and previous experience. During this period of increased I/O we
might get peaks of performance degradation. Plus, we currently do not have the
necessary hardware available to increase the replication level before we get rid
of the RAID setup.

Third, we have a few disk failures per month. The RAID-1 setup has allowed us to
seamlessly replace them without any hiccup or even a clue to the end user that
something went wrong. Surely we can rely on RADOS to avoid any data loss, but if
we currently rely on RADOS for recovery there might be some (minor) performance
degradation, especially for the VM I/O traffic.

Kind Regards,
-- 
Filippos



[ceph-users] RGW: Multi Part upload and resulting objects

2014-06-04 Thread Sylvain Munaut
Hi,


During a multipart upload you can't upload parts smaller than 5M, and
radosgw also slices objects into pieces of 4M. Having those two be
different is a bit unfortunate, because if you slice your files at the
minimum part size you end up with a main file of 4M and a shadow file
of 1M for each part ...


Would it make sense either to allow multipart parts of 4M, or to raise
the slice size to something more than 4M (5M, or 8M if you want a power
of 2)?


Cheers,

   Sylvain


Re: [ceph-users] v0.67.9 Dumpling released

2014-06-04 Thread Dan Van Der Ster
On 04 Jun 2014, at 16:06, Sage Weil  wrote:

> On Wed, 4 Jun 2014, Dan Van Der Ster wrote:
>> Hi Sage, all,
>> 
>> On 21 May 2014, at 22:02, Sage Weil  wrote:
>> 
>>> * osd: allow snap trim throttling with simple delay (#6278, Sage Weil)
>> 
>> Do you have some advice about how to use the snap trim throttle? I saw 
>> osd_snap_trim_sleep, which is still 0 by default. But I didn't manage to 
>> follow the original ticket, since it started out as a question about 
>> deep scrub contending with client IOs, but then at some point you 
>> renamed the ticket to throttling snap trim. What exactly does snap trim 
>> do in the context of an RBD client? And can you suggest a good starting 
>> point for osd_snap_trim_sleep = … ?
> 
> This is a coarse hack to make the snap trimming slow down and let client 
> IO run by simply sleeping between work.  I would start with something 
> smallish (.01 = 10ms) after deleting some snapshots and see what effect it 
> has on request latency.  Unfortunately it's not a very intuitive knob to 
> adjust, but it is an interim solution until we figure out how to better 
> prioritize this (and other) background work.
> 

Thanks Sage. Is this delay applied per object being removed or at some higher 
granularity?

And BTW, I was also curious why you’ve only added a throttle to the snap trim 
ops. Are object/rbd/pg/pool deletions somehow less disruptive to client IOs?

Cheers, Dan

> In short, if you do see a performance degradation after removing snaps, 
> adjust this up or down and see how it changes that.  If you don't see a 
> degradation, then you're lucky and don't need to do anything.  :)
> 
> You can adjust this on running OSDs with something like 'ceph daemon 
> osd.NN config set osd_snap_trim_sleep .01' or with 'ceph tell osd.* 
> injectargs -- --osd-snap-trim-sleep .01'.
> 
> sage
> 



Re: [ceph-users] v0.67.9 Dumpling released

2014-06-04 Thread Sage Weil
On Wed, 4 Jun 2014, Andrey Korolyov wrote:
> On 06/04/2014 06:06 PM, Sage Weil wrote:
> > On Wed, 4 Jun 2014, Dan Van Der Ster wrote:
> >> Hi Sage, all,
> >>
> >> On 21 May 2014, at 22:02, Sage Weil  wrote:
> >>
> >>> * osd: allow snap trim throttling with simple delay (#6278, Sage Weil)
> >>
> >> Do you have some advice about how to use the snap trim throttle? I saw 
> >> osd_snap_trim_sleep, which is still 0 by default. But I didn't manage to 
> >> follow the original ticket, since it started out as a question about 
> >> deep scrub contending with client IOs, but then at some point you 
> >> renamed the ticket to throttling snap trim. What exactly does snap trim 
> >> do in the context of an RBD client? And can you suggest a good starting 
> >> point for osd_snap_trim_sleep = … ?
> > 
> > This is a coarse hack to make the snap trimming slow down and let client 
> > IO run by simply sleeping between work.  I would start with something 
> > smallish (.01 = 10ms) after deleting some snapshots and see what effect it 
> > has on request latency.  Unfortunately it's not a very intuitive knob to 
> > adjust, but it is an interim solution until we figure out how to better 
> > prioritize this (and other) background work.
> > 
> > In short, if you do see a performance degradation after removing snaps, 
> > adjust this up or down and see how it changes that.  If you don't see a 
> > degradation, then you're lucky and don't need to do anything.  :)
> > 
> > You can adjust this on running OSDs with something like 'ceph daemon 
> > osd.NN config set osd_snap_trim_sleep .01' or with 'ceph tell osd.* 
> > injectargs -- --osd-snap-trim-sleep .01'.
> > 
> > sage
> > 
> 
> Hi,
> 
> we have had the same mechanism for almost half a year and it works nicely,
> except in cases where multiple background snap deletions are hitting their
> ends - latencies may spike regardless of a very large sleep gap for snap
> operations. Do you have any thoughts on reducing this particular impact?

This isn't ringing any bells.  If this is something you can reproduce with 
osd logging enabled we should be able to tell what is causing the spike, 
though...

sage



Re: [ceph-users] v0.67.9 Dumpling released

2014-06-04 Thread Sage Weil
On Wed, 4 Jun 2014, Dan Van Der Ster wrote:
> On 04 Jun 2014, at 16:06, Sage Weil  wrote:
> 
> > On Wed, 4 Jun 2014, Dan Van Der Ster wrote:
> >> Hi Sage, all,
> >> 
> >> On 21 May 2014, at 22:02, Sage Weil  wrote:
> >> 
> >>> * osd: allow snap trim throttling with simple delay (#6278, Sage Weil)
> >> 
> >> Do you have some advice about how to use the snap trim throttle? I saw 
> >> osd_snap_trim_sleep, which is still 0 by default. But I didn't manage to 
> >> follow the original ticket, since it started out as a question about 
> >> deep scrub contending with client IOs, but then at some point you 
> >> renamed the ticket to throttling snap trim. What exactly does snap trim 
> >> do in the context of an RBD client? And can you suggest a good starting 
> >> point for osd_snap_trim_sleep = … ?
> > 
> > This is a coarse hack to make the snap trimming slow down and let client 
> > IO run by simply sleeping between work.  I would start with something 
> > smallish (.01 = 10ms) after deleting some snapshots and see what effect it 
> > has on request latency.  Unfortunately it's not a very intuitive knob to 
> > adjust, but it is an interim solution until we figure out how to better 
> > prioritize this (and other) background work.
> > 
> 
> Thanks Sage. Is this delay applied per object being removed or at some 
> higher granularity?

Per object.

> And BTW, I was also curious why you've only added a throttle to the snap 
> trim ops. Are object/rbd/pg/pool deletions somehow less disruptive to 
> client IOs?

Other deletions are client IOs.  Snap deletions are one of the few 
operations that are driven by the OSD and thus need their own throttling.  
FWIW, I think the plan going forward is to create ops for these internally 
so that they go through the same queues and prioritization as client 
requests.

sage


> 
> Cheers, Dan
> 
> > In short, if you do see a performance degradation after removing snaps, 
> > adjust this up or down and see how it changes that.  If you don't see a 
> > degradation, then you're lucky and don't need to do anything.  :)
> > 
> > You can adjust this on running OSDs with something like 'ceph daemon 
> > osd.NN config set osd_snap_trim_sleep .01' or with 'ceph tell osd.* 
> > injectargs -- --osd-snap-trim-sleep .01'.
> > 
> > sage
> > 
> 
> 


Re: [ceph-users] v0.67.9 Dumpling released

2014-06-04 Thread Sage Weil
On Wed, 4 Jun 2014, Andrey Korolyov wrote:
> On 06/04/2014 07:22 PM, Sage Weil wrote:
> > On Wed, 4 Jun 2014, Andrey Korolyov wrote:
> >> On 06/04/2014 06:06 PM, Sage Weil wrote:
> >>> On Wed, 4 Jun 2014, Dan Van Der Ster wrote:
>  Hi Sage, all,
> 
>  On 21 May 2014, at 22:02, Sage Weil  wrote:
> 
> > * osd: allow snap trim throttling with simple delay (#6278, Sage Weil)
> 
>  Do you have some advice about how to use the snap trim throttle? I saw 
>  osd_snap_trim_sleep, which is still 0 by default. But I didn't manage to 
>  follow the original ticket, since it started out as a question about 
>  deep scrub contending with client IOs, but then at some point you 
>  renamed the ticket to throttling snap trim. What exactly does snap trim 
>  do in the context of an RBD client? And can you suggest a good starting 
>  point for osd_snap_trim_sleep = … ?
> >>>
> >>> This is a coarse hack to make the snap trimming slow down and let client 
> >>> IO run by simply sleeping between work.  I would start with something 
> >>> smallish (.01 = 10ms) after deleting some snapshots and see what effect 
> >>> it 
> >>> has on request latency.  Unfortunately it's not a very intuitive knob to 
> >>> adjust, but it is an interim solution until we figure out how to better 
> >>> prioritize this (and other) background work.
> >>>
> >>> In short, if you do see a performance degradation after removing snaps, 
> >>> adjust this up or down and see how it changes that.  If you don't see a 
> >>> degradation, then you're lucky and don't need to do anything.  :)
> >>>
> >>> You can adjust this on running OSDs with something like 'ceph daemon 
> >>> osd.NN config set osd_snap_trim_sleep .01' or with 'ceph tell osd.* 
> >>> injectargs -- --osd-snap-trim-sleep .01'.
> >>>
> >>> sage
> >>>
> >>
> >> Hi,
> >>
> >> we had the same mechanism for almost a half of year and it working nice
> >> except cases when multiple background snap deletions are hitting their
> >> ends - latencies may spike not regarding very large sleep gap for snap
> >> operations. Do you have any thoughts on reducing this particular impact?
> > 
> > This isn't ringing any bells.  If this is something you can reproduce with 
> > osd logging enabled we should be able to tell what is causing the spike, 
> > though...
> > 
> > sage
> > 
> 
> Ok, would 10 be enough there? With 20, all timings are most likely to be
> distorted by logging operations, even on tmpfs.

Yeah, debug osd = 20 and debug ms = 1 should be sufficient.
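
For example, a sketch for a single OSD without a restart (the OSD id is
illustrative):

  ceph tell osd.3 injectargs -- --debug-osd 20 --debug-ms 1
  # reproduce the latency spike, then turn logging back down:
  ceph tell osd.3 injectargs -- --debug-osd 0/5 --debug-ms 0/5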

sage



Re: [ceph-users] v0.67.9 Dumpling released

2014-06-04 Thread Sage Weil
On Wed, 4 Jun 2014, Dan Van Der Ster wrote:
> On 04 Jun 2014, at 16:06, Sage Weil  wrote:
> 
> > You can adjust this on running OSDs with something like 'ceph daemon 
> > osd.NN config set osd_snap_trim_sleep .01' or with 'ceph tell osd.* 
> > injectargs -- --osd-snap-trim-sleep .01'.
> 
> Thanks, trying that now.
> 
> I noticed that when using = 0.01 in ceph.conf it gets parsed as 0, whereas 
> .01 is parsed correctly. Known bug?

Nope!  Do you mind filing a ticket at tracker.ceph.com?

Thanks!
sage


Re: [ceph-users] v0.67.9 Dumpling released

2014-06-04 Thread Andrey Korolyov
On 06/04/2014 06:06 PM, Sage Weil wrote:
> On Wed, 4 Jun 2014, Dan Van Der Ster wrote:
>> Hi Sage, all,
>>
>> On 21 May 2014, at 22:02, Sage Weil  wrote:
>>
>>> * osd: allow snap trim throttling with simple delay (#6278, Sage Weil)
>>
>> Do you have some advice about how to use the snap trim throttle? I saw 
>> osd_snap_trim_sleep, which is still 0 by default. But I didn't manage to 
>> follow the original ticket, since it started out as a question about 
>> deep scrub contending with client IOs, but then at some point you 
>> renamed the ticket to throttling snap trim. What exactly does snap trim 
>> do in the context of an RBD client? And can you suggest a good starting 
>> point for osd_snap_trim_sleep = … ?
> 
> This is a coarse hack to make the snap trimming slow down and let client 
> IO run by simply sleeping between work.  I would start with something 
> smallish (.01 = 10ms) after deleting some snapshots and see what effect it 
> has on request latency.  Unfortunately it's not a very intuitive knob to 
> adjust, but it is an interim solution until we figure out how to better 
> prioritize this (and other) background work.
> 
> In short, if you do see a performance degradation after removing snaps, 
> adjust this up or down and see how it changes that.  If you don't see a 
> degradation, then you're lucky and don't need to do anything.  :)
> 
> You can adjust this on running OSDs with something like 'ceph daemon 
> osd.NN config set osd_snap_trim_sleep .01' or with 'ceph tell osd.* 
> injectargs -- --osd-snap-trim-sleep .01'.
> 
> sage
> 

Hi,

we have had the same mechanism for almost half a year and it works nicely,
except in cases where multiple background snap deletions are hitting their
ends - latencies may spike regardless of a very large sleep gap for snap
operations. Do you have any thoughts on reducing this particular impact?





Re: [ceph-users] RGW: Multi Part upload and resulting objects

2014-06-04 Thread Gregory Farnum
On Wed, Jun 4, 2014 at 7:58 AM, Sylvain Munaut
 wrote:
> Hi,
>
>
> During a multipart upload you can't upload parts smaller than 5M, and
> radosgw also slices objects into pieces of 4M. Having those two be
> different is a bit unfortunate, because if you slice your files at the
> minimum part size you end up with a main file of 4M and a shadow file
> of 1M for each part ...
>
>
> Would it make sense either to allow multipart parts of 4M, or to raise
> the slice size to something more than 4M (5M, or 8M if you want a power
> of 2)?

Huh. We took the 5MB limit from S3, but it definitely is unfortunate
in combination with our 4MB chunking. You can change the default slice
size using a config option, though. I believe you want to change
rgw_obj_stripe_size (default: 4 << 20). There might be some other
considerations around the initial 512KB "head" objects,
though...Yehuda?
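
A sketch of what that change would look like (whether a 5MB stripe is
otherwise safe for your workload is an assumption to test first):

  # in the radosgw client section of ceph.conf
  rgw obj stripe size = 5242880   # 5MB, to match the minimum part size;
                                  # the default is 4194304 (4 << 20)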
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] v0.67.9 Dumpling released

2014-06-04 Thread Andrey Korolyov
On 06/04/2014 07:22 PM, Sage Weil wrote:
> On Wed, 4 Jun 2014, Andrey Korolyov wrote:
>> On 06/04/2014 06:06 PM, Sage Weil wrote:
>>> On Wed, 4 Jun 2014, Dan Van Der Ster wrote:
 Hi Sage, all,

 On 21 May 2014, at 22:02, Sage Weil  wrote:

> * osd: allow snap trim throttling with simple delay (#6278, Sage Weil)

 Do you have some advice about how to use the snap trim throttle? I saw 
 osd_snap_trim_sleep, which is still 0 by default. But I didn't manage to 
 follow the original ticket, since it started out as a question about 
 deep scrub contending with client IOs, but then at some point you 
 renamed the ticket to throttling snap trim. What exactly does snap trim 
 do in the context of an RBD client? And can you suggest a good starting 
 point for osd_snap_trim_sleep = … ?
>>>
>>> This is a coarse hack to make the snap trimming slow down and let client 
>>> IO run by simply sleeping between work.  I would start with something 
>>> smallish (.01 = 10ms) after deleting some snapshots and see what effect it 
>>> has on request latency.  Unfortunately it's not a very intuitive knob to 
>>> adjust, but it is an interim solution until we figure out how to better 
>>> prioritize this (and other) background work.
>>>
>>> In short, if you do see a performance degradation after removing snaps, 
>>> adjust this up or down and see how it changes that.  If you don't see a 
>>> degradation, then you're lucky and don't need to do anything.  :)
>>>
>>> You can adjust this on running OSDs with something like 'ceph daemon 
>>> osd.NN config set osd_snap_trim_sleep .01' or with 'ceph tell osd.* 
>>> injectargs -- --osd-snap-trim-sleep .01'.
>>>
>>> sage
>>>
>>
>> Hi,
>>
>> we have had the same mechanism for almost half a year and it works nicely,
>> except in cases where multiple background snap deletions are hitting their
>> ends - latencies may spike regardless of a very large sleep gap for snap
>> operations. Do you have any thoughts on reducing this particular impact?
> 
> This isn't ringing any bells.  If this is something you can reproduce with 
> osd logging enabled we should be able to tell what is causing the spike, 
> though...
> 
> sage
> 

Ok, would 10 be enough there? With 20, all timings are most likely to be
distorted by logging operations, even on tmpfs.


Re: [ceph-users] RGW: Multi Part upload and resulting objects

2014-06-04 Thread Yehuda Sadeh
On Wed, Jun 4, 2014 at 8:49 AM, Gregory Farnum  wrote:
> On Wed, Jun 4, 2014 at 7:58 AM, Sylvain Munaut
>  wrote:
>> Hi,
>>
>>
>> During a multipart upload you can't upload parts smaller than 5M, and
>> radosgw also slices objects into pieces of 4M. Having those two be
>> different is a bit unfortunate, because if you slice your files at the
>> minimum part size you end up with a main file of 4M and a shadow file
>> of 1M for each part ...
>>
>>
>> Would it make sense either to allow multipart parts of 4M, or to raise
>> the slice size to something more than 4M (5M, or 8M if you want a power
>> of 2)?
>
> Huh. We took the 5MB limit from S3, but it definitely is unfortunate
> in combination with our 4MB chunking. You can change the default slice
> size using a config option, though. I believe you want to change
> rgw_obj_stripe_size (default: 4 << 20). There might be some other
> considerations around the initial 512KB "head" objects,
> though...Yehuda?

The head object size is unrelated to the stripe size and changing the
stripe size wouldn't affect it. For large uploads the head size is
negligible, so I don't see it as any concern.

Yehuda


Re: [ceph-users] v0.67.9 Dumpling released

2014-06-04 Thread Dan Van Der Ster
On 04 Jun 2014, at 16:06, Sage Weil  wrote:

> You can adjust this on running OSDs with something like 'ceph daemon 
> osd.NN config set osd_snap_trim_sleep .01' or with 'ceph tell osd.* 
> injectargs -- --osd-snap-trim-sleep .01'.

Thanks, trying that now.

I noticed that when using = 0.01 in ceph.conf it gets parsed as 0, whereas .01 is 
parsed correctly. Known bug?

Cheers, Dan


Re: [ceph-users] rados benchmark is fast, but dd result on guest vm still slow?

2014-06-04 Thread Christian Balzer

Hello,

On Wed, 4 Jun 2014 22:36:00 +0800 Indra Pramana wrote:

> Hi Christian,
> 
> Good day to you, and thank you for your reply.
> 
> Just now I managed to identify 3 more OSDs which were slow and needed to
> be trimmed. Here is a longer (1 minute) result of rados bench after the
> trimming:
> 
This is the second time I've seen you mention needing to trim OSDs.
Does that mean your actual storage is on SSDs?
If only your journals are on SSDs (and nothing else), a trim (how do you
trim them?) should have no effect at all.
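
(If it is the OSD filesystems being trimmed, the usual mechanism would be a
sketch like:

  fstrim -v /var/lib/ceph/osd/ceph-12   # the mount point here is an assumption

run once per OSD data directory.)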

> http://pastebin.com/YFTbLyHA
> 

You should run atop on all of your storage nodes and watch all OSD disks
when your cluster stalls. If you have too many nodes/OSDs to watch them
all at the same time, use the logging functionality of atop (probably with
a lower interval than the standard 10 seconds) and review things after a
bench run.
I have a hard time believing that your entire cluster just stopped
processing things for 10 seconds there, but I bet an OSD or node stalled.
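
A sketch of that logging approach (interval and path are arbitrary):

  atop -w /tmp/atop.raw 2 300   # one sample every 2s for 10 minutes, during the bench
  atop -r /tmp/atop.raw         # replay afterwards; 't' steps through the samples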

> 
>  Total time run: 69.441936
> Total writes made:  3773
> Write size: 4096000
> Bandwidth (MB/sec): 212.239
> 
> Stddev Bandwidth:   247.672
> Max bandwidth (MB/sec): 921.875
> Min bandwidth (MB/sec): 0
> Average Latency:0.58602
> Stddev Latency: 2.39341
> Max latency:32.1121
> Min latency:0.04847
> 
> 
> When I run this for 60 seconds, I noted some slow requests message when I
> monitor using ceph -w, near the end of the 60-second period.
> 
> I have verified that all OSDs have I/O speed of > 220 MB/s after I
> trimmed the remaining slow ones just now. I noted that some SSDs are
> having 250 MB/s of I/O speed when I take it out of cluster, but then
> drop to 150 MB/s -ish after I put back into the cluster.
> 
Also having to trim SSDs to regain performance suggests that you probably
aren't using Intel DC ones.
Some (most, really) SSDs are known to have massive delays (latencies) when
having to do a garbage collection or other internal functions.

> Could it be due to the latency? You mentioned that average latency of 0.5
> is pretty horrible. How can I find what contributes to the latency and
> how to fix the problem? Really at loss now. :(
> 
The latency is a combination of all delays, in your case I'm sure it is
storage related.

Christian

> Looking forward to your reply, thank you.
> 
> Cheers.
> 
> 
> 
> 
> On Mon, Jun 2, 2014 at 4:56 PM, Christian Balzer  wrote:
> 
> >
> > Hello,
> >
> > On Mon, 2 Jun 2014 16:15:22 +0800 Indra Pramana wrote:
> >
> > > Dear all,
> > >
> > > I have managed to identify some slow OSDs and journals and have since
> > > replaced them. RADOS benchmark of the whole cluster is now fast, much
> > > improved from last time, showing the cluster can go up to 700+ MB/s.
> > >
> > > =
> > >  Maintaining 16 concurrent writes of 4194304 bytes for up to 10
> > > seconds or 0 objects
> > >  Object prefix: benchmark_data_hv-kvm-01_6931
> > >   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> > >     0       0         0         0         0         0         -         0
> > >     1      16       214       198   791.387       792  0.260687  0.074689
> > >     2      16       275       259   517.721       244  0.079697  0.0861397
> > >     3      16       317       301   401.174       168  0.209022  0.115348
> > >     4      16       317       301   300.902         0         -  0.115348
> > >     5      16       356       340   271.924        78  0.040032  0.172452
> > >     6      16       389       373   248.604       132  0.038983  0.221213
> > >     7      16       411       395   225.662        88  0.048462  0.211686
> > >     8      16       441       425   212.454       120  0.048722  0.237671
> > >     9      16       474       458   203.513       132  0.041285  0.226825
> > >    10      16       504       488   195.161       120  0.041899  0.224044
> > >    11      16       505       489   177.784         4  0.622238  0.224858
> > >    12      16       505       489    162.97         0         -  0.224858
> > > Total time run:         12.142654
> > > Total writes made:      505
> > > Write size: 4194304
> > > Bandwidth (MB/sec): 166.356
> > >
> > > Stddev Bandwidth:   208.41
> > > Max bandwidth (MB/sec): 792
> > > Min bandwidth (MB/sec): 0
> > > Average Latency:0.384178
> > > Stddev Latency: 1.10504
> > > Max latency:9.64224
> > > Min latency:0.031679
> > > =
> > >
> > This might be better than the last result, but it still shows the same
> > massive variance in latency and a pretty horrible average latency.
> >
> > Also you want to run this test for a lot longer, looking at the
> > bandwidth progression it seems to drop over time.
> > I'd expect the sustained bandwidth over a minute or so be below
> > 100MB/s.
> >
> >
> > > However, dd test result on guest VM is still slow.
> > >
> > > =
> > > root@test1# dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync
> > > oflag=direct
> > > 256+0 records in
> > > 256+0 recor

Re: [ceph-users] rados benchmark is fast, but dd result on guest vm still slow?

2014-06-04 Thread Christian Balzer

Hello,

On Wed, 4 Jun 2014 23:46:33 +0800 Indra Pramana wrote:

> Hi Christian,
> 
> In addition to my previous email, I realised that if I use dd with 4M
> block size, I can get higher speed.
> 
> root@Ubuntu-12043-64bit:/data# dd bs=4M count=128 if=/dev/zero of=test4
> conv=fdatasync oflag=direct
> 128+0 records in
> 128+0 records out
> 536870912 bytes (537 MB) copied, 5.68378 s, 94.5 MB/s
> 
> compared to:
> 
> root@Ubuntu-12043-64bit:/data# dd bs=1M count=512 if=/dev/zero of=test8
> conv=fdatasync oflag=direct
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 8.91133 s, 60.2 MB/s
> 
That's what I told you. An even bigger impact than I saw here.

> But still, the difference is still very big. With 4M block size, I can
> get 400 MB/s average I/O speed (max 1,000 MB/s) using rados bench, but
> only 90 MB/s average using dd on guest VM. I am wondering if there are
> any "throttling" settings which prevent the guest VM to get the full I/O
> speed the Ceph cluster provides.
>
Not really, no. 
However despite the identical block size now, you are still using 2
different tools and thus comparing apples to oranges.
rados bench by default starts 16 threads, doesn't have to deal with any
inefficiencies of the VM layers, nor with a filesystem.
The dd on the other hand runs in the VM, writes to a filesystem and, most of
all, is single threaded. 

If I run a dd I get about half the speed of rados bench, running 2 in
parallel on different VMs gets things to 80%, etc. 
 
> With regards to the VM user space, kernel space that you mentioned, can
> you elaborate more on what do you mean by that? We are using CloudStack
> and KVM hypervisor, using libvirt to connect to Ceph RBD.
> 
So probably userspace RBD, I don't really know Cloudstack though.

What I was suggesting is mapping and then mounting a (new) RBD image on a
host (kernelspace), formatting it with the same FS type as your VM, and
then running the dd on it. 
Not a perfect match due to kernel versus user space, but a lot closer than
bench versus dd.
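
A sketch of that test (pool/image names, size and fs type are placeholders):

  rbd create rbd/ddtest --size 10240        # 10GB scratch image
  rbd map rbd/ddtest                        # appears as e.g. /dev/rbd0
  mkfs.ext4 /dev/rbd0 && mount /dev/rbd0 /mnt
  dd bs=4M count=128 if=/dev/zero of=/mnt/test conv=fdatasync oflag=direct
  umount /mnt && rbd unmap /dev/rbd0 && rbd rm rbd/ddtest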

Christian

> Looking forward to your reply, thank you.
> 
> Cheers.
> 
> 
> 
> On Wed, Jun 4, 2014 at 10:36 PM, Indra Pramana  wrote:
> 
> > Hi Christian,
> >
> > Good day to you, and thank you for your reply.
> >
> > Just now I managed to identify 3 more OSDs which were slow and needed
> > to be trimmed. Here is a longer (1 minute) result of rados bench after
> > the trimming:
> >
> > http://pastebin.com/YFTbLyHA
> >
> > 
> >  Total time run: 69.441936
> > Total writes made:  3773
> > Write size: 4096000
> > Bandwidth (MB/sec): 212.239
> >
> > Stddev Bandwidth:   247.672
> > Max bandwidth (MB/sec): 921.875
> > Min bandwidth (MB/sec): 0
> > Average Latency:0.58602
> > Stddev Latency: 2.39341
> > Max latency:32.1121
> > Min latency:0.04847
> > 
> >
> > When I run this for 60 seconds, I noted some slow requests message
> > when I monitor using ceph -w, near the end of the 60-second period.
> >
> > I have verified that all OSDs have I/O speed of > 220 MB/s after I
> > trimmed the remaining slow ones just now. I noted that some SSDs are
> > having 250 MB/s of I/O speed when I take it out of cluster, but then
> > drop to 150 MB/s -ish after I put back into the cluster.
> >
> > Could it be due to the latency? You mentioned that average latency of
> > 0.5 is pretty horrible. How can I find what contributes to the latency
> > and how to fix the problem? Really at loss now. :(
> >
> > Looking forward to your reply, thank you.
> >
> > Cheers.
> >
> >
> >
> >
> > On Mon, Jun 2, 2014 at 4:56 PM, Christian Balzer  wrote:
> >
> >>
> >> Hello,
> >>
> >> On Mon, 2 Jun 2014 16:15:22 +0800 Indra Pramana wrote:
> >>
> >> > Dear all,
> >> >
> >> > I have managed to identify some slow OSDs and journals and have
> >> > since replaced them. RADOS benchmark of the whole cluster is now
> >> > fast, much improved from last time, showing the cluster can go up
> >> > to 700+ MB/s.
> >> >
> >> > =
> >> >  Maintaining 16 concurrent writes of 4194304 bytes for up to 10
> >> > seconds or 0 objects
> >> >  Object prefix: benchmark_data_hv-kvm-01_6931
> >> >   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> >> >     0       0         0         0         0         0         -         0
> >> >     1      16       214       198   791.387       792  0.260687  0.074689
> >> >     2      16       275       259   517.721       244  0.079697  0.0861397
> >> >     3      16       317       301   401.174       168  0.209022  0.115348
> >> >     4      16       317       301   300.902         0         -  0.115348
> >> >     5      16       356       340   271.924        78  0.040032  0.172452
> >> >     6      16       389       373   248.604       132  0.038983  0.221213
> >> >     7      16       411       395   225.662        88  0.048462  0.211686
> >> >     8      16       441       425   212.454       120  0.048722  0.237671
> >> >     9      16       474       458   203.513       132  0.041285  0.226825

[ceph-users] osd down/out problem

2014-06-04 Thread Cao, Buddy
Hi,

Some of the OSDs in my environment keep trying to connect to the monitors/Ceph 
nodes, but get connection refused and go down/out. It is even worse when I try 
to initialize 100+ OSDs (800G HDD for each OSD): most of the OSDs run into the 
same problem connecting to the monitors. I checked the monitor status and it 
looks good; there are no monitors down. I also disabled iptables and SELinux, 
and set "max open files = 131072" in ceph.conf. Could you let me know what else 
I should do to fix the problem?

BTW, for now I have 3 monitors in the Ceph cluster, and all of them are in good 
status.
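
Since every attempt in the log ends in (111) Connection refused, it is worth
first ruling out basic reachability of the monitor port from an OSD host; a
sketch (the .12 address for the third mon is a guess, only .11 and .13 appear
in the log):

  for m in 192.168.50.11 192.168.50.12 192.168.50.13; do
      nc -zv $m 6789        # is anything listening on the mon port?
  done
  ceph -s                   # does an ordinary client reach quorum?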

Osd log - 
-4633> 2014-06-03 10:37:55.359873 7fa894c2c7a0 10 monclient(hunting): auth_supported 2 method cephx
-4632> 2014-06-03 10:37:55.360055 7fa894c2c7a0  2 auth: KeyRing::load: loaded key file /etc/ceph/keyring.osd.0
-4631> 2014-06-03 10:37:55.360607 7fa894c2c7a0  5 asok(0x2660230) register_command objecter_requests hook 0x2610190
-4630> 2014-06-03 10:37:55.360620 7fa87f4fa700  5 osd.0 0 heartbeat: osd_stat(33016 kB used, 837 GB avail, 837 GB total, peers []/[] op hist [])
-4629> 2014-06-03 10:37:55.360679 7fa894c2c7a0 10 monclient(hunting): renew_subs
-4628> 2014-06-03 10:37:55.360694 7fa894c2c7a0 10 monclient(hunting): _reopen_session rank -1 name
-4627> 2014-06-03 10:37:55.360779 7fa894c2c7a0 10 monclient(hunting): picked mon.0 con 0x269dc20 addr 192.168.50.11:6789/0
-4626> 2014-06-03 10:37:55.360804 7fa894c2c7a0 10 monclient(hunting): _send_mon_message to mon.0 at 192.168.50.11:6789/0
-4625> 2014-06-03 10:37:55.360814 7fa894c2c7a0  1 -- 192.168.50.11:6800/7283 --> 192.168.50.11:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x2668900 con 0x269dc20
-4624> 2014-06-03 10:37:55.360835 7fa894c2c7a0 10 monclient(hunting): renew_subs
-4623> 2014-06-03 10:37:55.360904 7fa87d4f6700  2 -- 192.168.50.11:6800/7283 >> 192.168.50.11:6789/0 pipe(0x27b8000 sd=25 :0 s=1 pgs=0 cs=0 l=1 c=0x269dc20).connect error 192.168.50.11:6789/0, (111) Connection refused
-4622> 2014-06-03 10:37:55.360980 7fa87d4f6700  2 -- 192.168.50.11:6800/7283 >> 192.168.50.11:6789/0 pipe(0x27b8000 sd=25 :0 s=1 pgs=0 cs=0 l=1 c=0x269dc20).fault (111) Connection refused
-4621> 2014-06-03 10:37:55.361007 7fa87d4f6700  0 -- 192.168.50.11:6800/7283 >> 192.168.50.11:6789/0 pipe(0x27b8000 sd=25 :0 s=1 pgs=0 cs=0 l=1 c=0x269dc20).fault
-4620> 2014-06-03 10:37:55.361072 7fa87d4f6700  2 -- 192.168.50.11:6800/7283 >> 192.168.50.11:6789/0 pipe(0x27b8000 sd=25 :0 s=1 pgs=0 cs=0 l=1 c=0x269dc20).connect error 192.168.50.11:6789/0, (111) Connection refused
-4619> 2014-06-03 10:37:55.361101 7fa87d4f6700  2 -- 192.168.50.11:6800/7283 >> 192.168.50.11:6789/0 pipe(0x27b8000 sd=25 :0 s=1 pgs=0 cs=0 l=1 c=0x269dc20).fault (111) Connection refused
-4618> 2014-06-03 10:37:55.561290 7fa87d4f6700  2 -- 192.168.50.11:6800/7283 >> 192.168.50.11:6789/0 pipe(0x27b8000 sd=25 :0 s=1 pgs=0 cs=0 l=1 c=0x269dc20).connect error 192.168.50.11:6789/0, (111) Connection refused
-4617> 2014-06-03 10:37:55.561384 7fa87d4f6700  2 -- 192.168.50.11:6800/7283 >> 192.168.50.11:6789/0 pipe(0x27b8000 sd=25 :0 s=1 pgs=0 cs=0 l=1 c=0x269dc20).fault (111) Connection refused
-4616> 2014-06-03 10:37:55.961583 7fa87d4f6700  2 -- 192.168.50.11:6800/7283 >> 192.168.50.11:6789/0 pipe(0x27b8000 sd=25 :0 s=1 pgs=0 cs=0 l=1 c=0x269dc20).connect error 192.168.50.11:6789/0, (111) Connection refused
-4615> 2014-06-03 10:37:55.961641 7fa87d4f6700  2 -- 192.168.50.11:6800/7283 >> 192.168.50.11:6789/0 pipe(0x27b8000 sd=25 :0 s=1 pgs=0 cs=0 l=1 c=0x269dc20).fault (111) Connection refused
-4614> 2014-06-03 10:37:56.761838 7fa87d4f6700  2 -- 192.168.50.11:6800/7283 >> 192.168.50.11:6789/0 pipe(0x27b8000 sd=25 :0 s=1 pgs=0 cs=0 l=1 c=0x269dc20).connect error 192.168.50.11:6789/0, (111) Connection refused
-4613> 2014-06-03 10:37:56.761904 7fa87d4f6700  2 -- 192.168.50.11:6800/7283 >> 192.168.50.11:6789/0 pipe(0x27b8000 sd=25 :0 s=1 pgs=0 cs=0 l=1 c=0x269dc20).fault (111) Connection refused

..

-3482> 2014-06-03 10:40:37.377272 7fa882d01700 10 monclient(hunting): tick
-3481> 2014-06-03 10:40:37.377286 7fa882d01700  1 monclient(hunting): continuing hunt
-3480> 2014-06-03 10:40:37.377288 7fa882d01700 10 monclient(hunting): _reopen_session rank -1 name
-3479> 2014-06-03 10:40:37.377294 7fa882d01700  1 -- 192.168.50.11:6800/7283 mark_down 0x269dc20 -- 0x27b8780
-3478> 2014-06-03 10:40:37.377376 7fa882d01700 10 monclient(hunting): picked mon.2 con 0x269f380 addr 192.168.50.13:6789/0
-3477> 2014-06-03 10:40:37.377401 7fa882d01700 10 monclient(hunting): _send_mon_message to mon.2 at 192.168.50.13:6789/0
-3476> 2014-06-03 10:40:37.377405 7fa882d01700  1 -- 192.168.50.11:6800/7283 --> 192.168.50.13:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x266a880 con 0x269f380
-3475> 2014-06-03 10:40:37.377415 7fa882d01700 10 monclient(hunting): renew_subs
-3474> 2014-06-03 10:40:37.377387 7fa87c3f3700  2 -- 192.168.50.11:6800/7



Re: [ceph-users] Experiences with Ceph at the June'14 issue of USENIX ; login:

2014-06-04 Thread Christian Balzer

Hello Filippos,

On Wed, 4 Jun 2014 17:22:35 +0300 Filippos Giannakos wrote:

> Hello Ian,
> 
> Thanks for your interest.
> 
> On Mon, Jun 02, 2014 at 06:37:48PM -0400, Ian Colle wrote:
> > Thanks, Filippos! Very interesting reading.
> > 
> > Are you comfortable enough yet to remove the RAID-1 from your
> > architecture and get all that space back?
> 
> Actually, we are not ready to do that yet. There are three major things
> to consider.
> 
> First, to be able to get rid of the RAID-1 setup, we need to increase the
> replication level to at least 3x. So the space gain is not that great to
> begin with.
> 
> Second, this operation can take about a month for our scale according to
> our calculations and previous experience. During this period of
> increased I/O we might get peaks of performance degradation. Plus, we
> currently do not have the necessary hardware available to increase the
> replication level before we get rid of the RAID setup.
> 
> Third, we have a few disk failures per month. The RAID-1 setup has
> allowed us to seamlessly replace them without any hiccup or even a clue
> to the end user that something went wrong. Surely we can rely on RADOS
> to avoid any data loss, but if we currently rely on RADOS for recovery
> there might be some (minor) performance degradation, especially for the
> VM I/O traffic.
> 
That. 
And in addition you probably never had to do all that song and dance of
removing a failed OSD and bringing up a replacement. ^o^ 
One of the reasons I choose RAIDs as OSDs, especially since the Ceph
cluster in question is not local.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/