Re: [ceph-users] mon memory usage (again)

2014-04-12 Thread Christian Balzer


On Fri, 11 Apr 2014 23:33:42 -0700 Gregory Farnum wrote:

> On Fri, Apr 11, 2014 at 11:12 PM, Christian Balzer  wrote:
> >
[snip]
> >
> > Questions remaining:
> >
> > a) Is that non-deterministic "ceph heap" behavior expected and if yes
> > can it be fixed?
> 
> You can specify the monitor you want to connect to (by IP) with the "-m"
> option.
>
Argh!
Well, I only feel half as stupid as I maybe should, since I only looked at
the ceph man page once or twice early on, and after seeing that it was
tiny and had no actual information about the commands inside ceph, I have used
"ceph --help" and/or the homepage since then. ^o^
 
> > b) Any idea what might have caused the heap to grow to that size?
> > c) Shouldn't it have released that memory by itself at some point in
> > time?
> 
> This is some issue between tcmalloc (the memory allocator we use) and
> the OS that pops up rarely and hasn't been nailed down. Given that,
> we're not really sure what the issue is.
>
Fair enough, as long as it is a known issue and hopefully won't go into a
death spiral that invokes the OOM killer.

Thanks,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Useful visualizations / metrics

2014-04-12 Thread Greg Poirier
I'm in the process of building a dashboard for our Ceph nodes. I was
wondering if anyone out there had instrumented their OSD / MON clusters and
found particularly useful visualizations.

At first, I was trying to do ridiculous things (like graphing % used for
every disk in every OSD host), but I realized quickly that that is simply
too many metrics and far too visually dense to be useful. I am attempting
to put together a few simpler, denser visualizations like... overall
cluster utilization, aggregate CPU and memory utilization per OSD host, etc.

Just looking for some suggestions.  Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Useful visualizations / metrics

2014-04-12 Thread Jason Villalta
Hi, I have not done anything with metrics yet, but the only ones I
personally would be interested in are total capacity utilization and cluster
latency.

Just my 2 cents.


On Sat, Apr 12, 2014 at 10:02 AM, Greg Poirier wrote:

> I'm in the process of building a dashboard for our Ceph nodes. I was
> wondering if anyone out there had instrumented their OSD / MON clusters and
> found particularly useful visualizations.
>
> At first, I was trying to do ridiculous things (like graphing % used for
> every disk in every OSD host), but I realized quickly that that is simply
> too many metrics and far too visually dense to be useful. I am attempting
> to put together a few simpler, more dense visualizations like... overcall
> cluster utilization, aggregate cpu and memory utilization per osd host, etc.
>
> Just looking for some suggestions.  Thanks!
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
*Jason Villalta*
Co-founder
800.799.4407x1230 | www.RubixTechnology.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Useful visualizations / metrics

2014-04-12 Thread Greg Poirier
Curious as to how you define cluster latency.


On Sat, Apr 12, 2014 at 7:21 AM, Jason Villalta  wrote:

> Hi, i have not don't anything with metrics yet but the only ones I
> personally would be interested in is total capacity utilization and cluster
> latency.
>
> Just my 2 cents.
>
>
> On Sat, Apr 12, 2014 at 10:02 AM, Greg Poirier wrote:
>
>> I'm in the process of building a dashboard for our Ceph nodes. I was
>> wondering if anyone out there had instrumented their OSD / MON clusters and
>> found particularly useful visualizations.
>>
>> At first, I was trying to do ridiculous things (like graphing % used for
>> every disk in every OSD host), but I realized quickly that that is simply
>> too many metrics and far too visually dense to be useful. I am attempting
>> to put together a few simpler, more dense visualizations like... overcall
>> cluster utilization, aggregate cpu and memory utilization per osd host, etc.
>>
>> Just looking for some suggestions.  Thanks!
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
> --
> *Jason Villalta*
> Co-founder
> [image: Inline image 1]
> 800.799.4407x1230 | www.RubixTechnology.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Useful visualizations / metrics

2014-04-12 Thread Jason Villalta
I know Ceph throws some warnings if there is high write latency.  But I
would be most interested in the delay for IO requests, which links directly to
IOPS.  If IOPS start to drop because the disks are overwhelmed, then latency
for requests would be increasing.  This would tell me that I need to add
more OSDs/nodes.  I am not sure there is a specific metric in Ceph for this,
but it would be awesome if there was.


On Sat, Apr 12, 2014 at 10:37 AM, Greg Poirier wrote:

> Curious as to how you define cluster latency.
>
>
> On Sat, Apr 12, 2014 at 7:21 AM, Jason Villalta wrote:
>
>> Hi, i have not don't anything with metrics yet but the only ones I
>> personally would be interested in is total capacity utilization and cluster
>> latency.
>>
>> Just my 2 cents.
>>
>>
>> On Sat, Apr 12, 2014 at 10:02 AM, Greg Poirier 
>> wrote:
>>
>>> I'm in the process of building a dashboard for our Ceph nodes. I was
>>> wondering if anyone out there had instrumented their OSD / MON clusters and
>>> found particularly useful visualizations.
>>>
>>>  At first, I was trying to do ridiculous things (like graphing % used
>>> for every disk in every OSD host), but I realized quickly that that is
>>> simply too many metrics and far too visually dense to be useful. I am
>>> attempting to put together a few simpler, more dense visualizations like...
>>> overcall cluster utilization, aggregate cpu and memory utilization per osd
>>> host, etc.
>>>
>>> Just looking for some suggestions.  Thanks!
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>>
>> --
>> --
>> *Jason Villalta*
>> Co-founder
>> [image: Inline image 1]
>> 800.799.4407x1230 | www.RubixTechnology.com
>>
>
>


-- 
*Jason Villalta*
Co-founder
800.799.4407x1230 | www.RubixTechnology.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] qemu + rbd block driver with cache=writeback, is live migration safe ?

2014-04-12 Thread Alexandre DERUMIER
Hello,

I know that qemu live migration with disks using cache=writeback is not safe
with storage like NFS, iSCSI...

Is this also true with rbd?


If yes, is it possible to manually disable writeback online with QMP?

Best Regards,

Alexandre
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu + rbd block driver with cache=writeback, is live migration safe ?

2014-04-12 Thread Alex Crow

Hi.

I've read in many places that you should never use writeback on any kind 
of shared storage. Caching is better dealt with on the storage side 
anyway, as you have hopefully provided resilience there. In fact, if your 
SAN/NAS is good enough, it's supposed to be best to use "none" as the 
caching algo.


If you need caching on the hypervisor side it would probably be better to 
use something like bcache/dmcache etc.


Cheers

Alex


On 12/04/14 16:01, Alexandre DERUMIER wrote:

Hello,

I known that qemu live migration with disk with cache=writeback are not safe 
with storage like nfs,iscsi...

Is it also true with rbd ?


If yes, it is possible to disable manually writeback online with qmp ?

Best Regards,

Alexandre
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu + rbd block driver with cache=writeback, is live migration safe ?

2014-04-12 Thread Christian Balzer

Hello,

On Sat, 12 Apr 2014 16:26:40 +0100 Alex Crow wrote:

> Hi.
> 
> I've read in many places that you should never use writeback on any kind 
> of shared storage. Caching is better dealt with on the storage side 
> anyway as you have hopefully provided resilience there. In fact if your 
> SAN/NAS is good enough it's supposed to be best to use "none" as the 
> caching algo.
> 
> If you need caching on the hypervisor side it would probably better to 
> use something like bcache/dmcache etc.
>
And I read:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg06890.html

Now don't get me wrong, erring on the side of safety is quite sensible,
but the impact of having no caching by qemu is easily in the order of two
orders of magnitude.

Beers,

Christian

> Cheers
> 
> Alex
> 
> 
> On 12/04/14 16:01, Alexandre DERUMIER wrote:
> > Hello,
> >
> > I known that qemu live migration with disk with cache=writeback are
> > not safe with storage like nfs,iscsi...
> >
> > Is it also true with rbd ?
> >
> >
> > If yes, it is possible to disable manually writeback online with qmp ?
> >
> > Best Regards,
> >
> > Alexandre
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD: GPT Partition for journal on different partition ?

2014-04-12 Thread Sage Weil
Hi Florent,

GPT partitions are required if the udev-based magic is going to work.  If you 
opt out of that strategy, you need to mount your file systems using fstab or 
similar and start the daemons manually.
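
Something along these lines should do (the mount point and OSD id below are 
placeholders for your actual setup):

/dev/sda4  /var/lib/ceph/osd/ceph-0  xfs  noatime  0 0

in /etc/fstab, and then start the daemon by hand, e.g.

ceph-osd --cluster ceph -i 0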

sage

On April 12, 2014 6:38:13 AM PDT, Florent B  wrote:
>Hi all,
>
>I run Debian Wheezy and Ceph 0.72
>
>I would like to put journal of my OSD at the beginning of disk (aka
>/dev/sda1).
>/dev/sda4 is the partition for data.
>
>I do:
>
>ceph-disk prepare --cluster ceph --cluster-uuid xxx --fs-type xfs
>/dev/sda4 /dev/sda1
>
>
>
>Then :
>
>ceph-disk activate-journal /dev/sda1
>
>tells me it can't read from /dev/disk/by-partuuid/xxx
>
>I have no /dev/disk/by-partuuid/ because I use MBR partitions.
>
>Are GPT partitions required to do what I want ?
>
>Thank you a lot
>
>
>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Sent from Kaiten Mail. Please excuse my brevity.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Useful visualizations / metrics

2014-04-12 Thread Mark Nelson
One thing I do right now for ceph performance testing is run a copy of 
collectl during every test.  This gives you a TON of information about 
CPU usage, network stats, disk stats, etc.  It's pretty easy to import 
the output data into gnuplot.  Mark Seger (the creator of collectl) also 
has some tools to gather aggregate statistics across multiple nodes. 
Beyond collectl, you can get a ton of useful data out of the ceph admin 
socket.  I especially like dump_historic_ops as it is sometimes enough 
to avoid having to parse through debug 20 logs.
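
For example (assuming the default admin socket path and an osd.0 on that host; 
adjust the id accordingly):

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops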


While the following tools have too much overhead to be really useful for 
general system monitoring, they are really useful for specific 
performance investigations:


1) perf with the dwarf/unwind support
2) blktrace (optionally with seekwatcher)
3) valgrind (cachegrind, callgrind, massif)

Beyond that, there are some collectd plugins for Ceph and last time I 
checked DreamHost was using Graphite for a lot of visualizations. 
There's always ganglia too. :)


Mark

On 04/12/2014 09:41 AM, Jason Villalta wrote:

I know ceph throws some warnings if there is high write latency.  But i
would be most intrested in the delay for io requests, linking directly
to iops.  If iops start to drop because the disk are overwhelmed then
latency for requests would be increasing.  This would tell me that I
need to add more OSDs/Nodes.  I am not sure there is a specific metric
in ceph for this but it would be awesome if there was.


On Sat, Apr 12, 2014 at 10:37 AM, Greg Poirier mailto:greg.poir...@opower.com>> wrote:

Curious as to how you define cluster latency.


On Sat, Apr 12, 2014 at 7:21 AM, Jason Villalta mailto:ja...@rubixnet.com>> wrote:

Hi, i have not don't anything with metrics yet but the only ones
I personally would be interested in is total capacity
utilization and cluster latency.

Just my 2 cents.


On Sat, Apr 12, 2014 at 10:02 AM, Greg Poirier
mailto:greg.poir...@opower.com>> wrote:

I'm in the process of building a dashboard for our Ceph
nodes. I was wondering if anyone out there had instrumented
their OSD / MON clusters and found particularly useful
visualizations.

At first, I was trying to do ridiculous things (like
graphing % used for every disk in every OSD host), but I
realized quickly that that is simply too many metrics and
far too visually dense to be useful. I am attempting to put
together a few simpler, more dense visualizations like...
overcall cluster utilization, aggregate cpu and memory
utilization per osd host, etc.

Just looking for some suggestions.  Thanks!

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
--
*/Jason Villalta/*
Co-founder
Inline image 1
800.799.4407x1230 | www.RubixTechnology.com






--
--
*/Jason Villalta/*
Co-founder
Inline image 1
800.799.4407x1230 | www.RubixTechnology.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Useful visualizations / metrics

2014-04-12 Thread Greg Poirier
We are collecting system metrics through sysstat every minute and getting
those to OpenTSDB via Sensu. We have a plethora of metrics, but I am
finding it difficult to create meaningful visualizations. We have alerting
for things like individual OSDs reaching capacity thresholds, memory spikes
on OSD or MON hosts. I am just trying to come up with some visualizations
that could become solid indicators that something is wrong with the cluster
in general, or with a particular host (besides CPU or memory utilization).

This morning, I have thought of things like:

- Stddev of bytes used on all disks in the cluster and individual OSD hosts
- 1st and 2nd derivative of bytes used on all disks in the cluster and
individual OSD hosts
- bytes used in the entire cluster
- % usage of cluster capacity

Stddev should help us identify hotspots. Velocity and acceleration of bytes
used should help us with capacity planning. Bytes used in general is just a
neat thing to see, but doesn't tell us all that much. % usage of cluster
capacity is another thing that's just kind of neat to see.
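
As a rough sketch, the stddev and first-derivative calculations in Python (the
per-OSD numbers and the 60-second interval are made up; in practice they would
come out of OpenTSDB):

import statistics

# bytes used per OSD, sampled now and one interval ago (illustrative values)
prev = {"osd.0": 410000000000, "osd.1": 395000000000, "osd.2": 520000000000}
curr = {"osd.0": 412000000000, "osd.1": 401000000000, "osd.2": 530000000000}
interval = 60.0  # seconds between samples

stddev_bytes = statistics.pstdev(curr.values())               # hotspot indicator
growth = {o: (curr[o] - prev[o]) / interval for o in curr}    # 1st derivative, bytes/s
cluster_growth = sum(growth.values())                         # cluster-wide growth rate

print("stddev of bytes used: %.0f" % stddev_bytes)
print("cluster growth rate:  %.0f bytes/s" % cluster_growth)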

What would you suggest looking for in dump_historic_ops? Maybe get regular
metrics on things like total transaction length? The only problem is that
dump_historic_ops may not always contain relevant/recent data. It is not as
easily translated into time series data as some other things.




On Sat, Apr 12, 2014 at 9:23 AM, Mark Nelson wrote:

> One thing I do right now for ceph performance testing is run a copy of
> collectl during every test.  This gives you a TON of information about CPU
> usage, network stats, disk stats, etc.  It's pretty easy to import the
> output data into gnuplot.  Mark Seger (the creator of collectl) also has
> some tools to gather aggregate statistics across multiple nodes. Beyond
> collectl, you can get a ton of useful data out of the ceph admin socket.  I
> especially like dump_historic_ops as it some times is enough to avoid
> having to parse through debug 20 logs.
>
> While the following tools have too much overhead to be really useful for
> general system monitoring, they are really useful for specific performance
> investiations:
>
> 1) perf with the dwarf/unwind support
> 2) blktrace (optionally with seekwatcher)
> 3) valgrind (cachegrind, callgrind, massif)
>
> Beyond that, there are some collectd plugins for Ceph and last time I
> checked DreamHost was using Graphite for a lot of visualizations. There's
> always ganglia too. :)
>
> Mark
>
>
> On 04/12/2014 09:41 AM, Jason Villalta wrote:
>
>> I know ceph throws some warnings if there is high write latency.  But i
>> would be most intrested in the delay for io requests, linking directly
>> to iops.  If iops start to drop because the disk are overwhelmed then
>> latency for requests would be increasing.  This would tell me that I
>> need to add more OSDs/Nodes.  I am not sure there is a specific metric
>> in ceph for this but it would be awesome if there was.
>>
>>
>> On Sat, Apr 12, 2014 at 10:37 AM, Greg Poirier > > wrote:
>>
>> Curious as to how you define cluster latency.
>>
>>
>> On Sat, Apr 12, 2014 at 7:21 AM, Jason Villalta > > wrote:
>>
>> Hi, i have not don't anything with metrics yet but the only ones
>> I personally would be interested in is total capacity
>> utilization and cluster latency.
>>
>> Just my 2 cents.
>>
>>
>> On Sat, Apr 12, 2014 at 10:02 AM, Greg Poirier
>> mailto:greg.poir...@opower.com>> wrote:
>>
>> I'm in the process of building a dashboard for our Ceph
>> nodes. I was wondering if anyone out there had instrumented
>> their OSD / MON clusters and found particularly useful
>> visualizations.
>>
>> At first, I was trying to do ridiculous things (like
>> graphing % used for every disk in every OSD host), but I
>> realized quickly that that is simply too many metrics and
>> far too visually dense to be useful. I am attempting to put
>> together a few simpler, more dense visualizations like...
>> overcall cluster utilization, aggregate cpu and memory
>> utilization per osd host, etc.
>>
>> Just looking for some suggestions.  Thanks!
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>>
>> --
>> --
>> */Jason Villalta/*
>> Co-founder
>>
>> Inline image 1
>> 800.799.4407x1230 | www.RubixTechnology.com
>> 
>>
>>
>>
>>
>>
>> --
>> --
>> */Jason Villalta/*
>> Co-founder
>>
>> Inline image 1
>> 800.799.4407x1230 | www.RubixTechnology.com
>> 
>>
>>
>>
>> 

[ceph-users] DCBX & Ceph...

2014-04-12 Thread N. Richard Solis
Anyone using DCB-X features on their cluster network in conjunction with
Ceph?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu + rbd block driver with cache=writeback, is live migration safe ?

2014-04-12 Thread Andrey Korolyov
Hello,

AFAIK qemu calls bdrv_flush at the end of the migration process, so this is
absolutely safe. Anyway, it's proven very well by our production systems
too :)

On Sat, Apr 12, 2014 at 7:01 PM, Alexandre DERUMIER  wrote:
> Hello,
>
> I known that qemu live migration with disk with cache=writeback are not safe 
> with storage like nfs,iscsi...
>
> Is it also true with rbd ?
>
>
> If yes, it is possible to disable manually writeback online with qmp ?
>
> Best Regards,
>
> Alexandre
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pg incomplete, won't create

2014-04-12 Thread Craig Lewis
I reformatted 2 OSDs, in a cluster with 2 replicas.  I tried to get as 
much data off them as possible beforehand, using ceph osd out, but I 
couldn't get it all.

I know I've lost data.

I have 1 incomplete PG, which is better than I expected.  Following 
previous advice, I ran

ceph pg force_create_pg 11.483

The PG switches to 'creating' for a while, then goes back to 'incomplete':
2014-04-12 12:20:22.356297 mon.0 [INF] pgmap v5602996: 2592 pgs: 2035 
active+clean, 553 active+remapped+wait_backfill, 2 active+recovery_wait, 
1 active+remapped+backfilling, 1 incomplete; 15086 GB data, 30576 GB 
used, 29011 GB / 59588 GB avail; 4606075/41313663 objects degraded 
(11.149%); 24965 kB/s, 34 objects/s recovering
2014-04-12 12:20:25.737277 mon.0 [INF] pgmap v5602997: 2592 pgs: 1 
creating, 2035 active+clean, 553 active+remapped+wait_backfill, 2 
active+recovery_wait, 1 active+remapped+backfilling; 15086 GB data, 
30576 GB used, 29011 GB / 59588 GB avail; 4606075/41313663 objects 
degraded (11.149%); 16179 kB/s, 22 objects/s recovering


2014-04-12 12:21:29.141144 osd.3 [WRN] 3 slow requests, 1 included 
below; oldest blocked for > 444.032652 secs
2014-04-12 12:21:29.141148 osd.3 [WRN] slow request 30.377846 seconds 
old, received at 2014-04-12 12:20:58.763265: osd_op(client.57449388.0:1 
.dir.us-west-1.51941060.1 [delete] 11.7c96a483 e28552) v4 currently 
reached pg


2014-04-12 12:23:33.160096 mon.0 [INF] osdmap e28553: 16 osds: 16 up, 16 in
2014-04-12 12:23:33.197448 mon.0 [INF] pgmap v5603063: 2592 pgs: 1 
creating, 2037 active+clean, 552 active+remapped+wait_backfill, 2 
active+remapped+backfilling; 15086 GB data, 30584 GB used, 29003 GB / 
59588 GB avail; 4597857/41313663 objects degraded (11.129%); 26137 kB/s, 
28 objects/s recovering

2014-04-12 12:23:34.196847 mon.0 [INF] osdmap e28554: 16 osds: 16 up, 16 in
2014-04-12 12:23:34.224192 mon.0 [INF] pgmap v5603064: 2592 pgs: 2037 
active+clean, 552 active+remapped+wait_backfill, 2 
active+remapped+backfilling, 1 incomplete; 15086 GB data, 30585 GB used, 
29002 GB / 59588 GB avail; 4597857/41313663 objects degraded (11.129%)


The blocked object is on the incomplete PG.

PG query is 2.3MiB: https://cd.centraldesktop.com/p/eAAADSsLAH2kja0

The query is from after the PG switched back to incomplete.

I'm running 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60).


How can I get this PG clean again?

Once it's clean, is there a RGW fsck/scrub I can run?

Any advice is appreciated.

--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us: Website | Twitter | Facebook | LinkedIn | Blog



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Useful visualizations / metrics

2014-04-12 Thread Craig Lewis

  
  
I've been graphing disk latency, OSD latency, and RGW latency.  It's a bit 
tricky to pull out of ceph --admin-daemon ceph-osd.0.asok perf dump though.  
perf dump gives you the total ops and total op time.  You have to track the 
delta of those two values, then divide the deltas to get the average latency 
over your sample interval.

I had some alerting on those values, but it was too noisy.  The graphs are 
helpful though, especially the graphs that have all of a single node's disks 
(one graph) and OSDs (second graph) on it.  Viewing both graphs helped me 
identify several problems, including a failing disk and a bad write cache 
battery.

I'm not getting much out of the RGW latency graph though.  It's pretty much 
just the sum of all the OSD latency graphs during that sample interval.
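
In pseudo-Python, that delta calculation looks something like this (the counter 
values are made up; in reality they come from two successive perf dump samples):

# total ops and total op time (seconds) from two perf dump samples, ~60 s apart
prev_ops, prev_optime = 1204331, 9872.4
curr_ops, curr_optime = 1205011, 9881.6

delta_ops = curr_ops - prev_ops
delta_time = curr_optime - prev_optime
avg_latency_ms = (delta_time / delta_ops) * 1000 if delta_ops else 0.0
print("average op latency over the interval: %.2f ms" % avg_latency_ms)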
  
  
  
  
 
Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com

Central Desktop. Work together in ways you never thought possible.
Connect with us: Website | Twitter | Facebook | LinkedIn | Blog

On 4/12/14 07:37, Greg Poirier wrote:

Curious as to how you define cluster latency.

On Sat, Apr 12, 2014 at 7:21 AM, Jason Villalta wrote:

Hi, i have not don't anything with metrics yet but the only ones I
personally would be interested in is total capacity utilization and
cluster latency.

Just my 2 cents.

On Sat, Apr 12, 2014 at 10:02 AM, Greg Poirier wrote:

I'm in the process of building a dashboard for our Ceph nodes. I was
wondering if anyone out there had instrumented their OSD / MON clusters
and found particularly useful visualizations.

At first, I was trying to do ridiculous things (like graphing % used for
every disk in every OSD host), but I realized quickly that that is
simply too many metrics and far too visually dense to be useful. I am
attempting to put together a few simpler, more dense visualizations
like... overcall cluster utilization, aggregate cpu and memory
utilization per osd host, etc.

Just looking for some suggestions.  Thanks!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg incomplete, won't create

2014-04-12 Thread Craig Lewis

From another discussion, I learned about ceph osd lost.

I'm draining osd 1 and 3 (ceph osd out).  Once they're empty, I'll mark 
them lost and see if that helps.
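
The rough sequence I have in mind (osd ids are from my cluster; the long flag 
is what the CLI requires before it will mark anything lost):

ceph osd out 1
ceph osd out 3
# ... wait for the PGs to drain off them ...
ceph osd lost 1 --yes-i-really-mean-it
ceph osd lost 3 --yes-i-really-mean-it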


*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us: Website | Twitter | Facebook | LinkedIn | Blog



On 4/12/14 12:43 , Craig Lewis wrote:
I reformatted 2 OSDs, in a cluster with 2 replicas.  I tried to get as 
much data off them as possible before hand, using ceph osd out, but I 
couldn't get it all.

I know I've lost data.

I have 1 incomplete PG, which is better than I expected. Following 
previous advice, I ran

ceph pg force_create_pg 11.483

The PG switches to 'creating' for a while, then goes back to 'incomplete':
2014-04-12 12:20:22.356297 mon.0 [INF] pgmap v5602996: 2592 pgs: 2035 
active+clean, 553 active+remapped+wait_backfill, 2 
active+recovery_wait, 1 active+remapped+backfilling, 1 incomplete; 
15086 GB data, 30576 GB used, 29011 GB / 59588 GB avail; 
4606075/41313663 objects degraded (11.149%); 24965 kB/s, 34 objects/s 
recovering
2014-04-12 12:20:25.737277 mon.0 [INF] pgmap v5602997: 2592 pgs: 1 
creating, 2035 active+clean, 553 active+remapped+wait_backfill, 2 
active+recovery_wait, 1 active+remapped+backfilling; 15086 GB data, 
30576 GB used, 29011 GB / 59588 GB avail; 4606075/41313663 objects 
degraded (11.149%); 16179 kB/s, 22 objects/s recovering


2014-04-12 12:21:29.141144 osd.3 [WRN] 3 slow requests, 1 included 
below; oldest blocked for > 444.032652 secs
2014-04-12 12:21:29.141148 osd.3 [WRN] slow request 30.377846 seconds 
old, received at 2014-04-12 12:20:58.763265: 
osd_op(client.57449388.0:1 .dir.us-west-1.51941060.1 [delete] 
11.7c96a483 e28552) v4 currently reached pg


2014-04-12 12:23:33.160096 mon.0 [INF] osdmap e28553: 16 osds: 16 up, 
16 in
2014-04-12 12:23:33.197448 mon.0 [INF] pgmap v5603063: 2592 pgs: 1 
creating, 2037 active+clean, 552 active+remapped+wait_backfill, 2 
active+remapped+backfilling; 15086 GB data, 30584 GB used, 29003 GB / 
59588 GB avail; 4597857/41313663 objects degraded (11.129%); 26137 
kB/s, 28 objects/s recovering
2014-04-12 12:23:34.196847 mon.0 [INF] osdmap e28554: 16 osds: 16 up, 
16 in
2014-04-12 12:23:34.224192 mon.0 [INF] pgmap v5603064: 2592 pgs: 2037 
active+clean, 552 active+remapped+wait_backfill, 2 
active+remapped+backfilling, 1 incomplete; 15086 GB data, 30585 GB 
used, 29002 GB / 59588 GB avail; 4597857/41313663 objects degraded 
(11.129%)


The blocked object is on the incomplete PG.

PG query is 2.3MiB: 
https://cd.centraldesktop.com/p/eAAADSsLAH2kja0


The query is from after the PG switched back to incomplete.

I'm running 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60).


How can I get this PG clean again?

Once it's clean, is there a RGW fsck/scrub I can run?

Any advice is appreciated.

--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website   | Twitter 
  | Facebook 
  | LinkedIn 
  | Blog 





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg incomplete, won't create

2014-04-12 Thread Craig Lewis
While I'm waiting for these OSDs to drain, is there any way to 
prioritize certain PGs to recover/backfill first?


In this case, I'd prefer to prioritize the PGs that are on the two OSDs 
that I'm draining.



There have been other times I've wanted to manually boost a recovery 
though.  Most times when I see an object blocked, I'd like to move its 
PG to the front of the recovery line.  It seems to happen most often 
on RGW directories, since they get a lot of activity.  When that 
happens, RGW is effectively down until whenever the affected PG gets 
around to recovering.


I have osd max backfills = 1.  Maybe that comes into play here?


*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us: Website | Twitter | Facebook | LinkedIn | Blog



On 4/12/14 13:29 , Craig Lewis wrote:

From another discussion, I learned about ceph osd lost.

I'm draining osd 1 and 3 (ceph osd out).  Once they're empty, I'll 
mark them lost and see if that helps.


*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website   | Twitter 
  | Facebook 
  | LinkedIn 
  | Blog 



On 4/12/14 12:43 , Craig Lewis wrote:
I reformatted 2 OSDs, in a cluster with 2 replicas.  I tried to get 
as much data off them as possible before hand, using ceph osd out, 
but I couldn't get it all.

I know I've lost data.

I have 1 incomplete PG, which is better than I expected. Following 
previous advice, I ran

ceph pg force_create_pg 11.483

The PG switches to 'creating' for a while, then goes back to 
'incomplete':
2014-04-12 12:20:22.356297 mon.0 [INF] pgmap v5602996: 2592 pgs: 2035 
active+clean, 553 active+remapped+wait_backfill, 2 
active+recovery_wait, 1 active+remapped+backfilling, 1 incomplete; 
15086 GB data, 30576 GB used, 29011 GB / 59588 GB avail; 
4606075/41313663 objects degraded (11.149%); 24965 kB/s, 34 objects/s 
recovering
2014-04-12 12:20:25.737277 mon.0 [INF] pgmap v5602997: 2592 pgs: 1 
creating, 2035 active+clean, 553 active+remapped+wait_backfill, 2 
active+recovery_wait, 1 active+remapped+backfilling; 15086 GB data, 
30576 GB used, 29011 GB / 59588 GB avail; 4606075/41313663 objects 
degraded (11.149%); 16179 kB/s, 22 objects/s recovering


2014-04-12 12:21:29.141144 osd.3 [WRN] 3 slow requests, 1 included 
below; oldest blocked for > 444.032652 secs
2014-04-12 12:21:29.141148 osd.3 [WRN] slow request 30.377846 seconds 
old, received at 2014-04-12 12:20:58.763265: 
osd_op(client.57449388.0:1 .dir.us-west-1.51941060.1 [delete] 
11.7c96a483 e28552) v4 currently reached pg


2014-04-12 12:23:33.160096 mon.0 [INF] osdmap e28553: 16 osds: 16 up, 
16 in
2014-04-12 12:23:33.197448 mon.0 [INF] pgmap v5603063: 2592 pgs: 1 
creating, 2037 active+clean, 552 active+remapped+wait_backfill, 2 
active+remapped+backfilling; 15086 GB data, 30584 GB used, 29003 GB / 
59588 GB avail; 4597857/41313663 objects degraded (11.129%); 26137 
kB/s, 28 objects/s recovering
2014-04-12 12:23:34.196847 mon.0 [INF] osdmap e28554: 16 osds: 16 up, 
16 in
2014-04-12 12:23:34.224192 mon.0 [INF] pgmap v5603064: 2592 pgs: 2037 
active+clean, 552 active+remapped+wait_backfill, 2 
active+remapped+backfilling, 1 incomplete; 15086 GB data, 30585 GB 
used, 29002 GB / 59588 GB avail; 4597857/41313663 objects degraded 
(11.129%)


The blocked object is on the incomplete PG.

PG query is 2.3MiB: 
https://cd.centraldesktop.com/p/eAAADSsLAH2kja0


The query is from after the PG switched back to incomplete.

I'm running 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60).


How can I get this PG clean again?

Once it's clean, is there a RGW fsck/scrub I can run?

Any advice is appreciated.

--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website   | Twitter 
  | Facebook 
  | LinkedIn 
  | Blog 





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users 

Re: [ceph-users] qemu + rbd block driver with cache=writeback, is live migration safe ?

2014-04-12 Thread Alexandre DERUMIER
>>And I read:
>>https://www.mail-archive.com/ceph-users@lists.ceph.com/msg06890.html
>>
>>Niw don't get me wrong, erring on the side of safety is quite sensible,
>>but the impact of having no caching by qemu is in the order of a 2
>>magnitudes easily.


Thanks for the link reference!




- Original Message - 

From: "Christian Balzer"  
To: "Alex Crow"  
Cc: ceph-users@lists.ceph.com 
Sent: Saturday, 12 April 2014 17:56:07 
Subject: Re: [ceph-users] qemu + rbd block driver with cache=writeback, is live 
migration safe ? 


Hello, 

On Sat, 12 Apr 2014 16:26:40 +0100 Alex Crow wrote: 

> Hi. 
> 
> I've read in many places that you should never use writeback on any kind 
> of shared storage. Caching is better dealt with on the storage side 
> anyway as you have hopefully provided resilience there. In fact if your 
> SAN/NAS is good enough it's supposed to be best to use "none" as the 
> caching algo. 
> 
> If you need caching on the hypervisor side it would probably better to 
> use something like bcache/dmcache etc. 
> 
And I read: 
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg06890.html 

Niw don't get me wrong, erring on the side of safety is quite sensible, 
but the impact of having no caching by qemu is in the order of a 2 
magnitudes easily. 

Beers, 

Christian 

> Cheers 
> 
> Alex 
> 
> 
> On 12/04/14 16:01, Alexandre DERUMIER wrote: 
> > Hello, 
> > 
> > I known that qemu live migration with disk with cache=writeback are 
> > not safe with storage like nfs,iscsi... 
> > 
> > Is it also true with rbd ? 
> > 
> > 
> > If yes, it is possible to disable manually writeback online with qmp ? 
> > 
> > Best Regards, 
> > 
> > Alexandre 
> > ___ 
> > ceph-users mailing list 
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> > 
> 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 


-- 
Christian Balzer Network/Systems Engineer 
ch...@gol.com Global OnLine Japan/Fusion Communications 
http://www.gol.com/ 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu + rbd block driver with cache=writeback, is live migration safe ?

2014-04-12 Thread Alexandre DERUMIER
>>If you need caching on the hypervisor side it would probably better to 
>>use something like bcache/dmcache etc.
Not possible in my case, as I use the qemu rbd block driver, not the rbd kernel 
module.


My concern was mainly about using the librbd cache:
http://ceph.com/docs/master/rbd/rbd-config-ref/#rbd-cache-config-settings

which is enabled with cache=writeback


According to the doc:
"it can coalesce contiguous requests for better throughput."

So these are optimisations specific to rbd when it's enabled.
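
For reference, these are the knobs I am looking at; a minimal ceph.conf sketch 
(the values are just examples, not recommendations):

[client]
rbd cache = true
rbd cache size = 33554432          # per-image cache size in bytes (example)
rbd cache max dirty = 25165824     # dirty bytes before writeback starts (example)
rbd cache max dirty age = 1.0      # seconds before dirty data is flushed (example)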


(If someone has documentation about how exactly the librbd cache works, I'm 
interested.)


- Original Message - 

From: "Alex Crow"  
To: ceph-users@lists.ceph.com 
Sent: Saturday, 12 April 2014 17:26:40 
Subject: Re: [ceph-users] qemu + rbd block driver with cache=writeback, is live 
migration safe ? 

Hi. 

I've read in many places that you should never use writeback on any kind 
of shared storage. Caching is better dealt with on the storage side 
anyway as you have hopefully provided resilience there. In fact if your 
SAN/NAS is good enough it's supposed to be best to use "none" as the 
caching algo. 

If you need caching on the hypervisor side it would probably better to 
use something like bcache/dmcache etc. 

Cheers 

Alex 


On 12/04/14 16:01, Alexandre DERUMIER wrote: 
> Hello, 
> 
> I known that qemu live migration with disk with cache=writeback are not safe 
> with storage like nfs,iscsi... 
> 
> Is it also true with rbd ? 
> 
> 
> If yes, it is possible to disable manually writeback online with qmp ? 
> 
> Best Regards, 
> 
> Alexandre 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu + rbd block driver with cache=writeback, is live migration safe ?

2014-04-12 Thread Stefan Priebe - Profihost AG

On 13.04.2014 at 06:39, Alexandre DERUMIER  wrote:

>>> And I read:
>>> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg06890.html
>>> 
>>> Niw don't get me wrong, erring on the side of safety is quite sensible,
>>> but the impact of having no caching by qemu is in the order of a 2
>>> magnitudes easily.
> 
> 
> Thanks for link reference !

Works for me too. Have already done hundreds of migrations.

Greets,
Stefan 

> 
> 
> 
> 
> - Mail original - 
> 
> De: "Christian Balzer"  
> À: "Alex Crow"  
> Cc: ceph-users@lists.ceph.com 
> Envoyé: Samedi 12 Avril 2014 17:56:07 
> Objet: Re: [ceph-users] qemu + rbd block driver with cache=writeback, is live 
> migration safe ? 
> 
> 
> Hello, 
> 
>> On Sat, 12 Apr 2014 16:26:40 +0100 Alex Crow wrote: 
>> 
>> Hi. 
>> 
>> I've read in many places that you should never use writeback on any kind 
>> of shared storage. Caching is better dealt with on the storage side 
>> anyway as you have hopefully provided resilience there. In fact if your 
>> SAN/NAS is good enough it's supposed to be best to use "none" as the 
>> caching algo. 
>> 
>> If you need caching on the hypervisor side it would probably better to 
>> use something like bcache/dmcache etc.
> And I read: 
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg06890.html 
> 
> Niw don't get me wrong, erring on the side of safety is quite sensible, 
> but the impact of having no caching by qemu is in the order of a 2 
> magnitudes easily. 
> 
> Beers, 
> 
> Christian 
> 
>> Cheers 
>> 
>> Alex 
>> 
>> 
>>> On 12/04/14 16:01, Alexandre DERUMIER wrote: 
>>> Hello, 
>>> 
>>> I known that qemu live migration with disk with cache=writeback are 
>>> not safe with storage like nfs,iscsi... 
>>> 
>>> Is it also true with rbd ? 
>>> 
>>> 
>>> If yes, it is possible to disable manually writeback online with qmp ? 
>>> 
>>> Best Regards, 
>>> 
>>> Alexandre 
>>> ___ 
>>> ceph-users mailing list 
>>> ceph-users@lists.ceph.com 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> ___ 
>> ceph-users mailing list 
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> -- 
> Christian Balzer Network/Systems Engineer 
> ch...@gol.com Global OnLine Japan/Fusion Communications 
> http://www.gol.com/ 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com