Hello,
Something I came across a while ago, but the recent discussion here
jolted my memory.
If you have a cluster configured with just a "public network", and that
network is in RFC 1918 space like 10.0.0.0/8, you'd think you'd be "safe",
wouldn't you?
Alas you're not:
---
root@ceph-01:~# netsta
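(For anyone wanting to run the same kind of check on their own nodes, a minimal
sketch is below; the flags are just the usual netstat/ss ones, and the truncated
output above was presumably from something similar:)
---
# list the addresses the ceph daemons are actually bound to
netstat -tlnp | grep ceph
# or, on systems without net-tools:
ss -tlnp | grep ceph
---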
Hello,
On Thu, 12 Jan 2017 10:03:33 -0500 Sivaram Kannan wrote:
> Hi,
>
> Thanks for the reply. The public network I am talking about is an
> isolated network with no access to the internet, but with a lot of compute
> traffic. If it is more about security, I would try setting up
> both in the same
There is another nice tool for ceph monitoring:
https://github.com/inkscope/inkscope
It's a little hard to set up, but besides just monitoring you can also manage
some things with it.
regards
Marko
On 1/13/17 07:30, Tu Holmes wrote:
I'll give ceph-dash a look.
Thanks!
On Thu, Jan 12, 2017 at 9:19
Hi Greg,
On 2017-01-12 19:54, Gregory Farnum wrote:
...
That's not what anybody intended to have happen. It's possible the
simultaneous loss of a monitor and the OSDs is triggering a case
that's not behaving correctly. Can you create a ticket at
tracker.ceph.com with your logs and what steps
We are using a production cluster which started in Firefly, then moved to
Giant, Hammer and finally Jewel. So our images have different features
corresponding to the value of "rbd_default_features" of the version when
they were created.
We actually have three sets of features activated:
image with
Hammer or Jewel? I've forgotten which thread pool is handling the snap
trim nowadays -- is it the op thread yet? If so, perhaps all the op
threads are stuck sleeping? Just a wild guess. (Maybe increasing # op
threads would help?).
-- Dan
On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk wrote:
> Hi,
>
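(For what it's worth, the Jewel-era knobs Dan is alluding to would look roughly
like the sketch below; this is a guess only, and the option names and defaults
should be checked against the release actually running:)
---
[osd]
# the sharded op worker pool (which snap trim shares, if it is indeed in the op queue)
osd op num shards = 5
osd op num threads per shard = 2
# throttle snap trimming so it competes less with client I/O
osd snap trim sleep = 0.1
---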
On 13-1-2017 09:07, Christian Balzer wrote:
>
> Hello,
>
> Something I came across a while ago, but the recent discussion here
> jolted my memory.
>
> If you have a cluster configured with just a "public network", and that
> network is in RFC 1918 space like 10.0.0.0/8, you'd think you'd be "safe"
Hi
I know this isn't the obvious place to ask this, but nonetheless:
Has anyone had any experience with running IBM Spectrum Protect (or Tivoli
Storage Manager, as it was previously known) BA client backups of filesystems
created inside RBDs using TSM's Journal-based backup feature[1][2]?
Th
On Fri, Jan 13, 2017 at 5:11 AM, Vincent Godin wrote:
> We are using a production cluster which started in Firefly, then moved to
> Giant, Hammer and finally Jewel. So our images have different features
> corresponding to the value of "rbd_default_features" of the version when
> they were created.
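(A quick way to see which of those feature sets a given image ended up with is
to look at its feature list; the pool and image names below are placeholders:)
---
rbd info <pool>/<image> | grep features
# if an old client then refuses to map it, the newer features can be dropped, e.g.:
rbd feature disable <pool>/<image> deep-flatten fast-diff object-map exclusive-lock
---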
Another tool :
http://openattic.org/
- Original Message -
From: "Marko Stojanovic"
To: "Tu Holmes" , "John Petrini"
Cc: "ceph-users"
Sent: Friday, January 13, 2017 09:30:16
Subject: Re: [ceph-users] Calamari or Alternative
There is another nice tool for ceph monitoring:
[ https://githu
I remember seeing one of the openATTIC project people on the list
mentioning that.
My initial question is, "Can you configure openATTIC just to monitor an
existing cluster without having to build a new one?"
//Tu
On Fri, Jan 13, 2017 at 6:10 AM Alexandre DERUMIER
wrote:
> Another tool :
>
> htt
Hi everyone,
We have a deployment with 90 OSDs at the moment, all SSD, which in my opinion is
not quite hitting the performance that it should; a `rados bench` run gives
something along these numbers:
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304
for up t
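(For context, output like the above comes from a run along these lines; the pool
name and runtime here are placeholders, not taken from the original post:)
---
rados bench -p <pool> 60 write -t 16 -b 4194304 --no-cleanup
rados bench -p <pool> 60 seq -t 16
rados -p <pool> cleanup
---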
> On 13 January 2017 at 18:18, Mohammed Naser wrote:
>
>
> Hi everyone,
>
> We have a deployment with 90 OSDs at the moment which is all SSD that’s not
> hitting quite the performance that it should be in my opinion, a `rados
> bench` run gives something along these numbers:
>
> Maintainin
> On Jan 13, 2017, at 12:37 PM, Wido den Hollander wrote:
>
>
>> On 13 January 2017 at 18:18, Mohammed Naser wrote:
>>
>>
>> Hi everyone,
>>
>> We have a deployment with 90 OSDs at the moment which is all SSD that’s not
>> hitting quite the performance that it should be in my opinion, a `
> On 13 January 2017 at 18:39, Mohammed Naser wrote:
>
>
>
> > On Jan 13, 2017, at 12:37 PM, Wido den Hollander wrote:
> >
> >
> >> On 13 January 2017 at 18:18, Mohammed Naser wrote:
> >>
> >>
> >> Hi everyone,
> >>
> >> We have a deployment with 90 OSDs at the moment which is all SSD
> On Jan 13, 2017, at 12:41 PM, Wido den Hollander wrote:
>
>
>> On 13 January 2017 at 18:39, Mohammed Naser wrote:
>>
>>
>>
>>> On Jan 13, 2017, at 12:37 PM, Wido den Hollander wrote:
>>>
>>>
On 13 January 2017 at 18:18, Mohammed Naser wrote:
Hi everyone,
<< Both OSDs are pinned to two cores on the system
Is there any reason you are pinning OSDs like that? I would say for an object
workload there is no need to pin OSDs.
With the configuration you mentioned, Ceph with 4M object PUTs should be
saturating your network first.
Have you run say 4M object G
Also, there is a lot of discussion in the community about SSDs not being
suitable for the Ceph write workload (with filestore), as some are not good at
O_DIRECT/O_DSYNC kinds of writes. Hope your SSDs are tolerant of that.
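(The usual way to check that is a single-job direct/sync write test with fio,
something like the sketch below; note it writes to the raw device, so only run
it on a disk you can wipe, and /dev/sdX is a placeholder:)
---
fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
    --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
---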
-Original Message-
From: Somnath Roy
Sent: Friday, January 13, 2017 10:06 AM
To: '
These Intel SSDs are more than capable of handling the workload; in addition,
this cluster is used as an RBD backend for an OpenStack cluster.
Sent from my iPhone
> On Jan 13, 2017, at 1:08 PM, Somnath Roy wrote:
>
> Also, there is a lot of discussion about SSDs not being suitable for the Ceph write
>
> On 13 January 2017 at 18:50, Mohammed Naser wrote:
>
>
>
> > On Jan 13, 2017, at 12:41 PM, Wido den Hollander wrote:
> >
> >
> >> On 13 January 2017 at 18:39, Mohammed Naser wrote:
> >>
> >>
> >>
> >>> On Jan 13, 2017, at 12:37 PM, Wido den Hollander wrote:
> >>>
> >>>
> Op
> On 24 December 2016 at 13:47, Wido den Hollander wrote:
>
>
>
> > On 23 December 2016 at 16:05, Wido den Hollander wrote:
> >
> >
> >
> > > On 22 December 2016 at 19:00, Orit Wasserman wrote:
> > >
> > >
> > > Hi Marius,
> > >
> > > On Thu, Dec 22, 2016 at 12:00 PM, Marius Vaitiek
> On Jan 13, 2017, at 1:34 PM, Wido den Hollander wrote:
>
>>
>> On 13 January 2017 at 18:50, Mohammed Naser wrote:
>>
>>
>>
>>> On Jan 13, 2017, at 12:41 PM, Wido den Hollander wrote:
>>>
>>>
On 13 January 2017 at 18:39, Mohammed Naser wrote:
> On Jan 13, 2
We're using:
https://github.com/rochaporto/collectd-ceph
for time-series, with a slightly modified Grafana dashboard from the one
referenced.
https://github.com/Crapworks/ceph-dash
for quick health status.
Both took a small bit of modification to make them work with Jewel at the time,
not
> On 13 January 2017 at 20:33, Mohammed Naser wrote:
>
>
>
> > On Jan 13, 2017, at 1:34 PM, Wido den Hollander wrote:
> >
> >>
> >> On 13 January 2017 at 18:50, Mohammed Naser wrote:
> >>
> >>
> >>
> >>> On Jan 13, 2017, at 12:41 PM, Wido den Hollander wrote:
> >>>
> >>>
> Op
General question/survey:
Those of you with larger clusters, how are you doing alerting/monitoring?
Meaning, do you trigger off of 'HEALTH_WARN', etc.? I'm not really talking about
collectd-related metrics, but more about initial alerts of an issue or potential
issue. What thresholds do you use, basically? Just trying
We don't use many critical alerts (that will have our NOC wake up an engineer),
but the main one that we do have is a check that tells us if there are 2 or
more hosts with osds that are down. We have clusters with 60 servers in them,
so having an osd die and backfill off of isn't something to w
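(A rough sketch of what such a check can boil down to, parsing `ceph osd tree`;
the column positions are assumed from the Jewel-era plain output, so adjust as
needed:)
---
ceph osd tree | awk '
  $3 == "host"                  { host = $4 }      # remember which host bucket we are under
  $3 ~ /^osd\./ && $4 == "down" { bad[host] = 1 }
  END {
    n = 0; for (h in bad) n++
    print n " host(s) with down OSDs"
    exit (n >= 2) ? 2 : 0                          # non-zero exit -> page someone
  }'
---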
Thanks.
What about 'NN ops > 32 sec' (blocked ops) type alerts? Does anyone monitor
for that type, and if so, what criteria do you use?
Thanks again!
On Fri, Jan 13, 2017 at 3:28 PM, David Turner wrote:
> We don't use many critical alerts (that will have our NOC wake up an
> engineer), but the
We don't currently monitor that, but my todo list has an item to monitor for
blocked requests longer than 500 seconds and go critical on those. You can see
how long they've been blocked for from `ceph health detail`. Our cluster doesn't
need to be super fast at any given point, but it does need to be pr
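(A sketch of that kind of check, assuming the Jewel-style health wording of
"N ops are blocked > X sec on osd.Y"; the 500-second threshold is the one
mentioned above:)
---
ceph health detail | awk '
  /ops are blocked > / && $6 + 0 > 500 { crit = 1; print }
  END { exit crit }'   # exit 1 -> raise a critical alert
---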
We're on Jewel and you're right, I'm pretty sure the snap stuff is also now
handled in the op thread.
The dump historic ops socket command showed a 10s delay at the "Reached PG"
stage; from Greg's response [1], it would suggest that the OSD itself isn't
blocking but the PG it's currently sleeping
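(For reference, those are the admin-socket commands in question, run on the
OSD's host; the id is a placeholder:)
---
ceph daemon osd.<id> dump_historic_ops    # recently completed slow ops, with per-stage timestamps
ceph daemon osd.<id> dump_ops_in_flight   # ops currently stuck in the OSD
---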
We monitor a few things:
- cluster health (error only, ignoring warnings since we have separate
checks for interesting things)
- if all PGs are active (number of active replicas >= min_size)
- if there are any blocked requests (it's a good indicator, in our case,
that some disk is going to fail s
I am sorry for posting this if this has been addressed already. I am not
sure how to search through old ceph-users mailing list posts. I used to
use gmane.org but that seems to be down.
My setup:
I have a moderate ceph cluster (ceph hammer 94.9
- fe6d859066244b97b24f09d46552afc2071e6f90 ). Th
FYI, I'm seeing this as well on the latest Kraken 11.1.1 RPMs on CentOS 7
w/ elrepo kernel 4.8.10. ceph-mgr is currently tearing through CPU and has
allocated ~11GB of RAM after a single day of usage. Only the active manager
is performing this way. The growth is linear and reproducible.
The cluste
On Thu, Jan 12, 2017 at 7:58 PM, 许雪寒 wrote:
> Thank you for your continuous help.
>
>
>
> We are using hammer 0.94.5 version, and what I read is the version of the
> source code.
>
> However, on the other hand, if Pipe::do_recv does act as blocked, is it
> reasonable for the Pipe::reader_thread to
Want to install debuginfo packages and use something like this to try
and find out where it is spending most of its time?
https://poormansprofiler.org/
Note that you may need to do multiple runs to get a "feel" for where
it is spending most of its time. Also note that likely only one or two
thread
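(The whole tool boils down to a gdb one-liner along these lines; ceph-mgr is
used as the example process here on the assumption that this is about the mgr
CPU issue above. Run it a handful of times and see which stacks keep showing up:)
---
gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof ceph-mgr)
---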
Hello,
On Fri, 13 Jan 2017 13:18:35 -0500 Mohammed Naser wrote:
> These Intel SSDs are more than capable of handling the workload, in addition,
> this cluster is used as an RBD backend for an OpenStack cluster.
>
I haven't tested the S3520s yet, them being the first 3D NAND offering
from Inte