Hi,
On 06/21/2018 05:14 AM, dave.c...@dell.com wrote:
Hi all,
I have set up a Ceph cluster in my lab recently. The configuration, per my understanding,
should be okay: 4 OSDs across 3 nodes, 3 replicas, but a couple of PGs are stuck in the state
"active+undersized+degraded". I think this should be a very g
Hi all,
I have set up a Ceph cluster in my lab recently. The configuration, per my
understanding, should be okay: 4 OSDs across 3 nodes, 3 replicas, but a couple of
PGs are stuck in the state "active+undersized+degraded". I think this should be a very
generic issue; could anyone help me out?
Here is the deta
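As a starting point for this kind of report, a minimal set of diagnostic commands (a sketch; <pool> is a placeholder) would be:

    ceph health detail              # names the undersized/degraded PGs
    ceph pg dump_stuck undersized   # the stuck PGs and their acting sets
    ceph osd tree                   # are all 4 OSDs up, and how are they spread over the 3 hosts?
    ceph osd pool get <pool> size   # confirm the pool really has size 3

With size 3, a host failure domain and only 3 hosts, a single down/out OSD is enough to leave CRUSH unable to place a third replica, which shows up exactly as active+undersized+degraded.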
Hey Igor, the patch that you pointed to worked for me.
Thanks again.
From: ceph-users On Behalf Of Igor Fedotov
Sent: 20 June 2018 21:55
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] issues with ceph nautilus version
Hi Raju,
This is a bug in new BlueStore's bitmap allocator.
This PR wil
As a part of the repair operation it runs a deep-scrub on the PG. If it
showed active+clean after the repair and deep-scrub finished, then the next
run of a scrub on the PG shouldn't change the PG status at all.
On Wed, Jun 6, 2018 at 8:57 PM Adrian wrote:
> Update to this.
>
> The affected pg
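As a concrete sketch of that sequence (X.YZ is a placeholder PG id):

    ceph pg repair X.YZ       # queues the repair, which includes a deep-scrub of the PG
    ceph -w                   # watch the cluster log until repair and deep-scrub finish
    ceph pg X.YZ query        # the PG should then report active+clean
    ceph pg deep-scrub X.YZ   # a later manual deep-scrub should not change the status again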
We originally used pacemaker to move a VIP between our RGWs, but ultimately
decided to go with an LB in front of them. With an LB you can utilize both
RGWs while they're up, but the LB will shy away from either if they're down
until the check starts succeeding for that host again. We do have 2 LB
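A minimal sketch of that kind of setup, assuming HAProxy as the load balancer (the poster doesn't name a product, and the hostnames, ports and health-check URL here are placeholders):

    # /etc/haproxy/haproxy.cfg (fragment)
    frontend rgw_in
        bind *:80
        default_backend rgw_out

    backend rgw_out
        balance roundrobin
        option httpchk GET /swift/healthcheck   # any cheap RGW endpoint works as the check
        server rgw1 rgw1.example.com:7480 check
        server rgw2 rgw2.example.com:7480 check

A failed check takes a backend out of rotation until it starts succeeding again, which is the behaviour described above.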
Hi Igor,
Great! Thanks for the quick response.
Will try the fix and let you know how it goes.
-Raj
From: ceph-users On Behalf Of Igor Fedotov
Sent: 20 June 2018 21:55
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] issues with ceph nautilus version
Hi Raju,
This is a bug in new Blue
Thanks, Paul - I could probably activate the Jewel tunables
profile without losing too many clients - most are running
at least kernel 4.2, I think. I'll go hunting for older
clients ...
After changing the tunables, do I need to restart any
Ceph daemons?
Another question, if I may: The hammer tu
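For reference, the commands involved here (a sketch, not from the thread); as far as I know the tunables change itself needs no daemon restart, but it does trigger data movement, and clients that lack the required features will be refused:

    ceph osd crush show-tunables    # current profile and individual tunable values
    ceph features                   # feature/release level of currently connected clients (Luminous and later)
    ceph osd crush tunables jewel   # switch the profile; expect significant data movement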
Hi Raju,
This is a bug in new BlueStore's bitmap allocator.
This PR will most probably fix that:
https://github.com/ceph/ceph/pull/22610
Also, you may try to switch the bluestore and bluefs allocators
(the bluestore_allocator and bluefs_allocator parameters, respectively) to
"stupid" and restart the OSDs.
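A sketch of that workaround as a ceph.conf fragment (apply on the OSD nodes, then restart the OSDs):

    [osd]
    bluestore_allocator = stupid
    bluefs_allocator = stupid

    # then, on each OSD host (standard systemd packaging assumed):
    systemctl restart ceph-osd.target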
I've also seen something similar with Luminous once, on broken OSDs reporting
nonsense stats that overflowed some variables and showed 1000% full.
In my case it was Bluestore OSDs running on too tiny VMs.
Paul
2018-06-20 17:41 GMT+02:00 Raju Rangoju :
> Hi,
>
>
>
> Recently I have upgrad
Yeah, your tunables are ancient. Probably wouldn't have happened with
modern ones.
If this was my cluster I would probably update the clients and update that
(caution: lots of data movement!),
but I know how annoying it can be to chase down everyone who runs ancient
clients.
For comparison, this i
Hi,
Perhaps not optimal nor exactly what you want, but round-robin DNS with
two (or more) vanilla radosgw servers works OK for me as a very
rudimentary form of failover and load balancing.
If you wanted active/standby you could use something like pacemaker to
start services and move the vIP a
Hi Paul,
ah, right, "ceph pg dump | grep remapped", that's what I was looking
for. I added the output and the result of the pg query at the end of
https://gist.github.com/oschulz/7d637c7a1dfa28660b1cdd5cc5dffbcb
> But my guess here is that you are running a CRUSH rule to distribute across
Hey all,
Has anyone done, or is working on, a way to do S3 (radosgw) failover?
I am trying to work out a way to have 2 radosgw servers with a VIP, so that
when one server goes down it will fail over to the other.
I am trying this with CTDB, but while testing, the upload can fail and then
carry on, or just hang and
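If CTDB keeps misbehaving, one common alternative for a simple failover VIP is keepalived (my suggestion, not something from this thread; interface, VIP and priority are placeholders):

    # /etc/keepalived/keepalived.conf on the primary radosgw host
    vrrp_instance RGW_VIP {
        state MASTER              # use BACKUP plus a lower priority on the second host
        interface eth0
        virtual_router_id 51
        priority 150
        virtual_ipaddress {
            192.0.2.10/24
        }
    }

Load balancing in front of both radosgw instances (see the HAProxy sketch earlier in this digest) avoids the active/standby limitation entirely.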
Denny,
I should have mentioned this as well. Any ceph cluster wide checks I am
doing with Icinga are only applied to my 3 mon/mgr nodes. They would
definitely be annoying if it was on all osd nodes. Having the checks on
all of the mons allows me to not lose monitoring ability should one go dow
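The cluster-wide check itself can stay very simple; a minimal sketch (this is not the check_ceph* plugin mentioned above) that maps "ceph health" onto Icinga/Nagios exit codes:

    #!/bin/sh
    # Minimal health check: translate "ceph health" into an exit code.
    STATUS=$(ceph health 2>/dev/null | awk '{print $1}')
    case "$STATUS" in
      HEALTH_OK)   echo "OK - $STATUS";       exit 0 ;;
      HEALTH_WARN) echo "WARNING - $STATUS";  exit 1 ;;
      HEALTH_ERR)  echo "CRITICAL - $STATUS"; exit 2 ;;
      *)           echo "UNKNOWN - could not query ceph health"; exit 3 ;;
    esac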
Hi,
Recently I have upgraded my Ceph cluster to version 14.0.0 - Nautilus (dev) from
ceph version 13.0.1. After this, I noticed some weird data usage numbers on the
cluster.
Here are the issues I'm seeing...
1. The data usage reported is much more than what is available
usage: 16 EiB used,
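For cross-checking numbers like that, the usual places to look are (a sketch; <id> is a placeholder):

    ceph df detail          # per-pool stored vs. raw usage
    ceph osd df tree        # per-OSD utilisation; one OSD reporting garbage stands out here
    ceph osd metadata <id>  # the reporting OSD's bluestore settings and device sizes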
And BTW, if you can't make it to this event we're in the early days of
planning a dedicated Ceph + OpenStack Days at CERN around May/June
2019.
More news on that later...
-- Dan @ CERN
On Tue, Jun 19, 2018 at 10:23 PM Leonardo Vaz wrote:
>
> Hey Cephers,
>
> We will join our friends from OpenSt
Hi,
have a look at "ceph pg dump" to see which ones are stuck in remapped.
But my guess here is that you are running a CRUSH rule to distribute across
3 racks
and you only have 3 racks in total.
CRUSH will sometimes fail to find a mapping in this scenario. There are a
few parameters
that you can
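The truncated part presumably refers to the CRUSH retry knobs. One way to adjust them (choose_total_tries is my guess at the parameter meant here; 100 is only an example value):

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # edit crush.txt: raise "tunable choose_total_tries 50" to e.g. 100,
    # or add "step set_choose_tries 100" at the top of the affected rule
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new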
Hi Leo,
On 06/20/2018 01:47 AM, Leonardo Vaz wrote:
> We created the following etherpad to organize the calendar for the
> future Ceph Tech Talks.
>
> For the Ceph Tech Talk of June 28th our fellow George Mihaiescu will
> tell us how Ceph is being used on cancer research at OICR (Ontario
> Insti
Dear Paul,
thanks, here goes (output of "ceph -s", etc.):
https://gist.github.com/oschulz/7d637c7a1dfa28660b1cdd5cc5dffbcb
> Also please run "ceph pg X.YZ query" on one of the PGs not backfilling.
Silly question: How do I get a list of the PGs not backfilling?
On 06/20/2018 04:00 PM, Pa
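One way to get that list (a sketch; exact state names vary a little between releases):

    ceph pg dump pgs_brief 2>/dev/null | awk '$2 != "active+clean"'   # every PG in any other state
    ceph pg dump_stuck unclean                                        # or only the ones flagged as stuck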
Hi Cephers,
Due the July 4th holiday in US we are postponing the Ceph Developer
Monthly meeting to July 11th.
Kindest regards,
Leo
--
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
Hi Brad,
Yes, but it doesn't show much:
ceph pg 18.2 query
Error EPERM: problem getting command descriptions from pg.18.2
Cheers
- Original Message -
> From: "Brad Hubbard"
> To: "andrei"
> Cc: "ceph-users"
> Sent: Wednesday, 20 June, 2018 00:02:07
> Subject: Re: [ceph-users] fixin
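EPERM from "pg query" usually points at the client side rather than the PG itself; a couple of things worth checking (an assumption on my part, not a confirmed diagnosis; the key in use may simply lack sufficient caps):

    ceph auth get client.admin        # does the key used have full mon/osd caps?
    ceph pg map 18.2                  # which OSDs does the PG map to, and which is primary?
    ceph tell osd.<id> version        # is the primary OSD reachable at all? (<id> is a placeholder)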
Can you post the full output of "ceph -s", "ceph health detail", and "ceph
osd df tree"?
Also please run "ceph pg X.YZ query" on one of the PGs not backfilling.
Paul
2018-06-20 15:25 GMT+02:00 Oliver Schulz :
> Dear all,
>
> we (somewhat) recently extended our Ceph cluster,
> and updated it to Lumi
On Wed, Jun 20, 2018 at 7:27 AM, Bernhard Dick wrote:
> Hi,
>
> I'm experimenting with Ceph and have seen that ceph-deploy and ceph-ansible
> have the EPEL repositories as a requirement when installing Ceph on CentOS
> hosts. Due to the nature of the EPEL repos this might cause trouble (i.e.
> when
Dear all,
we (somewhat) recently extended our Ceph cluster,
and updated it to Luminous. By now, the fill level
on some ODSs is quite high again, so I'd like to
re-balance via "OSD reweight".
I'm running into the following problem, however:
No matter what I do (reweight a little, or a lot,
or onl
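For the record, the relevant commands are roughly (a sketch; the numbers are examples only):

    ceph osd reweight <osd-id> 0.9              # temporary override weight between 0 and 1, i.e. "OSD reweight"
    ceph osd crush reweight osd.<id> 3.0        # permanent CRUSH weight, usually the device size in TiB
    ceph osd test-reweight-by-utilization 120   # dry run of the automatic variant
    ceph osd reweight-by-utilization 120        # apply it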
Thanks for the response. I was also hoping to be able to debug better
once we got onto Mimic. We just finished that upgrade yesterday and
cephfs-journal-tool does find a corruption in the purge queue though
our MDS continues to start up and the filesystem appears to be
functional as usual.
How ca
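For anyone following along, the purge-queue inspection looks roughly like this (a sketch; syntax depends on the Mimic version, and "cephfs:0" is a placeholder for filesystem name and rank):

    cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect
    cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal export pq-backup.bin   # keep a backup before changing anything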
adding back in the list :)
-- Forwarded message -
From: Luis Periquito
Date: Wed, Jun 20, 2018 at 1:54 PM
Subject: Re: [ceph-users] Planning all flash cluster
To:
On Wed, Jun 20, 2018 at 1:35 PM Nick A wrote:
>
> Thank you, I was under the impression that 4GB RAM per 1TB was q
Hi,
It sounds like the .rgw.bucket.index pool has grown, maybe due to some
problem with dynamic bucket resharding.
I wonder if the (stale/old/unused) bucket indexes need to be purged
using something like the below:
radosgw-admin bi purge --bucket= --bucket-id=
Not sure how you would find the o
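One way to look for candidates (an assumption on my part; newer releases grew dedicated stale-instance commands, but plain metadata listing works everywhere):

    radosgw-admin metadata list bucket.instance                 # every bucket instance the cluster knows about
    radosgw-admin bucket stats --bucket=<name> | grep '"id"'    # the instance id currently in use (<name> is a placeholder)
    # instances that no longer match any bucket's current id are candidates for "bi purge"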
Adding more nodes from the beginning would probably be a good idea.
On Wed, Jun 20, 2018 at 12:58 PM Nick A wrote:
>
> Hello Everyone,
>
> We're planning a small cluster on a budget, and I'd like to request any
> feedback or tips.
>
> 3x Dell R720XD with:
> 2x Xeon E5-2680v2 or very similar
The
Another great thing about lots of small servers vs. few big servers is that
you can use erasure coding.
You can save a lot of money by using erasure coding, but performance will
have to be evaluated
for your use case.
I'm working with several clusters that are 8-12 servers with 6-10 SSDs each
runn
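As a small illustration of the trade-off (values are examples, not a recommendation): with, say, 8 hosts you can run a k=4, m=2 profile with a host failure domain, which stores data at 1.5x raw overhead instead of the 3x of replication:

    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 128 128 erasure ec42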
This is true, but misses the point that the OP is talking about old
hardware already - you're not going to save much money by removing a
second-hand CPU from a system.
On Wed, 20 Jun 2018 at 22:10, Wido den Hollander wrote:
>
>
> On 06/20/2018 02:00 PM, Robert Sander wrote:
> > On 20.06.2018 13:58,
On 06/20/2018 02:00 PM, Robert Sander wrote:
> On 20.06.2018 13:58, Nick A wrote:
>
>> We'll probably add another 2 OSD drives per month per node until full
>> (24 SSD's per node), at which point, more nodes.
>
> I would add more nodes earlier to achieve better overall performance.
Exactly. No
* More small servers give better performance than a few big servers; maybe
twice the number of servers with half the disks, CPUs and RAM.
* 2x 10 Gbit is usually enough, especially with more servers. That will
rarely be the bottleneck (unless you have extreme bandwidth requirements).
* maybe save money
On 20.06.2018 13:58, Nick A wrote:
> We'll probably add another 2 OSD drives per month per node until full
> (24 SSD's per node), at which point, more nodes.
I would add more nodes earlier to achieve better overall performance.
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b,
Hello Everyone,
We're planning a small cluster on a budget, and I'd like to request any
feedback or tips.
3x Dell R720XD with:
2x Xeon E5-2680v2 or very similar
96GB RAM
2x Samsung SM863 240GB boot/OS drives
4x Samsung SM863 960GB OSD drives
Dual 40/56Gbit Infiniband using IPoIB.
3 replica, MON
Hi all,
We have recently upgraded from Jewel (10.2.10) to Luminous (12.2.5) and after
this we decided to update our tunables configuration to optimal, which
was previously at Firefly. During this process, we have noticed the OSDs
(bluestore) rapidly filling on the RGW index and GC pool. W
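Some commands that help narrow down where the growth comes from (a sketch, assuming 12.2.x syntax):

    ceph df detail                             # per-pool object counts and usage
    radosgw-admin reshard list                 # buckets queued for dynamic resharding
    radosgw-admin gc list --include-all | head # size of the pending garbage-collection backlog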
Hi,
I'm experimenting with Ceph and have seen that ceph-deploy and
ceph-ansible have the EPEL repositories as a requirement when installing
Ceph on CentOS hosts. Due to the nature of the EPEL repos this might
cause trouble (i.e. when combining Ceph with oVirt on the same host).
When using the C
Hi Wladimir,
A combination of a slow clock speed, erasure coding, a single node,
and SATA spinners is probably not going to lead to a really great
evaluation. Some of the experts will chime in here with answers to
your specific questions, I'm sure, but this test really isn't ever going
to give grea
Dear all,
I set up a minimal 1-node Ceph cluster to evaluate its performance.
We tried to save as much as possible on the hardware, so now the box has
an Asus P10S-M WS motherboard, a Xeon E3-1235L v5 CPU, 64 GB DDR4 ECC RAM and
8x3TB HDDs (WD30EFRX) connected to on-board SATA ports. Also w
Hi,
at the moment, we use Icinga2, check_ceph* and Telegraf with the Ceph
plugin. I'm asking what I need in order to have a separate host which knows all
about the Ceph cluster health. The reason is that each OSD node has
mostly the exact same data, which is transmitted into our database (like
InfluxDB