Hi,
On 06/21/2018 05:14 AM, dave.c...@dell.com wrote:
Hi all,
I have set up a Ceph cluster in my lab recently. The configuration, per my understanding,
should be okay: 4 OSDs across 3 nodes, 3 replicas, but a couple of PGs are stuck in the state
"active+undersized+degraded". I think this should be a very g
Hi all,
I have set up a Ceph cluster in my lab recently. The configuration, per my
understanding, should be okay: 4 OSDs across 3 nodes, 3 replicas, but a couple of
PGs are stuck in the state "active+undersized+degraded". I think this should be a very
generic issue; could anyone help me out?
Here is the deta
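As a starting point for this kind of report, a minimal set of diagnostic commands (a sketch; <pool> is a placeholder) would be:

    ceph health detail              # names the undersized/degraded PGs
    ceph pg dump_stuck undersized   # the stuck PGs and their acting sets
    ceph osd tree                   # are all 4 OSDs up, and how are they spread over the 3 hosts?
    ceph osd pool get <pool> size   # confirm the pool really has size 3

With size 3, a host failure domain and only 3 hosts, a single down/out OSD is enough to leave CRUSH unable to place a third replica, which shows up exactly as active+undersized+degraded.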
Hey Igor, the patch that you pointed to worked for me.
Thanks again.
From: ceph-users On Behalf Of Igor Fedotov
Sent: 20 June 2018 21:55
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] issues with ceph nautilus version
Hi Raju,
This is a bug in new BlueStore's bitmap allocator.
This PR wil
As a part of the repair operation it runs a deep-scrub on the PG. If it
showed active+clean after the repair and deep-scrub finished, then the next
run of a scrub on the PG shouldn't change the PG status at all.
On Wed, Jun 6, 2018 at 8:57 PM Adrian wrote:
> Update to this.
>
> The affected pg
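As a concrete sketch of that sequence (X.YZ is a placeholder PG id):

    ceph pg repair X.YZ       # queues the repair, which includes a deep-scrub of the PG
    ceph -w                   # watch the cluster log until repair and deep-scrub finish
    ceph pg X.YZ query        # the PG should then report active+clean
    ceph pg deep-scrub X.YZ   # a later manual deep-scrub should not change the status again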
We originally used pacemaker to move a VIP between our RGWs, but ultimately
decided to go with an LB in front of them. With an LB you can utilize both
RGWs while they're up, but the LB will shy away from either if they're down
until the check starts succeeding for that host again. We do have 2 LB
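A minimal sketch of that kind of setup, assuming HAProxy as the load balancer (the poster doesn't name a product, and the hostnames, ports and health-check URL here are placeholders):

    # /etc/haproxy/haproxy.cfg (fragment)
    frontend rgw_in
        bind *:80
        default_backend rgw_out

    backend rgw_out
        balance roundrobin
        option httpchk GET /swift/healthcheck   # any cheap RGW endpoint works as the check
        server rgw1 rgw1.example.com:7480 check
        server rgw2 rgw2.example.com:7480 check

A failed check takes a backend out of rotation until it starts succeeding again, which is the behaviour described above.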
Hi Igor,
Great! Thanks for the quick response.
Will try the fix and let you know how it goes.
-Raj
From: ceph-users On Behalf Of Igor Fedotov
Sent: 20 June 2018 21:55
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] issues with ceph nautilus version
Hi Raju,
This is a bug in new Blue
Thanks, Paul - I could probably activate the Jewel tunables
profile without losing too many clients - most are running
at least kernel 4.2, I think. I'll go hunting for older
clients ...
After changing the tunables, do I need to restart any
Ceph daemons?
Another question, if I may: The hammer tu
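For reference, the commands involved here (a sketch, not from the thread); as far as I know the tunables change itself needs no daemon restart, but it does trigger data movement, and clients that lack the required features will be refused:

    ceph osd crush show-tunables    # current profile and individual tunable values
    ceph features                   # feature/release level of currently connected clients (Luminous and later)
    ceph osd crush tunables jewel   # switch the profile; expect significant data movement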
Hi Raju,
This is a bug in new BlueStore's bitmap allocator.
This PR will most probably fix that:
https://github.com/ceph/ceph/pull/22610
Also, you may try to switch the bluestore and bluefs allocators
(the bluestore_allocator and bluefs_allocator parameters, respectively) to
"stupid" and restart the OSDs.
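A sketch of that workaround as a ceph.conf fragment (apply on the OSD nodes, then restart the OSDs):

    [osd]
    bluestore_allocator = stupid
    bluefs_allocator = stupid

    # then, on each OSD host (standard systemd packaging assumed):
    systemctl restart ceph-osd.target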
I've also seen something similar with Luminous once, on broken OSDs reporting
nonsense stats that overflowed some variables and showed 1000% full.
In my case it was Bluestore OSDs running on too tiny VMs.
Paul
2018-06-20 17:41 GMT+02:00 Raju Rangoju :
> Hi,
>
>
>
> Recently I have upgrad
Yeah, your tunables are ancient. Probably wouldn't have happened with
modern ones.
If this was my cluster I would probably update the clients and update that
(caution: lots of data movement!),
but I know how annoying it can be to chase down everyone who runs ancient
clients.
For comparison, this i
Hi,
Perhaps not optimal nor exactly what you want, but round-robin DNS with
two (or more) vanilla radosgw servers works OK for me as a very
rudimentary form of failover and load balancing.
If you wanted active/standby you could use something like pacemaker to
start services and move the vIP a
Hi Paul,
ah, right, "ceph pg dump | grep remapped", that's what I was looking
for. I added the output and the result of the pg query at the end of
https://gist.github.com/oschulz/7d637c7a1dfa28660b1cdd5cc5dffbcb
> But my guess here is that you are running a CRUSH rule to distribute across
Hey all,
Has anyone done, or is working on, a way to do S3 (radosgw) failover?
I am trying to work out a way to have 2 radosgw servers with a VIP, so that
when one server goes down it will fail over to the other.
I am trying this with CTDB, but while testing, the upload can fail and then
carry on, or just hang and
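If CTDB keeps misbehaving, one common alternative for a simple failover VIP is keepalived (my suggestion, not something from this thread; interface, VIP and priority are placeholders):

    # /etc/keepalived/keepalived.conf on the primary radosgw host
    vrrp_instance RGW_VIP {
        state MASTER              # use BACKUP plus a lower priority on the second host
        interface eth0
        virtual_router_id 51
        priority 150
        virtual_ipaddress {
            192.0.2.10/24
        }
    }

Load balancing in front of both radosgw instances (see the HAProxy sketch earlier in this digest) avoids the active/standby limitation entirely.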
Denny,
I should have mentioned this as well. Any ceph cluster wide checks I am
doing with Icinga are only applied to my 3 mon/mgr nodes. They would
definitely be annoying if it was on all osd nodes. Having the checks on
all of the mons allows me to not lose monitoring ability should one go dow
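The cluster-wide check itself can stay very simple; a minimal sketch (this is not the check_ceph* plugin mentioned above) that maps "ceph health" onto Icinga/Nagios exit codes:

    #!/bin/sh
    # Minimal health check: translate "ceph health" into an exit code.
    STATUS=$(ceph health 2>/dev/null | awk '{print $1}')
    case "$STATUS" in
      HEALTH_OK)   echo "OK - $STATUS";       exit 0 ;;
      HEALTH_WARN) echo "WARNING - $STATUS";  exit 1 ;;
      HEALTH_ERR)  echo "CRITICAL - $STATUS"; exit 2 ;;
      *)           echo "UNKNOWN - could not query ceph health"; exit 3 ;;
    esac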
Hi,
Recently I have upgraded my Ceph cluster to version 14.0.0 - Nautilus (dev) from
ceph version 13.0.1. After this, I noticed some weird data usage numbers on the
cluster.
Here are the issues I'm seeing...
1. The data usage reported is much more than what is available
usage: 16 EiB used,
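For cross-checking numbers like that, the usual places to look are (a sketch; <id> is a placeholder):

    ceph df detail          # per-pool stored vs. raw usage
    ceph osd df tree        # per-OSD utilisation; one OSD reporting garbage stands out here
    ceph osd metadata <id>  # the reporting OSD's bluestore settings and device sizes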
And BTW, if you can't make it to this event we're in the early days of
planning a dedicated Ceph + OpenStack Days at CERN around May/June
2019.
More news on that later...
-- Dan @ CERN
On Tue, Jun 19, 2018 at 10:23 PM Leonardo Vaz wrote:
>
> Hey Cephers,
>
> We will join our friends from OpenSt
Hi,
have a look at "ceph pg dump" to see which ones are stuck in remapped.
But my guess here is that you are running a CRUSH rule to distribute across
3 racks
and you only have 3 racks in total.
CRUSH will sometimes fail to find a mapping in this scenario. There are a
few parameters
that you can
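The truncated part presumably refers to the CRUSH retry knobs. One way to adjust them (choose_total_tries is my guess at the parameter meant here; 100 is only an example value):

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # edit crush.txt: raise "tunable choose_total_tries 50" to e.g. 100,
    # or add "step set_choose_tries 100" at the top of the affected rule
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new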
Hi Leo,
On 06/20/2018 01:47 AM, Leonardo Vaz wrote:
> We created the following etherpad to organize the calendar for the
> future Ceph Tech Talks.
>
> For the Ceph Tech Talk of June 28th our fellow George Mihaiescu will
> tell us how Ceph is being used on cancer research at OICR (Ontario
> Insti
Dear Paul,
thanks, here goes (output of "ceph -s", etc.):
https://gist.github.com/oschulz/7d637c7a1dfa28660b1cdd5cc5dffbcb
> Also please run "ceph pg X.YZ query" on one of the PGs not backfilling.
Silly question: How do I get a list of the PGs not backfilling?
On 06/20/2018 04:00 PM, Pa
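One way to get that list (a sketch; exact state names vary a little between releases):

    ceph pg dump pgs_brief 2>/dev/null | awk '$2 != "active+clean"'   # every PG in any other state
    ceph pg dump_stuck unclean                                        # or only the ones flagged as stuck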
Hi Cephers,
Due the July 4th holiday in US we are postponing the Ceph Developer
Monthly meeting to July 11th.
Kindest regards,
Leo
--
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
Hi Brad,
Yes, but it doesn't show much:
ceph pg 18.2 query
Error EPERM: problem getting command descriptions from pg.18.2
Cheers
- Original Message -
> From: "Brad Hubbard"
> To: "andrei"
> Cc: "ceph-users"
> Sent: Wednesday, 20 June, 2018 00:02:07
> Subject: Re: [ceph-users] fixin
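EPERM from "pg query" usually points at the client side rather than the PG itself; a couple of things worth checking (an assumption on my part, not a confirmed diagnosis; the key in use may simply lack sufficient caps):

    ceph auth get client.admin        # does the key used have full mon/osd caps?
    ceph pg map 18.2                  # which OSDs does the PG map to, and which is primary?
    ceph tell osd.<id> version        # is the primary OSD reachable at all? (<id> is a placeholder)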
Can you post the full output of "ceph -s", "ceph health detail", and "ceph
osd df tree"?
Also please run "ceph pg X.YZ query" on one of the PGs not backfilling.
Paul
2018-06-20 15:25 GMT+02:00 Oliver Schulz :
> Dear all,
>
> we (somewhat) recently extended our Ceph cluster,
> and updated it to Lumi
On Wed, Jun 20, 2018 at 7:27 AM, Bernhard Dick wrote:
> Hi,
>
> I'm experimenting with Ceph and have seen that ceph-deploy and ceph-ansible
> have the EPEL repositories as a requirement when installing Ceph on CentOS
> hosts. Due to the nature of the EPEL repos this might cause trouble (i.e.
> when
Dear all,
we (somewhat) recently extended our Ceph cluster,
and updated it to Luminous. By now, the fill level
on some ODSs is quite high again, so I'd like to
re-balance via "OSD reweight".
I'm running into the following problem, however:
No matter what I do (reweight a little, or a lot,
or onl
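For the record, the relevant commands are roughly (a sketch; the numbers are examples only):

    ceph osd reweight <osd-id> 0.9              # temporary override weight between 0 and 1, i.e. "OSD reweight"
    ceph osd crush reweight osd.<id> 3.0        # permanent CRUSH weight, usually the device size in TiB
    ceph osd test-reweight-by-utilization 120   # dry run of the automatic variant
    ceph osd reweight-by-utilization 120        # apply it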
Thanks for the response. I was also hoping to be able to debug better
once we got onto Mimic. We just finished that upgrade yesterday and
cephfs-journal-tool does find a corruption in the purge queue though
our MDS continues to start up and the filesystem appears to be
functional as usual.
How ca
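For anyone following along, the purge-queue inspection looks roughly like this (a sketch; syntax depends on the Mimic version, and "cephfs:0" is a placeholder for filesystem name and rank):

    cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect
    cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal export pq-backup.bin   # keep a backup before changing anything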
adding back in the list :)
-- Forwarded message -
From: Luis Periquito
Date: Wed, Jun 20, 2018 at 1:54 PM
Subject: Re: [ceph-users] Planning all flash cluster
To:
On Wed, Jun 20, 2018 at 1:35 PM Nick A wrote:
>
> Thank you, I was under the impression that 4GB RAM per 1TB was q
Hi,
It sounds like the .rgw.bucket.index pool has grown, maybe due to some
problem with dynamic bucket resharding.
I wonder if the (stale/old/unused) bucket indexes need to be purged
using something like the below:
radosgw-admin bi purge --bucket= --bucket-id=
Not sure how you would find the o
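One way to look for candidates (an assumption on my part; newer releases grew dedicated stale-instance commands, but plain metadata listing works everywhere):

    radosgw-admin metadata list bucket.instance                 # every bucket instance the cluster knows about
    radosgw-admin bucket stats --bucket=<name> | grep '"id"'    # the instance id currently in use (<name> is a placeholder)
    # instances that no longer match any bucket's current id are candidates for "bi purge"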
Adding more nodes from the beginning would probably be a good idea.
On Wed, Jun 20, 2018 at 12:58 PM Nick A wrote:
>
> Hello Everyone,
>
> We're planning a small cluster on a budget, and I'd like to request any
> feedback or tips.
>
> 3x Dell R720XD with:
> 2x Xeon E5-2680v2 or very similar
The
Another great thing about lots of small servers vs. few big servers is that
you can use erasure coding.
You can save a lot of money by using erasure coding, but performance will
have to be evaluated
for your use case.
I'm working with several clusters that are 8-12 servers with 6-10 SSDs each
runn
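As a small illustration of the trade-off (values are examples, not a recommendation): with, say, 8 hosts you can run a k=4, m=2 profile with a host failure domain, which stores data at 1.5x raw overhead instead of the 3x of replication:

    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 128 128 erasure ec42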
This is true, but misses the point that the OP is talking about old
hardware already - you're not going to save much money by removing a
second-hand CPU from a system.
On Wed, 20 Jun 2018 at 22:10, Wido den Hollander wrote:
>
>
> On 06/20/2018 02:00 PM, Robert Sander wrote:
> > On 20.06.2018 13:58,
On 06/20/2018 02:00 PM, Robert Sander wrote:
> On 20.06.2018 13:58, Nick A wrote:
>
>> We'll probably add another 2 OSD drives per month per node until full
>> (24 SSD's per node), at which point, more nodes.
>
> I would add more nodes earlier to achieve better overall performance.
Exactly. No
* More small servers give better performance than a few big servers; maybe
twice the number of servers with half the disks, CPUs and RAM.
* 2x 10 Gbit is usually enough, especially with more servers. That will
rarely be the bottleneck (unless you have extreme bandwidth requirements).
* maybe save money
On 20.06.2018 13:58, Nick A wrote:
> We'll probably add another 2 OSD drives per month per node until full
> (24 SSD's per node), at which point, more nodes.
I would add more nodes earlier to achieve better overall performance.
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b,
Hello Everyone,
We're planning a small cluster on a budget, and I'd like to request any
feedback or tips.
3x Dell R720XD with:
2x Xeon E5-2680v2 or very similar
96GB RAM
2x Samsung SM863 240GB boot/OS drives
4x Samsung SM863 960GB OSD drives
Dual 40/56Gbit Infiniband using IPoIB.
3 replica, MON
Hi all,
We have recently upgraded from Jewel (10.2.10) to Luminous (12.2.5) and after
this we decided to update our tunables configuration to optimal, which
was previously at Firefly. During this process, we have noticed the OSDs
(bluestore) rapidly filling on the RGW index and GC pool. W
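Some commands that help narrow down where the growth comes from (a sketch, assuming 12.2.x syntax):

    ceph df detail                             # per-pool object counts and usage
    radosgw-admin reshard list                 # buckets queued for dynamic resharding
    radosgw-admin gc list --include-all | head # size of the pending garbage-collection backlog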
Hi,
I'm experimenting with Ceph and have seen that ceph-deploy and
ceph-ansible have the EPEL repositories as a requirement when installing
Ceph on CentOS hosts. Due to the nature of the EPEL repos this might
cause trouble (i.e. when combining Ceph with oVirt on the same host).
When using the C
Hi Wladimir,
A combination of a slow clock speed, erasure coding, a single node,
and SATA spinners is probably not going to lead to a really great
evaluation. Some of the experts will chime in here with answers to
your specific questions, I'm sure, but this test really isn't ever going
to give grea
Dear all,
I set up a minimal 1-node Ceph cluster to evaluate its performance.
We tried to save as much as possible on the hardware, so now the box has
an Asus P10S-M WS motherboard, a Xeon E3-1235L v5 CPU, 64 GB DDR4 ECC RAM and
8x3TB HDDs (WD30EFRX) connected to on-board SATA ports. Also w
Hi,
at the moment, we use Icinga2, check_ceph* and Telegraf with the Ceph
plugin. I'm asking what I need in order to have a separate host which knows all
about the Ceph cluster health. The reason is that each OSD node has
mostly the exact same data, which is transmitted into our database (like
InfluxDB