Wondering if anyone knows or has put together a way to wipe an Octopus
install? I’ve looked for documentation on the process, but if it exists, I
haven’t found it yet. I’m going through some test installs - working through
the ins and outs of cephadm and containers and would love an ea
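(A rough sketch of one way to tear down a cephadm-deployed Octopus test cluster; the fsid is a placeholder for whatever cephadm ls reports, and OSD disks still need zapping separately, e.g. with ceph-volume lvm zap --destroy.)
cephadm ls | grep fsid                    # find the cluster fsid on any host
cephadm rm-cluster --fsid <fsid> --force  # run on every host: stops and removes all daemons and data for that fsid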
You can try PetaSAN (www.petasan.org); we use the rbd backend by SUSE. It
works out of the box.
/Maged
On 06/10/2020 19:49, dhils...@performair.com wrote:
> Mark;
> Are you suggesting some other means to configure iSCSI targets with Ceph?
> If so, how do I configure for non-tcmu?
> The iSCSI clients are
pg_num and pgp_num need to be the same, no?
3.5.1. Set the Number of PGs
To set the number of placement groups in a pool, you must specify the
number of placement groups at the time you create the pool. See Create a
Pool for details. Once you set placement groups for a pool, you can
increase
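(For a concrete sketch, assuming a recent release and a placeholder pool name; data only starts moving once pgp_num follows pg_num:)
ceph osd pool set <pool> pg_num 256
ceph osd pool set <pool> pgp_num 256
ceph osd pool get <pool> pg_num    # verify both values afterwards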
Hi everyone,
I'm seeing a similar issue here. Any ideas on this?
Mac Wynkoop,
On Sun, Sep 6, 2020 at 11:09 PM norman wrote:
> Hi guys,
>
> When I update the pg_num of a pool, I found it didn't work (no
> rebalancing happened). Does anyone know the reason? Pool's info:
>
> pool 21 'openstack-volumes-rs' replic
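(A couple of things worth checking here, assuming Nautilus or later; the pool name is taken from the output above:)
ceph osd pool get openstack-volumes-rs pgp_num        # no rebalance happens until pgp_num is raised too
ceph osd pool ls detail | grep openstack-volumes-rs   # shows pg_num/pgp_num and any pending targets
ceph osd pool autoscale-status                        # the pg_autoscaler, if enabled, may override manual changes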
To be honest I don't really remember, those tests were from a while ago.
:) I'm guessing I probably was getting higher throughput with 32 vs 16
in some of the test cases but didn't need to go up to 64 at that time.
This was all before various work we've done in bluestore over the past
year th
Hi Dominic,
If you can't use kernel rbd I think you'll probably have to deal with
the higher overhead and lower performance with the tcmu solution. It's
possible there might be some things you can tweak at the tcmu layer that
will improve things, but when I looked at it there simply seemed t
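(For reference, the kernel client route, where a gateway or client host can map the image directly, is roughly the following; pool and image names are made up:)
rbd create iscsi-pool/client-disk1 --size 100G
rbd map iscsi-pool/client-disk1     # appears as /dev/rbdX on the mapping host
rbd showmapped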
Mark;
Are you suggesting some other means to configure iSCSI targets with Ceph?
If so, how do I configure for non-tcmu?
The iSCSI clients are not RBD aware, and I can't really make them RBD aware.
Thank you,
Dominic L. Hilsbos, MBA
Director – Information Technology
Perform Air International In
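(One non-tcmu option, offered only as a sketch and with its own HA caveats, is to map the image with the kernel RBD client on the gateway and export the block device through LIO/targetcli; the IQN and names below are placeholders:)
rbd map iscsi-pool/client-disk1                                        # gives e.g. /dev/rbd0
targetcli /backstores/block create name=disk1 dev=/dev/rbd0
targetcli /iscsi create iqn.2020-10.com.example:disk1
targetcli /iscsi/iqn.2020-10.com.example:disk1/tpg1/luns create /backstores/block/disk1
# plus the usual portal/ACL setup for the initiators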
On 2020-10-06 15:27, Igor Fedotov wrote:
> I'm working on improving PG removal in master, see:
> https://github.com/ceph/ceph/pull/37496
>
> Hopefully this will help in case of "cleanup after rebalancing" issue
> which you presumably had.
That would be great. Does the offline compaction with the
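(In case it helps others following the thread, the offline variant, if that is what is meant here, is run with the OSD stopped, along these lines:)
systemctl stop ceph-osd@<id>
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact
systemctl start ceph-osd@<id>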
On 2020-10-06 14:18, Kristof Coucke wrote:
> Ok, I did the compact on 1 osd.
> The utilization is back to normal, so that's good... Thumbs up to you guys!
We learned the hard way, but happy to spot the issue and share the info.
> Though, one thing I want to get out of the way before adapting the
On 2020-10-06 13:05, Igor Fedotov wrote:
>
> On 10/6/2020 1:04 PM, Kristof Coucke wrote:
>> Another strange thing is going on:
>>
>> No client software is using the system any longer, so we would expect
>> that all IOs are related to the recovery (fixing of the degraded PG).
>> However, the disks
I'm working on improving PG removal in master, see:
https://github.com/ceph/ceph/pull/37496
Hopefully this will help in case of "cleanup after rebalancing" issue
which you presumably had.
On 10/6/2020 4:24 PM, Kristof Coucke wrote:
Hi Igor and Stefan,
Everything seems okay, so we'll now cr
Hi Igor and Stefan,
Everything seems okay, so we'll now create a script to automate this on all
the nodes and we will also review the monitoring possibilities.
Thanks for your help, it was a time saver.
Does anyone know if this issue is better handled in the newer versions or
if this is planned i
I've seen similar reports after manual compactions as well. But it looks
like a presentation bug in RocksDB to me.
You can check if all the data is spilled over (as it ought to be for L4)
in the bluefs section of the OSD perf counters dump...
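(Concretely, something like the following; counter names as in Nautilus/Octopus, from memory:)
ceph daemon osd.<id> perf dump bluefs
# compare db_used_bytes with slow_used_bytes: a non-zero slow_used_bytes means DB data has spilled to the slow device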
On 10/6/2020 3:18 PM, Kristof Coucke wrote:
Ok, I did th
We had a similar issue last week. We had sluggish disks (10TB
SAS in RAID 0 mode) in half of the nodes, which affected the performance of
the cluster. These disks had high CPU usage and very high latency. It turned
out there is a *patrol read* process from the RAID card that runs
automatically every week. W
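(If it is a Broadcom/LSI controller, the patrol read schedule can be inspected and disabled with storcli; syntax from memory, so treat it as a pointer rather than gospel, and adjust the controller index:)
storcli /c0 show patrolread       # current patrol read state and schedule
storcli /c0 set patrolread=off    # disable it, or reschedule it to a quiet window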
Ok, I did the compact on 1 osd.
The utilization is back to normal, so that's good... Thumbs up to you guys!
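(For the remaining OSDs the same compaction can be triggered online, one OSD at a time to limit the impact; on newer releases ceph tell should work as well:)
ceph daemon osd.<id> compact      # run on the host carrying the OSD
ceph tell osd.<id> compact        # from anywhere with admin access, on recent releases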
Though, one thing I want to get out of the way before adapting the other
OSDs:
When I now get the RocksDb stats, my L1, L2 and L3 are gone:
db_statistics {
"rocksdb_compaction_statistics
On Tue 6 Oct 2020 at 11:13, Kristof Coucke wrote:
> I'm now wondering what my options are to improve the performance... The
> main goal is to use the system again, and make sure write operations are
> not affected.
> - Putting weight on 0 for the slow OSDs (temporary)? This way they recovery
> c
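(If you go the weight route, there are a few knobs with different effects; a quick sketch:)
ceph osd crush reweight osd.<id> 0     # drains all data off the OSD
ceph osd reweight <id> 0               # treats it as "out"; also triggers data movement
ceph osd primary-affinity <id> 0       # leaves data in place but stops the OSD acting as primary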
Good to know, thank you. The reason I'm thinking about Consul is that the user
would just request the endpoint from Consul, so I can remove the single load
balancer bottleneck; they would go directly to rgw.
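(A rough sketch of the lookup side with Consul's DNS interface; the service name and the RGW port are assumptions:)
# each RGW host registers an "rgw" service with a health check in its local Consul agent, then clients resolve:
dig @127.0.0.1 -p 8600 rgw.service.consul SRV
# or point client DNS at Consul and target http://rgw.service.consul:7480 directly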
From: Janne Johansson
Sent: Tuesday, October 6, 2020 1:52 PM
To: Szabo,
Unfortunately currently available Ceph releases lack any means to
monitor KV data removal. The only way is to set debug_bluestore to 20
(for a short period of time, e.g. 1 min) and inspect OSD log for
_remove/_do_remove/_omap_clear calls. Plenty of them within the
inspected period means ongoing
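(Roughly, without restarting the OSD; the log path is the default one:)
ceph tell osd.<id> injectargs '--debug_bluestore 20'
sleep 60
ceph tell osd.<id> injectargs '--debug_bluestore 1/5'     # back to the default
grep -cE '_remove|_do_remove|_omap_clear' /var/log/ceph/ceph-osd.<id>.log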
Hi,
On 06.10.20 12:48, René Bartsch wrote:
> is there any documentation about mapping usernames, user-ids,
> groupnames and group-ids between hosts sharing the same CephFS storage?
CephFS is only recording the numeric user ID and group ID in the
directory entry. It is up to the client to map tha
hi rene,
On 10/6/20 12:48 PM, René Bartsch wrote:
> is there any documentation about mapping usernames, user-ids,
> groupnames and group-ids between hosts sharing the same CephFS storage?
I guess this is a bit outside of the scope of ceph … as with every
distributed environment, basically you'll
Hi,
I’m facing this issue, too. I haven’t found any satisfying mapping solution so
far. Now I’m considering deploying FreeIPA to unify the uid and gid on every
host. CephFS does not store user names and group names in the file system as
far as I know.
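(A quick way to see what is actually stored and how each host interprets it; paths and IDs are examples:)
stat -c '%u:%g %n' /mnt/cephfs/somefile   # the numeric uid:gid CephFS records
getent passwd 1001                        # what that uid resolves to on *this* client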
> On Oct 6, 2020, at 18:49, René Bartsch
Is there a way that I can check if this process is causing the performance
issues?
Op di 6 okt. 2020 om 13:05 schreef Igor Fedotov :
>
> On 10/6/2020 1:04 PM, Kristof Coucke wrote:
>
> Another strange thing is going on:
>
> No client software
On 10/6/2020 1:04 PM, Kristof Coucke wrote:
Another strange thing is going on:
No client software is using the system any longer, so we would expect
that all IOs are related to the recovery (fixing of the degraded PG).
However, the disks that are reaching high IO are not a member of the
PGs t
I presume that this might be caused by massive KV data removal which was
initiated after (or during) data rebalance. We've seen multiple complaints
about RocksDB performance being negatively affected by pool/pg removal. And
I expect data rebalance might suffer from the same...
You might want to run
Ok thanks, very clear, I am also indeed within this range.
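(For anyone following along, the two values live in the cluster report; with jq, roughly:)
ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'
# a large gap means the mons are holding many untrimmed osdmaps, e.g. while noout is set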
-Original Message-
Subject: Re: [ceph-users] Re: Massive Mon DB Size with noout on 14.2.11
The important metric is the difference between these two values:
# ceph report | grep osdmap | grep committed
report 3324953770
"os
Hi,
is there any documentation about mapping usernames, user-ids,
groupnames and group-ids between hosts sharing the same CephFS storage?
Thanx for any hint,
Renne
Hi,
Is there anybody tried consul as a load balancer?
Any experience?
Thank you
Another strange thing is going on:
No client software is using the system any longer, so we would expect that
all IOs are related to the recovery (fixing of the degraded PG).
However, the disks that are reaching high IO are not a member of the PGs
that are being fixed.
So, something is heavily us
If an OSD is lost, it will be detected after
osd heartbeat grace = 20 +
osd heartbeat interval = 5
i.e. 25 sec by default, which is what you see. During this time client IO
will block; after this the OSD is flagged as down and a new OSD map is
issued which the client will use to re-direct the
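(The current values can be checked, and lowered if faster detection matters more than the risk of flapping; the 10 below is just an illustration:)
ceph config get osd osd_heartbeat_grace
ceph config get osd osd_heartbeat_interval
ceph config set osd osd_heartbeat_grace 10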
Yes, some disks are spiking near 100%... The delay I see with the iostat
(r_await) seems to be synchronised with the delays between queued_for_pg
and reached_pg events.
The NVMe disks are not spiking, just the spinner disks.
I know the RocksDB is only partially on the NVMe. The read-ahead is also
12
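(For anyone reproducing this, r_await comes from the extended iostat output, e.g.:)
iostat -x 5     # %util near 100 plus a high r_await points at saturated spinners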
Hi Kristof,
are you seeing high (around 100%) OSDs' disks (main or DB ones)
utilization along with slow ops?
Thanks,
Igor
On 10/6/2020 11:09 AM, Kristof Coucke wrote:
Hi all,
We have a Ceph cluster which has been expanded from 10 to 16 nodes.
Each node has between 14 and 16 OSDs of which
Thanks to @Anthony:
Diving further I see that I probably was blinded by the CPU load...
I see that some disks are very slow (so my first observations were
incorrect), and the latency seen using iostat seems more or less the same
as what we see in the dump_historic_ops. (+ 3s for r_await)
So, it l
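(The events mentioned can be pulled per OSD with something like the following; jq is optional:)
ceph daemon osd.<id> dump_historic_ops | jq '.ops[] | {duration, type_data}'
# the gap between the queued_for_pg and reached_pg events is where these slow ops sit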
Hi Anthony,
Thanks for the reply
Average values:
User: 3.5
Idle: 78.4
Wait: 20
System: 1.2
/K.
Op di 6 okt. 2020 om 10:18 schreef Anthony D'Atri :
>
>
> >
> > Diving onto the nodes we could see that the OSD daemons are consuming the
> > CPU power, resulting in average CPU loads going near 10 (!)
Hi all,
We have a Ceph cluster which has been expanded from 10 to 16 nodes.
Each node has between 14 and 16 OSDs of which 2 are NVMe disks.
Most disks (except NVMe's) are 16TB large.
The expansion of 16 nodes went ok, but we've configured the system to
prevent auto balance towards the new disks (
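(For anyone wondering how that is typically done, two common approaches, not necessarily the one used here:)
ceph config set osd osd_crush_initial_weight 0   # new OSDs join with CRUSH weight 0
# or, temporarily, for the whole cluster:
ceph osd set norebalance
ceph osd set nobackfill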
I think I do not understand you completely. How long does a live
migration take? If I do virsh migrate with VMs on librbd it is a few
seconds. I guess this is mainly caused by copying the RAM to the other
host.
Any additional time this takes in case of a host failure is related to
timeout sett
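(For reference, the live migration referred to is along these lines; host names are placeholders:)
virsh migrate --live --persistent vm1 qemu+ssh://other-hypervisor/system
# with the disk on RBD only RAM and device state move, so it typically completes in seconds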
Hi everybody,
Our need is to do VM failover using an image disk over RBD to avoid data
loss. We want to limit the downtime as much as possible.
We have:
- Two hypervisors with a Ceph Monitor and a Ceph OSD.
- A third machine with a Ceph Monitor and a Ceph Manager.
VMs are running over QEMU. The VM
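(A couple of rbd commands that are handy in such a setup to see which hypervisor currently has the image open; the pool/image names are examples:)
rbd create vms/vm1-disk --size 100G
rbd status vms/vm1-disk     # lists watchers, i.e. the host that has the image open
rbd info vms/vm1-disk       # shows features such as exclusive-lock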