Good morning,
I am trying to understand Ceph snapshot sizing. For example, if I have a 2.7
GiB volume and I create a snap on it, the sizing says:
(BEFORE SNAP)
rbd du volumes/volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080
NAME PROVISIONED USED
volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080 10 GiB 2.7 GiB
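To see how the numbers change, I then create a snapshot and run rbd du again
(the snapshot name is just an example):
rbd snap create volumes/volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080@snap1
rbd du volumes/volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080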
Hi,
I have one CephFS with one volume and subvolumes, using erasure coding.
If I don't set any quota, when I run df on the client I get:
0ccbc438-d109-4c5f-b47b-70f8df707c2c/vo 5,8P 78T 5,8P 2% /vo
The 78T seems to be the size used by Ceph on disk (on the hardware, I mean). And I
find th
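For reference, I set the quota on the subvolume via the CephFS extended
attributes, roughly like this (the byte value is just an example):
setfattr -n ceph.quota.max_bytes -v 107374182400 /vo
getfattr -n ceph.quota.max_bytes /vo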
We have 6 nodes (3 OSD nodes and 3 service nodes); 2 of the 3 OSD nodes were powered off
and we have a big problem.
Please check the ceph -s result below.
Now we cannot start the MDS service (we tried to start it, but it stopped after 2
minutes).
Now my application cannot access the NFS-exported folder.
What should we do?
[roo
What does the MDS log when it crashes?
Quoting nguyenvand...@baoviet.com.vn:
We have 6 nodes (3 OSD nodes and 3 service nodes); 2 of the 3 OSD nodes were
powered off and we have a big problem.
Please check the ceph -s result below.
Now we cannot start the MDS service (we tried to start it, but it stopped
after 2 min
How can we get the MDS log? Please guide me. T_T
There are a couple of ways. Find your MDS daemon with:
ceph fs status -> should show you the to-be-active MDS
On that host run:
cephadm logs --name mds.{MDS}
or alternatively:
cephadm ls --no-detail | grep mds
journalctl -u ceph-{FSID}@mds.{MDS} --no-pager > {MDS}.log
Quoting nguyenvand...@b
Hm, I wonder if setting (and unsetting after a while) noscrub and
nodeep-scrub has any effect. Have you tried that?
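I.e. something like:
ceph osd set noscrub
ceph osd set nodeep-scrub
# wait a while, then
ceph osd unset noscrub
ceph osd unset nodeep-scrub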
Quoting Cedric:
Update: we have run fsck and re-sharded all BlueStore volumes; it seems
sharding was not applied.
Unfortunately, scrubs and deep-scrubs are still stuck on PGs.
The log is very long; could you please guide me on how to grep/filter the
important things in the logs?
Hi,
just responding to the last questions:
- After the bootstrap, the web interface was accessible:
- How can I access the wizard page again? If I don't use it the first
time, I cannot find another way to get to it.
I don't know how to recall the wizard, but you should be able
Feb 22 13:39:43 cephgw02 conmon[1340927]: log_file
/var/lib/ceph/crash/2024-02-22T06:39:43.618845Z_78ee38bc-9115-4bc6-8c3a-4bf42284c970/log
Feb 22 13:39:43 cephgw02 conmon[1340927]: --- end dump of recent events ---
Feb 22 13:39:45 cephgw02 systemd[1]:
ceph-258af72a-cff3-11eb-a261-d4f5ef25154c@
If it crashes after two minutes, you have your time window to look at.
Restart the MDS daemon and capture everything after that until the
crash.
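For example, something like this (the grep patterns are just common crash
markers, adjust as needed):
journalctl -u ceph-{FSID}@mds.{MDS} --since "10 minutes ago" --no-pager | \
  grep -B5 -A20 -E "Caught signal|FAILED ceph_assert"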
Quoting nguyenvand...@baoviet.com.vn:
The log is very long; could you please guide me on how to grep/filter
the important things in the logs?
Thanks Eugen for the suggestion. Yes, we have tried that, and also repeered the
concerned PGs; still the same issue.
Looking at the code, it seems the split-mode message is triggered when
the PG has "stats_invalid": true. Here is the result of a query:
"stats_invalid": true,
"dirty_stats_inva
I found a config to force scrub invalid PGs, what is your current
setting on that?
ceph config get osd osd_scrub_invalid_stats
true
The config reference states:
"Forces extra scrub to fix stats marked as invalid."
But the default seems to be true, so I'd expect it's true in your case
as we
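You could also try triggering a scrub manually on one of the affected PGs
(the pgid is a placeholder):
ceph pg scrub 2.1f
ceph pg deep-scrub 2.1f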
Yes, osd_scrub_invalid_stats is set to true.
We are thinking about using the "ceph pg <pgid> mark_unfound_lost revert"
action, but we wonder if there is a risk of data loss.
On Thu, Feb 22, 2024 at 11:50 AM Eugen Block wrote:
>
> I found a config to force scrub invalid PGs, what is your current
> s
Hi Folks,
We are excited to announce plans for building a larger Ceph-S3 setup.
To ensure its success, extensive testing is needed in advance.
Some of these tests don't need a full-blown Ceph cluster on hardware
but still require meeting specific logical requirements, such as a
multi-site S3 setup.
We are thinking about using the "ceph pg <pgid> mark_unfound_lost revert"
action, but we wonder if there is a risk of data loss.
You don't seem to have unfound objects so I don't think that command
would make sense.
You haven't told yet if you changed the hit_set_count to 0.
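I.e. something like this (the pool name is a placeholder):
ceph osd pool set <cache-pool> hit_set_count 0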
Have you already tried
On Thu, Feb 22, 2024 at 12:37 PM Eugen Block wrote:
> You haven't told yet if you changed the hit_set_count to 0.
Not yet, we will give it a try ASAP
> Have you already tried to set the primary PG out and wait for the
> backfill to finish?
No, we will try that as well.
> And another question, are all s
I had to temporarily disconnect the network on my entire Ceph cluster, so I
prepared the cluster by following what appears to be some incomplete
advice.
I did the following before disconnecting the network:
#ceph osd set noout
#ceph osd set norecover
#ceph osd set norebalance
#ceph osd set nobackfill
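After reconnecting the network, I plan to clear the flags again:
#ceph osd unset noout
#ceph osd unset norecover
#ceph osd unset norebalance
#ceph osd unset nobackfill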
Hi Chip,
It looks like not all mons are up, or they cannot reach each other over the
network to form a quorum.
Make sure all nodes can reach each other and check the mon logs.
Furthermore some info about
pvecm status
pveceph status
or just
ceph status
would be helpful
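If you can reach any mon directly, you can also check the quorum with:
ceph quorum_status --format json-pretty
or
ceph mon stat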
Cheers
Stephan
On Thu, Feb 22
Hi,
Yes you can; this is controlled by the option
client quota df = false
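e.g. in the [client] section of ceph.conf, or (a sketch) centrally via:
ceph config set client client_quota_df false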
k
Sent from my iPhone
> On Feb 22, 2024, at 11:17, Albert Shih wrote:
>
> Is there any way to keep the first answer?
Hi,
Have you already tried to set the primary PG out and wait for the
backfill to finish?
Of course I meant the primary OSD for that PG, I hope that was clear. ;-)
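A sketch of what I mean (pgid and OSD id are placeholders):
ceph pg map 2.1f   # the first entry of the acting set is the primary
ceph osd out 5     # mark that OSD out
ceph -s            # watch until the backfill has finished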
We are thinking about using the "ceph pg <pgid> mark_unfound_lost revert"
I'm not a developer, but how I read the code [2] is that s
The problem turns out to be burning the candle at both ends. I had been
checking network communication for the past few hours and hadn't realized
I was using my 1Gb IPs, not the 100Gb IPs. The 100Gb links got connected to the
wrong ports during the cable move.
Thanks for the attempted assists. Focusi
Hello guys,
We are running Ceph Octopus on Ubuntu 18.04, and we are noticing spikes of
IO utilization for the bstore_kv_sync thread during operations such as adding a
new pool and increasing/reducing the number of PGs in a pool.
It is funny though that the IO utilization (reported with iotop) is 99.99%
Most likely you are seeing time spent waiting on fdatasync in
bstore_kv_sync if the drives you are using don't have power loss
protection and can't perform flushes quickly. Some consumer grade
drives are actually slower at this than HDDs.
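A quick way to check a drive's sync write behavior is something like this
(WARNING: this writes to the raw device and destroys its data; the device
path is a placeholder):
fio --name=synctest --filename=/dev/sdX --direct=1 --rw=randwrite --bs=4k \
  --iodepth=1 --numjobs=1 --fsync=1 --runtime=30 --time_based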
Mark
On 2/22/24 11:04, Work Ceph wrote:
Hello guys,
Thanks for the prompt response!
I see, and indeed some of them are consumer SSD disks. Is there any
parameter we can change/tune to better handle the "fdatasync" call?
Maybe using NVMes for the RocksDB?
On Thu, Feb 22, 2024 at 2:24 PM Mark Nelson wrote:
> Most likely you are seeing time sp
The biggest improvement would be to put all of the OSDs on SSDs with
PLP. Next would be to put the WAL/DB on drives with PLP. If price is a
concern, you can sometimes find really good older drives like Intel
P4510s on ebay for reasonable prices. Just watch out for how much write
wear they have on them.
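For a new OSD, putting the DB (and implicitly the WAL) on a separate fast
device looks roughly like this (device paths are placeholders):
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1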
> you can sometimes find really good older drives like Intel P4510s on ebay
> for reasonable prices. Just watch out for how much write wear they have on
> them.
Also be sure to update to the latest firmware before use, then issue a Secure
Erase.
Could you please help explain the volume status "recovering"? What is it,
and do we need to wait for the volume recovery to finish?
Hi Mr. Patrick,
We are in the same situation as Sake: now my MDS has crashed, the NFS service is down,
and CephFS is not responding. Here is my "ceph -s" result:
health: HEALTH_WARN
3 failed cephadm daemon(s)
1 filesystem is degraded
insufficient standby MDS daemons available