[ceph-users] Re: RGW: HEAD ok but GET fails

2024-08-09 Thread Mathias Chapelain
Hello,

Did the customer delete the object by any chance? If yes, could this be 
related to https://tracker.ceph.com/issues/63935 ?
We had a scenario where an application was doing some DELETEs and then listing 
bucket entries.
It was able to find objects that should have been deleted and then tried 
to GET them without success.
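
If that's what happened, a rough way to spot and clean up stale index entries is 
radosgw-admin's bucket check (the bucket name below is just a placeholder, and I'd 
double-check the flags for your release before running the --fix variant):

radosgw-admin bucket check --bucket=<bucket>
radosgw-admin bucket check --bucket=<bucket> --check-objects --fix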

Regards,


Mathias Chapelain
Storage Engineer
Proton AG


On Friday, August 9th, 2024 at 08:54, Eugen Block  wrote:

> Hi,
> 
> I'm trying to help a customer with a RGW question, maybe someone here
> can help me out. Their S3 application reports errors every now and
> then, and it is complaining about missing objects. This is what the
> RGW logs:
> 
> [08/Aug/2024:08:23:47.540 +] "HEAD
> /hchsarchiv/2024061326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508
>  HTTP/1.1" 200 0 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 
> .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 
> ClientSync" -
> latency=0.00392s
> 
> [08/Aug/2024:08:23:47.552 +] "GET
> /hchsarchiv/2024061326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508
>  HTTP/1.1" 404 242 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 
> .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 
> ClientSync" bytes=0-2097151
> latency=0.00392s
> 
> So apparently, it can successfully query the HEAD, but the GET request
> shows 404. We can confirm that the queried object indeed doesn't exist
> in the data pool. But the object metadata apparently must have been written
> successfully. Unfortunately, we don't have enough logs to
> find the corresponding PUT request; they just increased the logrotate
> retention days to be able to inspect it when it happens the next
> time. But my question is: should they see some metadata in the
> listomapkeys/listomapvals output in the index pool?
> The docs [0] state this about Index Transactions:
> 
> > Because the head objects are stored in different rados objects than
> > the bucket indices, we can’t update both atomically with a single
> > rados operation. In order to satisfy the Consistency Guarantee for
> > listing operations, we have to coordinate these two object writes
> > using a three-step bucket index transaction:
> > 
> > 1. Prepare a transaction on its bucket index object.
> > 2. Write or delete the head object.
> > 3. Commit the transaction on the bucket index object (or cancel the
> > transaction if step 2 fails).
> > 
> > Object writes and deletes may race with each other, so a given
> > object may have more than one prepared transaction at a time. RGW
> > considers an object entry to be ‘pending’ if there are any
> > outstanding transactions, or ‘completed’ otherwise.
> 
> 
> Could this be such a race condition which "just happens" from time to
> time? Or can this somehow be prevented from happening? Because right
> now the cleanup process is a bit complicated application-wise.
> I'm not the most experienced RGW user, so I'd be grateful for any
> pointers here.
> 
> Thanks!
> Eugen
> 
> [0] https://docs.ceph.com/en/reef/dev/radosgw/bucket_index/#index-transaction
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW: HEAD ok but GET fails

2024-08-09 Thread Eugen Block
That's interesting, thanks for the link to the tracker issue. There's 
definitely a chance that the object could have been deleted (by the 
application), but we don't have enough logs right now to confirm. They 
don't have many insights into the application, so it can be difficult 
to get to the bottom of this. I'll keep an eye on it though, because 
it will most likely happen again, just not that frequently. So 
hopefully with more logs we can debug this a bit better.
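
One thing that might help when it happens again is temporarily raising the RGW 
debug level so the corresponding requests show up in the logs (the client.rgw 
target below assumes the usual RGW config section; level 10 is quite chatty, so 
it should be lowered again afterwards):

ceph config set client.rgw debug_rgw 10
# ...wait for the next occurrence...
ceph config set client.rgw debug_rgw 1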


Thanks!
Eugen

Quoting Mathias Chapelain :


Hello,

Did the customer delete the object by any chance? If yes, could 
this be related to https://tracker.ceph.com/issues/63935 ?
We had a scenario where an application was doing some DELETEs and 
then listing bucket entries.
It was able to find objects that should have been deleted and then 
tried to GET them without success.


Regards,


Mathias Chapelain
Storage Engineer
Proton AG


On Friday, August 9th, 2024 at 08:54, Eugen Block  wrote:


Hi,

I'm trying to help a customer with a RGW question, maybe someone here
can help me out. Their S3 application reports errors every now and
then, and it is complaining about missing objects. This is what the
RGW logs:

[08/Aug/2024:08:23:47.540 +] "HEAD
/hchsarchiv/2024061326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 200 0 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync"  
-

latency=0.00392s

[08/Aug/2024:08:23:47.552 +] "GET
/hchsarchiv/2024061326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 404 242 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync"  
bytes=0-2097151

latency=0.00392s

So apparently, it can successfully query the HEAD, but the GET request
shows 404. We can confirm that the queried object indeed doesn't exist
in the data pool. But the object metadata apparently must have been written
successfully. Unfortunately, we don't have enough logs to
find the corresponding PUT request; they just increased the logrotate
retention days to be able to inspect it when it happens the next
time. But my question is: should they see some metadata in the
listomapkeys/listomapvals output in the index pool?
The docs [0] state this about Index Transactions:

> Because the head objects are stored in different rados objects than
> the bucket indices, we can’t update both atomically with a single
> rados operation. In order to satisfy the Consistency Guarantee for
> listing operations, we have to coordinate these two object writes
> using a three-step bucket index transaction:
>
> 1. Prepare a transaction on its bucket index object.
> 2. Write or delete the head object.
> 3. Commit the transaction on the bucket index object (or cancel the
> transaction if step 2 fails).
>
> Object writes and deletes may race with each other, so a given
> object may have more than one prepared transaction at a time. RGW
> considers an object entry to be ‘pending’ if there are any
> outstanding transactions, or ‘completed’ otherwise.


Could this be such a race condition which "just happens" from time to
time? Or can this somehow be prevented from happening? Because right
now the cleanup process is a bit complicated application-wise.
I'm not the most experienced RGW user, so I'd be grateful for any
pointers here.

Thanks!
Eugen

[0]  
https://docs.ceph.com/en/reef/dev/radosgw/bucket_index/#index-transaction

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Please guide us in identifying the cause of the data miss in EC pool

2024-08-09 Thread Frédéric Nass
Hi Chulin,

When it comes to data consistency, it's generally acknowledged that Ceph is an 
undefeated master.

Considering the very few (~100) rados objects that were completely lost (data 
and metadata), and the fact that you're using colocated HDD OSDs with volatile 
disk buffers caching rocksdb metadata as well as Bluestore data and metadata, I find 
it hard to believe that volatile disk buffers weren't involved in the data loss, 
whatever the logs say or don't say about which 6 of the 9 OSDs were in the acting 
set at the moment of the power outage.

Unless you're ok with facing data loss again, I'd advise you to fix the initial 
design flaws if you can: stop using non-persistent caches/buffers along 
the IO path, raise min_size to k+1, and reconsider data placement with regard to 
the risks of network partitioning, power outages and fire. Also, considering the 
ceph status, make sure you don't run out of disk space.
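
If it helps, here's a rough way to check the volatile write cache on an OSD host 
and to bump min_size (device and pool names are placeholders, and the min_size 
value assumes a k=6, m=3 profile, so adjust it to your own k+1; also make any 
cache change persistent, e.g. via a udev rule, since hdparm settings don't 
survive a reboot):

hdparm -W /dev/sdX                      # show whether the volatile write cache is enabled
hdparm -W 0 /dev/sdX                    # disable the volatile write cache
ceph osd pool set <ec-pool> min_size 7  # k+1 for a 6+3 profile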

Best regards,
Frédéric.


From: Best Regards 
Sent: Thursday, August 8, 2024, 11:32
To: Frédéric Nass
Cc: ceph-users 
Subject: Re: Re: [ceph-users] Re: Please guide us in identifying 
the cause of the data miss in EC pool

Hi Frédéric Nass,


Sorry, I may not have expressed it clearly before. The epoch and OSD up/down 
timeline was extracted and merged from the logs of the 9 OSDs. I analyzed the PG 
(9.11b6) peering process. OSDs 494, 1169 and 1057 fully recorded the down/up of 
the other OSDs. I also checked the logs of the other 6 OSDs. The role conversions 
during peering were as expected and no abnormalities were found. I also checked the 
status of the monitors. One of the 5 monitors lost power and was powered on 
after about 40 minutes; its log showed that its rank value was relatively high 
and it did not become the leader.

Let's talk about the failure domain. The failure domain we set is the host level, 
but in fact all hosts are distributed across 2 buildings; the original designers 
did not consider a building-level failure domain.
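
For reference, adding a building level to the CRUSH hierarchy would look roughly 
like this (the bucket and host names below are placeholders, and the EC profile 
only applies to newly created pools; existing pools would need a new CRUSH rule 
instead):

ceph osd crush add-bucket building-a datacenter
ceph osd crush add-bucket building-b datacenter
ceph osd crush move building-a root=default
ceph osd crush move building-b root=default
ceph osd crush move host1 datacenter=building-a
ceph osd erasure-code-profile set ec-6-3-building k=6 m=3 crush-failure-domain=datacenter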



In this case, the OSDs could have suffered a split brain, but judging from the 
logs, that did not happen.



Best regards.



Best Regards
wu_chu...@qq.com


Original Email


From:"Frédéric Nass"< frederic.n...@univ-lorraine.fr >;

Sent Time:2024/8/8 15:40

To:"Best Regards"< wu_chu...@qq.com >;

Cc recipient:"ceph-users"< ceph-users@ceph.io >;

Subject:Re: Re:[ceph-users] Re: Please guide us inidentifying thecauseofthedata 
miss in EC pool


ceph osd pause imposes a lot of constraints from an operational perspective. :-)


Host uptime and service running time are a thing, but they don't prove that these 
3 OSDs were in the acting set when the power outage occurred.


Since OSDs 494, 1169 and 1057 did not crash, I assume they're in the same 
failure domain. Is that right? 


Being isolated along with their local MON(s) from the other MONs and the other 6 OSDs, 
there's a fair chance that one of the 6 other OSDs in the other failure domains 
took the lead, sent 5 chunks around and acknowledged the write to the RGW client. 
Then all of them crashed.
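
For what it's worth, the current up/acting sets of that PG can be checked like 
this (it won't show the acting set from before the outage, that would have to 
come from the OSD/MON logs, but it at least shows where the shards live now):

ceph pg map 9.11b6      # current up and acting sets
ceph pg 9.11b6 query    # detailed peering state of the PG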


Your thoughts?


Frédéric.






From: Best Regards
[...] https://tracker.ceph.com/issues/66942; it includes the 
original logs needed for troubleshooting. However, four days have passed 
without any response. In desperation, we are sending this email, hoping that 
someone from the Ceph team can guide us as soon as possible. 


 We are currently in a difficult situation and hope you can provide guidance. 
Thank you. 



 Best regards. 





 wu_chu...@qq.com
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Logging Configuration and "Large omap objects found"

2024-08-09 Thread Janek Bevendorff

Hi,

I have a bunch of long-standing struggles with the way Ceph handles 
logging and I cannot figure out how to solve them. These issues are 
basically the following:


- The log config options are utterly confusing and very badly documented
- Mon file logs are spammed with DBG-level cluster logs, no matter what 
I configure
- syslog/journald gets only part of the messages, but file logs must be 
turned off due to the spam above

- "audit" channel logs cannot be configured at all

From this, my following needs and questions arise, perhaps you can help:

- I want to receive cluster and audit logs on the MONs with level "warn" 
or at most "info"

- I want everything to go to journald
- Where do cluster logs go if "clog_to_monitors" is off?
- What's the difference between the "mon_cluster_log_*" and "clog_*" 
settings?
- What the hell does "mon_cluster_log_to_syslog_facility" do and what 
does "audit=local0" mean or do?


A very annoying symptom of the wonky logging config is that I cannot 
debug the infamous "LARGE_OMAP_OBJECTS" warning. It says "Search the 
cluster log for 'Large omap object found' for more details.", but I 
cannot do that, because without enabling the file-logging flood gates, I 
never receive the required cluster log info at the monitors and there 
seems to be no other way to debug this than to grep the cluster log (why??).


My current log config is the following:

global  advanced  clog_to_monitors                    true
global  basic     err_to_syslog                       true
global  basic     log_to_file                         false
global  basic     log_to_stderr                       false
global  basic     log_to_syslog                       true
mon     advanced  mon_cluster_log_file                /dev/null
mon     advanced  mon_cluster_log_to_file             false
mon     advanced  mon_cluster_log_to_stderr           false
mon     advanced  mon_cluster_log_to_syslog           true
mon     advanced  mon_cluster_log_to_syslog_facility  daemon
mon     advanced  mon_cluster_log_to_syslog_level     warn
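
For completeness, this is how I check what a running daemon actually uses 
(mon.ceph01 is just an example daemon name; "ceph config show" queries the 
daemon's runtime view, while "ceph config get" only returns what is stored in 
the MON config database):

ceph config show mon.ceph01 | grep -E 'clog|cluster_log'
ceph config get mon mon_cluster_log_to_syslog_level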



Any help solving this conundrum is much appreciated. Thanks!

Janek



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Logging Configuration and "Large omap objects found"

2024-08-09 Thread Eugen Block

Hi,

I don't have much to comment about logging, I feel you though. I just  
wanted to point out that the details about the large omap object  
should be in the (primary) OSD log, not in the MON log:


grep -i "large omap"  
/var/log/ceph/bce93c48-5552-11ef-8ba9-fa163e2ad8c5/ceph-osd.*
/var/log/ceph/bce93c48-5552-11ef-8ba9-fa163e2ad8c5/ceph-osd.1.log:2024-08-09T11:21:23.943+ 7ffb66d10700  0 log_channel(cluster) log [WRN] : Large omap object found. Object: 3:592df674:::file:head PG: 3.2e6fb49a (3.0) Key count: 363 Size (bytes): 2070


Regards,
Eugen

Quoting Janek Bevendorff :


Hi,

I have a bunch of long-standing struggles with the way Ceph handles 
logging and I cannot figure out how to solve them. These issues are 
basically the following:


- The log config options are utterly confusing and very badly documented
- Mon file logs are spammed with DBG-level cluster logs, no matter  
what I configure
- syslog/journald gets only part of the messages, but file logs must  
be turned off due to the spam above

- "audit" channel logs cannot be configured at all

From this, my following needs and questions arise, perhaps you can help:

- I want to receive cluster and audit logs on the MONs with level  
"warn" or at most "info"

- I want everything to go to journald
- Where do cluster logs go if "clog_to_monitors" is off?
- What's the difference between the "mon_cluster_log_*" and "clog_*"  
settings?
- What the hell does "mon_cluster_log_to_syslog_facility" do and  
what does "audit=local0" mean or do?


A very annoying symptom of the wonky logging config is that I cannot  
debug the infamous "LARGE_OMAP_OBJECTS" warning. It says "Search the  
cluster log for 'Large omap object found' for more details.", but I  
cannot do that, because without enabling the file-logging flood  
gates, I never receive the required cluster log info at the monitors  
and there seems to be no other way to debug this than to grep the  
cluster log (why??).


My current log config is the following:

global  advanced  clog_to_monitors                    true
global  basic     err_to_syslog                       true
global  basic     log_to_file                         false
global  basic     log_to_stderr                       false
global  basic     log_to_syslog                       true
mon     advanced  mon_cluster_log_file                /dev/null
mon     advanced  mon_cluster_log_to_file             false
mon     advanced  mon_cluster_log_to_stderr           false
mon     advanced  mon_cluster_log_to_syslog           true
mon     advanced  mon_cluster_log_to_syslog_facility  daemon
mon     advanced  mon_cluster_log_to_syslog_level     warn



Any help solving this conundrum is much appreciated. Thanks!

Janek



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Logging Configuration and "Large omap objects found"

2024-08-09 Thread Eugen Block

I forgot to add this one to get the info from any admin node:

ceph log last 10 warn cluster
2024-08-09T11:21:23.949916+ osd.1 (osd.1) 6 : cluster [WRN] Large  
omap object found. Object: 3:592df674:::file:head PG: 3.2e6fb49a (3.0)  
Key count: 363 Size (bytes): 2070
2024-08-09T11:21:27.723959+ mon.soc9-ceph (mon.0) 11905 : cluster  
[WRN] Health check failed: 1 large omap objects (LARGE_OMAP_OBJECTS)


Then you don't have to search each node for logs.

Quoting Eugen Block :


Hi,

I don't have much to comment about logging, I feel you though. I  
just wanted to point out that the details about the large omap  
object should be in the (primary) OSD log, not in the MON log:


grep -i "large omap"  
/var/log/ceph/bce93c48-5552-11ef-8ba9-fa163e2ad8c5/ceph-osd.*
/var/log/ceph/bce93c48-5552-11ef-8ba9-fa163e2ad8c5/ceph-osd.1.log:2024-08-09T11:21:23.943+ 7ffb66d10700  0 log_channel(cluster) log [WRN] : Large omap object found. Object: 3:592df674:::file:head PG: 3.2e6fb49a (3.0) Key count: 363 Size (bytes): 2070


Regards,
Eugen

Quoting Janek Bevendorff :


Hi,

I have a bunch of long-standing struggles with the way Ceph handles 
logging and I cannot figure out how to solve them. These issues 
are basically the following:


- The log config options are utterly confusing and very badly documented
- Mon file logs are spammed with DBG-level cluster logs, no matter  
what I configure
- syslog/journald gets only part of the messages, but file logs  
must be turned off due to the spam above

- "audit" channel logs cannot be configured at all

From this, my following needs and questions arise, perhaps you can help:

- I want to receive cluster and audit logs on the MONs with level  
"warn" or at most "info"

- I want everything to go to journald
- Where do cluster logs go if "clog_to_monitors" is off?
- What's the difference between the "mon_cluster_log_*" and  
"clog_*" settings?
- What the hell does "mon_cluster_log_to_syslog_facility" do and  
what does "audit=local0" mean or do?


A very annoying symptom of the wonky logging config is that I  
cannot debug the infamous "LARGE_OMAP_OBJECTS" warning. It says  
"Search the cluster log for 'Large omap object found' for more  
details.", but I cannot do that, because without enabling the  
file-logging flood gates, I never receive the required cluster log  
info at the monitors and there seems to be no other way to debug  
this than to grep the cluster log (why??).


My current log config is the following:

global  advanced  clog_to_monitors                    true
global  basic     err_to_syslog                       true
global  basic     log_to_file                         false
global  basic     log_to_stderr                       false
global  basic     log_to_syslog                       true
mon     advanced  mon_cluster_log_file                /dev/null
mon     advanced  mon_cluster_log_to_file             false
mon     advanced  mon_cluster_log_to_stderr           false
mon     advanced  mon_cluster_log_to_syslog           true
mon     advanced  mon_cluster_log_to_syslog_facility  daemon
mon     advanced  mon_cluster_log_to_syslog_level     warn



Any help solving this conundrum is much appreciated. Thanks!

Janek



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: squid 19.1.1 RC QE validation status

2024-08-09 Thread Adam King
orch approved

On Mon, Aug 5, 2024 at 4:33 PM Yuri Weinstein  wrote:

> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/67340#note-1
>
> Release Notes - N/A
> LRC upgrade - N/A
> Gibba upgrade -TBD
>
> Seeking approvals/reviews for:
>
> rados - Radek, Laura (https://github.com/ceph/ceph/pull/59020 is being
> tested and will be cherry-picked when ready)
>
> rgw - Eric, Adam E
> fs - Venky
> orch - Adam King
> rbd, krbd - Ilya
>
> quincy-x, reef-x - Laura, Neha
>
> powercycle - Brad
> crimson-rados - Matan, Samuel
>
> ceph-volume - Guillaume
>
> Pls let me know if any tests were missed from this list.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io