[ceph-users] Re: RGW: HEAD ok but GET fails
Hello,

Did the customer delete the object by any chance? If yes, could this be related to https://tracker.ceph.com/issues/63935 ?

We had a scenario where an application was doing some DELETEs and then listing bucket entries. It was able to find objects that should have been deleted and then tried to GET them, without success.

Regards,

Mathias Chapelain
Storage Engineer
Proton AG

On Friday, August 9th, 2024 at 08:54, Eugen Block wrote:

> Hi,
>
> I'm trying to help a customer with an RGW question, maybe someone here
> can help me out. Their S3 application reports errors every now and then,
> complaining about missing objects. This is what the RGW logs:
>
> [08/Aug/2024:08:23:47.540 +0000] "HEAD /hchsarchiv/2024061326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 200 0 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync" - latency=0.00392s
>
> [08/Aug/2024:08:23:47.552 +0000] "GET /hchsarchiv/2024061326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 404 242 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync" bytes=0-2097151 latency=0.00392s
>
> So apparently it can successfully query the HEAD, but the GET request
> returns 404. We can confirm that the queried object indeed doesn't exist
> in the data pool, but the object metadata apparently must have been
> written successfully. Unfortunately, we don't have enough logs to find
> the corresponding PUT request; they just increased the retention days
> for logrotate to be able to inspect it when it happens the next time.
> But my question is: should they see some metadata in the
> listomapkeys/listomapvals output in the index pool?
> The docs [0] state this about index transactions:
>
> > Because the head objects are stored in different rados objects than
> > the bucket indices, we can’t update both atomically with a single
> > rados operation. In order to satisfy the Consistency Guarantee for
> > listing operations, we have to coordinate these two object writes
> > using a three-step bucket index transaction:
> >
> > 1. Prepare a transaction on its bucket index object.
> > 2. Write or delete the head object.
> > 3. Commit the transaction on the bucket index object (or cancel the
> > transaction if step 2 fails).
> >
> > Object writes and deletes may race with each other, so a given
> > object may have more than one prepared transaction at a time. RGW
> > considers an object entry to be ‘pending’ if there are any
> > outstanding transactions, or ‘completed’ otherwise.
>
> Could this be such a race condition which "just happens" from time to
> time? Or can this somehow be prevented from happening? Because right
> now the cleanup process is a bit complicated application-wise.
> I'm not the most experienced RGW user, so I'd be grateful for any
> pointers here.
>
> Thanks!
> Eugen
>
> [0] https://docs.ceph.com/en/reef/dev/radosgw/bucket_index/#index-transaction
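For the listomapkeys/listomapvals question quoted above, a minimal sketch of how to inspect the bucket index directly. The index pool name (default.rgw.buckets.index) and the .dir.<marker>.<shard> naming of index objects are assumptions based on a default RGW setup; the bucket name is taken from the request path in the log above, and <marker>/<object-name> are placeholders:

# get the bucket id/marker and shard count
radosgw-admin bucket stats --bucket=hchsarchiv
# dump the raw index entries for one shard of that bucket
rados -p default.rgw.buckets.index listomapkeys .dir.<marker>.0
rados -p default.rgw.buckets.index listomapvals .dir.<marker>.0
# or let RGW decode the index entries itself
radosgw-admin bi list --bucket=hchsarchiv --object=<object-name>

An index entry stuck in the ‘pending’ state described in the docs quote should show up here even if the corresponding head object never made it to the data pool.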
[ceph-users] Re: RGW: HEAD ok but GET fails
That's interesting, thanks for the link to the tracker issue. There's definitely a chance that it could have been deleted (by the application), but we don't have enough logs right now to confirm. They don't have much insight into the application, so it can be difficult to get to the bottom of this. I'll keep an eye on it, though, because it will most likely happen again, just not that frequently. So hopefully with more logs we can debug a bit better.

Thanks!
Eugen

Zitat von Mathias Chapelain :

> Hello,
>
> Did the customer delete the object by any chance? If yes, could this be
> related to https://tracker.ceph.com/issues/63935 ?
>
> We had a scenario where an application was doing some DELETEs and then
> listing bucket entries. It was able to find objects that should have
> been deleted and then tried to GET them, without success.
>
> Regards,
>
> Mathias Chapelain
> Storage Engineer
> Proton AG
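If this does turn out to be the stale-index scenario from the tracker issue, one way to cross-check a suspicious entry is to compare RGW's view of the object with what is actually in the data pool. A sketch, assuming the default data pool name and the usual <marker>_<object-name> naming of head objects; <marker> and <object-name> are placeholders:

# RGW-side metadata/manifest for the object
radosgw-admin object stat --bucket=hchsarchiv --object=<object-name>
# does the head object actually exist in the data pool?
rados -p default.rgw.buckets.data stat <marker>_<object-name>
# report (and optionally repair with --fix) index inconsistencies
radosgw-admin bucket check --bucket=hchsarchiv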
[ceph-users] Re: Please guide us in identifying the cause of the data miss in EC pool
Hi Chulin,

When it comes to data consistency, it's generally acknowledged that Ceph is an undefeated master. Considering the very few (~100) rados objects that were completely lost (data and metadata), and the fact that you're using colocated HDD OSDs with volatile disk buffers caching rocksdb metadata and BlueStore data and metadata, I doubt that volatile disk buffers weren't involved in the data loss, whatever the logs say or don't say about which 6 of the 9 OSDs were in the acting set at the moment of the power outage.

Unless you're OK with facing data loss again, I'd advise you to fix the initial design flaws if you can: stop using non-persistent caches/buffers along the IO path, raise min_size to k+1, and reconsider data placement with regard to the risks of network partitioning, power outages and fire. Also, considering the ceph status, make sure you don't run out of disk space.

Best regards,
Frédéric.

From: Best Regards
Sent: Thursday, August 8th, 2024 at 11:32
To: Frédéric Nass
Cc: ceph-users
Subject: Re: Re: Re: [ceph-users] Re: Please guide us in identifying the cause of the data miss in EC pool

Hi Frédéric Nass,

Sorry, I may not have expressed it clearly before. The epoch and OSD up/down timeline was extracted and merged from the logs of the 9 OSDs. I analyzed the peering process of PG 9.11b6. OSDs 494, 1169 and 1057 fully recorded the down/up of the other OSDs. I also checked the logs of the other 6 OSDs; the role conversion during peering was as expected and no abnormalities were found.

I also checked the status of the monitors. One of the 5 monitors lost power and was powered back on after about 40 minutes. Its log showed that its rank value was relatively large and it did not become the leader.

Regarding the failure domain: the failure domain we set is the host level, but in fact all hosts are distributed across 2 buildings, and the original designer did not consider failures at the building level. In that case the OSDs could suffer a split brain, but judging from the logs, that did not happen.

Best regards.
wu_chu...@qq.com

Original Email
From: "Frédéric Nass" <frederic.n...@univ-lorraine.fr>
Sent Time: 2024/8/8 15:40
To: "Best Regards" <wu_chu...@qq.com>
Cc: "ceph-users" <ceph-users@ceph.io>
Subject: Re: Re: [ceph-users] Re: Please guide us in identifying the cause of the data miss in EC pool

ceph osd pause is a lot of constraints from an operational perspective. :-)

Host uptime and service running time are a thing, but that doesn't mean that these 3 OSDs were in the acting set when the power outage occurred. Since OSDs 494, 1169 and 1057 did not crash, I assume they're in the same failure domain. Is that right? Being isolated, along with their local MON(s), from the other MONs and the other 6 OSDs, there's a fair chance that one of the 6 other OSDs in the other failure domains took the lead, sent 5 chunks around and acknowledged the write to the RGW client. Then all of them crashed. Your thoughts?

Frédéric.

From: Best Regards
[...] https://tracker.ceph.com/issues/66942, it includes the original logs needed for troubleshooting. However, four days have passed without any response. In desperation, we are sending this email, hoping that someone from the Ceph team can guide us as soon as possible. We are currently in a difficult situation and hope you can provide guidance. Thank you.

Best regards.
wu_chu...@qq.com
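To make the two hardening suggestions above concrete, a minimal sketch; device and pool names are placeholders, the hdparm call assumes SATA drives (SAS drives would need sdparm instead), and k comes from the pool's erasure-code profile:

# check and disable the volatile write cache on a data disk
smartctl -g wcache /dev/sdX
hdparm -W 0 /dev/sdX   # may not persist across power cycles on all models
# read k and m from the profile, then raise min_size to k+1
ceph osd erasure-code-profile get <profile-name>
ceph osd pool set <ec-pool> min_size <k+1>   # e.g. 7 for a k=6,m=3 profile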
[ceph-users] Ceph Logging Configuration and "Large omap objects found"
Hi,

I have a bunch of long-standing struggles with the way Ceph handles logging and I cannot figure out how to solve them. The issues are basically the following:

- The log config options are utterly confusing and very badly documented
- MON file logs are spammed with DBG-level cluster logs, no matter what I configure
- syslog/journald gets only part of the messages, but file logs must be turned off due to the spam above
- "audit" channel logs cannot be configured at all

From this, the following needs and questions arise, perhaps you can help:

- I want to receive cluster and audit logs on the MONs with level "warn" or at most "info"
- I want everything to go to journald
- Where do cluster logs go if "clog_to_monitors" is off?
- What's the difference between the "mon_cluster_log_*" and "clog_*" settings?
- What the hell does "mon_cluster_log_to_syslog_facility" do and what does "audit=local0" mean or do?

A very annoying symptom of the wonky logging config is that I cannot debug the infamous "LARGE_OMAP_OBJECTS" warning. It says "Search the cluster log for 'Large omap object found' for more details.", but I cannot do that, because without opening the file-logging flood gates I never receive the required cluster log info at the monitors, and there seems to be no other way to debug this than to grep the cluster log (why??).

My current log config is the following:

global  advanced  clog_to_monitors                    true
global  basic     err_to_syslog                       true
global  basic     log_to_file                         false
global  basic     log_to_stderr                       false
global  basic     log_to_syslog                       true
mon     advanced  mon_cluster_log_file                /dev/null
mon     advanced  mon_cluster_log_to_file             false
mon     advanced  mon_cluster_log_to_stderr           false
mon     advanced  mon_cluster_log_to_syslog           true
mon     advanced  mon_cluster_log_to_syslog_facility  daemon
mon     advanced  mon_cluster_log_to_syslog_level     warn

Any help solving this conundrum is much appreciated.

Thanks!
Janek
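Not an answer to the option-semantics questions, but since the goal is journald: with the syslog settings above, the cluster log lines that do reach syslog can be pulled from the journal by facility and priority. A sketch; the facility codes (daemon = 3, local0 = 16) are standard syslog numbers, and ceph config help prints any option's default and description:

# cluster log messages sent to the "daemon" facility at warning or above
journalctl SYSLOG_FACILITY=3 -p warning --since today
# an "audit=local0" style facility would show up as SYSLOG_FACILITY=16
journalctl SYSLOG_FACILITY=16 --since today
# check what a given option actually does and what it defaults to
ceph config help mon_cluster_log_to_syslog_facility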
[ceph-users] Re: Ceph Logging Configuration and "Large omap objects found"
Hi,

I don't have much to comment about logging, I feel you though. I just wanted to point out that the details about the large omap object should be in the (primary) OSD log, not in the MON log:

grep -i "large omap" /var/log/ceph/bce93c48-5552-11ef-8ba9-fa163e2ad8c5/ceph-osd.*
/var/log/ceph/bce93c48-5552-11ef-8ba9-fa163e2ad8c5/ceph-osd.1.log:2024-08-09T11:21:23.943+0000 7ffb66d10700  0 log_channel(cluster) log [WRN] : Large omap object found. Object: 3:592df674:::file:head PG: 3.2e6fb49a (3.0) Key count: 363 Size (bytes): 2070

Regards,
Eugen

Zitat von Janek Bevendorff :

> A very annoying symptom of the wonky logging config is that I cannot
> debug the infamous "LARGE_OMAP_OBJECTS" warning. It says "Search the
> cluster log for 'Large omap object found' for more details.", but I
> cannot do that, because without opening the file-logging flood gates
> I never receive the required cluster log info at the monitors, and
> there seems to be no other way to debug this than to grep the cluster
> log (why??).
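With log_to_file disabled as in Janek's config, the same line would have to come out of journald instead of /var/log/ceph. A sketch, where the systemd unit name depends on the deployment type (the fsid below is taken from the log path above and only applies to cephadm-style units):

# package-based deployment
journalctl -u ceph-osd@1 | grep -i "large omap"
# cephadm deployment (units are named ceph-<fsid>@osd.<id>)
journalctl -u ceph-bce93c48-5552-11ef-8ba9-fa163e2ad8c5@osd.1 | grep -i "large omap"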
[ceph-users] Re: Ceph Logging Configuration and "Large omap objects found"
I forgot to add this one to get the info from any admin node:

ceph log last 10 warn cluster
2024-08-09T11:21:23.949916+0000 osd.1 (osd.1) 6 : cluster [WRN] Large omap object found. Object: 3:592df674:::file:head PG: 3.2e6fb49a (3.0) Key count: 363 Size (bytes): 2070
2024-08-09T11:21:27.723959+0000 mon.soc9-ceph (mon.0) 11905 : cluster [WRN] Health check failed: 1 large omap objects (LARGE_OMAP_OBJECTS)

Then you don't have to search each node for logs.

Zitat von Eugen Block :

> I don't have much to comment about logging, I feel you though. I just
> wanted to point out that the details about the large omap object should
> be in the (primary) OSD log, not in the MON log:
>
> grep -i "large omap" /var/log/ceph/bce93c48-5552-11ef-8ba9-fa163e2ad8c5/ceph-osd.*
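Once the warning has been located, the reported object itself can be inspected. A sketch; the pool name for pool id 3 has to be looked up first, and the object name "file" is taken from the warning above:

# map pool id 3 to a pool name
ceph osd pool ls detail | grep "^pool 3 "
# count and peek at the omap keys of the reported object
rados -p <pool-name> listomapkeys file | wc -l
rados -p <pool-name> listomapvals file | head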
[ceph-users] Re: squid 19.1.1 RC QE validation status
orch approved

On Mon, Aug 5, 2024 at 4:33 PM Yuri Weinstein wrote:

> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/67340#note-1
>
> Release Notes - N/A
> LRC upgrade - N/A
> Gibba upgrade - TBD
>
> Seeking approvals/reviews for:
>
> rados - Radek, Laura (https://github.com/ceph/ceph/pull/59020 is being
> tested and will be cherry-picked when ready)
> rgw - Eric, Adam E
> fs - Venky
> orch - Adam King
> rbd, krbd - Ilya
> quincy-x, reef-x - Laura, Neha
> powercycle - Brad
> crimson-rados - Matan, Samuel
> ceph-volume - Guillaume
>
> Pls let me know if any tests were missed from this list.