Hi,

We just set up two new Ceph clusters (using Rook). To process user activity,
we configured a bucket notification topic that sends events to Kafka.

After 5-12 hours this stops working with a 503 SlowDown response:
debug 2024-08-02T09:17:58.205+0000 7ff4359ad700 1 req 13681579273117692719 0.005000019s ERROR: failed to reserve notification on queue: private.rgw. error: -28
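
For anyone reading along, here is a small sketch for decoding the error code
in that log line. It assumes that the -28 is a negated Linux errno value,
which is the usual convention but not something we have confirmed for this
code path. The helper name decode_errno is only for illustration.

```python
import errno
import os


def decode_errno(code: int) -> str:
    """Translate a (possibly negated) errno value into its name and message."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names such as "ENOSPC".
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


# Assumption: the value from the RGW log line is a negated errno.
# On Linux, 28 is ENOSPC.
print(decode_errno(-28))
```

Under that assumption the code reads as "no space left", which is why a full
queue was our first suspicion.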

Our first thought was that the queue is full, but up to the point of failure
we see messages arriving in Kafka, and there is little activity on the RGW
itself (only a few requests against the S3 API), so it can't be a load issue.

What helps is removing the notification configuration from the buckets
(put-bucket-notification-configuration). If we immediately re-add the previous
notification configuration, it continues working for a few hours before
failing again with the same error/behaviour.

We haven't been able to reproduce this with persistence disabled for the
topic, so it appears to be related to the persistence option; without
persistence, events are not queued before being sent to Kafka.
This also suggests that the issue is not with Kafka itself, which is what we
suspected first (e.g. that it can't handle the volume of messages).

Has anyone else had this issue and found the cause, or does anyone have a
suggestion on how best to continue debugging? Are there detailed metrics on
the size and usage of the event queue?


Here is the configuration for the topic and for a bucket:

$ radosgw-admin topic list
{
    "topics": [
        {
            "user": "",
            "name": "private.rgw",
            "dest": {
                "push_endpoint": "kafka://rgw-sasl-kafka-user:x...@kafka-kafka-bootstrap.kafka.svc:9094/private.rgw?sasl.mechanism=SCRAM-SHA-512&mechanism=SCRAM-SHA-512",
                "push_endpoint_args": "OpaqueData=&Version=2010-03-31&kafka-ack-level=broker&persistent=false&push-endpoint=kafka://rgw-sasl-kafka-user:x...@kafka-kafka-bootstrap.kafka.svc:9094/private.rgw?sasl.mechanism=SCRAM-SHA-512&mechanism=SCRAM-SHA-512&use-ssl=true&verify-ssl=true",
                "push_endpoint_topic": "private.rgw",
                "stored_secret": true,
                "persistent": true
            },
            "arn": "arn:aws:sns:ceph-objectstore::private.rgw",
            "opaqueData": ""
        }
    ]
}
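
One detail in the output above: push_endpoint_args contains persistent=false
while the dest object reports "persistent": true. We do not know whether this
matters. The sketch below only shows how the two values can be pulled out of
the topic JSON for comparison. The sample topic is a shortened stand-in with
a placeholder endpoint (broker.example), and the helper name persistent_flags
is hypothetical.

```python
import json
from urllib.parse import parse_qs


def persistent_flags(topic: dict) -> tuple:
    """Return (persistent value in push_endpoint_args, dest.persistent)."""
    dest = topic["dest"]
    # push_endpoint_args is a query string; keep blank values such as OpaqueData=.
    args = parse_qs(dest["push_endpoint_args"], keep_blank_values=True)
    in_args = args.get("persistent", [None])[0]
    return in_args, dest["persistent"]


# Shortened sample shaped like the `radosgw-admin topic list` output above.
# The endpoint is a placeholder, not our real broker address.
sample_topic = json.loads("""
{
    "name": "private.rgw",
    "dest": {
        "push_endpoint_args": "OpaqueData=&Version=2010-03-31&kafka-ack-level=broker&persistent=false&push-endpoint=kafka://broker.example:9094/private.rgw&use-ssl=true&verify-ssl=true",
        "persistent": true
    }
}
""")

in_args, in_dest = persistent_flags(sample_topic)
print(f"persistent in args: {in_args}, persistent in dest: {in_dest}")
```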

$ aws s3api get-bucket-notification-configuration --bucket=XXX
{
    "TopicConfigurations": [
        {
            "Id": "my-id",
            "TopicArn": "arn:aws:sns:ceph-objectstore::private.rgw",
            "Events": [
                "s3:ObjectCreated:*",
                "s3:ObjectRemoved:*"
            ]
        }
    ]
}


Thank you for any input to solve this!


Cheers,
Florian
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
