On 14.10.19 16:31, Nikola Ciprich wrote:
On Mon, Oct 14, 2019 at 01:40:19PM +0200, Harald Staub wrote:
Probably same problem here. When I try to add another MON, "ceph health"
becomes mostly unresponsive. One of the existing ceph-mon processes uses
100% CPU for several minutes. Tried it on 2 test clusters (14.2.4, 3
MONs, 5 storage nodes with around 2 hdd osds each). To avoid errors like
"lease
Right now our main focus is on the Veeam use case (VMWare backup), used
with an S3 storage tier. Currently we host a bucket with 125M objects
and one with 100M objects.
As Paul stated, searching common prefixes can be painful. We had some
cases that did not work (taking too much time, radosgw [...], presumably due to rocksdb compaction).
Matt
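(For context, this is the kind of delimiter-based listing that turns into a common-prefix search on the bucket index; bucket, prefix and endpoint below are placeholders, not taken from the thread.)

# Listing "directories" under a prefix: the delimiter makes RGW collapse
# keys into common prefixes, which gets expensive on very large buckets.
aws s3api list-objects-v2 --endpoint-url http://rgw.example.com \
    --bucket big-bucket --prefix backups/ --delimiter /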
On Tue, Jul 9, 2019 at 7:12 AM Harald Staub wrote:
Currently removing a bucket with a lot of objects:
radosgw-admin bucket rm --bucket=$BUCKET --bypass-gc --purge-objects
This process was killed by the out-of-memory killer. Then looking at the
graphs, we see a continuous increase of memory usage for this process,
about +24 GB per day. Removal r[...]
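(One way to watch that growth outside of the graphing system, sketched here by simply sampling the resident set size of the running command:)

# Print RSS (KiB) and elapsed time of the bucket-rm process once a minute.
while pgrep -f 'radosgw-admin bucket rm' >/dev/null; do
    ps -o rss=,etime= -p "$(pgrep -of 'radosgw-admin bucket rm')"
    sleep 60
done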
There are customers asking for 500 million objects in a single object
storage bucket (i.e. 5000 shards), and sometimes for even more. But we found some
places that say that there is a limit in the number of shards per
bucket, e.g.
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/obj
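(The 5000-shard figure follows from the common rule of thumb of roughly 100k objects per bucket index shard; the exact recommended target varies between documentation versions.)

# 500 million objects at ~100,000 objects per index shard:
echo $(( 500000000 / 100000 ))    # -> 5000 shards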
[...] enough free space for the compaction after the large omaps were removed?
-- dan
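(A quick way to check that headroom, and to trigger a compaction once there is room; the OSD ID is a placeholder and the compact command must be run on the host that carries the OSD:)

# Per-OSD utilisation; recent releases also show OMAP and META usage here.
ceph osd df tree
# Manually compact this OSD's RocksDB once there is enough free space.
ceph daemon osd.11 compact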
On Mon, Jun 17, 2019 at 11:14 AM Harald Staub wrote:
We received the large omap warning before, but for various reasons we could
not react quickly. We accepted the risk of the bucket becoming slow, but
had not thought [...]
[...] beginning -- is it clear to anyone what was the
root cause and how other users can prevent this from happening? Maybe
some better default configs to warn users earlier about too-large
omaps?
Cheers, Dan
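(One knob in that direction, as a sketch: the value below is only illustrative, and newer releases already default to a lower threshold.)

# Lower the per-object key count at which deep scrub raises the
# "large omap objects" health warning.
ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 200000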
On Thu, Jun 13, 2019 at 7:36 PM Harald Staub wrote:
Looks fine (at least so far), thank you all!
[...] gone after deep-scrubbing the PG.
Then we set the 3 OSDs to out. Soon after, one after the other was down
(maybe for 2 minutes) and we got degraded PGs, but only once.
Thank you!
Harry
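(For reference, the out-and-watch step looks roughly like this; the OSD IDs are placeholders, not the ones from this thread.)

# Mark the affected OSDs out and let data drain off them.
ceph osd out 11 12 13
# Follow recovery; degraded PGs should clear as backfill proceeds.
ceph -w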
On 13.06.19 16:14, Sage Weil wrote:
On Thu, 13 Jun 2019, Harald Staub wrote:
On 13.06.19 15:52, Sage Weil wrote:
On Thu, 13 Jun 2019, Harald Staub wrote:
[...]
I think that increasing the various suicide timeout options will allow
it to stay up long enough to clean up the ginormous objects:
ceph config set osd.NNN osd_op_thread_suicide_timeout 2h
ok
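(Once the cleanup has finished, the temporary override can be dropped again; same placeholder OSD ID as above.)

# Remove the per-OSD override so the default suicide timeout applies again.
ceph config rm osd.NNN osd_op_thread_suicide_timeout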
It looks [...] the other OSDs "somehow"? In case of success, we would bring back the other OSDs as well?
OTOH we could try to continue with the key dump from earlier today.
Any opinions?
Thanks!
Harry
On 13.06.19 09:32, Harald Staub wrote:
On 13.06.19 00:33, Sage Weil wrote:
[...]
One other thing to try before taking any drastic steps (as described
below):
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-NNN fsck
This gives: fsck success
and the large alloc warnings:
tcmalloc: large alloc 2145263616 bytes == 0x562412e1[...]
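(A deeper check is available from the same tool; it also reads all object data and verifies checksums, so it is much slower. The exact flag spelling may differ slightly between releases.)

# Deep fsck: also read object data and verify checksums (slow).
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-NNN fsck --deep 1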
On 13.06.19 00:29, Sage Weil wrote:
On Thu, 13 Jun 2019, Simon Leinen wrote:
Sage Weil writes:
2019-06-12 23:40:43.555 7f724b27f0c0 1 rocksdb: do_open column families:
[default]
Unrecognized command: stats
ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/version_set.cc:356:
rocksdb::Ve[...]
On 12.06.19 17:40, Sage Weil wrote:
On Wed, 12 Jun 2019, Harald Staub wrote:
Also opened an issue about the rocksdb problem:
https://tracker.ceph.com/issues/40300
Thanks!
The 'rocksdb: Corruption: file is too short' is the root of the problem
here. Can you try starting th[...]
Also opened an issue about the rocksdb problem:
https://tracker.ceph.com/issues/40300
On 12.06.19 16:06, Harald Staub wrote:
We ended in a bad situation with our RadosGW (Cluster is Nautilus
14.2.1, 350 OSDs with BlueStore):
1. There is a bucket with about 60 million objects, without shards.
2. radosgw-admin bucket reshard --bucket $BIG_BUCKET --num-shards 1024
3. Resharding looked fine at first; it counted up to the n[...]
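(For a long-running reshard like this, progress can be followed from another shell; this sketch reuses the same $BIG_BUCKET variable.)

# Per-shard progress of the ongoing reshard for this bucket.
radosgw-admin reshard status --bucket $BIG_BUCKET
# Reshard operations that are queued or in progress.
radosgw-admin reshard list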
As mentioned here recently, the sizing recommendations for BlueStore
have been updated:
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
In our Ceph cluster, we have some ratios that are much lower, like 20 GB
of SSD (WAL and DB) per 7 TB of spinning space. This s[...]
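(For comparison with the guideline in the linked document that block.db should be at least about 4% of the data device, treating the numbers as rough:)

# Current ratio: 20 GB of DB+WAL per 7 TB of HDD.
echo "scale=2; 20 * 100 / (7 * 1024)" | bc     # ~0.27 % of the data device
# What ~4% would mean for a 7 TB OSD.
echo "0.04 * 7 * 1024" | bc                    # ~286 GB of SSD per OSD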
Hi Brad
Thank you very much for your attention.
On 07.03.2018 23:46, Brad Hubbard wrote:
On Thu, Mar 8, 2018 at 1:22 AM, Harald Staub wrote:
"ceph pg repair" leads to:
5.7bd repair 2 errors, 0 fixed
Only an empty list from:
rados list-inconsistent-obj 5.7bd --format=json-pretty
I
"ceph pg repair" leads to:
5.7bd repair 2 errors, 0 fixed
Only an empty list from:
rados list-inconsistent-obj 5.7bd --format=json-pretty
Inspired by http://tracker.ceph.com/issues/12577 , I tried again with
more verbose logging and searched the OSD logs, e.g. for "!=" and
"mismatch", but could not find [...]