Re: [ceph-users] ceph balancer - Some osds belong to multiple subtrees

2019-06-27 Thread Wolfgang Lendl
thx Paul - I suspect these shadow trees are causing this misbehaviour. I have a second luminous cluster where these balancer settings work as expected - this working one has hdd+ssd osds. I cannot use the upmap balancer because of some jewel krbd clients - at least they are being reported as jewel c
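The upmap balancer refuses to engage while the cluster still permits pre-luminous clients, so the usual first step is to check what the connected clients actually advertise and, only if nothing genuinely older than luminous remains, raise the requirement. A rough sketch; verify the exact behaviour against your release:

  # list connected clients and the release/feature bits they advertise
  ceph features

  # required before "ceph balancer mode upmap" will run; only do this once
  # you are confident no real pre-luminous client is left. Kernel krbd
  # clients frequently advertise "jewel" feature bits even when they are
  # in fact upmap-capable, in which case the command can be forced with
  # --yes-i-really-mean-it.
  ceph osd set-require-min-compat-client luminous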

[ceph-users] What does the differences in osd benchmarks mean?

2019-06-27 Thread Lars Täuber
Hi! In our cluster I ran some benchmarks. The results are always similar but strange to me. I don't know what the results mean. The cluster consists of 7 (nearly) identical hosts for osds. Two of them have an additional hdd. The hdds are of identical type. The ssds for the journal and wal a
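The per-OSD numbers quoted later in this thread (1 GiB in 4 MiB blocks) look like the output of the built-in OSD bench, presumably invoked roughly like this:

  # default: write 1 GiB in 4 MiB writes to this OSD's object store
  ceph tell osd.0 bench
  # explicit form: total bytes, then bytes per write
  ceph tell osd.0 bench 1073741824 4194304

Note this only exercises the local object store of one OSD (no replication, no client network), so differences between OSDs point at the disks, controllers or CPU placement rather than at the cluster as a whole.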

[ceph-users] ceph zabbix monitoring

2019-06-27 Thread Majid Varzideh
Hi friends, I have installed ceph mimic with zabbix 3.0. I configured everything to monitor my cluster with zabbix and I could get data from the zabbix frontend, but the ceph -s command says "Failed to send data to Zabbix". Why does this happen? My ceph version: ceph version 13.2.6 (7b695f835b03642f85998b2a
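The "Failed to send data to Zabbix" health message comes from the mgr zabbix module, which calls out to the zabbix_sender binary, so the first things to check are whether that binary is installed on the active mgr host and whether the module knows where to send. A minimal sketch, with a placeholder server name:

  ceph mgr module enable zabbix          # if not already enabled
  ceph zabbix config-show                # zabbix_host, zabbix_port, identifier, interval
  ceph zabbix config-set zabbix_host zabbix.example.com
  ceph zabbix send                       # push once by hand and watch the mgr log for the real error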

Re: [ceph-users] ceph zabbix monitoring

2019-06-27 Thread Nathan Harper
Have you configured any encryption on your Zabbix infrastructure? We took a brief look at ceph+Zabbix a while ago, and the exporter didn't have the capability to use encryption. I don't know if it's changed in the meantime though. On Thu, 27 Jun 2019 at 09:43, Majid Varzideh wrote: > Hi frie
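One way to separate a Zabbix-side TLS/PSK problem from a module problem is to run zabbix_sender by hand on the active mgr host; the host names, item key and PSK paths below are placeholders:

  # plain, unencrypted test
  zabbix_sender -vv -z zabbix.example.com -s ceph-mgr-host -k test.key -o 1

  # if the Zabbix server/proxy requires PSK, the sender needs the TLS options too
  zabbix_sender -vv -z zabbix.example.com -s ceph-mgr-host -k test.key -o 1 \
      --tls-connect psk --tls-psk-identity ceph-psk \
      --tls-psk-file /etc/zabbix/ceph.psk

If the encrypted variant works by hand but the module still fails, the limitation Nathan mentions (the module not passing TLS options through) is the likely culprit.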

[ceph-users] details about cloning objects using librados

2019-06-27 Thread nokia ceph
Hi Team, We have a requirement to create multiple copies of an object. Currently we handle it on the client side by writing them as separate objects, and this causes huge network traffic between client and cluster. Is there a possibility of cloning an object into multiple copies using the librados API? Pleas
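Whether this saves client-to-cluster traffic depends on the copy being done server-side. librados does have a copy-from write operation (the same machinery RGW and cache tiering use), and the rados CLI exposes a cp subcommand; whether a given version performs the copy inside the cluster or streams the data back through the client is worth verifying with network counters before relying on it. Placeholder pool/object names:

  rados -p mypool cp original-object clone-1
  rados -p mypool cp original-object clone-2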

[ceph-users] Ceph-volume ignores cluster name from ceph.conf

2019-06-27 Thread Stolte, Felix
Hi folks, I have a nautilus 14.2.1 cluster with a non-default cluster name (ceph_stag instead of ceph). I set “cluster = ceph_stag” in /etc/ceph/ceph_stag.conf. ceph-volume is using the correct config file but does not use the specified clustername. Did I hit a bug or do I need to define the cl

Re: [ceph-users] Ceph-volume ignores cluster name from ceph.conf

2019-06-27 Thread Alfredo Deza
Although ceph-volume makes a best effort to support custom cluster names, the Ceph project no longer supports custom cluster names, even though you can still see settings/options that allow you to set one. For reference see: https://bugzilla.redhat.com/show_bug.cgi?id=1459861 On Thu, Jun

[ceph-users] osd-mon failed with "failed to write to db"

2019-06-27 Thread Anton Aleksandrov
Hello community, we have deployed a cluster on the latest mimic release. We are on quite old hardware, but using Centos7. Monitor, manager and all the same host. The cluster has been running for some weeks without actual workload. There might have been some sort of power failure (not proved), but at s
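A mon dying with "failed to write to db" usually points at the store itself: a full filesystem, wrong ownership, or a rocksdb left damaged by the power loss. Some quick checks, assuming the default paths and that the mon id matches the short hostname:

  df -h /var/lib/ceph/mon                          # filesystem full?
  ls -ld /var/lib/ceph/mon/ceph-*/store.db         # owned by the ceph user?
  journalctl -u ceph-mon@$(hostname -s) -n 200     # the full traceback normally names the offending rocksdb file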

Re: [ceph-users] pgs incomplete

2019-06-27 Thread ☣Adam
Well that caused some excitement (either that or the small power disruption did)! One of my OSDs is now down because it keeps crashing due to a failed assert (stacktraces attached, also I'm apparently running mimic, not luminous). In the past a failed assert on an OSD has meant removing the disk,
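Before giving up on the disk it is sometimes possible to rescue the PGs that the incomplete ones are waiting for, by exporting them from the crashing OSD with ceph-objectstore-tool and importing them into a healthy one. A rough sketch with placeholder OSD and PG ids; each OSD must be stopped while the tool touches it:

  systemctl stop ceph-osd@7
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
      --op export --pgid 1.2a --file /tmp/pg.1.2a.export

  systemctl stop ceph-osd@3
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
      --op import --file /tmp/pg.1.2a.export
  systemctl start ceph-osd@3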

[ceph-users] MGR Logs after Failure Testing

2019-06-27 Thread DHilsbos
All; I built a demonstration and testing cluster, just 3 hosts (10.0.200.110, 111, 112). Each host runs mon, mgr, osd, mds. During the demonstration yesterday, I pulled the power on one of the hosts. After bringing the host back up, I'm getting several error messages every second or so: 2019-

Re: [ceph-users] What does the differences in osd benchmarks mean?

2019-06-27 Thread Nathan Fish
Are these dual-socket machines? Perhaps NUMA is involved? On Thu., Jun. 27, 2019, 4:56 a.m. Lars Täuber, wrote: > Hi! > > In our cluster I ran some benchmarks. > The results are always similar but strange to me. > I don't know what the results mean. > The cluster consists of 7 (nearly) identical
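A quick way to check the NUMA layout, and which node the disk controllers hang off, is sketched below (the PCI address is a placeholder):

  lscpu | grep -i numa                              # node count and CPU assignment
  numactl --hardware                                # per-node memory and distances
  cat /sys/bus/pci/devices/0000:3b:00.0/numa_node   # which node the HBA/NVMe sits on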

Re: [ceph-users] MGR Logs after Failure Testing

2019-06-27 Thread Eugen Block
Hi, some more information about the cluster status would be helpful, such as: ceph -s, ceph osd tree, and the service status of all MONs, MDSs, MGRs. Are all services up? Did you configure the spare MDS as standby for rank 0 so that a failover can happen? Regards, Eugen Zitat von dhils...@performair
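For reference, the rank/standby situation is visible with ceph fs status and ceph mds stat, and on mimic-era releases a spare daemon can additionally be pinned as standby for rank 0 in ceph.conf (the daemon name below is hypothetical, and these per-daemon options were later deprecated, so check your version):

  ceph fs status
  ceph mds stat

  # /etc/ceph/ceph.conf
  [mds.host2]
      mds_standby_for_rank = 0
      mds_standby_replay = true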

Re: [ceph-users] pgs incomplete

2019-06-27 Thread Alfredo Deza
On Thu, Jun 27, 2019 at 10:36 AM ☣Adam wrote: > Well that caused some excitement (either that or the small power > disruption did)! One of my OSDs is now down because it keeps crashing > due to a failed assert (stacktraces attached, also I'm apparently > running mimic, not luminous). > > In the

Re: [ceph-users] MGR Logs after Failure Testing

2019-06-27 Thread DHilsbos
Eugen; All services are running, yes, though they didn't all start when I brought the host up (configured not to start because the last thing I had done is physically relocate the entire cluster). All services are running, and happy. # ceph status cluster: id: 1a8a1693-fa54-4cb3-89d2

Re: [ceph-users] Cannot delete bucket

2019-06-27 Thread Sergei Genchev
@David Turner Did your bucket delete ever finish? I am up to 35M incomplete uploads, and I doubt that I actually had that many upload attempts. I could be wrong though. Is there a way to force bucket deletion, even at the cost of not cleaning up space? On Tue, Jun 25, 2019 at 12:29 PM J. Eric Ivan
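radosgw-admin does have a forced path for this, at the price of possibly leaving orphaned RADOS objects behind, which is what the question is willing to accept; the bucket name is a placeholder and the exact flags should be checked against the installed version:

  # delete the bucket and purge its objects even though it is not empty
  radosgw-admin bucket rm --bucket=big-bucket --purge-objects

  # additionally skip the garbage collector; faster, but more likely to strand data
  radosgw-admin bucket rm --bucket=big-bucket --purge-objects --bypass-gc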

Re: [ceph-users] Cannot delete bucket

2019-06-27 Thread David Turner
I'm still going at 452M incomplete uploads. There are guides online for manually deleting buckets kinda at the RADOS level that tend to leave data stranded. That doesn't work for what I'm trying to do so I'll keep going with this and wait for that PR to come through and hopefully help with bucket d

[ceph-users] How does monitor know OSD is dead?

2019-06-27 Thread Bryan Henderson
What does it take for a monitor to consider an OSD down which has been dead as a doornail since the cluster started? A couple of times, I have seen 'ceph status' report an OSD was up, when it was quite dead. Recently, a couple of OSDs were on machines that failed to boot up after a power failure.
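Roughly: peer OSDs report missed heartbeats to the mon, and the mon also marks an OSD down on its own if the daemon stops sending its regular reports. The knobs involved (defaults from the mimic/nautilus era, worth re-checking for your version; daemon names and ids below are placeholders) can be read off running daemons:

  ceph daemon mon.$(hostname -s) config get mon_osd_report_timeout   # default 900 s without any report from the OSD
  ceph daemon osd.0 config get osd_heartbeat_grace                    # default 20 s of missed peer heartbeats

  # or simply mark the dead OSD down/out by hand
  ceph osd down 5
  ceph osd out 5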

Re: [ceph-users] MDS getattr op stuck in snapshot

2019-06-27 Thread Hector Martin
On 12/06/2019 22.33, Yan, Zheng wrote: > I have tracked down the bug. Thank you for reporting this. 'echo 2 > /proc/sys/vm/drop_caches' should fix the hang. If you can compile ceph from source, please try the following patch. I managed to get the packages built for Xenial properly and tested and

Re: [ceph-users] details about cloning objects using librados

2019-06-27 Thread Brad Hubbard
On Thu, Jun 27, 2019 at 8:58 PM nokia ceph wrote: > > Hi Team, > > We have a requirement to create multiple copies of an object and currently we > are handling it in client side to write as separate objects and this causes > huge network traffic between client and cluster. > Is there possibility

Re: [ceph-users] What does the differences in osd benchmarks mean?

2019-06-27 Thread Lars Täuber
Hi Nathan, yes the osd hosts are dual-socket machines. But does this make such a difference?
osd.0: bench: wrote 1 GiB in blocks of 4 MiB in 15.0133 sec at 68 MiB/sec 17 IOPS
osd.1: bench: wrote 1 GiB in blocks of 4 MiB in 6.98357 sec at 147 MiB/sec 36 IOPS
Doubling the IOPS? Thanks, Lars Thu