[ceph-users] v15.2.3 Octopus released

2020-05-29 Thread Josh Durgin
We’re happy to announce the availability of the third Octopus stable release series. This release is mainly a workaround for a potential OSD corruption in v15.2.2. We advise users to upgrade to v15.2.3 directly. For users running v15.2.2, please execute the following: ceph config set osd blue
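For reference, the workaround the truncated line above refers to is, as far as I recall from the v15.2.3 release notes, a single config change to apply before restarting OSDs; please verify against the official announcement before running it:

    ceph config set osd bluefs_preextend_wal_files false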

[ceph-users] Re: OSD backups and recovery

2020-05-29 Thread DHilsbos
Jarett; It is and it isn't. Replication can be thought of as continuous backups. Backups, especially as SpiderFox is suggesting, are point-in-time, immutable copies of data. Until they are written over, they don't change, even if the data does. In Ceph's RadosGW (RGW) multi-site replication

[ceph-users] Re: OSD backups and recovery

2020-05-29 Thread Jarett DeAngelis
For some reason I’d thought replication between clusters was an “official” method of backing up. > On May 29, 2020, at 4:31 PM, > wrote: > > Ludek; > > As a cluster system, Ceph isn't really intended to be backed up. It's > designed to take quite a beating, and preserve your data. > > Fro

[ceph-users] Re: OSD backups and recovery

2020-05-29 Thread DHilsbos
SpiderFox; If you're concerned about ransomware (and you should be), then you should: a) protect the cluster from the internet AND from USERS. b) place another technology between your cluster and your users (I use Nextcloud backed by RadosGW through S3 buckets) c) turn on versioning in your bucke
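To illustrate point (c), versioning on an RGW bucket can be enabled through the standard S3 API, for example with the AWS CLI (bucket name and endpoint below are placeholders):

    aws s3api put-bucket-versioning --endpoint-url https://rgw.example.com \
        --bucket backups --versioning-configuration Status=Enabled
    aws s3api get-bucket-versioning --endpoint-url https://rgw.example.com --bucket backups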

[ceph-users] rocksdb tuning

2020-05-29 Thread Frank R
Hi all, I am attempting to prevent bluestore rocksdb Level 3/4 spillover with a 150GB logical volume for the db/wal. I am thinking of setting max_bytes_for_level_base to about 1.3G (1342177280). This should let Level 3 fill up the 150GB logical volume. I don't expect to ever actually need L4. An
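For anyone trying the same thing, such an override is usually applied by editing the existing bluestore_rocksdb_options string rather than setting the key on its own; a sketch (the placeholder stands for the cluster's current option string, which should be kept intact, and the value is the one proposed above):

    [osd]
    bluestore_rocksdb_options = <existing options>,max_bytes_for_level_base=1342177280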

[ceph-users] Re: General question CephFS or RBD

2020-05-29 Thread DHilsbos
Willi; ZFS on RBD seems like a waste, and overkill. A redundant storage solution on top of a redundant storage solution? You can have multiple file systems within CephFS; the thing to note is that each CephFS MUST have a SEPARATE active MDS. For failover, each should have a secondary MDS, and

[ceph-users] Re: OSD backups and recovery

2020-05-29 Thread Coding SpiderFox
On Fri, May 29, 2020 at 23:32, wrote: > Ludek; > > As a cluster system, Ceph isn't really intended to be backed up. It's > designed to take quite a beating, and preserve your data. > > But that does not save me when a crypto trojan encrypts all my data. There should always be an offline bac

[ceph-users] Re: OSD backups and recovery

2020-05-29 Thread DHilsbos
Ludek; As a cluster system, Ceph isn't really intended to be backed up. It's designed to take quite a beating, and preserve your data. From a broader disaster recovery perspective, here's how I architected my clusters: Our primary cluster is laid out in such a way that an entire rack can fail

[ceph-users] Re: Ceph and iSCSI

2020-05-29 Thread DHilsbos
BR; I've built my own iSCSI targets (using Fedora and CentOS), and use them in production. I've also built 2 different Ceph clusters. They are completely different. Set aside everything you know about iSCSI, it doesn't apply. Ceph is a clustered object store, it can dynamically expand (nearl

[ceph-users] pg balancer plugin unresponsive

2020-05-29 Thread Philippe D'Anjou
Hi, the pg balancer is not working at all, and if I call status the plugin does not respond; it just hangs forever. A mgr restart doesn't help. I have a PG distribution issue now, how do I fix this? v14.2.5 kind regards ___ ceph-users mailing list -- ceph-users
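For reference, the usual sequence for checking and re-enabling the balancer is below; if the module really hangs, failing over the active mgr is a common first step (the mgr name is a placeholder):

    ceph balancer status
    ceph balancer mode upmap
    ceph balancer on
    ceph mgr fail mgr-host-a    # force a standby mgr to take over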

[ceph-users] Very bad performance on a ceph rbd pool via iSCSI to VMware esx

2020-05-29 Thread Salsa
I have a 3 host, 10x 4TB HDDs per host ceph storage setup. I defined a 3 replica rbd pool and some images and presented them to a VMware host via iSCSI, but the write performance is so bad that I managed to freeze a VM doing a big rsync to a datastore inside ceph and had to reboot its host (seem

[ceph-users] ERROR: osd init failed: (1) Operation not permitted

2020-05-29 Thread Ml Ml
Hello List, first of all: Yes - I made mistakes. Now I am trying to recover :-/ I had a healthy 3 node cluster which I wanted to convert to a single one. My goal was to reinstall a fresh 3 node cluster and start with 2 nodes. I was able to turn it from a healthy 3 node cluster to a 2 node cluste

[ceph-users] OSD backups and recovery

2020-05-29 Thread Ludek Navratil
Hi all, what is the best approach for OSD backups and recovery?  We use only Radosgw with the S3 API and I need to back up the content of S3 buckets.  Currently I sync s3 buckets to a local filesystem and back up the content using Amanda. I believe that there must be a better way to do this but I couldn't fi

[ceph-users] Ceph I/O issues on all SSD cluster

2020-05-29 Thread Dennis Højgaard | Powerhosting Support
OK, where to start. I have been debugging intensively the last two days, but can't seem to wrap my head around the performance issues we see in one of our two hyperconverged (ceph) proxmox clusters. Let me introduce our two clusters and some of the debugging results. *1. Cluster for internal p

[ceph-users] Issue with cephfs

2020-05-29 Thread LeeQ @ BitBahn.io
We have been using a cephfs pool to store machine data; the data is not overly critical at this time. It has grown to around 8TB and we started to see kernel panics on the hosts that had the mounts in place. Now when we try to start the MDSs they cycle through Active, Replay, ClientReplay

[ceph-users] CEPH HOLDING: an event to organize?

2020-05-29 Thread Groupe Partouche
Unsubscribe from Groupe Partouche communications. Email not displaying correctly? Click here. Offer dedicated to businesses. Would you like to organize a seminar, a cocktail reception, a product launch party or any other event for your company? Discover our venues

[ceph-users] Re: [ceph-users]: Ceph Nautius not working after setting MTU 9000

2020-05-29 Thread Anthony D'Atri
I’m pretty sure I’ve seen that happen with QFX5100 switches and net.core.netdev_max_backlog=25 net.ipv4.tcp_max_syn_backlog=10 net.ipv4.tcp_max_tw_buckets=200 > On May 29, 2020, at 10:53 AM, Dave Hall wrote: > > I agree with Paul 100%. Going further - there are many more 'knobs

[ceph-users] Is 2 osds per disk, encryption possible with cephadm on 15.2.2?

2020-05-29 Thread Marco Pizzolo
Hello Everyone, I'm still having issues getting the OSDs to create properly on a brand new Ceph 15.2.2 cluster. I don't seem to be able to have OSDs created based on a service definition of 2 osds per disk and encryption. It seems to hang and/or I see "No Deployments..." Has anyone had luck with t

[ceph-users] Re: [ceph-users]: Ceph Nautius not working after setting MTU 9000

2020-05-29 Thread Dave Hall
I agree with Paul 100%.  Going further - there are many more 'knobs to turn' than just Jumbo Frames, which makes the problem even harder.  Changing any one setting may just move the bottleneck, or possibly introduce instabilities.  In the worst case, one might tune their Linux system so well th

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-29 Thread Brian Topping
Phil, this would be an excellent contribution to the blog or the introductory documentation. I’ve been using Ceph for over a year and this brought together a lot of concepts that I hadn’t related so succinctly in the past. One of the things that I hadn’t really conceptualized well was “why size of

[ceph-users] Re: Very bad performance on a ceph rbd pool via iSCSI to VMware esx

2020-05-29 Thread Hans van den Bogert
What are the specs of your nodes? And what specific harddisks are you using? On Fri, May 29, 2020, 18:41 Salsa wrote: > I have a 3 hosts, 10 4TB HDDs per host ceph storage set up. I deined a 3 > replica rbd pool and some images and presented them to a Vmware host via > ISCSI, but the write perfo

[ceph-users] Re: crashing OSDs: ceph_assert(h->file->fnode.ino != 1)

2020-05-29 Thread Simon Leinen
Dear Igor, thanks a lot for the analysis and recommendations. > Here is a brief analysis: > 1) Your DB is pretty large - 27GB at DB device (making it full) and > 279GB at main spinning one. I.e. RocksDB  is experiencing huge > spillover to slow main device - expect performance drop. And generall

[ceph-users] Re: Ceph and Windows - experiences or suggestions

2020-05-29 Thread Brian Topping
Doesn’t SMB support require a paid subscription? > On Feb 13, 2020, at 3:12 AM, Martin Verges wrote: > > Hello Lars, > > we have full SMB Support in our Ceph management solution. You can create > simple (user+pass) or complex SMB (AD) high available shares on CTDB > clustered Samba with ease.

[ceph-users] Re: RBD Mirroring down+unknown

2020-05-29 Thread Jason Dillaman
On Fri, May 29, 2020 at 12:09 PM Miguel Castillo wrote: > Happy New Year Ceph Community! > > I'm in the process of figuring out RBD mirroring with Ceph and having a > really tough time with it. I'm trying to set up just one way mirroring > right now on some test systems (baremetal servers, all De

[ceph-users] Fwd: MDS Daemon Damaged

2020-05-29 Thread Ben
I have a 3 node ceph cluster for my house that I have been using for a few years now without issue. Each node is a MON, MGR, and MDS, and has 2-3 OSDs on them. It has, however been slow. I decided to finally move the bluestore DBs to SSDs. I did one OSD as a test case to make sure everything was go

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-29 Thread DHilsbos
Phil; I like to refer to basic principles, and design assumptions / choices when considering things like this. I also like to refer to more broadly understood technologies. Finally; I'm still relatively new to Ceph, so here it goes... TLDR: Ceph is (likes to be) double-redundant (like RAID-6)

[ceph-users] Re: bluestore - rocksdb level sizes

2020-05-29 Thread Igor Fedotov
Yeah, it's been released in Octopus. But this is rather about better using DB space not adjusting rocksdb level sizes. See https://github.com/ceph/ceph/pull/29687 Thanks, Igor On 5/28/2020 7:18 PM, Frank R wrote: If I remember correctly, being able to configure the rocksdb level sizes was

[ceph-users] Virtual Ceph Days

2020-05-29 Thread Mike Perez
Hello, Unfortunately, due to the COVID-19 pandemic, the Ceph Foundation is looking into running future Ceph Days virtually. We have created a survey to gather feedback from the community on how these events should run. You can access the survey here: https://survey.zohopublic.com/zs/jsCsIn W

[ceph-users] Re: crashing OSDs: ceph_assert(h->file->fnode.ino != 1)

2020-05-29 Thread Igor Fedotov
Simon, Harald thanks for the information. Got your log offline too. Here is a brief analysis: 1) Your DB is pretty large - 27GB at DB device (making it full) and 279GB at main spinning one. I.e. RocksDB  is experiencing huge spillover to slow main device - expect performance drop. And general

[ceph-users] Re: Ceph on CentOS 8?

2020-05-29 Thread Eric Goirand
Hello, What I can see here: http://download.ceph.com/rpm-octopus/el8/ is that the first Ceph release available on CentOS 8 is Octopus and it is already accessible. Thanks, Eric. On Fri, May 29, 2020 at 5:44 PM Guillaume Abrioux wrote: > Hi Jan, > > I might be wrong but I don't think download.ceph.

[ceph-users] Re: ceph with rdma can not mount with kernel

2020-05-29 Thread Ilya Dryomov
On Fri, May 29, 2020 at 5:43 PM 李亚锋 wrote: > > hi: > > I deployed ceph cluster with rdma, it's version is "15.0.0-7282-g05d685d > (05d685dd37b34f2a015e77124c537f3f8e663152) octopus (dev)". > > the cluster status is ok as follows: > > [root@node83 lyf]# ceph -s > cluster: > id: cd389d63

[ceph-users] Re: rbd image naming convention

2020-05-29 Thread Jason Dillaman
On Fri, May 29, 2020 at 11:38 AM Palanisamy wrote: > Hello Team, > > Can I get any update on this request. > The Ceph team is not really involved in the out-of-tree rbd-provisioner. Both the in-tree and this out-of-tree RBD provisioner are deprecated in favor of the ceph-csi [1][2] RBD provisioner. The c

[ceph-users] Ceph Erasure Coding - Stored vs used

2020-05-29 Thread Kristof Coucke
Hi all, I have an issue on my Ceph cluster. For one of my pools I have 107TiB STORED and 298TiB USED. This is strange, since I've configured erasure coding (6 data chunks, 3 coding chunks). So, in an ideal world this should result in approx. 160.5TiB USED. The question now is why this is the case
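For reference, the 160.5 TiB figure follows from the k=6, m=3 profile: each object is stored as 9 chunks of which 6 carry data, so the expected USED would be 107 TiB x (6+3)/6 = 107 x 1.5 = 160.5 TiB.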

[ceph-users] Fwd: Finding erasure-code-profile of crush rule

2020-05-29 Thread David Seith
-- Forwarded message - From: David Seith Date: Wed, 12 Feb 2020 at 11:24 Subject: Finding erasure-code-profile of crush rule To: Dear all, On our ceph cluster we have created multiple erasure coding profiles and then created a number of crush rules for this profiles using: cep

[ceph-users] Re: MDS: obscene buffer_anon memory use when scanning lots of files

2020-05-29 Thread John Madden
Upgraded to 14.2.7, doesn't appear to have affected the behavior. As requested: ~$ ceph tell mds.mds1 heap stats 2020-02-10 16:52:44.313 7fbda2cae700 0 client.59208005 ms_handle_reset on v2:x.x.x.x:6800/3372494505 2020-02-10 16:52:44.337 7fbda3cb0700 0 client.59249562 ms_handle_reset on v2:x.x.x

[ceph-users] Reorganize crush map and replicated rules

2020-05-29 Thread 5 db S
Hi, I'd like to fix the crush tree and crush rule, and would like to know the correct steps and, worst case, what can happen during the maintenance. Steps should be like: 1. Create the rack structured crush tree under root default 2. create the replicated crush rules 3. Move the nodes under

[ceph-users] "mds daemon damaged" after restarting MDS - Filesystem DOWN

2020-05-29 Thread Luca Cervigni
Dear all, Running nautilus 14.2.7. The data in the FS are important and cannot be lost. Today I increased the PGs of the volume pool from 8k to 16k. The active mds started reporting slow ops. (The filesystem is not in the volume pool.) After a few hours the FS was very slow, I reduced the backf

[ceph-users] Performance drops and low oss performance

2020-05-29 Thread quexian da
Hello, I'm a beginner on ceph. I set up some ceph clusters in google cloud. Cluster1 has three nodes and each node has three disks. Cluster2 has three nodes and each node has two disks. Cluster3 has five nodes and each node has five disks. Disk speed shown by `dd if=/dev/zero of=here bs=1G count=1

[ceph-users] General question CephFS or RBD

2020-05-29 Thread Willi Schiegel
Hello All, I have a HW RAID based 240 TB data pool with about 200 million files for users in a scientific institution. Data sizes range from tiny parameter files for scientific calculations and experiments to huge images of brain scans. There are group directories, home directories, Windows r

[ceph-users] CephFS writes cause system reboot

2020-05-29 Thread Ragan, Tj (Dr.)
Hello all, I have a problem with my CephFS that I’m stumped on. I recently had to rebuild a node whose system disk failed. Once I did that, I re-created the osd directory structure in /var/lib/ceph/osd and the osds came back into the cluster, then had to backfill. However, I now have the pro

[ceph-users] Map osd to physical disk in a containerized RHCS

2020-05-29 Thread John Molefe
Hi all, I've been trying out RHCS 3.3 using Bluestore for a few months now; however, I came across the challenge below... Can someone please assist with a guide on how to map and replace failed bluestore OSD disks in a containerized Red Hat Ceph environment? Thanks in advance. Regards John. Vry

[ceph-users] Re: OSD crash after change of osd_memory_target

2020-05-29 Thread Martin Mlynář
Hi Igor, On 23.01.20 at 15:37, Igor Fedotov wrote: > > Martin, > > suggest a couple more checks: > > 1) Try different value(s) for memory target. Including one that is > equal to default 4Gb > I've tried many random numbers and even the original value acquired by: # ceph daemon osd.0 config get

[ceph-users] Fwd: OSD crash after change of osd_memory_target

2020-05-29 Thread Martin Mlynář
Hi Igor, unfortunately same result: # ceph config dump WHO   MASK LEVEL OPTION    VALUE  RO   osd  basic osd_memory_target 2147483648    # /usr/bin/ceph-osd -d --cluster ceph --id 0 --setuser ceph --setgroup ceph 0> 2020-01-23 10:48:04.436 7fc61b5b5c80 -1 *** Caught sig
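For anyone following along, the value can be set cluster-wide and then cross-checked against what a running daemon actually uses (osd.0 is a placeholder):

    ceph config set osd osd_memory_target 2147483648
    ceph config get osd.0 osd_memory_target
    ceph daemon osd.0 config get osd_memory_target    # value in effect on the running daemon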

[ceph-users] Radosgw PubSub Traffic

2020-05-29 Thread Dustin Guerrero
Hey all, We’ve been running some benchmarks against Ceph which we deployed using the Rook operator in Kubernetes. Everything seemed to scale linearly until a point where I see a single OSD receiving much higher CPU load than the other OSDs (nearly 100% saturation). After some investigation we no

[ceph-users] dpdk used issue in master

2020-05-29 Thread zheng...@cmss.chinamobile.com
Hello everyone, can Ceph work with ms_type = async + dpdk in the master version? I have a problem using dpdk in the master version: mon cannot be started. Please help me, thank you very much! 1. My network card configuration information: [root@ebs12 dpdk]# python usertools/dpdk-devbind.py --s

[ceph-users] bluestore_default_buffered_write = true

2020-05-29 Thread Adam Koczarski
Has anyone ever tried using this feature? I've added it to the [global] section of the ceph.conf on my POC cluster but I'm not sure how to tell if it's actually working. I did find a reference to this feature via Google and they had it in their [OSD] section?? I've tried that too.. TIA Adam

[ceph-users] where does 100% RBD utilization come from?

2020-05-29 Thread Philip Brown
oops. I posted this to the "old" list, but supposedly this is the new list and the better place to ask questions? A Google search didn't seem to find the answer on this, so I thought I'd ask here: what determines if an RBD is "100% busy"? I have some backend OSDs, and an iSCSI gateway, serving out

[ceph-users] Ceph and iSCSI

2020-05-29 Thread Bobby
Hi all, I am new to Ceph. But I have some good understanding of the iSCSI protocol. I will dive into Ceph because it looks promising. I am particularly interested in Ceph-RBD. I have a request: can you please tell me what similarities, if any, there are between iSCSI and Ceph. If someone has to

[ceph-users] RBD Mirroring down+unknown

2020-05-29 Thread Miguel Castillo
Happy New Year Ceph Community! I'm in the process of figuring out RBD mirroring with Ceph and having a really tough time with it. I'm trying to set up just one way mirroring right now on some test systems (baremetal servers, all Debian 9). The first cluster is 3 nodes, and the 2nd cluster is 2
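For reference, a minimal sketch of one-way, pool-mode RBD mirroring (pool, image and peer names are placeholders; images need the journaling feature enabled, and the rbd-mirror daemon runs on the backup cluster):

    rbd feature enable mypool/image1 journaling                # on the primary cluster
    rbd mirror pool enable mypool pool                         # on both clusters
    rbd mirror pool peer add mypool client.primary@primary     # on the backup cluster
    rbd mirror pool status mypool --verbose                    # check replication state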

[ceph-users] Re: report librbd bug export-diff

2020-05-29 Thread zheng...@cmss.chinamobile.com
Thanks Jason, you are right. My code wasn't updated in time. The bug has been fixed in https://tracker.ceph.com/issues/42248. zheng...@cmss.chinamobile.com From: Jason Dillaman Date: 2020-01-03 00:25 To: zheng...@cmss.chinamobile.com CC: yangjun; ceph-users Subject: Re: report librbd bug export-

[ceph-users] Access ceph cluster health from REST API

2020-05-29 Thread Vikram Giriraj
Hi, I am trying to get ceph cluster health and cluster status via REST api. My cluster is running the latest nautilus release(v14). I found that the "ceph-rest-api" is deprecated ( https://docs.ceph.com/docs/mimic/releases/luminous/ ) and now there is ceph RESTful module. But cluster health inform

[ceph-users] Re: Ceph and centos 8

2020-05-29 Thread 林浩
Hi, I can't find an el8 rpm package here: http://download.ceph.com/rpm-hammer/ Might I know when the rpm package will be ready for CentOS 8? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] ceph radosgw failed to initialize

2020-05-29 Thread dayong tian
The radosgw can't start normally, the error in log file: --- 2019-12-20 14:37:04.058 7fd5b088f700 -1 Initialization timeout, failed to initialize 2019-12-20 14:37:04.304 7fe7148c0780 0 deferred set uid:gid to 167:167 (ceph:ceph) 2019-12-20 14:37:04.304 7fe7148c0780 0 c

[ceph-users] Re: High CPU usage by ceph-mgr in 14.2.5

2020-05-29 Thread Paul Mezzanini
Based on what we've seen with perf, we think this is the relevant section. (attached is also the whole file) Thread: 73 (mgr-fin) - 1000 samples + 100.00% clone + 100.00% start_thread + 100.00% Finisher::finisher_thread_entry() + 99.40% Context::complete(int) | + 99.40% Fun

[ceph-users] Re: Ceph and centos 8

2020-05-29 Thread Mauro Ferraro - G2K Hosting
Hi, are the CentOS 8 packages for ceph ready to use? Thanks. -- ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: list CephFS snapshots

2020-05-29 Thread Stephan Mueller
Hi Lars, > Is there a means to list all snapshots existing in a (subdir of) > Cephfs? > I can't use the find command to look for the ".snap" dirs. You can, but you can't search for the '.snap' directories; you have to append them to the directory, like `find $cephFsDir/.snap`, but it's better to u
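A minimal sketch of that approach (the mount point is a placeholder): snapshots only become visible once .snap is appended to a directory you already know, so listing a whole subtree means walking the directories first:

    ls /mnt/cephfs/mydir/.snap
    # list snapshots for every directory below a subtree
    find /mnt/cephfs/mydir -type d | while read d; do ls -d "$d"/.snap/* 2>/dev/null; done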

[ceph-users] Re: RESEND: Re: PG Balancer Upmap mode not working

2020-05-29 Thread Thomas Schneider
Hello David, I'm experiencing issues with OSD balancing, too. My ceph cluster is running on release ceph version 14.2.4.1 (596a387fb278758406deabf997735a1f706660c9) nautilus (stable) Would you be able to test (the latest code) on my OSDmap and verify if balancing would work? I have attached it to

[ceph-users] Help! ceph-mon is blocked after shutting down and ip address changed

2020-05-29 Thread occj
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable) os :CentOS Linux release 7.7.1908 (Core) single node ceph cluster with 1 mon,1mgr,1 mds,1rgw and 12osds , but only cephfs is used. ceph -s is blocked after shutting down the machine (192.168.0.104), then ip add

[ceph-users] Re: rbd_open_by_id crash when connection timeout

2020-05-29 Thread Dongsheng Yang
On 12/7/2019 3:56 PM, yang...@cmss.chinamobile.com wrote: Hi dongsheng https://tracker.ceph.com/issues/43178, Can I open a PR to fix it like you said? Thanks. Sure, feel free to open a PR to fix it. yang...@cmss.chinamobil

[ceph-users] Re: rbd_open_by_id crash when connection timeout

2020-05-29 Thread yang...@cmss.chinamobile.com
Hi dongsheng https://tracker.ceph.com/issues/43178 , Can I open a PR to fix it like you said?Thanks. yang...@cmss.chinamobile.com From: Jason Dillaman Date: 2019-12-06 22:58 To: Dongsheng Yang CC: yang...@cmss.chinamobile.com; ceph-users Subject: Re: rbd_open_by_id crash when connection time

[ceph-users] Re: rbd_open_by_id crash when connection timeout

2020-05-29 Thread Dongsheng Yang
On 12/6/2019 9:46 PM, Jason Dillaman wrote: On Fri, Dec 6, 2019 at 12:12 AM Dongsheng Yang wrote: On 12/06/2019 12:50 PM, yang...@cmss.chinamobile.com wrote: Hi Jason, dongsheng I found a problem using rbd_open_by_id when connection timeout(errno = 110, ceph version 12.2.8, there is no change

[ceph-users] Re: rbd_open_by_id crash when connection timeout

2020-05-29 Thread Dongsheng Yang
On 12/06/2019 12:50 PM, yang...@cmss.chinamobile.com wrote: Hi Jason, dongsheng I found a problem using rbd_open_by_id when connection timeout(errno = 110, ceph version 12.2.8, there is no change about rbd_open_by_id in master branch). int r = ictx->state->open(false); if (r < 0) { //

[ceph-users] rbd_open_by_id crash when connection timeout

2020-05-29 Thread yang...@cmss.chinamobile.com
Hi Jason, dongsheng, I found a problem using rbd_open_by_id when the connection times out (errno = 110, ceph version 12.2.8; there is no change to rbd_open_by_id in the master branch). int r = ictx->state->open(false); if (r < 0) { // r = -110 delete ictx; // crash, the stack is shown below: } e

[ceph-users] Re: Building a petabyte cluster from scratch

2020-05-29 Thread Jack
On 12/4/19 9:19 AM, Konstantin Shalygin wrote: > CephFS indeed support snapshots. Since Samba 4.11 support this feature > too with vfs_ceph_snapshots. You can snapshot, but you cannot export a diff of snapshots > ___ > ceph-users mailing list -- ceph-us

[ceph-users] Re: Ceph on CentOS 8?

2020-05-29 Thread Guillaume Abrioux
Hi Jan, I might be wrong but I don't think download.ceph.com provides RPMs that can be consumed using CentOS 8 at the moment. Internally, for testing ceph@master on CentOS8, we use RPMs hosted in chacra. Dimitri who has worked a bit on this topic might have more inputs. Thanks, *Guillaume Abrio

[ceph-users] ceph with rdma can not mount with kernel

2020-05-29 Thread 李亚锋
 hi: I deployed ceph cluster with rdma, it's version is "15.0.0-7282-g05d685d (05d685dd37b34f2a015e77124c537f3f8e663152) octopus (dev)". the cluster status is ok as follows: [root@node83 lyf]# ceph -s  cluster:    id: cd389d63-3eda-406b-8025-b26bba106d91    health: HEALTH_OK   service

[ceph-users] Re: Ceph on CentOS 8?

2020-05-29 Thread Sebastien Han
Guillaume, Dimitry, what does ceph-ansible do? Thanks! – Sébastien Han Senior Principal Software Engineer, Storage Architect "Always give 100%. Unless you're giving blood." On Mon, Dec 2, 2019 at 11:16 AM Jan Kasprzak wrote: > > Hello, Ceph users, > > does anybody use Ceph on rec

[ceph-users] Re: ceph node crashed with these errors "kernel: ceph: build_snap_context" (maybe now it is urgent?)

2020-05-29 Thread Luis Henriques
On Mon, Dec 02, 2019 at 10:27:21AM +0100, Marc Roos wrote: > > I have been asking before[1]. Since Nautilus upgrade I am having these, > with a total node failure as a result(?). Was not expecting this in my > 'low load' setup. Maybe now someone can help resolving this? I am also > waiting quit

[ceph-users] Re: cephfs worm feature

2020-05-29 Thread j j
Thanks for the information, I'll take a look at this PR and think it over. Jeff Layton wrote on Wed, Nov 27, 2019 at 6:50 PM: > On Wed, 2019-11-27 at 15:14 +0800, j j wrote: > > Hi all, > > > > Recently I encountered a situation that requires reliable file storage > with cephfs, and the point is that the data is n

[ceph-users] Re: rbd image naming convention

2020-05-29 Thread Palanisamy
Hello Team, Can I get any update on this request. *Best Regards,* *Palanisamy* On Fri, Nov 22, 2019 at 1:53 PM Palanisamy wrote: > Hello Team, > > We've integrated Ceph cluster storage with Kubernetes and provisioning > volumes through rbd-provisioner. When we're creating volumes from yaml >

[ceph-users] ceph-fuse non-privileged user mount

2020-05-29 Thread yi zhang
Hi, Recently, I'm trying to mount cephfs as a non-privileged user via ceph-fuse, but it always fails. I looked at the code and found that there will be a remount operation when using a ceph-fuse mount. Remount will execute the 'mount -i -o remount {mountpoint}' command and that causes the mount to

[ceph-users] Ceph manager not starting

2020-05-29 Thread Romain Raynaud
Hello, please can you help me with this. I'm on ubuntu 18.04 bionic trying to install ceph version 12.2.12 with ceph-deploy to 3 nodes on odroid xu4 (armhf). Creating OSDs with lvm volumes seems to work, but after starting the manager with the command "ceph-deploy mgr create node1", the mgr goes into starting mod

[ceph-users] rbd image naming convention

2020-05-29 Thread Palanisamy
Hello Team, We've integrated Ceph cluster storage with Kubernetes and are provisioning volumes through rbd-provisioner. When we're creating volumes from yaml files in Kubernetes, pv > pvc > mounting to pod, on the Kubernetes end the PVCs show a meaningful naming convention as defined in the yaml file. But

[ceph-users] Re: Nfs-ganesha rpm still has samba package dependency

2020-05-29 Thread Daniel Gryniewicz
You need to disable _MSPAC_SUPPORT to get rid of this dep. Daniel On 11/17/19 5:55 AM, Marc Roos wrote: == Package Arch Version

[ceph-users] RBD logs

2020-05-29 Thread 陈旭
Hi guys, I deploy an efk cluster and use ceph as block storage in kubernetes, but RBD write iops sometimes becomes zero and last for a few minutes. I want to check logs about RBD so I add some config to ceph.conf and restart ceph. Here is my ceph.conf: [global] fsid = 53f4e1d5-32ce-4e9c-bf36-f6b

[ceph-users] librados aysnc I/O takes considerably longer to complete

2020-05-29 Thread Ponnuvel Palaniyappan
Hi, Is anyone using librados AIO APIs? I seem to have a problem with that where the rados_aio_wait_for_complete() call just waits for a long period of time before it finishes without error. More info on my setup: I am using Ceph 14.2.4 and write 8MB objects. I run my AIO program on 24 nodes at t

[ceph-users] Help

2020-05-29 Thread Sumit Gaur
Unsubscribe please from all email ids On Tue, 29 Oct 2019 at 7:12 am, wrote: > Send ceph-users mailing list submissions to > ceph-users@ceph.io > > To subscribe or unsubscribe via email, send a message with subject or > body 'help' to > ceph-users-requ...@ceph.io > > You can reac

[ceph-users] Re: multiple nvme per osd

2020-05-29 Thread Thomas Coelho
Hi, to have more meaningful names, I also prefer LVM for the db drive. Create the volume on OSD by hand too: vgcreate ceph-block-0 /dev/sda lvcreate -l 100%FREE -n block-0 ceph-block-0 DB Drive: vgcreate ceph-db-0 /dev/nvme0n1 lvcreate -L 50GB -n db-0 ceph-db-0 lvcreate -L 50GB -n db-1 ceph-db
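With LVs named like that, the OSD itself would then typically be created along these lines (a sketch; the VG/LV names follow the example above):

    ceph-volume lvm create --bluestore --data ceph-block-0/block-0 --block.db ceph-db-0/db-0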

[ceph-users] Re: Recovering from a Failed Disk (replication 1)

2020-05-29 Thread Stewart Morgan
Hi, > so I need to transfer the data from the failed OSD to the other OSDs that are > healthy. It’s risky, but if you think the failing disk is healthy “enough”, you can try migrating the data off of it with "ceph osd out {osd-num}" and waiting for it to empty. I’m assuming you have eno
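A sketch of that drain-and-remove sequence (osd id 12 is a placeholder; watch cluster health between steps):

    ceph osd out 12
    # wait for backfill, then confirm nothing still depends on the OSD
    while ! ceph osd safe-to-destroy osd.12; do sleep 60; done
    ceph osd purge 12 --yes-i-really-mean-it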

[ceph-users] Crashed MDS (segfault)

2020-05-29 Thread Gustavo Tonini
Dear ceph users, we're experiencing a segfault during MDS startup (replay process) which is making our FS inaccessible. MDS log messages: Oct 15 03:41:39.894584 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 7f3c08f49700 1 -- 192.168.8.195:6800/3181891717 <== osd.26 192.168.8.209:6821/2419345 3

[ceph-users] Re: crashing OSDs: ceph_assert(h->file->fnode.ino != 1)

2020-05-29 Thread Simon Leinen
Dear Igor, thanks a lot for your assistance. We're still trying to bring OSDs back up... the cluster is not in great shape right now. > so from the log in the ticket I can see a huge (400+ MB) bluefs log > kept over many small non-adjacent extents. > Presumably it was caused by either setting

[ceph-users] Re: Cache pools at or near target size but no evict happen

2020-05-29 Thread Eugen Block
Can you manually flush/evict the cache? Maybe reduce max_target_bytes and max_target_objects to see if that triggers anything. We use cache_mode writeback, maybe give that a try? I don't see many differences between our and your cache tier config, except for cache_mode and we don't have a max

[ceph-users] Re: crashing OSDs: ceph_assert(h->file->fnode.ino != 1)

2020-05-29 Thread Igor Fedotov
Simon, Harry, so from the log in the ticket I can see a huge (400+ MB) bluefs log kept over many small non-adjacent extents. Presumably it was caused by either setting a small bluefs_alloc_size or high disk space fragmentation, or both. Now I'd like more details on your OSDs. Could you please

[ceph-users] Re: No scrubbing during upmap balancing

2020-05-29 Thread Vytenis A
Actually it's the opposite: I have enabled it with `ceph config set 'osd.*' osd_scrub_during_recovery true`, but still no scrubbing. But now I'm starting to think that the change of `osd_scrub_during_recovery true` did not take effect immediately. I waited for some time, and then reverted it back to the default `
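One way to double-check whether the setting actually reached the daemons (osd.0 is a placeholder):

    ceph config get osd.0 osd_scrub_during_recovery
    ceph config show osd.0 | grep osd_scrub_during_recovery    # value currently in effect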

[ceph-users] Re: crashing OSDs: ceph_assert(h->file->fnode.ino != 1)

2020-05-29 Thread Igor Fedotov
Hi Simon, your analysis is correct, you've stepped into an unexpected state for BlueFS log. This is the second occurrence of the issue, the first one is mentioned at https://tracker.ceph.com/issues/45519 Looking if we can get out of this state and how to fix that... Thanks, Igor On 5/29/

[ceph-users] Re: [ceph-users]: Ceph Nautius not working after setting MTU 9000

2020-05-29 Thread Paul Emmerich
Please do not apply any optimization without benchmarking *before* and *after* in a somewhat realistic scenario. No, iperf is likely not a realistic setup because it will usually be limited by available network bandwidth which is (should) rarely be maxed out on your actual Ceph setup. Paul -- P

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-29 Thread Max Krasilnikov
Hello! Fri, May 29, 2020 at 09:58:58AM +0200, pr wrote: > Hans van den Bogert (hansbogert) writes: > > I would second that, there's no winning in this case for your requirements > > and single PSU nodes. If there were 3 feeds,  then yes; you could make an > > extra layer in your crushmap much l

[ceph-users] Re: No scrubbing during upmap balancing

2020-05-29 Thread Paul Emmerich
Did you disable "osd scrub during recovery"? Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Fri, May 29, 2020 at 12:04 AM Vytenis A wrote: > Forgot to mention th

[ceph-users] Re: The sufficient OSD capabilities to enable write access on cephfs

2020-05-29 Thread Paul Emmerich
There are two bugs that may cause the tag to be missing from the pools, you can somehow manually add these tags with "ceph osd pool application ..."; I think I posted these commands some time ago on tracker.ceph.com Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at ht
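A sketch of the kind of tag commands meant here (pool and filesystem names are placeholders; the application name is cephfs and the key is data or metadata depending on the pool's role):

    ceph osd pool application set cephfs_data cephfs data myfs
    ceph osd pool application set cephfs_metadata cephfs metadata myfs
    ceph osd pool application get cephfs_data    # verify the tag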

[ceph-users] Re: crashing OSDs: ceph_assert(h->file->fnode.ino != 1)

2020-05-29 Thread Simon Leinen
Colleague of Harry's here... Harald Staub writes: > This is again about our bad cluster, with too much objects, and the > hdd OSDs have a DB device that is (much) too small (e.g. 20 GB, i.e. 3 > GB usable). Now several OSDs do not come up any more. > Typical error message: > /build/ceph-14.2.8/sr

[ceph-users] Re: PGs degraded after osd restart

2020-05-29 Thread Vytenis A
Thanks for pointing this out! One thing to mention is that we're not using cache tiering, as described on https://tracker.ceph.com/issues/44286 , but it's a good lead. This means that we can't restart (or experience crashes of) OSDs during rebalancing. On Fri, May 29, 2020 at 4:18 AM Chad Willia

[ceph-users] Recover UUID from a partition

2020-05-29 Thread Szabo, Istvan (Agoda)
Hi, Is there a way to recover the UUID from a partition? Someone mapped it in fstab to /dev/sd* instead of by UUID and all the metadata is gone. The data is there, we just can't mount it to access the data. Any idea how to get it back or determine what it was before? Thank you. This
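Assuming the filesystem metadata on the partitions is intact, the standard tools should still report the UUIDs needed to rebuild fstab (device names are placeholders):

    blkid /dev/sdb1
    lsblk -o NAME,FSTYPE,LABEL,UUID,MOUNTPOINT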

[ceph-users] Re: MAX AVAIL goes up when I reboot an OSD node

2020-05-29 Thread si...@turka.nl
Does this happen with any random node or is it specific to 1 node? If specific to 1 node, does this node hold more data compared to other nodes (ceph osd df)? Sinan Polat > On 29 May 2020 at 09:56, Boris Behrens wrote: > > Well, this happens when any OSD goes offline. (I

[ceph-users] Re: MAX AVAIL goes up when I reboot an OSD node

2020-05-29 Thread Boris Behrens
Hi Sinan, this happens with any node, and any single OSD. On Fri, May 29, 2020 at 10:09 AM si...@turka.nl wrote: > > Does this happen with any random node or specific to 1 node? > > If specific to 1 node, does this node holds more data compared to other nodes > (ceph osd df)? > > Sinan Polat > _

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-29 Thread Phil Regnauld
Burkhard Linke (Burkhard.Linke) writes: > > Buy some power transfer switches. You can connect those to the two PDUs, and > in case of a power failure on one PDUs they will still be able to use the > second PDU. ATS = power switches (in my original mail). > We only use them for "small" ma

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-29 Thread Phil Regnauld
Hans van den Bogert (hansbogert) writes: > I would second that, there's no winning in this case for your requirements > and single PSU nodes. If there were 3 feeds,  then yes; you could make an > extra layer in your crushmap much like you would incorporate a rack topology > in the crushmap.

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-29 Thread Phil Regnauld
Chris Palmer (chris.palmer) writes: > Immediate thought: Forget about crush maps, osds, etc. If you lose half the > nodes (when one power rail fails) your MONs will lose quorum. I don't see > how you can win with that configuration... That's a good point, I'll have to think that one throug
