[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-22 Thread zxcs
From the Ceph documentation, I see that using a fast device as WAL/DB can improve performance. So we are using one (2TB) or two (1TB) Samsung NVMe 970 Pro drives as WAL/DB here, and yes, we have two data pools, an SSD pool and an HDD pool; the SSD pool also uses Samsung 860 Pro drives. The NVMe 970 serves as WAL/DB for both the SSD pool and the HDD
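For reference, a minimal sketch of how an OSD with its DB/WAL on a separate NVMe device is typically created (device paths below are placeholders, not the poster's actual layout):

    # create a BlueStore OSD with data on an SSD/HDD and RocksDB/WAL on an NVMe partition
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

When --block.wal is not given, the WAL is placed alongside the DB on the same fast device.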

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-22 Thread zxcs
Thanks a lot, Marc! I will try to run a fio test on the crashed disks when there is no traffic in our cluster. We are using the Samsung NVMe 970 Pro as WAL/DB and the SSD 860 Pro as the SSD. And the NVMe disappears after the SSD hits a timeout. Maybe we also need to throw the 970 Pro away? Thanks, zx > On Feb 22, 2021, at 9:2

[ceph-users] Re: multiple-domain for S3 on rgws with same ceph backend on one zone

2021-02-22 Thread Simon Pierre DESROSIERS
On Mon, Feb 22, 2021, at 10:34 AM, Janne Johansson wrote: > On Mon, Feb 22, 2021 at 15:27, Simon Pierre DESROSIERS < > simonpierre.desrosi...@montreal.ca> wrote: > >> Hello, >> >> We have a functional Ceph swarm with a pair of S3 RGWs in front that uses >> the A.B.C.D domain to be accessed. >> >> Now a

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-22 Thread Mark Lehrer
> Yes, it is an NVMe, and one node has two NVMes as DB/WAL, one > for SSD (0-2) and another for HDD (3-6). I have no spare to try. > ... > I/O 517 QID 7 timeout, aborting > Input/output error If you are seeing errors like these, it is almost certainly a bad drive unless you are using fabric. Why are yo
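Before replacing hardware, a quick sanity check with nvme-cli and the kernel log can help confirm a failing drive (a sketch, assuming nvme-cli is installed; the device name is a placeholder):

    # read the drive's own error and wear counters
    nvme smart-log /dev/nvme0
    # check whether the controller reset or dropped off the bus
    dmesg | grep -i nvme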

[ceph-users] Re: 10G stackabe lacp switches

2021-02-22 Thread Phil Regnauld
Fabien Sirjean (fsirjean) writes: > Hi, > > Yes, Cisco supports VPC (virtual port-channel) for LACP over multiple > switches. > > We're using 2x10G VPC on our Ceph and Proxmox hosts with 2 Cisco Nexus > 3064PQ (48 x 10G SFP+ & 4 x 40G). Same config here. Have set it up with LACP on the C

[ceph-users] Re: multiple-domain for S3 on rgws with same ceph backend on one zone

2021-02-22 Thread Chris Palmer
I'm not sure that the tenant solution is what the OP wants - my reading is that running under a different tenant allows you to have different tenants use the same bucket and user names but still be distinct, which wasn't what I thought was meant. You can however get RGW to accept a list of host n
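For reference, a sketch of that approach: the zonegroup carries a hostnames list telling RGW which domains to answer for (the domain names below are the OP's placeholders, and the default zonegroup is assumed):

    radosgw-admin zonegroup get > zonegroup.json
    # edit zonegroup.json so it contains: "hostnames": ["A.B.C.D", "E.C.D"]
    radosgw-admin zonegroup set < zonegroup.json
    radosgw-admin period update --commit
    # then restart the RGW daemons to pick up the change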

[ceph-users] Re: Data Missing with RBD-Mirror

2021-02-22 Thread Vikas Rana
We did compare and it was missing a lot of data. I'll issue a resync and report back. Thanks, -Vikas -Original Message- From: Mykola Golub Sent: Monday, February 22, 2021 12:09 PM To: Vikas Rana Cc: 'Mykola Golub' ; 'Eugen Block' ; ceph-users@ceph.io; dilla...@redhat.com Subject: Re: [cep
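For reference, the resync is triggered from the secondary (DR) side; a sketch using the pool/image names from this thread:

    # on the DR cluster: discard the local copy and re-replicate from the primary
    rbd mirror image resync cifs/research_data
    # then watch replication progress
    rbd mirror image status cifs/research_data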

[ceph-users] multiple-domain for S3 on rgws with same ceph backend on one zone

2021-02-22 Thread Simon Pierre DESROSIERS
Hello, We have a functional Ceph swarm with a pair of S3 RGWs in front that uses the A.B.C.D domain to be accessed. Now a new client asks to have access using the domain E.C.D, but to already existing buckets. This is not a scenario discussed in the docs. Apparently, looking at the code and by trying

[ceph-users] Outreachy May 2021

2021-02-22 Thread Mike Perez
Hi everyone, I want to invite you to apply to an internship program called Outreachy! Outreachy provides three-month internships to work in Free and Open Source Software (FOSS). Outreachy internship projects may include programming, user experience, documentation, illustration, graphical desi

[ceph-users] Re: Storing 20 billions of immutable objects in Ceph, 75% <16KB

2021-02-22 Thread Benoît Knecht
Hi, On Sunday, February 21st, 2021 at 12:39, Loïc Dachary wrote: > For the record, here is a summary of the key takeaways from this conversation > (so far): > > - Ambry[0] is a perfect match and I'll keep exploring it[1]. > - To keep billions of small objects manageable, they must be packed

[ceph-users] Re: Data Missing with RBD-Mirror

2021-02-22 Thread Mykola Golub
On Mon, Feb 22, 2021 at 11:37:52AM -0500, Vikas Rana wrote: > That is correct. On Prod we do have 22TB and on DR we only have 5.5TB But did you check that you really have missing files/data? Just to make sure it is not just some issue with how data is stored/counted in different clusters. Assumi

[ceph-users] Re: Data Missing with RBD-Mirror

2021-02-22 Thread Vikas Rana
That is correct. On Prod we do have 22TB and on DR we only have 5.5TB Thanks, -Vikas -Original Message- From: Mykola Golub Sent: Monday, February 22, 2021 10:47 AM To: Vikas Rana Cc: 'Eugen Block' ; ceph-users@ceph.io; dilla...@redhat.com Subject: Re: [ceph-users] Re: Data Missing with

[ceph-users] Re: 10G stackabe lacp switches

2021-02-22 Thread Fabien Sirjean
Hi, Yes, Cisco supports VPC (virtual port-channel) for LACP over multiple switches. We're using 2x10G VPC on our Ceph and Proxmox hosts with 2 Cisco Nexus 3064PQ (48 x 10G SFP+ & 4 x 40G). These refs can be found for ~1000€ in refurb with lifetime warranty. Happy users here :-) Cheers, F
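For completeness, the host side of such a setup is a plain LACP bond; a minimal sketch with iproute2 (interface names are placeholders):

    ip link add bond0 type bond mode 802.3ad miimon 100
    ip link set eth0 down; ip link set eth0 master bond0
    ip link set eth1 down; ip link set eth1 master bond0
    ip link set bond0 up

With VPC/MLAG, eth0 and eth1 land on different switches while the host sees a single logical link.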

[ceph-users] Question on multi-site

2021-02-22 Thread Cary FitzHugh
Hello - We're using libRADOS directly for our communication between services. Some of the features are faster and better suited to our use cases than an S3 gateway. But we do want to leverage the ES metadata search. It appears that the metadata search is built on the object gateway. Question

[ceph-users] Re: Data Missing with RBD-Mirror

2021-02-22 Thread Mykola Golub
On Mon, Feb 22, 2021 at 09:41:44AM -0500, Vikas Rana wrote: > # rbd journal info -p cifs --image research_data > rbd journal '11cb6c2ae8944a': > header_oid: journal.11cb6c2ae8944a > object_oid_prefix: journal_data.17.11cb6c2ae8944a. > order: 24 (16MiB objects) > sp

[ceph-users] Re: multiple-domain for S3 on rgws with same ceph backend on one zone

2021-02-22 Thread Janne Johansson
On Mon, Feb 22, 2021 at 15:27, Simon Pierre DESROSIERS < simonpierre.desrosi...@montreal.ca> wrote: > Hello, > > We have a functional Ceph swarm with a pair of S3 RGWs in front that uses > the A.B.C.D domain to be accessed. > > Now a new client asks to have access using the domain E.C.D, but to > alread

[ceph-users] Re: 10G stackabe lacp switches

2021-02-22 Thread mj
I wanted to say thank you to all of those who have offered us help and support (both on- and off-list) for this Arista move. It feels good to know that with it, we are on a path that many others here are walking. Lastly, if I may, one more question: My Arista seller tells me that the aris

[ceph-users] Re: Data Missing with RBD-Mirror

2021-02-22 Thread Vikas Rana
Hello Mykola/Eugen, Here's the output. We also restarted the rbd-mirror process # rbd journal info -p cifs --image research_data rbd journal '11cb6c2ae8944a': header_oid: journal.11cb6c2ae8944a object_oid_prefix: journal_data.17.11cb6c2ae8944a. order: 24 (16MiB objects)

[ceph-users] Re: multiple-domain for S3 on rgws with same ceph backend on one zone

2021-02-22 Thread Freddy Andersen
You need to enable users with tenants … https://docs.ceph.com/en/latest/radosgw/multitenancy/ From: Simon Pierre DESROSIERS Date: Monday, February 22, 2021 at 7:27 AM To: ceph-users@ceph.io Subject: [ceph-users] multiple-domain for S3 on rgws with same ceph backend on one zone Hello, We have
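For reference, the shape of a tenant-scoped user creation from the linked multitenancy docs (names and keys below are illustrative):

    radosgw-admin --tenant testx --uid tester --display-name "Test User" \
        --access_key TESTER --secret test123 user create

Buckets then live in a per-tenant namespace, so different tenants can each have a bucket of the same name.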

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-22 Thread Marc
So on the disks that crash anyway, do the fio test. If it crashes, you will know it has nothing to do with Ceph. If it does not crash, you will probably get a poor fio result, which would explain the problems with Ceph. This is what someone wrote in the past. If you did not do your research on dri
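The usual single-disk test from the Ceph performance wiki linked elsewhere in this thread is a sync 4k write run; a sketch (destructive if pointed at a device in use, and the device path is a placeholder):

    fio --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based \
        --name=journal-test --filename=/dev/nvme0n1

Consumer drives without power-loss protection typically show very low IOPS here, which is what makes them a poor WAL/DB choice.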

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-22 Thread zxcs
Haven’t done any fio test on a single disk, but did run fio against the Ceph cluster. Actually the cluster has 12 nodes, and each node has the same disks (meaning 2 NVMes for cache, 3 SSDs as OSDs, and 4 HDDs also as OSDs). Only two nodes have this problem. And these two nodes have crashed many times (at least 4 time

[ceph-users] Re: Ceph nvme timeout and then aborting

2021-02-22 Thread Marc
Don't you have these problems simply because the Samsung 970 PRO is not suitable for this? Have you run fio tests to make sure it would work OK? https://yourcmc.ru/wiki/Ceph_performance https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit#gid=0 > -Original Mess