Re: [ceph-users] RGW Swift metadata dropped when S3 bucket versioning enabled

2018-12-05 Thread Maxime Guyot
://paste.openstack.org/show/736713/ And here is the snippet to easily turn on/off S3 versioning on a given bucket: https://gist.github.com/Miouge1/b8ae19b71411655154e74e609b61f24e Cheers, Maxime On Fri, 30 Nov 2018 at 22:28 Florian Haas wrote: > On 28/11/2018 19:06, Maxime Guyot wrote: > > H
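For reference, and without reproducing the linked gist, a minimal sketch of toggling bucket versioning against RGW's S3 API with the aws CLI (assumptions: the aws CLI is configured with the RGW user's access/secret keys; the endpoint URL and bucket name are placeholders):
  # Enable versioning on a bucket served by RGW
  aws --endpoint-url http://rgw.example.com s3api put-bucket-versioning \
      --bucket mybucket --versioning-configuration Status=Enabled
  # Suspend it again
  aws --endpoint-url http://rgw.example.com s3api put-bucket-versioning \
      --bucket mybucket --versioning-configuration Status=Suspended
  # Check the current state
  aws --endpoint-url http://rgw.example.com s3api get-bucket-versioning --bucket mybucket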

Re: [ceph-users] RGW Swift metadata dropped when S3 bucket versioning enabled

2018-11-28 Thread Maxime Guyot
ilable. Since the doc says "S3 and Swift APIs share a common namespace, so you may write data with one API and retrieve it with the other", it might be useful to document this kind of limitation somewhere. Cheers, / Maxime On Wed, 28 Nov 2018 at 17:58 Florian Haas wrote: > On 27/11/201

[ceph-users] RGW Swift metadata dropped when S3 bucket versioning enabled

2018-11-27 Thread Maxime Guyot
Hi, I'm running into an issue with the RadosGW Swift API when the S3 bucket versioning is enabled. It looks like it silently drops any metadata sent with the "X-Object-Meta-foo" header (see example below). This is observed on a Luminous 12.2.8 cluster. Is that a normal thing? Am I misconfiguring s
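A minimal sketch of reproducing the behaviour with the Swift CLI (assumptions: python-swiftclient is installed, credentials are exported in the environment, and the container/object names are placeholders):
  # Set a custom metadata key on an existing object (sent as X-Object-Meta-Foo)
  swift post --meta foo:bar mycontainer myobject
  # Read it back; the "Meta Foo" line is what disappears once S3 versioning is enabled
  swift stat mycontainer myobject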

Re: [ceph-users] after reboot node appear outside the root root tree

2017-09-13 Thread Maxime Guyot
Hi, This is a common problem when doing custom CRUSHmaps: the default behavior is to update the OSD's location in the CRUSHmap on start. Did you keep to the defaults there? If that is the problem, you can either: 1) Disable the update-on-start option: "osd crush update on start = false" (see
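A minimal ceph.conf sketch for option 1 (assumption: the setting goes in the [osd] section and the OSDs are restarted afterwards):
  [osd]
  # keep OSDs where the admin placed them in the CRUSH map, even across restarts
  osd crush update on start = false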

Re: [ceph-users] where is a RBD in use

2017-08-31 Thread Maxime Guyot
Hi Götz, Something like "rbd status image-spec" usually works for me. The man page says: "Show the status of the image, including which clients have it open." It'll tell you which IPs have it open, which should help you track it down. Cheers, Maxime On Thu, 31 Aug 2017 at 16:26 Götz Reinicke wrote
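A quick usage sketch (pool/image names and the output are illustrative only):
  rbd status rbd/myimage
  # Watchers:
  #     watcher=192.168.1.21:0/3075604743 client.4125 cookie=1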

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread Maxime Guyot
Hi Massimiliano, I am a little surprised to see 6x NVMe, 64GB of RAM, 2x 100Gb NICs and an E5-2603 v4: that's one of the cheapest E5 Intel CPUs mixed with some pretty high-end gear, and it does not make sense. Wido's right, go with a much higher frequency: E5-2637 v4, E5-2643 v4, E5-1660 v4, E5-1650 v4. If you

Re: [ceph-users] 300 active+undersized+degraded+remapped

2017-07-01 Thread Maxime Guyot
Hi Deepak, As Wido pointed out in the thread you linked, "osd crush update on start" and "osd crush location" are quick ways to fix this. If you are doing custom locations (like for tiering NVMe vs HDD), "osd crush location hook" (Doc: http://docs.ceph.com/docs/master/rados/operations/crush-map/#

Re: [ceph-users] Transitioning to Intel P4600 from P3700 Journals

2017-06-22 Thread Maxime Guyot
Hi, One of the benefits of PCIe NVMe is that it does not take a disk slot, resulting in a higher density. For example, a 6048R-E1CR36N with 3x PCIe NVMe yields 36 OSDs per server (12 OSDs per NVMe), whereas it yields 30 OSDs per server if using SATA SSDs (6 OSDs per SSD). Since you say that you used

Re: [ceph-users] design guidance

2017-06-06 Thread Maxime Guyot
Hi Daniel, The flexibility of Ceph is that you can start with your current config, then scale out and upgrade (CPUs, journals, etc.) as your performance requirements increase. 6x 1.7GHz: are we speaking about the Xeon E5-2603L v4? Any chance to bump that to a 2620 v4 or 2630 v4? Test how the 6x 1.7GHz han

Re: [ceph-users] handling different disk sizes

2017-06-06 Thread Maxime Guyot
Hi Félix, Changing the failure domain to OSD is probably the easiest option if this is a test cluster. I think the commands would go like:
- ceph osd getcrushmap -o map.bin
- crushtool -d map.bin -o map.txt
- sed -i 's/step chooseleaf firstn 0 type host/step chooseleaf firstn 0 type osd/' map.txt
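The preview cuts off there; a hedged sketch of the remaining steps such a CRUSH map edit normally needs (compile the edited map and inject it back):
  crushtool -c map.txt -o map-new.bin
  ceph osd setcrushmap -i map-new.bin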

Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-05-01 Thread Maxime Guyot
Hi, Lots of good info on SSD endurance in this thread. For Ceph journals you should also consider the size of the backing OSDs: the SSD journal won't last as long backing 5x 8TB OSDs as it would backing 5x 1TB OSDs. For example, the S3510 480GB (275TB of endurance), if backing 5x 8TB (40TB) OSDs, will provide ver
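To make the endurance point concrete (rough arithmetic, ignoring write amplification and assuming every byte written to the backing OSDs also passes through the journal once):
  5 x 8TB backing OSDs = 40TB written per full overwrite of the backing disks
  275TB endurance / 40TB ≈ 7 full overwrites before the S3510 480GB is worn out
  5 x 1TB backing OSDs =  5TB per full overwrite -> 275TB / 5TB ≈ 55 full overwrites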

Re: [ceph-users] Data not accessible after replacing OSD with larger volume

2017-05-01 Thread Maxime Guyot
Hi, "Yesterday I replaced one of the 100 GB volumes with a new 2 TB volume which includes creating a snapshot, detaching the old volume, attaching the new volume, then using parted to correctly set the start/end of the data partition. This all went smoothly and no issues reported from AWS or the s

Re: [ceph-users] Ceph with Clos IP fabric

2017-04-22 Thread Maxime Guyot
up, but I am not sure if that’s workable for ceph’s listening address. Cheers, Maxime From: Richard Hesse Date: Thursday 20 April 2017 16:36 To: Maxime Guyot Cc: Jan Marquardt , "ceph-users@lists.ceph.com" Subject: Re: [ceph-users] Ceph with Clos IP fabric On Thu, Apr

Re: [ceph-users] Ceph with Clos IP fabric

2017-04-20 Thread Maxime Guyot
Hi, >2) Why did you choose to run the ceph nodes on loopback interfaces as opposed >to the /24 for the "public" interface? I can’t speak for this example, but in a Clos fabric you generally want to assign the routed IPs on loopbacks rather than on physical interfaces. This way, if one of the links go

Re: [ceph-users] SSD Primary Affinity

2017-04-19 Thread Maxime Guyot
Hi, >> Assuming production level, we would keep a pretty close 1:2 SSD:HDD ratio, >1:4-5 is common but depends on your needs and the devices in question, ie. >assuming LFF drives and that you aren’t using crummy journals. You might be speaking about different ratios here. I think that Anthony is
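For reference, a sketch of how primary affinity is set once the ratios are decided (assumption: on the releases of that era the monitors also need "mon osd allow primary affinity = true" before the command takes effect):
  # osd.12 is a placeholder; 0 means "never primary", 1 is the default behaviour
  ceph osd primary-affinity osd.12 0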

Re: [ceph-users] Adding a new rack to crush map without pain?

2017-04-19 Thread Maxime Guyot
Hi Matthew, I would expect the osd_crush_location parameter to take effect at OSD activation. Maybe ceph-ansible would have info there? A workaround might be to “set noin”, restart all the OSDs once the ceph.conf includes the crush location, and enjoy the automatic CRUSHmap update (if you ha
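A sketch of that workaround (assumptions: systemd-managed OSDs and a ceph.conf that already carries the desired crush location):
  ceph osd set noin
  systemctl restart ceph-osd.target   # on each OSD node
  # once the OSDs show up in the intended CRUSH location:
  ceph osd unset noin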

Re: [ceph-users] Ceph extension - how to equilibrate ?

2017-04-19 Thread Maxime Guyot
Hi Pascal, I ran into the same situation some time ago: a small cluster to which I added a node with HDDs double the size of the existing ones. I wrote about it here: http://ceph.com/planet/the-schrodinger-ceph-cluster/ When adding OSDs to a cluster, rebalancing/data movement is unavoidable in most

Re: [ceph-users] Flapping OSDs

2017-04-02 Thread Maxime Guyot
Hi Vlad, I am curious: are those OSDs flapping all at once? If a single host is affected, I would consider network connectivity (bottlenecks and misconfigured bonds can generate strange situations), the storage controller and firmware. Cheers, Maxime From: ceph-users on behalf of Vlad Bland

Re: [ceph-users] How to think a two different disk's technologies architecture

2017-03-23 Thread Maxime Guyot
Hi Alexandro, As I understand it, you are planning NVMe journals for the SATA HDDs and collocated journals for the SATA SSDs? Option 1: - 24x SATA SSDs per server will hit a bottleneck at the storage bus/controller. Also, I would consider the network capacity: 24x SSDs will deliver more performance tha

Re: [ceph-users] Need erasure coding, pg and block size explanation

2017-03-21 Thread Maxime Guyot
Hi Vincent, There is no buffering until the object reaches 8MB. When the object is written, it has a given size. RADOS just splits the object into K chunks; padding occurs if the object size is not a multiple of K. See also: http://docs.ceph.com/docs/master/dev/osd_internals/erasure_coding/devel
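A small worked example of the splitting (assuming an EC profile of k=4, m=2, which is not taken from the thread):
  10 MB object -> four 2.5 MB data chunks + two 2.5 MB coding chunks = 15 MB stored
  An object whose size does not divide evenly is padded before the split, as noted above.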

Re: [ceph-users] total storage size available in my CEPH setup?

2017-03-14 Thread Maxime Guyot
Hi, >> My question is how much total CEPH storage does this allow me? Only 2.3TB? >> or does the way CEPH duplicates data enable more than 1/3 of the storage? > 3 means 3, so 2.3TB. Note that Ceph is sparse (thin provisioned), so that can help quite a bit. To expand on this, you probably want to keep some margins a
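To make the margin point concrete (rough numbers, assuming ~7TB of raw capacity behind the quoted 2.3TB figure and the default 0.85 nearfull / 0.95 full ratios):
  7 TB raw / 3 replicas ≈ 2.3 TB logical capacity
  2.3 TB x 0.85         ≈ 2.0 TB usable before "nearfull" warnings start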

Re: [ceph-users] cephfs and erasure coding

2017-03-08 Thread Maxime Guyot
Hi, >“The answer as to how to move an existing cephfs pool from replication to >erasure coding (and vice versa) is to create the new pool and rsync your data >between them.” Shouldn’t it be possible to just do the “ceph osd tier add ecpool cachepool && ceph osd tier cache-mode cachepool writeb
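For context, the full cache-tier sequence the quoted commands belong to looks roughly like this (pool names are placeholders; whether this works well for an existing CephFS data pool is exactly what the thread is about):
  ceph osd tier add ecpool cachepool
  ceph osd tier cache-mode cachepool writeback
  ceph osd tier set-overlay ecpool cachepool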

Re: [ceph-users] Shrinking lab cluster to free hardware for a new deployment

2017-03-08 Thread Maxime Guyot
Hi Kevin, I don’t know about those flags, but if you want to shrink your cluster you can simply set the weight of the OSDs to be removed to 0, like so: “ceph osd reweight osd.X 0”. You can either do it gradually if you are concerned about client I/O (probably not, since you speak of a test / semi
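A hedged sketch of the full drain-and-remove sequence for one OSD (the reweight line is what the message quotes; the rest is the usual removal procedure and is an assumption here):
  ceph osd reweight osd.X 0      # drain the data off the OSD
  # wait for backfill to finish, then:
  ceph osd out osd.X
  ceph osd crush remove osd.X
  ceph auth del osd.X
  ceph osd rm osd.X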

Re: [ceph-users] Replication vs Erasure Coding with only 2 elements in the failure-domain.

2017-03-08 Thread Maxime Guyot
Hi, If using Erasure Coding, I think you should be using “choose indep” rather than “firstn” (according to http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-January/007306.html) “- min_size 4 - max_size 4 - step take - step chooseleaf firstn 2 type host - step emit - step take - step
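A sketch of what an “indep” rule for k+m=4 over only two hosts (two chunks per host) might look like (rule and bucket names are placeholders, not taken from the thread):
  rule ec_two_hosts {
      ruleset 1
      type erasure
      min_size 4
      max_size 4
      step take default
      step choose indep 2 type host
      step choose indep 2 type osd
      step emit
  }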

Re: [ceph-users] Mix HDDs and SSDs togheter

2017-03-03 Thread Maxime Guyot
Hi Matteo, Those are some interesting questions. >What’s the best method? Modify the crush map via the ceph CLI or via a text editor? Others might have different experiences, but IMO the best way is using the “osd crush location hook = /path/to/script” (See http://docs.ceph.com/docs/master/rados/operatio
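A minimal sketch of such a hook script (assumptions: the OSD id arrives via --id and whatever the script prints on stdout becomes that OSD's CRUSH location; the SSD/HDD split by OSD id is purely illustrative):
  #!/bin/sh
  # parse the arguments ceph passes, e.g. --cluster ceph --id 12 --type osd
  while [ $# -gt 0 ]; do
      case "$1" in
          --id) ID="$2"; shift ;;
      esac
      shift
  done
  # illustrative rule: OSDs 0-5 are SSDs, everything else is HDD
  if [ "$ID" -lt 6 ]; then
      echo "root=ssd host=$(hostname -s)-ssd"
  else
      echo "root=hdd host=$(hostname -s)-hdd"
  fi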

Re: [ceph-users] replica questions

2017-03-03 Thread Maxime Guyot
Hi Henrik and Matteo, I agree with Henrik: increasing your replication factor won’t improve recovery or read performance on its own. However, if you are changing from replica 2 to replica 3, you might need to scale out your cluster to have enough space for the additional replica, and that would im

Re: [ceph-users] CrushMap Rule Change

2017-03-02 Thread Maxime Guyot
Hi Ashley, The rule you indicated, with “step choose indep 0 type osd”, should select 13 different OSDs, but not necessarily on 13 different servers. So you should be able to test that on, say, 4 servers if you have ~4 OSDs per server. To split the selected OSDs across 4 hosts, I think you would do s
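The preview cuts off there; a hedged sketch of the nested-choose pattern being referred to (counts and the bucket name are assumptions, and since 13 chunks do not divide evenly across 4 hosts the mapping should be verified with crushtool --test before use):
  step take default
  step choose indep 4 type host
  step choose indep 4 type osd
  step emit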

Re: [ceph-users] Increase number of replicas per node

2017-02-27 Thread Maxime Guyot
Hi Massimiliano, You’ll need to update the rule with something like this:
rule rep6 {
    ruleset 1
    type replicated
    min_size 6
    max_size 6
    step take root
    step choose firstn 3 type host
    step choose firstn 2 type osd
    step emit
}
Testing it
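The preview cuts off at the testing part; rules like this can be checked offline with crushtool (a sketch, assuming the current map is exported first and the rule has ruleset id 1):
  ceph osd getcrushmap -o map.bin
  crushtool -i map.bin --test --rule 1 --num-rep 6 --show-mappings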

Re: [ceph-users] Experience with 5k RPM/archive HDDs

2017-02-18 Thread Maxime Guyot
end against SMR. > > Go for normal SATA drives with only slightly higher price/capacity ratios. > > - mike > >> On 2/3/17 2:46 PM, Stillwell, Bryan J wrote: >> On 2/3/17, 3:23 AM, "ceph-users on behalf of Wido den Hollander" >> wrote:

Re: [ceph-users] Experience with 5k RPM/archive HDDs

2017-02-03 Thread Maxime Guyot
access, you're likely to run out of IOPS (again) long > before filling these monsters up. > I fully agree. These large disks have very low IOPS specs and will probably work very, very badly with Ceph. Wido > Christian

[ceph-users] Experience with 5k RPM/archive HDDs

2017-02-02 Thread Maxime Guyot
Hi everyone, I’m wondering if anyone in the ML is running a cluster with archive type HDDs, like the HGST Ultrastar Archive (10TB@7.2k RPM) or the Seagate Enterprise Archive (8TB@5.9k RPM)? As far as I read they both fall in the enterprise class HDDs so *might* be suitable for a low performance

Re: [ceph-users] Minimize data lost with PG incomplete

2017-01-31 Thread Maxime Guyot
OSD. If it contains some of > the incomplete PGs, would it be possible to add them to the new disks? > Maybe following these steps? http://ceph.com/community/incomplete-pgs-oh-my/ > > On 31/01/17 at 10:44, Maxime Guyot wrote: >> Hi José, >>

Re: [ceph-users] Minimize data lost with PG incomplete

2017-01-31 Thread Maxime Guyot
Hi José, Too late, but you could have updated the CRUSHmap *before* moving the disks. Something like: “ceph osd crush set osd.0 0.90329 root=default rack=sala2.2 host=loki05” would move the osd.0 to loki05 and would trigger the appropriate PG movements before any physical move. Then the physic

Re: [ceph-users] All SSD cluster performance

2017-01-16 Thread Maxime Guyot
Meijs" wrote: Hi Maxime, Given your remark below, what kind of SATA SSD do you recommend for OSD usage? Thanks! Regards, Kees On 15-01-17 21:33, Maxime Guyot wrote: > I don’t have firsthand experience with the S3520, as Christian pointed out th

Re: [ceph-users] All SSD cluster performance

2017-01-15 Thread Maxime Guyot
Hi, I don’t have firsthand experience with the S3520, as Christian pointed out their endurance doesn’t make them suitable for OSDs in most cases. I can only advise you to keep a close eye on the SMART status of the SSDs. Anyway, the S3520 960GB is advertised at 380 MB/s for write. Assuming this
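The arithmetic that usually follows (an assumption here, since the message is cut off): with the journal collocated on the same SSD, every client write is written twice, so:
  380 MB/s advertised write / 2 (journal + data) ≈ 190 MB/s effective per OSD
  and with 3x replication the cluster-wide client write throughput is further divided by 3.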

Re: [ceph-users] All SSD Ceph Journal Placement

2016-12-20 Thread Maxime Guyot
SSDs. Cheers, Maxime Guyot On 20/12/16 15:59, "ceph-users on behalf of Jeldrik" wrote: Hi all, I know this topic has been discussed a few times from different perspectives here, but I could not really get to the answer I need

Re: [ceph-users] Production System Evaluation / Problems

2016-11-28 Thread Maxime Guyot
problems with quorum, a third location with 1 MON can help break ties. 2. Zap & re-create? 3. It is common to use 2 VLANs on a LACP bond instead of 1 NIC on each VLAN. You just need to size the pipes accordingly to avoid bottlenecks. Cheers, Maxime Guyot

Re: [ceph-users] general ceph cluster design

2016-11-25 Thread Maxime Guyot
Hi Nick, See inline comments. Cheers, Maxime On 25/11/16 16:01, "ceph-users on behalf of nick" wrote: >Hi, >we are currently planning a new ceph cluster which will be used for >virtualization (providing RBD storage for KVM machines) and we have some >general questions. >

Re: [ceph-users] rados cppool slooooooowness

2016-08-16 Thread Maxime Guyot
Hi Simon, If everything is in the same Ceph cluster and you want to move the whole “.rgw.buckets” pool (I assume your RBD traffic is targeted at a “data” or “rbd” pool) to your cold storage OSDs, maybe you could edit the CRUSH map; then it’s just a matter of rebalancing. You can check the ssd/platte
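A hedged sketch of what that CRUSH-based move could look like on a pre-Luminous cluster (the "platter" root and the ruleset id are assumptions; on Luminous and later the pool property is called crush_rule rather than crush_ruleset):
  # create a rule that only draws from the cold-storage root
  ceph osd crush rule create-simple cold-storage platter host
  # point the RGW data pool at it and let the cluster rebalance
  ceph osd pool set .rgw.buckets crush_ruleset 1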

Re: [ceph-users] High-performance way for access Windows of users to Ceph.

2016-08-12 Thread Maxime Guyot
Hi, > “Clients run a program written by them, which generates files of various sizes > - from 1 KB to 200 GB” If the clients are running custom software on Windows and if at all possible, I would consider using librados. The library is

Re: [ceph-users] Intel SSD (DC S3700) Power_Loss_Cap_Test failure

2016-08-03 Thread Maxime Guyot
Hi, I haven’t had problems with Power_Loss_Cap_Test so far. Regarding Reallocated_Sector_Ct (SMART ID: 5/05h), you can check the “Available Reserved Space” (SMART ID: 232/E8h), the data sheet (http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-s3610-spec.
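A quick way to keep an eye on those attributes (the attribute names are as smartctl usually reports them on Intel DC drives, which is an assumption; /dev/sdX is a placeholder):
  smartctl -A /dev/sdX | grep -E 'Reallocated_Sector_Ct|Available_Reservd_Space|Power_Loss_Cap_Test'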

Re: [ceph-users] Mon placement over wide area

2016-04-12 Thread Maxime Guyot
ven't tested the federated gateways RadosGW. Best Regards Maxime Guyot System Engineer On 12/04/16 03:28, "ceph-users on behalf of Adrian Saul" wrote: >Hello again Christian :) > > >> > We are close to being given approval to deploy a 3.5PB Ceph cluster th

Re: [ceph-users] 800TB - Ceph Physical Architecture Proposal

2016-04-08 Thread Maxime Guyot
Hello, On 08/04/16 04:47, "ceph-users on behalf of Christian Balzer" wrote: > >> 11 OSD nodes: >> -SuperMicro 6047R-E1R36L >> --2x E5-2603v2 >Vastly underpowered for 36 OSDs. >> --128GB RAM >> --36x 6TB OSD >> --2x Intel P3700 (journals) >Which exact model? >If it's the 400GB one, that's 2G

Re: [ceph-users] XFS and nobarriers on Intel SSD

2016-03-03 Thread Maxime Guyot
ckage using “alien”. Then it’s just a matter of “isdct show -intelssd” and “isdct load -intelssd 0”. The cluster has been running the latest firmware for a week now and I can’t reproduce the problem, so it looks like the issue is solved. Thank you Christian for the info! Regards Maxime Guyot
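For reference, a sketch of that update path on a Debian/Ubuntu host (an assumption: the isdct package ships as an RPM and is converted with alien; package file names are illustrative):
  alien --to-deb isdct-*.rpm && dpkg -i isdct_*.deb
  isdct show -intelssd       # list drives and current firmware
  isdct load -intelssd 0     # load the new firmware onto drive index 0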