Re: [ceph-users] monitor ghosted

2020-01-09 Thread Peter Eisch
host issues were observed in the rest of the cluster or at the site. Thank you for your replies and I'll gather better logging next time. peter Peter Eisch Senior Site Reliability Engineer T1.612.659.3228 virginpulse.com |virginpulse.com/global-challenge Australia | Bosnia and Herzeg

[ceph-users] monitor ghosted

2020-01-08 Thread Peter Eisch
-01-08 13:33:29.541 7fec1a736700 1 mon.cephmon02@1(probing) e7 handle_auth_request failed to assign global_id ... There is nothing in the logs of the two remaining/healthy monitors. What is my best practice to get this host back in the cluster? peter Peter Eisch Senior Site Reliability
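A common recovery path for a monitor that can no longer rejoin quorum is to drop it and re-add it with a fresh store; a rough sketch, assuming a systemd deployment and the daemon name cephmon02 as above (adapt paths and keyring location to your own deployment tooling):

    # from a node with a healthy mon: drop the ghosted monitor from the monmap
    ceph mon remove cephmon02
    # on the affected host: stop the daemon and set its old store aside
    systemctl stop ceph-mon@cephmon02
    mv /var/lib/ceph/mon/ceph-cephmon02 /var/lib/ceph/mon/ceph-cephmon02.bak
    # rebuild the store from the current monmap and mon keyring, then start it
    ceph mon getmap -o /tmp/monmap
    ceph-mon --mkfs -i cephmon02 --monmap /tmp/monmap --keyring /path/to/mon-keyring
    systemctl start ceph-mon@cephmon02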

Re: [ceph-users] RGW/swift segments

2019-10-31 Thread Peter Eisch
of this be willing to file it as a bug, please? peter Peter Eisch Senior Site Reliability Engineer T1.612.659.3228 virginpulse.com |virginpulse.com/global-challenge Australia | Bosnia and Herzegovina | Brazil | Canada | Singapore | Switzerland | United Kingdom | USA Confidentiality Notice: The

Re: [ceph-users] RGW/swift segments

2019-10-31 Thread Peter Eisch
could confer? peter Peter Eisch Senior Site Reliability Engineer T1.612.659.3228 virginpulse.com |virginpulse.com/global-challenge Australia | Bosnia and Herzegovina | Brazil | Canada | Singapore | Switzerland | United Kingdom | USA Confidentiality Notice: The information contained in this e-mail

Re: [ceph-users] RGW/swift segments

2019-10-28 Thread Peter Eisch
. rgw relaxed s3 bucket names = true rgw s3 auth use keystone = true rgw thread pool size = 4096 rgw keystone revocation interval = 300 rgw keystone token cache size = 1 rgw swift versioning enabled = true rgw log nonexistent bucket = true All tips accepted… peter Peter Eisch
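For readability, the settings quoted above as they would sit in ceph.conf (the section name is illustrative; values are copied verbatim from the message, including any truncation):

    [client.rgw.gateway]
    rgw relaxed s3 bucket names = true
    rgw s3 auth use keystone = true
    rgw thread pool size = 4096
    rgw keystone revocation interval = 300
    rgw keystone token cache size = 1
    rgw swift versioning enabled = true
    rgw log nonexistent bucket = true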

[ceph-users] RGW/swift segments

2019-10-28 Thread Peter Eisch
file is deleted but all the segments remain. Am I misconfigured or is this a bug where it won’t expire the actual data? Shouldn’t RGW set the expiration on the uploaded segments too if they’re managed separately? Thanks, peter Peter Eisch Senior Site Reliability Engineer T1.612.659.3228

Re: [ceph-users] Ceph for "home lab" / hobbyist use?

2019-09-06 Thread Peter Woodman
2GB ram is gonna be really tight, probably. However, I do something similar at home with a bunch of rock64 4gb boards, and it works well. There are sometimes issues with the released ARM packages (frequently crc32 doesn't work, which isn't great), so you may have to build your own on the board you

Re: [ceph-users] health: HEALTH_ERR Module 'devicehealth' has failed: Failed to import _strptime because the import lockis held by another thread.

2019-08-28 Thread Peter Eisch
> Restart of single module is: `ceph mgr module disable devicehealth ; ceph mgr > module enable devicehealth`. Thank you for your reply. I receive an error, though, as the module can't be disabled. I may have worked through this by restarting the nodes in rapid succession. peter
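If the module refuses to be disabled (devicehealth is treated as an always-on mgr module in newer releases), failing over or restarting the active mgr is a gentler alternative to bouncing whole nodes; a sketch with placeholder names:

    ceph mgr module ls            # see which modules are enabled or always-on
    ceph -s | grep mgr            # identify the active mgr
    ceph mgr fail <active-mgr>    # hand over to a standby mgr
    # or restart the daemon directly on its host
    systemctl restart ceph-mgr@<hostname>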

[ceph-users] health: HEALTH_ERR Module 'devicehealth' has failed: Failed to import _strptime because the import lockis held by another thread.

2019-08-27 Thread Peter Eisch
ns active (cephrgw-a01, cephrgw-a02, cephrgw-a03) data: pools: 18 pools, 4901 pgs objects: 4.28M objects, 16 TiB usage: 49 TiB used, 97 TiB / 146 TiB avail pgs: 4901 active+clean io: client: 7.4 KiB/s rd, 24 MiB/s wr, 7 op/s rd, 628 op/s wr Peter Eis

Re: [ceph-users] How to add 100 new OSDs...

2019-07-26 Thread Peter Sabaini
On 26.07.19 15:03, Stefan Kooman wrote: > Quoting Peter Sabaini (pe...@sabaini.at): >> What kind of commit/apply latency increases have you seen when adding a >> large number of OSDs? I'm nervous how sensitive workloads might react >> here, esp. with spinners. >

Re: [ceph-users] How to add 100 new OSDs...

2019-07-26 Thread Peter Sabaini
What kind of commit/apply latency increases have you seen when adding a large number of OSDs? I'm nervous how sensitive workloads might react here, esp. with spinners. cheers, peter. On 24.07.19 20:58, Reed Dier wrote: > Just chiming in to say that this too has been my preferred me

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
store OSD due to missing devices') RuntimeError: Unable to activate bluestore OSD due to missing devices (this is repeated for each of the 16 drives) Any other thoughts? (I’ll delete/create the OSDs with ceph-deploy otherwise.) peter Peter Eisch Senior Site Reliability Engineer T1.612.6

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
e to activate bluestore OSD due to missing devices # Okay, this created /etc/ceph/osd/*.json. This is cool. Is there a command or option which will read these files and mount the devices? peter Peter Eisch Senior Site Reliability Engineer T1.612.659.3228 virginpulse.com |virginpulse.com/g

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
't do anything to specific commands for just updating the ceph RPMs in this process. peter Peter Eisch Senior Site Reliability Engineer T1.612.659.3228 virginpulse.com |virginpulse.com/global-challenge Australia | Bosnia and Herzegovina | Brazil | Canada | Singapore | Switzerland |

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
[2019-07-24 13:40:49,602][ceph_volume.process][INFO ] Running command: /bin/systemctl show --no-pager --property=Id --state=running ceph-osd@* This is the only log event. At the prompt: # ceph-volume simple scan # peter Peter Eisch Senior Site Reliability Engineer T1.612.659.3228

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
/ceph/osd/ceph-18 ├─sdc2 8:34 0 1.7T 0 part │ └─fdad7618-1234-4021-a63e-40d973712e7b 253:13 0 1.7T 0 crypt ... Thank you for your time on this, peter Peter Eisch Senior Site Reliability Engineer T1.612.659.3228 virginpulse.com |virginpulse.com/g

[ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
'type' files but I'm unsure how to get the lockboxes mounted to where I can get the OSDs running. The osd-lockbox directory is otherwise untouched from when the OSDs were deployed. Is there a way to run ceph-deploy or some other tool to rebuild the mounts for the drives? peter Peter

Re: [ceph-users] Multisite RGW - endpoints configuration

2019-07-17 Thread Peter Eisch
earching a resolution for this? peter  Peter Eisch Senior Site Reliability Engineer T1.612.659.3228 virginpulse.com |virginpulse.com/global-challenge Australia | Bosnia and Herzegovina | Brazil | Canada | Singapore | Switzerland | United Kingdom | USA Confidentiality Notice: The information co

[ceph-users] RGW Multisite Q's

2019-06-12 Thread Peter Eisch
data sync: ERROR: failed to read remote data log info: ret=-2 ... meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2 ... etc. These seem to fire off every 30 seconds but don't seem to be managed by "rgw usage log tick interval" nor "rgw init timeout" val

Re: [ceph-users] Global Data Deduplication

2019-05-30 Thread Peter Wienemann
Hi Felix, there is a seven year old open issue asking for this feature [0]. An alternative option would be using Benji [1]. Peter [0] https://tracker.ceph.com/issues/1576 [1] https://benji-backup.me On 29.05.19 10:25, Felix Hüttner wrote: > Hi everyone, > > We are currently using Ce

Re: [ceph-users] Cephfs free space vs ceph df free space disparity

2019-05-28 Thread Peter Wienemann
er thread about this: > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/thread.html#24801 > > Gr. Stefan Hi Robert, some more questions: Are all your OSDs of equal size? If yes, have you enabled balancing for your cluster (see [0])? You might also be interest
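A minimal sketch of enabling the balancer referred to above (Luminous or later; upmap mode additionally requires luminous-or-newer clients):

    ceph mgr module enable balancer
    ceph balancer status
    ceph osd set-require-min-compat-client luminous   # needed for upmap mode
    ceph balancer mode upmap
    ceph balancer on
    ceph osd df tree    # check that OSD utilisation evens out over time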

Re: [ceph-users] [events] Ceph Day CERN September 17 - CFP now open!

2019-05-27 Thread Peter Wienemann
Hi Mike, there is a date incompatibility between your announcement and Dan's initial announcement [0]. Which date is correct: September 16 or September 17? Peter [0] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-April/034259.html On 27.05.19 11:22, Mike Perez wrote: > Hey

Re: [ceph-users] pool migration for cephfs?

2019-05-15 Thread Peter Woodman
I actually made a dumb python script to do this. It's ugly and has a lot of hardcoded things in it (like the mount location where i'm copying things to in order to move pools, names of pools, the savings i was expecting, etc) but should be easy to adapt to what you're trying to do https://gist.github.com/p

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-05 Thread Peter Woodman
Last time I had to do this, I used the command outlined here: https://tracker.ceph.com/issues/10098 On Mon, Mar 4, 2019 at 11:05 AM Daniel K wrote: > > Thanks for the suggestions. > > I've tried both -- setting osd_find_best_info_ignore_history_les = true and > restarting all OSDs, as well as '
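For context, that kind of low-level surgery is usually done with ceph-objectstore-tool against the stopped OSD; a heavily hedged sketch (not necessarily the exact procedure from the tracker issue, and destructive, so export a copy first):

    systemctl stop ceph-osd@<id>
    # keep whatever is left of the PG before touching it
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --pgid <pgid> --op export --file /root/<pgid>.export
    # mark the surviving copy complete so the OSD will go active with it
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --pgid <pgid> --op mark-complete
    systemctl start ceph-osd@<id>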

Re: [ceph-users] OSDs crashing in EC pool (whack-a-mole)

2019-01-18 Thread Peter Woodman
At the risk of hijacking this thread, like I said I've ran into this problem again, and have captured a log with debug_osd=20, viewable at https://www.dropbox.com/s/8zoos5hhvakcpc4/ceph-osd.3.log?dl=0 - any pointers? On Tue, Jan 8, 2019 at 11:31 AM Peter Woodman wrote: > > For the rec

Re: [ceph-users] OSDs crashing in EC pool (whack-a-mole)

2019-01-08 Thread Peter Woodman
For the record, in the linked issue, it was thought that this might be due to write caching. This seems not to be the case, as it happened again to me with write caching disabled. On Tue, Jan 8, 2019 at 11:15 AM Sage Weil wrote: > > I've seen this on luminous, but not on mimic. Can you generate

Re: [ceph-users] Mimic 13.2.3?

2019-01-04 Thread Peter Woodman
not to mention that the current released version of mimic (.2) has a bug that is potentially catastrophic to cephfs, known about for months, yet it's not in the release notes. would have upgraded and destroyed data had i not caught a thread on this list. hopefully crowing like this isn't coming of

Re: [ceph-users] cephfs speed

2018-08-31 Thread Peter Eisch
[replying to myself] I set aside cephfs and created an rbd volume. I get the same splotchy throughput with rbd as I was getting with cephfs. (image attached) So, withdrawing this as a question here as a cephfs issue. #backingout peter Peter Eisch virginpulse.com

Re: [ceph-users] cephfs speed

2018-08-30 Thread Peter Eisch
Thanks for the thought. It’s mounted with this entry in fstab (one line, if email wraps it): cephmon-s01,cephmon-s02,cephmon-s03:/     /loam    ceph    noauto,name=clientname,secretfile=/etc/ceph/secret,noatime,_netdev    0       2 Pretty plain, but I'm open to tweaking! peter Peter

[ceph-users] cephfs speed

2018-08-30 Thread Peter Eisch
in bandwidth (MB/sec): 1084 Average IOPS: 279 Stddev IOPS:1 Max IOPS: 285 Min IOPS: 271 Average Latency(s): 0.057239 Stddev Latency(s): 0.0354817 Max latency(s): 0.367037 Min latency(s): 0.0120791 peter Peter E

Re: [ceph-users] SSD-primary crush rule doesn't work as intended

2018-05-24 Thread Peter Linder
To keep up with the SSDs you will need so many HDDs for an average workload that in order to keep up performance you will not save any money. Regards, Peter On 2018-05-23 at 14:37, Paul Emmerich wrote: You can't mix HDDs and SSDs in a server if you want to use such a rule. Th

Re: [ceph-users] [SOLVED] Replicated pool with an even size - has min_size to be bigger than half the size?

2018-03-29 Thread Peter Linder
On 2018-03-29 at 14:26, David Rabel wrote: On 29.03.2018 13:50, Peter Linder wrote: On 2018-03-29 at 12:29, David Rabel wrote: On 29.03.2018 12:25, Janne Johansson wrote: 2018-03-29 11:50 GMT+02:00 David Rabel : You are right. But with my above example: If I have min_size 2 and size 4

Re: [ceph-users] [SOLVED] Replicated pool with an even size - has min_size to be bigger than half the size?

2018-03-29 Thread Peter Linder
On 2018-03-29 at 12:29, David Rabel wrote: On 29.03.2018 12:25, Janne Johansson wrote: 2018-03-29 11:50 GMT+02:00 David Rabel : You are right. But with my above example: If I have min_size 2 and size 4, and because of a network issue the 4 OSDs are split into 2 and 2, is it possible that I

Re: [ceph-users] PGs stuck activating after adding new OSDs

2018-03-27 Thread Peter Linder
1285M 9312G  0.01    0  0 101   hdd       0  1.0 9313G  1271M 9312G  0.01    0  0 On Tue, Mar 27, 2018 at 2:29 PM, Peter Linder mailto:peter.lin...@fiberdirekt.se>> wrote: I've had similar issues, but I think your problem might be something else. Could you send the output of

Re: [ceph-users] PGs stuck activating after adding new OSDs

2018-03-27 Thread Peter Linder
I've had similar issues, but I think your problem might be something else. Could you send the output of "ceph osd df"? Other people will probably be interested in what version you are using as well. On 2018-03-27 at 20:07, Jon Light wrote: Hi all, I'm adding a new OSD node with 36 OSDs t

Re: [ceph-users] XFS Metadata corruption while activating OSD

2018-03-12 Thread Peter Woodman
from what i've heard, xfs has problems on arm. use btrfs, or (i believe?) ext4+bluestore will work. On Sun, Mar 11, 2018 at 9:49 PM, Christian Wuerdig wrote: > Hm, so you're running OSD nodes with 2GB of RAM and 2x10TB = 20TB of > storage? Literally everything posted on this list in relation to H

Re: [ceph-users] Weird issues related to (large/small) weights in mixed nvme/hdd pool

2018-01-31 Thread Peter Linder
iginal plan, so PGs per OSD will decrease over time. At the time we thought to aim for 300 PGs per OSD, which I realize now was probably not a great idea, something like 150 would have been better. /Peter On 2018-01-31 at 13:42, Thomas Bennett wrote: Hi Peter, Relooking at your problem, yo

Re: [ceph-users] CRUSH straw2 can not handle big weight differences

2018-01-29 Thread Peter Linder
st add another layer in between and make sure the weights there do not differ too much when we plan it out. /Peter On 2018-01-29 at 17:52, Gregory Farnum wrote: CRUSH is a pseudorandom, probabilistic algorithm. That can lead to problems with extreme input. In this case, you've given

Re: [ceph-users] [Best practise] Adding new data center

2018-01-29 Thread Peter Linder
But the OSDs themselves introduce latency also, even if they are NVMe. We find that it is in the same ballpark. Latency does reduce I/O, but for sub-ms ones it is still thousands of IOPS even for a single thread. For a use case with many concurrent writers/readers (VMs), aggregated throughput

Re: [ceph-users] [Best practise] Adding new data center

2018-01-29 Thread Peter Linder
doing well :) /Peter On 2018-01-29 at 19:26, Nico Schottelius wrote: Hey Wido, [...] Like I said, latency, latency, latency. That's what matters. Bandwidth usually isn't a real problem. I imagined that. What latency do you have with an 8k ping between hosts? As the link will be setup

Re: [ceph-users] CRUSH straw2 can not handle big weight differences

2018-01-29 Thread Peter Linder
We kind of turned the crushmap inside out a little bit. Instead of the traditional "for 1 PG, select OSDs from 3 separate data centers" we did "force selection from only one datacenter (out of 3) and leave enough options only to make sure precisely 1 SSD and 2 HDD are selected". We then orga

Re: [ceph-users] Weird issues related to (large/small) weights in mixed nvme/hdd pool

2018-01-26 Thread Peter Linder
Ok, by randomly toggling settings *MOST* of the PGs in the test cluster is online, but a few are not. No matter how much I change, a few of them seem to not activate. They are running bluestore with version 12.2.2, i think created with ceph-volume. Here is the output from ceph pg X query of on

Re: [ceph-users] Weird issues related to (large/small) weights in mixed nvme/hdd pool

2018-01-26 Thread Peter Linder
r, great success :). Now I only have to learn how to fix it, any ideas anyone? On 2018-01-26 at 12:59, Peter Linder wrote: Well, we do, but our problem is with our hybrid setup (1 nvme and 2 hdds). The other two (that we rarely use) are nvme only and hdd only, as far as I can tell they

Re: [ceph-users] Weird issues related to (large/small) weights in mixed nvme/hdd pool

2018-01-26 Thread Peter Linder
type replicated     min_size 1     max_size 3     step take default class nvme     step chooseleaf firstn 0 type datacenter     step emit } # end crush map On 2018-01-26 at 11:22, Thomas Bennett wrote: Hi Peter, Just to check if your problem is similar to mine:

Re: [ceph-users] Weird issues related to (large/small) weights in mixed nvme/hdd pool

2018-01-26 Thread Peter Linder
Thomas Bennett wrote: Hi Peter, Not sure if you have got to the bottom of your problem, but I seem to have found what might be a similar problem. I recommend reading below, as there could be a potential hidden problem. Yesterday our cluster went into *HEALTH_WARN* state and I noticed that one of m

Re: [ceph-users] Stuck pgs (activating+remapped) and slow requests after adding OSD node via ceph-ansible

2018-01-22 Thread Peter Linder
w node or disk. See my email subject "Weird issues related to (large/small) weights in mixed nvme/hdd pool" from 2018-01-20 and see if there are some similarities? Regards, Peter On 2018-01-07 at 12:17, Tzachi Strul wrote: Hi all, We have 5 node ceph cluster (Luminous 12.2.1) i

[ceph-users] Weird issues related to (large/small) weights in mixed nvme/hdd pool

2018-01-20 Thread peter . linder
how the hashing/selection process works though, it does somehow seem that if the values are too far apart, things seem to break. crushtool --test seems to correctly calculate my PGs. Basically when this happens I just randomly change some weights and most of the time it starts working.

Re: [ceph-users] Luminous on armhf

2017-12-18 Thread Peter Woodman
er, yeah, i didn't read before i replied. that's fair, though it is only some of the integration test binaries that tax that limit in a single compile step. On Mon, Dec 18, 2017 at 4:52 PM, Peter Woodman wrote: > not the larger "intensive" instance types! they go up to 128

Re: [ceph-users] Luminous on armhf

2017-12-18 Thread Peter Woodman
> their 32 bit ARM systems have the same 2 GB limit. I haven’t tried the > cross-compile on the 64 bit ARMv8 they offer and that might be easier than > trying to do it on x86_64. > >> On Dec 18, 2017, at 4:41 PM, Peter Woodman wrote: >> >> https://www.scaleway.com/ >>

Re: [ceph-users] Luminous on armhf

2017-12-18 Thread Peter Woodman
2017 at 4:38 PM, Andrew Knapp wrote: > I have no idea what this response means. > > I have tried building the armhf and arm64 package on my raspberry pi 3 to > no avail. Would love to see someone post Debian packages for stretch on > arm64 or armhf. > > On Dec 18, 2017 4:12

Re: [ceph-users] Luminous on armhf

2017-12-18 Thread Peter Woodman
YMMV, but I've been using Scaleway instances to build packages for arm64- AFAIK you should be able to run any armhf distro on those machines as well. On Mon, Dec 18, 2017 at 4:02 PM, Andrew Knapp wrote: > I would also love to see these packages!!! > > On Dec 18, 2017 3:46 PM, "Ean Price" wrote:

Re: [ceph-users] Random checksum errors (bluestore on Luminous)

2017-12-10 Thread Peter Woodman
IIRC there was a bug related to bluestore compression fixed between 12.2.1 and 12.2.2 On Sun, Dec 10, 2017 at 5:04 PM, Martin Preuss wrote: > Hi, > > > On 10.12.2017 at 22:06, Peter Woodman wrote: >> Are you using bluestore compression? > [...] > > As a matter of fact

Re: [ceph-users] Random checksum errors (bluestore on Luminous)

2017-12-10 Thread Peter Woodman
Are you using bluestore compression? On Sun, Dec 10, 2017 at 1:45 PM, Martin Preuss wrote: > Hi (again), > > meanwhile I tried > > "ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0" > > but that resulted in a segfault (please see attached console log). > > > Regards > Martin > > > Am 10.1

Re: [ceph-users] The way to minimize osd memory usage?

2017-12-10 Thread Peter Woodman
I've had some success in this configuration by cutting the bluestore cache size down to 512mb and only one OSD on an 8tb drive. Still get occasional OOMs, but not terrible. Don't expect wonderful performance, though. Two OSDs would really be pushing it. On Sun, Dec 10, 2017 at 10:05 AM, David Tur
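The cache cut described above is a single ceph.conf knob; a sketch, assuming the value is given in bytes:

    [osd]
    # 512 MiB instead of the multi-GiB default; trades cache hit rate for RAM
    bluestore cache size = 536870912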

Re: [ceph-users] ceph-disk removal roadmap (was ceph-disk is now deprecated)

2017-11-30 Thread Peter Woodman
How quickly are you planning to cut 12.2.3? On Thu, Nov 30, 2017 at 4:25 PM, Alfredo Deza wrote: > Thanks all for your feedback on deprecating ceph-disk, we are very > excited to be able to move forwards on a much more robust tool and > process for deploying and handling activation of OSDs, remov

Re: [ceph-users] ceph-disk removal roadmap (was ceph-disk is now deprecated)

2017-11-30 Thread Peter Woodman
how quickly are you planning to cut 12.2.3? On Thu, Nov 30, 2017 at 4:25 PM, Alfredo Deza wrote: > Thanks all for your feedback on deprecating ceph-disk, we are very > excited to be able to move forwards on a much more robust tool and > process for deploying and handling activation of OSDs, remo

Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Peter Maloney
_ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Peter Maloney Brockmann Consult Max-Pl

Re: [ceph-users] Cluster hang (deep scrub bug? "waiting for scrub")

2017-11-10 Thread Peter Maloney
o identify the issue. >> >> Thank you. >> Regards, >> >> Matteo >> >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lis

Re: [ceph-users] Needed help to setup a 3-way replication between 2 datacenters

2017-11-10 Thread Peter Linder
On 11/10/2017 7:17 AM, Sébastien VIGNERON wrote: > Hi everyone, > > Beginner with Ceph, i’m looking for a way to do a 3-way replication > between 2 datacenters as mention in ceph docs (but not describe). > > My goal is to keep access to the data (at least read-only access) even > when the link betw

Re: [ceph-users] ceph-backed VM drive became corrupted after unexpected VM termination

2017-11-07 Thread Peter Maloney
bug_tp = 0/0 > debug_auth = 0/0 > debug_finisher = 0/0 > debug_heartbeatmap = 0/0 > debug_perfcounter = 0/0 > debug_asok = 0/0 > debug_throttle = 0/0 > debug_mon = 0/0 > debug_paxos = 0/0 > debug_rgw = 0/0 > > [osd] > osd op threads = 4 > osd disk threads = 2 >

Re: [ceph-users] Ceph not recovering after osd/host failure

2017-10-16 Thread Peter Maloney
item osd.60 weight 1.818 > item osd.62 weight 1.818 > item osd.64 weight 1.818 > item osd.67 weight 1.818 > item osd.70 weight 1.818 > item osd.68 weight 1.818 > item osd.72 weight 1.818 > item osd.74 weight 1.818 > item osd.7

Re: [ceph-users] All replicas of pg 5.b got placed on the same host - how to correct?

2017-10-10 Thread Peter Linder
Probably chooseleaf also instead of choose. Konrad Riedel wrote (10 October 2017 17:05:52 CEST): >Hello Ceph-users, > >after switching to luminous I was excited about the great >crush-device-class feature - now we have 5 servers with 1x2TB NVMe >based OSDs, 3 of them additionally with 4 HDDS per

Re: [ceph-users] All replicas of pg 5.b got placed on the same host - how to correct?

2017-10-10 Thread Peter Linder
I think your failure domain within your rules is wrong. step choose firstn 0 type osd Should be: step choose firstn 0 type host On 10/10/2017 5:05 PM, Konrad Riedel wrote: > Hello Ceph-users, > > after switching to luminous I was excited about the great > crush-device-class feature - now we ha
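For comparison, a generic replicated rule with host as the failure domain looks roughly like this in a decompiled Luminous crush map (rule name, id and root are illustrative):

    rule replicated_host {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }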

Re: [ceph-users] PGs get placed in the same datacenter (Trying to make a hybrid NVMe/HDD pool with 6 servers, 2 in each datacenter)

2017-10-09 Thread Peter Linder
even a consideration there) to similar values that problem went away. Perhaps that is a bug? /Peter On 10/8/2017 3:22 PM, David Turner wrote: > > That's correct. It doesn't matter how many copies of the data you have > in each datacenter. The mons control the maps and you shoul

Re: [ceph-users] PGs get placed in the same datacenter (Trying to make a hybrid NVMe/HDD pool with 6 servers, 2 in each datacenter)

2017-10-08 Thread Peter Linder
> in each datacenter. The mons control the maps and you should be good > as long as you have 1 mon per DC. You should test this to see how the > recovery goes, but there shouldn't be a problem. > > > On Sat, Oct 7, 2017, 6:10 PM Дробышевский, Владимир <mailto

Re: [ceph-users] PGs get placed in the same datacenter (Trying to make a hybrid NVMe/HDD pool with 6 servers, 2 in each datacenter)

2017-10-07 Thread Peter Linder
ut that would leave me manually balancing load. And if one node went down, some RBDs would completely loose their SSD read capability instead of just 1/3 of it...  perhaps acceptable, but not optimal :) /Peter > > On Sat, Oct 7, 2017 at 3:36 PM Peter Linder > mailto:peter.lin...@fiberdirek

Re: [ceph-users] PGs get placed in the same datacenter (Trying to make a hybrid NVMe/HDD pool with 6 servers, 2 in each datacenter)

2017-10-07 Thread Peter Linder
s selecting buckets?). > On Sat, Oct 7, 2017, 1:48 PM Peter Linder <mailto:peter.lin...@fiberdirekt.se>> wrote: > > On 10/7/2017 7:36 PM, Дробышевский, Владимир wrote: >> Hello! >> >> 2017-10-07 19:12 GMT+05:00 Peter Linder >> ma

Re: [ceph-users] PGs get placed in the same datacenter (Trying to make a hybrid NVMe/HDD pool with 6 servers, 2 in each datacenter)

2017-10-07 Thread Peter Linder
s. > > On 7 Oct 2017 at 19:39, Peter Linder > mailto:peter.lin...@fiberdirekt.se>> wrote the > following: > >> On 10/7/2017 7:36 PM, Дробышевский, Владимир wrote: >>> Hello! >>> >>> 2017-10-07 19:12 GMT+05:00 Peter Linder >> <m

Re: [ceph-users] PGs get placed in the same datacenter (Trying to make a hybrid NVMe/HDD pool with 6 servers, 2 in each datacenter)

2017-10-07 Thread Peter Linder
On 10/7/2017 7:36 PM, Дробышевский, Владимир wrote: > Hello! > > 2017-10-07 19:12 GMT+05:00 Peter Linder <mailto:peter.lin...@fiberdirekt.se>>: > > The idea is to select an nvme osd, and > then select the rest from hdd osds in different datacenters (see crush &g

[ceph-users] PGs get placed in the same datacenter (Trying to make a hybrid NVMe/HDD pool with 6 servers, 2 in each datacenter)

2017-10-07 Thread Peter Linder
Hello Ceph-users! Ok, so I've got 3 separate datacenters (low latency network in between) and I want to make a hybrid NVMe/HDD pool for performance and cost reasons. There are 3 servers with NVMe based OSDs, and 2 servers with normal HDDS (Yes, one is missing, will be 3 of course. It needs some m

[ceph-users] bluestore compression statistics

2017-09-18 Thread Peter Gervai
Hello, Is there any way to get compression stats of compressed bluestore storage? Thanks, Peter ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
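One place such figures do show up is in the per-OSD perf counters; a sketch, assuming access to the OSD's admin socket:

    # bluestore_compressed, bluestore_compressed_allocated and
    # bluestore_compressed_original report compression per OSD
    ceph daemon osd.0 perf dump | grep -i compress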

Re: [ceph-users] Luminous CephFS on EC - how?

2017-08-30 Thread Peter Maloney
What kind of terrible mail client is this that sends a multipart message where one part is blank and that's the one Thunderbird chooses to show? (see blankness below) Yes you're on the right track. As long as the main fs is on a replicated pool (the one with omap), the ones below it (using file la

Re: [ceph-users] Ceph cluster in error state (full) with raw usage 32% of total capacity

2017-08-10 Thread Peter Maloney
"op": "chooseleaf_firstn", > >"num": 0, > >"type": "host" > >}, > >{ > >"op": "emit" > >} > >] > > } > > > # ceph osd crush rule dump ip-10-0-

Re: [ceph-users] IO Error reaching client when primary osd get funky but secondaries are ok

2017-08-09 Thread Peter Gervai
sd? Host (and datacenter). > What version of ceph are you running? See the first line of my mail: version 0.94.10 (hammer) Thanks, Peter ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[ceph-users] IO Error reaching client when primary osd get funky but secondaries are ok

2017-08-09 Thread Peter Gervai
versions. Your shared wisdom would be appreciated. Thanks, Peter ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] ceph osd safe to remove

2017-08-03 Thread Peter Maloney
On 08/03/17 11:05, Dan van der Ster wrote: > On Fri, Jul 28, 2017 at 9:42 PM, Peter Maloney > wrote: >> Hello Dan, >> >> Based on what I know and what people told me on IRC, this means basicaly the >> condition that the osd is not acting nor up for any pg. And for on

Re: [ceph-users] ceph osd safe to remove

2017-07-28 Thread Peter Maloney
0 0 0 0 True The "old" vs "new" suffixes refer to the position of data now and after recovery is complete, respectively. (the magic that made my reweight script efficient compared to the official reweight script) And I have not used such a method in

Re: [ceph-users] High iowait on OSD node

2017-07-27 Thread Peter Maloney
0.00 0.000.000.50 0.00 6.00 > 24.00 0.008.00 0.008.00 8.00 0.40 > dm-1 0.00 0.000.000.00 0.00 0.00 > 0.00 0.000.000.000.00 0.00 0.00 > > > >

Re: [ceph-users] Adding multiple osd's to an active cluster

2017-07-19 Thread Peter Gervai
On Fri, Feb 17, 2017 at 10:42 AM, nigel davies wrote: > How is the best way to added multiple osd's to an active cluster? > As the last time i done this i all most killed the VM's we had running on > the cluster You possibly mean that messing with OSDs caused the cluster to reorganise the date a

Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-18 Thread Peter Maloney
... probably poor performance with sync writes on filestore, and not sure what would happen with bluestore... probably much better than filestore though if you use a large block size. > > > -Gencer. > > > -Original Message- > From: Peter Maloney [mailto:peter

Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-18 Thread Peter Maloney
eed 200mb/s? What > prevents it im really wonder this. > > Gencer. > > On 2017-07-17 23:24, Peter Maloney wrote: >> You should have a separate public and cluster network. And journal or >> wal/db performance is important... are the devices fast NVMe? >> >> On

Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-17 Thread Peter Maloney
You should have a separate public and cluster network. And journal or wal/db performance is important... are the devices fast NVMe? On 07/17/17 21:31, gen...@gencgiyen.com wrote: > > Hi, > > > > I located and applied almost every different tuning setting/config > over the internet. I couldn’t m
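The split networks mentioned above are two ceph.conf settings; a sketch with illustrative subnets:

    [global]
    public network  = 10.0.0.0/24    # client, mon and mds traffic
    cluster network = 10.0.1.0/24    # OSD replication and recovery traffic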

Re: [ceph-users] missing feature 400000000000000 ?

2017-07-14 Thread Peter Maloney
___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Peter Maloney Brockmann Consult Max-Planck-Str. 2 21502 Geesthacht Germany Te

Re: [ceph-users] Specifying a cache tier for erasure-coding?

2017-07-07 Thread Peter Maloney
On 07/07/17 14:03, David Turner wrote: > > So many of your questions depends on what your cluster is used for. We > don't even know rbd or cephfs from what you said and that still isn't > enough to fully answer your questions. I have a much smaller 3 node > cluster using Erasure coding for rbds as

Re: [ceph-users] Adding storage to exiting clusters with minimal impact

2017-07-06 Thread Peter Maloney
Here's my possibly unique method... I had 3 nodes with 12 disks each, and when adding 2 more nodes, I had issues with the common method you describe, totally blocking clients for minutes, but this worked great for me: > my own method > - osd max backfills = 1 and osd recovery max active = 1 > - cr
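A hedged sketch of that general approach: throttle recovery, then raise a new OSD's CRUSH weight in small steps, waiting for the cluster to settle between steps (the OSD id and weights are placeholders):

    # limit concurrent data movement while reshuffling (runtime override)
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    for w in 0.5 1.0 1.5 1.819; do
        ceph osd crush reweight osd.36 "$w"
        # let backfill finish before the next increment
        while ! ceph health | grep -q HEALTH_OK; do sleep 60; done
    done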

Re: [ceph-users] Watch for fstrim running on your Ubuntu systems

2017-07-06 Thread Peter Maloney
Hey, I have some SAS Micron S630DC-400 which came with firmware M013 which did the same or worse (takes very long... 100% blocked for about 5min for 16GB trimmed), and works just fine with firmware M017 (4s for 32GB trimmed). So maybe you just need an update. Peter On 07/06/17 18:39, Reed

Re: [ceph-users] dropping filestore+btrfs testing for luminous

2017-06-30 Thread Peter Maloney
On 06/30/17 05:21, Sage Weil wrote: > We're having a series of problems with the valgrind included in xenial[1] > that have led us to restrict all valgrind tests to centos nodes. At the > same time, we're also seeing spurious ENOSPC errors from btrfs on both > centos on xenial kernels[2], makin

Re: [ceph-users] Very HIGH Disk I/O latency on instances

2017-06-29 Thread Peter Maloney
On 06/28/17 21:57, Gregory Farnum wrote: > > > On Wed, Jun 28, 2017 at 9:17 AM Peter Maloney > <mailto:peter.malo...@brockmann-consult.de>> wrote: > > On 06/28/17 16:52, keynes_...@wistron.com > <mailto:keynes_...@wistron.com> wrote: >> [.

Re: [ceph-users] Very HIGH Disk I/O latency on instances

2017-06-28 Thread Peter Maloney
On 06/28/17 16:52, keynes_...@wistron.com wrote: > > We were using HP Helion 2.1.5 ( OpenStack + Ceph ) > > The OpenStack version is *Kilo* and Ceph version is *firefly* > > > > The way we backup VMs is create a snapshot by Ceph commands (rbd > snapshot) then download (rbd export) it. > > > > W

Re: [ceph-users] Snapshot removed, cluster thrashed...

2017-06-26 Thread Peter Maloney
ble when doing snapshots and snap removal) And keep in mind all the "priority" stuff possibly doesn't have any effect without the cfq disk scheduler (at least in hammer... I think I've heard different for jewel and later). Check with: > grep . /sys/block/*/queue/scheduler --
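Checking (and, on kernels that still offer cfq, switching) the scheduler is quick; sda is a stand-in device:

    cat /sys/block/sda/queue/scheduler                   # active scheduler shown in [brackets]
    echo cfq | sudo tee /sys/block/sda/queue/scheduler   # only if cfq is listed as available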

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-21 Thread Peter Maloney
with scrub. Restarting the osd that is mentioned there (osd.155 in your case) will fix it for now. And tuning scrub changes the way it behaves (defaults make it happen more rarely than what I had before). -- Peter Maloney Brockmann Consult Max-Planck

Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-20 Thread Peter Maloney
sat waiting to see when the > ones I care about will finally be handled so I can get on with replacing > those disks. > > Rich > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com --

Re: [ceph-users] FAILED assert(i.first <= i.last)

2017-06-19 Thread Peter Rosell
That sounds like an easy rule to follow. Thanks again for your reply. /Peter Mon 19 June 2017 at 10:19, Wido den Hollander wrote: > > > On 19 June 2017 at 9:55, Peter Rosell wrote: > > > > > > I have my servers on UPS and shut them down manually the way I use to tur

Re: [ceph-users] FAILED assert(i.first <= i.last)

2017-06-19 Thread Peter Rosell
I have my servers on UPS and shut them down manually the way I use to turn them off. There was enough power in the UPS after the servers were shut down, because it continued to beep. Anyway, I will wipe it and re-add it. Thanks for your reply. /Peter Mon 19 June 2017 at 09:11, Wido den

[ceph-users] FAILED assert(i.first <= i.last)

2017-06-18 Thread Peter Rosell
3 island sh[7068]: 14: (clone()+0x6d) [0x7f4fe05e3b5d] Jun 18 13:52:23 island sh[7068]: NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. Jun 18 13:52:23 island sh[7068]: --- begin dump of recent events --- Jun 18 13:52:23 island sh[7068]: -2051> 2017-06-18 13:50:36.086036 7f4fe36bb8c0 5 asok(0x559fef2d6000) register_command perfcounters_dump hook 0x559fef216030 /Peter ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] removing cluster name support

2017-06-11 Thread Peter Maloney
On 06/08/17 21:37, Sage Weil wrote: > Questions: > > - Does anybody on the list use a non-default cluster name? > - If so, do you have a reason not to switch back to 'ceph'? > > Thanks! > sage Will it still be possible for clients to use multiple clusters? Also how does this affect rbd mirroring

Re: [ceph-users] PG that should not be on undersized+degraded on multi datacenter Ceph cluster

2017-06-07 Thread Peter Maloney
On 06/06/17 19:23, Alejandro Comisario wrote: > Hi all, i have a multi datacenter 6 nodes (6 osd) ceph jewel cluster. > There are 3 pools in the cluster, all three with size 3 and min_size 2. > > Today, i shut down all three nodes (controlled and in order) on > datacenter "CPD2" just to validate th

Re: [ceph-users] RBD exclusive-lock and lqemu/librbd

2017-06-02 Thread Peter Maloney
On 06/02/17 12:25, koukou73gr wrote: > On 2017-06-02 13:01, Peter Maloney wrote: >>> Is it easy for you to reproduce it? I had the same problem, and the same >>> solution. But it isn't easy to reproduce... Jason Dillaman asked me for >>> a gcore dump of a hung p

Re: [ceph-users] RBD exclusive-lock and lqemu/librbd

2017-06-02 Thread Peter Maloney
recautions before > posting potentially sensitive data (for example, logs or data > directories that contain Ceph secrets). > -K. > > > On 2017-06-02 12:59, Peter Maloney wrote: >> On 06/01/17 17:12, koukou73gr wrote: >>> Hello list, >>> >>> Today I

Re: [ceph-users] RBD exclusive-lock and lqemu/librbd

2017-06-02 Thread Peter Maloney
On 06/02/17 11:59, Peter Maloney wrote: > On 06/01/17 17:12, koukou73gr wrote: >> Hello list, >> >> Today I had to create a new image for a VM. This was the first time, >> since our cluster was updated from Hammer to Jewel. So far I was just >> copying an existi
