Hi list,
if I create an OSD with its journal (wal/db if it is bluestore) on the same HDD, I
use ceph-disk zap to clean the disk when I want to remove the OSD and wipe the
data on the disk.
But if I use an SSD partition as the journal (wal/db if it is bluestore), how
should I clean the journal (wal,db
I use gdisk to remove the partition and partprobe for the OS to see the new
partition table. You can script it with sgdisk.
On Wed, Jan 31, 2018, 4:10 AM shadow_lin wrote:
> Hi list,
> if I create an OSD with its journal (wal/db if it is bluestore) on the same
> HDD, I use ceph-disk zap to clean the
On 01/31/2018 10:24 AM, David Turner wrote:
I use gdisk to remove the partition and partprobe for the OS to see the
new partition table. You can script it with sgdisk.
That works indeed! I usually write 100M as well using dd just to be sure
any other left-overs are gone.
$ dd if=/dev/zer
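For reference, a scripted sketch of that cleanup; the device and partition number
below are placeholders, and the dd step is destructive, so double-check the device
before running anything:

DEV=/dev/sdX    # SSD holding the journal/wal/db partition (placeholder)
PART=3          # partition number to remove (placeholder)
# zero the first 100M of the partition so no journal/db headers survive
dd if=/dev/zero of=${DEV}${PART} bs=1M count=100 oflag=direct
# delete the partition and have the kernel reread the partition table
sgdisk --delete=${PART} ${DEV}
partprobe ${DEV}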
Hello,
i have this error message:
2018-01-25 00:59:27.357916 7fd646ae1700 -1 osd.3 pg_epoch: 9393 pg[9.139s0(
v 8799'82397 (5494'79049,8799'82397] local-lis/les=9392/9393 n=10003
ec=1478/1478 lis/c 9392/6304 les/c/f 9393/6307/807 9391/9392/9392)
[3,6,12,9]/[3,6,2147483647,4] r=0 lpr=9392 pi=[6304
> On 31 Jan 2018, at 15:23, donglifec...@gmail.com wrote:
>
> ZhengYan,
>
> I've run into a problem using CephFS (10.2.10, kernel client 4.12) as backend
> storage when configuring GitLab:
> 1. git clone ssh://git@10.100.161.182/source/test.git
> 2. git add test.file
> 3. git commit -am "test"
> 4. git
ZhengYan,
I've run into a problem using CephFS (10.2.10, kernel client 4.12) as backend storage
when configuring GitLab:
1. git clone ssh://git@10.100.161.182/source/test.git
2. git add test.file
3. git commit -am "test"
4. git push origin master, error message:
Counting objects: 3, done.
Writing objects
Hi Peter,
From your reply, I see that:
1. pg 3.12c is part of pool 3.
2. The OSDs in the "up" set for pg 3.12c are: 6, 0, 12.
To check on this 'activating' issue, I suggest doing the following:
1. What is the rule that pool 3 should follow, 'hybrid', 'nvme' or
'hdd'? (Use the *ceph osd
On Thu, Jan 25, 2018 at 11:41 AM, Leonardo Vaz wrote:
> Hey Cephers,
>
> This is a friendly reminder that the Call for Proposals for the
> Cephalocon APAC 2018[1] ends next Wednesday, January 31st.
>
> [1] http://cephalocon.doit.com.cn/guestreg_en.html
>
> If you haven't submitted your proposal s
Hi,
I'm wondering why slow requests are being reported mainly when the request
has been put into the queue for processing by its PG (queued_for_pg ,
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#debugging-slow-request
).
Could it be due to too low pg_num/pgp_num?
I
Hi Peter,
Relooking at your problem, you might want to keep track of this issue:
http://tracker.ceph.com/issues/22440
Regards,
Tom
On Wed, Jan 31, 2018 at 11:37 AM, Thomas Bennett wrote:
> Hi Peter,
>
> From your reply, I see that:
>
> 1. pg 3.12c is part of pool 3.
> 2. The OSDs in the
I have some OSDs with this auth. I guess this osw is incorrect and
should be osd?
osd.12
key: xx==
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osw] allow *
On Wed, Jan 31, 2018 at 12:53 PM, Marc Roos wrote:
>
> I have some OSDs with this auth. I guess this osw is incorrect and
> should be osd?
Right.
John
>
> osd.12
> key: xx==
> caps: [mgr] allow profile osd
> caps: [mon] allow profile osd
> caps: [osw] allow
ceph auth caps osd.10 mgr 'allow profile osd' mon 'allow profile osd'
osd 'allow *'
Generates this:
osd.10
key: x==
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
ceph auth caps osd.10 mgr 'profile osd' mon 'profile osd
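For reference, the same command applied to the osd.12 entry above would look like
this (a sketch based on the command already posted; compare the caps against a
healthy OSD before changing anything):

ceph auth caps osd.12 mgr 'allow profile osd' mon 'allow profile osd' osd 'allow *'
# verify the result
ceph auth get osd.12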
On Wed, Jan 31, 2018 at 1:08 PM, Marc Roos wrote:
>
>
> ceph auth caps osd.10 mgr 'allow profile osd' mon 'allow profile osd'
> osd 'allow *'
> Generates this:
> osd.10
> key: x==
> caps: [mgr] allow profile osd
> caps: [mon] allow profile osd
> caps: [osd]
2018-01-30 17:24 GMT+01:00 Bryan Banister :
> Hi all,
>
>
>
> We are still very new to running a Ceph cluster and have run an RGW cluster
> for a while now (6-ish mo); it mainly holds large DB backups (write once,
> read once, delete after N days). The system is now warning us about an OSD
> that
Is it safe to increase pg_num and pgp_num from 1024 up to 2048 for volumes
and default.rgw.buckets.data pools?
How will it impact cluster behavior? I guess cluster rebalancing will occur
and will take a long time considering the amount of data we have on it?
Regards
Jakub
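For reference, the commands involved would be along these lines (a sketch only;
raise pgp_num after pg_num, and expect backfill while data is rebalanced onto the
new PGs):

ceph osd pool set volumes pg_num 2048
ceph osd pool set volumes pgp_num 2048
ceph osd pool set default.rgw.buckets.data pg_num 2048
ceph osd pool set default.rgw.buckets.data pgp_num 2048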
On Wed, Jan 31, 2018 at 1:3
Thanks for the response, Janne!
Here is what test-reweight-by-utilization gives me:
[root@carf-ceph-osd01 ~]# ceph osd test-reweight-by-utilization
no change
moved 12 / 4872 (0.246305%)
avg 36.6316
stddev 5.37535 -> 5.29218 (expected baseline 6.02961)
min osd.48 with 25 -> 25 pgs (0.682471 -> 0.6
On a cursory look at the information, it seems the cluster is
overloaded with requests.
Just a guess, but if you look at IO usage on those spindles they'll be
at or around 100% usage most of the time.
If that is the case then increasing the pg_num and pgp_num won't help,
and short term, will m
Hello!
I need to mount CephFS automatically at KVM VM boot.
I tried to follow the recommendations mentioned at http://docs.ceph.com/docs/master/cephfs/fstab/ but in
both cases (kernel mode or fuse), as well as by specifying the mount command in /etc/rc.local, it
always fails to get mounted cephfs s
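For reference, a minimal kernel-client fstab sketch; the monitor address, mount
point and secretfile path below are placeholders, not taken from this thread. The
_netdev option tells the init system to wait for the network, which is a common
fix for CephFS mounts failing at boot:

192.168.0.10:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0  2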
2018-01-31 15:58 GMT+01:00 Bryan Banister :
>
>
>
> Given that this will move data around (I think), should we increase the
> pg_num and pgp_num first and then see how it looks?
>
>
>
I guess adding pgs and pgps will move stuff around too, but if the PGCALC
formula says you should have more then
Hi,
Is there anyone using DELL servers with PERC controllers willing to provide
advice on configuring them for good throughput performance?
I have 3 servers with 1 SSD and 3 HDDs each.
All drives are enterprise grade.
Connector : 00: Slot 0
Vendor Id
Hi,
Is the performance of a cluster dependent on where the OS is running from?
Example:
OS installed on SSD
OS installed on HDD
OS installed on SD
Using atop I noticed that, during bench tests, the SD OS partition is used
at 100% quite often.
Thanks
STeven
Just curious, is anyone aware of $SUBJECT? As Prometheus provides a
built-in alert mechanism [1], are there any custom rules that people use
to receive notifications about critical situations in a Ceph cluster?
Would it make sense to collect these and have them included in a git
repo under the Ce
It probably depends where your mon daemon is running from as well as where
your logging is going. As long as everything inside of /var/lib/ceph/ is
not mounted on the SD card and your logging for Ceph isn't going to
/var/log/ceph (unless that too is mounted elsewhere), then I don't think
the SD ca
On Wed, Jan 31, 2018 at 4:11 PM, Lenz Grimmer wrote:
>
> Just curious, is anyone aware of $SUBJECT? As Prometheus provides a
> built-in alert mechanism [1], are there any custom rules that people use
> to receive notifications about critical situations in a Ceph cluster?
>
> Would it make sense to
Hi Sean,
Thanks for your willingness to help
I used RAID0 because HBA mode is not available on the PERC H710.
Did I misunderstand you?
How can you set the RAID level to NONE?
Running fio with more jobs provides results closer to the expected
throughput (450 MB/s) for the SSD drive:
fio --filename=/dev/sda --d
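For reference, a fuller fio invocation for sequential-write throughput testing
(a sketch; the device path, block size, job count and runtime are assumptions,
and writing to the raw device is destructive):

fio --name=seq-write --filename=/dev/sdX --direct=1 --ioengine=libaio \
    --rw=write --bs=4M --numjobs=4 --iodepth=16 \
    --runtime=60 --time_based --group_reporting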
Dell calls those sort of drives "Non-RAID" drives and that's what you would set
them to be in either the iDRAC or the PERC BIOS.
Andrew Ferris
Network & System Management
UBC Centre for Heart & Lung Innovation
St. Paul's Hospital, Vancouver
http://www.hli.ubc.ca
>>> Steven Vacaroaia 1/31
On Tue, Jan 30, 2018 at 3:23 PM, Andre Goree wrote:
> On 2018/01/29 2:31 pm, Alfredo Deza wrote:
>
>>> So I'm wondering what my options are at this point. Perhaps rebuild this
>>> OSD node, using ceph-volume and 'simple', but would not be able to use
>>> encryption?
>>
>>
>> Ungh, I forgot to men
Steven,
I've recently done some performance testing on Dell hardware. Here are
some of my messy results. I was mainly testing the effects of the R0
stripe sizing on the PERC card. Each disk has its own R0 so that write
back is enabled. VDs were created like this but with different
stripesize
We're looking into switching the failure domains on several of our
clusters from host-level to rack-level and I'm trying to figure out the
least impactful way to accomplish this.
First off, I've made this change before on a couple large (500+ OSDs)
OpenStack clusters where the volumes, images, and
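For reference, a rough sketch of one way to do this on Luminous; the bucket, host,
rule and pool names below are placeholders, and both the crush moves and the rule
switch will trigger data movement:

# create rack buckets and move hosts under them
ceph osd crush add-bucket rack1 rack
ceph osd crush move rack1 root=default
ceph osd crush move host1 rack=rack1
# create a replicated rule with rack as the failure domain and point a pool at it
ceph osd crush rule create-replicated replicated_rack default rack
ceph osd pool set volumes crush_rule replicated_rack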
On Wed, Jan 31, 2018 at 1:40 AM Philip Poten wrote:
> Hello,
>
> i have this error message:
>
> 2018-01-25 00:59:27.357916 7fd646ae1700 -1 osd.3 pg_epoch: 9393
> pg[9.139s0( v 8799'82397 (5494'79049,8799'82397] local-lis/les=9392/9393
> n=10003 ec=1478/1478 lis/c 9392/6304 les/c/f 9393/6307/807 9
On Tue, Jan 30, 2018 at 5:49 AM Alessandro De Salvo <
alessandro.desa...@roma1.infn.it> wrote:
> Hi,
>
> we have several times a day different OSDs running Luminous 12.2.2 and
> Bluestore crashing with errors like this:
>
>
> starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2
> /var/lib/ceph/os
Hi,
I guess this is an extremely silly question but...
I often read that the ideal PG/OSD ratio should be 100-200 PGs per OSD.
How is this calculated?
When I do "ceph -s" it correctly says I have 320 PGs in 5 pools.
However, this doesn't account for the replicas, does it?
I mean I have the foll
Hi Greg,
many thanks. This is a new cluster created initially with luminous
12.2.0. I'm not sure the instructions for jewel really apply to my case,
and all the machines have NTP enabled, but I'll have a look; many
thanks for the link. All machines are set to CET, although I'm running
over
Hi,
Why does ceph osd tree report that osd.4 is up when the server on which
osd.4 is running is actually down?
Any help will be appreciated
[root@osd01 ~]# ping -c 2 osd02
PING osd02 (10.10.30.182) 56(84) bytes of data.
From osd01 (10.10.30.181) icmp_seq=1 Destination Host Unreachable
From os
Hi,
I'm trying to plan for a disaster, in which all data and all hardware
(excluding the full set of Ceph OSD data drives) is lost. What data do
I need to backup in order to put those drives into new machines and
startup my cluster?
Would a flat file backup of /var/lib/ceph/mon (while the
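Not an authoritative answer, but one commonly suggested precaution, sketched here:
take a cold copy of each monitor's store while its daemon is briefly stopped (one
mon at a time to keep quorum; the paths below are assumptions):

systemctl stop ceph-mon@$(hostname -s)
tar czf /backup/mon-$(hostname -s)-$(date +%F).tar.gz /var/lib/ceph/mon
systemctl start ceph-mon@$(hostname -s)
# also keep copies of ceph.conf and the keyrings
cp -a /etc/ceph /backup/etc-ceph-$(date +%F)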
Maybe the process is still responding on an active session?
If you can't ping a host, that only means you cannot ping it.
-Original Message-
From: Steven Vacaroaia [mailto:ste...@gmail.com]
Sent: woensdag 31 januari 2018 19:47
To: ceph-users
Subject: [ceph-users] Ceph - incorrect outp
There is a config option "mon osd min up ratio" (defaults to 0.3) - and
if too many OSDs are down, the monitors will not mark further OSDs
down. Perhaps that's the culprit here?
Andras
On 01/31/2018 02:21 PM, Marc Roos wrote:
Maybe the process is still responding on an active session?
If
try setting:
mon_osd_min_down_reporters = 1
On 2018-01-31 20:46, Steven Vacaroaia wrote:
> Hi,
>
> Why is ceph osd tree reports that osd.4 is up when the server on which osd.4
> is running is actually down ??
>
> Any help will be appreciated
>
> [root@osd01 ~]# ping -c 2 osd02
> PING
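A sketch of where that setting could go; treat the [mon] placement and the runtime
injection as assumptions and confirm against the docs for your release:

# /etc/ceph/ceph.conf
[mon]
mon_osd_min_down_reporters = 1

# or inject at runtime on a monitor (mon id 'a' is a placeholder)
ceph tell mon.a injectargs '--mon_osd_min_down_reporters=1'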
Yes, this did turn out to be our main issue. We also had a smaller
issue, but this was the one that caused parts of our pools to go offline
for a short time. Rather, the 'cause' was us adding some new NVMe drives that
were much larger than the ones we already had, so too many PGs got mapped
to them but
On 2018/01/31 12:20 pm, Alfredo Deza wrote:
I was going to ask about encryption support (again) for lvm, as I see
it's
mentioned here in master/docs
(http://docs.ceph.com/ceph-ansible/master/osds/scenarios.html#lvm) and
I
remembered you mentioned ceph-volume supported it...then I just
re-rea
Hi Luis,
Thanks for your comment. I see high %util for a few HDDs on each ceph node
but actually there is very low traffic from clients.
iostat -xd shows ongoing operations
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await r_await w_await  svctm  %util
sda
We are running jewel (10.2.10) on our Ceph cluster with 6 OSD hosts and 3 MONs. 144
8TB drives across the 6 OSD hosts with uniform weights.
In tests to simulate the failure of one entire OSD host or even just a few
drives on an OSD host we see that each osd drive we add back in comes back in
with a
On Wed, Jan 31, 2018 at 11:05 AM, Dyweni - Ceph-Users
<6exbab4fy...@dyweni.com> wrote:
> Hi,
>
> I'm trying to plan for a disaster, in which all data and all hardware
> (excluding the full set of Ceph OSD data drives) is lost. What data do I
> need to backup in order to put those drives into new m
Deep scrub is I/O-expensive. If deep scrub is unnecessary, you can disable it
cluster-wide with "ceph osd set nodeep-scrub", or per pool with "ceph osd pool set <pool> nodeep-scrub 1".
On Thursday, February 1, 2018 at 00:10, Jakub Jaszewski wrote:
> 3 active+clean+scrubbing+deep
I don't know of a non-impactful way to change this. If any host, rack, etc.
IDs change, it will cause movement. If any crush rule changes where it
chooses from or what the failure domain is, it will cause movement.
I once ran a test cluster where I changed every host to be in its own
"rack" just to change
Yes, the recommendation takes the number of replicas into account. If
you have size=3, then multiply that pool's PG count by 3. If you have EC
M=4 K=2, then multiply that pool's PGs by 6. You want to take into account
all copies of a PG for the 100-200 PG/osd count.
On Wed, Jan 31, 2018, 1:44
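As a worked example (the per-pool sizes and OSD count here are made up purely for
illustration): with 320 PGs all in size=3 pools and 6 OSDs, that is 320 x 3 = 960
PG copies, and 960 / 6 = 160 PGs per OSD, which lands inside the 100-200 target.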
I agree with Maged that perhaps not enough osds were able to report the osd
as down to the mons. Setting that variable will make sure that any 1 osd
can report any other osd as down. I usually prefer setting that value to at
least 1 more than a single host so that a networking event on a single node
Hi,
Upgrading an old cluster that was created with dumpling up to luminous soon
(with a quick stop at jewel, currently upgrading deb7 -> deb8 so we can get any
newer packages).
My idea is to keep the tunables as they are, since this pool has active data
and I've already disabled tunable warni
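If it helps, a couple of read-only commands to see where the cluster stands before
deciding (a sketch; availability may depend on your release, and actually switching
profiles, e.g. with 'ceph osd crush tunables optimal', would trigger a large amount
of data movement on a cluster this old):

# show the CRUSH tunables currently in effect
ceph osd crush show-tunables
# on luminous, show which feature bits connected clients/daemons support
ceph features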
Hi David,
Thanks for your reply.
I am wondering: what if I don't remove the journal (wal/db for bluestore) partition
on the SSD and only zap the data disk, then assign that journal (wal/db for
bluestore) partition to a new OSD? What would happen?
2018-02-01
lin.yunfan
发件人:David Turner
发送时间:2018-0
ZhengYan,
I only do "chown -R git:wwwgrp-phabricator /mnt/fstest/", "/mnt/fstest" is
cephfs dir.
donglifec...@gmail.com
From: Yan, Zheng
Date: 2018-01-31 18:12
To: donglifec...@gmail.com
CC: ceph-users
Subject: Re: [ceph-users]cephfs(10.2.10, kernel client4.12 ), gitlab use cephfs
as back
I know that for filestore journals that is fine. I think it is also safe
for bluestore. Doing Wido's recommendation of writing 100MB would be a
good idea, but not necessary.
On Wed, Jan 31, 2018, 10:10 PM shadow_lin wrote:
> Hi David,
> Thanks for your reply.
> I am wondering what if I don't r
2018-01-31 19:20 GMT+01:00 Gregory Farnum :
> On Wed, Jan 31, 2018 at 1:40 AM Philip Poten
> wrote:
>
>> Hello,
>>
>> i have this error message:
>>
>> 2018-01-25 00:59:27.357916 7fd646ae1700 -1 osd.3 pg_epoch: 9393
>> pg[9.139s0( v 8799'82397 (5494'79049,8799'82397] local-lis/les=9392/9393
>> n=1
I recently became aware that LVM has become a component of the preferred
OSD provision process when using ceph-volume. We'd already started our
migration to bluestore before ceph-disk's deprecation was announced and
decided to stick with the process with which we started.
I'm concerned my decision
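For anyone in the same position, a rough sketch of the 'simple' path that keeps
existing ceph-disk OSDs managed under ceph-volume without redeploying them (the
OSD path below is a placeholder; check the ceph-volume docs for your exact
release before running):

# capture the metadata of an existing ceph-disk OSD into /etc/ceph/osd/
ceph-volume simple scan /var/lib/ceph/osd/ceph-0
# enable systemd units so the scanned OSDs activate on boot
ceph-volume simple activate --all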
Thanks Gregory and Burkhard
In Kubernetes we use the rbd create and rbd map/unmap commands. From this
perspective, are you referring to rbd as the client, or, after the image is
created and mapped, is there a different client running inside the kernel
that you are referring to which can get osd and mon up
I would recommend, as Wido did, using the dd command. The block db device holds
the metadata/allocation of objects stored in the data block; not cleaning this
is asking for problems, and besides, it does not take any time. In our
testing, building a new cluster on top of an older installation, we did see
many cases where os
Hi, dear cephers,
My lab env: ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable).
Yesterday I restarted all my OSDs using systemctl restart ceph-osd.target and they got stuck in fsck on mount, but I didn't think much about it. Today, I set bluestore fsck on mount
ZhengYan,
I find "git push origin master", git generate "VAX COFF executable" file error,
The screenshot below:
donglifec...@gmail.com
From: donglifec...@gmail.com
Date: 2018-02-01 11:25
To: zyan
CC: ceph-users
Subject: Re: Re: [ceph-users]cephfs(10.2.10, kernel client4.12 ), gitlab use
c
Hi,
On 02/01/2018 07:21 AM, Mayank Kumar wrote:
Thanks Gregory and Burkhard
In Kubernetes we use the rbd create and rbd map/unmap commands. From this
perspective, are you referring to rbd as the client, or, after the image
is created and mapped, is there a different client running inside the
kernel