What about not using ceph-deploy?
-Original Message-
From: Sean Sullivan [mailto:lookcr...@gmail.com]
Sent: Thursday, 19 October 2017 2:28
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Luminous can't seem to provision more than 32 OSDs
per server
I am trying to install Ceph lumino
Hi:
when I analyzed the performance of Ceph, I found that rebuild_aligned was
time-consuming, and further analysis showed that rebuild operations were
performed every time.
Source code (call path in FileStore):
FileStore::queue_transactions
  -> journal->prepare_entry(o->tls, &tbl);
  -> data_align = ((*p).get_data_alignment() -
> Memory usage is still quite high here even with a large onode cache!
> Are you using erasure coding? I recently was able to reproduce a bug in
> bluestore causing excessive memory usage during large writes with EC,
> but have not tracked down exactly what's going on yet.
>
> Mark
No, this is
Are you using radosgw? I found this page useful when I had a similar issue:
http://www.osris.org/performance/rgw.html
Sean
On Wed, 18 Oct 2017, Ольга Ухина said:
> Hi!
>
> I have a problem with ceph luminous 12.2.1. It was upgraded from kraken,
> but I'm not sure if it was a problem in kraken
Hi Greg,
I attached the gzipped output of the query and some more info below. If you
need more, let me know.
Stijn
> [root@mds01 ~]# ceph -s
> cluster 92beef0a-1239-4000-bacf-4453ab630e47
> health HEALTH_ERR
> 1 pgs inconsistent
> 40 requests are blocked > 512 sec
>
Mostly I'm using Ceph as storage for my VMs in Proxmox. I have radosgw, but
only for tests; it doesn't seem to be the cause of the problem.
I've tuned these parameters. They should improve the speed of requests during
the recovery stage, but I still receive the warnings:
osd_client_op_priority = 63
osd_recovery_op_priority = 1
o
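For reference, a minimal sketch of how such settings can also be changed at runtime with injectargs (the values and the extra backfill/recovery throttles below are illustrative, not taken from the original post):
$ ceph tell osd.* injectargs '--osd_client_op_priority 63 --osd_recovery_op_priority 1'
# throttling backfill concurrency can further reduce the impact of recovery on client I/O
$ ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
Note that injectargs only changes the running daemons; ceph.conf still needs the same values for them to survive a restart.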
Hello @all,
given the following config:
* ceph.conf:
...
mon osd down out subtree limit = host
osd_pool_default_size = 3
osd_pool_default_min_size = 2
...
* each OSD has its journal on a 30GB partition on a PCIe-Flash-Card
* 3 hosts
What would
Hi Greg,
Thanks for your findings! We've updated the issue with the log files of
osd.93 and osd.69, which correspond to the period of the log we posted.
Also, we've recreated a new set of logs for that pair of OSDs. As we
explain in the issue, right now the OSDs fail on that other assert you
menti
Hi all,
I'm hoping some of you have some experience in dealing with this, as
unfortunately this is the first time we encountered this issue.
We currently have placement groups that are stuck unclean with
'active+remapped' as last state.
The rundown of what happened:
Yesterday morning, one of o
Hi all,
I'm testing some scenarios with the new Ceph Luminous/BlueStore combination.
I've created a demo setup with 3 nodes (each has 10 HDDs and 2 SSDs),
so I created 10 BlueStore OSDs with a separate 20GB block.db on the
SSDs (5 HDDs per block.db SSD).
I'm testing a failure of one of thos
I'm speaking to the method in general and don't know the specifics of
bluestore. Recovering from a failed journal in this way is only a good
idea if you were able to flush the journal before making a new one. If the
journal failed during operation and you couldn't cleanly flush the journal,
then
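To make the clean procedure concrete, a minimal sketch for a FileStore OSD (the OSD id 12 is a placeholder):
# stop the OSD and flush the old journal while it is still readable
$ systemctl stop ceph-osd@12
$ ceph-osd -i 12 --flush-journal
# repoint the OSD at the new journal device/partition, then initialize it
$ ceph-osd -i 12 --mkjournal
$ systemctl start ceph-osd@12
If the old journal cannot be flushed cleanly, this shortcut is exactly what the warning above is about.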
Hi,
I would like to set up an erasure code profile with k=10 and m=4 settings.
Is there any minimum requirement on the number of OSD nodes and OSDs to achieve
this setting?
Can I create a pool with 8 OSD servers, with one disk each?
Hi David,
Thank you for your answer, but wouldn't scrub (deep-scrub) handle
that? It will flag the unflushed journal PGs as inconsistent and you
would have to repair the PGs. Or am I overlooking something here? The
official blog doesn't state anything about this method being a bad
idea.
Caspar
Hi,
If you want to split your data into 10 pieces (stripes) and hold 4 extra
parity pieces (so your cluster can handle the loss of any 4 OSDs),
then you need a minimum of 14 OSDs to hold your data.
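As a hedged illustration of that arithmetic: k + m = 10 + 4 = 14 chunks, and each chunk must land in a distinct failure domain. A profile like the following (the name is a placeholder) therefore needs at least 14 OSDs with crush-failure-domain=osd, or 14 hosts with crush-failure-domain=host:
$ ceph osd erasure-code-profile set ec-k10-m4 k=10 m=4 crush-failure-domain=osd
With only 8 OSD servers of one disk each, a host failure domain cannot satisfy 14 chunks.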
Denes.
On 10/19/2017 04:24 PM, Josy wrote:
Hi,
I would like to set up an erasure code pr
> On 19 October 2017 at 16:47, Caspar Smit wrote:
>
>
> Hi David,
>
> Thank you for your answer, but wouldn't scrub (deep-scrub) handle
> that? It will flag the unflushed journal PGs as inconsistent and you
> would have to repair the PGs. Or am I overlooking something here? The
> official b
Hi,
I created a testprofile, but not able to create a pool using it
==
$ ceph osd erasure-code-profile get testprofile1
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=10
m=4
plugin=jerasure
technique=reed_sol_van
w=8
$ ceph osd pool cr
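For completeness, creating a pool from such a profile usually looks like the following sketch (the pool name and PG counts are placeholders, not taken from the original message):
$ ceph osd pool create ecpool 128 128 erasure testprofile1
With crush-failure-domain=host and k=10 m=4, the PGs cannot become active+clean unless the cluster actually has at least 14 hosts to place the chunks on.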
On 19/10/17 11:00, Dennis Benndorf wrote:
> Hello @all,
>
> given the following config:
>
> * ceph.conf:
>
> ...
> mon osd down out subtree limit = host
> osd_pool_default_size = 3
> osd_pool_default_min_size = 2
> ...
>
> * each OSD has its j
Hey all,
I am looking at my small test Ceph cluster: I uploaded a 200MB ISO, checked
the space in "ceph status", and saw it increase.
But when I delete the file, the space used does not go down.
Have I missed a configuration somewhere or something?
Please ignore. I found the mistake.
On 19-10-2017 21:08, Josy wrote:
Hi,
I created a testprofile, but not able to create a pool using it
==
$ ceph osd erasure-code-profile get testprofile1
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
Nigel-
What method did you use to upload and delete the file? How did you check
the space utilization? I believe the reason you are still seeing the
space being used when you issue ceph df is that even after the
file is deleted, the file system doesn't actually delete the file, i
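If the uploads went through RGW, deleted objects are typically reclaimed later by the garbage collector, which can be inspected and nudged along the lines of this hedged sketch:
$ radosgw-admin gc list --include-all
$ radosgw-admin gc process
Only after the GC has processed the entries (and, for RBD-backed filesystems, after a trim/discard such as fstrim) should the usage reported by ceph df go back down.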
On 19.10.2017 17:54, nigel davies wrote:
Hey all,
I am looking at my small test Ceph cluster: I uploaded a 200MB
ISO, checked the space in "ceph status", and saw it increase.
But when I delete the file, the space used does not go down.
Have I missed a configuration somewhere or something
On Wed, Oct 18, 2017 at 8:12 AM, Ольга Ухина wrote:
> I have a problem with ceph luminous 12.2.1.
> […]
> I have slow requests on different OSDs on random time (for example at night,
> but I don’t see any problems at the time of problem
> […]
> 2017-10-18 01:20:38.187326 mon.st3 mon.0 10.192.1.78:
On Wed, Oct 18, 2017 at 6:34 PM, Gary Molenkamp wrote:
> Sorry to reply to my own question, but I noticed that the cephx key for
> client.bootstrap-mgr was inconsistent with the key in
> /var/lib/ceph/bootstrap-mgr/ceph.keyring.
>
> I deleted the entry in ceph:
>
> ceph auth del client.bootstr
I have tried using ceph-disk directly and I'm running into all sorts of
trouble, but I'm trying my best. Currently I am using the following
cobbled-together script, which seems to be working:
https://github.com/seapasulli/CephScripts/blob/master/provision_storage.sh
I'm at 11 right now. I hope this works.
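For comparison, a bare-bones ceph-disk invocation per OSD might look like the sketch below (device paths are placeholders, and the exact flags depend on whether a separate block.db device is wanted):
$ ceph-disk prepare --bluestore /dev/sdb --block.db /dev/nvme0n1p1
$ ceph-disk activate /dev/sdb1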
Hi,
I am not able to start some of the OSDs in the cluster.
This is a test cluster and had 8 OSDs. One node was taken out for
maintenance. I set the noout flag and after the server came back up I
unset the noout flag.
Suddenly a couple of OSDs went down.
And now I can start the OSDs manually
Hi,
have you checked the output of "ceph-disk list" on the nodes where the OSDs are
not coming back up?
This should give you a hint on what's going on.
Also use dmesg to search for any error messages.
And finally inspect /var/log/ceph/ceph-osd.${id}.log to see messages produced
by the OSD its
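Concretely, the checks suggested above boil down to something like the following (the OSD id is a placeholder):
$ ceph-disk list
$ dmesg | grep -iE 'error|fail'
$ tail -n 100 /var/log/ceph/ceph-osd.3.log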
Hello, I recently migrated to Bluestore on Luminous and have enabled
aggressive snappy compression on my CephFS data pool. I was wondering if
there was a way to see how much space was being saved. Also, are existing
files compressed at all, or do I have a bunch of resyncing ahead of me?
Sorry if th
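One way to get at least a rough view of the savings, assuming the BlueStore perf counters available in Luminous, is to query an OSD's perf dump (the OSD id is a placeholder and the counter names may vary between releases):
$ ceph daemon osd.0 perf dump | grep -i compress
Comparing counters such as bluestore_compressed_original against bluestore_compressed_allocated should give an idea of how much data was compressed and how much space it now occupies.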
I want to give permissions to my clients, but only for reading/writing
a specific RBD image, not the whole pool.
If I give permissions to the whole pool, a client could delete all the
images in the pool or mount any other image, and I don't really want that.
I've read about using prefix
(https://blo
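The prefix-based approach usually boils down to caps like the sketch below; the image's block_name_prefix comes from rbd info, and the client name, image name and prefix here are placeholders:
$ rbd info rbd/vm1-disk | grep block_name_prefix
$ ceph auth get-or-create client.guest1 mon 'allow r' \
    osd 'allow rwx object_prefix rbd_data.102a74b0dc51; allow rwx object_prefix rbd_header.102a74b0dc51; allow rx object_prefix rbd_id.vm1-disk'
This limits the client to the objects of that one image rather than the whole pool, at the cost of having to regenerate the caps whenever images are added.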
Hi,
>> have you checked the output of "ceph-disk list” on the nodes where
the OSDs are not coming back on?
Yes, it shows all the disk correctly mounted.
>> And finally inspect /var/log/ceph/ceph-osd.${id}.log to see messages
produced by the OSD itself when it starts.
This is the error mess
The most realistic backlog feature would be for adding support for
namespaces within RBD [1], but it's not being actively developed at
the moment. Of course, the usual caveat that "everyone with access to
the cluster network would be trusted" would still apply. It's because
of that assumption that
I ran the test on the Ceph pool, and ran atop on all 4 storage servers, as
suggested.
Out of the 4 servers:
3 of them performed with 17% to 30% disk %busy, and 11% CPU wait.
Momentarily spiking up to 50% on one server, and 80% on another
The 2nd newest server was almost averaging 90% disk %busy an
Have you ruled out the disk controller and backplane in the server running
slower?
On Thu, Oct 19, 2017 at 4:42 PM Russell Glaue wrote:
> I ran the test on the Ceph pool, and ran atop on all 4 storage servers, as
> suggested.
>
> Out of the 4 servers:
> 3 of them performed with 17% to 30% disk %
Imagine we have a 3-OSD cluster and I make an erasure-coded pool with k=2 m=1.
If an OSD fails, we can rebuild the data, but (I think) the whole
cluster won't be able to perform I/O.
Wouldn't it be possible to make the cluster work in a degraded mode?
I think it would be a good idea to make the clust
No, I have not ruled out the disk controller and backplane making the disks
slower.
Is there a way I could test that theory, other than swapping out hardware?
-RG
On Thu, Oct 19, 2017 at 3:44 PM, David Turner wrote:
> Have you ruled out the disk controller and backplane in the server running
> s
Assuming the problem with swapping out hardware is having spare hardware...
you could always switch hardware between nodes and see if the problem
follows the component.
On Thu, Oct 19, 2017 at 4:49 PM Russell Glaue wrote:
> No, I have not ruled out the disk controller and backplane making the
>
Hi Russell,
as you have 4 servers, assuming you are not doing EC pools, just stop all the
OSDs on the second questionable server, mark those OSDs as out,
let the cluster rebalance, and when all PGs are active+clean re-run the
test.
All IOs should then go only to the other 3 se
I'm better off trying to solve the first hurdle.
This ceph cluster is in production serving 186 guest VMs.
-RG
On Thu, Oct 19, 2017 at 3:52 PM, David Turner wrote:
> Assuming the problem with swapping out hardware is having spare
> hardware... you could always switch hardware between nodes and s
In a 3 node cluster with EC k=2 m=1, you can turn off one of the nodes and
the cluster will still operate normally. If you lose a disk during this
state or another server goes offline, then you lose access to your data.
But assuming that you bring up the third node and let it finish
backfilling/re
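A quick way to confirm which situation a given pool is in is to check its sizing parameters (the pool name is a placeholder):
$ ceph osd pool get ecpool size
$ ceph osd pool get ecpool min_size
With k=2 m=1 the pool size is 3; whether I/O continues with one OSD down then depends on whether min_size is 2 or 3.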
Well, I was trying it some days ago and it didn't work for me.
Maybe because of this:
http://tracker.ceph.com/issues/18749
https://github.com/ceph/ceph/pull/17619
I don't know if it's actually working now.
On 19/10/2017 at 22:55, David Turner wrote:
> In a 3 node cluster with EC k=2 m=1,
Hey,
I somehow got the space back by tweaking the reweights.
But I am a tad confused: I uploaded a file (200MB), then removed the file, and
the space has not changed. I am not sure why that happens and what I can do.
On Thu, Oct 19, 2017 at 6:42 PM, nigel davies wrote:
> PS was not aware of fstrim
Running clusters on various versions of Hammer and Jewel, I haven't had any
problems. I haven't upgraded to Luminous quite yet, but I'd be surprised
if there is that severe a regression, especially since they made so many
improvements to Erasure Coding.
On Thu, Oct 19, 2017 at 4:59 PM Jorge Pini
Yes, I am trying it on Luminous.
Well, the bug has been open for 8 months and the fix hasn't been merged yet.
I don't know if that is what's preventing me from making it work. Tomorrow I
will try to test it again.
On 19/10/2017 at 23:00, David Turner wrote:
> Running a cluster on various versions of Hammer
How are you uploading a file? RGW, librados, CephFS, or RBD? There are
multiple reasons that the space might not be updating or cleaning itself
up. The more information you can give us about how you're testing, the
more we can help you.
On Thu, Oct 19, 2017 at 5:00 PM nigel davies wrote:
> Ha
I am using RGW, with an S3 bucket setup.
The live version also uses RBD.
On 19 Oct 2017 10:04 pm, "David Turner" wrote:
How are you uploading a file? RGW, librados, CephFS, or RBD? There are
multiple reasons that the space might not be updating or cleaning itself
up. The more informa
Unless your min_size is set to 3, then you are not hitting the bug in the
tracker you linked. Most likely you are running with a min_size of 2 which
means that bug is not relevant to your cluster. If you wouldn't mind,
please post the output of `ceph osd pool get {pool_name} all`.
On Thu, Oct 19, 2017 at 5:03
Hi Richard,
Thanks a lot for sharing your experience... I have made deeper
investigation and it looks export-diff is the most common tool used for
backup as you have suggested.
I will make some tests with export-diff and I will share my experience.
Again, thanks a lot!
2017-10-16 12:00 GMT+02:
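For anyone following along, an incremental backup with export-diff typically follows this pattern (pool, image and snapshot names are placeholders):
$ rbd snap create rbd/vm1-disk@backup-2017-10-20
$ rbd export-diff --from-snap backup-2017-10-19 rbd/vm1-disk@backup-2017-10-20 vm1-disk.diff
# on the backup cluster, apply the diff to the previously synced image
$ rbd import-diff vm1-disk.diff backup/vm1-disk
Only the blocks changed between the two snapshots are exported, which is what makes this attractive for periodic backups.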
That is a good idea.
However, a previous rebalancing process brought the performance of our
guest VMs to a slow drag.
On Thu, Oct 19, 2017 at 3:55 PM, Jean-Charles Lopez
wrote:
> Hi Russell,
>
> as you have 4 servers, assuming you are not doing EC pools, just stop all
> the OSDs on the second
On Thu, Oct 19, 2017 at 12:59 AM, zhaomingyue wrote:
> Hi:
>
> when I analyzed the performance of ceph, I found that rebuild_aligned was
> time-consuming, and the analysis found that rebuild operations were
> performed every time.
>
>
>
> Source code:
>
> FileStore::queue_transactions
>
> –> journ
Hello,
On Thu, 19 Oct 2017 17:14:17 -0500 Russell Glaue wrote:
> That is a good idea.
> However, a previous rebalancing processes has brought performance of our
> Guest VMs to a slow drag.
>
Never mind that I'm not sure these SSDs are particularly well suited
for Ceph; your problem is clearl
Where did you find the iSCSI RPMs etc.? I looked all through the repo and can't
find anything but the documentation.
_
Tyler Bishop
Founder EST 2007
O: 513-299-7108 x10
M: 513-646-5809
[ http://beyondhosting.net/ | http://BeyondHosting.net ]
Development versions of the RPMs can be found here [1]. We don't have
production signed builds in place for our ceph-iscsi-XYZ packages yet and
the other packages would eventually come from a distro (or third party
add-on) repo.
[1] https://shaman.ceph.com/repos/
On Thu, Oct 19, 2017 at 8:27 PM,
Okay, you're going to need to explain in very clear terms exactly what
happened to your cluster, and *exactly* what operations you performed
manually.
The PG shards seem to have different views of the PG in question. The
primary has a different log_tail, last_user_version, and last_epoch_clean
fro
On Fri, Oct 20, 2017 at 6:32 AM, Josy wrote:
> Hi,
>
>>> have you checked the output of "ceph-disk list” on the nodes where the
>>> OSDs are not coming back on?
>
> Yes, it shows all the disk correctly mounted.
>
>>> And finally inspect /var/log/ceph/ceph-osd.${id}.log to see messages
>>> produced
I guess you have both read and followed
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests
What was the result?
On Fri, Oct 20, 2017 at 2:50 AM, J David wrote:
> On Wed, Oct 18, 2017 at 8:12 AM, Ольга Ухина wrote:
>> I have a p
Hi Cephers,
Brett Niver and Orit Wasserman are organizing a Ceph Upstream meeting
next Thursday, October 25, in Prague.
The meeting will happen at The Pub from 5pm to 9pm (CEST):
http://www.thepub.cz/praha-1/?lng=en
At the moment we are working on the participant list; if you're
interested o
Hello Everyone,
We are currently running into two issues.
1) We are noticing huge pauses during directory creation, but our file
write times are super fast. The metadata and data pools are on the same
infrastructure.
- https://gist.github.com/pryorda/a0d5c37f119c4a320fa4ca9d48c8752b
- http
On Thu, 19 Oct 2017, Daniel Pryor wrote:
> Hello Everyone,
>
> We are currently running into two issues.
>
> 1) We are noticing huge pauses during directory creation, but our file write
> times are super fast. The metadata and data pools are on the same
> infrastructure.
> * https://gist.gith
On Thu, Oct 19, 2017 at 9:42 PM, Brad Hubbard wrote:
> I guess you have both read and followed
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests
>
> What was the result?
Not sure if you’re asking Ольга or myself, but in my cas
On Fri, Oct 20, 2017 at 1:09 PM, J David wrote:
> On Thu, Oct 19, 2017 at 9:42 PM, Brad Hubbard wrote:
>> I guess you have both read and followed
>> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests
>>
>> What was the result?