Re: [ceph-users] ceph-fuse "Transport endpoint is not connected" on Jewel 10.2.2

2016-08-30 Thread Goncalo Borges
From: Dennis Kramer (DBS) [den...@holmes.nl] Sent: 30 August 2016 20:59 To: Goncalo Borges; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph-fuse "Transport endpoint is not connected" on Jewel 10.2.2 Hi Goncalo, Thank you for providing below info. I'm getting

Re: [ceph-users] cephfs metadata pool: deep-scrub error "omap_digest != best guess omap_digest"

2016-08-30 Thread Goncalo Borges
Sorry for the extra email Cheers Goncalo From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Goncalo Borges [goncalo.bor...@sydney.edu.au] Sent: 30 August 2016 18:53 To: Brad Hubbard Cc: ceph-us...@ceph.com Subject: Re: [ceph-use

Re: [ceph-users] cephfs metadata pool: deep-scrub error "omap_digest != best guess omap_digest"

2016-08-30 Thread Goncalo Borges
Hi Brad... Thanks for the feedback. I think we are making some progress. I have opened the following tracker issue: http://tracker.ceph.com/issues/17177 . There I give pointers for all the logs, namely the result of the pg query and all osd logs after increasing the log levels (debug_ms=1, de
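For reference, raising the OSD debug levels at runtime is usually done with injectargs; a minimal sketch, where the osd and pg ids are placeholders rather than the ones from the tracker issue:

  ceph tell osd.78 injectargs '--debug_ms 1 --debug_osd 20'     # raise verbosity before reproducing
  ceph pg 5.3d0 query > pg-5.3d0-query.json                     # capture the pg query output
  ceph tell osd.78 injectargs '--debug_ms 0/5 --debug_osd 0/5'  # restore the defaults afterwards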

Re: [ceph-users] cephfs metadata pool: deep-scrub error "omap_digest != best guess omap_digest"

2016-08-30 Thread Goncalo Borges
Here it goes: # xfs_info /var/lib/ceph/osd/ceph-78 meta-data=/dev/sdu1 isize=2048 agcount=4, agsize=183107519 blks = sectsz=512 attr=2, projid32bit=1 = crc=0 finobt=0 data = bsize=4096

Re: [ceph-users] how to debug pg inconsistent state - no ioerrors seen

2016-08-31 Thread Goncalo Borges
Hi Kenneth, All Just an update for completeness on this topic. We have been hit again by this issue. I have been discussing it with Brad (RH staff) in another ML thread, and I have opened a tracker issue: http://tracker.ceph.com/issues/17177 I believe this is a bug since there are other peop

Re: [ceph-users] How to abandon PGs that are stuck in "incomplete"?

2016-09-04 Thread Goncalo Borges
Hi Dan. It might be worthwhile to read: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg17820.html. Have you seen that one? From Sam Just: "For each of those pgs, you'll need to identify the pg copy you want to be the winner and either 1) Remove all of the other ones using ceph-
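For context, the procedure Sam describes is normally driven with ceph-objectstore-tool against a stopped OSD; a hedged sketch, where the osd id, paths and pg id are placeholders, not taken from the thread:

  systemctl stop ceph-osd@12                                    # the pg copy to discard lives on osd.12
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --journal-path /var/lib/ceph/osd/ceph-12/journal \
      --pgid 1.2f3 --op remove                                  # drop the unwanted copy
  systemctl start ceph-osd@12

This is destructive, so consider exporting the pg first (--op export) if there is any doubt about which copy should win.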

Re: [ceph-users] Upgrade steps from Infernalis to Jewel

2016-09-06 Thread Goncalo Borges
Hi Simon. Simple answer is that you can upgrade directly to 10.2.2. We did it from 9.2.0. In cases where you have to pass by an intermediate release, the release notes should be clear about it. Cheers Goncalo From: ceph-users [ceph-users-boun...@lists

Re: [ceph-users] PGs lost from cephfs data pool, how to determine which files to restore from backup?

2016-09-07 Thread Goncalo Borges
Hi Greg... I've had to force recreate some PGs on my cephfs data pool due to some cascading disk failures in my homelab cluster. Is there a way to easily determine which files I need to restore from backup? My metadata pool is completely intact. Assuming you're on Jewel, run a recursive "scru
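The recursive scrub Greg refers to goes through the MDS admin socket; a sketch assuming a Jewel active MDS, with the daemon name as a placeholder and the exact scrub flags possibly differing between releases:

  ceph daemon mds.<name> scrub_path / recursive force   # walk the tree and flag files whose objects are gone

Files backed by the lost PGs should then be reported by the scrub, which gives a candidate list to restore from backup.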

Re: [ceph-users] Scrub and deep-scrub repeating over and over

2016-09-08 Thread Goncalo Borges
Can you please share the result of 'ceph pg 11.34a query'? On 09/08/2016 05:03 PM, Arvydas Opulskis wrote: 2016-09-08 08:45:01.441945 osd.24 [INF] 11.34a scrub starts 2016-09-08 08:45:03.585039 osd.24 [INF] 11.34a scrub ok -- Goncalo Borges Research Computing ARC Centre of Excellence for

[ceph-users] Recover pgs from cephfs metadata pool (sharing experience)

2016-09-12 Thread Goncalo Borges
: 0, "omap_digest": "0xaa3fd281", "data_digest": "0x" }, { "osd": 78, "missing": false, "read_error": false,

Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread Goncalo Borges
Hi Daznis... Something is not quite right. You have pools with 2 replicas (right?). The fact that you have 18 down pgs says that both of the OSDs acting on those pgs have problems. You should try to understand which PGs are down and which OSDs are acting on them ('ceph pg dump_stuck' or 'ceph
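A quick way to list the down PGs and the OSDs acting on them, roughly along the lines suggested above (the pg id is a placeholder):

  ceph health detail | grep down        # which pgs are down and why
  ceph pg dump_stuck inactive           # stuck pgs with their up/acting osd sets
  ceph pg <pgid> query                  # full detail for one pg, including the acting osds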

Re: [ceph-users] cephfs/ceph-fuse: mds0: Client XXX:XXX failing to respond to capability release

2016-09-14 Thread Goncalo Borges
Hi Dennis Have you checked http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007207.html ? The issue there was some near full osd blocking IO. Cheers G. From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Dennis Kramer (DBS

Re: [ceph-users] Increase PG number

2016-09-18 Thread Goncalo Borges
Hi I am assuming that you do not have any near full osd (either before or along the pg splitting process) and that your cluster is healthy. To minimize the impact on the clients during recovery or operations like pg splitting, it is good to set the following configs. Obviously the whole operat
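The configs usually meant here are the recovery and backfill throttles; an illustrative example of setting them on the fly (the values are assumptions, not the ones from the original message):

  ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'
  # then raise pg_num/pgp_num in small steps, letting the cluster reach active+clean between steps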

Re: [ceph-users] cephfs-client Segmentation fault with not-root mount point

2016-09-18 Thread Goncalo Borges
Hi... I think you are seeing an issue we saw some time ago. Your segfault seems the same as the one we had, but please confirm against the info in https://github.com/ceph/ceph/pull/10027 We solved it by recompiling ceph with the patch described above. I think it should be solved in the next bug release ve

Re: [ceph-users] Mount Cephfs subtree

2016-09-27 Thread Goncalo Borges
ectory. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Goncalo Borges Research Computing ARC Centre of Excellence for Particle Physics at the Terascale School of Physics A28 | University of Sydne

Re: [ceph-users] cephfs/ceph-fuse: mds0: Client XXX:XXX failingtorespond to capability release

2016-10-03 Thread Goncalo Borges
_ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Goncalo Borges Research Computing ARC Centre of Excellence for Particle Physics at the Terascale School of Physics A28 | University of Sydney, NSW 20

Re: [ceph-users] New OSD Nodes, pgs haven't changed state

2016-10-10 Thread Goncalo Borges
Hi Mike... I was hoping that someone with a bit more experience would answer you since I never had a similar situation. So, I'll try to step in and help. The peering process means that the OSDs are agreeing on the state of objects in the PGs they share. The peering process can take some time and

Re: [ceph-users] Feedback wanted: health warning when standby MDS dies?

2016-10-18 Thread Goncalo Borges
Hi John. That would be good. In our case we are picking that up through nagios and some fancy scripts parsing the dump of the MDS maps. Cheers Goncalo From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of John Spray [jsp...@red

Re: [ceph-users] Surviving a ceph cluster outage: the hard way

2016-10-19 Thread Goncalo Borges
Hi Kostis... That is a tale from the dark side. Glad you recovered it and that you were willing to document it all and share it. Thank you for that. Can I also ask which tool you used to recover the leveldb? Cheers Goncalo From: ceph-users [ceph-users-boun.

Re: [ceph-users] ceph df show 8E pool

2016-10-27 Thread Goncalo Borges
Hi Dan... Have you tried 'rados df' to see if it agrees with 'ceph df' ? Cheers G. From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Dan van der Ster [d...@vanderster.com] Sent: 28 October 2016 03:01 To: ceph-users Subject: [ceph-users] cep

Re: [ceph-users] Locating CephFS clients in warn message

2016-11-10 Thread Goncalo Borges
Hi "ceph daemon mds. session ls", executed in your mds server, should give you hostname and client id of all your cephfs clients. "ceph daemon mds. dump_ops_in_flight" should give you operations not completed or pending to complete for certain clients ids. In case of problems, that those probl

Re: [ceph-users] Intermittent permission denied using kernel client with mds path cap

2016-11-10 Thread Goncalo Borges
Hi Dan... I know there are path restriction issues in the kernel client. See the discussion here: http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2016-June/010656.html http://tracker.ceph.com/issues/16358 Cheers Goncalo From: ceph-users [ceph

Re: [ceph-users] Locating CephFS clients in warn message

2016-11-10 Thread Goncalo Borges
Doesn't the mds log tell you which client ids are having problems? Does your mds have enough RAM so that you can increase the default value (100000 inodes) of the mds cache size? Cheers G. From: Yutian Li [l...@megvii.com] Sent: 11 November 2016 14:03 To: Goncalo B
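Raising the cache on a running MDS can be done through the admin socket, or persistently in ceph.conf; a hedged sketch where the new value is only illustrative:

  ceph daemon mds.<name> config set mds_cache_size 2000000   # runtime only, lost on restart
  # persistent equivalent in ceph.conf, [mds] section:
  #   mds cache size = 2000000

Each cached inode costs memory (very roughly a few KB), so size the value against the RAM actually available on the MDS host.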

Re: [ceph-users] stuck unclean since forever

2016-11-12 Thread Goncalo Borges
Hi Joel. The pgs of a given pool start with the id of the pool. So, the 19.xx ids mean that those pgs are from pool 19. I think that a 'ceph osd dump' should give you a summary of all pools and their ids at the very beginning of the output. My guess is that this will confirm that your volume or im
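To map the 19.xx pgs back to a pool name:

  ceph osd dump | grep '^pool'   # pool id, name and settings are listed at the top of the dump
  ceph osd lspools               # shorter listing of id:name pairs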

[ceph-users] Standby-replay mds: 10.2.2

2016-11-13 Thread Goncalo Borges
Hi Greg, John, Zheng, CephFSers Maybe a simple question but I think it is better to ask first than to complain after. We are currently undergoing an infrastructure migration. One of the first machines to go through this migration process is our standby-replay mds. We are running 10.2.2. My pla

Re: [ceph-users] Standby-replay mds: 10.2.2

2016-11-14 Thread Goncalo Borges
o ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Goncalo Borges Research Computing ARC Centre of Excellence for Particle Physics at the Terascale School of Phy

Re: [ceph-users] how possible is that ceph cluster crash

2016-11-16 Thread Goncalo Borges
Hello Pedro... These are extremely generic questions, and therefore, hard to answer. Nick did a good job in defining the risks. In our case, we have been running a Ceph/CephFS system in production for over a year, and before that, we tried to understand Ceph for a year also. Ceph is incredibly go

Re: [ceph-users] Ceph Down on Cluster

2016-11-18 Thread Goncalo Borges
Hello Bruno I am not understanding your outputs. On the first 'ceph -s' it says one mon is down but your 'ceph health detail' does not report it further. On your crush map I count 7 osds (0,1,2,3,4,6,7) but ceph -s says only 6 are active. Can you send the output of 'ceph osd tree', 'ceph osd df'

[ceph-users] ceph-fuse clients taking too long to update dir sizes

2016-12-04 Thread Goncalo Borges
; } ], "mdsmap_epoch": 5224 } ---> Running following command in the mds: { "id": 616338, "num_leases": 0, "num_caps": 16078, "state": "open",

[ceph-users] cephfs quotas reporting

2016-12-04 Thread Goncalo Borges
153T of used space with respect to 306T in total (case 1) 51T of used space with respect to 81TB in total (case 2) Am i doing something wrong here? Cheers Goncalo -- Goncalo Borges Research Computing ARC Centre of Excellence for Particle Physics at the Terascale School of Phys

Re: [ceph-users] cephfs quotas reporting

2016-12-05 Thread Goncalo Borges
Hi Greg, John... To John: Nothing is done in the background between two consecutive df commands. I have opened the following tracker issue: http://tracker.ceph.com/issues/18151 (sorry, all the issue headers are empty apart from the title. I've hit enter before actually filling all the appropr

Re: [ceph-users] ceph-fuse clients taking too long to update dir sizes

2016-12-05 Thread Goncalo Borges
Hi John... >> We are running ceph/cephfs in 10.2.2. All infrastructure is in the same >> version (rados cluster, mons, mds and cephfs clients). We mount cephfs using >> ceph-fuse. >> >> Last week I triggered some of my heavy users to delete data. In the >> following example, the user in question

[ceph-users] segfault in ceph-fuse when quota is enabled

2016-12-05 Thread Goncalo Borges
Hi John, Greg, Zheng And now a much more relevant problem. Once again, my environment: - ceph/cephfs in 10.2.2 but patched for o client: add missing client_lock for get_root (https://github.com/ceph/ceph/pull/10027) o Jewel: segfault in ObjectCacher::FlusherThread (http://tracker.ceph.com/

Re: [ceph-users] segfault in ceph-fuse when quota is enabled

2016-12-06 Thread Goncalo Borges
Thanks Dan for your critical eye. Somehow I did not notice that there was already a tracker for it. Cheers G. From: Dan van der Ster [d...@vanderster.com] Sent: 06 December 2016 19:30 To: Goncalo Borges Cc: ceph-us...@ceph.com Subject: Re: [ceph-users

Re: [ceph-users] Parallel reads with CephFS

2016-12-07 Thread Goncalo Borges
s.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Goncalo Borges Research Computing ARC Centre of Excellence for Particle Physics at the Terascale School of Physics A28 | University of Sydney, NSW 2006 T: +61 2 93511937 ___ ceph-use

Re: [ceph-users] CephFS FAILED assert(dn->get_linkage()->is_null())

2016-12-08 Thread Goncalo Borges
Hi John. I have been hitting that issue also, although I have not seen any asserts in my mds yet. Could you please clarify a bit further your proposal about manually removing omap info from strays? Should it be applied: - to the problematic replicas of the stray object which triggered the inconsi

Re: [ceph-users] CephFS FAILED assert(dn->get_linkage()->is_null())

2016-12-09 Thread Goncalo Borges
Hi Sean, Rob. I saw on the tracker that you were able to resolve the mds assert by manually cleaning the corrupted metadata. Since I am also hitting that issue and I suspect that i will face an mds assert of the same type sooner or later, can you please explain a bit further what operations did

[ceph-users] Revisiting: Many clients (X) failing to respond to cache pressure

2016-12-12 Thread Goncalo Borges
: 0, "dir_split": 0, "inode_max": 200, "inodes": 258, "inodes_top": 0, "inodes_bottom": 1993207, "inodes_pin_tail": 6851, "inodes_pinned": 12413

Re: [ceph-users] cephfs quotas reporting

2016-12-13 Thread Goncalo Borges
Borges Cc: John Spray; ceph-us...@ceph.com Subject: Re: [ceph-users] cephfs quotas reporting On Mon, Dec 5, 2016 at 5:24 PM, Goncalo Borges wrote: > Hi Greg, Jonh... > > To Jonh: Nothing is done in tge background between two consecutive df > commands, > > I have opened the follo

Re: [ceph-users] Revisiting: Many clients (X) failing to respond to cache pressure

2016-12-13 Thread Goncalo Borges
"forward": 0, "dir_fetch": 0, "dir_commit": 0, "dir_split": 0, "inode_max": 200, "inodes": 2000058, "inodes_top": 0, "inodes_bottom": 1993207,

Re: [ceph-users] Revisiting: Many clients (X) failing to respond to cache pressure

2016-12-15 Thread Goncalo Borges
s problematic by MDS although inodes < inodes_max. Looking at the number of inodes of that machine, I get "inode_count": 13862. So, it seems that the client is still tagged as problematic although it has an inode_count below 16384 and inodes < inodes_max. Maybe a consequence of

Re: [ceph-users] cephfs quota

2016-12-16 Thread Goncalo Borges
Hi all Even when using ceph-fuse, quotas are only enabled once you mount with the --client-quota option. Cheers Goncalo From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of gjprabu [gjpr...@zohocorp.com] Sent: 16 December 2016 18:18 To: gjprab
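A hedged end-to-end example, with the path, size and monitor address as placeholders: the quota itself is an extended attribute on the directory, and the client must be mounted with quota support for it to be honoured:

  setfattr -n ceph.quota.max_bytes -v 1099511627776 /mnt/cephfs/project   # 1 TiB quota on the directory
  ceph-fuse --client-quota -m mon1:6789 /mnt/cephfs                       # mount with quota enforcement enabled

Clients mounted without the option (or kernel clients of that era) simply ignore the attribute.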

Re: [ceph-users] CephFS metdata inconsistent PG Repair Problem

2016-12-19 Thread Goncalo Borges
Hi Sean In our case, the last time we had this error, we stopped the osd, marked it out, let ceph recover and then reinstalled it. We did it because we were suspecting issues with the osd and that was why we decided to take this approach. The fact is that the pg we were seeing constantly declared
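The sequence described is roughly the following (the osd id is a placeholder):

  systemctl stop ceph-osd@78        # stop the suspect osd
  ceph osd out 78                   # mark it out so recovery rebuilds the replicas elsewhere
  # wait for HEALTH_OK / all pgs active+clean, then retire the osd before reinstalling
  ceph osd crush remove osd.78
  ceph auth del osd.78
  ceph osd rm 78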

Re: [ceph-users] installation docs

2016-12-30 Thread Goncalo Borges
Hi Manuel I am Goncalo Borges (Portuguese) and I work at the University of Sydney. We have been using ceph and cephfs for almost two years. If you think it worthwhile, we can just talk and discuss our experiences. There is a good ceph community in Melbourne but you are actually the first one in

Re: [ceph-users] CRUSH puzzle: step weighted-take

2018-09-27 Thread Goncalo Borges
Hi Dan Hope to find you well. Here goes a suggestion from someone who has been sitting on the sidelines for the last 2 years but following things as much as possible. Will a weight set per pool help? This is only possible in Luminous but according to the docs there is the possibility to adjust positi

Re: [ceph-users] Replacing an mds server

2017-01-24 Thread Goncalo Borges
Hi Jorge Indeed my advice is to configure your high memory mds as a standby mds. Once you restart the service in the low memory mds, the standby one should take over without downtime and the first one becomes the standby one. Cheers Goncalo From: ceph-user

[ceph-users] ceph-fuse and subtree cephfs mount question

2015-12-14 Thread Goncalo Borges
ully understanding how to properly do it? TIA Goncalo -- Goncalo Borges Research Computing ARC Centre of Excellence for Particle Physics at the Terascale School of Physics A28 | University of Sydney, NSW 2006 T: +61 2 93511937 ___ ceph-users mailing list

Re: [ceph-users] ceph-fuse and subtree cephfs mount question

2015-12-14 Thread Goncalo Borges
I think I've understood how to run it... ceph-fuse -m MON_IP:6789 -r /syd /coepp/cephfs/syd does what I want Cheers Goncalo On 12/15/2015 12:04 PM, Goncalo Borges wrote: Dear CephFS experts Before it was possible to mount a subtree of a filesystem using ceph-fuse and -r option.

[ceph-users] ACLs question in cephfs

2015-12-15 Thread Goncalo Borges
Dear Cephfs gurus. I have two questions regarding ACL support on cephfs. 1) Last time we tried ACLs we saw that they were only working properly in the kernel module and I wonder what is the present status of acl support on ceph-fuse. Can you clarify on that? 2) If ceph-fuse is still not proper
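For the kernel client, ACLs are enabled at mount time; a minimal sketch, with the monitor address, credentials and paths as placeholders:

  mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,acl
  setfacl -m u:alice:rwx /mnt/cephfs/shared    # ACLs then behave as on a local filesystem
  getfacl /mnt/cephfs/shared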

Re: [ceph-users] CentOS 7.2, Infernalis, preparing osd's and partprobe issues.

2016-01-06 Thread Goncalo Borges
osd use. I am not 100% sure whether this is a problem with Ceph v9.2.0 or to do with the recent update to CentOS 7.2. Has anyone else encountered a similar problem? Also, should I be posting this on the ceph-devel mailing list, or is here OK? Thanks! Regards, Matthew Taylor.

Re: [ceph-users] CentOS 7.2, Infernalis, preparing osd's and partprobe issues.

2016-01-12 Thread Goncalo Borges
15 To: Goncalo Borges; Loic Dachary Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] CentOS 7.2, Infernalis, preparing osd's and partprobe issues. I commented out partprobe and everything seems to work just fine. *If someone has experience with why this is very bad please advise. Make sur

[ceph-users] CephFS - Trying to understand direct OSD connection to ceph-fuse cephfs clients

2016-01-31 Thread Goncalo Borges
Dear CephFS experts... We are using Ceph and CephFS 9.2.0. CephFS clients are being mounted via ceph-fuse. We recently noticed the firewall from certain CephFS clients dropping connections with OSDs as SRC. This is something which is not systematic but we noticed happening at least once. Here

Re: [ceph-users] CephFS - Trying to understand direct OSD connection to ceph-fuse cephfs clients

2016-02-01 Thread Goncalo Borges
Hi Greg. We are using Ceph and CephFS 9.2.0. CephFS clients are being mounted via ceph-fuse. We recently noticed the firewall from certain CephFS clients dropping connections with OSDs as SRC. This is something which is not systematic but we noticed happening at least once. Here is an example

[ceph-users] CEPHFS: standby-replay mds crash

2016-02-01 Thread Goncalo Borges
Hi CephFS experts. 1./ We are using Ceph and CephFS 9.2.0 with an active mds and a standby-replay mds (standard config) # ceph -s cluster health HEALTH_OK monmap e1: 3 mons at {mon1=:6789/0,mon2=:6789/0,mon3=:6789/0} election epoch 98, quorum 0,1,2 mon1,mon3,mon2

Re: [ceph-users] CEPHFS: standby-replay mds crash

2016-02-01 Thread Goncalo Borges
Hi... Seems very similar to http://tracker.ceph.com/issues/14144 Can you confirm it is the same issue? Cheers G. From: Goncalo Borges Sent: 02 February 2016 15:30 To: ceph-us...@ceph.com Cc: rct...@coepp.org.au Subject: CEPHFS: standby-replay mds crash Hi

Re: [ceph-users] Urgent help needed for ceph storage "mount error 5 = Input/output error"

2016-02-02 Thread Goncalo Borges
Hi X Have you tried to inspect the mds for problematic sessions still connected from those clients? To check which sessions are still connected to the mds, do (in ceph 9.2.0; the command might be different or might not even exist in older versions) ceph daemon mds.<id> session ls Cheers G.

Re: [ceph-users] Urgent help needed for ceph storage "mount error 5 = Input/output error"

2016-02-02 Thread Goncalo Borges
: Zhao Xu [xuzh....@gmail.com] Sent: 03 February 2016 11:31 To: Goncalo Borges Cc: Mykola Dvornik; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Urgent help needed for ceph storage "mount error 5 = Input/output error" I see a lot sessions. How can I clear these session? Since I'
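Clearing a stuck session is usually done by evicting it from the MDS; a hedged sketch (daemon name and client id are placeholders, and eviction discards any dirty state that client still holds):

  ceph daemon mds.<name> session ls            # note the "id" of the stuck client
  ceph daemon mds.<name> session evict 4125    # evict that client id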

[ceph-users] Optimations of cephfs clients on WAN: Looking for suggestions.

2016-03-21 Thread Goncalo Borges
Dear CephFS gurus... I would like your advise on how to improve performance without compromising reliability for CephFS clients deployed under a WAN. Currently, our infrastructure relies on: - ceph infernalis - a ceph object cluster, with all core infrastructure components sitting in the same d

Re: [ceph-users] Need help for PG problem

2016-03-22 Thread Goncalo Borges
Hi Zhang... If I can add some more info, changing the number of PGs is a heavy operation, and as far as I know, you should NEVER decrease PGs. From the notes in pgcalc (http://ceph.com/pgcalc/): "It's also important to know that the PG count can be increased, but NEVER decreased without destroying / re
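Increasing (never decreasing) the PG count is a two-step change; a small sketch with an illustrative pool name and target:

  ceph osd pool set rbd pg_num 512    # create the new pgs (splitting)
  ceph osd pool set rbd pgp_num 512   # then let them be placed/rebalanced

Raising the counts in modest increments keeps the data movement per step manageable.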

Re: [ceph-users] Need help for PG problem

2016-03-23 Thread Goncalo Borges
From: Zhang Qiang [dotslash...@gmail.com] Sent: 23 March 2016 23:17 To: Goncalo Borges Cc: Oliver Dzombic; ceph-users Subject: Re: [ceph-users] Need help for PG problem And here's the osd tree if it matters. ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINI

Re: [ceph-users] ceph pg query hangs for ever

2016-03-31 Thread Goncalo Borges
ning Hammer 0.94.5 in this case. From what I know a OSD had a failing disk and was restarted a couple of times while the disk gave errors. This caused the PG to become incomplete. I've set debug osd to 20, but I can't really tell what is going wrong on osd.68 which causes it to stall this

[ceph-users] CephFS: Issues handling thousands of files under the same dir (?)

2016-04-17 Thread Goncalo Borges
heers Goncalo -- Goncalo Borges Research Computing ARC Centre of Excellence for Particle Physics at the Terascale School of Physics A28 | University of Sydney, NSW 2006 T: +61 2 93511937 ___ ceph-users mailing list ceph-users@lists.ceph.com

[ceph-users] unsubscribe

2019-09-13 Thread Goncalo Borges
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
