Re: [ceph-users] does CephFS still have no fsck utility?

2014-09-15 Thread Gregory Farnum
On Mon, Sep 15, 2014 at 3:23 PM, brandon li wrote: > If it's true, is there any other tools I can use to check and repair the > file system? Not much, no. That said, you shouldn't really need an fsck unless the underlying RADOS store went through some catastrophic event. Is there anything in part

Re: [ceph-users] does CephFS still have no fsck utility?

2014-09-15 Thread Gregory Farnum
ay never happen and I just use it here to explain my > concern. > > Thanks, > Brandon > > > On Mon, Sep 15, 2014 at 3:49 PM, Gregory Farnum wrote: >> >> On Mon, Sep 15, 2014 at 3:23 PM, brandon li >> wrote: >> > If it's true, is there any other to

Re: [ceph-users] OSD troubles on FS+Tiering

2014-09-16 Thread Gregory Farnum
re Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Sep 16, 2014 at 5:28 AM, Kenneth Waegeman wrote: > > - Message from Gregory Farnum - >Date: Mon, 15 Sep 2014 10:37:07 -0700 >From: Gregory Farnum > Subject: Re: [ceph-users] OSD troubles on

Re: [ceph-users] does CephFS still have no fsck utility?

2014-09-16 Thread Gregory Farnum
http://tracker.ceph.com/issues/4137 contains links to all the tasks we have so far. You can also search any of the ceph-devel list archives for "forward scrub". On Mon, Sep 15, 2014 at 10:16 PM, brandon li wrote: > Great to know you are working on it! > > I am new to the mailing list. Is there a

Re: [ceph-users] what are these files for mon?

2014-09-16 Thread Gregory Farnum
Hi Greg, > > just picked up this one from the archive while researching a different > issue and thought I'd follow up. > > On Tue, Aug 19, 2014 at 6:24 PM, Gregory Farnum > wrote: > > The sst files are files used by leveldb to store its data; you cannot > > remove the

Re: [ceph-users] Mount ceph block device over specific NIC

2014-09-16 Thread Gregory Farnum
Assuming you're using the kernel? In any case, Ceph generally doesn't do anything to select between different NICs; it just asks for a connection to a given IP. So you should just be able to set up a route for that IP. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Se
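
For instance, a minimal sketch of such a route (interface name and subnet are assumptions for illustration):

    # send traffic bound for the monitor/OSD subnet out a specific NIC
    ip route add 10.0.42.0/24 dev eth1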

Re: [ceph-users] Still seeing scrub errors in .80.5

2014-09-16 Thread Gregory Farnum
On Tue, Sep 16, 2014 at 12:03 AM, Marc wrote: > Hello fellow cephalopods, > > every deep scrub seems to dig up inconsistencies (i.e. scrub errors) > that we could use some help with diagnosing. > > I understand there used to be a data corruption issue before .80.3 so we > made sure that all the no

Re: [ceph-users] Still seeing scrub errors in .80.5

2014-09-16 Thread Gregory Farnum
efault in 0.80.4... See the thread > "firefly scrub error". > Cheers, > Dan > > > > From: Gregory Farnum > Sent: Sep 16, 2014 8:15 PM > To: Marc > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Still seeing scrub errors in .80.5 > > On Tue, S

Re: [ceph-users] Packages for 0.85?

2014-09-16 Thread Gregory Farnum
Thanks for the poke; looks like something went wrong during the release build last week. We're investigating now. -Greg On Tue, Sep 16, 2014 at 11:08 AM, Daniel Swarbrick wrote: > Hi, > > I saw that the development snapshot 0.85 was released last week, and > have been patiently waiting for packag

Re: [ceph-users] Replication factor of 50 on a 1000 OSD node cluster

2014-09-16 Thread Gregory Farnum
On Tue, Sep 16, 2014 at 5:10 PM, JIten Shah wrote: > Hi Guys, > > We have a cluster with 1000 OSD nodes and 5 MON nodes and 1 MDS node. In > order to be able to lose quite a few OSDs and still survive the load, we > were thinking of making the replication factor to 50. > > Is that too big of a

Re: [ceph-users] Replication factor of 50 on a 1000 OSD node cluster

2014-09-16 Thread Gregory Farnum
> —Jiten > > On Sep 16, 2014, at 5:35 PM, Gregory Farnum wrote: > >> On Tue, Sep 16, 2014 at 5:10 PM, JIten Shah wrote: >>> Hi Guys, >>> >>> We have a cluster with 1000 OSD nodes and 5 MON nodes and 1 MDS node. In >>> order to be able to lose

Re: [ceph-users] [Ceph-community] Can't Start-up MDS

2014-09-17 Thread Gregory Farnum
That looks like the beginning of an mds creation to me. What's your problem in more detail, and what's the output of "ceph -s"? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Sep 15, 2014 at 5:34 PM, Shun-Fa Yang wrote: > Hi all, > > I installed ceph v0.80.5 on Ubu

Re: [ceph-users] ceph mds unable to start with 0.85

2014-09-18 Thread Gregory Farnum
On Wed, Sep 17, 2014 at 9:59 PM, 廖建锋 wrote: > dear, > my ceph cluster worked for about two weeks, mds crashed every 2-3 > days, > Now it stuck on replay , looks like replay crash and restart mds process > again > what can i do for this? > > 1015 => # ceph -s > cluster 07df7765-c2e7-44de-9b

Re: [ceph-users] Still seeing scrub errors in .80.5

2014-09-18 Thread Gregory Farnum
On Thu, Sep 18, 2014 at 3:09 AM, Marc wrote: > Hi, > > we did run a deep scrub on everything yesterday, and a repair > afterwards. Then a new deep scrub today, which brought new scrub errors. > > I did check the osd config, they report "filestore_xfs_extsize": "false", > as it should be if I under

Re: [ceph-users] CephFS : rm file does not remove object in rados

2014-09-18 Thread Gregory Farnum
On Thu, Sep 18, 2014 at 10:39 AM, Florent B wrote: > On 09/12/2014 07:38 PM, Gregory Farnum wrote: >> On Fri, Sep 12, 2014 at 6:49 AM, Florent Bautista >> wrote: >>> Hi all, >>> >>> Today I have a problem using CephFS. I use firefly last release, with &

Re: [ceph-users] [Ceph-community] Can't Start-up MDS

2014-09-18 Thread Gregory Farnum
untu165:~# ceph -v > ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6) > > > thanks. > > 2014-09-18 1:22 GMT+08:00 Gregory Farnum : >> >> That looks like the beginning of an mds creation to me. What's your >> problem in more detail, and what

Re: [ceph-users] ceph mds unable to start with 0.85

2014-09-18 Thread Gregory Farnum
reg > > would you like to log into the server to check? > > > From: Gregory Farnum > Date: 2014-09-19 02:33 > To: 廖建锋 > CC: ceph-users > Subject: Re: [ceph-users] ceph mds unable to start with 0.85 > > On Wed, Sep 17, 2014 at 9:59 PM, 廖建锋 wrote: >> dear, >

Re: [ceph-users] Renaming pools used by CephFS

2014-09-19 Thread Gregory Farnum
On Fri, Sep 19, 2014 at 10:21 AM, Jeffrey Ollie wrote: > I've got a Ceph system (running 0.80.5) at home that I've been messing > around with, partly to learn Ceph, but also as reliable storage for all of > my media. During the process I deleted the data and metadata pools used by > CephFS and re

Re: [ceph-users] Reassigning admin server

2014-09-23 Thread Gregory Farnum
On Mon, Sep 22, 2014 at 1:22 PM, LaBarre, James (CTR) A6IT wrote: > If I have a machine/VM I am using as an Admin node for a ceph cluster, can I > relocate that admin to another machine/VM after I’ve built a cluster? I > would expect as the Admin isn’t an actual operating part of the cluste

Re: [ceph-users] Repetitive builds for Ceph

2015-02-02 Thread Gregory Farnum
Are you actually using CMake? It's an alternative and incomplete build system right now; the autotools build chain is the canonical one. (I don't think it should be causing your problem, but...who knows?) -Greg On Mon, Feb 2, 2015 at 4:21 AM Ritesh Raj Sarraf wrote: > Thanks Loic. I guess I need

Re: [ceph-users] Update 0.80.7 to 0.80.8 -- Restart Order

2015-02-02 Thread Gregory Farnum
The packages might trigger restarts; the behavior has fluctuated a bit and I don't know where it is right now. That said, for a point release it shouldn't matter what order stuff gets restarted in. I wouldn't worry about it. :) -Greg On Mon, Feb 2, 2015 at 6:47 AM Daniel Schneller < daniel.schnel.

Re: [ceph-users] Update 0.80.7 to 0.80.8 -- Restart Order

2015-02-02 Thread Gregory Farnum
Oh, yeah, that'll hurt on a small cluster more than a large one. I'm not sure how much it matters, sorry. On Mon, Feb 2, 2015 at 8:18 AM Daniel Schneller < daniel.schnel...@centerdevice.com> wrote: > On 2015-02-02 16:09:27 +0000, Gregory Farnum said: > > > That

Re: [ceph-users] features of the next stable release

2015-02-02 Thread Gregory Farnum
On Mon, Feb 2, 2015 at 5:27 AM, Andrei Mikhailovsky wrote: > Hi cephers, > > I've got three questions: > > 1. Does anyone have an estimation on the release dates of the next stable > ceph branch? We should be branching Hammer from master today, and it's feature-frozen at this point. I think we're

Re: [ceph-users] features of the next stable release

2015-02-02 Thread Gregory Farnum
On Mon, Feb 2, 2015 at 11:28 AM, Andrei Mikhailovsky wrote: > > > > > I'm not sure what you mean about improvements for SSD disks, but the > OSD should be generally a bit faster. There are several cache tier > improvements included that should improve performance o

Re: [ceph-users] features of the next stable release

2015-02-02 Thread Gregory Farnum
It's not merely unstable, it's not actually complete. The XIOMessenger is merged so that things don't get too far out of sync, but it should not be used by anybody except developers who are working on it. :) -Greg On Mon, Feb 2, 2015 at 7:43 PM Nicheal wrote: > 2015-02-03 0:48

Re: [ceph-users] Monitor Restart triggers half of our OSDs marked down

2015-02-03 Thread Gregory Farnum
On Tue, Feb 3, 2015 at 3:38 AM, Christian Eichelmann wrote: > Hi all, > > during some failover tests and some configuration tests, we currently > discover a strange phenomenon: > > Restarting one of our monitors (5 in sum) triggers about 300 of the > following events: > > osd.669 10.76.28.58:6935/

Re: [ceph-users] cephfs-fuse: set/getfattr, change pools

2015-02-03 Thread Gregory Farnum
On Tue, Feb 3, 2015 at 5:21 AM, Daniel Schneller wrote: > Hi! > > We have a CephFS directory /baremetal mounted as /cephfs via FUSE on our > clients. > There are no specific settings configured for /baremetal. > As a result, trying to get the directory layout via getfattr does not work > > getfatt
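
For reference, the virtual xattr query under discussion looks like the sketch below (the mount path is taken from the thread; on a directory that never had an explicit layout set, this returned "No such attribute" in this era, which matches the behavior described):

    getfattr -n ceph.dir.layout /cephfs/baremetal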

Re: [ceph-users] cephfs-fuse: set/getfattr, change pools

2015-02-03 Thread Gregory Farnum
On Tue, Feb 3, 2015 at 9:23 AM, Daniel Schneller wrote: >>> We have a CephFS directory /baremetal mounted as /cephfs via FUSE on our >>> clients. >>> There are no specific settings configured for /baremetal. >>> As a result, trying to get the directory layout via getfattr does not >>> work >>> >>>

Re: [ceph-users] client unable to access files after caching pool addition

2015-02-03 Thread Gregory Farnum
On Tue, Feb 3, 2015 at 10:23 AM, J-P Methot wrote: > Hi, > > I tried to add a caching pool in front of openstack vms and volumes pools. I > believed that the process was transparent, but as soon as I set the caching > for both of these pools, the VMs could not find their volumes anymore. > Obvious

Re: [ceph-users] cephfs-fuse: set/getfattr, change pools

2015-02-03 Thread Gregory Farnum
On Tue, Feb 3, 2015 at 1:17 PM, John Spray wrote: > On Tue, Feb 3, 2015 at 2:21 PM, Daniel Schneller > wrote: >> Now, say I wanted to put /baremetal into a different pool, how would I go >> about this? >> >> Can I setfattr on the /cephfs mountpoint and assign it a different pool with >> e. g. dif

Re: [ceph-users] cephfs-fuse: set/getfattr, change pools

2015-02-03 Thread Gregory Farnum
On Tue, Feb 3, 2015 at 1:30 PM, John Spray wrote: > On Tue, Feb 3, 2015 at 10:23 PM, Gregory Farnum wrote: >>> If you explicitly change the layout of a file containing data to point >>> to a different pool, then you will see zeros when you try to read it >>> b

Re: [ceph-users] PG to pool mapping?

2015-02-04 Thread Gregory Farnum
On Wed, Feb 4, 2015 at 1:20 PM, Chad William Seys wrote: > Hi all, >How do I determine which pool a PG belongs to? >(Also, is it the case that all objects in a PG belong to one pool?) PGs are of the form "1.a2b3c4". The part prior to the period is the pool ID; the part following distingui
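
A quick way to see that mapping in practice (output format as of the firefly/giant era):

    # list pools with their numeric IDs -- the same IDs used as the PG name prefix
    ceph osd lspools
    # e.g. "0 data,1 metadata,2 rbd," -- so a PG named 2.1a belongs to pool 2 (rbd)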

Re: [ceph-users] Status of SAMBA VFS

2015-02-06 Thread Gregory Farnum
On Fri, Feb 6, 2015 at 6:39 AM, Dennis Kramer (DT) wrote: > I've used the upstream module for our production cephfs cluster, but i've > noticed a bug where timestamps aren't being updated correctly. Modified > files are being reset to the beginning of Unix time. > > It looks like this bug only man

Re: [ceph-users] Status of SAMBA VFS

2015-02-06 Thread Gregory Farnum
On Fri, Feb 6, 2015 at 7:11 AM, Dennis Kramer (DT) wrote: > > On Fri, 6 Feb 2015, Gregory Farnum wrote: > >> On Fri, Feb 6, 2015 at 6:39 AM, Dennis Kramer (DT) >> wrote: >>> >>> I've used the upstream module for our production cephfs cluster, but i&

Re: [ceph-users] ceph Performance vs PG counts

2015-02-08 Thread Gregory Farnum
On Sun, Feb 8, 2015 at 6:00 PM, Sumit Gaur wrote: > Hi > I have installed 6 node ceph cluster and doing a performance bench mark for > the same using Nova VMs. What I have observed that FIO random write reports > around 250 MBps for 1M block size and PGs 4096 and 650MBps for iM block size > and PG

Re: [ceph-users] [rbd] Ceph RBD kernel client using with cephx

2015-02-09 Thread Gregory Farnum
Unmapping is an operation local to the host and doesn't communicate with the cluster at all (at least, in the kernel you're running...in very new code it might involve doing an "unwatch", which will require communication). That means there's no need for a keyring, since its purpose is to validate c

Re: [ceph-users] requests are blocked > 32 sec woes

2015-02-09 Thread Gregory Farnum
There are a lot of next steps on http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/ You probably want to look at the bits about using the admin socket, and diagnosing slow requests. :) -Greg On Sun, Feb 8, 2015 at 8:48 PM, Matthew Monaco wrote: > Hello! > > *** Shameless plug
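
As a concrete starting point, a sketch of the admin-socket query mentioned there (run on the node hosting the OSD; the daemon id is an assumption):

    ceph daemon osd.3 dump_historic_ops    # recent completed ops with per-stage timings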

Re: [ceph-users] Compilation problem

2015-02-09 Thread Gregory Farnum
On Fri, Feb 6, 2015 at 3:37 PM, David J. Arias wrote: > Hello! > > I am sysadmin for a small IT consulting enterprise in México. > > We are trying to integrate three servers running RHEL 5.9 into a new > CEPH cluster. > > I downloaded the source code and tried compiling it, though I got stuck > wi

Re: [ceph-users] kernel crash after 'ceph: mds0 caps stale' and 'mds0 hung' -- issue with timestamps or HVM virtualization on EC2?

2015-02-09 Thread Gregory Farnum
On Mon, Feb 9, 2015 at 11:58 AM, Christopher Armstrong wrote: > Hi folks, > > One of our users is seeing machine crashes almost daily. He's using Ceph > v0.87 giant, and is seeing this crash: > https://gist.githubusercontent.com/ianblenke/b74e5aa5547130ebc0fb/raw/c3eeab076310d149443fd6118113b9d94f

Re: [ceph-users] requests are blocked > 32 sec woes

2015-02-09 Thread Gregory Farnum
On Mon, Feb 9, 2015 at 7:12 PM, Matthew Monaco wrote: > On 02/09/2015 08:20 AM, Gregory Farnum wrote: >> There are a lot of next steps on >> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/ >> >> You probably want to look at the bits about

Re: [ceph-users] Cache pressure fail

2015-02-11 Thread Gregory Farnum
On Wed, Feb 11, 2015 at 5:30 AM, Dennis Kramer (DT) wrote: > After setting the debug level to 2, I can see: > 2015-02-11 13:36:31.922262 7f0b38294700 2 mds.0.cache check_memory_usage > total 58516068, rss 57508660, heap 32676, malloc 1227560 mmap 0, baseline > 39848, buffers 0, max 67108864, 8656

Re: [ceph-users] CephFS removal.

2015-02-12 Thread Gregory Farnum
What version of Ceph are you running? It's varied by a bit. But I think you want to just turn off the MDS and run the "fail" command — deactivate is actually the command for removing a logical MDS from the cluster, and you can't do that for a lone MDS because there's nobody to pass off the data to
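
A minimal sketch of that sequence, assuming a single MDS at rank 0:

    # stop the ceph-mds daemon on its host, then mark the rank failed
    ceph mds fail 0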

Re: [ceph-users] CephFS removal.

2015-02-12 Thread Gregory Farnum
--Original Message----- > From: Gregory Farnum [mailto:g...@gregs42.com] > Sent: 12 February 2015 16:25 > To: Jeffs, Warren (STFC,RAL,ISIS) > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] CephFS removal. > > What version of Ceph are you running? It's varied by a bit. &

Re: [ceph-users] CRUSHMAP for chassis balance

2015-02-13 Thread Gregory Farnum
With sufficiently new CRUSH versions (all the latest point releases on LTS?) I think you can simply have the rule return extra IDs which are dropped if they exceed the number required. So you can choose two chassis, then have those both choose two leaf OSDs, and return those 4 from the rule. -Greg O
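
A rough sketch of the kind of rule being described (rule name, ruleset number, and bucket types are assumptions; verify against your own crushmap):

    rule two_chassis_four_osds {
            ruleset 1
            type replicated
            min_size 4
            max_size 4
            step take default
            step choose firstn 2 type chassis      # pick two chassis
            step chooseleaf firstn 2 type host     # two OSDs under each chassis
            step emit
    }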

Re: [ceph-users] Random OSDs respawning continuously

2015-02-13 Thread Gregory Farnum
It's not entirely clear, but it looks like all the ops are just your caching pool OSDs trying to promote objects, and your backing pool OSD's aren't fast enough to satisfy all the IO demanded of them. You may be overloading the system. -Greg On Fri, Feb 13, 2015 at 6:06 AM Mohamed Pakkeer wrote:

Re: [ceph-users] ceph-osd - No Longer Creates osd.X upon Launch - Bug ?

2015-02-15 Thread Gregory Farnum
On Sun, Feb 15, 2015 at 5:39 PM, Sage Weil wrote: > On Sun, 15 Feb 2015, Mykola Golub wrote: >> On Thu, Feb 05, 2015 at 08:33:39AM -0700, Ron Allred wrote: >> > Hello, >> > >> > The latest ceph-osd in Firefly v0.80.8, no longer auto creates its osd.X >> > entry, in the osd map, which it was assign

Re: [ceph-users] CephFS and data locality?

2015-02-17 Thread Gregory Farnum
On Tue, Feb 17, 2015 at 10:36 AM, Jake Kugel wrote: > Hi, > > I'm just starting to look at Ceph and CephFS. I see that Ceph supports > dynamic object interfaces to allow some processing of object data on the > same node where the data is stored [1]. This might be a naive question, > but is there

Re: [ceph-users] Unexpectedly low number of concurrent backfills

2015-02-17 Thread Gregory Farnum
On Tue, Feb 17, 2015 at 12:09 PM, Florian Haas wrote: > Hello everyone, > > I'm seeing some OSD behavior that I consider unexpected; perhaps > someone can shed some insight. > > Ceph giant (0.87.0), osd max backfills and osd recovery max active > both set to 1. > > Please take a moment to look at

Re: [ceph-users] Unexpectedly low number of concurrent backfills

2015-02-17 Thread Gregory Farnum
On Tue, Feb 17, 2015 at 9:48 PM, Florian Haas wrote: > On Tue, Feb 17, 2015 at 11:19 PM, Gregory Farnum wrote: >> On Tue, Feb 17, 2015 at 12:09 PM, Florian Haas wrote: >>> Hello everyone, >>> >>> I'm seeing some OSD behavior that I consider unexpected;

Re: [ceph-users] Privileges for read-only CephFS access?

2015-02-18 Thread Gregory Farnum
On Wed, Feb 18, 2015 at 1:58 PM, Florian Haas wrote: > On Wed, Feb 18, 2015 at 10:28 PM, Oliver Schulz wrote: >> Dear Ceph Experts, >> >> is it possible to define a Ceph user/key with privileges >> that allow for read-only CephFS access but do not allow >> write or other modifications to the Ceph
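
For context, the kind of cap string being discussed looks like the sketch below (client name and pool are assumptions; whether the MDS side can actually enforce read-only access in this era is exactly the open question of the thread):

    ceph auth get-or-create client.readonly \
        mon 'allow r' mds 'allow' osd 'allow r pool=data'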

Re: [ceph-users] Privileges for read-only CephFS access?

2015-02-18 Thread Gregory Farnum
On Wed, Feb 18, 2015 at 3:30 PM, Florian Haas wrote: > On Wed, Feb 18, 2015 at 11:41 PM, Gregory Farnum wrote: >> On Wed, Feb 18, 2015 at 1:58 PM, Florian Haas wrote: >>> On Wed, Feb 18, 2015 at 10:28 PM, Oliver Schulz wrote: >>>> Dear Ceph Experts, >>>

Re: [ceph-users] Minor version difference between monitors and OSDs

2015-02-20 Thread Gregory Farnum
On Thu, Feb 19, 2015 at 8:30 PM, Christian Balzer wrote: > > Hello, > > I have a cluster currently at 0.80.1 and would like to upgrade it to > 0.80.7 (Debian as you can guess), but for a number of reasons I can't > really do it all at the same time. > > In particular I would like to upgrade the pr

Re: [ceph-users] OSD not marked as down or out

2015-02-20 Thread Gregory Farnum
That's pretty strange, especially since the monitor is getting the failure reports. What version are you running? Can you bump up the monitor debugging and provide its output from around that time? -Greg On Fri, Feb 20, 2015 at 3:26 AM, Sudarshan Pathak wrote: > Hello everyone, > > I have a clust
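
A sketch of bumping that debugging at runtime (the levels are an assumption; scale to taste):

    ceph tell mon.* injectargs '--debug-mon 10 --debug-ms 1'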

Re: [ceph-users] running giant/hammer mds with firefly osds

2015-02-20 Thread Gregory Farnum
On Fri, Feb 20, 2015 at 3:50 AM, Luis Periquito wrote: > Hi Dan, > > I remember http://tracker.ceph.com/issues/9945 introducing some issues with > running cephfs between different versions of giant/firefly. > > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14257.html Hmm, yeah, that's

Re: [ceph-users] Power failure recovery woes (fwd)

2015-02-20 Thread Gregory Farnum
You can try searching the archives and tracker.ceph.com for hints about repairing these issues, but your disk stores have definitely been corrupted and it's likely to be an adventure. I'd recommend examining your local storage stack underneath Ceph and figuring out which part was ignoring barriers.

Re: [ceph-users] Wrong object and used space count in cache tier pool

2015-02-24 Thread Gregory Farnum
On Tue, Feb 24, 2015 at 6:21 AM, Xavier Villaneau wrote: > Hello ceph-users, > > I am currently making tests on a small cluster, and Cache Tiering is one of > those tests. The cluster runs Ceph 0.87 Giant on three Ubuntu 14.04 servers > with the 3.16.0 kernel, for a total of 8 OSD and 1 MON. > > S

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-24 Thread Gregory Farnum
On Mon, Feb 23, 2015 at 8:59 AM, Chris Murray wrote: > ... Trying to send again after reporting bounce backs to dreamhost ... > ... Trying to send one more time after seeing mails come through the > list today ... > > Hi all, > > First off, I should point out that this is a 'small cluster' issue a

Re: [ceph-users] Does Ceph rebalance OSDs proportionally

2015-02-25 Thread Gregory Farnum
Yes. :) -Greg On Wed, Feb 25, 2015 at 8:33 AM Jordan A Eliseo wrote: > Hi all, > > Quick qestion, does the Crush map always strive for proportionality when > rebalancing a cluster? i.e. Say I have 8 OSDs (with a two node cluster - 4 > OSDs per host - at ~90% utilization (which I know is bad, this

Re: [ceph-users] Strange 'ceph df' output

2015-02-25 Thread Gregory Farnum
IIRC these global values for total size and available are just summations from the (programmatic equivalent) of running df on each machine locally, but the used values are based on actual space used by each PG. That has occasionally produced some odd results depending on how you've configured your

Re: [ceph-users] mixed ceph versions

2015-02-25 Thread Gregory Farnum
On Wed, Feb 25, 2015 at 3:11 PM, Deneau, Tom wrote: > I need to set up a cluster where the rados client (for running rados > bench) may be on a different architecture and hence running a different > ceph version from the osd/mon nodes. Is there a list of which ceph > versions work together for a

Re: [ceph-users] MDS [WRN] getattr pAsLsXsFs failed to rdlock

2015-02-26 Thread Gregory Farnum
For everybody else's reference, this is addressed in http://tracker.ceph.com/issues/10944. That kernel has several known bugs. -Greg On Tue, Feb 24, 2015 at 12:02 PM, Ilja Slepnev wrote: > Dear All, > > Configuration of MDS and CephFS client is the same: > OS: CentOS 7.0.1406 > ceph-0.87 > Linux

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-26 Thread Gregory Farnum
> Even after the start of these timeouts, /dev/sdb is still having bursts > of write activity, which is interesting. > > sdb 5.27 0.00 683.60 0 41016 > sdb 5.20 0.00 660.40 0 39624 > sdb

Re: [ceph-users] Shutting down a cluster fully and powering it back up

2015-02-28 Thread Gregory Farnum
Sounds good! -Greg On Sat, Feb 28, 2015 at 10:55 AM David wrote: > Hi! > > I’m about to do maintenance on a Ceph Cluster, where we need to shut it > all down fully. > We’re currently only using it for rados block devices to KVM Hypervizors. > > Are these steps sane? > > Shutting it down > > 1. Sh
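
The key step in such a full shutdown is usually stopping the cluster from trying to re-replicate while nodes are off:

    ceph osd set noout     # before powering down
    # ...maintenance...
    ceph osd unset noout   # after everything is back up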

Re: [ceph-users] old osds take much longer to start than newer osd

2015-03-02 Thread Gregory Farnum
This is probably LevelDB being slow. The monitor has some options to "compact" the store on startup and I thought the osd handled it automatically, but you could try looking for something like that and see if it helps. -Greg On Fri, Feb 27, 2015 at 5:02 AM Corin Langosch wrote: > Hi guys, > > I'm
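
For the monitor side, a config sketch of the option alluded to:

    [mon]
        mon compact on start = true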

Re: [ceph-users] What does the parameter journal_align_min_size mean?

2015-03-02 Thread Gregory Farnum
On Fri, Feb 27, 2015 at 5:03 AM, Mark Wu wrote: > > I am wondering how the value of journal_align_min_size gives impact on > journal padding. Is there any document describing the disk layout of > journal? Not much, unfortunately. Just looking at the code, the journal will align any writes which a

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-02 Thread Gregory Farnum
.43 0.00 807.20 0 48440 > > Thanks, > Chris > > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Chris Murray > Sent: 27 February 2015 10:32 > To: Gregory Farnum > Cc: ceph-users > Subject

Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Gregory Farnum
On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu wrote: > Hi all, especially devs, > > We have recently pinpointed one of the causes of slow requests in our > cluster. It seems deep-scrubs on pg's that contain the index file for a > large radosgw bucket lock the osds. Incresing op threads and/or disk

Re: [ceph-users] RadosGW Log Rotation (firefly)

2015-03-02 Thread Gregory Farnum
On Mon, Mar 2, 2015 at 8:44 AM, Daniel Schneller wrote: > On our Ubuntu 14.04/Firefly 0.80.8 cluster we are seeing > problem with log file rotation for the rados gateway. > > The /etc/logrotate.d/radosgw script gets called, but > it does not work correctly. It spits out this message, > coming from

Re: [ceph-users] ceph binary missing from ceph-0.87.1-0.el6.x86_64

2015-03-02 Thread Gregory Farnum
The ceph tool got moved into ceph-common at some point, so it shouldn't be in the ceph rpm. I'm not sure what step in the installation process should have handled that, but I imagine it's your problem. -Greg On Mon, Mar 2, 2015 at 11:24 AM, Michael Kuriger wrote: > Hi all, > When doing a fresh in
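
Once ceph-common is installed, a quick check on an RPM-based system of which package owns the CLI:

    rpm -qf /usr/bin/ceph    # should report ceph-common-...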

Re: [ceph-users] CephFS Attributes Question Marks

2015-03-02 Thread Gregory Farnum
On Mon, Mar 2, 2015 at 3:39 PM, Scottix wrote: > We have a file system running CephFS and for a while we had this issue when > doing an ls -la we get question marks in the response. > > -rw-r--r-- 1 wwwrun root14761 Feb 9 16:06 > data.2015-02-08_00-00-00.csv.bz2 > -? ? ? ?

Re: [ceph-users] CephFS Attributes Question Marks

2015-03-02 Thread Gregory Farnum
second ls fixes the problem. > > On Mon, Mar 2, 2015 at 3:51 PM Bill Sanders wrote: >> >> Forgive me if this is unhelpful, but could it be something to do with >> permissions of the directory and not Ceph at all? >> >> http://superuser.com/a/528467 >> &

Re: [ceph-users] Update 0.80.5 to 0.80.8 -- the VMs' read requests become too slow

2015-03-02 Thread Gregory Farnum
On Mon, Mar 2, 2015 at 7:15 PM, Nathan O'Sullivan wrote: > > On 11/02/2015 1:46 PM, 杨万元 wrote: > > Hello! > We use Ceph+Openstack in our private cloud. Recently we upgrade our > centos6.5 based cluster from Ceph Emperor to Ceph Firefly. > At first,we use redhat yum repo epel to upgrade, th

Re: [ceph-users] problem in cephfs for remove empty directory

2015-03-03 Thread Gregory Farnum
On Tue, Mar 3, 2015 at 9:24 AM, John Spray wrote: > On 03/03/2015 14:07, Daniel Takatori Ohara wrote: > > $ls test-daniel-old/ > total 0 > drwx-- 1 rmagalhaes BioInfoHSL Users0 Mar 2 10:52 ./ > drwx-- 1 rmagalhaes BioInfoHSL Users 773099838313 Mar 2 11:41 ../ > > $rm -rf test

Re: [ceph-users] Ceph Cluster Address

2015-03-04 Thread Gregory Farnum
On Tue, Mar 3, 2015 at 9:26 AM, Garg, Pankaj wrote: > Hi, > > I have ceph cluster that is contained within a rack (1 Monitor and 5 OSD > nodes). I kept the same public and private address for configuration. > > I do have 2 NICS and 2 valid IP addresses (one internal only and one > external) for ea
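
A ceph.conf sketch of the split being described (subnets are assumptions):

    [global]
        public network  = 192.168.1.0/24   # client-facing traffic
        cluster network = 10.10.0.0/24     # OSD replication/heartbeat traffic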

Re: [ceph-users] cephfs filesystem layouts : authentication gotchas ?

2015-03-04 Thread Gregory Farnum
Just to get more specific: the reason you can apparently write stuff to a file when you can't write to the pool it's stored in is because the file data is initially stored in cache. The flush out to RADOS, when it happens, will fail. It would definitely be preferable if there was some way to immed

Re: [ceph-users] Cascading Failure of OSDs

2015-03-06 Thread Gregory Farnum
This might be related to the backtrace assert, but that's the problem you need to focus on. In particular, both of these errors are caused by the scrub code, which Sage suggested temporarily disabling — if you're still getting these messages, you clearly haven't done so successfully. That said, it

Re: [ceph-users] flock() supported on CephFS through Fuse ?

2015-03-10 Thread Gregory Farnum
On Tue, Mar 10, 2015 at 4:20 AM, Florent B wrote: > Hi all, > > I'm testing flock() locking system on CephFS (Giant) using Fuse. > > It seems that lock works per client, and not over all clients. > > Am I right or is it supposed to work over different clients ? Does MDS > has such a locking system

Re: [ceph-users] rados duplicate object name

2015-03-16 Thread Gregory Farnum
This is expected behavior - "put" uses write_full which is an object overwrite command. On Thu, Mar 12, 2015 at 4:17 PM Kapil Sharma wrote: > Hi Cephers, > > Has anyone tested the behavior of rados by adding an object to the > cluster with an object name which already exists in the cluster ? > wi
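
A quick demonstration of the overwrite semantics (pool and file names are assumptions):

    rados -p rbd put foo /tmp/a    # creates object foo
    rados -p rbd put foo /tmp/b    # silently replaces foo's contents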

Re: [ceph-users] osd laggy algorithm

2015-03-16 Thread Gregory Farnum
On Wed, Mar 11, 2015 at 8:40 AM, Artem Savinov wrote: > hello. > ceph marks an osd down by default after receiving 3 > reports about failed nodes. Reports are sent per "osd heartbeat grace" > seconds, but the settings of "mon_osd_adjust_heartbeat_grace = true, > mon_osd_

Re: [ceph-users] Cache Tier Flush = immediate base tier journal sync?

2015-03-16 Thread Gregory Farnum
On Wed, Mar 11, 2015 at 2:25 PM, Nick Fisk wrote: > > I’m not sure if it’s something I’m doing wrong or just experiencing an > oddity, but when my cache tier flushes dirty blocks out to the base tier, the > writes seem to hit the OSD’s straight away instead of coalescing in the > journals, is t

Re: [ceph-users] PGs stuck unclean "active+remapped" after an osd marked out

2015-03-16 Thread Gregory Farnum
On Wed, Mar 11, 2015 at 3:49 PM, Francois Lafont wrote: > Hi, > > I was always in the same situation: I couldn't remove an OSD without > have some PGs definitely stuck to the "active+remapped" state. > > But I remembered I read on IRC that, before to mark out an OSD, it > could be sometimes a good
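
The approach being hinted at, as a sketch (the OSD id is an assumption):

    # drain the OSD by CRUSH weight first, then mark it out once data has moved
    ceph osd crush reweight osd.12 0
    # ...wait for rebalancing to settle...
    ceph osd out 12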

Re: [ceph-users] RadosGW Direct Upload Limitation

2015-03-16 Thread Gregory Farnum
On Mon, Mar 16, 2015 at 11:14 AM, Georgios Dimitrakakis wrote: > Hi all! > > I have recently updated to CEPH version 0.80.9 (latest Firefly release) > which presumably > supports direct upload. > > I 've tried to upload a file using this functionality and it seems that is > working > for files up

Re: [ceph-users] Shadow files

2015-03-16 Thread Gregory Farnum
On Mon, Mar 16, 2015 at 12:12 PM, Craig Lewis wrote: > Out of curiousity, what's the frequency of the peaks and troughs? > > RadosGW has configs on how long it should wait after deleting before garbage > collecting, how long between GC runs, and how many objects it can GC in per > run. > > The def
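
The knobs in question, with what I recall as that era's defaults (verify against your build):

    rgw gc obj min wait     = 7200   # seconds after delete before an object is GC-eligible
    rgw gc processor period = 3600   # seconds between GC runs
    rgw gc max objs         = 32     # objects processed per GC run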

Re: [ceph-users] Firefly, cephfs issues: different unix rights depending on the client and ls are slow

2015-03-16 Thread Gregory Farnum
On Sun, Mar 15, 2015 at 7:06 PM, Yan, Zheng wrote: > On Sat, Mar 14, 2015 at 7:03 AM, Scottix wrote: >> ... >> >> >>> The time variation is caused cache coherence. when client has valid >>> information >>> in its cache, 'stat' operation will be fast. Otherwise the client need to >>> send >>> requ

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-16 Thread Gregory Farnum
On Sat, Mar 14, 2015 at 1:56 AM, Chris Murray wrote: > Good evening all, > > Just had another quick look at this with some further logging on and thought > I'd post the results in case anyone can keep me moving in the right direction. > > Long story short, some OSDs just don't appear to come up a

Re: [ceph-users] CephFS unexplained writes

2015-03-16 Thread Gregory Farnum
The information you're giving sounds a little contradictory, but my guess is that you're seeing the impacts of object promotion and flushing. You can sample the operations the OSDs are doing at any given time by running ops_in_progress (or similar, I forget exact phrasing) command on the OSD admin
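
The admin-socket command being half-remembered is most likely this one (daemon id is an assumption):

    ceph daemon osd.0 dump_ops_in_flight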

Re: [ceph-users] Cache Tier Flush = immediate base tier journal sync?

2015-03-16 Thread Gregory Farnum
g the sync intervals without also increasing the filestore_wbthrottle_* limits is not going to work well for you. -Greg On Mon, Mar 16, 2015 at 3:58 PM, Nick Fisk wrote: > > > > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behal
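
A config sketch of the pairing being described, with illustrative values only (the numbers are assumptions to be tuned, not recommendations):

    [osd]
        filestore max sync interval = 30    # longer coalescing window (default 5)
        # raise the writeback throttle to match, or the longer interval won't help:
        filestore wbthrottle xfs bytes start flusher = 419430400
        filestore wbthrottle xfs ios start flusher   = 5000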

Re: [ceph-users] Cache Tier Flush = immediate base tier journal sync?

2015-03-16 Thread Gregory Farnum
On Mon, Mar 16, 2015 at 4:46 PM, Christian Balzer wrote: > On Mon, 16 Mar 2015 16:09:12 -0700 Gregory Farnum wrote: > >> Nothing here particularly surprises me. I don't remember all the >> details of the filestore's rate limiting off the top of my head, but >> i

Re: [ceph-users] ceph-fuse unable to run through Ansible ?

2015-03-17 Thread Gregory Farnum
On Tue, Mar 17, 2015 at 3:24 PM, Florent B wrote: > Hi everyone, > > My problem is about ceph-fuse & Ansible, I first post here to see if > someone have an idea of what happens. > > I configure a mount point like this: > > mount: name=/mnt/cephfs src='daemonize,id={{ cephfs_username > }},mon_host=

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-18 Thread Gregory Farnum
On Wed, Mar 18, 2015 at 3:28 AM, Chris Murray wrote: > Hi again Greg :-) > > No, it doesn't seem to progress past that point. I started the OSD again a > couple of nights ago: > > 2015-03-16 21:34:46.221307 7fe4a8aa7780 10 journal op_apply_finish 13288339 > open_ops 1 -> 0, max_applied_seq 13288

Re: [ceph-users] Cache Tier Flush = immediate base tier journal sync?

2015-03-18 Thread Gregory Farnum
On Wed, Mar 18, 2015 at 8:04 AM, Nick Fisk wrote: > Hi Greg, > > Thanks for your input and completely agree that we cannot expect developers > to fully document what impact each setting has on a cluster, particularly in > a performance related way > > That said, if you or others could spare some t

Re: [ceph-users] Readonly cache tiering and rbd.

2015-03-19 Thread Gregory Farnum
On Thu, Mar 19, 2015 at 4:46 AM, Matthijs Möhlmann wrote: > Hi, > > From the documentation: > > Cache Tier readonly: > > Read-only Mode: When admins configure tiers with readonly mode, Ceph > clients write data to the backing tier. On read, C

Re: [ceph-users] Cache Tier Flush = immediate base tier journal sync?

2015-03-19 Thread Gregory Farnum
On Wed, Mar 18, 2015 at 11:10 PM, Christian Balzer wrote: > > Hello, > > On Wed, 18 Mar 2015 11:05:47 -0700 Gregory Farnum wrote: > >> On Wed, Mar 18, 2015 at 8:04 AM, Nick Fisk wrote: >> > Hi Greg, >> > >> > Thanks for your input and completel

Re: [ceph-users] OSD + Flashcache + udev + Partition uuid

2015-03-19 Thread Gregory Farnum
On Thu, Mar 19, 2015 at 2:41 PM, Nick Fisk wrote: > I'm looking at trialling OSD's with a small flashcache device over them to > hopefully reduce the impact of metadata updates when doing small block io. > Inspiration from here:- > > http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/120

Re: [ceph-users] hadoop namenode not starting due to bindException while deploying hadoop with cephFS

2015-03-20 Thread Gregory Farnum
On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid wrote: > Hi, > > I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with > cephFS. I have installed hadoop-1.1.1 in the nodes and changed the > conf/core-site.xml file according to the ceph documentation > http://ceph.com/docs/master/c
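
For reference, the core-site.xml stanza from the CephFS Hadoop docs of that era looks roughly like this (the monitor host is an assumption; note that with CephFS as the default filesystem, no HDFS namenode needs to run at all):

    <property>
      <name>fs.default.name</name>
      <value>ceph://mon-host:6789/</value>
    </property>
    <property>
      <name>fs.ceph.impl</name>
      <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
    </property>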

Re: [ceph-users] mds log message

2015-03-20 Thread Gregory Farnum
On Fri, Mar 20, 2015 at 12:39 PM, Daniel Takatori Ohara wrote: > Hello, > > Anybody help me, please? Appear any messages in log of my mds. > > And after the shell of my clients freeze. > > 2015-03-20 12:23:54.068005 7f1608d49700 0 log_channel(default) log [WRN] : > client.3197487 isn't responding

Re: [ceph-users] hadoop namenode not starting due to bindException while deploying hadoop with cephFS

2015-03-20 Thread Gregory Farnum
On Fri, Mar 20, 2015 at 1:05 PM, Ridwan Rashid wrote: > Gregory Farnum writes: > >> >> On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid wrote: >> > Hi, >> > >> > I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with >> >

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-20 Thread Gregory Farnum
On Fri, Mar 20, 2015 at 4:03 PM, Chris Murray wrote: > Ah, I was wondering myself if compression could be causing an issue, but I'm > reconsidering now. My latest experiment should hopefully help troubleshoot. > > So, I remembered that ZLIB is slower, but is more 'safe for old kernels'. I > try

Re: [ceph-users] Ceph in Production: best practice to monitor OSD up/down status

2015-03-23 Thread Gregory Farnum
On Sun, Mar 22, 2015 at 2:55 AM, Saverio Proto wrote: > Hello, > > I started to work with CEPH few weeks ago, I might ask a very newbie > question, but I could not find an answer in the docs or in the ml > archive for this. > > Quick description of my setup: > I have a ceph cluster with two server

Re: [ceph-users] Uneven CPU usage on OSD nodes

2015-03-23 Thread Gregory Farnum
On Mon, Mar 23, 2015 at 4:31 AM, f...@univ-lr.fr wrote: > Hi Somnath, > > Thank you, please find my answers below > > Somnath Roy wrote on 22/03/15 18:16: > > Hi Frederick, > > Need some information here. > > > > 1. Just to clarify, you are saying it is happening in 0.87.1 and not in > Firef
