Re: [ceph-users] near full osd

2013-11-07 Thread Gregory Farnum
It sounds like maybe your PG counts on your pools are too low and so you're just getting a bad balance. If that's the case, you can increase the PG count with "ceph osd pool set <pool> pg_num <value>". OSDs should get data approximately equal to <OSD weight>/<total weight>, so higher weights get more data and all its associated traffic.
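
For example, assuming a pool named "rbd" and a target of 256 PGs (both illustrative; pgp_num must be raised as well before data actually rebalances):

    ceph osd pool set rbd pg_num 256
    ceph osd pool set rbd pgp_num 256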

Re: [ceph-users] About memory usage of ceph-mon on arm

2013-11-07 Thread Gregory Farnum
I don't think this is anything we've observed before. Normally when a Ceph node is using more memory than its peers it's a consequence of something in that node getting backed up. You might try looking at the perf counters via the admin socket and seeing if something about them is different between
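
A quick way to pull those counters, assuming the default admin socket path and a monitor named mon.a (both illustrative):

    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok perf dump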

Re: [ceph-users] Aging off objects [SEC=UNOFFICIAL]

2013-11-07 Thread Gregory Farnum
On Tue, Nov 5, 2013 at 2:39 PM, Dickson, Matt MR wrote: > UNOFFICIAL > > Hi, > > I'm new to Ceph and investigating how objects can be aged off, i.e. delete all > objects older than 7 days. Is there functionality to do this via the Ceph > SWIFT api or alternatively using a java rados library? Not in

Re: [ceph-users] RBD back

2013-11-07 Thread Gregory Farnum
On Thu, Nov 7, 2013 at 1:26 AM, lixuehui wrote: > Hi all > Ceph Object Store service can span geographic locales. Now ceph also > provides FS and RBD. If our applications need the RBD service, can we > provide backup and disaster recovery for it via gateway through some > transformation? I

Re: [ceph-users] computing PG IDs

2013-11-07 Thread Gregory Farnum
On Thu, Nov 7, 2013 at 6:43 AM, Kenneth Waegeman wrote: > Hi everyone, > > I just started to look at the documentation of Ceph and I've hit something I > don't understand. > It's about something on http://ceph.com/docs/master/architecture/ > > """ > use the following steps to compute PG IDs. > > T

Re: [ceph-users] deployment architecture practices / new ideas?

2013-11-07 Thread Gregory Farnum
On Wed, Nov 6, 2013 at 6:05 AM, Gautam Saxena wrote: > I'm a little confused -- does CEPH support incremental snapshots of either > VMs or the CEPH-FS? I saw in the release notes for "dumpling" release > (http://ceph.com/docs/master/release-notes/#v0-67-dumpling) this statement: > "The MDS now dis

Re: [ceph-users] Kernel Panic / RBD Instability

2013-11-07 Thread Gregory Farnum
Well, as you've noted you're getting some slow requests on the OSDs when they turn back on; and then the iSCSI gateway is panicking (probably because the block device write request is just hanging). We've gotten prior reports that iSCSI is a lot more sensitive to a few slow requests than most use c

Re: [ceph-users] near full osd

2013-11-08 Thread Gregory Farnum
Fax: +1 312-244-3301 | E-Mail: > kevin.wei...@imc-chicago.com > > On 11/7/13 9:59 PM, "Gregory Farnum" > wrote: > > >It sounds like maybe your PG counts on your pools are too low and so > >you're just getting a bad balance.

Re: [ceph-users] near full osd

2013-11-08 Thread Gregory Farnum
Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: > kevin.wei...@imc-chicago.com > > From: Gregory Farnum <g...@inktank.com> > Date: Friday, November 8, 2013 11:00 AM > To: Kevin Weiler <kevin.wei...@imc-chicago.com> > C

Re: [ceph-users] Not recovering completely on OSD failure

2013-11-08 Thread Gregory Farnum
This is probably a result of some difficulties that CRUSH has when using pool sizes equal to the total number of buckets it can choose from. We made some changes to the algorithm earlier this year to deal with it, but if using a kernel client you need a very new one to be compatible so we haven't e

Re: [ceph-users] Can't activate OSD with journal and data on the same disk

2013-11-08 Thread Gregory Farnum
I made a ticket for this: http://tracker.ceph.com/issues/6740 Thanks for the bug report! -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Nov 8, 2013 at 1:51 AM, Michael Lukzak wrote: > Hi, > > News. I tried activate disk without --dmcrypt and there is no problem. After

Re: [ceph-users] Is Ceph a provider of block device too ?

2013-11-08 Thread Gregory Farnum
On Fri, Nov 8, 2013 at 8:49 AM, Listas wrote: > Hi ! > > I have clusters (IMAP service) with 2 members configured with Ubuntu + Drbd > + Ext4. Intend to migrate to the use of Ceph and begin to allow distributed > access to the data. > > Does Ceph provide the distributed filesystem and block devic

Re: [ceph-users] About memory usage of ceph-mon on arm

2013-11-08 Thread Gregory Farnum
Spans in use > MALLOC: 8192 Tcmalloc page size > -------- > Call ReleaseFreeMemory() to release freelist memory to the OS (via > madvise()). > Bytes released to the > > On Fri, Nov 8, 2013 at 12:03 PM, G

Re: [ceph-users] ceph osd thrash?

2013-11-11 Thread Gregory Farnum
On Mon, Nov 11, 2013 at 2:16 AM, Ирек Фасихов wrote: > Hello community. > > I do not understand the argument: ceph osd thrash. > Why is this option needed? > A description of the parameter is not found in the documentation. Where can > one read a more detailed description of the parameter? > Than

Re: [ceph-users] Recovery took too long on cuttlefish

2013-11-13 Thread Gregory Farnum
How did you generate these scenarios? At first glance it looks to me like you've got very low limits set on how many PGs an OSD can be recovering at once, and in the first example they were all targeted to that one OSD, while in the second they were distributed. -Greg Software Engineer #42 @ http:/

Re: [ceph-users] CRUSH tunables for production system?

2013-11-13 Thread Gregory Farnum
On Wed, Nov 13, 2013 at 8:34 AM, Oliver Schulz wrote: > Dear Ceph Experts, > > We're running a production Ceph cluster with Ceph Dumpling, > with Ubuntu 12.04.3 (kernel 3.8) on the cluster nodes and > all clients. We're mainly using CephFS (kernel) and RBD > (kernel and user-space/libvirt). > > Wo

Re: [ceph-users] CRUSH tunables for production system? / Data Distribution?

2013-11-13 Thread Gregory Farnum
Ah, the CRUSH tunables basically don't impact placement at all unless CRUSH fails to do a placement for some reason. What you're seeing here is the result of a pseudo-random imbalance. Increasing your pg_num and pgp_num counts on the data pool should resolve it (though at the cost of some data movement
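
A sketch of that fix on the data pool (the count 512 is illustrative; compare per-OSD PG counts before and after):

    ceph pg dump osds
    ceph osd pool set data pg_num 512
    ceph osd pool set data pgp_num 512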

Re: [ceph-users] locking on mds

2013-11-13 Thread Gregory Farnum
I'm not too familiar with the toolchain you're using, so can you clarify what problem you're seeing with CephFS here? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Wed, Nov 13, 2013 at 12:06 PM, M. Piscaer wrote: > > Hi, > > I have a webcluster setup, where on the loadba

Re: [ceph-users] rbd i/o errors

2013-11-14 Thread Gregory Farnum
Yes, you've run across an issue. It's identified and both the preventive fix and the resolver tool are in testing. See: The thread "Emperor upgrade bug 6761" http://tracker.ceph.com/issues/6761#note-19 (Forgive my brevity; it's late here. :) -Greg Software Engineer #42 @ http://inktank.com | http:/

Re: [ceph-users] how to fix active+remapped pg

2013-11-14 Thread Gregory Farnum
Search the docs or mailing list archives for a discussion of the CRUSH tunables; you can see that's the problem here because CRUSH is only mapping the PG to one OSD instead of two. :) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Thu, Nov 14, 2013 at 4:40 AM, Ugis wrote:
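
To see that mapping for yourself (PG id 0.1 is illustrative), and the fix if all clients are recent enough:

    ceph pg map 0.1                  # "up" and "acting" list only one OSD where two are expected
    ceph osd crush tunables optimal  # only safe once every client supports the newer tunables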

Re: [ceph-users] locking on mds

2013-11-14 Thread Gregory Farnum
12:04 AM, M. Piscaer wrote: > Greg, > > What do you mean, > > What info do you need? > > Kind regards, > > Michiel Piscaer > > > On wo, 2013-11-13 at 16:19 -0800, Gregory Farnum wrote: >> I'm not too familiar with the toolchain you're using, so

[ceph-users] Monitor upgrade issue when running "ceph osd pool set ..." commands

2013-11-18 Thread Gregory Farnum
All, We've just discovered an issue that impacts some users running any of the "ceph osd pool set" family of commands while some of their monitors are running Dumpling and some are running Emperor. Doing so can result in the commands being interpreted incorrectly and your cluster being accidentall
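
Before running those commands mid-upgrade, one way to confirm every monitor is on the same version (mon names a/b/c are illustrative):

    ceph tell mon.a version
    ceph tell mon.b version
    ceph tell mon.c version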

Re: [ceph-users] Is Ceph a provider of block device too ?

2013-11-21 Thread Gregory Farnum
On Thu, Nov 21, 2013 at 10:13 AM, John-Paul Robinson wrote: > Is this statement accurate? > > As I understand DRBD, you can replicate online block devices reliably, > but with Ceph the replication for RBD images requires that the file > system be offline. It's not clear to me what replication you

Re: [ceph-users] how to fix active+remapped pg

2013-11-21 Thread Gregory Farnum
On Thu, Nov 21, 2013 at 7:52 AM, Ugis wrote: > Thanks, reread that section in docs and found tunables profile - nice > to have, hadn't noticed it before (ceph docs develop so fast that you > need RSS to follow all changes :) ) > > Still problem persists in a different way. > Did set profile "optima

Re: [ceph-users] CephFS filesystem disapear!

2013-11-21 Thread Gregory Farnum
What do you mean the filesystem disappears? Is it possible you're just pushing more traffic to the disks than they can handle, and not waiting long enough for them to catch up? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Thu, Nov 21, 2013 at 5:19 PM, Alphe Salas Michels

Re: [ceph-users] how to enable rbd cache

2013-11-25 Thread Gregory Farnum
On Mon, Nov 25, 2013 at 5:58 AM, Mark Nelson wrote: > On 11/25/2013 07:21 AM, Shu, Xinxin wrote: >> >> Recently , I want to enable rbd cache to identify performance benefit. I >> add rbd_cache=true option in my ceph configure file, I use ’virsh >> attach-device’ to attach rbd to vm, below is my vd
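
A minimal sketch of the client-side config (the second option is a safety knob of that era; the VM's disk must also be attached with a matching cache mode):

    [client]
    rbd cache = true
    rbd cache writethrough until flush = true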

Re: [ceph-users] pg inconsistent : found clone without head

2013-11-25 Thread Gregory Farnum
On Mon, Nov 25, 2013 at 8:10 AM, Laurent Barbe wrote: > Hello, > > Since yesterday, scrub has detected an inconsistent pg :( : > > # ceph health detail (ceph version 0.61.9) > HEALTH_ERR 1 pgs inconsistent; 9 scrub errors > pg 3.136 is active+clean+inconsistent, acting [9,1] > 9 scrub errors >

Re: [ceph-users] PG state diagram

2013-11-25 Thread Gregory Farnum
It's generated from a .dot file which you can render as you like. :) Please be aware that that diagram is for developers and will be meaningless without that knowledge. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Nov 25, 2013 at 6:42 AM, Regola, Nathan (Contractor)
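
Assuming Graphviz is installed, rendering is one command (the .dot filename here is illustrative; use whatever the source tree generates):

    dot -Tsvg peering_graph.generated.dot -o pg_states.svg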

Re: [ceph-users] Number of threads for osd processes

2013-11-26 Thread Gregory Farnum
The largest group of threads is those from the network messenger — in the current implementation it creates two threads per process the daemon is communicating with. That's two threads for each OSD it shares PGs with, and two threads for each client which is accessing any data on that OSD. -Greg So
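
As a rough worked example under that model: an OSD sharing PGs with 100 peer OSDs while serving 50 clients would carry on the order of 2*100 + 2*50 = 300 messenger threads, before counting its internal thread pools.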

Re: [ceph-users] Number of threads for osd processes

2013-11-27 Thread Gregory Farnum
On Wed, Nov 27, 2013 at 1:31 AM, Jens-Christian Fischer wrote: >> The largest group of threads is those from the network messenger — in >> the current implementation it creates two threads per process the >> daemon is communicating with. That's two threads for each OSD it >> shares PGs with, and t

Re: [ceph-users] Number of threads for osd processes

2013-11-27 Thread Gregory Farnum
On Wed, Nov 27, 2013 at 7:28 AM, Mark Nelson wrote: > On 11/27/2013 09:25 AM, Gregory Farnum wrote: >> >> On Wed, Nov 27, 2013 at 1:31 AM, Jens-Christian Fischer >> wrote: >>>> >>>> The largest group of threads is those from the network messenger —

Re: [ceph-users] Shrinking active MDS cluster

2013-11-28 Thread Gregory Farnum
On Thu, Nov 28, 2013 at 5:52 PM, Walter Huf wrote: > A long time ago I got my MDS cluster into a state where I have two active > MDS nodes with a third for failover. This setup is not perfectly stable, so > I want to drop down to one active MDS node with two nodes for failover. Is > there any docu

Re: [ceph-users] how to fix active+remapped pg

2013-12-03 Thread Gregory Farnum
ot;straw", >> "hash": "rjenkins1", >> "items": [ >> { "id": 6, >> "weight": 178913, >> "pos": 0}, >> { "id&quo

Re: [ceph-users] pgs stuck in active+clean+replay state

2014-09-25 Thread Gregory Farnum
I imagine you aren't actually using the data/metadata pool that these PGs are in, but it's a previously-reported bug we haven't identified: http://tracker.ceph.com/issues/8758 They should go away if you restart the OSDs that host them (or just remove those pools), but it's not going to hurt anythin
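
A hedged sketch of that workaround (the PG state name is real; the OSD id and sysvinit invocation are illustrative for clusters of that era):

    ceph pg dump | grep replay       # find the stuck PGs and their acting OSDs
    /etc/init.d/ceph restart osd.12  # restart the primary OSD hosting them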

Re: [ceph-users] PG stuck creating

2014-09-30 Thread Gregory Farnum
> I rebuilt the primary OSD (29) in the hopes it would unblock whatever it > was, but no luck. I'll check the admin socket and see if there is anything I > can find there. > > On Tue, Sep 30, 2014 at 10:36 AM, Gregory Farnum wrote: >> >> On Tuesday, September 30, 201

Re: [ceph-users] PG stuck creating

2014-09-30 Thread Gregory Farnum
On Tuesday, September 30, 2014, Robert LeBlanc wrote: > On our dev cluster, I've got a PG that won't create. We had a host fail > with 10 OSDs that needed to be rebuilt. A number of other OSDs were down > for a few days (did I mention this was a dev cluster?). The other OSDs > eventually came up

Re: [ceph-users] Why performance of benchmarks with small blocks is extremely small?

2014-10-01 Thread Gregory Farnum
On Wed, Oct 1, 2014 at 5:24 AM, Andrei Mikhailovsky wrote: > Timur, > > As far as I know, the latest master has a number of improvements for ssd > disks. If you check the mailing list discussion from a couple of weeks back, > you can see that the latest stable firefly is not that well optimised fo

Re: [ceph-users] Why performance of benchmarks with small blocks is extremely small?

2014-10-01 Thread Gregory Farnum
http://inktank.com | http://ceph.com On Wed, Oct 1, 2014 at 7:07 AM, Andrei Mikhailovsky wrote: > > Greg, are they going to be a part of the next stable release? > > Cheers > > From: "Gregory Farnum" > To: "Andrei Mikhailovsky"

Re: [ceph-users] Why performance of benchmarks with small blocks is extremely small?

2014-10-01 Thread Gregory Farnum
On Wed, Oct 1, 2014 at 9:21 AM, Mark Nelson wrote: > On 10/01/2014 11:18 AM, Gregory Farnum wrote: >> >> All the stuff I'm aware of is part of the testing we're doing for >> Giant. There is probably ongoing work in the pipeline, but the fast >> dispatch, shar

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-07 Thread Gregory Farnum
> Jasper Siero > > From: Jasper Siero > Sent: Thursday, August 21, 2014 16:43 > To: Gregory Farnum > Subject: RE: [ceph-users] mds isn't working anymore after osd's running full > > I did restart it but you

Re: [ceph-users] accept: got bad authorizer

2014-10-08 Thread Gregory Farnum
Check your clock sync on that node. That's the usual cause of this issue. -Greg On Wednesday, October 8, 2014, Nathan Stratton wrote: > I have one out of 16 of my OSDs doing something odd. The logs show some > sort of authentication issue. If I restart the OSD things are fine, but in > a few hou
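
Two quick checks (ntpq assumes ntpd is the time daemon in use):

    ceph health detail | grep -i skew   # monitors warn "clock skew detected" past the threshold
    ntpq -p                             # confirm the node is actually syncing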

Re: [ceph-users] Regarding Primary affinity configuration

2014-10-09 Thread Gregory Farnum
On Thu, Oct 9, 2014 at 10:55 AM, Johnu George (johnugeo) wrote: > Hi All, > I have few questions regarding the Primary affinity. In the > original blueprint > (https://wiki.ceph.com/Planning/Blueprints/Firefly/osdmap%3A_primary_role_affinity > ), one example has been given. > > For PG x

Re: [ceph-users] Regarding Primary affinity configuration

2014-10-09 Thread Gregory Farnum
On Thu, Oct 9, 2014 at 4:24 PM, Johnu George (johnugeo) wrote: > Hi Greg, > Thanks for your extremely informative post. My related questions > are posted inline > > On 10/9/14, 2:21 PM, "Gregory Farnum" wrote: > >>On Thu, Oct 9, 2014 at 10:55 AM, Johnu

Re: [ceph-users] Blueprints

2014-10-09 Thread Gregory Farnum
On Thu, Oct 9, 2014 at 4:01 PM, Robert LeBlanc wrote: > I have a question regarding submitting blueprints. Should only people who > intend to do the work of adding/changing features of Ceph submit blueprints? > I'm not primarily a programmer (but can do programming if needed), but have > a feature

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-10 Thread Gregory Farnum
> Hello Greg, > > No problem, thanks for looking into the log. I attached the log to this email. > I'm looking forward to the new release because it would be nice to have more > possibilities to diagnose problems. > > Kind regards, > > Jasper Siero >

Re: [ceph-users] ceph tell osd.6 version : hang

2014-10-12 Thread Gregory Farnum
On Sun, Oct 12, 2014 at 7:46 AM, Loic Dachary wrote: > Hi, > > On a 0.80.6 cluster the command > > ceph tell osd.6 version > > hangs forever. I checked that it establishes a TCP connection to the OSD, > raised the OSD debug level to 20 and I do not see > > https://github.com/ceph/ceph/blob/firefl

Re: [ceph-users] ceph tell osd.6 version : hang

2014-10-12 Thread Gregory Farnum
On Sun, Oct 12, 2014 at 9:10 AM, Loic Dachary wrote: > > > On 12/10/2014 17:48, Gregory Farnum wrote: >> On Sun, Oct 12, 2014 at 7:46 AM, Loic Dachary wrote: >>> Hi, >>> >>> On a 0.80.6 cluster the command >>> >>> ceph tell osd.6 versio

Re: [ceph-users] ceph tell osd.6 version : hang

2014-10-12 Thread Gregory Farnum
On Sun, Oct 12, 2014 at 9:29 AM, Loic Dachary wrote: > > > On 12/10/2014 18:22, Gregory Farnum wrote: >> On Sun, Oct 12, 2014 at 9:10 AM, Loic Dachary wrote: >>> >>> >>> On 12/10/2014 17:48, Gregory Farnum wrote: >>>> On Sun, Oct 12, 2014 at

Re: [ceph-users] Handling of network failures in the cluster network

2014-10-13 Thread Gregory Farnum
On Mon, Oct 13, 2014 at 11:32 AM, Martin Mailand wrote: > Hi List, > > I have a ceph cluster setup with two networks, one for public traffic > and one for cluster traffic. > Network failures in the public network are handled quite well, but > network failures in the cluster network are handled ver

Re: [ceph-users] Misconfigured caps on client.admin key, anyway to recover from EACCES denied?

2014-10-13 Thread Gregory Farnum
On Mon, Oct 13, 2014 at 4:04 PM, Wido den Hollander wrote: > On 14-10-14 00:53, Anthony Alba wrote: >> Following the manual starter guide, I set up a Ceph cluster with HEALTH_OK, >> (1 mon, 2 osd). In testing out auth commands I misconfigured the >> client.admin key by accidentally deleting "mon

Re: [ceph-users] Ceph OSD very slow startup

2014-10-14 Thread Gregory Farnum
On Monday, October 13, 2014, Lionel Bouton wrote: > Hi, > > # First a short description of our Ceph setup > > You can skip to the next section ("Main questions") to save time and > come back to this one if you need more context. > > We are currently moving away from DRBD-based storage backed by R

Re: [ceph-users] Misconfigured caps on client.admin key, anyway to recover from EACCES denied?

2014-10-14 Thread Gregory Farnum
On Monday, October 13, 2014, Anthony Alba wrote: > > You can disable cephx completely, fix the key and enable cephx again. > > > > auth_cluster_required, auth_service_required and auth_client_required > > That did not work: i.e disabling cephx in the cluster conf and > restarting the cluster. > T

Re: [ceph-users] Handling of network failures in the cluster network

2014-10-14 Thread Gregory Farnum
version is ceph version 0.86 (97dcc0539dfa7dac3de74852305d51580b7b1f82). > > On 13.10.2014 21:45, Gregory Farnum wrote: >> How did you test taking down the connection? >> What config options have you specified on the OSDs and in the monitor? >> >> None of the scenarios you're describing make mu

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-14 Thread Gregory Farnum
> Software Engineer #42 @ http://inktank.com | http://ceph.com > > On Wed, Oct 8, 2014 at 7:11 AM, Jasper Siero > wrote: >> Hello Greg, >> >> No problem, thanks for looking into the log. I attached the log to this email. >> I'm looking forward to the new release

Re: [ceph-users] Firefly maintenance release schedule

2014-10-15 Thread Gregory Farnum
On Wed, Oct 15, 2014 at 9:39 AM, Dmitry Borodaenko wrote: > On Tue, Sep 30, 2014 at 6:49 PM, Dmitry Borodaenko > wrote: >> Last stable Firefly release (v0.80.5) was tagged on July 29 (over 2 >> months ago). Since then, there were twice as many commits merged into >> the firefly branch than there

Re: [ceph-users] Performance doesn't scale well on a full ssd cluster.

2014-10-16 Thread Gregory Farnum
If you're running a single client to drive these tests, that's your bottleneck. Try running multiple clients and aggregating their numbers. -Greg On Thursday, October 16, 2014, Mark Wu wrote: > Hi list, > > During my test, I found ceph doesn't scale as I expected on a 30 osds > cluster. > The fo
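
For example, run the same benchmark from several nodes at once and sum the throughput (pool name, runtime, and thread count illustrative):

    rados bench -p testpool 60 write -t 32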

Re: [ceph-users] Performance doesn't scale well on a full ssd cluster.

2014-10-16 Thread Gregory Farnum
can reach the peak. The client is fio and is also running on the osd nodes. > But there are no bottlenecks on cpu or network. I also tried running the client > on two non-osd servers, with the same result. > > On Oct 17, 2014 at 12:29 AM, "Gregory Farnum" wrote: > >> If you're

Re: [ceph-users] why the erasure code pool not support random write?

2014-10-20 Thread Gregory Farnum
This is a common constraint in many erasure-coded storage systems. It arises because random writes turn into a read-modify-write cycle (in order to redo the parity calculations). So we simply disallow them in EC pools, which works fine for the target use cases right now. -Greg On Monday, October 2
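
As a worked illustration (parameters illustrative): with k=4 data chunks and m=2 parity chunks over a 4 MB stripe, each data chunk holds 1 MB. Overwriting even 4 KB inside one chunk forces reads to recompute both parity chunks, then a rewrite of the modified data chunk plus both parities: several disk operations to change a few kilobytes.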

Re: [ceph-users] Ceph OSD very slow startup

2014-10-20 Thread Gregory Farnum
On Mon, Oct 20, 2014 at 8:25 AM, Lionel Bouton wrote: > Hi, > > More information on our Btrfs tests. > > Le 14/10/2014 19:53, Lionel Bouton a écrit : > > > > Current plan: wait at least a week to study 3.17.0 behavior and upgrade the > 3.12.21 nodes to 3.17.0 if all goes well. > > > 3.17.0 and 3.1

Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-21 Thread Gregory Farnum
On Tuesday, October 21, 2014, Chad Seys wrote: > Hi Craig, > > > It's part of the way the CRUSH hashing works. Any change to the CRUSH > map > > causes the algorithm to change slightly. > > Dan@cern could not replicate my observations, so I plan to follow his > procedure (fake create an OSD, wai

Re: [ceph-users] Question/idea about performance problems with a few overloaded OSDs

2014-10-21 Thread Gregory Farnum
On Tue, Oct 21, 2014 at 10:15 AM, Lionel Bouton wrote: > Hi, > > I've yet to install 0.80.7 on one node to confirm its stability and use > the new IO priority tuning parameters enabling prioritized access to > data from client requests. > > In the meantime, faced with large slowdowns caused by re

Re: [ceph-users] Extremely slow small files rewrite performance

2014-10-21 Thread Gregory Farnum
Are these tests conducted using a local fs on RBD, or using CephFS? If CephFS, do you have multiple clients mounting the FS, and what are they doing? What client (kernel or ceph-fuse)? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Oct 21, 2014 at 9:05 AM, Sergey Nazar

Re: [ceph-users] Extremely slow small files rewrite performance

2014-10-21 Thread Gregory Farnum
ctivity. > Cluster is working on Debian Wheezy, kernel 3.2.0-4-amd64. > > On Tue, Oct 21, 2014 at 1:44 PM, Gregory Farnum wrote: >> Are these tests conducted using a local fs on RBD, or using CephFS? >> If CephFS, do you have multiple clients mounting the FS, and what are >

Re: [ceph-users] Fio rbd stalls during 4M reads

2014-10-24 Thread Gregory Farnum
There's a temporary issue in the master branch that makes rbd reads greater than the cache size hang (if the cache was on). This might be that. (Jason is working on it: http://tracker.ceph.com/issues/9854) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Thu, Oct 23, 2014 at 5
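
For reference, a job of the shape that trips it (all names illustrative; requires fio built with the rbd engine):

    [4m-read]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio-test
    rw=read
    bs=4m
    direct=1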

Re: [ceph-users] error when executing ceph osd pool set foo-hot cache-mode writeback

2014-10-28 Thread Gregory Farnum
On Tue, Oct 28, 2014 at 3:24 AM, Cristian Falcas wrote: > Hello, > > In the documentation about creating an cache pool, you find this: > > "Cache mode > > The most important policy is the cache mode: > > ceph osd pool set foo-hot cache-mode writeback" > > But when trying to run the above command,

Re: [ceph-users] Adding a monitor to

2014-10-28 Thread Gregory Farnum
On Mon, Oct 27, 2014 at 11:37 AM, Patrick Darley wrote: > Hi there > > Over the last week or so, I've been trying to connect a ceph monitor node > running on a baserock system > to a simple 3-node ubuntu ceph cluster. > > The 3-node ubuntu cluster was created by following the documente

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-28 Thread Gregory Farnum
ing the undumping and starting the mds: > http://pastebin.com/y14pSvM0 > > Kind Regards, > > Jasper > > From: john.sp...@inktank.com [john.sp...@inktank.com] on behalf of John Spray > [john.sp...@redhat.com] > Sent: Thursday, October 16, 2014 12:23 >

Re: [ceph-users] Adding a monitor to

2014-10-28 Thread Gregory Farnum
has gone wrong. You don't need to run it from the new monitor, so if you're having trouble getting the keys to behave I'd just run it from an existing system. :) -Greg On Tue, Oct 28, 2014 at 10:11 AM, Patrick Darley wrote: > On 2014-10-28 16:08, Gregory Farnum wrote: >>

Re: [ceph-users] Troubleshooting Incomplete PGs

2014-10-28 Thread Gregory Farnum
On Thu, Oct 23, 2014 at 6:41 AM, Chris Kitzmiller wrote: > On Oct 22, 2014, at 8:22 PM, Craig Lewis wrote: > > Shot in the dark: try manually deep-scrubbing the PG. You could also try > marking various osd's OUT, in an attempt to get the acting set to include > osd.25 again, then do the deep-scru

Re: [ceph-users] Adding a monitor to

2014-10-29 Thread Gregory Farnum
[Re-adding the list, so this is archived for future posterity.] On Wed, Oct 29, 2014 at 6:11 AM, Patrick Darley wrote: > > Thanks again for the reply Greg! > > On 2014-10-28 17:39, Gregory Farnum wrote: >> >> I'm sorry, you're right — I misread it. :( >

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-29 Thread Gregory Farnum
On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero wrote: > Hello Greg, > > I added the debug options which you mentioned and started the process again: > > [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file > /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph > --res

Re: [ceph-users] Delete pools with low priority?

2014-10-29 Thread Gregory Farnum
Dan (who wrote that slide deck) is probably your best bet here, but I believe pool deletion is not very configurable and fairly expensive right now. I suspect that it will get better in Hammer or Infernalis, once we have a unified op work queue that we can independently prioritize all IO through (t

Re: [ceph-users] Crash with rados cppool and snapshots

2014-10-29 Thread Gregory Farnum
On Wed, Oct 29, 2014 at 7:49 AM, Daniel Schneller wrote: > Hi! > > We are exploring options to regularly preserve (i.e. backup) the > contents of the pools backing our rados gateways. For that we create > nightly snapshots of all the relevant pools when there is no activity > on the system to get

Re: [ceph-users] Swift + radosgw: How do I find accounts/containers/objects limitation?

2014-10-31 Thread Gregory Farnum
On Fri, Oct 31, 2014 at 9:55 AM, Narendra Trivedi (natrived) wrote: > Hi All, > > > > I have been working with Openstack Swift + radosgw to stress the whole > object storage from the Swift side (I have been creating containers and > objects for days now) but can’t actually find the limitation when

Re: [ceph-users] Swift + radosgw: How do I find accounts/containers/objects limitation?

2014-10-31 Thread Gregory Farnum
s been configured? > > --Narendra > > -Original Message----- > From: Gregory Farnum [mailto:g...@gregs42.com] > Sent: Friday, October 31, 2014 11:58 AM > To: Narendra Trivedi (natrived) > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Swift + radosgw: How do I fi

Re: [ceph-users] giant release osd down

2014-11-02 Thread Gregory Farnum
What happened when you did the OSD prepare and activate steps? Since your OSDs are either not running or can't communicate with the monitors, there should be some indication from those steps. -Greg On Sun, Nov 2, 2014 at 6:44 AM Shiv Raj Singh wrote: > Hi All > > I am new to ceph and I have been

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Gregory Farnum
On Mon, Nov 3, 2014 at 7:46 AM, Chad Seys wrote: > Hi All, >I upgraded from emperor to firefly. Initial upgrade went smoothly and all > placement groups were active+clean . > Next I executed > 'ceph osd crush tunables optimal' > to upgrade CRUSH mapping. Okay...you know that's a data mov

Re: [ceph-users] 0.87 rados df fault

2014-11-03 Thread Gregory Farnum
On Mon, Nov 3, 2014 at 4:40 AM, Thomas Lemarchand wrote: > Update : > > /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746084] > [21787] 0 21780 492110 185044 920 240143 0 > ceph-mon > /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746115] > [131

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Gregory Farnum
[ Re-adding the list. ] On Mon, Nov 3, 2014 at 10:49 AM, Chad Seys wrote: > >> > Next I executed >> > >> > 'ceph osd crush tunables optimal' >> > >> > to upgrade CRUSH mapping. >> >> Okay...you know that's a data movement command, right? > > Yes. > >> So you should expect it to impact operati

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Gregory Farnum
Okay, assuming this is semi-predictable, can you start up one of the OSDs that is going to fail with "debug osd = 20", "debug filestore = 20", and "debug ms = 1" in the config file and then put the OSD log somewhere accessible after it's crashed? Can you also verify that all of your monitors are r
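
In ceph.conf form, scoped to just the failing daemon (osd.3 is illustrative):

    [osd.3]
    debug osd = 20
    debug filestore = 20
    debug ms = 1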

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Gregory Farnum
On Mon, Nov 3, 2014 at 11:41 AM, Chad Seys wrote: > On Monday, November 03, 2014 13:22:47 you wrote: >> Okay, assuming this is semi-predictable, can you start up one of the >> OSDs that is going to fail with "debug osd = 20", "debug filestore = >> 20", and "debug ms = 1" in the config file and the

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Gregory Farnum
On Mon, Nov 3, 2014 at 12:28 PM, Chad Seys wrote: > On Monday, November 03, 2014 13:50:05 you wrote: >> On Mon, Nov 3, 2014 at 11:41 AM, Chad Seys wrote: >> > On Monday, November 03, 2014 13:22:47 you wrote: >> >> Okay, assuming this is semi-predictable, can you start up one of the >> >> OSDs tha

Re: [ceph-users] Crash with rados cppool and snapshots

2014-11-05 Thread Gregory Farnum
On Thu, Oct 30, 2014 at 4:48 AM, Daniel Schneller wrote: > Apart from the current "there is a bug" part, is the idea to copy a snapshot > into a new pool a viable one for a full-backup-restore? Well, kinda? I mean, it's funneling everything through a single node and relies on being able to copy t

Re: [ceph-users] osd 100% cpu, very slow writes

2014-11-05 Thread Gregory Farnum
On Thu, Oct 30, 2014 at 8:13 AM, Cristian Falcas wrote: > Hello, > > I have an one node ceph installation and when trying to import an > image using qemu, it works fine for some time and after that the osd > process starts using ~100% of cpu and the number of op/s increases and > the writes decrea

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-05 Thread Gregory Farnum
On Wed, Nov 5, 2014 at 7:24 AM, Chad Seys wrote: > Hi Sam, > >> Incomplete usually means the pgs do not have any complete copies. Did >> you previously have more osds? > > No. But could have OSDs quitting after hitting assert(0 == "we got a bad > state machine event"), or interacting with kernel

Re: [ceph-users] Strange configuration with many SAN and few servers

2014-11-07 Thread Gregory Farnum
Yes, you can get the OSDs back if you replace the server. In fact, in your case you might not want to bother including hosts as a distinguishable entity in the crush map; and then to "replace the server" you could just mount the LUNs somewhere else and turn on the OSDs. You would need to set a few
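
A sketch of such a rule (name illustrative; decompile the CRUSH map, add it, recompile, and point the pools at it):

    rule replicated_osd {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type osd
        step emit
    }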

Re: [ceph-users] Ceph Monitoring with check_MK

2014-11-07 Thread Gregory Farnum
I believe we use base-2 space accounting everywhere. Joao could confirm that. -Greg On Fri, Nov 7, 2014 at 5:50 AM Robert Sander wrote: > Hi, > > I just created a simple check_MK agent plugin and accompanying checks to > monitor the overall health status and pool usage with the check_MK / OMD >

Re: [ceph-users] Cache pressure fail

2014-11-07 Thread Gregory Farnum
Did you upgrade your clients along with the MDS? This warning indicates the MDS asked the clients to boot some inodes out of cache and they have taken too long to do so. It might also just mean that you're actively using more inodes at any given time than your MDS is configured to keep in memory.
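
The relevant knob in MDSes of that era counts inodes, not bytes (the value is illustrative; the default was 100000):

    [mds]
    mds cache size = 300000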

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-07 Thread Gregory Farnum
e so I uploaded it to another one: >>> >>> http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz >>> >>> Thanks, >>> >>> Jasper >>> >>> >>> From: gregory.far...@inktank.com [gregory.f

Re: [ceph-users] MDS slow, logging rdlock failures

2014-11-07 Thread Gregory Farnum
On Fri, Nov 7, 2014 at 2:40 PM, Erik Logtenberg wrote: > Hi, > > My MDS is very slow, and it logs stuff like this: > > 2014-11-07 23:38:41.154939 7f8180a31700 0 log_channel(default) log > [WRN] : 2 slow requests, 1 included below; oldest blocked for > > 187.777061 secs > 2014-11-07 23:38:41.15495

Re: [ceph-users] Strange configuration with many SAN and few servers

2014-11-08 Thread Gregory Farnum
Yep! I mean, you don't do anything to register the osd with the cluster again, you just turn it on and it goes to register its new location. -Greg On Sat, Nov 8, 2014 at 2:35 AM Mario Giammarco wrote: > Gregory Farnum writes: > > > > > > > and then to "replac

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-08 Thread Gregory Farnum
When acting as a cache pool it needs to go do a lookup on the base pool for every object it hasn't encountered before. I assume that's why it's slower. (The penalty should not be nearly as high as you're seeing here, but based on the low numbers I imagine you're running everything on an overloaded

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-08 Thread Gregory Farnum
It's all about the disk accesses. What's the slow part when you dump historic and in-progress ops? On Sat, Nov 8, 2014 at 2:30 PM Loic Dachary wrote: > Hi Greg, > > On 08/11/2014 20:19, Gregory Farnum wrote:> When acting as a cache pool it > needs to go do a lookup o
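
The dumps in question come from the OSD admin socket (osd.0 is illustrative):

    ceph daemon osd.0 dump_historic_ops   # recently completed slow ops, with per-stage timings
    ceph daemon osd.0 dump_ops_in_flight  # ops currently being processed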

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-08 Thread Gregory Farnum
On Sat, Nov 8, 2014 at 3:24 PM, Loic Dachary wrote: > > > On 09/11/2014 00:03, Gregory Farnum wrote: >> It's all about the disk accesses. What's the slow part when you dump >> historic and in-progress ops? > > This is what I see on g1 (6% iowait) Yeah,

Re: [ceph-users] osds fails to start with mismatch in id

2014-11-09 Thread Gregory Farnum
On Sun, Nov 9, 2014 at 3:21 PM, Ramakrishna Nishtala (rnishtal) wrote: > Hi > > I am on ceph 0.87, RHEL 7 > > Out of 60 few osd’s start and the rest complain about mismatch about id’s as > below. > > > > 2014-11-09 07:09:55.501177 7f4633e01880 -1 OSD id 56 != my id 53 > > 2014-11-09 07:09:55.81004

Re: [ceph-users] Clone field from rados df command

2014-11-09 Thread Gregory Farnum
clones == snapshotted objects On Sun, Nov 9, 2014 at 9:59 PM Mallikarjun Biradar < mallikarjuna.bira...@gmail.com> wrote: > Anybody observed this? > > On Thu, Oct 30, 2014 at 12:18 PM, Mallikarjun Biradar < > mallikarjuna.bira...@gmail.com> wrote: > >> What exactly "clone" field from "rados df"mea
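
A minimal demonstration with pool snapshots (names illustrative): overwrite an object after snapshotting its pool, and the preserved pre-snapshot version appears in the clones column:

    rados -p testpool put obj1 ./fileA
    rados -p testpool mksnap snap1
    rados -p testpool put obj1 ./fileB   # the snapshotted version is kept as a clone
    rados df                             # the clones column for testpool now shows 1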

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-10 Thread Gregory Farnum
4-08-21 > 15:14:45.430926 602'3131 2014-08-18 15:14:37.494913 > > Is there a way to solve this? > > Kind regards, > > Jasper > > From: Gregory Farnum [g...@gregs42.com] > Sent: Friday, November 7, 2014 2

Re: [ceph-users] Node down question

2014-11-10 Thread Gregory Farnum
On Mon, Nov 10, 2014 at 2:21 PM, Jason wrote: > I have searched the list archives, and have seen a couple of references > to this question, but no real solution, unfortunately... > > We are running multiple ceph clusters, pretty much as media appliances. > As such, the number of nodes is variable,

Re: [ceph-users] Triggering shallow scrub on OSD where scrub is already in progress

2014-11-10 Thread Gregory Farnum
On Sun, Nov 9, 2014 at 9:29 PM, Mallikarjun Biradar wrote: > Hi all, > > Triggering shallow scrub on OSD where scrub is already in progress, restarts > scrub from beginning on that OSD. > > > Steps: > Triggered shallow scrub on an OSD (Cluster is running heavy IO) > While scrub is in progress, tri

Re: [ceph-users] long term support version?

2014-11-11 Thread Gregory Farnum
Yep! Every other stable release gets the LTS treatment. We're still fixing bugs and backporting some minor features to Dumpling, but haven't done any serious updates to Emperor since Firefly came out. Giant will be superseded by Hammer in the February timeframe, if I have my dates right. -Greg On T
