It sounds like maybe your PG counts on your pools are too low and so
you're just getting a bad balance. If that's the case, you can
increase the PG count with "ceph osd pool set <pool> pg_num <count>".
OSDs should get data approximately equal to <their weight>/<total weight of all OSDs>, so higher weights get more data and all its associated
traffic.
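For example (a sketch only: "rbd" and 256 are placeholders, so pick values appropriate for your pools and OSD count, and remember to raise pgp_num as well or the new PGs won't actually be rebalanced):

    ceph osd pool set rbd pg_num 256
    ceph osd pool set rbd pgp_num 256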
I don't think this is anything we've observed before. Normally when a
Ceph node is using more memory than its peers it's a consequence of
something in that node getting backed up. You might try looking at the
perf counters via the admin socket and seeing if something about them
is different between the nodes.
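For reference, something like this dumps the counters via the admin socket; the daemon id and socket path here are just the defaults and may differ on your install:

    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump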
On Tue, Nov 5, 2013 at 2:39 PM, Dickson, Matt MR
wrote:
> UNOFFICIAL
>
> Hi,
>
> I'm new to Ceph and investigating how objects can be aged off, i.e. delete all
> objects older than 7 days. Is there functionality to do this via the Ceph
> SWIFT api or alternatively using a Java rados library?
Not in
On Thu, Nov 7, 2013 at 1:26 AM, lixuehui wrote:
> Hi all
> The Ceph object store service can span geographical locales. Ceph also
> provides FS and RBD. If our applications need the RBD service, can we
> provide backup and disaster recovery for it via the gateway through some
> transformation? I
On Thu, Nov 7, 2013 at 6:43 AM, Kenneth Waegeman
wrote:
> Hi everyone,
>
> I just started to look at the documentation of Ceph and I've hit something I
> don't understand.
> It's about something on http://ceph.com/docs/master/architecture/
>
> """
> use the following steps to compute PG IDs.
>
> T
On Wed, Nov 6, 2013 at 6:05 AM, Gautam Saxena wrote:
> I'm a little confused -- does CEPH support incremental snapshots of either
> VMs or the CEPH-FS? I saw in the release notes for "dumpling" release
> (http://ceph.com/docs/master/release-notes/#v0-67-dumpling) this statement:
> "The MDS now dis
Well, as you've noted you're getting some slow requests on the OSDs
when they turn back on; and then the iSCSI gateway is panicking
(probably because the block device write request is just hanging).
We've gotten prior reports that iSCSI is a lot more sensitive to a few
slow requests than most use cases.
> On 11/7/13 9:59 PM, "Gregory Farnum" wrote:
>
> >It sounds like maybe your PG counts on your pools are too low and so
> >you're just getting a bad balance.
> From: Gregory Farnum <g...@inktank.com>
> Date: Friday, November 8, 2013 11:00 AM
> To: Kevin Weiler <kevin.wei...@imc-chicago.com>
> C
This is probably a result of some difficulties that CRUSH has when using
pool sizes equal to the total number of buckets it can choose from. We made
some changes to the algorithm earlier this year to deal with it, but if
using a kernel client you need a very new one to be compatible so we
haven't e
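If you do want to try the newer behaviour, the tunables profile is set cluster-wide with the command below (it appears elsewhere in this archive as well), but check your kernel client versions first, since older kernels won't understand the newer values:

    ceph osd crush tunables optimal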
I made a ticket for this: http://tracker.ceph.com/issues/6740
Thanks for the bug report!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Fri, Nov 8, 2013 at 1:51 AM, Michael Lukzak wrote:
> Hi,
>
> News. I tried activate disk without --dmcrypt and there is no problem. After
On Fri, Nov 8, 2013 at 8:49 AM, Listas wrote:
> Hi !
>
> I have clusters (IMAP service) with 2 members configured with Ubuntu + Drbd
> + Ext4. Intend to migrate to the use of Ceph and begin to allow distributed
> access to the data.
>
> Does Ceph provide the distributed filesystem and block device
s in use
> MALLOC: 8192 Tcmalloc page size
> --------
> Call ReleaseFreeMemory() to release freelist memory to the OS (via
> madvise()).
> Bytes released to the
>
> On Fri, Nov 8, 2013 at 12:03 PM, G
On Mon, Nov 11, 2013 at 2:16 AM, Ирек Фасихов wrote:
> Hello community.
>
> I do not understand the argument: ceph osd thrash.
> Why is this option needed?
> I cannot find a description of the parameter in the documentation. Where
> can I read a more detailed description of it?
> Than
How did you generate these scenarios? At first glance it looks to me
like you've got very low limits set on how many PGs an OSD can be
recovering at once, and in the first example they were all targeted to
that one OSD, while in the second they were distributed.
-Greg
Software Engineer #42 @ http:/
On Wed, Nov 13, 2013 at 8:34 AM, Oliver Schulz wrote:
> Dear Ceph Experts,
>
> We're running a production Ceph cluster with Ceph Dumpling,
> with Ubuntu 12.04.3 (kernel 3.8) on the cluster nodes and
> all clients. We're mainly using CephFS (kernel) and RBD
> (kernel and user-space/libvirt).
>
> Wo
Ah, the CRUSH tunables basically don't impact placement at all unless
CRUSH fails to do a placement for some reason. What you're seeing here
is the result of a pseudo-random imbalance. Increasing your PG and
pgp_num counts on the data pool should resolve it (though at the cost
of some data movement
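Roughly (the pool name here is the data pool from this thread and 512 is just an example target; check the current value first and raise both numbers):

    ceph osd pool get data pg_num
    ceph osd pool set data pg_num 512
    ceph osd pool set data pgp_num 512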
I'm not too familiar with the toolchain you're using, so can you
clarify what problem you're seeing with CephFS here?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Nov 13, 2013 at 12:06 PM, M. Piscaer wrote:
>
> Hi,
>
> I have an webcluster setup, where on the loadba
Yes, you've run across an issue. It's identified and both the
preventive fix and the resolver tool are in testing.
See:
The thread "Emperor upgrade bug 6761"
http://tracker.ceph.com/issues/6761#note-19
(Forgive my brevity; it's late here. :)
-Greg
Software Engineer #42 @ http://inktank.com | http:/
Search the docs or mailing list archives for a discussion of the CRUSH
tunables; you can see that's the problem here because CRUSH is only
mapping the PG to one OSD instead of two. :)
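You can see the incomplete mapping directly; the PG id below is just an example:

    ceph pg map 0.5    # a pool with size 2 should list two OSDs in "up" and "acting"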
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Thu, Nov 14, 2013 at 4:40 AM, Ugis wrote:
12:04 AM, M. Piscaer wrote:
> Greg,
>
> What do you mean,
>
> What info do you need?
>
> Kind regards,
>
> Michiel Piscaer
>
>
> On wo, 2013-11-13 at 16:19 -0800, Gregory Farnum wrote:
>> I'm not too familiar with the toolchain you're using, so
All,
We've just discovered an issue that impacts some users running any of
the "ceph osd pool set" family of commands while some of their
monitors are running Dumpling and some are running Emperor. Doing so
can result in the commands being interpreted incorrectly and your
cluster being accidentall
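One way to confirm what each monitor is running before touching those commands is something like the following (the monitor id is a placeholder, and older releases may only accept it per monitor rather than with a wildcard):

    ceph tell mon.a version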
On Thu, Nov 21, 2013 at 10:13 AM, John-Paul Robinson wrote:
> Is this statement accurate?
>
> As I understand DRBD, you can replicate online block devices reliably,
> but with Ceph the replication for RBD images requires that the file
> system be offline.
It's not clear to me what replication you
On Thu, Nov 21, 2013 at 7:52 AM, Ugis wrote:
> Thanks, reread that section in docs and found tunables profile - nice
> to have, hadn't noticed it before (ceph docs develop so fast that you
> need RSS to follow all changes :) )
>
> Still problem persists in a different way.
> Did set profile "optima
What do you mean the filesystem disappears? Is it possible you're just
pushing more traffic to the disks than they can handle, and not waiting
long enough for them to catch up?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Thu, Nov 21, 2013 at 5:19 PM, Alphe Salas Michels
On Mon, Nov 25, 2013 at 5:58 AM, Mark Nelson wrote:
> On 11/25/2013 07:21 AM, Shu, Xinxin wrote:
>>
>> Recently , I want to enable rbd cache to identify performance benefit. I
>> add rbd_cache=true option in my ceph configure file, I use ’virsh
>> attach-device’ to attach rbd to vm, below is my vd
On Mon, Nov 25, 2013 at 8:10 AM, Laurent Barbe wrote:
> Hello,
>
> Since yesterday, scrub has detected an inconsistent pg :( :
>
> # ceph health detail    (ceph version 0.61.9)
> HEALTH_ERR 1 pgs inconsistent; 9 scrub errors
> pg 3.136 is active+clean+inconsistent, acting [9,1]
> 9 scrub errors
>
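That reply is cut off here, but the usual next step once the OSD logs show which copy is damaged is a repair; be aware that repair copies from the primary, so confirm the primary holds the good copy first:

    ceph pg repair 3.136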
It's generated from a .dot file which you can render as you like. :)
Please be aware that that diagram is for developers and will be
meaningless without that knowledge.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Nov 25, 2013 at 6:42 AM, Regola, Nathan (Contractor)
The largest group of threads is those from the network messenger — in
the current implementation it creates two threads per process the
daemon is communicating with. That's two threads for each OSD it
shares PGs with, and two threads for each client which is accessing
any data on that OSD.
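As a rough worked example (the numbers are invented for illustration): an OSD sharing PGs with 100 other OSDs and serving 50 clients would carry on the order of 2 x (100 + 50) = 300 messenger threads, plus its internal worker threads. You can count what a daemon actually has with something like this, assuming a single ceph-osd on the host:

    grep Threads /proc/$(pidof ceph-osd)/status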
-Greg
So
On Wed, Nov 27, 2013 at 1:31 AM, Jens-Christian Fischer
wrote:
>> The largest group of threads is those from the network messenger — in
>> the current implementation it creates two threads per process the
>> daemon is communicating with. That's two threads for each OSD it
>> shares PGs with, and t
On Wed, Nov 27, 2013 at 7:28 AM, Mark Nelson wrote:
> On 11/27/2013 09:25 AM, Gregory Farnum wrote:
>>
>> On Wed, Nov 27, 2013 at 1:31 AM, Jens-Christian Fischer
>> wrote:
>>>>
>>>> The largest group of threads is those from the network messenger —
On Thu, Nov 28, 2013 at 5:52 PM, Walter Huf wrote:
> A long time ago I got my MDS cluster into a state where I have two active
> MDS nodes with a third for failover. This setup is not perfectly stable, so
> I want to drop down to one active MDS node with two nodes for failover. Is
> there any docu
>> "alg": "straw",
>> "hash": "rjenkins1",
>> "items": [
>>         { "id": 6,
>>           "weight": 178913,
>>           "pos": 0},
>>         { "id
I imagine you aren't actually using the data/metadata pool that these
PGs are in, but it's a previously-reported bug we haven't identified:
http://tracker.ceph.com/issues/8758
They should go away if you restart the OSDs that host them (or just
remove those pools), but it's not going to hurt anythin
> I rebuilt the primary OSD (29) in the hopes it would unblock whatever it
> was, but no luck. I'll check the admin socket and see if there is anything I
> can find there.
>
> On Tue, Sep 30, 2014 at 10:36 AM, Gregory Farnum wrote:
>>
>> On Tuesday, September 30, 201
On Tuesday, September 30, 2014, Robert LeBlanc wrote:
> On our dev cluster, I've got a PG that won't create. We had a host fail
> with 10 OSDs that needed to be rebuilt. A number of other OSDs were down
> for a few days (did I mention this was a dev cluster?). The other OSDs
> eventually came up
On Wed, Oct 1, 2014 at 5:24 AM, Andrei Mikhailovsky wrote:
> Timur,
>
> As far as I know, the latest master has a number of improvements for ssd
> disks. If you check the mailing list discussion from a couple of weeks back,
> you can see that the latest stable firefly is not that well optimised fo
On Wed, Oct 1, 2014 at 7:07 AM, Andrei Mikhailovsky wrote:
>
> Greg, are they going to be a part of the next stable release?
>
> Cheers
> ________
>
> From: "Gregory Farnum"
> To: "Andrei Mikhailovsky"
On Wed, Oct 1, 2014 at 9:21 AM, Mark Nelson wrote:
> On 10/01/2014 11:18 AM, Gregory Farnum wrote:
>>
>> All the stuff I'm aware of is part of the testing we're doing for
>> Giant. There is probably ongoing work in the pipeline, but the fast
>> dispatch, shar
> Jasper Siero
>
>
> ____
> Van: Jasper Siero
> Verzonden: donderdag 21 augustus 2014 16:43
> Aan: Gregory Farnum
> Onderwerp: RE: [ceph-users] mds isn't working anymore after osd's running full
>
> I did restart it but you
Check your clock sync on that node. That's the usual cause of this issue.
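On the affected node, something like this shows whether ntp is actually keeping the clock in sync; skew that the monitors notice also shows up in the health output:

    ntpq -p
    ceph health detail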
-Greg
On Wednesday, October 8, 2014, Nathan Stratton wrote:
> I have one out of 16 of my OSDs doing something odd. The logs show some
> sort of authentication issue. If I restart the OSD things are fine, but in
> a few hou
On Thu, Oct 9, 2014 at 10:55 AM, Johnu George (johnugeo)
wrote:
> Hi All,
> I have few questions regarding the Primary affinity. In the
> original blueprint
> (https://wiki.ceph.com/Planning/Blueprints/Firefly/osdmap%3A_primary_role_affinity
> ), one example has been given.
>
> For PG x
On Thu, Oct 9, 2014 at 4:24 PM, Johnu George (johnugeo)
wrote:
> Hi Greg,
> Thanks for your extremely informative post. My related questions
> are posted inline
>
> On 10/9/14, 2:21 PM, "Gregory Farnum" wrote:
>
>>On Thu, Oct 9, 2014 at 10:55 AM, Johnu
On Thu, Oct 9, 2014 at 4:01 PM, Robert LeBlanc wrote:
> I have a question regarding submitting blueprints. Should only people who
> intend to do the work of adding/changing features of Ceph submit blueprints?
> I'm not primarily a programmer (but can do programming if needed), but have
> a feature
:
> Hello Greg,
>
> No problem thanks for looking into the log. I attached the log to this email.
> I'm looking forward for the new release because it would be nice to have more
> possibilities to diagnose problems.
>
> Kind regards,
>
> Jasper Siero
>
On Sun, Oct 12, 2014 at 7:46 AM, Loic Dachary wrote:
> Hi,
>
> On a 0.80.6 cluster the command
>
> ceph tell osd.6 version
>
> hangs forever. I checked that it establishes a TCP connection to the OSD,
> raised the OSD debug level to 20 and I do not see
>
> https://github.com/ceph/ceph/blob/firefl
On Sun, Oct 12, 2014 at 9:10 AM, Loic Dachary wrote:
>
>
> On 12/10/2014 17:48, Gregory Farnum wrote:
>> On Sun, Oct 12, 2014 at 7:46 AM, Loic Dachary wrote:
>>> Hi,
>>>
>>> On a 0.80.6 cluster the command
>>>
>>> ceph tell osd.6 versio
On Sun, Oct 12, 2014 at 9:29 AM, Loic Dachary wrote:
>
>
> On 12/10/2014 18:22, Gregory Farnum wrote:
>> On Sun, Oct 12, 2014 at 9:10 AM, Loic Dachary wrote:
>>>
>>>
>>> On 12/10/2014 17:48, Gregory Farnum wrote:
>>>> On Sun, Oct 12, 2014 at
On Mon, Oct 13, 2014 at 11:32 AM, Martin Mailand wrote:
> Hi List,
>
> I have a ceph cluster setup with two networks, one for public traffic
> and one for cluster traffic.
> Network failures in the public network are handled quite well, but
> network failures in the cluster network are handled ver
On Mon, Oct 13, 2014 at 4:04 PM, Wido den Hollander wrote:
> On 14-10-14 00:53, Anthony Alba wrote:
>> Following the manual starter guide, I set up a Ceph cluster with HEALTH_OK,
>> (1 mon, 2 osd). In testing out auth commands I misconfigured the
>> client.admin key by accidentally deleting "mon
On Monday, October 13, 2014, Lionel Bouton wrote:
> Hi,
>
> # First a short description of our Ceph setup
>
> You can skip to the next section ("Main questions") to save time and
> come back to this one if you need more context.
>
> We are currently moving away from DRBD-based storage backed by R
On Monday, October 13, 2014, Anthony Alba wrote:
> > You can disable cephx completely, fix the key and enable cephx again.
> >
> > auth_cluster_required, auth_service_required and auth_client_required
>
> That did not work: i.e disabling cephx in the cluster conf and
> restarting the cluster.
> T
version is ceph version 0.86 (97dcc0539dfa7dac3de74852305d51580b7b1f82).
>
> On 13.10.2014 21:45, Gregory Farnum wrote:
>> How did you test taking down the connection?
>> What config options have you specified on the OSDs and in the monitor?
>>
>> None of the scenarios you're describing make mu
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> On Wed, Oct 8, 2014 at 7:11 AM, Jasper Siero
> wrote:
>> Hello Greg,
>>
>> No problem thanks for looking into the log. I attached the log to this email.
>> I'm looking forward for the new release
On Wed, Oct 15, 2014 at 9:39 AM, Dmitry Borodaenko
wrote:
> On Tue, Sep 30, 2014 at 6:49 PM, Dmitry Borodaenko
> wrote:
>> Last stable Firefly release (v0.80.5) was tagged on July 29 (over 2
>> months ago). Since then, there were twice as many commits merged into
>> the firefly branch than there
If you're running a single client to drive these tests, that's your
bottleneck. Try running multiple clients and aggregating their numbers.
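For example, launch the same benchmark from several nodes at the same time and add up the results; rados bench is shown here just as an illustration (the pool name is a placeholder, and its object names already include the hostname and pid so concurrent writers don't collide):

    rados bench -p testpool 60 write -t 16 --no-cleanup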
-Greg
On Thursday, October 16, 2014, Mark Wu wrote:
> Hi list,
>
> During my test, I found ceph doesn't scale as I expected on a 30 osds
> cluster.
> The fo
n reach the peak. The client is fio and also running on osd nodes.
> But there're no bottlenecks on cpu or network. I also tried running client
> on two non osd servers, but the same result.
>
> On Oct 17, 2014, at 12:29 AM, "Gregory Farnum" wrote:
>
>> If you're
This is a common constraint in many erasure-coded storage systems. It
arises because random writes turn into a read-modify-write cycle (in order
to redo the parity calculations). So we simply disallow them in EC pools,
which works fine for the target use cases right now.
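The usual way to put RBD or other random-write workloads in front of an EC pool in this era is a replicated cache tier; a sketch, with placeholder pool names:

    ceph osd tier add ecpool hotpool
    ceph osd tier cache-mode hotpool writeback
    ceph osd tier set-overlay ecpool hotpool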
-Greg
On Monday, October 2
On Mon, Oct 20, 2014 at 8:25 AM, Lionel Bouton wrote:
> Hi,
>
> More information on our Btrfs tests.
>
> Le 14/10/2014 19:53, Lionel Bouton a écrit :
>
>
>
> Current plan: wait at least a week to study 3.17.0 behavior and upgrade the
> 3.12.21 nodes to 3.17.0 if all goes well.
>
>
> 3.17.0 and 3.1
On Tuesday, October 21, 2014, Chad Seys wrote:
> Hi Craig,
>
> > It's part of the way the CRUSH hashing works. Any change to the CRUSH
> map
> > causes the algorithm to change slightly.
>
> Dan@cern could not replicate my observations, so I plan to follow his
> procedure (fake create an OSD, wai
On Tue, Oct 21, 2014 at 10:15 AM, Lionel Bouton wrote:
> Hi,
>
> I've yet to install 0.80.7 on one node to confirm its stability and use
> the new IO priority tuning parameters enabling prioritized access to
> data from client requests.
>
> In the meantime, faced with large slowdowns caused by re
Are these tests conducted using a local fs on RBD, or using CephFS?
If CephFS, do you have multiple clients mounting the FS, and what are
they doing? What client (kernel or ceph-fuse)?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Oct 21, 2014 at 9:05 AM, Sergey Nazar
ctivity.
> Cluster is working on Debian Wheezy, kernel 3.2.0-4-amd64.
>
> On Tue, Oct 21, 2014 at 1:44 PM, Gregory Farnum wrote:
>> Are these tests conducted using a local fs on RBD, or using CephFS?
>> If CephFS, do you have multiple clients mounting the FS, and what are
>
There's temporarily an issue in the master branch that makes rbd reads
greater than the cache size hang (if the cache is on). This might be
that. (Jason is working on it: http://tracker.ceph.com/issues/9854)
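If you think you're hitting it, turning the cache off in the client section of ceph.conf is a quick way to confirm (assuming the standard option name):

    [client]
        rbd cache = false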
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Thu, Oct 23, 2014 at 5
On Tue, Oct 28, 2014 at 3:24 AM, Cristian Falcas
wrote:
> Hello,
>
> In the documentation about creating an cache pool, you find this:
>
> "Cache mode
>
> The most important policy is the cache mode:
>
> ceph osd pool set foo-hot cache-mode writeback"
>
> But when trying to run the above command,
On Mon, Oct 27, 2014 at 11:37 AM, Patrick Darley
wrote:
> Hi there
>
> Over the last week or so, I've been trying to connect a ceph monitor node
> running on a Baserock system
> to a simple 3-node Ubuntu ceph cluster.
>
> The 3-node Ubuntu cluster was created by following the documente
ing the undumping and starting the mds:
> http://pastebin.com/y14pSvM0
>
> Kind Regards,
>
> Jasper
>
> Van: john.sp...@inktank.com [john.sp...@inktank.com] namens John Spray
> [john.sp...@redhat.com]
> Verzonden: donderdag 16 oktober 2014 12:23
>
has gone wrong. You don't need to run it
from the new monitor, so if you're having trouble getting the keys to
behave I'd just run it from an existing system. :)
-Greg
On Tue, Oct 28, 2014 at 10:11 AM, Patrick Darley
wrote:
> On 2014-10-28 16:08, Gregory Farnum wrote:
>>
On Thu, Oct 23, 2014 at 6:41 AM, Chris Kitzmiller
wrote:
> On Oct 22, 2014, at 8:22 PM, Craig Lewis wrote:
>
> Shot in the dark: try manually deep-scrubbing the PG. You could also try
> marking various osd's OUT, in an attempt to get the acting set to include
> osd.25 again, then do the deep-scru
[Re-adding the list, so this is archived for future posterity.]
On Wed, Oct 29, 2014 at 6:11 AM, Patrick Darley
wrote:
>
> Thanks again for the reply Greg!
>
> On 2014-10-28 17:39, Gregory Farnum wrote:
>>
>> I'm sorry, you're right — I misread it. :(
>
On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
wrote:
> Hello Greg,
>
> I added the debug options which you mentioned and started the process again:
>
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file
> /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph
> --res
Dan (who wrote that slide deck) is probably your best bet here, but I
believe pool deletion is not very configurable and fairly expensive
right now. I suspect that it will get better in Hammer or Infernalis,
once we have a unified op work queue that we can independently
prioritize all IO through (t
On Wed, Oct 29, 2014 at 7:49 AM, Daniel Schneller
wrote:
> Hi!
>
> We are exploring options to regularly preserve (i.e. backup) the
> contents of the pools backing our rados gateways. For that we create
> nightly snapshots of all the relevant pools when there is no activity
> on the system to get
On Fri, Oct 31, 2014 at 9:55 AM, Narendra Trivedi (natrived)
wrote:
> Hi All,
>
>
>
> I have been working with Openstack Swift + radosgw to stress the whole
> object storage from the Swift side (I have been creating containers and
> objects for days now) but can’t actually find the limitation when
s been configured?
>
> --Narendra
>
> -----Original Message-----
> From: Gregory Farnum [mailto:g...@gregs42.com]
> Sent: Friday, October 31, 2014 11:58 AM
> To: Narendra Trivedi (natrived)
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Swift + radosgw: How do I fi
What happened when you did the OSD prepare and activate steps?
Since your OSDs are either not running or can't communicate with the
monitors, there should be some indication from those steps.
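For reference, if you used ceph-deploy those steps look roughly like this; the host and device names are placeholders:

    ceph-deploy osd prepare node1:/dev/sdb
    ceph-deploy osd activate node1:/dev/sdb1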
-Greg
On Sun, Nov 2, 2014 at 6:44 AM Shiv Raj Singh wrote:
> Hi All
>
> I am new to ceph and I have been
On Mon, Nov 3, 2014 at 7:46 AM, Chad Seys wrote:
> Hi All,
>I upgraded from emperor to firefly. Initial upgrade went smoothly and all
> placement groups were active+clean .
> Next I executed
> 'ceph osd crush tunables optimal'
> to upgrade CRUSH mapping.
Okay...you know that's a data mov
On Mon, Nov 3, 2014 at 4:40 AM, Thomas Lemarchand
wrote:
> Update :
>
> /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746084]
> [21787] 0 21780 492110 185044 920 240143 0
> ceph-mon
> /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746115]
> [131
[ Re-adding the list. ]
On Mon, Nov 3, 2014 at 10:49 AM, Chad Seys wrote:
>
>> > Next I executed
>> >
>> > 'ceph osd crush tunables optimal'
>> >
>> > to upgrade CRUSH mapping.
>>
>> Okay...you know that's a data movement command, right?
>
> Yes.
>
>> So you should expect it to impact operati
Okay, assuming this is semi-predictable, can you start up one of the
OSDs that is going to fail with "debug osd = 20", "debug filestore =
20", and "debug ms = 1" in the config file and then put the OSD log
somewhere accessible after it's crashed?
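That is, something like this in ceph.conf on that host (the OSD id is a placeholder for whichever one is crashing):

    [osd.12]
        debug osd = 20
        debug filestore = 20
        debug ms = 1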
Can you also verify that all of your monitors are r
On Mon, Nov 3, 2014 at 11:41 AM, Chad Seys wrote:
> On Monday, November 03, 2014 13:22:47 you wrote:
>> Okay, assuming this is semi-predictable, can you start up one of the
>> OSDs that is going to fail with "debug osd = 20", "debug filestore =
>> 20", and "debug ms = 1" in the config file and the
On Mon, Nov 3, 2014 at 12:28 PM, Chad Seys wrote:
> On Monday, November 03, 2014 13:50:05 you wrote:
>> On Mon, Nov 3, 2014 at 11:41 AM, Chad Seys wrote:
>> > On Monday, November 03, 2014 13:22:47 you wrote:
>> >> Okay, assuming this is semi-predictable, can you start up one of the
>> >> OSDs tha
On Thu, Oct 30, 2014 at 4:48 AM, Daniel Schneller
wrote:
> Apart from the current "there is a bug" part, is the idea to copy a snapshot
> into a new pool a viable one for a full-backup-restore?
Well, kinda? I mean, it's funneling everything through a single node
and relies on being able to copy t
On Thu, Oct 30, 2014 at 8:13 AM, Cristian Falcas
wrote:
> Hello,
>
> I have an one node ceph installation and when trying to import an
> image using qemu, it works fine for some time and after that the osd
> process starts using ~100% of cpu and the number of op/s increases and
> the writes decrea
On Wed, Nov 5, 2014 at 7:24 AM, Chad Seys wrote:
> Hi Sam,
>
>> Incomplete usually means the pgs do not have any complete copies. Did
>> you previously have more osds?
>
> No. But could have OSDs quitting after hitting assert(0 == "we got a bad
> state machine event"), or interacting with kernel
Yes, you can get the OSDs back if you replace the server.
In fact, in your case you might not want to bother including hosts as a
distinguishable entity in the crush map; and then to "replace the server"
you could just mount the LUNs somewhere else and turn on the OSDs. You
would need to set a few
I believe we use base-2 space accounting everywhere. Joao could confirm on
that.
-Greg
On Fri, Nov 7, 2014 at 5:50 AM Robert Sander
wrote:
> Hi,
>
> I just create a simple check_MK agent plugin and accompanying checks to
> monitor the overall health status and pool usage with the check_MK / OMD
>
Did you upgrade your clients along with the MDS? This warning indicates the
MDS asked the clients to boot some inodes out of cache and they have taken
too long to do so.
It might also just mean that you're actively using more inodes at any given
time than your MDS is configured to keep in memory.
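The in-memory inode count is controlled by "mds cache size" (I believe the default was 100000 in this era); raising it is just a config change, e.g.:

    [mds]
        mds cache size = 300000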
e so I uploaded it to another one:
>>>
>>> http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz
>>>
>>> Thanks,
>>>
>>> Jasper
>>>
>>>
>>> Van: gregory.far...@inktank.com [gregory.f
On Fri, Nov 7, 2014 at 2:40 PM, Erik Logtenberg wrote:
> Hi,
>
> My MDS is very slow, and it logs stuff like this:
>
> 2014-11-07 23:38:41.154939 7f8180a31700 0 log_channel(default) log
> [WRN] : 2 slow requests, 1 included below; oldest blocked for >
> 187.777061 secs
> 2014-11-07 23:38:41.15495
Yep! I mean, you don't do anything to register the osd with the cluster
again, you just turn it on and it goes to register its new location.
-Greg
On Sat, Nov 8, 2014 at 2:35 AM Mario Giammarco wrote:
> Gregory Farnum writes:
>
> >
> >
> > and then to "replac
When acting as a cache pool it needs to go do a lookup on the base pool for
every object it hasn't encountered before. I assume that's why it's slower.
(The penalty should not be nearly as high as you're seeing here, but based
on the low numbers I imagine you're running everything on an overloaded
It's all about the disk accesses. What's the slow part when you dump
historic and in-progress ops?
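That is, via the admin socket; the OSD id and socket path below are placeholders:

    ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_historic_ops
    ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_ops_in_flight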
On Sat, Nov 8, 2014 at 2:30 PM Loic Dachary wrote:
> Hi Greg,
>
> On 08/11/2014 20:19, Gregory Farnum wrote:> When acting as a cache pool it
> needs to go do a lookup o
On Sat, Nov 8, 2014 at 3:24 PM, Loic Dachary wrote:
>
>
> On 09/11/2014 00:03, Gregory Farnum wrote:
>> It's all about the disk accesses. What's the slow part when you dump
>> historic and in-progress ops?
>
> This is what I see on g1 (6% iowait)
Yeah,
On Sun, Nov 9, 2014 at 3:21 PM, Ramakrishna Nishtala (rnishtal)
wrote:
> Hi
>
> I am on ceph 0.87, RHEL 7
>
> Out of 60, a few OSDs start and the rest complain about an ID mismatch, as
> below.
>
>
>
> 2014-11-09 07:09:55.501177 7f4633e01880 -1 OSD id 56 != my id 53
>
> 2014-11-09 07:09:55.81004
clones == snapshotted objects
On Sun, Nov 9, 2014 at 9:59 PM Mallikarjun Biradar <
mallikarjuna.bira...@gmail.com> wrote:
> Anybody observed this?
>
> On Thu, Oct 30, 2014 at 12:18 PM, Mallikarjun Biradar <
> mallikarjuna.bira...@gmail.com> wrote:
>
>> What exactly does the "clone" field from "rados df" mea
4-08-21
> 15:14:45.430926 602'31312014-08-18 15:14:37.494913
>
> Is there a way to solve this?
>
> Kind regards,
>
> Jasper
>
> Van: Gregory Farnum [g...@gregs42.com]
> Verzonden: vrijdag 7 november 2014 2
On Mon, Nov 10, 2014 at 2:21 PM, Jason wrote:
> I have searched the list archives, and have seen a couple of references
> to this question, but no real solution, unfortunately...
>
> We are running multiple ceph clusters, pretty much as media appliances.
> As such, the number of nodes is variable,
On Sun, Nov 9, 2014 at 9:29 PM, Mallikarjun Biradar
wrote:
> Hi all,
>
> Triggering shallow scrub on OSD where scrub is already in progress, restarts
> scrub from beginning on that OSD.
>
>
> Steps:
> Triggered shallow scrub on an OSD (Cluster is running heavy IO)
> While scrub is in progress, tri
Yep! Every other stable release gets the LTS treatment. We're still fixing
bugs and backporting some minor features to Dumpling, but haven't done any
serious updates to Emperor since Firefly came out. Giant will be superseded
by Hammer in the February timeframe, if I have my dates right.
-Greg
On T