> ...rebalance in both
> Rack buckets and continue running, or only rebalance in one rack bucket,
> resulting in exceeding the full ratio and locking up?
>
> Thanks,
>
> -Tom
>
> -Original Message-
> From: Gregory Farnum [mailto:g...@inktank.com]
> Sent: Tuesday, Marc
[ Re-adding the list. ]
On Mon, Mar 3, 2014 at 3:28 PM, Chris Kitzmiller
wrote:
> On Mar 3, 2014, at 4:19 PM, Gregory Farnum wrote:
>> The apply latency is how long it's taking for the backing filesystem to ack
>> (not sync to disk) writes from the OSD. Either it's gett
Hmm, at first glance it looks like you're using multiple active MDSes
and you've created some snapshots and part of that state got corrupted
somehow. The log files should have a slightly more helpful (including
line numbers) stack trace at the end, and might have more context for
what's gone wrong.
If the stripe size and object size are the same it's just chunking --
that's our default. Should work fine.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
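For illustration only, a minimal sketch of spelling the striping out explicitly at RBD image creation time (flag behaviour can vary a little between releases); a 4 MB stripe unit with a stripe count of 1 is the same as the plain-chunking default described above:

    rbd create test --size 10240 --image-format 2 --order 22 \
        --stripe-unit 4194304 --stripe-count 1
    rbd info test    # shows the resulting order / stripe settings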
On Tue, Mar 11, 2014 at 8:23 AM, Jean-Charles LOPEZ
wrote:
> Hi Dieter,
>
> you have a problem with your command.
>
> You
On Tue, Mar 11, 2014 at 1:38 PM, Sushma Gurram
wrote:
> Hi,
>
>
>
> I'm trying to follow the instructions for QEMU rbd installation at
> http://ceph.com/docs/master/rbd/qemu-rbd/
>
>
>
> I tried to write a raw qemu image to ceph cluster using the following
> command
>
> qemu-img convert -f raw -O
On Tue, Mar 11, 2014 at 2:24 PM, Sushma Gurram
wrote:
> It seems good with master branch. Sorry about the confusion.
>
> On a side note, is it possible to create/access the block device using librbd
> and run fio on it?
...yes? librbd is the userspace library that QEMU is using to access
it to b
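For example, recent fio builds ship an rbd ioengine that talks to librbd directly, so no kernel mapping is needed. A minimal sketch (the pool, image, and client names are assumptions):

    # the image must already exist, e.g.: rbd create fio-test --size 10240
    fio --name=rbd-test --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=fio-test --rw=randwrite --bs=4k --iodepth=32 \
        --direct=1 --runtime=60 --time_based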
On Wednesday, March 12, 2014, Florian Krauß
wrote:
> Hello everyone,
>
> this is the first time I have ever written to a mailing list, so please be patient
> with me (especially with my poor English)...
> I'm trying to reach my Bachelor's degree in Computer Science, and I'm doing a
> project which involves Ceph.
>
On Thu, Mar 13, 2014 at 3:56 PM, Greg Poirier wrote:
> We've been seeing this issue on all of our dumpling clusters, and I'm
> wondering what might be the cause of it.
>
> In dump_historic_ops, the time between op_applied and sub_op_commit_rec or
> the time between commit_sent and sub_op_applied i
> rbd_data.67b14a2ae8944a.8fac [write 3325952~868352] 6.5255f5fd
> e660)",
> "received_at": "2014-03-13 20:41:40.227813",
> "age": "320.017087",
> "duration": "0.086852",
>
alleviated by migrating journals to SSDs, but I am looking to
> rebuild in the near future--so am willing to hobble in the meantime.
>
> I am surprised that our all SSD cluster is also underperforming. I am trying
> colocating the journal on the same disk with all SSDs at the moment
That seems a little high; how do you have your system configured? That
latency is how long it takes for the hard drive to durably write out
something to the journal.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
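One way to look at that latency on a single OSD is the admin socket; a sketch assuming osd.0 and the default socket path:

    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump \
        | python -mjson.tool | grep -A 3 journal_latency
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops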
On Sun, Mar 16, 2014 at 9:59 PM, wrote:
>
> [root@storage1 ~]#
> [osd.8]
> host = storage1
>
> [osd.9]
> host = storage1
>
> [osd.10]
> host = storage1
>
> [osd.11]
> host = storage1
>
> [osd.12]
> host = storage1
>
> [osd.13]
> host = storage1
>
> [osd.14]
> host = storage1
>
> [osd.15]
> h
On Tue, Mar 18, 2014 at 12:20 PM, Sage Weil wrote:
> On Tue, 18 Mar 2014, John Spray wrote:
>> Hi Matt,
>>
>> This is expected behaviour: pool IDs are not reused.
>
> The IDs go up, but I think the 'count' shown there should not.. i.e.
> num_pools != max_pool_id. So probably a subtle bug, I expec
Exactly what errors did you see, from which log? In general the OSD
does suicide on filesystem errors.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Mar 19, 2014 at 4:06 AM, Mike Bryant wrote:
> So I've done some more digging, and running the radosgw in debug mode I
I haven't worked with Hadoop in a while, but from the error it sounds
like the map reduce server needs another config option set specifying
which filesystem to work with. I don't think those instructions you
linked to are tested with hadoop 2.
-Greg
Software Engineer #42 @ http://inktank.com | http
When starting this you should be aware that the filesystem is not yet fully
supported.
On Thursday, March 20, 2014, Jordi Sion wrote:
> Hello,
>
> I plan to setup a Ceph cluster for a small size hosting company. The aim
> is to have customers data (website and mail folders) in a distributed
> cl
I don't remember what features should exist where, but I expect that
the cluster is making use of features that the kernel client doesn't
support yet (despite the very new kernel). Have you checked to see if
there's anything interesting in dmesg?
-Greg
Software Engineer #42 @ http://inktank.com | h
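A sketch of what to check (the exact messages vary by kernel version):

    dmesg | tail -50    # look for lines like "libceph: ... feature set mismatch"
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt   # decompiled map lists the tunables in use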
That is pretty strange, but I think probably there's somehow a mismatch
between the installed versions. Can you check with the --version flag on
both binaries?
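For example:

    ceph --version
    ceph-mon --version
    ceph-osd --version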
On Monday, March 24, 2014, Mark Kirkwood
wrote:
> Hi,
>
> I'm redeploying my development cluster after building 0.78 from src on
> Ubunt
How long does it take for the OSDs to restart? Are you just issuing a
restart command via upstart/sysvinit/whatever? How many OSDMaps are
generated from the time you issue that command to the time the cluster
is healthy again?
This sounds like an issue we had for a while where OSDs would start
pee
On Mon, Mar 24, 2014 at 6:26 PM, hjcho616 wrote:
> I tried the patch twice. First time, it worked. There was no issue.
> Connected back to MDS and was happily running. All three MDS daemons were
> running ok.
>
> Second time though... all three daemons were alive. Health was reported OK.
> Howev
On Tue, Mar 25, 2014 at 9:24 AM, Travis Rhoden wrote:
> Okay, last one until I get some guidance. Sorry for the spam, but wanted to
> paint a full picture. Here are debug logs from all three mons, capturing
> what looks like an election sequence to me:
>
> ceph0:
> 2014-03-25 16:17:24.324846 7fa
evels showing what the individual pipes are doing will
narrow it down on the Ceph side.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Mar 25, 2014 at 10:05 AM, Travis Rhoden wrote:
>
>
>
> On Tue, Mar 25, 2014 at 12:53 PM, Gregory Farnum wrote:
>>
>
on 0.78-325-ge5a4f5e (e5a4f5ed005c9349be94b19ef33d6fe08271c798)
>>>
>>> On 25/03/14 14:16, Mark Kirkwood wrote:
>>>>
>>>> Yeah, that is my feeling too - however both ceph and ceph-mon claim to
>>>> be the same version...and the dates on the var
On Tue, Mar 25, 2014 at 9:56 AM, hjcho616 wrote:
> I am merely putting the client to sleep and waking it up. When it is up,
> running ls on the mounted directory. As far as I am concerned at very high
> level I am doing the same thing. All are running 3.13 kernel Debian
> provided.
>
> When tha
es to 300+
>
> http://pastie.org/pastes/8968950/text?key=0e0bs1ojbm2arnexn52iwq
>
> Regards,
> Quenten
>
> -Original Message-
> From: Gregory Farnum [mailto:g...@inktank.com]
> Sent: Wednesday, 26 March 2014 2:02 AM
> To: Quenten Grasso
> Cc: Kyle Bader; ceph-users@lists.cep
I believe it's just that there was an issue for a while where the
return codes were incorrectly not being filled in, and now they are.
So the prval params you pass in when constructing the compound ops are
where the values will be set.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.
That's not an expected error when running this test; have you
validated that your cluster is working in any other ways? Eg, what's
the output of "ceph -s".
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Fri, Mar 28, 2014 at 5:53 AM, wrote:
> Hi,
>
> When I try "rados -p
At present, the only security permission on the MDS is "allowed to do
stuff", so "rwx" and "*" are synonymous. In general "*" means "is an
admin", though, so you'll be happier in the future if you use "rwx".
You may also want a more restrictive set of monitor capabilities as
somebody else recently
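A minimal sketch of a more restrictive key along those lines (the client name and pool are assumptions; adjust to your setup):

    ceph auth get-or-create client.cephfs-user \
        mon 'allow r' mds 'allow rwx' osd 'allow rwx pool=data'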
Hmm, this might be considered a bit of a design oversight. Looking at
the auth keys is a read operation, and the client has read
permissions...
You might want to explore the more fine-grained command-based monitor
permissions as a workaround, but I've created a ticket to try and
close that read per
Is the mon process doing anything (that is, does it have any CPU
usage)? This looks to be an internal leveldb issue, but not one that
we've run into before, so I think there must be something unique about
the leveldb store involved.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
Not directly. However, that "used" total is compiled by summing up the
output of "df" from each individual OSD disk. There's going to be some
space used up by the local filesystem metadata, by RADOS metadata like
OSD maps, and (depending on configuration) your journal files. 2350MB
/ 48 OSDs = ~49M
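You can see where that overhead lives by comparing the cluster-wide totals with the per-disk numbers on each OSD host; a sketch assuming the default mount paths:

    ceph df
    df -h /var/lib/ceph/osd/ceph-*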
If you wait longer, you should see the remaining OSDs get marked down.
We detect down OSDs in two ways:
1) OSDs heartbeat each other frequently and issue reports when the
heartbeat responses take too long. (This is the main way.)
2) OSDs periodically send statistics to the monitors, and if these
st
> dk
>
> On Mon, Mar 31, 2014 at 12:47 PM, Gregory Farnum wrote:
>>
>> If you wait longer, you should see the remaining OSDs get marked down.
>> We detect down OSDs in two ways:
>> 1) OSDs heartbeat each other frequently and issue reports when the
>> heartbeat res
Can you reproduce this with "debug osd = 20" and "debug ms = 1" set on
the OSD? I think we'll need that data to track down what exactly has
gone wrong here.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
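For example, to bump those settings on a running OSD without a restart (osd.3 is a placeholder), or to make them persistent in ceph.conf:

    ceph tell osd.3 injectargs '--debug-osd 20 --debug-ms 1'

    # or under [osd] in ceph.conf, then restart the daemon:
    #   debug osd = 20
    #   debug ms = 1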
On Mon, Mar 31, 2014 at 1:22 PM, Aaron Ten Clay wrote:
> Hello fellow Ce
Yes, Zheng's fix for the MDS crash is in current mainline and will be
in the next Firefly RC release.
Sage, is there something else we can/should be doing when a client
goes to sleep that we aren't already? (ie, flushing out all dirty data
or something and disconnecting?)
-Greg
Software Engineer #
On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng wrote:
> On Tue, Apr 1, 2014 at 10:02 PM, Kenneth Waegeman
> wrote:
>> After some more searching, I've found that the source of the problem is with
>> the mds and not the mon.. The mds crashes, generates a core dump that eats
>> the local space, and in t
fails (the first of
"mds.0.16 is_laggy 600.641332 > 15 since last acked beacon") and see
if there's anything tell-tale going on at the time.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Apr 2, 2014 at 3:39 AM, Kenneth Waegeman
wrote:
>
>
Emperor release (0.72) ?
>
> I think I have the same problem than hjcho616 : Debian Wheezy with 3.13
> backports, and MDS dying when a client shutdown.
>
> On 03/31/2014 11:46 PM, Gregory Farnum wrote:
>> Yes, Zheng's fix for the MDS crash is in current mainline and will be
It's been a while, but I think you need to use the long form
"client_mountpoint" config option here instead. If you search the list
archives it'll probably turn up; this is basically the only reason we
ever discuss "-r". ;)
Software Engineer #42 @ http://inktank.com | http://ceph.com
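A minimal sketch of the long form in ceph.conf (the subdirectory is an assumption); ceph-fuse will then mount only that subtree:

    [client]
        client mountpoint = /some/subdir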
On Wed, Apr
>
> Since the mds permissions are functionally equivalent, either I need extra
> rights on the monitor, or the OSDs. Does a client need to access the
> metadata pool in order to do a CephFS mount?
>
> I'll experiment a bit and report back.
>
>
> On Mon, Mar 31, 2014 at
>
>>> --- begin dump of recent events ---
>>> -1> 2014-04-01 11:59:27.137779 7ffec89c6700 5 mds.0.10 initiating
>>> monitor reconnect; maybe we're not the slow one
>>> -> 2014-04-01 11:59:27.137787 7ffec89c6700 10 monclient(hunting):
>>> _reo
Yes.
On Thu, Apr 3, 2014 at 12:56 AM, Florent B wrote:
> Thank you Gregory !
>
> I think I found all options :
> https://github.com/ceph/ceph/blob/master/src/common/config_opts.h
>
> Is that right ?
>
> On 04/02/2014 04:19 PM, Gregory Farnum wrote:
>> It's be
The filesystem interprets nonexistent file objects as holes -- so, zeroes.
This is expected. If you actually deleted *metadata* objects it would
detect that and fail.
-Greg
On Thursday, April 3, 2014, Danny Luhde-Thompson <
da...@meantradingsystems.com> wrote:
> I accidentally removed some MDS ob
On Thursday, April 3, 2014, Chad Seys wrote:
> On Thursday, April 03, 2014 07:57:58 Dan Van Der Ster wrote:
> > Hi,
> > By my observation, I don't think that marking it out before crush rm
> would
> > be any safer.
> >
> > Normally what I do (when decommissioning an OSD or whole server) is stop
>
Ceph will allow anything; it's just providing a block device. How it
performs will depend quite a lot on the database workload you're
applying, though. We've heard from people who think it's wonderful and
others who don't, depending on what hardware they're using and what
their use case is. You'll
On Fri, Apr 4, 2014 at 11:15 AM, Milosz Tanski wrote:
> Loic,
>
> The writeup has been helpful.
>
> What I'm curious about (and hasn't been mentioned) is can we use
> erasure with CephFS? What steps have to be taken in order to setup
> erasure coding for CephFS?
Lots. CephFS takes advantage of al
On Sat, Apr 5, 2014 at 10:00 AM, Max Kutsevol wrote:
> Hello!
>
> I am new to ceph, please take that into account.
>
> I'm experimenting with 3mons+2osds setup and got into situation when I
> recreated both of osds.
>
> My pools:
> ceph> osd lspools
> 0 data,1 metadata,
>
> These are just the def
Nope, that's not supported. See
http://ceph.com/docs/master/radosgw/s3/#features-support
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Apr 7, 2014 at 6:41 PM, Craig Lewis wrote:
> Does RGW support the S3 Object Lifecycle Management?
> http://docs.aws.amazon.com/Amazo
On Tuesday, April 8, 2014, Christian Balzer wrote:
> On Tue, 08 Apr 2014 14:19:20 +0200 Josef Johansson wrote:
> >
> > On 08/04/14 10:39, Christian Balzer wrote:
> > > On Tue, 08 Apr 2014 10:31:44 +0200 Josef Johansson wrote:
> > >
> > >> On 08/04/14 10:04, Christian Balzer wrote:
> > >>> Hello,
On Tue, Apr 8, 2014 at 4:57 PM, Craig Lewis wrote:
>
>>
>> pg query says the recovery state is:
>> "might_have_unfound": [
>> { "osd": 11,
>> "status": "querying"},
>> { "osd": 13,
>> "status": "already probed"}],
>>
> I
This flag won't be listed as required if you don't have any erasure
coding parameters in your OSD/crush maps. So if you aren't using it,
you should remove the EC rules and the kernel should be happy.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
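A sketch of checking for and removing unused EC pools and rules (names are placeholders; a rule can only be removed once no pool references it, and deleting a pool destroys its data):

    ceph osd crush rule ls
    ceph osd pool delete ecpool ecpool --yes-i-really-really-mean-it
    ceph osd crush rule rm ecpool_ruleset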
On Tue, Apr 8, 2014 at 6:08 PM
Ceph is designed to handle reliability in its system rather than in an
external one. You could set it up to use that storage and not do its
own replication, but then you lose availability if the OSD process
hosts disappear, etc. And the filesystem (which I guess is the part
you're interested in) is
I don't think the backing store should be seeing any effects like
that. What are the filenames which are using up that space inside the
folders?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Apr 9, 2014 at 1:58 AM, Mark Kirkwood
wrote:
> Hi all,
>
> I've noticed that
much is waiting to get into
the durable journal, not waiting to get flushed out of it.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Apr 9, 2014 at 3:06 AM, Christian Balzer wrote:
> On Tue, 8 Apr 2014 09:35:19 -0700 Gregory Farnum wrote:
>
>> On Tuesd
e +1.714.602.1309
> Email cle...@centraldesktop.com
>
> Central Desktop. Work together in ways you never thought possible.
> Connect with us Website | Twitter | Facebook | LinkedIn | Blog
>
> On 4/8/14 18:27 , Gregory Farnum wrote:
>
> On Tue, Apr 8, 2014 at 4:57 PM, Cr
On Wed, Apr 9, 2014 at 8:03 AM, Christian Balzer wrote:
>
> Hello,
>
> On Wed, 9 Apr 2014 07:31:53 -0700 Gregory Farnum wrote:
>
>> journal_max_write_bytes: the maximum amount of data the journal will
>> try to write at once when it's coalescing multiple pen
neral? Even if the kernel
> doesn't support EC pools directly, but would work in a cluster with EC pools
> in use?
>
> Thanks,
> -mike
>
>
> On Wed, 9 Apr 2014, Gregory Farnum wrote:
>
>> This flag won't be listed as required if you don't have any erasure
>
y on these multi host configurations that
> have osd's using whole devices (both setups installed using ceph-deploy, so
> in theory nothing exotic about 'em except for the multi 'hosts' are actually
> VMs).
>
> Regards
>
> Mark
>
> On 10/04/14 02:27
Sounds like you want to explore the auto-in settings, which can
prevent new OSDs from being automatically accepted into the cluster.
Should turn up if you search ceph.com/docs. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
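Two related knobs, as a sketch (check the docs for your release): the cluster-wide noin flag, and the option controlling whether brand-new OSDs are marked in automatically:

    ceph osd set noin      # booting OSDs stay "out" until you run: ceph osd in <id>
    ceph osd unset noin

    # in ceph.conf under [mon]:
    #   mon osd auto mark new in = false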
On Thu, Apr 10, 2014 at 1:45 PM, wrote:
> Hi All
Yes. It's awkward and the whole "two weights" thing needs a bit of UI
reworking, but it's expected behavior.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
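For reference, the two weights are adjusted with different commands; a sketch using osd.12 as a placeholder:

    ceph osd crush reweight osd.12 1.82   # CRUSH weight, conventionally the disk size in TB
    ceph osd reweight 12 0.85             # override weight, a value between 0 and 1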
On Thu, Apr 10, 2014 at 3:59 PM, Craig Lewis wrote:
> I've got some OSDs that are nearfull. Hardware is ordered, and I'
I don't know if there's any formal documentation, but it's a lot simpler
than the other components because it doesn't use any local storage (except
for the keyring). You basically just need to generate a key and turn it on.
Have you set one up by hand before?
-Greg
On Thursday, April 10, 2014, Ada
How many monitors do you have?
It's also possible that re-used numbers won't get caught in this,
depending on the process you went through to clean them up, but I
don't remember the details of the code here.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Thu, Apr 10, 2014 a
If you never ran "osd rm" then the monitors still believe it's an existing
OSD. You can run that command after doing the crush rm stuff, but you
should definitely do so.
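The usual full removal sequence looks something like this (osd.12 is a placeholder):

    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12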
On Friday, April 11, 2014, Chad Seys wrote:
> Hi Greg,
> > How many monitors do you have?
>
> 1 . :)
>
> > It's also possible
On Wed, Apr 9, 2014 at 8:41 PM, Mark Kirkwood
wrote:
> Redoing (attached, 1st file is for 2x space, 2nd for normal). I'm seeing:
>
> $ diff osd-du.0.txt osd-du.1.txt
> 924,925c924,925
> < 2048 /var/lib/ceph/osd/ceph-1/current/5.1a_head/file__head_2E6FB49A__5
> < 2048 /var/lib/ceph/osd/ceph-1/cu
On Fri, Apr 11, 2014 at 11:12 PM, Christian Balzer wrote:
>
> Hello,
>
> 3 node cluster (2 storage with 2 OSDs one dedicated mon), 3 mons total.
> Debian Jessie, thus 3.13 kernel and Ceph 0.72.2.
>
> 2 of the mons (including the leader) are using around 100MB RSS and one
> was using about 1.1GB.
>
That bug was resolved a long time ago; as long as you're using one of the
Emperor point releases you'll be fine.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Apr 14, 2014 at 1:46 AM, Stanislav Yanchev
wrote:
> Hello, I have a question about upgrading from the latest
On Thu, Apr 10, 2014 at 7:27 PM, Adam Clark wrote:
> Wow, that was quite simple
>
> mkdir /var/lib/ceph/mds/ceph-0
> ceph auth get-or-create mds.0 mds 'allow' osd 'allow *' mon 'allow *' >
> /var/lib/ceph/mds/ceph-0/keyring
> ceph-mds --id 0
>
> mount -t ceph ceph-mon01:6789:/ /mnt -o name=admin,
You just need to wait for the ondisk or complete ack in whatever
interface you choose. It won't come back until the data is persisted
to all extant copies.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Apr 7, 2014 at 4:08 PM, Steven Paster wrote:
> I am using the Cep
This looks like some kind of HBase issue to me (which I can't help
with; I've never used it), but I guess if I were looking at Ceph I'd
check if it was somehow configured such that the needed files are
located in different pools (or other separate security domains) that
might be set up wrong.
-Greg
Don't do that. I'm pretty sure it doesn't actually work, and if it
does it certainly won't perform better than with it off.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Apr 15, 2014 at 1:53 PM, Qing Zheng wrote:
> Hi -
>
> We have a question on mds journaling.
>
> Is
ion to?
>
> Cheers,
>
> -- Qing
>
> -Original Message-
> From: Gregory Farnum [mailto:g...@inktank.com]
> Sent: Tuesday, April 15, 2014 5:02 PM
> To: Qing Zheng
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] ceph mds log
>
> Don't do tha
What are the results of "ceph pg 11.483 query"?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Apr 15, 2014 at 4:01 PM, Craig Lewis wrote:
> I have 1 incomplete PG. The data is gone, but I can upload it again. I
> just need to make the cluster start working so I
What's the backtrace from the MDS crash?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Apr 16, 2014 at 7:11 AM, Georg Höllrigl
wrote:
> Hello,
>
> Using Ceph MDS with one active and one standby server - a day ago one of the
> mds crashed and I restarted it.
> Tonight
On Wed, Apr 16, 2014 at 8:08 AM, Dan van der Ster
wrote:
> Dear ceph-users,
>
> I've recently started looking through our FileStore logs to better
> understand the VM/RBD IO patterns, and noticed something interesting. Here
> is a snapshot of the write lengths for one OSD server (with 24 OSDs) --
s
> Senior Systems Engineer
> Office +1.714.602.1309
> Email cle...@centraldesktop.com
>
> Central Desktop. Work together in ways you never thought possible.
> Connect with us Website | Twitter | Facebook | LinkedIn | Blog
>
> On 4/15/14 16:07 , Gregory Farnum
On Thu, Apr 17, 2014 at 12:45 AM, Georg Höllrigl
wrote:
> Hello Greg,
>
> I've searched - but don't see any backtraces... I've tried to get some more
> info out of the logs. I really hope, there is something interesting in it:
>
> It all started two days ago with an authentication error:
>
> 2014-
On Monday, April 21, 2014, Loic Dachary wrote:
> Hi,
>
> I would like to allow users to create,use and delete RBD volumes, up to X
> GB, from a single pool. The user is a Debian GNU/Linux box using krbd. The
> sysadmin of the box is not trusted to have unlimited access to the Ceph
> cluster but (
On Thursday, April 24, 2014, Georg Höllrigl
wrote:
>
> And that's exactly what it sounds like — the MDS isn't finding objects
>> that are supposed to be in the RADOS cluster.
>>
>
> I'm not sure, what I should think about that. MDS shouldn't access data
> for RADOS and vice versa?
The metadata
Yehuda says he's fixed several of these bugs in recent code, but if
you're seeing it from a recent dev release, please file a bug!
Likewise if you're on a named release and would like to see a backport. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Thu, Apr 24, 2014 at
If you had it working in Havana I think you must have been using a
customized code base; you can still do the same for Icehouse.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Fri, Apr 25, 2014 at 12:55 AM, Maciej Gałkiewicz
wrote:
> Hi
>
> After upgrading my OpenStack clu
Hmm, it looks like your on-disk SessionMap is horrendously out of
date. Did your cluster get full at some point?
In any case, we're working on tools to repair this now but they aren't
ready for use yet. Probably the only thing you could do is create an
empty sessionmap with a higher version than t
The monitor requires at least "mon osd min down reports" reports, from a set
of OSDs whose size is at least "mon osd min down reporters". So with 9
reporters and 3 reports, it would wait until 9 OSDs had reported an OSD down
(basically ignoring the reports setting, as it is smaller).
-Greg
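As a sketch, those thresholds are set in ceph.conf under [mon]:

    [mon]
        mon osd min down reporters = 9
        mon osd min down reports = 3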
On Friday, April 25, 2014, Craig Lewis wrote:
>
Bobtail is really too old to draw any meaningful conclusions from; why
did you choose it?
That's not to say that performance on current code will be better
(though it very much might be), but the internal architecture has
changed in some ways that will be particularly important for the futex
profi
This usually means that your OSDs all stopped running at the same time, and
will eventually be marked down by the monitors. You should verify that
they're running.
-Greg
On Saturday, April 26, 2014, Srinivasa Rao Ragolu
wrote:
> Hi,
>
> My monitor node and osd nodes are running fine. But my clus
It is not. My guess from looking at the time stamps is that maybe you have
a log rotation system set up that isn't working properly?
-Greg
On Sunday, April 27, 2014, Indra Pramana wrote:
> Dear all,
>
> I have multiple OSDs per node (normally 4) and I realised that for all the
> nodes that I hav
3 ceph-osd.15.log.3.gz
>
> Any advice?
>
> Thank you.
>
>
> On Mon, Apr 28, 2014 at 11:26 PM, Gregory Farnum
>
> > wrote:
>
>> It is not. My guess from looking at the time stamps is that maybe you
>> have a log rotation system set up that isn't wor
one.
>
> Is there a way I can verify if the logs are actually being written by the
> ceph-osd processes?
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
>
>
> On Tue, Apr 29, 2014 at 12:28 PM, Gregory Farnum wrote:
>>
>> Are your OSDs actual
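One way to check is to see which log file each running daemon actually holds open; a sketch to run on the OSD host:

    for pid in $(pidof ceph-osd); do ls -l /proc/$pid/fd | grep '\.log'; done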
That's not quite how Ceph works. I recommend perusing some of the
introductory documentation at ceph.com/docs, but in short:
When you set up a ceph pool, you are specifying groups of hard drives which
will be used together.
When you create an RBD volume in a pool, you are saying "I want this volume
Monitor keys don't change; I think something else must be going on. Did you
remove any of their stores? Are the local filesystems actually correct
(fsck)?
The ceph-create-keys process is a red herring and will stop as soon as the
monitors do get into a quorum.
-Greg
On Tuesday, April 29, 2014, Marc wro
ind it
> when needed... maybe somewhere in the debugging section of the wiki?
>
> On 29/04/2014 18:25, Gregory Farnum wrote:
>> Monitor keys don't change; I think something else must be going on. Did you
>> remove any of their stores? Are the local filesystems actually correct
You'll need to go look at the individual OSDs to determine why they
aren't on. All the cluster knows is that the OSDs aren't communicating
properly.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Apr 29, 2014 at 3:06 AM, Gandalf Corvotempesta
wrote:
> After a simple "
It looks like the OSD is expecting a file to be there, and it is, but
it's incorrectly empty or something. Did you lose power to the node?
Have you run fsck on the local filesystem?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Apr 29, 2014 at 3:05 AM, vernon1...@126.
Hmm, I think this might actually be another instance of
http://tracker.ceph.com/issues/8232, which was just reported
yesterday.
That said, I think that if you restart one OSD at a time, you should
be able to avoid the race condition. It was restarting all of them
simultaneously that got you into tr
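A rough sketch of a one-at-a-time restart that waits for the cluster to settle in between (the service invocation depends on your init system; with upstart it would be "restart ceph-osd id=$id"):

    for id in 0 1 2 3; do
        service ceph restart osd.$id
        while ! ceph health | grep -q HEALTH_OK; do sleep 10; done
    done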
On Tue, Apr 29, 2014 at 3:28 PM, Marc wrote:
> Thank you for the help so far! I went for option 1 and that did solve
> that problem. However quorum has not been restored. Here's the
> information I can get:
>
> mon a+b are in state Electing and have been for more than 2 hours now.
> mon c does rep
to reboot the hosts as Guang Yang reported in the issue
> tracking #8232 to resolve this issue?
>
> Best regards,
> Thanh Tran
>
>
> On Wed, Apr 30, 2014 at 12:53 AM, Gregory Farnum wrote:
>>
>> Hmm, I think this might actually be another instance of
>> http
What's your cluster look like? I wonder if you can just remove the bad
PG from osd.4 and let it recover from the existing osd.1
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Sat, May 3, 2014 at 9:17 AM, Jeff Bachtel
wrote:
> This is all on firefly rc1 on CentOS 6
>
> I ha
"Need" means "I know this version of the object has existed at some
time in the cluster". "Have" means "this is the newest version of the
object I currently have available". If you're missing OSDs (or have
been in the past) you may need to invoke some of the "lost" commands
to tell the OSDs to just
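For example, if a PG reports unfound objects and the OSDs that might hold them are permanently gone, the relevant commands look like this (pgid/osdid are placeholders; this can discard data, so be certain first):

    ceph pg <pgid> list_missing
    ceph osd lost <osdid> --yes-i-really-mean-it
    ceph pg <pgid> mark_unfound_lost revert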
e
>> tried downing osd.4 and manually deleting the pg directory in question with
>> the hope that the cluster would roll back epochs for 0.2f, but all it does
>> is recreate the pg directory (empty) on osd.4.
>>
>> Jeff
>>
>> On 05/05/2014 04:33 PM, Gregory Fa
down.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> ________
> vernon1...@126.com
>
> From: Gregory Farnum
> Date: 2014-05-06 05:06
> To: vernon1...@126.com
> CC: ceph-users
> Subject: Re: [ceph-users] some unfound object
>
On Wed, May 7, 2014 at 5:05 AM, Gandalf Corvotempesta
wrote:
> Very simple question: what happen if server bound to the cache pool goes down?
> For example, a read-only cache could be archived by using a single
> server with no redudancy.
> Is ceph smart enough to detect that cache is unavailable