You would have to make changes at the kernel API/ABI level. It may be better to use
KVM virtualization with Rados Block Device (RBD) support?
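For example, QEMU/KVM can talk to RBD directly through librbd, so no kernel module
is needed at all; roughly something like this (pool and image names are just
placeholders):
  qemu-img create -f raw rbd:rbd/vm-disk 10G
  qemu-system-x86_64 -m 1024 -drive format=raw,file=rbd:rbd/vm-disk:id=admin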
2014-04-01 10:10 GMT+04:00 Vilobh Meshram :
> But if it needed to be done, how easy/hard would it be? Or do we need to take
> anything into consideration before back porting?
A backport to 2.6.32 is not planned, now or in the future.
2014-04-01 9:19 GMT+04:00 Vilobh Meshram :
> What is the procedure to back-port rbd.ko to the 2.6.32 Linux kernel?
>
> Thanks,
> Vilobh
>
What is the procedure to back-port rbd.ko to the 2.6.32 Linux kernel?
Thanks,
Vilobh
I'm taking a look at erasure coded pools in ceph (0.78-336-gb9e29ca).
I'm doing a simple test where I use 'rados put' to load a 1G file into
an erasure coded pool, and then 'rados rm' to remove it later. Checking
with 'rados df' shows no objects in the pool and no KB, but the object
space is s
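(For reference, the test is essentially the following sequence; "ecpool" stands in
for the actual pool name:)
  dd if=/dev/zero of=/tmp/1g.bin bs=1M count=1024
  rados -p ecpool put testobj /tmp/1g.bin
  rados -p ecpool rm testobj
  rados df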
On Mon, 31 Mar 2014, Michael Nelson wrote:
Hi Loic,
On Sun, 30 Mar 2014, Loic Dachary wrote:
Hi Michael,
I'm trying to reproduce the problem from sources (today's instead of
yesterday's but there is no difference that could explain the behaviour you
have):
cd src
rm -fr /tmp/dev /tmp/o
On 03/31/2014 03:03 PM, Brendan Moloney wrote:
Hi,
I was wondering if RBD snapshots use the CRUSH map to distribute
snapshot data and live data on different failure domains? If not, would
it be feasible in the future?
Currently rbd snapshots and live objects are stored in the same place,
since
Hi,
I was wondering if RBD snapshots use the CRUSH map to distribute snapshot data
and live data on different failure domains? If not, would it be feasible in the
future?
Thanks,
Brendan
Thanks Greg,
Looking forward to the new release!
Regards,
Quenten Grasso
-Original Message-
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Tuesday, 1 April 2014 3:08 AM
To: Quenten Grasso
Cc: Kyle Bader; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD Restarts cause excess
Even though it is included in /etc/rc.modules
and initramfs has been updated.
Suggestions much appreciated.
MTIA,
dk
Yes, Zheng's fix for the MDS crash is in current mainline and will be
in the next Firefly RC release.
Sage, is there something else we can/should be doing when a client
goes to sleep that we aren't already? (ie, flushing out all dirty data
or something and disconnecting?)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
Well that was quick!
osd.0 crashed already, here's the log (~20 MiB):
http://www.aarontc.com/logs/ceph-osd.0.log.bz2
I updated the bug report as well.
Thanks,
-Aaron
On Mon, Mar 31, 2014 at 2:16 PM, Aaron Ten Clay wrote:
> Greg,
>
> I'm in the process of doing so now. joshd asked for "debug
On Mon, 31 Mar 2014, Jens Kristian Søgaard wrote:
> Hi,
>
> > Perhaps as a workaround you should just wipe this mon's data dir and
> > remake it?
>
> That's a possibility of course!
>
> I really would like to know if there's something to gain from using
> leveldb from ceph-extras instead of the d
Greg,
I'm in the process of doing so now. joshd asked for "debug filestore = 20"
as well, and I just restarted an OSD with those changes. As soon as it
crashes again, I'll post the log file.
joshd also had me open a bug: http://tracker.ceph.com/issues/7922
Thanks,
-Aaron
On Mon, Mar 31, 2014 a
Can you reproduce this with "debug osd = 20" and "debug ms = 1" set on
the OSD? I think we'll need that data to track down what exactly has
gone wrong here.
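If the OSD is still running you can also bump those at runtime, e.g.:
  ceph tell osd.0 injectargs '--debug-osd 20 --debug-ms 1'
(Injected values don't survive a restart, so if you expect the daemon to crash and
come back, put "debug osd = 20" and "debug ms = 1" in the [osd] section of ceph.conf
instead.)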
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Mar 31, 2014 at 1:22 PM, Aaron Ten Clay wrote:
> Hello fellow Ce
Thanks for the prompt reply.
The OSDs are set up on dedicated devices, and
the mappings are in /etc/fstab. mount shows:
/dev/rssda on /var/lib/ceph/osd/ceph-0 type xfs (rw)
and similar on all other nodes.
Thx,
dk
On Mon, Mar 31, 2014 at 1:12 PM, Gregory Farnum wrote:
> Well, you killed them a
Hello fellow Cephers!
Recently, before and after the update from 0.77 to 0.78, about half the
OSDs in my cluster crash quite frequently with 'osd/PG.cc: 5255: FAILED
assert(0 == "we got a bad state machine event")'
I'm not sure if this is a bug (there are some similar-sounding reports in
Redmine
Well, you killed them as part of the reboot...they should have
restarted automatically when the system turned on, but that will
depend on your configuration and how they were set up. (Eg, if they
are each getting a dedicated hard drive, make sure the system knows
the drive is present.)
What version
Hi Greg,
Thanks for the prompt response.
Sure enough, I do see all the OSDs are now down.
However, I do not understand the meaning of the
sentence about killing the OSDs. This was an OS
level reboot of the entire cluster, not issuing any
ceph commands either before or after the restart.
Doesn't Cep
If you wait longer, you should see the remaining OSDs get marked down.
We detect down OSDs in two ways:
1) OSDs heartbeat each other frequently and issue reports when the
heartbeat responses take too long. (This is the main way.)
2) OSDs periodically send statistics to the monitors, and if these
st
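The knobs involved, with what I believe are the defaults (double-check against your
version), look roughly like:
  [osd]
  osd heartbeat interval = 6    # how often an OSD pings its peers
  osd heartbeat grace = 20      # peers report an OSD down after this many seconds of silence
  [mon]
  mon osd report timeout = 900  # monitors mark an OSD down if no reports arrive for this long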
On a 4 node cluster (admin + 3 mon/osd nodes) I see the following shortly
after rebooting the cluster and waiting for a couple of minutes:
root@rts23:~# ps -ef | grep ceph && ceph osd tree
root 4183 1 0 12:09 ?00:00:00 /usr/bin/ceph-mon
--cluster=ceph -i rts23 -f
root 577
What logs should I see to explore more about what might be going wrong
here ?
Thanks,
Vilobh
On 3/30/14, 3:00 PM, "Vilobh Meshram" wrote:
>Hi Loic,
>
>Thanks for your reply.
>Not really; I have set up 3 nodes as storage nodes. The "ceph osd tree" output
>also confirms that.
>
>I was more concerned abo
Hi Patrick
When you call the clone method, add an extra argument specifying the
features to be used for the clone:
1 for layering only
3 for layering and striping
Adapt the value to your requirements
Then it should work.
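For example, a minimal sketch based on the call in your traceback (1 = layering only):
  rbd_inst.clone(p_ioctx, 'foo3', 's1-foo3', c_ioctx, 'c1-foo3', features=1)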
JC
On Monday, March 31, 2014, COPINE Patrick wrote:
> Hi,
>
> Now, I ha
Hi,
Perhaps as a workaround you should just wipe this mon's data dir and
remake it?
That's a possibility of course!
I really would like to know if there's something to gain from using
leveldb from ceph-extras instead of the distribution?
--
Jens Kristian Søgaard, Mermaid Consulting ApS,
j...@
Perhaps as a workaround you should just wipe this mon's data dir and remake it?
In the past when I upgraded our mons from spinning disks to SSDs, I went
through a procedure to remake each mon from scratch (wiping and resyncing each
mon's leveldb one at a time).
I did something like this:
servic
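(For the archives, the generic remove/re-add dance looks roughly like the following;
this is a sketch of the documented procedure, not necessarily the exact commands I
ran, and mon "a" plus the paths are placeholders:)
  service ceph stop mon.a
  ceph mon remove a
  mv /var/lib/ceph/mon/ceph-a /var/lib/ceph/mon/ceph-a.old
  ceph mon getmap -o /tmp/monmap
  ceph auth get mon. -o /tmp/mon.keyring
  ceph-mon -i a --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
  ceph mon add a <mon-ip>:6789
  service ceph start mon.a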
Hi Gregory,
Is the mon process doing anything (that is, does it have any CPU
usage)? This looks to be an internal leveldb issue, but not one that
we've run into before, so I think there must be something unique about
the leveldb store involved.
No, it is not doing anything at all.
I'm not sur
Hi,
I can't reproduce that with a dumpling cluster:
# cat ceph.client.dpm.keyring
[client.dpm]
key = xxx
caps mon = "allow r"
caps osd = "allow x, allow rwx pool=dpm"
# ceph health --id dpm
HEALTH_OK
# ceph auth list --id dpm
Error EACCES: access denied
Cheers, Dan
Not directly. However, that "used" total is compiled by summing up the
output of "df" from each individual OSD disk. There's going to be some
space used up by the local filesystem metadata, by RADOS metadata like
OSD maps, and (depending on configuration) your journal files. 2350MB
/ 48 OSDs = ~49M
Is the mon process doing anything (that is, does it have any CPU
usage)? This looks to be an internal leveldb issue, but not one that
we've run into before, so I think there must be something unique about
the leveldb store involved.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
Hmm, this might be considered a bit of a design oversight. Looking at
the auth keys is a read operation, and the client has read
permissions...
You might want to explore the more fine-grained command-based monitor
permissions as a workaround, but I've created a ticket to try and
close that read per
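Something along these lines, for example (the exact command list depends on what the
client actually needs to run):
  ceph auth caps client.dpm mon 'allow command "status", allow command "health"' osd 'allow x, allow rwx pool=dpm'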
At present, the only security permission on the MDS is "allowed to do
stuff", so "rwx" and "*" are synonymous. In general "*" means "is an
admin", though, so you'll be happier in the future if you use "rwx".
You may also want a more restrictive set of monitor capabilities as
somebody else recently
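For instance, something like this (the client name and pool are placeholders):
  ceph auth caps client.cephfs mon 'allow r' mds 'allow rwx' osd 'allow rwx pool=data'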
That's not an expected error when running this test; have you
validated that your cluster is working in any other ways? Eg, what's
the output of "ceph -s".
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Fri, Mar 28, 2014 at 5:53 AM, wrote:
> Hi,
>
> When I try "rados -p
I believe it's just that there was an issue for a while where the
return codes were incorrectly not being filled in, and now they are.
So the prval params you pass in when constructing the compound ops are
where the values will be set.
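A minimal C++ sketch of what that looks like with ObjectReadOperation (assumes
<rados/librados.hpp> and an already-open librados::IoCtx named ioctx; the object
name is illustrative):
  librados::ObjectReadOperation op;
  uint64_t size = 0;
  time_t mtime = 0;
  int stat_rc = 0, read_rc = 0;
  librados::bufferlist data;
  op.stat(&size, &mtime, &stat_rc);   // stat_rc receives this op's return code
  op.read(0, 4096, &data, &read_rc);  // read_rc receives this op's return code
  librados::bufferlist outbl;
  int r = ioctx.operate("myobject", &op, &outbl);
  // r is the overall result; stat_rc and read_rc hold the per-op codes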
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
Yep, that looks like http://tracker.ceph.com/issues/7093, which is
fixed in dumpling and most of the dev releases since emperor. ;) I
also cherry-picked the fix to the emperor branch and it will be
included whenever we do another point release of that.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
- Message from Martin B Nielsen -
Date: Mon, 31 Mar 2014 15:55:24 +0200
From: Martin B Nielsen
Subject: Re: [ceph-users] MDS debugging
To: Kenneth Waegeman
Cc: ceph-users
Hi,
I can see you're running mon, mds and osd on the same server.
That's true, we have 3
- Message from "Yan, Zheng" -
Date: Mon, 31 Mar 2014 21:09:20 +0800
From: "Yan, Zheng"
Subject: Re: [ceph-users] MDS debugging
To: Kenneth Waegeman
Cc: ceph-users
On Mon, Mar 31, 2014 at 7:49 PM, Kenneth Waegeman
wrote:
Hi all,
Before the weekend we started s
Hi,
I can see you're running mon, mds and osd on the same server.
Also, from a quick glance you're using around 13GB resident memory.
If you only have 16GB in your system I'm guessing you'll be swapping about
now (or close). How much mem does the system hold?
Also, how busy are the disks? Or is
On Mon, Mar 31, 2014 at 7:49 PM, Kenneth Waegeman
wrote:
> Hi all,
>
> Before the weekend we started some copying tests over ceph-fuse. Initially,
> this went ok. But then the performance started dropping gradually. Things
> are going very slow now:
what does the copying test look like?
Regards
Yan,
> On 31 March 2014, at 12:01, Jianing Yang wrote:
>
>
> Hi, all
>
> I've deleted every image in my only pool but still have 2350 MB of
> data remains. Is there a command that help get which files/objects are
> still in use?
>
"rados -p ls | xargs -n 1 rados -p stat" will give you th
Hi all,
Before the weekend we started some copying tests over ceph-fuse.
Initially, this went ok. But then the performance started dropping
gradually. Things are going very slow now:
2014-03-31 13:36:37.047423 mon.0 [INF] pgmap v265871: 1300 pgs: 1300
active+clean; 19872 GB data, 59953 GB
Hi, all
I've deleted every image in my only pool but still have 2350 MB of
data remains. Is there a command that help get which files/objects are
still in use?
| ceph -w
| cluster 33064485-f73e-4db2-b9d6-8f4463334619
| health HEALTH_OK
| monmap e1: 3 mons at
{a=10.86.32.
Hi,
Now, I have an exception. Before starting the Python program, I performed the
following actions.
The exception is :
> root@ceph-clt:~/src# python v4.py
>
> Traceback (most recent call last):
>
> File "v4.py", line 22, in
>
> rbd_inst.clone(p_ioctx, 'foo3', 's1-foo3', c_ioctx, 'c1-foo3