We have a single VM that is acting odd. We had 7 SSD OSDs (out of 40) go
down over a period of about 12 hours. These are a cache tier and have size
4, min_size 2. I'm not able to make heads or tails of the error and hoped
someone here could help.
2016-01-14 23:09:54.559121 osd.136 [ERR] 13.503 cop
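For context, the pool settings described above can be double-checked with the stock CLI; a minimal sketch, assuming the cache-tier pool is named "hotpool" (the name is a placeholder):

  ceph osd pool get hotpool size        # should report size: 4
  ceph osd pool get hotpool min_size    # should report min_size: 2
  ceph health detail                    # lists the PGs and OSDs behind the [ERR] lines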
It looks like the gateway is experiencing a similar race condition to
what we reported before.
The rados object has a size of 0 bytes, but the bucket index still lists the
object and the object metadata shows a size of
7147520 bytes.
I have a lot of logs but I don't think any of them have the
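One rough way to compare RGW's view of such an object with what RADOS actually stores is sketched below; the bucket, object, pool, and prefixed rados object names are placeholders, not taken from the report above:

  radosgw-admin object stat --bucket=mybucket --object=myobject   # size per bucket index / metadata
  rados -p .rgw.buckets stat default.12345.1_myobject             # size of the backing rados object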
Hey ceph-users,
I wanted to follow up: Zheng's patch did the trick. We re-added the removed
mds, and it all came back. We're syncing our data off to a backup server.
Thanks for all of the help, Ceph has a great community to work with!
Mike C
On Thu, Jan 14, 2016 at 4:46 PM, Yan, Zheng wrote:
On 12/01/16 01:22, Stillwell, Bryan wrote:
>> Well, it seems I spoke too soon. Not sure what logic the udev rules use
>> to identify ceph journals, but it doesn't seem to pick up on the
>> journals in our case, as after a reboot those partitions are owned by
>> root:disk with permissions 0660.
I am not sure why this is happening. Someone used s3cmd to upload around
130,000 7 MB objects to a single bucket. Now we are tearing down the
cluster to rebuild it better, stronger, and hopefully faster. Before we
destroy it we need to download all of the data. I am running through all
of the key
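Not from the thread, but a sketch of one way to mirror a whole bucket to local disk with s3cmd before tearing the cluster down; the bucket name and destination directory are placeholders:

  s3cmd ls s3://mybucket | wc -l                                # sanity-check the object count
  s3cmd sync s3://mybucket/ ./mybucket-backup/ --skip-existing  # resumable bulk download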
Do I apply this against the v9.2.0 git tag?
On Thu, Jan 14, 2016 at 4:48 PM, Dyweni - Ceph-Users <
6exbab4fy...@dyweni.com> wrote:
> Your patch lists the command as "addfailed" but the email lists the
> command as "add failed". (Note the space).
>
> On 2016-01-14 18:46, Yan, Zheng wrote:
Your patch lists the command as "addfailed" but the email lists the
command as "add failed". (Note the space).
On 2016-01-14 18:46, Yan, Zheng wrote:
Here is a patch for v9.2.0. After installing the modified version of
ceph-mon, run “ceph mds add failed 1”
On Jan 15, 2016, at 08:20, Mike
Here is a patch for v9.2.0. After installing the modified version of ceph-mon, run
“ceph mds add failed 1”
mds_addfailed.patch
Description: Binary data
> On Jan 15, 2016, at 08:20, Mike Carlson wrote:
>
> okay, that sounds really good.
>
> Would it help if you had access to our cluster?
>
>
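For anyone following along later, a rough sketch of applying the attached mds_addfailed.patch against the v9.2.0 tag and rebuilding; the build steps are assumptions based on the autotools build of that release, not instructions from the thread:

  git clone https://github.com/ceph/ceph.git && cd ceph
  git checkout v9.2.0
  git submodule update --init --recursive
  patch -p1 < /path/to/mds_addfailed.patch
  ./install-deps.sh && ./autogen.sh && ./configure && make -j"$(nproc)"
  # swap in the freshly built ceph-mon binary, restart the mon, then run:
  ceph mds add failed 1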
On Fri, Jan 15, 2016 at 12:23 AM, Sage Weil wrote:
> On Fri, 15 Jan 2016, Yan, Zheng wrote:
>> > On Jan 15, 2016, at 08:16, Mike Carlson wrote:
>> >
>> > Did I just lose all of my data?
>> >
>> > If we were able to export the journal, could we create a brand new mds out
>> > of that and retrieve our data?
On Fri, 15 Jan 2016, Yan, Zheng wrote:
> > On Jan 15, 2016, at 08:16, Mike Carlson wrote:
> >
> > Did I just lose all of my data?
> >
> > If we were able to export the journal, could we create a brand new mds out
> > of that and retrieve our data?
>
> No. It’s easy to fix, but you need to re-compile ceph-mon from source code.
okay, that sounds really good.
Would it help if you had access to our cluster?
On Thu, Jan 14, 2016 at 4:19 PM, Yan, Zheng wrote:
>
> > On Jan 15, 2016, at 08:16, Mike Carlson wrote:
> >
> > Did I just lose all of my data?
> >
> > If we were able to export the journal, could we create a brand
> On Jan 15, 2016, at 08:16, Mike Carlson wrote:
>
> Did I just lose all of my data?
>
> If we were able to export the journal, could we create a brand new mds out of
> that and retrieve our data?
No. It’s easy to fix, but you need to re-compile ceph-mon from source code.
I’m writing the patch.
Did I just lose all of my data?
If we were able to export the journal, could we create a brand new mds out
of that and retrieve our data?
On Thu, Jan 14, 2016 at 4:15 PM, Yan, Zheng wrote:
>
> > On Jan 15, 2016, at 08:01, Gregory Farnum wrote:
> >
> > On Thu, Jan 14, 2016 at 3:46 PM, Mike Carlson wrote:
> On Jan 15, 2016, at 08:01, Gregory Farnum wrote:
>
> On Thu, Jan 14, 2016 at 3:46 PM, Mike Carlson wrote:
>> Hey Zheng,
>>
>> I've been in the #ceph irc channel all day about this.
>>
>> We did that; we set max_mds back to 1, but instead of stopping mds 1, we
>> did a "ceph mds rmfailed 1"
> On Jan 15, 2016, at 08:01, Gregory Farnum wrote:
>
> On Thu, Jan 14, 2016 at 3:46 PM, Mike Carlson wrote:
>> Hey Zheng,
>>
>> I've been in the #ceph irc channel all day about this.
>>
>> We did that; we set max_mds back to 1, but instead of stopping mds 1, we
>> did a "ceph mds rmfailed 1"
On Thu, Jan 14, 2016 at 3:46 PM, Mike Carlson wrote:
> Hey Zheng,
>
> I've been in the #ceph irc channel all day about this.
>
> We did that; we set max_mds back to 1, but instead of stopping mds 1, we
> did a "ceph mds rmfailed 1". Running ceph mds stop 1 produces:
>
> # ceph mds stop 1
> Error
Hey Zheng,
I've been in the #ceph irc channel all day about this.
We did that; we set max_mds back to 1, but instead of stopping mds 1, we
did a "ceph mds rmfailed 1". Running ceph mds stop 1 produces:
# ceph mds stop 1
Error EEXIST: mds.1 not active (???)
Our mds is in a state of resolve, and w
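For the archives: the rollback being described is, roughly, to shrink back to one active MDS and then deactivate rank 1 rather than rmfailed it. A sketch using the command names of that era (verify against your release):

  ceph mds set_max_mds 1    # target a single active MDS again
  ceph mds stop 1           # deactivate rank 1 (not "ceph mds rmfailed 1")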
On Fri, Jan 15, 2016 at 3:28 AM, Mike Carlson wrote:
> Thank you for the reply Zheng
>
> We tried setting mds bal frag to true, but the end result was less than
> desirable. All NFS and SMB clients could no longer browse the share; they
> would hang on a directory with anything more than a few hundred
rbd-nbd uses librbd directly -- it runs as a user-space daemon process and
interacts with the kernel NBD commands via a UNIX socket. As a result, it
supports all image features supported by librbd. You can use the rbd CLI to
map/unmap RBD-based NBDs [1] similar to how you map/unmap images via
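A minimal usage sketch, assuming an image named rbd/myimage (the exact CLI integration may differ between releases):

  rbd-nbd map rbd/myimage    # attaches the image and prints the device, e.g. /dev/nbd0
  rbd-nbd list-mapped        # show current image-to-device mappings
  rbd-nbd unmap /dev/nbd0    # detach the device when finished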
This month’s Ceph Advisory Board meeting notes have been added to the Ceph wiki:
wiki.ceph.com/Ceph_Advisory_Board
Please let me know if you have any questions or concerns. Thanks.
--
Best Regards,
Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com || http://community.redha
Hey cephers,
It has been quite a while since I distilled the highlights of what is
going on in the community into a single post, so I figured it was long
overdue. Please check out the latest Ceph.com blog and some of the
many great things that are on our short-term radar at the moment:
http://cep
Does this support rbd images with stripe count > 1?
If yes, then this is also a solution for this problem:
http://tracker.ceph.com/issues/3837
Thanks,
Dyweni
On 2016-01-14 13:27, Bill Sanders wrote:
Is there some information about rbd-nbd somewhere? If it has feature
parity with librbd
Thank you for the reply Zheng
We tried setting mds bal frag to true, but the end result was less than
desirable. All NFS and SMB clients could no longer browse the share; they
would hang on a directory with anything more than a few hundred files.
We then tried to back out the active/active mds change
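For reference, the option being discussed is an MDS-side ceph.conf setting; a minimal sketch, assuming it is set through the config file on the MDS hosts:

  [mds]
      mds bal frag = true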
Is there some information about rbd-nbd somewhere? If it has feature
parity with librbd and is easier to maintain, will this eventually
deprecate krbd? We're using the RBD kernel client right now, and so
this looks like something we might want to explore at my employer.
Bill
On Thu, Jan 14, 201
Probably worth filing a bug. Make sure to include the usual stuff:
1) version
2) logs from a crashing osd
For this one, it would also be handy if you used gdb to dump the
thread backtraces for an osd which is experiencing "an increase of
approximately 230-260 threads for every other OSD node"
-Sa
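A sketch of grabbing those backtraces, assuming gdb and debug symbols are installed; the OSD id and pid are placeholders:

  # find the OSD's pid first, e.g. with "ps aux | grep 'ceph-osd -i 12'", then:
  gdb -p <pid> --batch -ex 'thread apply all bt' > osd.12-threads.txt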
We went to 3 copies because 2 isn't safe enough for the default. With 3
copies and a properly configured system your data is approximately as safe
as the data center it's in. With 2 copies the durability is a lot lower
than that (two 9s versus four 9s or something). The actual safety numbers
did no
There's not a great unified tracking solution, but newer MDS code has admin
socket commands to dump client sessions. Look for those.
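On MDS versions that have them, that would look roughly like the admin socket call below; the daemon name is a placeholder and the exact command name may vary by release:

  ceph daemon mds.a session ls    # dump the client sessions this MDS currently holds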
This question is good for the user list, but if you can't send mail to the dev
list you're probably using HTML email or something. vger.kernel.org has
some pretty strict
Try using "id=client.my_user". It's not taking daemonize arguments because
auto-mount in fstab requires the use of CLI arguments (of which daemonize
isn't a member), IIRC.
-Greg
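A sketch of the corresponding fstab line, using the id suggested above; the mount point is a placeholder and the exact option syntax may vary by version:

  id=client.my_user  /mnt/cephfs  fuse.ceph  defaults,_netdev  0 0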
On Wednesday, January 6, 2016, Florent B wrote:
> Hi everyone,
>
> I have a problem with ceph-fuse on Debian Jessie.
>
It sounds like you *didn't* change the fsid for the existing osd/mon
daemons, since you say they're getting refused. So I think you created a new
"cluster" of just the one monitor, and your client is choosing to connect
to it first. If that's the case, killing that monitor and creating it
properly will
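A quick way to compare the fsid each daemon is actually running with against what the client's ceph.conf expects; daemon names are placeholders, and each command is run on the host where that daemon lives:

  grep fsid /etc/ceph/ceph.conf                            # what the config file says
  ceph daemon mon.$(hostname -s) config show | grep fsid   # what the local mon is running with
  ceph daemon osd.0 config show | grep fsid                # compare with an existing OSD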
On Thu, Jan 14, 2016 at 7:37 AM, Sage Weil wrote:
> This development release includes a raft of changes and improvements for
> Jewel. Key additions include CephFS scrub/repair improvements, an AIX and
> Solaris port of librados, many librbd journaling additions and fixes,
> extended per-pool optio
Thank you very much, Jason!
I've updated the ticket with new data, but I'm not sure if I attached
logs correctly. Please let me know if anything more is needed.
2016-01-14 23:29 GMT+08:00 Jason Dillaman :
> I would need to see the log from the point where you've frozen the disks
> until the poin
This development release includes a raft of changes and improvements for
Jewel. Key additions include CephFS scrub/repair improvements, an AIX and
Solaris port of librados, many librbd journaling additions and fixes,
extended per-pool options, and an NBD driver for RBD (rbd-nbd) that allows
librbd
I would need to see the log from the point where you've frozen the disks until
the point when you attempt to create a snapshot. The logs below just show
normal IO.
I've opened a new ticket [1] where you can attach the logs.
[1] http://tracker.ceph.com/issues/14373
--
Jason Dillaman
-
On Thu, Jan 14, 2016 at 12:50 AM, Kostis Fardelas wrote:
> Hello cephers,
> after being on 0.80.10 for a while, we upgraded to 0.80.11 and we
> noticed the following things:
> a. ~13% paxos refresh latency increase (from about 0.015 to 0.017 on average)
> b. ~15% paxos commit latency increase (from 0.019
2016-01-14 11:25 GMT+02:00 Magnus Hagdorn :
> On 13/01/16 13:32, Andy Allan wrote:
>
>> On 13 January 2016 at 12:26, Magnus Hagdorn
>> wrote:
>>
>>> Hi there,
>>> we recently had a problem with two OSDs failing because of I/O errors of
>>> the
>>> underlying disks. We run a small ceph cluster wit
On 13/01/16 13:32, Andy Allan wrote:
On 13 January 2016 at 12:26, Magnus Hagdorn wrote:
Hi there,
we recently had a problem with two OSDs failing because of I/O errors of the
underlying disks. We run a small ceph cluster with 3 nodes and 18 OSDs in
total. All 3 nodes are Dell PowerEdge R515 ser
Hello cephers,
after being on 0.80.10 for a while, we upgraded to 0.80.11 and we
noticed the following things:
a. ~13% paxos refresh latency increase (from about 0.015 to 0.017 on average)
b. ~15% paxos commit latency increase (from 0.019 to 0.022 on average)
c. osd commitcycle latencies were decreased and
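For anyone wanting to pull the same numbers: the paxos latencies are exposed as monitor perf counters; each latency is an avgcount/sum pair, so the average is sum divided by avgcount. A rough sketch, with the mon id as a placeholder and counter names assumed from firefly-era builds:

  ceph daemon mon.a perf dump | python -m json.tool | less
  # look under "paxos" for refresh_latency and commit_latency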