Upon further investigation, it looks like this part of the ceph logrotate
script is causing me the problem:
if [ -e "/var/lib/ceph/$daemon/$f/done" ] && [ -e
"/var/lib/ceph/$daemon/$f/upstart" ] && [ ! -e
"/var/lib/ceph/$daemon/$f/sysvinit" ]; then
I don't have a "done" file in the mounted directory
Dear list,
we are new to ceph and we are planning to install a ceph cluster over
two datacenters.
The situation is:
DC1: 2 racks
DC2: 1 rack
We want to have one replica per rack and more generally two replicas in
the first DC and one in the other one.
So now we are stuck on the crushmap: how to force the cluster to put
two replicas in the first DC?
> We want to have one replica per rack and more generally two replicas in
> the first DC and one in the other one.
> So now we are stuck on the crushmap: how to force the cluster to put two
> replicas in the first dc?
> Is that related to the bucket's weight?
You can fix that in the crush map buckets and rules.
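A minimal sketch of such a rule, assuming the OSD hosts are already
grouped into rack buckets and the racks into two datacenter buckets
named dc1 and dc2 (the bucket names here are placeholders):

    rule replicated_two_dc {
            ruleset 1
            type replicated
            min_size 3
            max_size 3
            step take dc1
            step chooseleaf firstn 2 type rack
            step emit
            step take dc2
            step chooseleaf firstn 1 type rack
            step emit
    }

With pool size 3 this puts two copies in dc1 (one per rack) and the
third in dc2. The bucket weights only steer how data spreads within
each choose step; the 2+1 split comes from the rule itself.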
Joshua, it looks like you got Ceph from EPEL (that version has the '-2'
slapped on it), and that is why you are seeing this
for ceph:
ceph-0.80.1-2.el6.x86_64
And this for others:
libcephfs1-0.80.1-0.el6.x86_64
Make sure that you do get Ceph from our repos. Newer versions of
ceph-deploy fix this by setting up repo priorities.
On Thu, 10 Jul 2014, Joshua McClintock wrote:
> { "rule_id": 1,
> "rule_name": "erasure-code",
> "ruleset": 1,
> "type": 3,
The presence of the erasure code CRUSH rules is what is preventing the
kernel client from mounting. Upgrade to a newer kernel (3.14, I think).
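If upgrading the kernel is not an option, the alternative is to remove
the erasure-coded pool and then the leftover CRUSH rule, so the map no
longer carries features the old kernel cannot decode. A sketch, using
the rule name from the dump above:

    ceph osd crush rule ls
    # only once no pool references it any more:
    ceph osd crush rule rm erasure-code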
On Fri, 11 Jul 2014, James Eckersall wrote:
> Upon further investigation, it looks like this part of the ceph logrotate
> script is causing me the problem:
>
> if [ -e "/var/lib/ceph/$daemon/$f/done" ] &&
>    [ -e "/var/lib/ceph/$daemon/$f/upstart" ] &&
>    [ ! -e "/var/lib/ceph/$daemon/$f/sysvinit" ]; then
On Thu, Jul 10, 2014 at 4:40 PM, Samuel Just wrote:
> It could be an indication of a problem on osd 5, but the timing is
> worrying. Can you attach your ceph.conf?
Attached.
> Have there been any osds
> going down, new osds added, anything to cause recovery?
I upgraded to firefly last week
Hi Sage,
Many thanks for the info.
I have inherited this cluster, but I believe it may have been created with
mkcephfs rather than ceph-deploy.
I'll touch the done files and see what happens. Looking at the logic in
the logrotate script, I'm sure this will resolve the problem.
Thanks
J
On 11
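For reference, touching the marker files amounts to something like the
following, assuming an mkcephfs-created cluster whose daemons are
managed by sysvinit (adjust the glob for mon/mds directories as well):

    for d in /var/lib/ceph/osd/ceph-*; do
        touch "$d/done" "$d/sysvinit"
    done

With done and sysvinit present and no upstart marker, the daemons
should be picked up by the sysvinit branch of the logrotate script.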
Thanks, Sage! What had happened prior to my upgrade was that I added an
erasure-coded pool and all my OSDs began to crash. The EC profile didn't
seem to cause the crash, so I left it, but once I removed the pool the
crashes stopped.
Do you guys want any of the core dumps, or is anything short
When you get the next inconsistency, can you copy the actual objects
from the osd store trees and get them to us? That might provide a
clue.
-Sam
On Fri, Jul 11, 2014 at 6:52 AM, Randy Smith wrote:
>
>
>
> On Thu, Jul 10, 2014 at 4:40 PM, Samuel Just wrote:
>>
>> It could be an indication of a
One other thing we might also try is catching this earlier (on first read
of corrupt data) instead of waiting for scrub. If you are not super
performance sensitive, you can add
filestore sloppy crc = true
filestore sloppy crc block size = 524288
That will track and verify CRCs on any large (
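In ceph.conf those two options would sit in the OSD section, roughly
like this (placement only; the OSDs will most likely need a restart to
pick the filestore settings up):

    [osd]
        filestore sloppy crc = true
        filestore sloppy crc block size = 524288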
Hello Alfredo, isn't this what the 'ceph-release-1-0.el6.noarch' package is
for in my rpm -qa list? Here are the yum repo files I have in
/etc/yum.repos.d. I don't see any priorities in the ceph one, which is
where libcephfs1 comes from, I think. I tried 'yum reinstall
ceph-release', but the fi
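For what it's worth, the usual way to make yum prefer the ceph.com
packages over the EPEL builds is yum-plugin-priorities plus a priority
line in the repo file. A sketch (the baseurl shown is for Firefly on
el6; adjust to your release and architecture):

    # /etc/yum.repos.d/ceph.repo
    [ceph]
    name=Ceph packages for x86_64
    baseurl=http://ceph.com/rpm-firefly/el6/x86_64/
    enabled=1
    gpgcheck=1
    priority=1

With yum-plugin-priorities installed, priority=1 makes this repo win
over EPEL for any package it provides, including libcephfs1.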
"Ceph is short of Cephalopod, a class of mollusks that includes the
octopus, squid, and cuttlefish"
http://community.redhat.com/blog/2014/06/ceph-turns-10-a-look-back/
On Fri, 2014-07-11 at 10:48 -0700, Tuite, John E. wrote:
> Is Ceph an acronym? If yes, what?
> John Tuite
thanks
John Tuite
Hi,
I am using the librados Python API to do a lot of rapid reads on
objects. I am using a callback function, def oncomplete(self,
completion, data_read), which is called whenever an object has been
read. Is there a way to identify which object has completed the read
request?
I want to measure the
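One way to do that with the rados Python bindings is to bind the object
name into each callback with functools.partial, so the oncomplete
handler knows which read it belongs to. A small sketch (pool and object
names here are made up):

    import functools
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('data')           # example pool name

    def oncomplete(oid, completion, data_read):
        # oid was bound when the read was issued
        print("%s: read %d bytes" % (oid, len(data_read)))

    completions = []
    for oid in ('obj-1', 'obj-2', 'obj-3'):      # example object names
        c = ioctx.aio_read(oid, 4096, 0, functools.partial(oncomplete, oid))
        completions.append(c)

    for c in completions:
        c.wait_for_complete()

    ioctx.close()
    cluster.shutdown()

functools.partial simply prepends the object name to the arguments
librados passes, so no shared state or locking is needed to tell the
reads apart.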
Hello,
while recovery is running, latencies in my virtual machines using qemu rbd
go up from 3-7 ms to 250-350 ms,
so all applications inside the VMs are slow.
I already have these set:
osd_recovery_max_active = 1
osd_max_backfills = 1
osd_recovery_op_priority = 5
osd_recover_clone_
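Those values can also be injected into the running OSDs without a
restart; shown here only as the mechanism, using the same values as
above, not as a tuning recommendation:

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 5'
    # confirm what an OSD is actually running with:
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep recovery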
Also, what filesystem are you using?
-Sam
On Fri, Jul 11, 2014 at 10:37 AM, Sage Weil wrote:
> One other thing we might also try is catching this earlier (on first read
> of corrupt data) instead of waiting for scrub. If you are not super
> performance sensitive, you can add
>
> filestore slopp
Greetings,
I'm using xfs.
Also, when, in a previous email, you asked if I could send the object, do
you mean the files from each server named something like
this:
./3.c6_head/DIR_6/DIR_C/DIR_5/rb.0.b0ce3.238e1f29.000b__head_34DC35C6__3
?
On Fri, Jul 11, 2014 at 2:00 PM, Samuel Just wrote:
Right.
-Sam
On Fri, Jul 11, 2014 at 2:05 PM, Randy Smith wrote:
> Greetings,
>
> I'm using xfs.
>
> Also, when, in a previous email, you asked if I could send the object, do
> you mean the files from each server named something like this:
> ./3.c6_head/DIR_6/DIR_C/DIR_5/rb.0.b0ce3.238e1f29.00
And grab the xattrs as well.
-Sam
On Fri, Jul 11, 2014 at 2:39 PM, Samuel Just wrote:
> Right.
> -Sam
>
> On Fri, Jul 11, 2014 at 2:05 PM, Randy Smith wrote:
>> Greetings,
>>
>> I'm using xfs.
>>
>> Also, when, in a previous email, you asked if I could send the object, do
>> you mean the files f
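In case it saves a round trip: a sketch of dumping the xattrs alongside
the object file, using the example path from earlier in the thread
(getfattr is in the attr package):

    getfattr -d -m '.*' -e hex \
        ./3.c6_head/DIR_6/DIR_C/DIR_5/rb.0.b0ce3.238e1f29.000b__head_34DC35C6__3

-d dumps names and values, -m '.*' matches every attribute name (the
filestore keeps its metadata in user.ceph.* attributes), and -e hex
keeps the binary values copy-and-paste safe.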
I have a four-node ceph cluster for testing. As I'm watching the
relatively idle cluster, I'm seeing quite a bit of traffic from one of
the OSD nodes to the monitor. This node has 8 OSDs and each of them is
involved in this behavior, but none of the other 24 OSDs located on the
other nodes are.
v0.80.3 Firefly
===============
This is the third Firefly point release. It includes a single fix
for a radosgw regression that was discovered in v0.80.2 right after it
was released.
We recommend that all v0.80.x Firefly users upgrade.
Notable Changes
---------------
* radosgw: fix regression
Does anybody know about this issue? Thanks.
Fri, 11 Jul 2014 10:26:47 +0800, from Yonghua Peng:
>Hi,
>
>I tried to create a qemu image, but it failed.
>
>ceph@ceph:~/my-cluster$ qemu-img create -f rbd rbd:rbd/qemu 2G
>Formatting 'rbd:rbd/qemu', fmt=rbd size=2147483648 cluster_size=0
>qemu-img: error co
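The qemu-img output is cut off above, but one quick way to tell whether
the problem is qemu's rbd driver or the cluster/auth setup is to create
the image with the rbd tool instead, using the same pool and name as in
the command above (--size is in MB):

    rbd create rbd/qemu --size 2048
    rbd info rbd/qemu

If that works, the cluster side is fine and the qemu build (rbd
support) or its view of ceph.conf and the keyring is the place to look.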
Greetings,
Well it happened again with two pgs this time, still in the same rbd image.
They are at http://people.adams.edu/~rbsmith/osd.tar. I think I grabbed the
files correctly. If not, let me know and I'll try again on the next
failure. It certainly is happening often enough.
On Fri, Jul 11,