ceph version 0.80.1
System: CentOS 6.5
[root@dn1 osd.6]# mount
/dev/sde1 on /cache4 type ext4 (rw,noatime,user_xattr) —— osd.6
/dev/sdf1 on /cache5 type ext4 (rw,noatime,user_xattr) —— osd.7
/dev/sdg1 on /cache6 type ext4 (rw,noatime,user_xattr) —— osd.8
/dev/sdh1 on /cache7 type ext4 (rw,noatime
ceph 0.80.1
The same question.
I have deleted 1/4 of the data, but the problem didn't disappear.
Does anyone have another way to solve it?
At 2015-01-10 05:31:30,"Udo Lembke" wrote:
>Hi,
>I had a similar effect two weeks ago - 1 PG backfill_toofull, and due
>to reweighting and deleting there was enough free s
Looking further, I guess what I was trying to describe was a simplified version of
the sharded threadpools released in Giant. Is it possible for that to be
backported to Firefly?
On Tue, Mar 3, 2015 at 9:33 AM, Erdem Agaoglu
wrote:
> Thank you folks for bringing that up. I had some questions about sharding.
Ah yes, that's a good point :-)
Thank you for your assistance Greg, I'm understanding a little more about how
Ceph operates under the hood now.
We're probably at a reasonable point for me to say I'll just switch the
machines off and forget about them for a while. It's no great loss; I just
wan
Hi guys,
Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over
37% of the data to rebalance - let's say this is fine (this happened when I
removed it from the CRUSH map).
I'm wondering - I had previously set some throttling mechanisms, but during
the first 1h of rebalancing, my rate of reco
Hello, all
I have a ceph+RGW installation and I have some problems with "shadow" objects.
For example:
#rados ls -p .rgw.buckets|grep "default.4507.1"
.
default.4507.1__shadow_test_s3.2/2vO4WskQNBGMnC8MGaYPSLfGkhQY76U.1_5
default.4507.1__shadow_test_s3.2/2vO4WskQNBGMnC8MGaYPSLfGkhQY76U.2_2
defa
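If those shadow objects are garbage-collection leftovers from deleted or multipart-uploaded objects (an assumption on my part; they can also belong to still-live multipart uploads), checking RGW's garbage collector would be a reasonable first step, for example:

radosgw-admin gc list --include-all    # list everything queued for GC, including entries not yet due
radosgw-admin gc process               # force a GC pass instead of waiting for the next scheduled run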
Hi.
Use the value "osd_recovery_delay_start".
example:
[root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
config show | grep osd_recovery_delay_start
"osd_recovery_delay_start": "10"
2015-03-03 13:13 GMT+03:00 Andrija Panic :
> HI Guys,
>
> I yesterday removed 1 OSD from cluste
Hi!
After realizing the problem with log rotation (see
http://thread.gmane.org/gmane.comp.file-systems.ceph.user/17708)
and fixing it, I now for the first time have some
meaningful (and recent) logs to look at.
While from an application perspective there seem
to be no issues, I would like to und
Thanks Irek.
Does this mean that after peering, for each PG there will be a delay of
10 sec, meaning that every once in a while I will have 10 sec of the cluster
NOT being stressed/overloaded, and then the recovery takes place for that
PG, and then for another 10 sec the cluster is fine, and then stressed ag
Another question - I mentioned here 37% of objects being moved around -
these are MISPLACED objects (degraded objects were 0.001%), after I removed 1
OSD from the CRUSH map (out of 44 OSDs or so).
Can anybody confirm this is normal behaviour - and are there any
workarounds?
I understand this is because
osd_recovery_delay_start is the delay in seconds between recovery
iterations (see osd_recovery_max_active).
It is described here:
https://github.com/ceph/ceph/search?utf8=%E2%9C%93&q=osd_recovery_delay_start
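For reference, these are the recovery/backfill throttling options that are usually tuned together (a sketch with illustrative values only, not recommendations for this cluster):

[osd]
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1
osd recovery delay start = 10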
2015-03-03 14:27 GMT+03:00 Andrija Panic :
> Another question - I mentioned here 37% of ob
A large percentage of the data moves because of the rebuild of the cluster map
(but with a low percentage of degradation). If you had not run "ceph osd crush
rm id", the percentage would have been low.
In your case, the correct option is to remove the entire node, rather than
each disk individually.
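One possible per-OSD sequence (my own sketch, not a tested recipe from this thread) that avoids paying for the rebalance twice is to drive the CRUSH weight to zero first, so the later "out" and "crush remove" steps cause no further data movement:

ceph osd crush reweight osd.6 0    # osd.6 is just a placeholder id
# wait for the data migration to finish (ceph -s)
ceph osd out 6
service ceph stop osd.6            # on the OSD's host
ceph osd crush remove osd.6
ceph auth del osd.6
ceph osd rm 6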
2015-03-03 14:27 GMT+03:00 Andrija Panic
Hi Irek,
yes, stopping the OSD (or setting it to OUT) resulted in only 3% of data degraded
and moved/recovered.
When I afterwards removed it from the CRUSH map with "ceph osd crush rm id", that's
when the stuff with 37% happened.
And thanks Irek for the help - could you kindly just let me know of the
preferred steps
What is your number of replicas?
2015-03-03 15:14 GMT+03:00 Andrija Panic :
> Hi Irek,
>
> yes, stopping the OSD (or setting it to OUT) resulted in only 3% of data
> degraded and moved/recovered.
> When I afterwards removed it from the CRUSH map with "ceph osd crush rm id",
> that's when the stuff with 37% hap
Since you have only three nodes in the cluster,
I recommend you add the new nodes to the cluster first, and then delete the old ones.
2015-03-03 15:28 GMT+03:00 Irek Fasikhov :
> What is your number of replicas?
>
> 2015-03-03 15:14 GMT+03:00 Andrija Panic :
>
>> Hi Irek,
>>
>> yes, stopping the OSD (or setting it to
Hi,
I have a problem with the timestamps of objects created in Rados Gateway.
Timestamps are supposed to be in the UTC timezone, but instead I see a strange
offset.
The server with Rados Gateway uses the MSK timezone (GMT+3). NTP is set up and
running correctly. Rados Gateway and Ceph have no objects (usage
Thanks Irek. The number of replicas is 3.
I have 3 servers with 2 OSDs each on a 1G switch (1 OSD already
decommissioned), which is further connected to a new 10G switch/network
with 3 servers on it with 12 OSDs each.
I'm decommissioning the old 3 nodes on the 1G network...
So you suggest removing the whole node w
Hi,
I have a problem when I try to remove an empty directory in CephFS. The
directory is empty, but it seems to have files crashed in the MDS.
*$ls test-daniel-old/*
total 0
drwx------ 1 rmagalhaes BioInfoHSL Users            0 Mar 2 10:52 ./
drwx------ 1 rmagalhaes BioInfoHSL Users 773099838313 Mar 2 1
librbd caches data at a buffer / block level. In a simplified example, if you
are reading and writing random 4K blocks, the librbd cache would store only
those individual 4K blocks. Behind the scenes, it is possible for adjacent
block buffers to be merged together within the librbd cache. The
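For context, the librbd cache behaviour is controlled by the usual [client] options in ceph.conf (a sketch; the numeric values below are only illustrative and roughly match the documented defaults):

[client]
rbd cache = true
rbd cache size = 33554432                    # 32 MB of cache per image
rbd cache max dirty = 25165824               # flush once 24 MB is dirty
rbd cache target dirty = 16777216            # start flushing at 16 MB
rbd cache writethrough until flush = true    # stay writethrough until the guest issues a flush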
Hi,
I am attempting to test the cephfs filesystem layouts.
I created a user with rights to write in only one pool:
client.puppet
key:zzz
caps: [mon] allow r
caps: [osd] allow rwx pool=puppet
I also created another pool in which I would assume this user is allowed to do
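For what it's worth, directory layouts are normally set through the ceph.* virtual xattrs on a mounted filesystem (a sketch; /mnt/cephfs/puppet-data is a placeholder path and 'puppet' is the pool from the caps above):

setfattr -n ceph.dir.layout.pool -v puppet /mnt/cephfs/puppet-data
getfattr -n ceph.dir.layout /mnt/cephfs/puppet-data    # verify the layout took effect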
Your procedure appears correct to me. Would you mind re-running your cloned
image VM with the following ceph.conf properties:
[client]
rbd cache = false
debug rbd = 20
log file = /path/writeable/by/qemu.$pid.log
If you recreate the issue, would you mind opening a ticket at
http://tracker.ceph.com/
On 03/03/2015 15:21, SCHAER Frederic wrote:
By the way: it looks like the “ceph fs ls” command is inconsistent when
cephfs is mounted (I used a locally compiled kmod-ceph rpm):
[root@ceph0 ~]# ceph fs ls
name: cephfs_puppet, metadata pool: puppet_metadata, data pools: [puppet ]
(umount
Hi all,
In my reading on the net about various implementations of Ceph, I came
across this website blog page (it really doesn't give a lot of good
information, but it caused me to wonder):
http://avengermojo.blogspot.com/2014/12/cubieboard-cluster-ceph-test.html
near the bottom, the person did a rados
Hi all,
what happens to data contained in an rbd image when the image itself gets
deleted?
Is the data just unlinked, or is it destroyed in a way that makes it
unreadable?
thanks
Giuseppe
On 03/03/2015 14:07, Daniel Takatori Ohara wrote:
*$ls test-daniel-old/*
total 0
drwx------ 1 rmagalhaes BioInfoHSL Users            0 Mar 2 10:52 ./
drwx------ 1 rmagalhaes BioInfoHSL Users 773099838313 Mar 2 11:41 ../
*$rm -rf test-daniel-old/*
rm: cannot remove ‘test-daniel-old/’: Directory
Hi,
I have a ceph cluster that is contained within a rack (1 monitor and 5 OSD
nodes). I kept the same public and private address in the configuration.
I do have 2 NICs and 2 valid IP addresses (one internal-only and one external)
for each machine.
Is it possible now to change the public network add
On Tue, Mar 3, 2015 at 9:24 AM, John Spray wrote:
> On 03/03/2015 14:07, Daniel Takatori Ohara wrote:
>
> $ls test-daniel-old/
> total 0
> drwx------ 1 rmagalhaes BioInfoHSL Users            0 Mar 2 10:52 ./
> drwx------ 1 rmagalhaes BioInfoHSL Users 773099838313 Mar 2 11:41 ../
>
> $rm -rf test
Hi All,
I have a cluster that I've been pushing data into in order to get an idea
of how full it can get prior to Ceph marking the cluster full. Unfortunately,
each time I fill the cluster I end up with one disk that typically hits the
full ratio (0.95) while all other disks still have anywhere from
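One common mitigation for this kind of imbalance (not necessarily the right answer here) is to reweight the overfull OSDs so data moves to the emptier ones, either automatically or by hand:

ceph osd reweight-by-utilization 110    # reweight OSDs above 110% of average utilization
ceph osd reweight 12 0.9                # or manually; 12 and 0.9 are placeholder values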
Loic,
Thank you, I got it created. One of these days, I am going to have to try to
understand some of the crush map details... Anyway, on to the next step!
-don-
I would be inclined to shut down both OSDs in a node, let the cluster
recover. Once it is recovered, shut down the next two, let it recover.
Repeat until all the OSDs are taken out of the cluster. Then I would
set nobackfill and norecover. Then remove the hosts/disks from the
CRUSH map, then unset nobac
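In concrete terms that would look something like this (a sketch of the flags mentioned above):

ceph osd set nobackfill
ceph osd set norecover
# ... remove the hosts/disks from the CRUSH map ...
ceph osd unset nobackfill
ceph osd unset norecover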
Hi John and Gregory,
The version of ceph client is 0.87 and the kernel is 3.13.
The debug logs are attached.
I saw this problem with an older kernel, but I didn't find the solution in the
tracker.
Thanks,
Att.
---
Daniel Takatori Ohara.
System Administrator - Lab. of Bioinformatics
Molecular Onc
Hello,
I've been playing with backing up images from my production site
(running 0.87) to my backup site (running 0.87.1) using export/import
and export-diff/import-diff. After initially exporting and importing the
image (rbd/small to backup/small) I took a snapshot (called test1) on
the productio
I did a bit more testing.
1. I tried on a newer kernel and was not able to recreate the problem,
maybe it is that kernel bug you mentioned, although it's not an exact
replica of the load.
2. I haven't tried the debug yet since I have to wait for the right moment.
One thing I realized and maybe it i
After changing the ownership of the log file directory, everything became fine.
Thanks for your help
Regards.
Italo Santos
http://italosantos.com.br/
On Tuesday, March 3, 2015 at 00:35, zhangdongmao wrote:
> I have met this before.
> Because I use Apache with RGW, radosgw is executed by the
Hello everyone,
I have a cluster with 5 hosts and 18 OSDs; today I faced an unexpected
issue where multiple OSDs went down.
The first OSD to go down was osd.8; a few minutes later, another OSD went down on
the same host, osd.1. So I tried to restart the OSDs (osd.8 and osd.1) but that
doesn't work
I had to go through the same experience of changing the public network
address, and it's not easy. Ceph seems to keep a record of which IP
address is associated with which OSD, and a port number for the process. I
was never able to find out where this record is kept or how to change it
manually. Her
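As far as I understand (an assumption on my part, not something confirmed in this thread), OSD addresses are recorded in the OSD map and are re-registered from the configured public/cluster network when each daemon restarts, while monitor addresses are pinned in the monmap and have to be edited explicitly. A sketch, where 'mon1' and the IP are placeholders:

ceph osd dump | grep "^osd"                      # addresses currently registered per OSD
ceph mon getmap -o /tmp/monmap
monmaptool --print /tmp/monmap
monmaptool --rm mon1 /tmp/monmap
monmaptool --add mon1 192.168.1.10:6789 /tmp/monmap
ceph-mon -i mon1 --inject-monmap /tmp/monmap     # run with the monitor stopped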
Snapshots are read-only, so all changes to the image can only be applied to the
HEAD revision.
In general, you should take a snapshot prior to export / export-diff to ensure
consistent images:
rbd snap create rbd/small@snap1
rbd export rbd/small@snap1 ./foo
rbd import ./foo backup/small
Jason,
Ah, ok that makes sense. I was forgetting snapshots are read-only. Thanks!
My plan was to do something like this. First, create a sync snapshot and
seed the backup:
rbd snap create rbd/small@sync
rbd export rbd/small@sync ./foo
rbd import ./foo backup/small
rbd snap create backup/small@s
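From there, the incremental updates would presumably be shipped with export-diff / import-diff (a sketch using the names from the plan above; import-diff expects the backup side to already have the common base snapshot "sync"):

rbd snap create rbd/small@sync2
rbd export-diff --from-snap sync rbd/small@sync2 ./delta1
rbd import-diff ./delta1 backup/small    # applies the delta and ends at sync2 on the backup side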
I was testing a little bit more and decided to run cephfs-journal-tool, and
I ran across some errors:
$ cephfs-journal-tool journal inspect
2015-03-03 14:18:54.453981 7f8e29f86780 -1 Bad entry start ptr
(0x2aebf6) at 0x2aeb32279b
2015-03-03 14:18:54.539060 7f8e29f86780 -1 Bad entry start ptr
(0
On 03/03/2015 22:35, Scottix wrote:
I was testing a little bit more and decided to run the cephfs-journal-tool
I ran across some errors
$ cephfs-journal-tool journal inspect
2015-03-03 14:18:54.453981 7f8e29f86780 -1 Bad entry start ptr
(0x2aebf6) at 0x2aeb32279b
2015-03-03 14:18:54.539060
On 03/03/2015 22:57, John Spray wrote:
On 03/03/2015 22:35, Scottix wrote:
I was testing a little bit more and decided to run the
cephfs-journal-tool
I ran across some errors
$ cephfs-journal-tool journal inspect
2015-03-03 14:18:54.453981 7f8e29f86780 -1 Bad entry start ptr
(0x2aebf6) a
On 03/03/2015 22:03, Italo Santos wrote:
I realised that when the first OSD went down, the cluster was
performing a deep-scrub, and I found the below trace in the logs of
osd.8. Can anyone help me understand why osd.8, and other OSDs,
unexpectedly go down?
I'm afraid I've seen thi
Ya we are not at 0.87.1 yet, possibly tomorrow. I'll let you know if it
still reports the same.
Thanks John,
--Scottie
On Tue, Mar 3, 2015 at 2:57 PM John Spray wrote:
> On 03/03/2015 22:35, Scottix wrote:
> > I was testing a little bit more and decided to run the
> cephfs-journal-tool
> >
> >
Hi,
This is just a heads up that we've identified a performance regression in
v0.80.8 from previous firefly releases. A v0.80.9 is working its way
through QA and should be out in a few days. If you haven't upgraded yet
you may want to wait.
Thanks!
sage
On 03/03/2015 04:19 PM, Sage Weil wrote:
> Hi,
>
> This is just a heads up that we've identified a performance regression in
> v0.80.8 from previous firefly releases. A v0.80.9 is working it's way
> through QA and should be out in a few days. If you haven't upgraded yet
> you may want to wait
Has anyone on this DL seen this error?
Checking for unpackaged file(s): /usr/lib/rpm/check-files
/home/vagrant/rpmbuild/BUILDROOT/calamari-server-1.3-rc_23_g4c41db3.el7.x86_64
Wrote:
/home/vagrant/rpmbuild/RPMS/x86_64/calamari-server-1.3-rc_23_g4c41
Is the kernel client affected by the problem?
On Tuesday, 03 March 2015 at 15:19 -0800, Sage Weil wrote:
> Hi,
>
> This is just a heads up that we've identified a performance regression in
> v0.80.8 from previous firefly releases. A v0.80.9 is working it's way
> through QA and should be out in a
Hi Ceph,
Last weekend I discussed with a friend a use case many of us have thought
about already: it would be cool to have a simple way to assemble Ceph-aware NAS
boxes fresh from the store. I summarized the use case and interface we discussed
here:
https://wiki.ceph.com/Clustering_a_few_NAS_i
Hi Yann,
That seems related to http://tracker.ceph.com/issues/10536 which seems to be
resolved. Could you create a new issue with a link to 10536? More logs and
a ceph report would also be useful to figure out why it resurfaced.
Thanks!
On 04/03/2015 00:04, Yann Dupont wrote:
>
> Le 03/03/20
On Wed, 4 Mar 2015, Olivier Bonvalet wrote:
> Is the kernel client affected by the problem?
Nope. The kernel client is unaffected; the issue is in librbd.
sage
>
> > On Tuesday, 03 March 2015 at 15:19 -0800, Sage Weil wrote:
> > Hi,
> >
> > This is just a heads up that we've identified a perform
On Tuesday, 03 March 2015 at 16:32 -0800, Sage Weil wrote:
> On Wed, 4 Mar 2015, Olivier Bonvalet wrote:
> > Is the kernel client affected by the problem?
>
> Nope. The kernel client is unaffected; the issue is in librbd.
>
> sage
>
OK, thanks for the clarification.
So I have to dig!