> On 27/04/2015, at 15.51, Alexandre DERUMIER wrote:
>
> Hi, can you check on your ceph node
> /var/log/salt/minion ?
>
> I have had a similar problem; I needed to remove
>
> rm /etc/salt/pki/minion/minion_master.pub
> /etc/init.d/salt-minion restart
>
> (I don't know if "calamari-ctl
Hi,
I've deployed a small Hammer cluster (0.94.1) and I mount it via
ceph-fuse on Ubuntu 14.04. After several hours I found that the ceph-fuse
process had crashed. At the end is the crash log from
/var/log/ceph/ceph-client.admin.log. The memory usage of the ceph-fuse process
was huge (more than 4GB) when it
To add some more interesting behavior to my problem: the monitors
are not updating the status of the OSDs.
Even when I stop all the remaining OSDs, 'ceph osd tree' still shows them as up.
The status of the mons and mds also doesn't seem to update correctly, in my
opinion.
Below is a copy of statu
Hi all,
When I was configuring federated gateways, I got the error below:
sudo radosgw-agent -c /etc/ceph/ceph-data-sync.conf
ERROR:root:Could not retrieve region map from destination
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/radosgw_agent/cli.py", lin
On Sat, Apr 25, 2015 at 11:36 PM, Josef Johansson wrote:
> Hi,
>
> With inspiration from all the other performance threads going on here, I
> started to investigate on my own as well.
>
> I'm seeing a lot of iowait on the OSD, and the journal utilised at 2-7%, with
> about 8-30MB/s (mostly around 8
This is the second (and possibly final) point release for Giant.
We recommend all v0.87.x Giant users upgrade to this release.
Notable Changes
---
* ceph-objectstore-tool: only output unsupported features when
incompatible (#11176 David Zafman)
* common: do not implicitly unlock r
Hello Alfredo / Craig
First of all, thank you so much for replying and giving your precious time
to this problem.
@Alfredo: I tried radosgw-agent version 1.2.2 and the case has
progressed a lot (below are some of the logs).
I am now getting
*2015-04-28 00:35:14,781 5132 [radosgw_agent][
How long are you thinking here?
We added more storage to our cluster to overcome these issues, and we
can't keep throwing storage at it until the issues are fixed.
On 28/04/15 01:49, Yehuda Sadeh-Weinraub wrote:
It will get to the ceph mainline eventually. We're still reviewing and testing
t
Hi Vickey (and all)
It looks like this issue was introduced as part of the 1.2.1 release.
I just finished getting 1.2.2 out (try upgrading, please). You should no longer
see that error.
Hope that helps!
-Alfredo
- Original Message -
From: "Craig Lewis"
To: "Vickey Singh"
Cc: ceph-use
> [root@us-east-1 ceph]# ceph -s --name client.radosgw.us-east-1
> [root@us-east-1 ceph]# ceph -s --name client.radosgw.us-west-1
Are you trying to setup two zones on one cluster? That's possible, but
you'll also want to spend some time on your CRUSH map making sure that the
two zones are as ind
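As a rough illustration of what "independent" can mean at the CRUSH level, one
approach is to give each zone its own root and rule (all bucket, host and pool
names below are placeholders, not taken from this thread):
ceph osd crush add-bucket us-east-root root
ceph osd crush add-bucket us-west-root root
ceph osd crush move node1 root=us-east-root
ceph osd crush move node2 root=us-west-root
ceph osd crush rule create-simple us-east-rule us-east-root host
ceph osd crush rule create-simple us-west-rule us-west-root host
ceph osd pool set .us-east.rgw.buckets crush_ruleset 1
ceph osd pool set .us-west.rgw.buckets crush_ruleset 2
The ruleset IDs need to match whatever 'ceph osd crush rule dump' reports on the
actual cluster.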
Hi
Updated the logfile, same place http://beta.xaasbox.com/ceph/ceph-osd.15.log
Br,
Tuomas
-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: 27. huhtikuuta 2015 22:22
To: Tuomas Juntunen
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Upgrade from Giant to Hamm
On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> Hey
>
> Got the log, you can get it from
> http://beta.xaasbox.com/ceph/ceph-osd.15.log
Can you repeat this with 'debug osd = 20'? Thanks!
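For reference, that setting can either go in ceph.conf on the OSD host:
[osd]
debug osd = 20
or be injected into the running daemon (assuming the affected daemon is osd.15,
as the log file name suggests):
ceph tell osd.15 injectargs '--debug-osd 20'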
sage
>
> Br,
> Tuomas
>
>
> -Original Message-
> From: Sage Weil [mailto:sw...@redhat.com]
> Sen
Hey
Got the log, you can get it from
http://beta.xaasbox.com/ceph/ceph-osd.15.log
Br,
Tuomas
-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: 27. huhtikuuta 2015 20:45
To: Tuomas Juntunen
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Upgrade from Giant to H
Yes, the tcmalloc patch we applied is not meant to solve the trace we are seeing. The
env variable code path was a no-op in the tcmalloc code base, and the patch has
resolved that. Now, setting the env variable takes effect within the tcmalloc
code base.
Now, this thread cache env variable is a performan
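For anyone joining the thread late: the variable being tuned here is presumably
gperftools' TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES. A minimal sketch of setting it
before restarting an OSD (the 128MB value is only an example, not a recommendation):
export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
/etc/init.d/ceph restart osd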
Hi
Thank you so much,
Here's the other JSON file. I'll check and install that and get the logs
ASAP too. There have not been any snaps on rbd; I haven't used it at all, it
has been just an empty pool.
Br,
Tuomas
-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: 27. huh
Yeah, no snaps:
images:
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 17882,
"pool_snaps": [],
"removed_snaps": "[]",
img:
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
Hi Somnath,
Forgive me as I think this was discussed earlier in the thread, but did
we confirm that the patch/fix/etc does not 100% fix the problem?
Mark
On 04/27/2015 12:25 PM, Somnath Roy wrote:
Alexandre,
The moment you restarted after hitting the tcmalloc trace, irrespective of what
val
Alexandre,
The moment you restart after hitting the tcmalloc trace, irrespective of what
value you set as the thread cache, it will perform better, and that's what is
happening in your case, I guess.
Yes, setting this value is kind of tricky and very much dependent on your
setup/workload etc.
I would sugg
Ok, just to make sure that I understand:
>>tcmalloc un-tuned: ~50k IOPS once bug sets in
Yes, it's really random, but when hitting the bug, this is the worst I
have seen.
>>tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
yes
>>jemalloc un-tuned: ~150k IOPS
It's more around 185
Hi
Here you go
Br,
Tuomas
-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: 27. huhtikuuta 2015 19:23
To: Tuomas Juntunen
Cc: 'Samuel Just'; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
operations most of the OS
On 04/27/2015 10:11 AM, Alexandre DERUMIER wrote:
Is it possible that you were suffering from the bug during the first
test but once reinstalled you hadn't hit it yet?
Yes, I'm pretty sure I've been hitting the tcmalloc bug since the beginning.
I had patched it, but I think it's not enough.
I had a
On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> Thanks for the info.
>
> To my knowledge there were no snapshots on that pool, but I cannot verify
> that.
Can you attach a 'ceph osd dump -f json-pretty'? That will shed a
bit more light on what happened (and the simplest way to fix it).
sage
>
Hi Sage, Alexandre et al.
Here's another data point... we noticed something similar a while ago.
After we restart our OSDs the "4kB object write latency" [1]
temporarily drops from ~8-10ms down to around 3-4ms. Then slowly over
time the latency increases back to 8-10ms. The time that the OSDs stay
It will get to the ceph mainline eventually. We're still reviewing and testing
the fix, and there's more work to be done on the cleanup tool.
Yehuda
- Original Message -
> From: "Ben"
> To: "Yehuda Sadeh-Weinraub"
> Cc: "ceph-users"
> Sent: Sunday, April 26, 2015 11:02:23 PM
> Subject
Hi all
The issue is resolved after upgrading Ceph from Giant to Hammer (0.94.1).
cheers
K.Mohamed Pakkeer
On Sun, Apr 26, 2015 at 11:28 AM, Mohamed Pakkeer
wrote:
> Hi
>
> I was doing some testing on an erasure-coded CephFS cluster. The cluster
> is running the Giant 0.87.1 release.
>
>
>
> Clu
On Mon, 27 Apr 2015, Alexandre DERUMIER wrote:
> >>If I want to use librados API for performance testing, are there any
> >>existing benchmark tools which directly accesses librados (not through
> >>rbd or gateway)
>
> you can use "rados bench" from ceph packages
>
> http://ceph.com/docs/maste
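A minimal sketch of driving librados through that tool (the pool name 'testpool'
and the 60-second duration are placeholders):
rados bench -p testpool 60 write --no-cleanup
rados bench -p testpool 60 seq
rados bench -p testpool 60 rand
rados -p testpool cleanup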
>>Is it possible that you were suffering from the bug during the first
>>test but once reinstalled you hadn't hit it yet?
Yes, I'm pretty sure I've been hitting the tcmalloc bug since the beginning.
I had patched it, but I think it's not enough.
I have always had this bug at random, but mainly when I have
Hello Cephers
Still waiting for your help.
I tried several things but no luck.
On Mon, Apr 27, 2015 at 9:07 AM, Vickey Singh
wrote:
> Any help with related to this problem would be highly appreciated.
>
> -VS-
>
>
> On Sun, Apr 26, 2015 at 6:01 PM, Vickey Singh wrote:
>
>> Hello Geeks
>>
>>
Hi Alex,
Is it possible that you were suffering from the bug during the first
test but once reinstalled you hadn't hit it yet? That's a pretty major
performance swing. I'm not sure if we can draw any conclusions about
jemalloc vs tcmalloc until we can figure out what went wrong.
Mark
On 0
I can sacrifice the images and img pools, if that is necessary.
Just need to get the thing going again.
Tuomas
-Original Message-
From: Samuel Just [mailto:sj...@redhat.com]
Sent: 27. huhtikuuta 2015 15:50
To: tuomas juntunen
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Upgra
Hi, can you check on your ceph node
/var/log/salt/minion
?
I have had a similar problem; I needed to remove
rm /etc/salt/pki/minion/minion_master.pub
/etc/init.d/salt-minion restart
(I don't know if "calamari-ctl clear" changes the salt master key)
- Mail original -
De: "Stef
Thanks for the info.
To my knowledge there were no snapshots on that pool, but I cannot verify that.
Any way to make this work again?
Removing the tier and other settings didn't fix it; I tried that the second this
happened.
Br,
Tuomas
-Original Message-
From: Samuel Just [mailto:sj...@red
I have been trying to configure my radosgw-agent on RHEL 6.5, but after a
recent reboot of the gateway node I discovered that the file needed by the
/etc/init.d/radosgw-agent script has disappeared
(/etc/ceph/radosgw-agent/default.conf). As a result, I can no longer start up
the radosgw.
H
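For what it's worth, that file is just the radosgw-agent YAML config; its general
shape is roughly the sketch below (endpoints, zone names and keys here are made up,
so check the federated gateway docs before relying on it):
src_zone: us-east
source: http://us-east.example.com:80
src_access_key: SRC_ACCESS_KEY
src_secret_key: SRC_SECRET_KEY
dest_zone: us-west
destination: http://us-west.example.com:80
dest_access_key: DEST_ACCESS_KEY
dest_secret_key: DEST_SECRET_KEY
log_file: /var/log/radosgw/radosgw-sync-us-east-west.log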
Hi, Nikola.
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg19152.html
2015-04-27 14:17 GMT+03:00 Nikola Ciprich :
> Hello Somnath,
> > Thanks for the perf data. It seems innocuous. I am not seeing a single
> > tcmalloc trace; are you running with tcmalloc, by the way?
>
> according to ldd
All,
After successfully upgrading from Giant to Hammer, at first our Calamari server
seemed fine, showing the new 'too many PGs' warning. Then, during/after
removing/consolidating various pools, it failed to get updated. Not having been
able to find any root cause, I decided to flush the Postgres DB (calamari-ctl cle
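For the record, the reset sequence alluded to is presumably along these lines
(a sketch only; calamari-ctl clear may prompt for confirmation, and it wipes the
Calamari database):
calamari-ctl clear
calamari-ctl initialize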
So, the base tier is what determines the snapshots for the cache/base pool
amalgam. You added a populated pool complete with snapshots on top of a base
tier without snapshots. Apparently, it caused an existential crisis for the
snapshot code. That's one of the reasons why there is a --force-n
On Mon, Apr 27, 2015 at 3:42 PM, Burkhard Linke
wrote:
> Hi,
>
> I've deployed ceph on a number of nodes in our compute cluster (Ubuntu 14.04
> Ceph Firefly 0.80.9). /ceph is mounted via ceph-fuse.
>
> From time to time some nodes lose their access to cephfs with the following
> error message:
>
The following:
ceph osd tier add img images --force-nonempty
ceph osd tier cache-mode images forward
ceph osd tier set-overlay img images
The idea was to make images a tier of img, move the data to img, then change clients
to use the new img pool.
Br,
Tuomas
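For context, the usual sequence for draining and detaching a cache tier, once the
data has been flushed, is roughly the following (a sketch only; given the snapshot
problem described above it may not apply cleanly here):
rados -p images cache-flush-evict-all
ceph osd tier remove-overlay img
ceph osd tier remove img images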
> Can you explain exactly what you mean
Can you explain exactly what you mean by:
"Also I created one pool for tier to be able to move data without outage."
-Sam
- Original Message -
From: "tuomas juntunen"
To: "Ian Colle"
Cc: ceph-users@lists.ceph.com
Sent: Monday, April 27, 2015 4:23:44 AM
Subject: Re: [ceph-users] Upgrade
Hi
Any solution for this yet?
Br,
Tuomas
> It looks like you may have hit http://tracker.ceph.com/issues/7915
>
> Ian R. Colle
> Global Director
> of Software Engineering
> Red Hat (Inktank is now part of Red Hat!)
> http://www.linkedin.com/in/ircolle
> http://www.twitter.com/ircolle
> Cell: +1.
It looks like you may have hit http://tracker.ceph.com/issues/7915
Ian R. Colle
Global Director
of Software Engineering
Red Hat (Inktank is now part of Red Hat!)
http://www.linkedin.com/in/ircolle
http://www.twitter.com/ircolle
Cell: +1.303.601.7713
Email: ico...@redhat.com
- Original Messag
Hello Somnath,
> Thanks for the perf data. It seems innocuous. I am not seeing a single tcmalloc
> trace; are you running with tcmalloc, by the way?
according to ldd, it seems I have it compiled in, yes:
[root@vfnphav1a ~]# ldd /usr/bin/ceph-osd
.
.
libtcmalloc.so.4 => /usr/lib64/libtcmalloc.so.4 (
On 04/23/2015 06:58 PM, Craig Lewis wrote:
Yes, unless you've adjusted:
[global]
mon osd min down reporters = 9
mon osd min down reports = 12
OSDs talk to the MONs on the public network. The cluster network is
only used for OSD to OSD communication.
If one OSD node can't talk on that n
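For reference, those thresholds can also be changed on a live cluster without a
restart; a hedged sketch using injectargs with the values quoted above:
ceph tell mon.* injectargs '--mon-osd-min-down-reporters 9 --mon-osd-min-down-reports 12'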
Well yes “pretty much” the same thing :).
I think some people would like to distinguish recovery from replication and
maybe apply some QoS around these two.
We have to replicate while recovering, so one can impact the other.
In the end, I just think it's a doc issue; still waiting for a dev to ans
I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
Then created new pools and deleted some old ones. Also I created one pool for
tier to be able to move data without outage.
After these operations all but 10 OSDs are down and writing this kind of
message to the logs; I get more than 100GB of these
Hi,
I've deployed ceph on a number of nodes in our compute cluster (Ubuntu
14.04 Ceph Firefly 0.80.9). /ceph is mounted via ceph-fuse.
From time to time some nodes lose their access to cephfs with the
following error message:
# ls /ceph
ls: cannot access /ceph: Transport endpoint is not co
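When a node ends up in that state, one common way to recover (assuming the
ceph-fuse process has died and /ceph is the mount point) is to lazily unmount and
remount, for example:
fusermount -uz /ceph
ceph-fuse -m <mon-host>:6789 /ceph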