Thanks, Gregory!
My Ceph version is 0.94.1. What I'm trying to test is the worst-case situation,
when a node loses its network or becomes unresponsive. So what I do is
"killall -9 ceph-osd", then reboot.
Well, I also tried to do a clean reboot several times (just a "reboot"
command), but I saw no di
Can you give some more insight into the Ceph cluster you are running?
It seems IO started and then there was no response; the "cur MB/s" column drops to 0.
What is the 'ceph -s' output?
Hopefully all the OSDs are up and running.
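For reference, a quick way to check that from the client or a monitor node (standard ceph CLI, nothing cluster-specific assumed):
ceph -s          # overall cluster health and current IO rates
ceph osd stat    # e.g. "8 osds: 8 up, 8 in"
ceph osd tree    # per-host view showing which OSDs are down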
Thanks & Regards
Somnath
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Be
On Tue, May 12, 2015 at 11:39 PM, Vasiliy Angapov wrote:
> Hi, colleagues!
>
> I'm testing a simple Ceph cluster in order to use it in production
> environment. I have 8 OSDs (1Tb SATA drives) which are evenly distributed
> between 4 nodes.
>
> I've mapped an rbd image on the client node and started
Hi Christian,
currently we do get good discounts as a university, and the bundles were
worth it.
The chassis do have multiple PSUs and n x 10Gb ports (40Gb is possible).
The switch connection is redundant.
Currently we are thinking of 10 SATA OSD nodes + x SSD cache pool nodes and 5
MONs, for a start.
A cache tier will definitely cause a lot of WA (write amplification) irrespective of whether you use
replication or EC. There are some improvements coming in the Infernalis timeframe, and hopefully
they will help with WA.
Sorry, I have yet to use a cache tier, so I don't have the data in terms of
performance/WA.
Will keep the community posted once I ge
Hi, colleagues!
I'm testing a simple Ceph cluster in order to use it in production
environment. I have 8 OSDs (1Tb SATA drives) which are evenly distributed
between 4 nodes.
I've mapped an rbd image on the client node and started writing a lot of data
to it. Then I just reboot one node and see what'
Hello,
On Wed, 13 May 2015 06:11:25 + Somnath Roy wrote:
> Christian,
> EC pools do not support overwrites/partial writes and thus are not
> supported (directly) by block/file interfaces. Did you put a cache tier
> in front for your test with fio?
>
No, I never used EC and/or cache-tiers.
T
Christian,
EC pools do not support overwrites/partial writes and thus are not supported
(directly) by block/file interfaces.
Did you put a cache tier in front for your test with fio?
Thanks & Regards
Somnath
-Original Message-
From: Christian Balzer [mailto:ch...@gol.com]
Sent: Tuesday, M
Hi, guys,
We have been running an OpenStack Havana environment with Ceph 0.72.2 as the
block storage backend. Recently we were trying to upgrade OpenStack to
Juno. For testing, we deployed a Juno all-in-one node; this node shares the
same Cinder volume rbd pool and Glance image rbd pool with the old Ha
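For reference, pointing the Juno node at the existing pools uses the usual RBD driver options in cinder.conf; a minimal sketch, assuming the common pool/user names "volumes" and "cinder" (adjust to your deployment):
[DEFAULT]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = <libvirt secret uuid>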
Ideally I would like everything in /var/log/calamari
be sure to set calamari.conf like so:
[shadow_man@vpm107 ~]$ grep DEBUG /etc/calamari/calamari.conf
log_level = DEBUG
db_log_level = DEBUG
log_level = DEBUG
then restart cthulhu and apache
visit http://essperf3/api/v2/cluster
and http://essper
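Restarting the two services mentioned above would look roughly like this (a sketch; it assumes cthulhu is managed by supervisord under the program name "cthulhu", and apache2 vs. httpd depends on the distro):
sudo supervisorctl restart cthulhu
sudo service apache2 restart    # "httpd" on RHEL-based systems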
Hello,
On Tue, 12 May 2015 15:28:28 + Somnath Roy wrote:
> Hi Christian,
> Wonder why you are saying EC will write more data than replication?
There are 2 distinct things here to look at.
1. One is the overhead (increasing with smaller blocks) created by Ceph
(and the filesystem) as per m
/var/log/salt/minion doesn't really look very interesting after that sequence.
I issued 'salt oceton109 ceph.get_heartbeats' from the master. The logs are much
more interesting when I clear calamari and stop salt-minion. Looking at the
endpoints from http://essperf2/api/v2/cluster doesn't show anyth
Which logs? I'm assuming /var/log/salt/minion, since the rest on the minions are
relatively empty. Possibly cthulhu from the master?
I'm running on Ubuntu 14.04 and don't have an httpd service; I had been
starting/stopping apache2. Likewise, there is no supervisord service and I've been
using supervi
All that looks fine.
There must be some state where the cluster is known to calamari and it is
failing to actually show it.
If you have time to debug I would love to see the logs at debug level.
If you don’t we could try cleaning out calamari’s state.
sudo supervisorctl shutdown
sudo service ht
Master was ess68 and now it's essperf3.
On all cluster nodes the following files now have 'master: essperf3'
/etc/salt/minion
/etc/salt/minion/calamari.conf
/etc/diamond/diamond.conf
The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a 'salt \*
test.ping' from essperf3 Calamar
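A quick sanity check on the new master after repointing the minions (standard salt commands; the minion names are just whatever salt-key reports):
sudo salt-key -L                   # accepted minion keys on essperf3
sudo salt '*' test.ping            # basic connectivity
sudo salt '*' ceph.get_heartbeats  # should return mon/cluster status from each node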
Bruce,
It is great to hear that salt is reporting status from all the nodes in the
cluster.
Let me see if I understand your question:
You want to know what conditions cause us to recognize a working cluster?
see
https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L
I opened issue #11604, and have a fix for the issue. I updated our test suite
to cover the specific issue that you were hitting. We'll backport the fix to
both hammer and firefly soon.
Thanks!
Yehuda
- Original Message -
> From: "Yehuda Sadeh-Weinraub"
> To: "Mark Murphy"
> Cc: ceph-u
Increasing the audience since ceph-calamari is not responsive. What salt
event/info does the Calamari Master expect to see from the ceph-mon to
determine there is a working cluster? I had to change servers hosting the
calamari master and can't get the new machine to recognize the cluster. The
Please give me some advice, thanks.
On May 13, 2015, at 12:29 AM, 刘俊 <316828...@qq.com> wrote: No, I set up replication between two clusters; each cluster has one zone, and both clusters are in the same region. But I got some errors.
On May 13, 2015, at 12:02 AM, Craig Lewis wrote: Are you trying to set up replication on one clus
I know Wido was looking to potentially get on the ballot here, so am
including him.
I'm also including ceph-user and ceph-devel in case anyone else had
strong feelings about leading the Ceph User Committee (CUC). Thanks.
As Eric mentioned. If you are interested in taking on the role and
responsib
Thanks for the suggestions Greg. One thing I forgot to mention, restarting
the main MDS service fixes the problem temporarily.
Clearing inodes and dentries on the client with "echo 2 | sudo tee
/proc/sys/vm/drop_caches" on the two cephfs clients that were failing to
respond to cache pressure fixed
On Tue, May 12, 2015 at 11:36 PM, Chad William Seys
wrote:
>> No, pools use crush rulesets. "straw" and "straw2" are bucket types
>> (or algorithms).
>>
>> As an example, if you do "ceph osd crush add-bucket foo rack" on
>> a cluster with firefly tunables, you will get a new straw bucket. The
>>
> No, pools use crush rulesets. "straw" and "straw2" are bucket types
> (or algorithms).
>
> As an example, if you do "ceph osd crush add-bucket foo rack" on
> a cluster with firefly tunables, you will get a new straw bucket. The
> same after doing "ceph osd crush tunables hammer" will get you a
On Tue, May 12, 2015 at 11:16 PM, Robert LeBlanc wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> What is the difference between straw and straw2 buckets? Should we consider
> "upgrading" to straw2 buckets by dumping the CRUSH map and updating them?
Well, straw bucket was supposed t
Nick thanks for your feedback.
Please find my response inline.
Regards
Somnath
-Original Message-
From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: Tuesday, May 12, 2015 1:02 PM
To: Somnath Roy; 'Christian Balzer'; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] EC backend benchmark
Hi
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
What is the difference between straw and straw2 buckets? Should we
consider "upgrading" to straw2 buckets by dumping the CRUSH map and
updating them?
On Tue, May 12, 2015 at 10:38 PM, Chad William Seys
wrote:
> Hi Ilya and all,
> Thanks for explaining.
> I'm confused about what "building" a crushmap means.
> After running
> #ceph osd crush tunables hammer
> data migrated around the cluster, so something changed.
>
Hi Somnath,
Firstly, thank you for sharing these results.
I suspect you are struggling to saturate anything due to the effects of
serial latency. Have you tried scaling the clients above 8?
I noticed a similar ceiling, albeit at a much lower performance
threshold, when using a 1Gb network
Hi,
Thank you for a very thorough investigation. See my comments below:
- Original Message -
> From: "Mark Murphy"
> To: "Yehuda Sadeh-Weinraub"
> Cc: "Sean Sullivan" , ceph-users@lists.ceph.com
> Sent: Tuesday, May 12, 2015 10:50:49 AM
> Subject: Re: [ceph-users] Civet RadosGW S3 not s
On Tue, May 12, 2015 at 12:03 PM, Cullen King wrote:
> I'm operating a fairly small ceph cluster, currently three nodes (with plans
> to expand to five in the next couple of months) with more than adequate
> hardware. Node specs:
>
> 2x Xeon E5-2630
> 64gb ram
> 2x RAID1 SSD for system
> 2x 256gb
Hi!
I have found a way to change the pool ID for an image's parent:
list images:
# rbd ls volumes
5a4b167d-2588-4c06-904c-347abf91d788_disk.swap
volume-0ed965a0-53a5-4054-ad9c-3a432c8455d6
volume-1269b41a-4af0-499b-a16c-9bb6a5b98e70
volume-4094fbc1-9969-47aa-a0de-7026678b8e64
volume-5958295e-9623-4c46-
Hi Ilya and all,
Thanks for explaining.
I'm confused about what "building" a crushmap means.
After running
#ceph osd crush tunables hammer
data migrated around the cluster, so something changed.
I was expecting that 'straw' would be replaced by 'straw2'.
(Unfortuna
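One way to confirm whether the buckets were actually converted is to decompile the CRUSH map and look at each bucket's algorithm (standard ceph/crushtool commands):
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
grep 'alg straw' crush.txt   # shows "alg straw" or "alg straw2" per bucket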
I'm operating a fairly small ceph cluster, currently three nodes (with
plans to expand to five in the next couple of months) with more than
adequate hardware. Node specs:
2x Xeon E5-2630
64gb ram
2x RAID1 SSD for system
2x 256gb SSDs for journals
4x 4tb drives for OSDs
1GbE for frontend (shared wi
[ Adding ceph-users to the CC ]
On Mon, May 11, 2015 at 8:22 PM, zhao.ming...@h3c.com
wrote:
> Hi:
>
> I've been learning CephFS recently, and now I have some questions about it:
>
>
>
> 1. I've seen that the typical configuration is 'single MDS', and found some
> resources on the Internet which said 'singl
Hi
I am having this exact same problem, for more than a week. I have not found
a way to do this either.
Any help would be appreciated.
Basically all of our guests are now down, even though they are not in
production, we would still need to get the data out of them.
Br,
Tuomas
-Original Me
I am having a similar issue. The cluster is up, salt is running on all nodes, and it has
accepted keys from all of them, including the monitor. I can issue salt and
salt/ceph.py commands from the Calamari including 'salt \* ceph.get_heartbeats'
which returns from all nodes including the monitor with the monm
Hey Yehuda,
I work with Sean on the dev side. We thought we should put together a short
report on what we’ve been seeing in the hopes that the behavior might make some
sense to you.
We had originally noticed these issues a while ago with our first iteration of
this particular Ceph deployment.
On Tue, May 12, 2015 at 8:37 PM, Chad William Seys
wrote:
> Hi Ilya and all,
> Is it safe to use kernel 3.16.7 rbd with Hammer tunables? I've tried
> this on a test Hammer cluster and the client seems to work fine.
> I've also mounted cephfs on a Hammer cluster (and Hammer tunable
Hi!
I have an RBD image (in pool "volumes"), made by OpenStack from a parent image
(in pool "images").
Recently, I tried to decrease the number of PGs, to avoid the new Hammer warning.
I copied pool "images" to another pool, deleted the original pool and renamed the
new pool to "images". Ceph allowed m
Hi Ilya and all,
Is it safe to use kernel 3.16.7 rbd with Hammer tunables? I've tried
this on a test Hammer cluster and the client seems to work fine.
I've also mounted cephfs on a Hammer cluster (and Hammer tunables)
using
kernel 3.16. It seems to work fine (but not much testi
Hello,
When storing large, multipart objects in the Ceph Object Store (~100 GB and
more), we have noticed that HEAD calls against the rados gateway for these
objects are excessively slow - in fact, they are about the same as doing a GET
on the object. Looking at the logs while this is occurring
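A simple way to reproduce the comparison (a sketch with a hypothetical public-read object; authenticated requests would need signed headers):
time curl -s -o /dev/null -I http://rgw.example.com/mybucket/large-object   # HEAD request
time curl -s -o /dev/null http://rgw.example.com/mybucket/large-object      # GET request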
On Tue, May 12, 2015 at 5:54 AM, Kenneth Waegeman
wrote:
>
>
> On 04/30/2015 07:50 PM, Gregory Farnum wrote:
>>
>> On Thu, Apr 30, 2015 at 2:03 AM, Kenneth Waegeman
>> wrote:
>>>
>>> So the cache is empty, but I get warning when I check the health:
>>> health HEALTH_WARN
>>> md
Moving this to ceph-user where it belongs for eyeballs and responses.
On Mon, May 11, 2015 at 10:39 PM, 张忠波 wrote:
> Hi
> When I run ceph-deploy, an error appears: "Error in sys.exitfunc:".
> I found the same error message reported here:
> http://www.spinics.net/lists/ceph-devel/msg21388.html ,
No, I set up replication between two clusters; each cluster has one zone, and both clusters are in the same region. But I got some errors.
On May 13, 2015, at 12:02 AM, Craig Lewis wrote: Are you trying to set up replication on one clus
> If you run 'rbd info --pool RBD-01 CEPH_006__01__NA__0003__ESX__ALL_EXT',
> what is the output?
size 2048 GB in 524288 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.19b1.238e1f29
format: 1
> Does 'rbd diff' work against the image (i.e. more than a few
Are you trying to set up replication on one cluster right now?
Generally replication is set up between two different clusters, each having
one zone. Both clusters are in the same region.
I can't think of a reason why two zones in one cluster wouldn't work. It's
more complicated to set up, though.
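If it helps, the usual next steps after creating the pools in a two-cluster, one-zone-each setup are along these lines (a sketch using the standard federated-gateway subcommands; the region/zone names and the --name user are placeholders):
radosgw-admin region set --infile us.json --name client.radosgw.us-east-1
radosgw-admin region default --rgw-region=us --name client.radosgw.us-east-1
radosgw-admin zone set --rgw-zone=us-east --infile us-east.json --name client.radosgw.us-east-1
radosgw-admin regionmap update --name client.radosgw.us-east-1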
In my case, I did remove all salt keys. The salt portion of my install is
working. It’s just that the calamari server is not seeing the ceph
cluster.
Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235
On 5/12/15, 1:35 AM, "Alexandre DERUMIER" wrote:
>Hi, when
We've had a hammer (0.94.1) (virtual) 3 node / 3 OSD cluster with radosgws
failing to start, failing continuously with the following error:
--8<---cut here---start->8---
2015-05-06 04:40:38.815545 7f3ef9046840 0 ceph version 0.94.1
(e4bfad3a3c51054df7e537a724c8d
Hi Christian,
Wonder why you are saying EC will write more data than replication?
Anyway, as you suggested, I will see how I can measure WA for EC vs.
replication.
Thanks & Regards
Somnath
-Original Message-
From: Christian Balzer [mailto:ch...@gol.com]
Sent: Monday, May 11, 2015 11:28
For me that's true about 1/3 the time, but often I do still have to repair the
PG after removing the affected OSD. YMMV.
>
>>
>>
>> Agree that 99+% of the inconsistent PG's I see correlate directly to disk
>> flern.
>>
>> Check /var/log/kern.log*, /var/log/messages*, etc. and I'll bet you f
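For reference, the repair step itself is just the following (a sketch of the usual sequence; the pgid is taken from the health output):
ceph health detail | grep inconsistent   # find the affected pgid(s)
ceph pg repair <pgid>                    # e.g. ceph pg repair 2.1f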
Very strange. I'll see if I can reproduce on a giant release. If you run 'rbd
info --pool RBD-01 CEPH_006__01__NA__0003__ESX__ALL_EXT', what is the
output? I want to use the same settings as your image.
Does 'rbd diff' work against the image (i.e. more than a few kilobytes of
deltas)? Al
Thanks, Mark.
I understand the specific parameters are mandatory for the S3 implementation,
but as they are not for the Swift implementation (I tested it...),
it would have been better to distinguish which parameters are mandatory
according to the implementation.
For the S3 implementation, the creation
Could I build one region using two clusters, each cluster having one zone, so
that I sync metadata and data from one cluster to the other?
I built two Ceph clusters.
For the first cluster, I do the following steps:
1. Create pools
sudo ceph osd pool create .us-east.rgw.root 64 64
sudo ceph
On 04/30/2015 07:50 PM, Gregory Farnum wrote:
On Thu, Apr 30, 2015 at 2:03 AM, Kenneth Waegeman
wrote:
So the cache is empty, but I get warning when I check the health:
health HEALTH_WARN
mds0: Client cephtst.cubone.os failing to respond to cache
pressure
Someone an idea w
Hello,
I'm not familiar with Cisco UCS gear (can you cite exact models?),
but somehow the thought of buying compute gear from Cisco makes me think of
having too much money or very steep discounts. ^o^
That said, I presume the chassis those blades are in have redundancy in
terms of PSUs (we alway
Hi,
we have some space in our two blade chassis, so I was thinking of the
pros and cons of using some blades as MONs. I thought about five MONs.
Pro: space saving in our rack
Con: "just" two blade centers. Two points of failures.
From the redundndency POV I'd go with standalone servers, but spac
I built two Ceph clusters.
For the first cluster, I do the following steps:
1. Create pools
sudo ceph osd pool create .us-east.rgw.root 64 64
sudo ceph osd pool create .us-east.rgw.control 64 64
sudo ceph osd pool create .us-east.rgw.gc 64 64
sudo ceph osd pool create .us-east.rgw.buckets 64 64
sudo c
On 05/06/15 20:28, Lionel Bouton wrote:
> Hi,
>
> On 05/06/15 20:07, Timofey Titovets wrote:
>> 2015-05-06 20:51 GMT+03:00 Lionel Bouton :
>>> Is there something that would explain why initially Btrfs creates the
>>> 4MB files with 128k extents (32 extents / file)? Is it a bad thing for
>>> perfor
Should we add a timeout to the unmap command in the RBD RA in the meantime?
> On 08 May 2015, at 15:13, Vandeir Eduardo wrote:
>
> Wouldn't a configuration option named (map|unmap)_timeout be better? Because we
> are talking about a map/unmap of an RBD device, not a mount/unmount of a file
> system.
>
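A minimal sketch of such a guard in the RA's stop action, assuming coreutils' timeout and a 30-second limit (both the limit and the device path are placeholders):
timeout 30 rbd unmap /dev/rbd/<pool>/<image>   # non-zero exit if the unmap hangs past 30s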
Robert,
thanks a lot for the feedback!
I was very worried about the same thing! Glad to know that Ceph's
automagic takes care of everything :-P
Best regards,
George
If you use ceph-disk (and I believe ceph-deploy) to create your OSDs,
or you go through the manual steps to set up the partit
Hi, when you removed salt from the nodes,
did you remove the old master key
/etc/salt/pki/minion/minion_master.pub
?
I had the same behavior as you when reinstalling the calamari server, having
previously installed salt on the ceph nodes
(with an explicit error about the key in /var/log/salt/mini
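On an affected minion, the cleanup is roughly this (a sketch; the service name may differ by distro/init system):
sudo rm /etc/salt/pki/minion/minion_master.pub
sudo service salt-minion restart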
Hi,
>> As Debian Jessie has been released for some time already, I'd like to ask: are
>> there any plans to build newer Ceph packages for it?
Yes, it's planned; I'm currently helping to create images for the build and
integration platform.
If you want, I have built ceph packages for jessie:
http://odisoweb
Hey,
as Debian Jessie has been released for some time already, I'd like to ask: are
there any plans to build newer Ceph packages for it?
On Tue, May 12, 2015 at 1:07 AM, Anthony D'Atri wrote:
>
>
> Agree that 99+% of the inconsistent PG's I see correlate directly to disk
> flern.
>
> Check /var/log/kern.log*, /var/log/messages*, etc. and I'll bet you find
> errors correlating.
>
More to this... In the case that an inconsistent P