Hi,
I need to add a more complex crush ruleset to a cluster and was trying
to script that as I'll need to do it often.
Is there any way to create these other than manually editing the crush map?
This is to create a k=4 + m=2 across 3 rooms, with 2 parts in each room
The ruleset would be somethin
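As a sketch only (untested, and assuming a root named "default" plus "room"
and "host" bucket types - adjust to your map), the whole edit can be scripted
by round-tripping the map through crushtool:

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # append a rule along these lines to crushmap.txt
  # (older releases use "ruleset N" instead of "id N"):
  #   rule ec42_rooms {
  #       id 2
  #       type erasure
  #       min_size 6
  #       max_size 6
  #       step set_chooseleaf_tries 5
  #       step take default
  #       step choose indep 3 type room
  #       step chooseleaf indep 2 type host
  #       step emit
  #   }
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new

That picks 3 rooms and 2 hosts in each, which matches k=4 + m=2 with 2 chunks
per room.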
nt. I will do my best to share the experience in the mailing
list after...
Is there anything we should be aware of, and is there anything I could
report back to the community to test/experiment/debug the solution?
and thanks for all your help.
On Fri, Feb 8, 2019 at 5:20 PM Jason Dillaman wrote:
>
> > Nautilus:
> > >
> > > https://fosdem.org/2019/schedule/event/ceph_project_status_update/attachments/slides/3251/export/events/attachments/ceph_project_status_update/slides/3251/ceph_new_in_nautilus.pdf
> > >
> > > Kind regards,
> > > Caspar
Hi,
a recurring topic is live migration and pool type change (moving from
EC to replicated or vice versa).
When I went to the OpenStack Open Infrastructure event (aka the Summit), Sage
mentioned support for live migration of volumes (and, as a result,
of pools) in Nautilus. Is this still the case and is
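For reference, Nautilus exposes this as the "rbd migration" commands; a
minimal sketch, with illustrative pool/image names:

  rbd migration prepare ec-pool/myimage replicated-pool/myimage
  rbd migration execute replicated-pool/myimage
  rbd migration commit replicated-pool/myimage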
It may be related to http://tracker.ceph.com/issues/34307 - I have a
cluster whose OMAP size is larger than the stored data...
On Mon, Oct 22, 2018 at 11:09 AM Wido den Hollander wrote:
>
>
>
> On 8/31/18 5:31 PM, Dan van der Ster wrote:
> > So it sounds like you tried what I was going to do, and
Hi all,
I have several clusters, all running Luminous (12.2.7), providing an S3
interface. All of them have dynamic resharding enabled and it is working.
One of the newer clusters is starting to give warnings on the used
space for the OMAP directory. The default.rgw.buckets.index pool is
replicated with 3
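A way to see which bucket indexes are getting large (purely as an
illustration, assuming radosgw-admin access on one of the RGW nodes):

  radosgw-admin bucket limit check
  radosgw-admin bucket stats --bucket=<bucket-name>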
I think your objective is to move the data without anyone else
noticing. What I usually do is reduce the priority of the recovery
process as much as possible. Do note this will make the recovery take
a looong time, and will also make recovery from failures slow...
ceph tell osd.* injectargs '--osd_
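A typical throttling set, as an illustration only (exact values depend on the
cluster):

  ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1 --osd_recovery_sleep 0.1'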
Hi all,
I have a couple of very big s3 buckets that store temporary data. We
keep writing files to the buckets, which are then read and
deleted. They serve as temporary storage.
We're writing (and deleting) circa 1TB of data daily in each of those
buckets, and their size has been mostly sta
adding back in the list :)
-- Forwarded message -
From: Luis Periquito
Date: Wed, Jun 20, 2018 at 1:54 PM
Subject: Re: [ceph-users] Planning all flash cluster
To:
On Wed, Jun 20, 2018 at 1:35 PM Nick A wrote:
>
> Thank you, I was under the impression that 4GB RAM per 1
Adding more nodes from the beginning would probably be a good idea.
On Wed, Jun 20, 2018 at 12:58 PM Nick A wrote:
>
> Hello Everyone,
>
> We're planning a small cluster on a budget, and I'd like to request any
> feedback or tips.
>
> 3x Dell R720XD with:
> 2x Xeon E5-2680v2 or very similar
The
Hi,
I have a big-ish cluster that, amongst other things, has a radosgw
configured to have an EC data pool (k=12, m=4). The cluster is
currently running Jewel (10.2.7).
That pool spans 244 HDDs and has 2048 PGs.
from the df detail:
.rgw.buckets.ec 26 -N/A N/A
's currently on a P3700 400G SSD and almost full (280G). We've been
able to stop it growing at the rate it was before. The
main issue was an OSD crashing under load, making a few others
crash/OOM as well. Setting nodown has been a lifesaver.
>
>> On Wed, Apr 25, 2
Hi all,
we have a (really) big cluster that's undergoing a very bad data move, and
the monitor database is growing at an alarming rate.
The cluster is running Jewel (10.2.7); is there any way to trim the
monitor database before it gets back to HEALTH_OK?
I've searched and so far only found people saying not r
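For reference (an illustration, not a full fix - the old maps only get
trimmed once the cluster is healthy again), the monitor store can at least be
compacted:

  ceph tell mon.<id> compact
  # or, in ceph.conf on the monitors:
  [mon]
      mon compact on start = true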
On Fri, Mar 23, 2018 at 4:05 AM, Anthony D'Atri wrote:
> FYI: I/O limiting in combination with OpenStack 10/12 + Ceph doesn't work
> properly. Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1476830
>
>
> That's an OpenStack bug, nothing to do with Ceph. Nothing stops you from
> using virsh to t
On Fri, Feb 9, 2018 at 2:59 PM, Kai Wagner wrote:
> Hi and welcome,
>
>
> On 09.02.2018 15:46, ST Wong (ITSC) wrote:
>
> Hi, I'm new to CEPH and got a task to setup CEPH with kind of DR feature.
> We've 2 10Gb connected data centers in the same campus. I wonder if it's
> possible to setup a CEP
On a cursory look at the information, it seems the cluster is
overloaded with requests.
Just a guess, but if you look at IO usage on those spindles they'll be
at or around 100% usage most of the time.
If that is the case then increasing the pg_num and pgp_num won't help,
and short term, will m
"ceph versions" returned all daemons as running 12.2.1.
On Fri, Jan 12, 2018 at 8:00 AM, Janne Johansson wrote:
> Running "ceph mon versions" and "ceph osd versions" and so on as you do the
> upgrades would have helped I guess.
>
>
> 2018-01-11 17:2
since. So just
restarting all the OSDs made the problem go away.
How to check if that was the case? The OSDs now have a "class" associated.
On Wed, Jan 10, 2018 at 7:16 PM, Luis Periquito wrote:
> Hi,
>
> I'm running a cluster with 12.2.1 and adding more OSDs to it.
>
Hi,
I'm running a cluster with 12.2.1 and adding more OSDs to it.
Everything is running version 12.2.1 and require_osd is set to
luminous.
one of the pools is replicated with size 2 min_size 1, and is
seemingly blocking IO while recovering. I have no slow requests,
looking at the output of "ceph
You never said whether it was bluestore or filestore.
Can you look at the server to see which component is being stressed
(network, cpu, disk)? Utilities like atop are very handy for this.
Regarding those specific SSDs, they are particularly bad when running
for some time without trimming - performance nos
On Tue, Dec 5, 2017 at 1:20 PM, Wido den Hollander wrote:
> Hi,
>
> I haven't tried this before but I expect it to work, but I wanted to check
> before proceeding.
>
> I have a Ceph cluster which is running with manually formatted FileStore XFS
> disks, Jewel, sysvinit and Ubuntu 14.04.
>
> I wo
Hi Wido,
what are you trying to optimise? Space? Power? Are you tied to OCP?
I remember Ciara had some interesting designs like this
http://www.ciaratech.com/product.php?id_prod=539&lang=en&id_cat1=1&id_cat2=67
though I don't believe they are OCP.
I also had a look and supermicro has a few that
As that is a small cluster I hope you still don't have a lot of
instances running...
You can add "admin socket" to the client configuration part and then
read performance information via that. IIRC that prints total bytes
and IOPS, but it should be simple to read/calculate difference. This
will ge
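A minimal sketch of the admin socket approach, assuming the clients can write
to /var/run/ceph (paths are illustrative):

  [client]
      admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok

  # then, on the client host:
  ceph --admin-daemon /var/run/ceph/ceph-client.admin.1234.5678.asok perf dump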
There are a few things I don't like about your machines... If you want
latency/IOPS (as you seemingly do) you really want the highest frequency
CPUs, even over the number of cores. These are not too bad, but not great
either.
Also, you have 2x CPUs, meaning NUMA. Have you pinned the OSDs to NUMA nodes?
Ideal
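One way to do the pinning, purely as an illustration (a systemd drop-in that
wraps the stock ExecStart with numactl; node numbers made up):

  # /etc/systemd/system/ceph-osd@.service.d/numa.conf
  [Service]
  ExecStart=
  ExecStart=/usr/bin/numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph

Ideally each OSD gets bound to the NUMA node its HBA and NIC hang off.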
Not looking at anything else, you didn't set the max_bytes or
max_objects for it to start flushing...
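As an illustration (pool name and numbers made up), the settings in question
are:

  ceph osd pool set cachepool target_max_bytes 1000000000000
  ceph osd pool set cachepool target_max_objects 1000000
  ceph osd pool set cachepool cache_target_dirty_ratio 0.4
  ceph osd pool set cachepool cache_target_full_ratio 0.8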
On Fri, Oct 6, 2017 at 4:49 PM, Shawfeng Dong wrote:
> Dear all,
>
> Thanks a lot for the very insightful comments/suggestions!
>
> There are 3 OSD servers in our pilot Ceph cluster, each with 2x
On Fri, Sep 29, 2017 at 9:44 AM, Adrian Saul
wrote:
>
> Do you mean that after you delete and remove the crush and auth entries for
> the OSD, when you go to create another OSD later it will re-use the previous
> OSD ID that you have destroyed in the past?
>
The issue is that it has been giving th
Hi all,
I use puppet to deploy and manage my clusters.
Recently, as I have been doing a removal of old hardware and adding of
new I've noticed that sometimes the "ceph osd create" is returning
repeated IDs. Usually it's on the same server, but yesterday I saw it
in different servers.
I was expec
On Fri, Sep 22, 2017 at 9:49 AM, Dietmar Rieder
wrote:
> Hmm...
>
> not sure what happens if you loose 2 disks in 2 different rooms, isn't
> there is a risk that you loose data ?
yes, and that's why I don't really like the profile...
Hi all,
I've been trying to think about what the best erasure code profile would be,
but I don't really like the one I came up with...
I have 3 rooms that are part of the same cluster, and I need to design
so we can lose any one of the 3.
As this is a backup cluster I was thinking on doing a k=2 m=1 co
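A sketch of that profile, purely as an illustration (names made up; on
pre-Luminous releases the parameter is ruleset-failure-domain instead of
crush-failure-domain):

  ceph osd erasure-code-profile set backup21 k=2 m=1 crush-failure-domain=room
  ceph osd pool create backup.data 1024 1024 erasure backup21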
What's your "osd crush update on start" option?
Further information can be found at:
http://docs.ceph.com/docs/master/rados/operations/crush-map/
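For illustration (host/rack names made up), the relevant bit of ceph.conf
would look something like:

  [osd]
      osd crush update on start = false
      # or pin the location explicitly instead:
      osd crush location = "host=node01 rack=rack1 root=default"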
On Wed, Sep 13, 2017 at 4:38 PM, German Anders wrote:
> Hi cephers,
>
> I'm having an issue with a newly created cluster 12.2.0
> (32ce2a3ae5239ee33d61507
Not going over the obvious - that the crush map just doesn't look
correct or even sane... or that the policy itself doesn't sound very
sane - but I'm sure you'll understand the caveats and issues it may
present...
what's most probably happening is that a (or several) pool is using
those same OSD
I'm running OpenStack and using Ceph as a backend.
As all the tutorials advocate I use the show_image_direct_url option. This
creates the new volume with a pointer to original image.
As we defined everything with availability zones, we have one pool that's
HA in all of the zones for images - if a
Hi Dan,
I've enabled it in a couple of big-ish clusters and had the same
experience - a few seconds disruption caused by a peering process
being triggered, like any other crushmap update does. Can't remember
if it triggered data movement, but I have a feeling it did...
On Mon, Jul 10, 2017 at 3
> Keep in mind that 1.6TB P4600 is going to last about as long as your 400GB
> P3700, so if wear-out is a concern, don't put more stress on them.
>
I've been looking at the 2T ones, but it's about the same as the 400G P3700
> Also the P4600 is only slightly faster in writes than the P3700, so tha
he Micron 5100 MAX are finally shipping in volume to offer a
> replacement to Intel S3610, though no good replacement for the S3710 yet
> that I’ve seen on the endurance part.
>
> Reed
>
> On May 17, 2017, at 5:44 AM, Luis Periquito wrote:
>
> Anyway, in a couple months we
>> Anyway, in a couple months we'll start testing the Optane drives. They
>> are small and perhaps ideal journals, or?
>>
The problem with optanes is price: from what I've seen they cost 2x or
3x as much as the P3700...
But at least from what I've read they do look really great...
TL;DR: add the OSDs and then split the PGs
They are different commands for different situations...
Changing the weight is for when you have a bigger number of nodes/devices.
Depending on the size of the cluster, the size of the devices, how busy it
is and by how much you're growing it, there will be some different
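As an illustration (pool name and PG counts made up), the order would be:
add and weight in the new OSDs, wait for the cluster to settle, then split:

  ceph osd pool set mypool pg_num 2048
  ceph osd pool set mypool pgp_num 2048

Splitting in smaller increments keeps the impact down.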
One of the things I've noticed in the latest (3+ years) batch of CPUs
is that they increasingly ignore the CPU frequency scaling drivers and do
what they want. More than that, interfaces like /proc/cpuinfo are completely
incorrect.
I keep checking the real frequencies using applications like the
"i7z", and it sho
AM, Luis Periquito wrote:
>>> Right. The tool isn't removing objects (yet), because we wanted to
>>> have more confidence in the tool before having it automatically
>>> deleting all the found objects. The process currently is to manually
>>> move these obje
> Right. The tool isn't removing objects (yet), because we wanted to
> have more confidence in the tool before having it automatically
> deleting all the found objects. The process currently is to manually
> move these objects to a different backup pool (via rados cp, rados
> rm), then when you're
I have a cluster that has been leaking objects in radosgw and I've
upgraded it to 10.2.6.
After that I ran
radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphans
which found a bunch of objects. And ran
radosgw-admin orphans finish --pool=default.rgw.buckets.data --job-id=orph
Hi All,
I've just ran an upgrade in our test cluster, going from 10.2.3 to
10.2.6 and got the wonderful "failed to encode map with expected crc"
message.
Wasn't this supposed to only happen from pre-jewel to jewel?
should I be looking at something else?
thanks
I've run through many upgrades without anyone noticing, including in
very busy openstack environments.
As a rule of thumb you should upgrade MONs, OSDs, MDSs and RadosGWs in
that order, however you should always read the upgrade instructions on
the release notes page
(http://docs.ceph.com/docs/mas
Hi,
I have a cluster with RGW in which one bucket is really big, so every
so often we delete stuff from it.
That bucket is now taking 3.3T after we deleted just over 1T from it.
That was done last week.
The pool (.rgw.buckets) is using 5.1T, and before the deletion was
taking almost 6T.
How can
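For reference (an illustration only - deleted RGW objects are reclaimed
asynchronously by the garbage collector), the backlog can be inspected and
drained with:

  radosgw-admin gc list --include-all | head
  radosgw-admin gc process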
Without knowing the cluster architecture it's hard to know exactly
what may be happening. And you sent no information on your cluster...
How is the cluster hardware? Where are the journals? How busy are the
disks (% time busy)? What is the pool size? Are these replicated or EC
pools?
On Mon, No
Without knowing the cluster architecture it's hard to know exactly what may
be happening.
How is the cluster hardware? Where are the journals? How busy are the disks
(% time busy)? What is the pool size? Are these replicated or EC pools?
Have you tried tuning the deep-scrub processes? Have you tr
I have a pool where every time I try to change its crush_ruleset it
crashes 2 out of my 3 mons, and it's always the same ones. I've tried
leaving the first one down and it crashes the second.
It's a replicated pool, and I have other pools that look exactly the same.
I've deep-scrub'ed all the PG's to m
I was upgrading a really old cluster from Infernalis (9.2.1) to Jewel
(10.2.3) and got some weird, but interesting issues. This cluster
started its life with Bobtail -> Dumpling -> Emperor -> Firefly ->
Giant -> Hammer -> Infernalis and now Jewel.
When I upgraded the first MON (out of 3) everythin
Hi all,
I was being asked if CEPH supports the Storage Management Initiative
Specification (SMI-S)? This for the context of monitoring our ceph
clusters/environments.
I've tried looking and find no references to supporting it. But does it?
thanks,
Changing the number of PGs is one of the most expensive operations you
can run, and should be avoided as much as possible.
Having said that you should try to avoid having way too many PGs with
very few OSDs, but it's certainly preferable to splitting PGs...
On Wed, Aug 3, 2016 at 1:15 PM, Maged M
Hi Jaroslaw,
several things spring to mind. I'm assuming the cluster is
healthy (other than the slow requests), right?
From the (little) information you send it seems the pools are
replicated with size 3, is that correct?
Are there any long running delete processes? They usually have a
Thanks for sharing Wido.
From your information you only talk about MON and OSD. What about the
RGW nodes? You stated in the beginning that 99% is rgw...
On Wed, Jul 13, 2016 at 3:56 PM, Wido den Hollander wrote:
> Hello,
>
> The last 3 days I worked at a customer with a 1800 OSD cluster which h
Hi all,
I have (some) ceph clusters running hammer and they are serving S3 data.
There are a few radosgw serving requests, in a load balanced form
(actually OSPF anycast IPs).
Usually upgrades go smoothly whereby I upgrade a node at a time, and
traffic just gets redirected around the nodes that a
>> I have created an Erasure Coded pool and would like to change the K
>> and M of it. Is there any way to do it without destroying the pool?
>>
> No.
>
> http://docs.ceph.com/docs/master/rados/operations/erasure-code/
>
> "Choosing the right profile is important because it cannot be modified
> aft
Hi all,
I have created an Erasure Coded pool and would like to change the K
and M of it. Is there any way to do it without destroying the pool?
The cluster doesn't have much IO, but the pool (rgw data) has just
over 10T, and I didn't want to lose it.
thanks,
> OTOH, running ceph on dynamically routed networks will put your routing
> daemon (e.g. bird) in a SPOF position...
>
I run a somewhat large estate with either BGP or OSPF attachment; not
only is ceph happy with either of them, I have never had issues with
the routing daemons (after setting them
Nick,
TL;DR: works brilliantly :)
Where I work we have all of the ceph nodes (and a lot of other stuff) using
OSPF and BGP server attachment. With that we're able to implement solutions
like Anycast addresses, removing the need to add load balancers, for the
radosgw solution.
The biggest issues
It may be possible to do it with civetweb, but I use apache because of
HTTPS config.
On Tue, May 24, 2016 at 5:49 AM, fridifree wrote:
> What apache gives that civetweb not?
> Thank you
>
> On May 23, 2016 11:49 AM, "Anand Bhat" wrote:
>>
>> For performance, civetweb is better as fastcgi module
Hi,
On Mon, May 23, 2016 at 8:24 PM, Anthony D'Atri wrote:
>
>
> Re:
>
>> 2. Inefficient chown documentation - The documentation states that one
>> should "chown -R ceph:ceph /var/lib/ceph" if one is looking to have ceph-osd
>> ran as user ceph and not as root. Now, this command would run a ch
>
> Thanks & regards
> Somnath
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Luis
> Periquito
> Sent: Monday, May 23, 2016 7:30 AM
> To: Ceph Users
> Subject: [ceph-users] using jemalloc in trusty
>
> I
I've been running some tests with jewel, and wanted to enable jemalloc.
I noticed that the new jewel release now loads properly
/etc/default/ceph and has an option to use jemalloc.
I've installed jemalloc, enabled the LD_PRELOAD option, however doing
some tests it seems that it's still using tcmal
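A quick way to confirm which allocator a running daemon actually mapped
(library path and the /etc/default/ceph variable are illustrative for trusty;
the exact variable name may differ):

  # in /etc/default/ceph
  LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1

  # after restarting the daemon:
  grep -c jemalloc /proc/$(pidof ceph-osd | awk '{print $1}')/maps
  grep -c tcmalloc /proc/$(pidof ceph-osd | awk '{print $1}')/maps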
I've upgraded our test cluster from 9.2.1 to 10.2.1 and I still had
these issues. As before the script did fix the issue and the cluster
is now working.
Is the correct fix in 10.2.1 or was it still expected to run the fix?
If it makes a difference I'm running trusty, the cluster was created
on ha
You want to enable the "show_image_direct_url = True" option.
Full configuration information can be found at:
http://docs.ceph.com/docs/master/rbd/rbd-openstack/
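For illustration, the corresponding bit of glance-api.conf:

  [DEFAULT]
  show_image_direct_url = True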
On Thu, Mar 31, 2016 at 10:49 PM, Mario Codeniera
wrote:
> Hi,
>
> Is there anyone done thin provisioning on OpenStack instances (virtual
I have a cluster spread across 2 racks, with a crush rule that splits
data across those racks.
To test a failure scenario we powered off one of the racks, and
expected ceph to continuing running. Of the 56 OSDs that were powered
off 52 were quickly set as down in the cluster (it took around 30
sec
you should really fix the peering objects.
So far what I've seen in ceph is that it prefers data integrity over
availability. So if it thinks that it can't keep all working properly
it tends to stop (i.e. blocked requests), thus I don't believe there's
a way to do this.
On Fri, Mar 4, 2016 at 1:0
On Wed, Mar 2, 2016 at 9:32 AM, Mihai Gheorghe wrote:
> Hi,
>
> I've got two questions!
>
> First. We are currently running Hammer in production. You are thinking of
> upgrading to Infernalis. Should we upgrade now or wait for the next LTS,
> Jewel? On ceph releases i can see Hammers EOL is estima
On Mon, Feb 29, 2016 at 11:20 PM, Robin H. Johnson wrote:
> On Mon, Feb 29, 2016 at 04:58:07PM +0000, Luis Periquito wrote:
>> Hi all,
>>
>> I have a biggish ceph environment and currently creating a bucket in
>> radosgw can take as long as 20s.
>>
>> Wha
Hi all,
I have a biggish ceph environment and currently creating a bucket in
radosgw can take as long as 20s.
What affects the time a bucket takes to be created? How can I improve that time?
I've tried to create in several "bucket-location" with different
backing pools (some of them empty) and t
The only way I can think of doing that is creating a new crush rule that
selects that specific OSD with min_size = max_size = 1, then creating a pool
with size = 1 and using that crush rule.
Then you can use that pool as you'd use any other pool.
I haven't tested it, however it should work.
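An untested sketch of what I mean (OSD id, rule id and pool name made up;
Luminous and later use "id" where older releases use "ruleset"):

  rule single_osd {
      ruleset 10
      type replicated
      min_size 1
      max_size 1
      step take osd.12
      step emit
  }

  ceph osd pool create pinned 64 64 replicated single_osd
  ceph osd pool set pinned size 1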
On Thu, Oct 29, 20
There are several routes you can follow for this work. The best one
will depend on cluster size, current data, pool definition (size),
performance expectations, etc.
They range from doing dist-upgrade a node at a time, to
remove-upgrade-then-add nodes to the cluster. But knowing that ceph is
"self
>
> On 10/20/2015 08:41 AM, Robert LeBlanc wrote:
>>
>> Given enough load, that fast Jornal will get filled and you will only be
>> as fast as the back disk can flush (and at the same time service reads).
>> That the the situation we are in right now. We are still seeing better
>> performance than
>> One trick I've been using in my ceph clusters is hiding a slow write
>> backend behind a fast journal device. The write performance will be of
>> the fast (and small) journal device. This only helps on write, but it
>> can make a huge difference.
>>
>
> Do you mean an external filesystem journal
On Tue, Oct 20, 2015 at 3:26 AM, Haomai Wang wrote:
> The fact is that journal could help a lot for rbd use cases,
> especially for small ios. I don' t think it will be bottleneck. If we
> just want to reduce double write, it doesn't solve any performance
> problem.
>
One trick I've been using in
Hi,
I was looking for some ceph resources and saw a reference to
planet.ceph.com. However when I opened it I was sent to a dental
clinic (?). That doesn't sound right, does it?
I was at this page when I saw the reference...
thanks
I've been trying to find a way to limit the number of requests a user
can make to the radosgw per unit of time - the first thing developers
here did was fire parallel queries at the radosgw as fast as possible,
making it very slow.
I've looked into quotas, but they only refer to space, objects and buckets.
I
Oct 13, 2015 at 11:56 AM, Luis Periquito wrote:
> It seems I've hit this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1231630
>
> is there any way I can recover this cluster? It worked in our test
> cluster, but crashed the production one...
It seems I've hit this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1231630
is there any way I can recover this cluster? It worked in our test
cluster, but crashed the production one...
I'm having some issues downloading a big file (60G+).
After some investigation it seems to be very similar to
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001272.html,
however I'm currently running Hammer 0.94.3. The files were
uploaded when the cluster was running Firefly
Hi All,
I was listening to the ceph talk about radosgw where Yehuda talks about storage
policies. I started looking for it in the documentation, on how to
implement/use it, and couldn't find much information:
http://docs.ceph.com/docs/master/radosgw/s3/ says it doesn't currently
support it, and http://docs.ceph.c
ork I would be most obliged
>
> -Original Message-
> From: Shinobu Kinjo [mailto:ski...@redhat.com]
> Sent: 25 September 2015 13:31
> To: Luis Periquito
> Cc: Abhishek L; Robert Duncan; ceph-users
> Subject: Re: [ceph-users] radosgw and keystone version 3 domains
&
I'm having the exact same issue, and after looking into it, it seems that
radosgw is hardcoded to authenticate using the v2 API.
from the config file: rgw keystone url = http://openstackcontrol.lab:35357/
the "/v2.0/" is hardcoded and gets appended to the authentication request.
a snippet taken from radosgw
I'm in the process of adding more resources to an existing cluster.
I'll have 38 hosts, with 2 HDD each, for an EC pool. I plan on adding a
cache pool in front of it (is it worth it? S3 data, mostly writes and
objects are usually 200kB upwards to several MB/GB...); all of the hosts
are on the same
I've seen one misbehaving OSD stopping all the IO in a cluster... I've had
a situation where everything seemed fine with the OSD/node but the cluster
was grinding to a halt. There was no iowait, disk wasn't very busy, wasn't
doing recoveries, was up+in, no scrubs... Restart the OSD and everything
r
ld not be
> restarted possibly due to library mismatch.
>
> Do you know whether the self-healing feature of ceph is applicable between
> different versions or not?
>
>
>
> Fangzhe
>
>
>
> *From:* Luis Periquito [mailto:periqu...@gmail.com]
> *Sent:* Wednesda
I would say the easiest way would be to leverage all the self-healing of
ceph: add the new nodes to the old cluster, allow or force all the data to
migrate between nodes, and then remove the old ones.
Well to be fair you could probably just install radosgw on another node and
use it as your ga
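As an illustration of the drain-then-remove part (OSD id made up):

  ceph osd crush reweight osd.42 0
  # wait until all PGs are active+clean again, then:
  ceph osd out 42
  ceph osd crush remove osd.42
  ceph auth del osd.42
  ceph osd rm 42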
On Mon, Feb 23, 2015 at 10:18 PM, Yehuda Sadeh-Weinraub
wrote:
>
>
> --
>
> *From: *"Shinji Nakamoto"
> *To: *ceph-us...@ceph.com
> *Sent: *Friday, February 20, 2015 3:58:39 PM
> *Subject: *[ceph-users] RadosGW - multiple dns names
>
> We have multiple interfaces on o
When I create a new user using radosgw-admin most of the time the secret
key gets escaped with a backslash, making it not work. Something like
"secret_key": "xx\/\/".
Why would the "/" need to be escaped? Why is it printing the "\/" instead
of "/" that does
I've been meaning to write an email with the experience we had at the
company I work. For the lack of a more complete one I'll just tell some of
the findings. Please note these are my experiences, and are correct for my
environment. The clients are running on openstack, and all servers are
trusty.
How big are those OPS? Are they random? How many nodes? How many SSDs/OSDs?
What are you using to make the tests? Using atop on the OSD nodes where is
your bottleneck?
On Mon, Aug 17, 2015 at 1:05 PM, Межов Игорь Александрович wrote:
> Hi!
>
> We also observe the same behavior on our test Hammer
I don't understand your question? You created a 1G RBD/disk and it's full.
You are able to grow it though - but that's a Linux management issue, not
ceph.
As everything is thin-provisioned you can create an RBD with an arbitrary
size - I've created one with 1PB when the cluster only had 600G/Raw
ava
yes. The issue is resource sharing as usual: the MONs will use disk I/O,
memory and CPU. If the cluster is small (test?) then there's no problem in
using the same disks. If the cluster starts to get bigger you may want to
dedicate resources (e.g. the disk for the MONs isn't used by an OSD). If
the
"something" persistent for this monitor.
>
> I do not have that much more useful to contribute to this discussion,
> since I've more-or-less destroyed any evidence by re-building the monitor.
>
> Cheers,
> KJ
>
> On Fri, Jul 24, 2015 at 1:55 PM, Luis Periquito
> w
The leveldb is smallish: around 70mb.
I ran debug mon = 10 for a while, but couldn't find any interesting
information. I would run out of space quite quickly though as the log
partition only has 10g.
On 24 Jul 2015 21:13, "Mark Nelson" wrote:
> On 07/24/2015 02:31 PM, Lu
lib/x86_64-linux-gnu/libjemalloc.so.1 ceph-osd …
>
> The last time we tried it segfaulted after a few minutes, so YMMV and be
> careful.
>
> Jan
>
> On 23 Jul 2015, at 18:18, Luis Periquito wrote:
>
> Hi Greg,
>
> I've been looking at the tcmalloc issues, but d
with different malloc binaries?
On Thu, Jul 23, 2015 at 10:50 AM, Gregory Farnum wrote:
> On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito
> wrote:
> > The ceph-mon is already taking a lot of memory, and I ran a heap stats
> >
>
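For reference, the heap commands referred to are along these lines (monitor
name illustrative):

  ceph tell mon.ceph-mon01 heap stats
  ceph tell mon.ceph-mon01 heap release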
as free...
On Wed, Jul 22, 2015 at 4:29 PM, Luis Periquito wrote:
> This cluster is server RBD storage for openstack, and today all the I/O
> was just stopped.
> After looking in the boxes ceph-mon was using 17G ram - and this was on
> *all* the mons. Restarting the main one just
? Any ideas on how to identify the
underlying issue?
thanks,
On Mon, Jul 20, 2015 at 1:59 PM, Luis Periquito wrote:
> Hi all,
>
> I have a cluster with 28 nodes (all physical, 4Cores, 32GB Ram), each node
> has 4 OSDs for a total of 112 OSDs. Each OSD has 106 PGs (counted including
>
Hi all,
I have a cluster with 28 nodes (all physical, 4Cores, 32GB Ram), each node
has 4 OSDs for a total of 112 OSDs. Each OSD has 106 PGs (counted including
replication). There are 3 MONs on this cluster.
I'm running on Ubuntu trusty with kernel 3.13.0-52-generic, with Hammer
(0.94.2).
This clu
Hi all,
I've seen several chats about the monitor elections, and how the one with the
lowest IP is always the master.
Is there any way to change or influence this behaviour, other than changing
the IP of the monitors themselves?
thanks