Why do you need to move the data between pools? My guess is that for your
needs you can add another data pool to the FS and direct data to it that
way. Then you are using the same MDS servers and the same FS. I would
probably recommend doing the copy using mounted filesystems instead of a
rados copy.
I don't understand why min_size = 2 would kill latency. Regardless
of your min_size, a write to Ceph does not ack until it completes to all
copies. That means that even with min_size = 1 the write will not be
successful until it's written to the NVMe, the SSD, and the HDD (given your
proposal).
I would run some benchmarking throughout the cluster environment to see
where your bottlenecks are before putting time and money into something
that might not be your limiting resource. Sébastien Han put together a
great guide for benchmarking your cluster here.
https://www.sebastien-han.fr/blog/
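As a rough starting point (the pool name is a placeholder), rados bench will
give you a baseline for cluster throughput before you dig into per-disk or
network tests:
$ rados bench -p testbench 60 write --no-cleanup
$ rados bench -p testbench 60 seq
$ rados -p testbench cleanup    # remove the benchmark objects when done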
This isn't a solution to fix them not starting at boot time, but a fix to
not having to reboot the node again. `ceph-disk activate-all` should go
through and start up the rest of your osds without another reboot.
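Something like this is usually enough (assuming ceph-disk is still managing
your OSDs):
$ ceph-disk list            # shows which OSD partitions are prepared but not active
$ ceph-disk activate-all    # starts any prepared-but-inactive OSDs without a reboot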
On Wed, Aug 23, 2017 at 9:36 AM Sean Purdy wrote:
> Hi,
>
> Luminous 12.1.1
>
> I'
> min_size 1
STOP THE MADNESS. Search the ML to realize why you should never use a
min_size of 1.
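Checking and fixing it is quick (pool name is just an example):
$ ceph osd pool get rbd size
$ ceph osd pool get rbd min_size
$ ceph osd pool set rbd min_size 2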
I'm curious as well as to what this sort of configuration will do for how
many copies are stored between DCs.
On Thu, Aug 24, 2017 at 1:03 PM Sinan Polat wrote:
> Hi,
>
>
>
> In a Multi Datacente
s.
On Thu, Aug 24, 2017 at 10:59 AM David Turner wrote:
> I have a RGW Multisite 10.2.7 set up for bi-directional syncing. This has
> been operational for 5 months and working fine. I recently created a new
> user on the master zone, used that user to create a bucket, and put in a
>
t this
point)?
Thank you,
David Turner
Andreas, did you find a solution to your multisite sync issues with the
stuck shards? I'm also on 10.2.7 and having this problem. One realm has
stuck shards for data sync and another realm says it's up to date, but
isn't receiving new users via metadata sync. I ran metadata sync init on
it and i
erformance on large reads, and lose more
> performance on small writes/reads (dependent on cpu speed and various other
> factors).
>
> Mark
>
> >
> > Anyway, thanks for the info!
> > Xavier.
> >
> > -Original Message-
> > From: Christian Bal
same
page. Does anyone know if running `data sync init` on a zone will overwrite
any local data that zone has which the other zone doesn't?
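For reference, the sequence I'm asking about looks roughly like this (zone
name is made up), followed by a restart of the local RGW daemons so the full
sync actually starts:
$ radosgw-admin data sync init --source-zone=us-east
$ radosgw-admin data sync status --source-zone=us-east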
On Thu, Aug 24, 2017 at 1:51 PM David Turner wrote:
> After restarting the 2 RGW daemons on the second site again, everything
> caug
Additionally, solely testing if you can write to the path could give a
false sense of security if the path is writable when the RBD is not
mounted. It would write a file to the system drive and you would see it as
successful.
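A safer probe only writes after confirming the mount is actually there; a
minimal sketch (the path is made up):
if mountpoint -q /mnt/rbd-backup; then
    touch /mnt/rbd-backup/.write_probe
else
    echo "RBD not mounted, skipping write test" >&2
fi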
On Fri, Aug 25, 2017 at 2:27 AM Adrian Saul
wrote:
> If you are monit
ble alongside a tcmu-runner
backstore: http://linux-iscsi.org/wiki/ISCSI_Extensions_for_RDMA
I'm not aware of any testing of this combination though.
Cheers, David
PM Casey Bodley wrote:
> Hi David,
>
> The 'data sync init' command won't touch any actual object data, no.
> Resetting the data sync status will just cause a zone to restart a full
> sync of the --source-zone's data changes log. This log only lists which
In your example of EC 5 + 3, your min_size is 5. As long as you have 5
hosts up, you should still be serving content. My home cluster uses 2+1 and
has 3 nodes. I can reboot any node (leaving 2 online) as long as the PGs in
the cluster are healthy. If I were to actually lose a node, I would have to
To add to Steve's success, the rbd was created in a second cluster in the
same datacenter so it didn't run the risk of deadlocking that mapping rbds
on machines running osds has. It should still work in theory on the same
cluster, but it is inherently more dangerous for a few reasons.
On Tue, Aug 29,
But it was absolutely awesome to run an osd off of an rbd after the disk
failed.
On Tue, Aug 29, 2017, 1:42 PM David Turner wrote:
> To addend Steve's success, the rbd was created in a second cluster in the
> same datacenter so it didn't run the risk of deadlocking that m
LTS release happened today, so 12.2.0 is the best thing to run
as of now.
See if any existing bugs like http://tracker.ceph.com/issues/21142 are
related.
David
On 8/29/17 8:24 AM, Tomasz Kusmierz wrote:
So nobody has any clue on this one ???
Should I go with this one to dev mailing list
ALL OSDs need to be running the same private network at the same time. ALL
clients, RGW, OSD, MON, MGR, MDS, etc, etc need to be running on the same
public network at the same time. You cannot do this as a one-at-a-time
migration to the new IP space. Even if all of the servers can still
communic
How long are you seeing these blocked requests for? Initially or
perpetually? Changing the failure domain causes all PGs to peer at the
same time. This would be the cause if it happens really quickly. There is
no way to avoid all of them peering while making a change like this. After
that, it
": "failed to sync bucket instance: (5)
Input\/output error"
65 "message": "failed to sync object"
On Tue, Aug 29, 2017 at 10:00 AM Orit Wasserman wrote:
>
> Hi David,
>
> On Mon, Aug 28, 2017 at 8:33 PM, David Turner
> wro
Jewel 10.2.7. I found a discrepancy in object counts for a multisite
configuration and it's looking like it might be orphaned multipart files
causing it. It doesn't look like this PR has received much attention. Is
there anything I can do to help you with testing/confirming a use case for
this t
drive that you're reading from/writing to in
sectors that are bad or at least slower.
On Fri, Sep 1, 2017, 6:13 AM Laszlo Budai wrote:
> Hi David,
>
> Well, most probably the larger part of our PGs will have to be
> reorganized, as we are moving from 9 hosts to 3 chassis. But I
appearing only
> during the backfill. I will try to dig deeper into the IO operations at the
> next test.
>
> Kind regards,
> Laszlo
>
>
>
> On 01.09.2017 16:08, David Turner wrote:
> > That is normal to have backfilling because the crush map did change. The
>
I am unaware of any way to accomplish having 1 pool with all 3 racks and
another pool with only 2 of them. If you could put the same osd in 2
different roots or have a crush rule choose from 2 different roots, then
this might work out. To my knowledge neither of these is possible.
What is your rea
>> peering -> remapped ->
>> active+clean) and ceph become health_ok
>>
>> ceph cluster become health_ok eventually, but in this time there was a
>> problem that rbd can not found rbd images like below.
>> # rbd ls -p volumes
>> hhvol01
>> # rbd info
d, Aug 30, 2017 at 3:55 PM Jeremy Hanmer
> wrote:
>
>> This is simply not true. We run quite a few ceph clusters with
>> rack-level layer2 domains (thus routing between racks) and everything
>> works great.
>>
>> On Wed, Aug 30, 2017 at 10:52 AM, David Turner
>
Did the journal drive fail during operation? Or was it taken out during
pre-failure. If it fully failed, then most likely you can't guarantee the
consistency of the underlying osds. In this case, you just remove the affected
osds and add them back in as new osds.
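For the replace-as-new case, the usual Jewel-era sequence is roughly this
(the OSD id is an example):
$ ceph osd out 12
$ systemctl stop ceph-osd@12
$ ceph osd crush remove osd.12
$ ceph auth del osd.12
$ ceph osd rm 12
# then wipe the disk and re-add it as a brand new OSD (e.g. ceph-disk prepare)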
In the case of having good data on th
hat is preventing the
metadata from syncing in the other realm? I have 2 realms being sync using
multi-site and it's only 1 of them that isn't getting the metadata across.
As far as I can tell it is configured identically.
On Thu, Aug 31, 2017 at 12:46 PM David Turner wrote:
> All
On filestore you would flush the journal and then, after mapping the new
journal device, use the command to create the journal. I'm sure there's
something similar for bluestore, but I don't have any experience with it
yet. Is there a new command similar to flush and create for the WAL and DB?
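For the filestore side, the sequence I mean is roughly this (OSD id and path
are examples):
$ systemctl stop ceph-osd@12
$ ceph-osd -i 12 --flush-journal
# repoint /var/lib/ceph/osd/ceph-12/journal at the new partition, then:
$ ceph-osd -i 12 --mkjournal
$ systemctl start ceph-osd@12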
On T
`ceph health detail` will give a little more information into the blocked
requests. Specifically which OSDs are the requests blocked on and how long
have they actually been blocked (as opposed to '> 32 sec'). I usually find
a pattern after watching that for a time and narrow things down to an OSD
To be fair, other times I have to go in and tweak configuration settings
and timings to resolve chronic blocked requests.
On Thu, Sep 7, 2017 at 1:32 PM David Turner wrote:
> `ceph health detail` will give a little more information into the blocked
> requests. Specifically which OSDs a
ub
wrote:
> On Thu, Sep 7, 2017 at 7:44 PM, David Turner
> wrote:
> > Ok, I've been testing, investigating, researching, etc for the last week
> and
> > I don't have any problems with data syncing. The clients on one side are
> > creating multipart objec
>
>
> 1 ops are blocked > 524.288 sec on osd.2
>
> 1 ops are blocked > 262.144 sec on osd.2
>
> 2 ops are blocked > 65.536 sec on osd.21
>
> 9 ops are blocked > 1048.58 sec on osd.5
>
> 9 ops are blocked > 524.288 sec on osd.5
>
> 71 ops are blocke
ucket both
in `mdlog list`.
On Thu, Sep 7, 2017 at 3:27 PM Yehuda Sadeh-Weinraub
wrote:
> On Thu, Sep 7, 2017 at 10:04 PM, David Turner
> wrote:
> > One realm is called public with a zonegroup called public-zg with a zone
> for
> > each datacenter. The second realm is c
I'm pretty sure I'm using the cluster admin user/keyring. Is there any
output that would be helpful? Period, zonegroup get, etc?
On Thu, Sep 7, 2017 at 4:27 PM Yehuda Sadeh-Weinraub
wrote:
> On Thu, Sep 7, 2017 at 11:02 PM, David Turner
> wrote:
> > I created a test use
I sent the output of all of the files including the logs to you. Thank you
for your help so far.
On Thu, Sep 7, 2017 at 4:48 PM Yehuda Sadeh-Weinraub
wrote:
> On Thu, Sep 7, 2017 at 11:37 PM, David Turner
> wrote:
> > I'm pretty sure I'm using the cluster admin user/
e as empty?
>
> On Wed, Sep 6, 2017 at 11:23 PM, M Ranga Swami Reddy
> wrote:
> > Thank you. Iam able to replace the dmcrypt journal successfully.
> >
> > On Sep 5, 2017 18:14, "David Turner" wrote:
> >>
> >> Did the journal drive fail during o
h .1327
.dir.default.292886573.13181.12 remove"
.dir.default.64449186.344176 has selected_object_info with "od 337cf025"
so shards have "omap_digest_mismatch_oi" except for osd 990.
The pg repair code will use osd.990 to fix the other 2 copies without
further handling.
D
6.344176 get-omaphdr
obj_header
# list every omap key on the bucket index object and print each value
$ for i in $(ceph-objectstore-tool --data-path ... --pgid 5.3d40 \
      .dir.default.64449186.344176 list-omap)
  do
      echo -n "${i}: "
      ceph-objectstore-tool --data-path ... .dir.default.292886573.13181.12 get-omap "$i"
  done
key1: val1
key2: val2
key3: val3
David
On
What do you mean by "updated crush map to 1"? Can you please provide a
copy of your crush map and `ceph osd df`?
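If it helps, a decompiled copy of the crush map is easy to produce:
$ ceph osd getcrushmap -o crush.bin
$ crushtool -d crush.bin -o crush.txt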
On Wed, Sep 13, 2017 at 6:39 AM Gonzalo Aguilar Delgado <
gagui...@aguilardelgado.com> wrote:
> Hi,
>
> I'recently updated crush map to 1 and did all relocation of the pgs. At
> the e
Did you configure your crush map to have that hierarchy of region,
datacenter, room, row, rack, and chassis? If you're using the default
crush map, then it has no idea about any of those places/locations. I
don't know what the crush map would look like after using that syntax if
the crush map did
d copy and
paste the running command (viewable in ps) to know exactly what to run in
the screens to start the daemons like this.
On Wed, Sep 13, 2017 at 6:53 PM David wrote:
> Hi All
>
> I did a Jewel -> Luminous upgrade on my dev cluster and it went very
> smoothly.
>
> I
The warning you are seeing is because those settings are out of order, and
it's showing you which ones are greater than the settings they are supposed
to stay below. backfillfull_ratio is supposed to be higher than
nearfull_ratio, and osd_failsafe_full_ratio is supposed to be higher than
full_ratio. nearfull_ratio is a
17.13 0.64 340
> 6 0.90919 1.0 931G 164G 766G 17.70 0.67 210
> TOTAL 4179G G 3067G 26.60
> MIN/MAX VAR: 0.64/2.32 STDDEV: 16.99
>
> As I said I still have OSD1 intact so I can do whatever you need except
> readding to the cluster. Since I don't
I have this issue with my NVMe OSDs, but not my HDD OSDs. I have 15 HDDs
and 2 NVMe drives in each host. We put most of the journals on one of the
NVMe drives and a few on the second, but added a small OSD partition to the
second NVMe for RGW metadata pools.
When restarting a server manually for testing,
rading the packages is causing a restart of the Ceph
daemons, it is most definitely a bug and needs to be fixed.
On Fri, Sep 15, 2017 at 4:48 PM David wrote:
> Happy to report I got everything up to Luminous, used your tip to keep the
> OSDs running, David, thanks again for that.
>
> I
S... good riddance! That's ridiculous!
On Fri, Sep 15, 2017 at 6:06 PM Vasu Kulkarni wrote:
> On Fri, Sep 15, 2017 at 2:10 PM, David Turner
> wrote:
> > I'm glad that worked for you to finish the upgrade.
> >
> > He has multiple MONs, but all of them are on n
start these (or guarantee
that it will for those folks)?
On Fri, Sep 15, 2017 at 6:49 PM Gregory Farnum wrote:
> On Fri, Sep 15, 2017 at 3:34 PM David Turner
> wrote:
>
>> I don't understand a single use case where I want updating my packages
>> using yum, apt, etc to r
I've never needed to do anything other than extend the partition and/or
filesystem when I increased the size of an RBD. Particularly if I didn't
partition the RBD I only needed to extend the filesystem.
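Roughly (image, pool, mount point, and device names are made up):
$ rbd resize --size 20480 rbd/myimage   # size is in MB, so roughly 20 GB
$ xfs_growfs /mnt/myimage               # unpartitioned XFS straight on the device
$ resize2fs /dev/rbd0                   # or this for ext4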
How are you mapping/mounting the RBD? Is it through a hypervisor
or just mapped to a
ass,
> which I don’t have) echo 1 > /sys/devices/rbd/21/refresh
>
> (I am trying to online increase the size via kvm, virtio disk in win
> 2016)
>
>
> -Original Message-
> From: David Turner [mailto:drakonst...@gmail.com]
> Sent: maandag 18 september 2017 22:42
>
Are you asking to add the osd back with its data, or add it back in as a
fresh osd? What is your `ceph status`?
On Tue, Sep 19, 2017, 5:23 AM Gonzalo Aguilar Delgado <
gagui...@aguilardelgado.com> wrote:
> Hi David,
>
> Thank you for the great explanation of the weights, I th
do <
gagui...@aguilardelgado.com> wrote:
> Hi David,
>
> What I want is to add the OSD back with its data yes. But avoiding any
> troubles that can happen from the time it was out.
>
> Is it possible? I suppose that some pg has been updated after. Will ceph
> manage it gracefully?
Starting 3 nights ago, we began seeing OSDs randomly going down in
our cluster (Jewel 10.2.7). At first I saw that each OSD that was recently
marked down in the cluster (`ceph osd dump | grep -E '^osd\.[0-9]+\s' |
sort -nrk11` sorted list of OSDs by which OSDs have been marked down in the
mo
Can you please provide the output of `ceph status`, `ceph osd tree`, and
`ceph health detail`? Thank you.
On Tue, Sep 19, 2017 at 2:59 PM Jonas Jaszkowic <
jonasjaszkowic.w...@gmail.com> wrote:
> Hi all,
>
> I have setup a Ceph cluster consisting of one monitor, 32 OSD hosts (1 OSD
> of size 320
degraded, acting
> [14,29,4,1,19,17,9,0,3,16,24,2]
> pg 3.5b is active+recovery_wait+degraded, acting
> [4,15,14,30,28,1,12,10,2,29,24,18]
> pg 3.52 is active+recovery_wait+degraded, acting
> [17,24,20,23,4,14,18,27,8,22,9,31]
> pg 3.51 is active+recovery_wait+degraded
ssible). Are there any important
> options that I have to know?
>
> What is the best practice to deal with the issue recovery speed vs.
> read/write speed during a recovery situation? Do you
> have any suggestions/references/hints how to deal with such situations?
>
>
> Am 20.09.2
admin socket to see your
currently running settings to make sure that they took effect.
http://docs.ceph.com/docs/kraken/rados/operations/monitoring/#using-the-admin-socket
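For example, on the host running the OSD (osd.0 and the option names here
are just examples):
$ ceph daemon osd.0 config show | grep osd_max_backfills
$ ceph daemon osd.0 config get osd_recovery_max_active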
On Wed, Sep 20, 2017 at 11:42 AM David Turner wrote:
> You are currently on Kraken, but if you upgrade to Luminous you
active+remapped+backfilling, 173
> active+clean; 1975 GB data, 3011 GB used, 7063 GB / 10075 GB avail;
> 30549/1376215 objects degraded (2.220%); 12201/1376215 objects misplaced
> (0.887%); 21868 kB/s, 3 objects/s recovering
>
> Is this an acceptable recovery rate? Unfortunately I hav
Correction, if the OSD had been marked down and been marked out, some of
its PGs would be in a backfill state while others would be in a recovery
state depending on how long the OSD was marked down and how much
backfilling had completed in the cluster.
On Wed, Sep 20, 2017 at 12:06 PM David
been up just long enough before it crashed to cause problems.
On Wed, Sep 20, 2017 at 1:12 PM Gonzalo Aguilar Delgado <
gagui...@aguilardelgado.com> wrote:
> Hi David,
>
> Thank you for your support. What can be the cause of
> active+clean+inconsistent still growing up? Bad
is (quite) working as described with *ceph osd out * and *ceph
> osd in *, but I am wondering
> if this produces a realistic behavior.
>
>
> Am 20.09.2017 um 18:06 schrieb David Turner :
>
> When you posted your ceph status, you only had 56 PGs degraded. Any value
> of osd_max_
You can always add the telegraf user to the ceph group. That change will
persist on reboots and allow the user running the commands to read any
folder/file that is owned by the group ceph. I do this for Zabbix and
Nagios now that the /var/lib/ceph folder is not publicly readable.
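Roughly:
$ sudo usermod -a -G ceph telegraf
# restart the telegraf service so the new group membership takes effect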
On Wed, Sep 20, 2
27;t use EC pools, but my experience with similar slow requests on
>> RGW+replicated_pools is that in the logs you need to find out the first
>> slow request and identify where it's from, for example, is it deep-scrub,
>> or some client accessing corrupted objects, disk error
The request remains blocked if you issue `ceph osd down 2`? Marking the
offending OSD as down usually clears up blocked requests for me... at least
it resets the timer on it and the requests start blocking again if the OSD
is starting to fail.
On Fri, Sep 22, 2017 at 11:51 AM Matthew Stroud
wrot
g for missing object
>
>
>
> Thanks,
>
> Matthew Stroud
>
>
>
> *From: *David Turner
> *Date: *Friday, September 22, 2017 at 9:57 AM
> *To: *Matthew Stroud , "
> ceph-users@lists.ceph.com"
> *Subject: *Re: [ceph-users] Stuck IOs
>
>
>
>
ed with the new Ceph
version.
In general RBDs are not affected by upgrades as long as you don't take down
too much of the cluster at once and are properly doing a rolling upgrade.
On Mon, Sep 25, 2017 at 8:07 AM David wrote:
> Hi Götz
>
> If you did a rolling upgrade, RBD clients should
db/wal partitions are per OSD. DB partitions need to be made as big as you
need them. If they run out of space, they will fall back to the block
device. If the DB and block are on the same device, then there's no reason
to partition them and figure out the best size. If they are on separate
dev
You can update the server with the mapped rbd and shouldn't see as much as
a blip on your VMs.
On Tue, Sep 26, 2017, 3:32 AM Götz Reinicke
wrote:
> Hi Thanks David & David,
>
> we don’t use the fuse code. And may be I was a bit unclear, but your
> feedback clears some
With “osd max scrubs” set to 1 in ceph.conf, which I believe is also
the default, at almost all times, there are 2-3 deep scrubs running.
3 simultaneous deep scrubs is enough to cause a constant stream of:
mon.ceph1 [WRN] Health check update: 69 slow requests are blocked > 32
sec (REQUEST_SLOW)
, "osd": 2,
"primary": false } ], "selected_object_info": "3:ce3f1d6a:::
mytestobject:head(47'54 osd.0.0:53 dirty|omap|data_digest|omap_digest s
143456 uv 3 dd 2ddbf8f5 od f5fba2c6 alloc_hint [0 0 0])",
"union_shard_errors": [ "data_digest_mi
You can also use ceph-fuse instead of the kernel driver to mount CephFS. It
supports all of the Luminous features.
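For example (monitor address and mount point are placeholders):
$ ceph-fuse -m mon1.example.com:6789 /mnt/cephfs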
On Wed, Sep 27, 2017, 8:46 AM Yoann Moulin wrote:
> Hello,
>
> > Try to work with the tunables:
> >
> > $ *ceph osd crush show-tunables*
> > {
> > "choose_local_tries": 0,
> >
When you lose 2 osds you have 30 osds accepting the degraded data and
performing the backfilling. When the 2 osds are added back in you only have
2 osds receiving the majority of the data from the backfilling. 2 osds
have a lot less available iops and spindle speed than the other 30 did when
they
I've reinstalled a host many times over the years. We used dmcrypt so I
made sure to back up the keys for that. Other than that it is seamless as
long as your installation process only affects the root disk. If it
affected any osd or journal disk, then you would need to mark those osds
out and re-
ooking for scrub should give you
some ideas of things to try.
On Tue, Sep 26, 2017, 2:04 PM J David wrote:
> With “osd max scrubs” set to 1 in ceph.conf, which I believe is also
> the default, at almost all times, there are 2-3 deep scrubs running.
>
> 3 simultaneous deep scrubs is enou
There are new PG states that cause health_err. In this case it is
undersized that is causing this state.
While I decided to upgrade my tunables before upgrading the rest of my
cluster, it does not seem to be a requirement. However I would recommend
upgrading them sooner than later. It will cause a
to do
> it by turning off deep scrubs, forcing individual PGs to deep scrub at
> intervals, and then enabling deep scrubs again.
> -Greg
>
>
> On Wed, Sep 27, 2017 at 6:34 AM David Turner
> wrote:
>
>> This isn't an answer, but a suggestion to try and help track it
If you're scheduling them appropriately so that no deep scrubs will happen
on their own, then you can just check the cluster status if any PGs are
deep scrubbing at all. If you're only scheduling them for specific pools,
then you can confirm which PGs are being deep scrubbed in a specific pool
wit
I'm going to assume you're dealing with your scrub errors and have a game
plan for those as you didn't mention them in your question at all.
One thing I'm always leery of when I see blocked requests happening is that
the PGs might be splitting subfolders. It is pretty much a guarantee if
you're a
The reason it is recommended not to RAID your disks is to give them all to
Ceph. When a disk fails, Ceph can generally recover faster than the RAID
can. The biggest problem with RAID is that you need to replace the disk
and rebuild the RAID ASAP. When a disk fails in Ceph, the cluster just
moves
There is no tool on the Ceph side to see which RBDs are doing what.
Generally you need to monitor the mount points for the RBDs to track that
down with iostat or something.
That said, there are some tricky things you could probably do to track down
the RBD that is doing a bunch of stuff (as long a
His dilemma sounded like he has access to the cluster, but not any of the
clients where the RBDs are used or even the hypervisors in charge of those.
On Fri, Sep 29, 2017 at 12:03 PM Maged Mokhtar wrote:
> On 2017-09-29 17:13, Matthew Stroud wrote:
>
> Is there a way I could get a performance st
I can only think of 1 type of cache tier usage that is faster if you are
using the cache tier on the same root of osds as the EC pool. That is cold
storage where the file is written initially, modified and read door the
first X hours, and then remains in cold storage for the remainder of its
life
Proofread failure. "modified and read during* the first X hours, and then
remains in cold storage for the remainder of its life with rare* reads"
On Sat, Sep 30, 2017, 1:32 PM David Turner wrote:
> I can only think of 1 type of cache tier usage that is faster if you are
> usin
I'm pretty sure that the process is the same as with filestore. The cluster
doesn't really know if an osd is filestore or bluestore... It's just an osd
running a daemon.
If there are any differences, they would be in the release notes for
Luminous as changes from Jewel.
On Sat, Sep 30, 2017, 6:28
e the cache tier. I mention that
because if it is that easy to enable/disable, then testing it should be
simple and easy to compare.
On Sat, Sep 30, 2017, 8:10 PM Chad William Seys
wrote:
> Hi David,
>Thanks for the clarification. Reminded me of some details I forgot
> to mention.
Adding more OSDs or deleting/recreating pools that have too many PGs are
your only 2 options to reduce the number of PGs per OSD. It is on the
Ceph roadmap, but is not a currently supported feature. You can
alternatively adjust the setting threshold for the warning, but it is still
a problem you
> Andrei
> --
>
> *From: *"David Turner"
> *To: *"Jack" , "ceph-users" <
> ceph-users@lists.ceph.com>
> *Sent: *Monday, 2 October, 2017 22:28:33
> *Subject: *Re: [ceph-users] decreasing number of PGs
>
> Adding more OSDs or
My guess is a networking problem. Do you have VLANs, cluster network vs
public network in ceph.conf, etc. configured? Can you ping between all
of your storage nodes on all of their IPs?
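For reference, this is the sort of split I mean in ceph.conf (subnets are
made up); every storage node needs to reach every other node on both:
[global]
    public network  = 10.0.1.0/24
    cluster network = 10.0.2.0/24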
All of your OSDs communicate with the mons on the public network, but they
communicate with each other for
Just to make sure you're not confusing redundancy with backups. Having
your data in another site does not back up your data, but makes it more
redundant. For instance if an object/file is accidentally deleted from RGW
and you're syncing those files to AWS, Google buckets, or a second RGW
cluster
You're missing most all of the important bits. What the osds in your
cluster look like, your tree, and your cache pool settings.
ceph df
ceph osd df
ceph osd tree
ceph osd pool get cephfs_cache all
You have your writeback cache on 3 nvme drives. It looks like you have
1.6TB available between them
On Fri, Oct 6, 2017, 1:05 AM Christian Balzer wrote:
>
> Hello,
>
> On Fri, 06 Oct 2017 03:30:41 + David Turner wrote:
>
> > You're missing most all of the important bits. What the osds in your
> > cluster look like, your tree, and your cache pool settings.
6, 2017 at 4:49 PM, Shawfeng Dong wrote:
>> > >> > Dear all,
>> > >> >
>> > >> > Thanks a lot for the very insightful comments/suggestions!
>> > >> >
>> > >> > There are 3 OSD servers in our pilot Ceph clust
gt; # rados -p cephfs_data ls
>
> Any advice?
>
> On Fri, Oct 6, 2017 at 9:45 AM, David Turner
> wrote:
>
>> Notice in the URL for the documentation the use of "luminous". When you
>> looked a few weeks ago, you might have been looking at the documentation
> flushing of objects to the underlying data pool. Once I killed that
> process, objects started to flush to the data pool automatically (with
> target_max_bytes & target_max_objects set); and I can force the flushing
> with 'rados -p cephfs_cache cache-flush-evict-all' as
way
from the primary NVMe copy. It wastes a copy of all of the data in the
pool, but that's on the much cheaper HDD storage and can probably be
considered acceptable losses for the sake of having the primary OSD on NVMe
drives.
On Sat, Oct 7, 2017 at 3:36 PM Peter Linder
wrote:
> On 1
Just to make sure you understand: the reads will happen on the primary
osd for the PG and not the nearest osd, meaning that reads will go between
the datacenters. Also, each write will not ack until all 3 writes
happen, adding that latency to both writes and reads.
On Sat, Oct 7, 2017, 1
all operations in the isolated DC will be
> frozen, so I believe you would not lose data.
>
>
>>
>>
>>
>> On Sat, Oct 7, 2017 at 3:36 PM Peter Linder
>> wrote:
>>
>>> On 10/7/2017 8:08 PM, David Turner wrote:
>>>
>>> Just
ming after hot swaping the device, the drive letter
> is "sdx" according to the link above what would be the right command to
> re-use the two NVME partitions for block db and wal ?
>
> I presume that everything else is the same.
> best.
>
>
> On Sat, Sep 30,
I've managed an RBD cluster that had all of the RBDs configured to 1M objects
and filled the cluster up to 75% with 4TB drives. Other than the
collection splitting (subfolder splitting as I've called it before) we
didn't have any problems with object counts.
On Wed, Oct 11, 2017 at 9:47 AM Greg
Christian is correct that min_size does not affect how many copies need to
ACK the write; it is responsible for how many copies need to be available for the
PG to be accessible. This is where SSD journals for filestore and SSD
DB/WAL partitions come into play. The write is considered ACK'd as soon as
th