Hi Joao,
Thanks for the thorough analysis. My initial concern is that, in some
cases, a network failure will leave a low-rank monitor able to see only
a few siblings (not enough to form a quorum) while some higher-rank
monitor can see more siblings, so I want to try to choose the one that
can see the most.
Anyone have some feedback on this? Happy to log a bug ticket if it is one, but
I want to make sure I'm not missing something related to a Luminous change.
Ashley
Sent from my iPhone
On 4 Jul 2017, at 3:30 PM, Ashley Merrick <ash...@amerrick.co.uk> wrote:
Okie, noticed there is a new command to s
On Thu, Jul 6, 2017 at 1:28 PM, Stanislav Kopp wrote:
> Hi,
>
> 2017-07-05 20:31 GMT+02:00 Ilya Dryomov :
>> On Wed, Jul 5, 2017 at 7:55 PM, Stanislav Kopp wrote:
>>> Hello,
>>>
>>> I have a problem that sometimes I can't unmap an rbd device; I get "sysfs
>>> write failed rbd: unmap failed: (16) Device or resource busy"
Hi!
I changed the partitioning scheme to use a "real" primary partition instead of
a logical volume. Ceph-deploy seems to run fine now, but the OSD does not start.
I see lots of these in the journal:
Jul 06 13:53:42 sh[9768]: 0> 2017-07-06 13:53:42.794027 7fcf9918fb80 -1 ***
Caught signal (Aborted) **
Hi,
If you're using "rbd_aio_write()" in your code, be aware that before the
Luminous release this function expects the buffer to remain unchanged until
the write op ends, while on Luminous and later it internally copies the
buffer, allocating memory where needed and freeing it once the wri
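For illustration, a minimal C sketch of the safe pattern: keep the buffer
alive and untouched until the op completes, which works on both pre- and
post-Luminous librbd (the helper name and the trimmed error handling are
ours, not part of the API):

  #include <stddef.h>
  #include <stdint.h>
  #include <rbd/librbd.h>

  /* Assumes `image` is an already-open rbd_image_t. */
  int write_and_wait(rbd_image_t image, uint64_t off,
                     const char *buf, size_t len)
  {
      rbd_completion_t comp;
      int r = rbd_aio_create_completion(NULL, NULL, &comp);
      if (r < 0)
          return r;

      r = rbd_aio_write(image, off, len, buf, comp);
      if (r < 0) {
          rbd_aio_release(comp);
          return r;
      }

      /* Do NOT modify or free `buf` here: pre-Luminous librbd may still
       * be reading from it while the op is in flight. */
      rbd_aio_wait_for_complete(comp);
      r = (int)rbd_aio_get_return_value(comp);
      rbd_aio_release(comp);
      return r;  /* only now is it safe to reuse `buf` */
  }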
On Thu, Jul 6, 2017 at 2:23 PM, Stanislav Kopp wrote:
> 2017-07-06 14:16 GMT+02:00 Ilya Dryomov :
>> On Thu, Jul 6, 2017 at 1:28 PM, Stanislav Kopp wrote:
>>> Hi,
>>>
>>> 2017-07-05 20:31 GMT+02:00 Ilya Dryomov :
On Wed, Jul 5, 2017 at 7:55 PM, Stanislav Kopp wrote:
> Hello,
>
Pre-Luminous also copies the provided buffer when using the C API --
it just copies it at a later point and not immediately. The eventual
goal is to eliminate the copy completely, but that requires some
additional plumbing work deep down within the librados messenger
layer.
On Thu, Jul 6, 2017 at
On 17-07-06 03:03 PM, Jason Dillaman wrote:
On Thu, Jul 6, 2017 at 8:26 AM, Piotr Dałek wrote:
Hi,
If you're using "rbd_aio_write()" in your code, be aware that before the
Luminous release this function expects the buffer to remain unchanged until
the write op ends, while on Luminous and later
The correct (POSIX-style) program behavior should treat the buffer as
immutable until the IO operation completes. It is never safe to assume
the buffer can be re-used while the IO is in-flight. You should not
add any logic to assume the buffer is safely copied prior to the
completion of the IO.
On
Hi Ceph Users,
We plan to add 20 storage nodes to our existing cluster of 40 nodes; each node
has 36 x 5.458 TiB drives. We plan to add the storage such that all new OSDs
are prepared, activated and ready to take data, but take none until we start
slowly increasing their weightings. We also expect thi
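For what it's worth, a hedged sketch of that approach using CRUSH
reweighting (the config option, OSD id and step values below are
illustrative, not from the original post):

  # In ceph.conf, have new OSDs come up with zero CRUSH weight:
  #   [osd]
  #   osd crush initial weight = 0
  # Then raise each new OSD's weight in small steps, letting backfill
  # settle in between (full weight for a 5.458 TiB drive is ~5.458):
  ceph osd crush reweight osd.1440 0.5
  ceph osd crush reweight osd.1440 1.0   # ...and so on up to 5.458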
Hey folks,
We have a cluster that's currently backfilling from increasing PG counts. We
have tuned recovery and backfill way down as a "precaution" and would like to
start tuning it to strike a good balance between that and client I/O.
At the moment we're in the process of bumping up PG nu
On 17-07-06 03:43 PM, Jason Dillaman wrote:
I've learned the hard way that pre-Luminous, even if it copies the buffer,
it does so too late. In my specific case, my FUSE module enters the write
call and issues rbd_aio_write there, then exits the write, expecting
the buffer provided by FUSE to
On Thu, 6 Jul 2017, Z Will wrote:
> Hi Joao,
>
> Thanks for the thorough analysis. My initial concern is that, in some
> cases, a network failure will leave a low-rank monitor able to see only
> a few siblings (not enough to form a quorum) while some higher-rank
> monitor can see more siblings, so I want
On Thu, Jul 6, 2017 at 10:22 AM, Piotr Dałek wrote:
> So I really see two problems here: lack of API docs and
> backwards-incompatible change in API behavior.
Docs are always in need of update, so any pull requests would be
greatly appreciated.
However, I disagree that the behavior has substanti
Just a quick place to start is osd_max_backfills. You have this set to 1.
Each PG is on 11 OSDs. When a PG is moving, it is on the original 11
OSDs and the X new OSDs that it is going to. For each of your
PGs that is moving, an OSD can only move one at a time (your
osd_max_backfill
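A small sketch of checking and nudging that setting at runtime (the value 2
is illustrative; raise it gradually and watch client I/O):

  # Current value on one OSD (run on that OSD's host):
  ceph daemon osd.0 config get osd_max_backfills
  # Raise it cluster-wide without restarting OSDs:
  ceph tell osd.* injectargs '--osd-max-backfills 2'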
On 17-07-06 04:40 PM, Jason Dillaman wrote:
On Thu, Jul 6, 2017 at 10:22 AM, Piotr Dałek wrote:
So I really see two problems here: lack of API docs and
backwards-incompatible change in API behavior.
Docs are always in need of update, so any pull requests would be
greatly appreciated.
However
On Thu, Jul 6, 2017 at 7:04 AM wrote:
> Hi Ceph Users,
>
> We plan to add 20 storage nodes to our existing cluster of 40 nodes; each
> node has 36 x 5.458 TiB drives. We plan to add the storage such that all
> new OSDs are prepared, activated and ready to take data, but take none
> until we start sl
On Tue, Jul 4, 2017 at 10:47 PM Eino Tuominen wrote:
> Hello,
>
> I noticed the same behaviour in our cluster.
>
> ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>
>      cluster 0a9f2d69-5905-4369-81ae-e36e4a791831
>       health HEALTH_WARN
>              1 pgs backfil
WOW!
Thanks to everybody!
Tons of suggestions and good tips!
At the moment we are already using 100Gb/s cards and have already adopted a
100Gb/s switch, so we can go with 40Gb/s cards that are fully compatible
with our switch.
About the CPU I was wrong: the model we are looking at is not the 2603 but
the 2630
> On 6 July 2017 at 18:27, Massimiliano Cuttini wrote:
>
> WOW!
>
> Thanks to everybody!
> Tons of suggestions and good tips!
>
> At the moment we are already using 100Gb/s cards and have already adopted
> a 100Gb/s switch, so we can go with 40Gb/s cards that are fully
> compatible with our SW
Hi Wido,
I came across this ancient ML entry with no responses and wanted to follow up
with you to see if you recalled any solution to this.
Copying the ceph-users list to preserve any resulting replies for the
archives.
I have a couple of boxes with 10x Micron 5100 SATA SSDs, journaled on M
Hi all,
Are there any plans to support the rbd journaling feature in krbd?
Cheers /Maged
On Thu, Jul 6, 2017 at 9:18 AM, Gregory Farnum wrote:
> On Thu, Jul 6, 2017 at 7:04 AM wrote:
>
>> Hi Ceph Users,
>>
>> We plan to add 20 storage nodes to our existing cluster of 40 nodes; each
>> node has 36 x 5.458 TiB drives. We plan to add the storage such that all
>> new OSDs are prep
Hello,
We have a bucket with 60 million+ objects in it, and we are trying to delete
it. To do so, we have tried doing:
radosgw-admin bucket list --bucket=
and then cycling through the list of object names, deleting them 1,000 at a
time. However, after ~3-4k objects are deleted, the list call
Thanks for your response David.
What you've described matches what I've been thinking about too. We have 1401
OSDs in the cluster currently, and this output is from the tail end of the
backfill for a +64 PG increase on the biggest pool.
The problem is we see this cluster do at most 20 backfills a
On Thu, Jul 6, 2017 at 11:46 AM, Piotr Dałek wrote:
> How about a hybrid solution? Keep the old rbd_aio_write contract (don't copy
> the buffer, on the assumption that it won't change) and, instead of
> constructing a bufferlist containing a bufferptr to copied data, construct a
> bufferlist containin
ceph pg dump | grep backfill
Look through the output of that command and note the acting set (OSDs the PG
is on / moving off of) and where the PG will end up. All it takes is a single
OSD listed on a PG that is currently backfilling, and any other PGs it's
listed on will be backfill+wait and have
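A slightly narrower variant of the same check, if useful (pgs_brief trims
the output to PG id, state and up/acting sets):

  ceph pg dump pgs_brief | grep backfill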
Hey,
I have some SAS Micron S630DC-400 drives which came with firmware M013, which
did the same or worse (takes very long... 100% blocked for about 5 min for
16 GB trimmed), and they work just fine with firmware M017 (4 s for 32 GB
trimmed). So maybe you just need an update.
Peter
On 07/06/17 18:39, Reed Dier
Here's my possibly unique method... I had 3 nodes with 12 disks each, and
when adding 2 more nodes I had issues with the common method you describe
(it totally blocked clients for minutes), but this worked great for me:
> my own method
> - osd max backfills = 1 and osd recovery max active = 1
> - cr
Hi Greg,
At the moment our cluster is all in balance. We have one failed drive
that will be replaced in a few days (the OSD has been removed from ceph
and will be re-added with the replacement drive). I'll document the
state of the PGs before the addition of the drive and during the
recover
I recommend you file a tracker issue at http://tracker.ceph.com/ with all
details (ceph version, steps you ran, and output, hiding anything you
don't want to post). I doubt it's a ceph-deploy issue,
but we can try to replicate it in our lab.
On Thu, Jul 6, 2017 at 5:25 AM, Martin Emrich
wrote:
>
On Thu, Jul 6, 2017 at 3:25 PM, Piotr Dałek wrote:
> Is that deep copy equivalent to what
> Jewel librbd did at an unspecified point in time, or an extra one?
It's an equivalent / a replacement -- not an additional copy. This was
changed to support the scatter/gather IO API methods which the latest
version o
There are no immediate plans to support RBD journaling in krbd.
The journaling feature requires a lot of code and, with limited
resources, the priority has been to provide alternative block device
options that pass through to librbd for such use-cases and to optimize
the performance of librbd /
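One such pass-through option that already exists is rbd-nbd, which maps an
image through librbd and the kernel nbd driver, so librbd-only features like
journaling keep working; a minimal sketch (pool/image names illustrative):

  rbd-nbd map rbd/myimage      # prints the device, e.g. /dev/nbd0
  rbd-nbd unmap /dev/nbd0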
Hi,
We are running a Ceph cluster serving both batch workloads (e.g. data import
/ export, offline processing) and latency-sensitive workloads. Currently
batch traffic causes a huge slowdown in serving latency-sensitive requests
(e.g. streaming). When that happens, the network is not the bottleneck (50
I could easily see that being the case, especially with Micron as a common
thread, but it appears that I am on the latest FW for both the SATA and the
NVMe:
> $ sudo ./msecli -L | egrep 'Device|FW'
> Device Name : /dev/sda
> FW-Rev : D0MU027
> Device Name : /dev/s
Hello,
On Thu, 6 Jul 2017 17:57:06 + george.vasilaka...@stfc.ac.uk wrote:
> Thanks for your response David.
>
> What you've described matches what I've been thinking about too. We have
> 1401 OSDs in the cluster currently, and this output is from the tail end
> of the backfill for a +64 PG
How did you even get 60M objects into the bucket...?! The stuck requests
are only likely to be impacting the PG in which the bucket index is stored.
Hopefully you are not running other pools on those OSDs?
You'll need to upgrade to Jewel to gain the --bypass-gc radosgw-admin
flag, which speeds up
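For reference, with that flag the whole-bucket delete becomes a single
command (bucket name illustrative):

  radosgw-admin bucket rm --bucket=mybucket --purge-objects --bypass-gc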
Hello,
On Thu, 6 Jul 2017 14:34:41 -0700 Su, Zhan wrote:
> Hi,
>
> We are running a Ceph cluster serving both batch workloads (e.g. data import
> / export, offline processing) and latency-sensitive workloads. Currently
> batch traffic causes a huge slowdown in serving latency-sensitive requests
Hi all,
Are there any "official" plans to have Ceph events co-hosted with OpenStack
Summit Sydney, like in Boston?
The call for presentations closes in a week. The Forum will be organised
throughout September and (I think) that is the most likely place to have
e.g. Ceph ops sessions like we have
Oops, this time plain text...
On 7 July 2017 at 13:47, Blair Bethwaite wrote:
>
> Hi all,
>
> Are there any "official" plans to have Ceph events co-hosted with OpenStack
> Summit Sydney, like in Boston?
>
> The call for presentations closes in a week. The Forum will be organised
> throughout Se
On 17-07-06 09:39 PM, Jason Dillaman wrote:
On Thu, Jul 6, 2017 at 3:25 PM, Piotr Dałek wrote:
Is that deep copy equivalent to what
Jewel librbd did at an unspecified point in time, or an extra one?
It's an equivalent / a replacement -- not an additional copy. This was
changed to support scatter/gath
After looking into this further, it seems none of the
ceph osd set-{full,nearfull,backfillfull}-ratio
commands are taking any effect on the cluster, including the backfillfull
ratio. These commands look to have been added/changed since Jewel as a
different way of setting the above. H
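For reference, the Luminous-era usage is sketched below (values are
illustrative); ceph osd dump should reflect the new ratios if they took
effect:

  ceph osd set-nearfull-ratio 0.85
  ceph osd set-backfillfull-ratio 0.90
  ceph osd set-full-ratio 0.95
  ceph osd dump | grep ratio    # shows full/backfillfull/nearfull ratios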