On Thu, Dec 20, 2018 at 22:45, Vladimir Brik wrote:
> Hello
> I am considering using logical volumes of an NVMe drive as DB or WAL
> devices for OSDs on spinning disks.
> The documentation recommends against DB devices smaller than 4% of slow
> disk size. Our servers have 16x 10TB HDDs and a single 1.5TB NVMe, so ...
I am considering using logical volumes of an NVMe drive as DB or WAL
devices for OSDs on spinning disks.
The documentation recommends against DB devices smaller than 4% of slow
disk size. Our servers have 16x 10TB HDDs and a single 1.5TB NVMe, so
dividing it equally will result in each OSD getting ...
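A rough back-of-the-envelope check of those numbers, plus one hypothetical way
to carve the NVMe up with LVM (the device, VG and LV names below are
placeholders, not anything from the original post):

  # 4% guideline vs. what 1.5 TB split 16 ways actually gives you
  echo "recommended per-OSD DB: $(( 10000 * 4 / 100 )) GB  (4% of a 10 TB HDD)"
  echo "available per-OSD DB:   $(( 1500 / 16 )) GB   (1.5 TB NVMe / 16 OSDs)"

  # Carving the NVMe into per-OSD DB logical volumes for ceph-volume (Luminous)
  vgcreate ceph-db /dev/nvme0n1
  lvcreate -L 90G -n db-sdb ceph-db
  ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db/db-sdb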
I'm in a similar situation, currently running filestore with spinners and
journals on NVMe partitions which are about 1% of the size of the OSD. If I
migrate to bluestore, I'll still only have that 1% available. Per the docs,
if my block.db device fills up, the metadata is going to spill back onto the
slow device ...
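Once on bluestore, one way to watch whether a small block.db has spilled over
is the BlueFS counters on the OSD's admin socket (osd.0 is just an example,
and jq is assumed to be installed):

  ceph daemon osd.0 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'
  # a non-zero slow_used_bytes means BlueFS has already spilled onto the slow device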
Hi Frank,
I'm encountering exactly the same issue with the same disks as yours. Every
day, after a batch of deep-scrub operations, there are generally between 1
and 3 inconsistent PGs, and on different OSDs each time.
That could confirm a problem with these disks, but:
- it concerns only the PGs of ...
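For anyone else hitting this, the usual way to dig into one of those PGs
(1.2f is a placeholder pgid, not one from this thread):

  ceph health detail | grep inconsistent
  rados list-inconsistent-obj 1.2f --format=json-pretty
  # and only once the cause is understood:
  ceph pg repair 1.2f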
Hello,
I'm doing benchmarks for metadata operations on CephFS, HDFS, and HopsFS on
Google Cloud. In my current setup, I'm using 32 vCPU machines with 29 GB of
memory, and I have 1 MDS, 1 MON and 3 OSDs. The MDS and the MON are
co-located on one VM, while each of the OSDs is on a separate VM with ...
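Purpose-built tools (mdtest and friends) are the usual choice for this kind
of test; purely as a crude sketch of the metadata workload involved, with
/mnt/cephfs and the file count as placeholders:

  mkdir -p /mnt/cephfs/mdbench && cd /mnt/cephfs/mdbench
  time for i in $(seq 1 10000); do touch "f$i"; done             # create
  time for i in $(seq 1 10000); do stat "f$i" > /dev/null; done  # stat
  time rm -f f*                                                  # unlink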
Hi,
Same here, but also for PGs in CephFS pools.
As far as I know there is a known bug where, under memory pressure, some
reads return zeros, and this leads to the error message.
I have set nodeep-scrub and I am waiting for 12.2.11.
Thanks
Christoph
On Fri, Dec 21, 2018 at 03:23:21PM +0100, H
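For reference, that flag is set and cleared cluster-wide with:

  ceph osd set nodeep-scrub
  ceph osd unset nodeep-scrub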
Christoph, do you have any links to the bug?
On Fri, Dec 21, 2018 at 11:07 AM Christoph Adomeit <
christoph.adom...@gatworks.de> wrote:
> Hi,
>
> same here but also for pgs in cephfs pools.
>
> As far as I know there is a known bug that under memory pressure some
> reads return zero
> and this will ...
Hi Cary,
I ran across your email on the ceph-users mailing list, 'Signature check
failures'.
I've just run across the same issue on my end. Also a Gentoo user here,
running Ceph 12.2.5... 32-bit/armhf and 64-bit/x86_64.
Was your environment mixed, or strictly just x86_64?
What is interesting ...
Hi,
We have Ceph clusters which are greater than 1 PB, using the tree algorithm.
The issue is with data placement: when overall cluster utilization is at 65%,
some of the OSDs are already above 87%. We had to change the near_full ratio
to 0.90 to circumvent warnings and to get back ...
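Not from the original message, but for quantifying the skew before touching
any ratios, something along these lines (the 0.90 simply mirrors the value
used above):

  # per-OSD utilization and variance, plus the STDDEV summary line
  ceph osd df tree
  # on Luminous the nearfull warning threshold can be raised at runtime
  ceph osd set-nearfull-ratio 0.90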
Hi,
If you are running Ceph Luminous or later, use the Ceph Manager Daemon's
Balancer module. (http://docs.ceph.com/docs/luminous/mgr/balancer/).
Otherwise, tweak the OSD weights (not the OSD CRUSH weights) until you
achieve uniformity. (You should be able to get under 1 STDDEV.) I would
adjust ...
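A sketch of both approaches on a Luminous cluster (the OSD id and weight in
the last line are placeholders):

  # Balancer module (Luminous and later)
  ceph mgr module enable balancer
  ceph balancer mode crush-compat   # upmap also needs: ceph osd set-require-min-compat-client luminous
  ceph balancer on
  ceph balancer status

  # Manual alternative: adjust reweight values, not CRUSH weights
  ceph osd test-reweight-by-utilization   # dry run first
  ceph osd reweight-by-utilization
  ceph osd reweight 12 0.95               # or per OSD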
Thank You Dyweni for the quick response. We have two Hammer clusters which
are due for upgrade to Luminous next month, and one Luminous 12.2.8 cluster.
We will try this on Luminous and, if it works, apply the same once the Hammer
clusters are upgraded, rather than adjusting the weights.
Thanks,
Pardhiv Karri
On Fr
Hi,
We have a Luminous cluster which was upgraded from Hammer --> Jewel -->
Luminous 12.2.8 recently. Post-upgrade we are seeing an issue with a few
nodes where they are running out of memory and dying. In the logs we are
seeing the OOM killer. We didn't have this issue before the upgrade. The only
difference ...
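A quick way to see where the memory is actually going on an affected node
(nothing Ceph-specific is assumed beyond the ceph-osd process name):

  # per-OSD resident memory, largest first
  ps -C ceph-osd -o pid,rss,cmd --sort=-rss | head
  # total RSS of all OSDs on the node, in GB
  ps -C ceph-osd -o rss= | awk '{sum+=$1} END {printf "%.1f GB\n", sum/1024/1024}'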
Hi,
You could be running out of memory due to the default Bluestore cache
sizes.
How many disks/OSDs in the R730xd versus the R740xd? How much memory in
each server type? How many are HDD versus SSD? Are you running
Bluestore?
OSDs in Luminous, which run Bluestore, allocate memory to use ...
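For rough context (my numbers, not from the thread): BlueStore's cache
defaults are on the order of 1 GB per HDD OSD and 3 GB per SSD OSD, so 16
OSDs can account for 16-48 GB on their own. If that turns out to be the
culprit, a ceph.conf sketch like the following (example values only, the OSDs
need a restart) is one way to rein it in:

  [osd]
  bluestore_cache_size_hdd = 536870912    # 512 MB instead of the ~1 GB default
  bluestore_cache_size_ssd = 1073741824   # 1 GB instead of the ~3 GB default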
> It'll cause problems if your only NVMe drive dies - you'll lose all the DB
> partitions and all the OSDs backed by it will fail.
The severity of this depends a lot on the size of the cluster. If there are
only, say, 4 nodes total, for sure the loss of a quarter of the OSDs will be ...
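Rough blast-radius math for that example (4 nodes, 16 OSDs each, one shared
NVMe per node, all placeholder numbers):

  nodes=4; osds_per_node=16
  total=$(( nodes * osds_per_node ))
  echo "one NVMe failure kills $osds_per_node of $total OSDs ($(( 100 * osds_per_node / total ))%)"
  # i.e. the same impact as losing an entire host, which a host-level CRUSH
  # failure domain already has to be able to absorb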
Thank You for the quick response Dyweni!
We are using FileStore as this cluster is upgraded from
Hammer-->Jewel-->Luminous 12.2.8. 16x2TB HDD per node for all nodes. R730xd
has 128GB and R740xd has 96GB of RAM. Everything else is the same.
Thanks,
Pardhiv Karri
On Fri, Dec 21, 2018 at 1:43 PM Dy
Can you provide the complete OOM message from the dmesg log?
On Sat, Dec 22, 2018 at 7:53 AM Pardhiv Karri wrote:
>
>
> Thank You for the quick response Dyweni!
>
> We are using FileStore as this cluster is upgraded from
> Hammer-->Jewel-->Luminous 12.2.8. 16x2TB HDD per node for all nodes. R730xd ...
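If it helps, one way to pull the full report (the grep patterns are
approximate):

  dmesg -T | grep -i -B 5 -A 20 'out of memory'
  # or on systemd hosts
  journalctl -k | grep -i -B 5 -A 20 oom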
On Fri, Dec 14, 2018 at 6:44 PM Bryan Henderson
wrote:
> > Going back through the logs though it looks like the main reason we do a
> > 4MiB block size is so that we have a chance of reporting actual cluster
> > sizes to 32-bit systems,
>
> I believe you're talking about a different block size (t
I was informed today that the Ceph environment I’ve been working on is no
longer available. Unfortunately this happened before I could try any of your
suggestions, Roman.
Thank you for all the attention and advice.
--
Michael Green
> On Dec 20, 2018, at 08:21, Roman Penyaev wrote:
>
>> On