Hi,
if you use InfluxDB / Telegraf: a new input plugin for Ceph metrics has been
pushed to the Telegraf git repository:
https://github.com/influxdata/telegraf/pull/1172/commits/d9c8e11a078158a7f6bc7284abd5f453c2f0308a
I'm going to put together some nice Grafana dashboards.
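For anyone who wants to try it right away, enabling the plugin should be
roughly a matter of adding an [[inputs.ceph]] section to telegraf.conf; the
option below is a commonly used default written from memory, so check the
sample config that ships with the plugin:

    [[inputs.ceph]]
      # directory containing the Ceph admin sockets (assumed default)
      socket_dir = "/var/run/ceph"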
Regards,
Alexandre
Hi All,
my cluster runs Jewel. Something often goes wrong when mounting CephFS, but no
error message can be found in the logs.
Here is some info:
[root@ceph2 ~]# mount /mnt/cephfs_stor/
mount error 5 = Input/output error
[root@ceph2 ~]# cat /etc/fstab | grep 6789
ceph2:6789:
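For comparison, a complete kernel-client fstab entry usually looks something
like the line below (the mount point, user name and secret file are
placeholders for whatever your setup uses):

    ceph2:6789:/  /mnt/cephfs_stor  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime  0 2

or, tested by hand:

    mount -t ceph ceph2:6789:/ /mnt/cephfs_stor -o name=admin,secretfile=/etc/ceph/admin.secret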
I've upgraded our test cluster from 9.2.1 to 10.2.1 and I still had
these issues. As before, the script fixed the issue and the cluster
is now working.
Is the proper fix included in 10.2.1, or is it still expected that the script
be run?
If it makes a difference, I'm running trusty; the cluster was created
on ha
Hello,
On 05/19/2016 03:36 AM, Christian Balzer wrote:
>
> Hello again,
>
> On Wed, 18 May 2016 15:32:50 +0200 Dietmar Rieder wrote:
>
>> Hello Christian,
>>
>>> Hello,
>>>
>>> On Wed, 18 May 2016 13:57:59 +0200 Dietmar Rieder wrote:
>>>
Dear Ceph users,
I've a question regarding
Hi
I am trying to install Ceph with the ceph-ansible role:
https://github.com/shieldwed/ceph-ansible.
I had to fix some Ansible tasks to work correctly with Ansible 2.0.2.0, but now
it seems to work quite well.
Sadly, I have now come across a bug that I cannot solve myself:
When ansible is starting
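(For anyone trying to reproduce: the usual way to run ceph-ansible is roughly
the following, assuming the conventional copy of site.yml.sample and an
inventory file named hosts; adjust to your checkout.)

    cp site.yml.sample site.yml
    ansible-playbook -i hosts site.yml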
Hi Ken,
dd is OK, but you should consider the fact that dd produces a purely
sequential stream of writes.
So if you have random writes in your later production usage, then this
test is basically only good for measuring the maximum sequential write
performance of an idle cluster.
And 250 MB for 200 HDDs is quite bad as
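If you want a feel for random-write behaviour as well, fio is the usual tool;
a rough example follows (block size, runtime, queue depth and target file are
arbitrary choices here, adjust them to your workload):

    fio --name=randwrite --filename=/mnt/test/fio.dat --size=4G \
        --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
        --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting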
Oliver,
Thanks for the info.
We then ran sysbench for random I/O testing; the result is even worse
(757 KB/s).
Each object has 3 replicas.
Both networks are 10 Gbps, so I don't think the network is the issue.
Maybe the lack of an SSD cache and an incorrect cluster configuration are
the reason.
Hi Ken,
wow, that's quite bad. It means you cannot use the cluster like that.
What does your ceph.conf look like?
What does ceph -s show?
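Besides ceph.conf and ceph -s, the following standard commands usually give a
good first picture of how the cluster is laid out (nothing here is specific to
your setup):

    ceph -s
    ceph osd tree
    ceph df
    ceph osd dump | grep pool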
--
Mit freundlichen Gruessen / Best regards
Oliver Dzombic
IP-Interactive
mailto:i...@ip-interactive.de
Address:
IP Interactive UG ( haftungsbeschrae
Hi,
a spare disk is a nice idea.
But I think that's something you can also do with a shell script:
check whether an OSD is down or out, and then just bring in your spare disk.
Maybe the programming resources should not be spent on something most
of us can do with a simple shell script checking every 5 secon
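A minimal sketch of such a watcher (the log path and the actual "bring in the
spare" step are placeholders; it only assumes the ceph CLI and a keyring that
may run 'ceph osd dump'):

    #!/bin/bash
    # flag OSDs that are reported down or out so the spare-disk handling can kick in
    while true; do
        bad_osds=$(ceph osd dump | awk '/^osd\./ && (/ down / || / out /) {print $1}')
        if [ -n "$bad_osds" ]; then
            echo "$(date): OSDs needing attention: $bad_osds" >> /var/log/spare-osd-watch.log
            # ...replace the affected OSD with the spare disk here...
        fi
        sleep 5
    done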
Hello,
On Thu, 19 May 2016 10:51:20 +0200 Dietmar Rieder wrote:
> Hello,
>
> On 05/19/2016 03:36 AM, Christian Balzer wrote:
> >
> > Hello again,
> >
> > On Wed, 18 May 2016 15:32:50 +0200 Dietmar Rieder wrote:
> >
> >> Hello Christian,
> >>
> >>> Hello,
> >>>
> >>> On Wed, 18 May 2016 13:57
FWIW, we ran tests back in the dumpling era that more or less showed the
same thing. Increasing the merge/split thresholds does help. We
suspect it's primarily due to the PG splitting being spread out over a
longer period of time so the effect lessens. We're looking at some
options to introd
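For anyone who wants to experiment with that, these are the two filestore
options in question; the values below are only an illustration, not a
recommendation:

    [osd]
    # let directories grow larger before splitting, and merge them back later
    filestore merge threshold = 40
    filestore split multiple = 8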
Hello.
I'm curious how to get maximum performance without losing significant
space. Is putting an OSD and its journal on the same SSD a good solution, or is
it better to use a separate SSD as the journal for a few other SSD-based OSDs?
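(As a concrete illustration of the second layout: with the Jewel-era tooling
you would typically prepare a data SSD with its journal on a separate, shared
SSD along these lines; the device names are placeholders.)

    # /dev/sdb = OSD data device, /dev/sdc = shared journal SSD
    ceph-disk prepare /dev/sdb /dev/sdc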
Thanks.
Hi Somnath,
Somnath Roy writes:
> FileStore doesn't subscribe for any such event from the device. Presently, it
> is relying on filesystem (for the FileStore assert) to return back error
> during IO and based on the error it is giving an assert.
> FileJournal assert you are getting in the aio
On Thu, May 19, 2016 at 3:35 PM, 易明 wrote:
> Hi All,
>
> my cluster runs Jewel. Something often goes wrong when mounting CephFS, but
> no error message can be found in the logs.
>
>
> Here is some info:
> [root@ceph2 ~]# mount /mnt/cephfs_stor/
> mount error 5 = Input/output err
Hi Christian,
Thanks for your insights. To answer your question the NVMe devices
appear to be some variety of Samsung:
Model: Dell Express Flash NVMe 400GB
Manufacturer: SAMSUNG
Product ID: a820
regards,
Ben
On Wed, May 18, 2016 at 10:01 PM, Christian Balzer wrote:
>
> Hello,
>
> On Wed, 18 M
As of today, neither the rbd CLI nor librbd imposes any limit on the
maximum length of an RBD image name, whereas krbd has roughly a 100
character limit and the OSDs have a default object name limit of roughly
2000 characters. While there is a patch under review to increase the krbd
limit, it would
Hi,
We want to enable the Hammer RBD features on newly created Cinder
volumes [1], but we still have a few VMs running with a super old librbd
(dumpling).
Perhaps it's academic, but does anyone know the expected behaviour if
an old dumpling-linked qemu-kvm tries to attach an rbd with
exclusi
Hello,
We recently had an outage on our Ceph storage cluster caused by what I believe
to be a bug in Ceph. At the time of the incident all MONs, OSDs and clients
(except for one) were running Ceph Hammer 0.94.6.
To start describing the incident, I will lay out the hierarchy of rbd
volumes/snaptsho
On Thu, May 19, 2016 at 12:15 PM, Dan van der Ster wrote:
> I hope it will just refuse to
> attach, rather than attach but allow bad stuff to happen.
You are correct -- older librbd/krbd clients will refuse to open
images that have unsupported features enabled.
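If you do need an old client to attach a particular image, the usual
workaround (with a Jewel-era rbd CLI) is to check and strip the newer features
on that image; a sketch, with the pool/image name as a placeholder:

    # see which features are set
    rbd info volumes/volume-xyz
    # disable the features a dumpling client cannot handle (dependents first)
    rbd feature disable volumes/volume-xyz fast-diff object-map exclusive-lock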
--
Jason
Restart osd.1 with debugging enabled
debug osd = 20
debug filestore = 20
debug ms = 1
Then, run list_unfound once the pg is back in active+recovering. If
it still hangs, post osd.1's log to the list along with the output of
ceph osd dump and ceph pg dump.
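If a restart is awkward, the same debug levels can usually be injected at
runtime instead (a sketch, assuming admin access; <pgid> is whichever PG you
are chasing):

    ceph tell osd.1 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
    ceph pg <pgid> list_unfound
    ceph osd dump
    ceph pg dump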
-Sam
On Wed, May 18, 2016 at 6:20 PM, D
Hello All.
I am a newbie with Ceph, and I use the Jewel release for testing purposes. It
seems everything is OK: HEALTH_OK, and all of the OSDs are up and in.
I created some RBD images (rbd create) and mapped them to an Ubuntu
host.
I can read and write data to my volume, but when I delete some content
That is normal behavior. Ceph has no understanding of the filesystem living on
top of the RBD, so it doesn’t know when space is freed up. If you are running
a sufficiently current kernel, you can use fstrim to cause the kernel to tell
Ceph what blocks are free. More details here:
http://www
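In practice that just means running something like the following inside the
client that has the filesystem mounted (the mount point is a placeholder):

    # -v reports how much space was handed back
    fstrim -v /mnt/myrbd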
Hi Albert,
to free unused space you must enable trim (or do an fstrim) in the VM,
and everything in the storage chain must support this.
The normal virtio driver doesn't support trim, but if you use SCSI disks
with the virtio-scsi driver you can use it.
It works well, but needs some time for huge filesystems.
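On a plain qemu command line that looks roughly like the snippet below (pool
and image names are placeholders; with libvirt the equivalent is
discard='unmap' on a virtio-scsi disk):

    qemu-system-x86_64 ... \
      -device virtio-scsi-pci,id=scsi0 \
      -drive file=rbd:rbd/vm-disk,format=raw,if=none,id=drive0,cache=writeback,discard=unmap \
      -device scsi-hd,drive=drive0,bus=scsi0.0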
Thank you for your great support.
Best Regards
Albert
On Thu, May 19, 2016 at 10:41 PM, Udo Lembke wrote:
> Hi Albert,
> to free unused space you must enable trim (or do an fstrim) in the vm -
> and all things in the storage chain must support this.
> The normal virtio-driver don't support tri
You can also mount the RBD with the discard option. It works the same way as
mounting an SSD with discard to free up space when you delete things. I use the
discard option on my ext4 RBDs on Ubuntu and it frees up the used Ceph space
immediately.
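(For completeness, that is just the standard mount option, e.g. in /etc/fstab;
the device path and mount point below are placeholders.)

    /dev/rbd/rbd/myimage  /mnt/myrbd  ext4  defaults,noatime,discard  0 2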
Sent from my iPhone
On May 19, 2016, at 12:30 PM,
Hello,
On Fri, 20 May 2016 00:11:02 + David Turner wrote:
> You can also mount the rbd with the discard option. It works the same
> way as you would mount an ssd to free up the space when you delete
> things. I use the discard option on my ext4 rbds on Ubuntu and it frees
> up the used Ceph
* We are trying to assess whether we are going to see data loss if an SSD that
is hosting journals for a few OSDs crashes. In our configuration, each SSD is
partitioned into 5 chunks and each chunk is mapped as the journal drive for one
OSD. What I understand from the Ceph documentation: "Consisten
Hello,
first of all: wall of text. Don't do that.
Use line breaks and paragraphs liberally to make reading easy.
I'm betting at least half of the people who could have answered your
question took one look at this blob of text and ignored it.
Secondly, search engines are your friend.
The first hit when
Hi,
Yes and no, for the actual data loss. This depends on your crush map.
If you're using the original map (which came with the installation),
then your smallest failure domain will be the host. If you have replica
size and 3 hosts and 5 OSDs per host (15 OSDs total), then losing the
journ
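A quick way to see which OSDs would be affected and how the pool is actually
protected (filestore layout assumed; the pool name is a placeholder):

    # which OSD has its journal on which device
    ls -l /var/lib/ceph/osd/ceph-*/journal
    # replica count of the pool and the CRUSH rule it uses
    ceph osd pool get rbd size
    ceph osd crush rule dump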
Hello,
On Thu, 19 May 2016 13:26:33 +0200 Oliver Dzombic wrote:
> Hi,
>
> a sparedisk is a nice idea.
>
> But i think thats something you can also do with a shellscript.
>
Definitely, but you're then going to have a very likely possibility of
getting into conflict with your MONs and what they
Hello,
On Fri, 20 May 2016 03:44:52 + EP Komarla wrote:
> Thanks Christian. Point noted. Going forward I will write text to make
> it easy to read.
>
> Thanks for your response. Losing a journal drive seems expensive as I
> will have to rebuild 5 OSDs in this eventuality.
>
Potentially,