I think I figured it out! All 4 of the OSDs on one host (OSD 107-110) were
sending massive amounts of auth requests to the monitors, which seemed to
overwhelm them.
The weird bit is that I removed them (osd crush remove, auth del, osd rm), dd'd
the box and all of the disks, reinstalled, and guess what? They are
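For reference, that removal sequence written out as shell (the loop over ids
107-110 is just shorthand for running the three commands once per OSD):

for id in 107 108 109 110; do
    ceph osd crush remove osd.$id   # drop the OSD from the CRUSH map
    ceph auth del osd.$id           # delete its cephx key
    ceph osd rm $id                 # remove it from the OSD map
done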
Hello,
I am trying to launch a test cluster with 1 monitor and 1 osd on a node.
I created a cluster with the name msl-lab-dsg02 and tried to deploy an initial
monitor and I get this error:
root@msl-lab-dsg02:~/Downloads/cluster# ceph-deploy --overwrite-conf mon
create-initial
[ceph_deploy.conf][
Hi Noah,
It does look like the two things are unrelated. But you are right,
ceph-deploy stopped accepting that trailing hostname for the
"ceph-deploy mon create-initial" command in 1.5.26. It was never a
required argument, and accepting it led to confusion. I tightened up
the argument parsing
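In other words (my paraphrase, reusing the hostname from earlier in the thread
as a stand-in):

# create-initial takes no hostname; it uses the initial mons listed in ceph.conf
ceph-deploy --overwrite-conf mon create-initial

# to target a specific host explicitly, use "mon create" instead
ceph-deploy mon create msl-lab-dsg02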
Hi Bernhard,
Thanks for your email. systemd support for Ceph in general is still a
work in progress. It is actively being worked on, but the packages
hosted on ceph.com are still using sysvinit (for RPM systems), and
Upstart on Ubuntu. It is definitely a known issue.
Along those lines, ceph.co
It sounds slightly similar to what I just experienced.
I had one monitor out of three, which seemed to essentially run one core at
full tilt continuously, and had its virtual address space allocated to the
point where top started reporting it in TB. Requests hitting this monitor did
not get very timel
The leveldb is smallish: around 70 MB.
I ran debug mon = 10 for a while, but couldn't find any interesting
information. I would run out of space quite quickly though, as the log
partition only has 10 GB.
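For anyone repeating the logging step above, the debug level can also be raised
at runtime (assuming the mon id matches the short hostname, as a ceph-deploy
setup usually does):

# raise monitor debug logging without a restart; revert with e.g. '--debug-mon 1'
ceph tell mon.$(hostname -s) injectargs '--debug-mon 10'

# or persistently, in ceph.conf on the monitor host:
# [mon]
#     debug mon = 10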
On 24 Jul 2015 21:13, "Mark Nelson" wrote:
> On 07/24/2015 02:31 PM, Luis Periquito wrote:
>
>>
On 07/24/2015 02:31 PM, Luis Periquito wrote:
Now it's official, I have a weird one!
Restarted one of the ceph-mons with jemalloc and it didn't make any
difference. It's still using a lot of cpu and still not freeing up memory...
The issue is that the cluster almost stops responding to request
No thanks at all.
I think about ZFS deduplication from a slightly different angle: using it with
snapshots. We determined that platter HDDs work better with a big object size,
but that causes a big performance overhead with snapshots. For example, you have
a 32 MB block size and you have an image snapshot. If only
We use ZFS for other purposes and deduplication is overrated - it is quite
useful with big block sizes (and assuming your data don’t “shift” in the
blocks), but you can usually achieve much higher space savings with compression
- and it usually is faster, too :-) You need lots and lots of RAM fo
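To make that comparison concrete, roughly what it looks like in practice
(pool/dataset names are made up):

# enable lz4 compression on a dataset - cheap, and usually the bigger win
zfs set compression=lz4 tank/ceph

# see how much it actually saves
zfs get compressratio tank/ceph

# dedup keeps its table in RAM; the achieved ratio shows up in the DEDUP column
zpool list tank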
Hi! Did you try ZFS and its deduplication mechanism? It could radically decrease
writes during COW.
Now it's official, I have a weird one!
Restarted one of the ceph-mons with jemalloc and it didn't make any
difference. It's still using a lot of cpu and still not freeing up memory...
The issue is that the cluster almost stops responding to requests, and if I
restart the primary mon (that had al
On 07/24/2015 03:29 PM, Ilya Dryomov wrote:
>
> ngx_write_fd() is just a write(), which, when interrupted by SIGALRM,
> fails with EINTR because SA_RESTART is not set. We can try digging
> further, but I think nginx should retry in this case.
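If anyone wants to watch this happen, something along these lines (the pid is
the nginx worker's, a placeholder here) shows both the EINTR returns and whether
SA_RESTART was requested for SIGALRM:

# EINTR on write()/writev() plus an rt_sigaction for SIGALRM without SA_RESTART
# matches the explanation above
strace -f -e trace=write,writev,rt_sigaction -p <nginx-worker-pid>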
Hello,
Culprit was the "timer_resolution 50ms;" sett
Hello,
If I understand correctly, you want to look at how many "guest filesystem block
size" blocks are empty?
This might not be that precise because we do not discard blocks inside the
guests, but if you tell me how to gather this, I can certainly try it. I'm
not sure if my ba
On Fri, Jul 24, 2015 at 11:55 PM, Jason Dillaman wrote:
>> Hi all,
>> I have been looking for a way to alleviate the overhead of RBD snapshots/clones
>> for some time.
>>
>> In our scenario there are a few “master” volumes that contain production
>> data, and are frequently snapshotted and cloned for dev
Hi Somnath,
Do you have a link with the definitions of all the perf counters?
Thanks,
Steve
On Sun, Jul 5, 2015 at 11:23 AM, Somnath Roy wrote:
> Hi Ray,
>
> Here is the description of the different latencies under filestore perf
> counters.
>
>
>
> Journal_latency :
>
> --
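Not a link, but the counters (and their schema) can be pulled straight from a
running OSD's admin socket; osd.0 below is just an example:

# dump current values of all perf counters, the filestore section included
ceph daemon osd.0 perf dump

# list the counters themselves with type information (how much description text
# you get depends on the release)
ceph daemon osd.0 perf schema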
Hi,
sorry for the late response, your message landed in the spam folder and I found
it just now.
# ceph mds dump
dumped mdsmap epoch 32
epoch 32
flags 0
created 2015-07-11 23:46:04.963071
modified 2015-07-23 17:43:27.198951
tableserver 0
root 0
session_timeout 60
session_auto
> Hi all,
> I have been looking for a way to alleviate the overhead of RBD snapshots/clones
> for some time.
>
> In our scenario there are a few “master” volumes that contain production
> data, and are frequently snapshotted and cloned for dev/qa use. Those
> snapshots/clones live for a few days to a few
Sorry, autocorrect. Decompiled crush map.
Robert LeBlanc
Sent from a mobile device please excuse any typos.
On Jul 24, 2015 9:44 AM, "Robert LeBlanc" wrote:
> Please provide the recompiled crush map.
>
> Robert LeBlanc
>
> Sent from a mobile device please excuse any typos.
> On Jul 23, 2015 7:0
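For completeness, getting a decompiled CRUSH map out of a running cluster looks
like this:

# grab the binary CRUSH map and decompile it to editable text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt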
Hi,
Thanks.
I did not know about atop, nice tool... and I don't seem to be IRQ overloaded -
I can reach 100% CPU for IRQs, but that's shared across all 8 physical cores.
I also discovered "turbostat", which showed me the R510s were not configured for
"performance" in the BIOS (but dbpm - demand
On Fri, Jul 24, 2015 at 4:29 PM, Ilya Dryomov wrote:
> On Fri, Jul 24, 2015 at 3:54 PM, Vedran Furač wrote:
>> On 07/24/2015 09:54 AM, Ilya Dryomov wrote:
>>>
>>> I don't know - looks like nginx isn't setting SA_RESTART, so it should
>>> be repeating the write()/writev() itself. That said, if it
On Fri, Jul 24, 2015 at 3:54 PM, Vedran Furač wrote:
> On 07/24/2015 09:54 AM, Ilya Dryomov wrote:
>>
>> I don't know - looks like nginx isn't setting SA_RESTART, so it should
>> be repeating the write()/writev() itself. That said, if it happens
>> only on cephfs, we need to track it down.
>
> Co
- Original Message -
> From: "Jan Schermer"
> To: "Samuel Taylor Liston"
> Cc: ceph-users@lists.ceph.com, "Wayne Betts"
> Sent: Thursday, July 23, 2015 9:43:30 AM
> Subject: Re: [ceph-users] el6 repo problem?
>
> The packages were probably rebuilt without changing their name/version (
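If that is what happened, the standard first step on an EL6 client (not
necessarily the fix for this thread) is simply to throw away the cached repo
metadata and re-fetch it:

# drop the cached repodata and pull it fresh
yum clean metadata
yum makecache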
On 07/24/2015 09:54 AM, Ilya Dryomov wrote:
>
> I don't know - looks like nginx isn't setting SA_RESTART, so it should
> be repeating the write()/writev() itself. That said, if it happens
> only on cephfs, we need to track it down.
Correct, this is the first time I have seen such an error. I've never seen
Turns out that when we started the 3 OSDs, it did "out" the rest on the same
host, so their reweight was 0.
Thus when I started the single OSD on that host, it tried to put all the PGs
from the other OSDs onto this one (which failed for lack of disk space), and
because of that it also consumed muc
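For reference, that state is easy to spot and undo (the osd id below is a
placeholder):

# the REWEIGHT column shows 0 for OSDs that were marked out
ceph osd tree

# bring an out-ed OSD back in, which sets its reweight back to 1
ceph osd in osd.12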
“Friday fun”… not!
We set mon_osd_down_out_subtree_limit=host some time ago. Now we needed to take
down all OSDs on one host and as expected nothing happened (noout was _not_
set). All the PGs showed as stuck degraded.
Then we took 3 OSDs on the host up and then down again because of slow reque
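For context, the setting mentioned above is just a ceph.conf fragment on the
monitors; a minimal version:

[mon]
    # do not automatically mark OSDs "out" when a whole host (or larger unit) goes down
    mon osd down out subtree limit = host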
Hi,
I have a problem with ceph-deploy on Ubuntu 15.04
in the file
/usr/local/lib/python2.7/dist-packages/ceph_deploy/hosts/debian/__init__.py
def choose_init():
    """
    Select a init system
    Returns the name of a init system (upstart, sysvinit ...).
    """
    if distro.lower() == 'ub
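Not a fix, but a quick sanity check of what the node actually runs as its init
(Ubuntu 15.04 switched to systemd by default, which is presumably why the
upstart choice here misfires):

# prints "systemd", "init" (sysvinit/upstart), etc., depending on what PID 1 is
ps -p 1 -o comm=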
You don’t (shouldn’t) need to rebuild the binary to use jemalloc. It should be
possible to do something like
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 ceph-osd …
The last time we tried it segfaulted after a few minutes, so YMMV and be
careful.
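A quick way to confirm the preload actually took effect on a running daemon
(assumes a single ceph-osd process on the box, so pidof returns one pid):

# the jemalloc shared object should show up in the process's mappings
grep jemalloc /proc/$(pidof ceph-osd)/maps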
Jan
> On 23 Jul 2015, at 18:18, Luis
On Thu, Jul 23, 2015 at 9:34 PM, Vedran Furač wrote:
> On 07/23/2015 06:47 PM, Ilya Dryomov wrote:
>>
>> To me this looks like a writev() interrupted by a SIGALRM. I think
>> nginx guys read your original email the same way I did, which is "write
>> syscall *returned* ERESTARTSYS", but I'm pretty