[ceph-users] How to monitor health and connectivity of OSD

2016-02-08 Thread Mariusz Gronczewski
016-02-08 03:39:28.311124) (turned out to be a bad NIC, fuck emulex) is there anything that could dump things like "failed heartbeats in last 10 minutes" or similar stats? -- Mariusz Gronczewski, Administrator Efigence S. A. ul. Wołoska 9a, 02-583 Warszawa T: [+48] 22 380 13 1
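
A minimal sketch of one way to pull such a number out of an OSD today, assuming the default log location and the stock "heartbeat_check: no reply from" log message (the osd id is a parameter):

    #!/bin/sh
    # Count heartbeat failures reported by one OSD in roughly the last 10 minutes.
    OSD_ID=${1:?need an osd id}
    LOG=/var/log/ceph/ceph-osd.${OSD_ID}.log
    SINCE=$(date -d '10 minutes ago' '+%Y-%m-%d %H:%M')
    # OSD log lines start with an ISO timestamp, so a plain string compare works.
    awk -v since="$SINCE" '($1 " " $2) >= since && /heartbeat_check: no reply from/' "$LOG" | wc -l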

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-11 Thread Mariusz Gronczewski
ow the reason for so much RAM use is b/c of tcmalloc not freeing > unused memory. Right? note that I only did it after most of the PGs were recovered > Here is a related "urgent" and "won't fix" bug which applies > http://tracker.ceph.com/issues/12681
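
For reference, the heap commands being discussed go through the admin socket and only do something when the OSDs are built against tcmalloc (the default packages); a minimal sketch, repeated per OSD id:

    ceph tell osd.0 heap stats     # how much memory tcmalloc is holding vs. actually in use
    ceph tell osd.0 heap release   # hand the freed-but-retained pages back to the kernel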

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-11 Thread Mariusz Gronczewski
> Mark > > > > On 09/09/2015 05:56 AM, Jan Schermer wrote: > >> Sorry if I wasn't clear. > >> Going from 2GB to 8GB is not normal, although some slight bloating is > >> expected. In your case it just got much worse than usual for reasons yet

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Mariusz Gronczewski
down will not > release the memory until you do "heap release". > > Jan > > > > On 09 Sep 2015, at 12:05, Mariusz Gronczewski > > wrote: > > > > On Tue, 08 Sep 2015 16:14:15 -0500, Chad William Seys > > wrote: > > > >>

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Mariusz Gronczewski
profiling/ > > Shinobu > > - Original Message - > From: "Chad William Seys" > To: "Mariusz Gronczewski" , "Shinobu Kinjo" > , ceph-users@lists.ceph.com > Sent: Wednesday, September 9, 2015 6:14:15 AM > Subject: Re: Huge memory usage spike in
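
The profiling referred to above boils down to the built-in tcmalloc heap profiler; a rough sketch of a session, with the dump path and file name being assumptions that vary by version and configuration:

    ceph tell osd.0 heap start_profiler   # begin writing heap profiles
    ceph tell osd.0 heap dump             # snapshot current allocations
    ceph tell osd.0 heap stop_profiler
    # inspect a dump with gperftools' pprof (packaged as google-pprof on some distros)
    pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.0.profile.0001.heap | head -30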

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Mariusz Gronczewski
n state at the moment. But I didn't know that one, thanks. High memory usage stopped once the cluster rebuilt, but I'd planned the cluster for 2GB per OSD, so I needed to add RAM just to get to the point of ceph starting to rebuild, as some OSDs ate up to 8 GB during recovery -- Mariusz Gronczewski,

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-08 Thread Mariusz Gronczewski
that will stop once recovery finishes. On Tue, 8 Sep 2015 12:31:03 +0200, Jan Schermer wrote: > YMMV, same story like SSD selection. > Intels have their own problems :-) > > Jan > > > On 08 Sep 2015, at 12:09, Mariusz Gronczewski > > wrote: > > > > For t

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-08 Thread Mariusz Gronczewski
traffic came from? > > > > Shinobu > > > > - Original Message - > > From: "Jan Schermer" > > To: "Mariusz Gronczewski" > > Cc: ceph-users@lists.ceph.com > > Sent: Monday, September 7, 2015 9:17:04 PM > > Subject: Re:

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-07 Thread Mariusz Gronczewski
> > - Original Message - > From: "Mariusz Gronczewski" > To: "Shinobu Kinjo" > Cc: "Jan Schermer" , ceph-users@lists.ceph.com > Sent: Monday, September 7, 2015 10:19:23 PM > Subject: Re: [ceph-users] Huge memory usage spike in OSD on hammer/gian

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-07 Thread Mariusz Gronczewski
yes On Mon, 7 Sep 2015 09:15:55 -0400 (EDT), Shinobu Kinjo wrote: > > master/slave > > Meaning that you are using bonding? > > - Original Message - > From: "Mariusz Gronczewski" > To: "Shinobu Kinjo" > Cc: "Jan Schermer" , ce

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-07 Thread Mariusz Gronczewski
nope, master/slave, that's why the graph shows traffic only on eth2 On Mon, 7 Sep 2015 09:01:53 -0400 (EDT), Shinobu Kinjo wrote: > Are you using lacp in 10g interfaces? > > - Original Message - > From: "Mariusz Gronczewski" > To: "Shinobu Kinjo"

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-07 Thread Mariusz Gronczewski
traffic was? > > > > Have you tried to capture that traffic between cluster and public network > > to see where such a bunch of traffic came from? > > > > Shinobu > > > > ----- Original Message - > > From: "Jan Schermer" > > To:

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-07 Thread Mariusz Gronczewski
nd public network > to see where such a bunch of traffic came from? > > Shinobu > > - Original Message - > From: "Jan Schermer" > To: "Mariusz Gronczewski" > Cc: ceph-users@lists.ceph.com > Sent: Monday, September 7, 2015 9:17:04 PM >

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-07 Thread Mariusz Gronczewski
ork traffic went up. > Nothing in logs on the mons which started 9/4 ~6 AM? > > Jan > > > On 07 Sep 2015, at 14:11, Mariusz Gronczewski > > wrote: > > > > On Mon, 7 Sep 2015 13:44:55 +0200, Jan Schermer wrote: > > > >> Maybe some configuration cha

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-07 Thread Mariusz Gronczewski
occurred on all OSDs, and it looked like this: http://imgur.com/IIMIyRG sadly I was on vacation so I didn't manage to catch it earlier ;/ but I'm sure there was no config change > > On 07 Sep 2015, at 13:40, Mariusz Gronczewski > > wrote: > > > > On Mon, 7 Sep 2015 13:02:

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-07 Thread Mariusz Gronczewski
better (~2GB > > per osd) but still much higher usage than before. > > > > any ideas what could be the reason for that? logs are mostly full of > > OSDs trying to recover and timed-out heartbeats > > > > -- > > Mariusz Gronczewski, Administrator

[ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-07 Thread Mariusz Gronczewski
osds down to unusability. I then upgraded one of the OSDs to hammer, which made it a bit better (~2GB per osd) but still much higher usage than before. any ideas what could be the reason for that? logs are mostly full of OSDs trying to recover and timed-out heartbeats -- Mariusz Gronczewski

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-08-25 Thread Mariusz Gronczewski

Re: [ceph-users] dropping old distros: el6, precise 12.04, debian wheezy?

2015-07-31 Thread Mariusz Gronczewski
6 will be supported to 2020, and centos 7 was released a year ago, so I'd imagine a lot of people haven't migrated yet, and the migration process is nontrivial if you already made some modifications to c6 (read: fixed broken-as-fuck init scripts for a few apps) -- Mariusz Gronczewski, Administra

Re: [ceph-users] How to backup hundreds or thousands of TB

2015-05-06 Thread Mariusz Gronczewski

Re: [ceph-users] cephfs survey results

2014-11-04 Thread Mariusz Gronczewski
nch of relatively weak nodes, again, having an active-active MDS setup will be more interesting to him than to someone who can just buy a new fast machine for it. -- Mariusz Gronczewski, Administrator Efigence S. A. ul. Wołoska 9a, 02-583 Warszawa T: [+48] 22 380 13 13 F: [+48] 22 380 13

Re: [ceph-users] What a maximum theoretical and practical capacity in ceph cluster?

2014-10-28 Thread Mariusz Gronczewski
have Red Pro, which are basically Reds but 7200 RPM and slightly less expensive than the Re. We've been replacing our Seagate Barracuda DM001s with these (don't get those Seagates, they are horrible) -- Mariusz Gronczewski, Administrator Efigence S. A. ul. Wołoska 9a, 02-583 Warszawa T: [+48

Re: [ceph-users] What a maximum theoretical and practical capacity in ceph cluster?

2014-10-28 Thread Mariusz Gronczewski
roblems with disk timeouts (disks were shitty Seagate *DM001s, no TLER) so it dropped the whole drive from the RAID. And using MegaCli for everything is not exactly ergonomic. But yeah, 72 drives in 4U only makes sense if you use it for bulk storage -- Mariusz Gronczewski, Administrator Efigence S. A. u
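
As an aside, whether a drive supports TLER/ERC at all can be checked (and sometimes enabled) from the OS; a sketch, assuming smartmontools and a drive that exposes SCT commands:

    smartctl -l scterc /dev/sdX          # show the current read/write error-recovery timeouts
    smartctl -l scterc,70,70 /dev/sdX    # try to cap both at 7.0 s so the controller stops dropping the disk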

Re: [ceph-users] journals relabeled by OS, symlinks broken

2014-10-27 Thread Mariusz Gronczewski
hout just removing all the OSDs and re-adding > > them? I thought about recreating the symlinks to point at the new SSD > > labels, but I figured I'd check here first. Thanks! > > > > -Steve > > > > -- > > Steve Anthony > > LTS HPC Support Specialis
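
One way to repoint the symlinks at a persistent name so that device reordering on the next reboot cannot break them again; a sketch with an example OSD id, and the partuuid is a placeholder to look up first:

    ls -l /dev/disk/by-partuuid/                 # find the UUID of the journal partition on the SSD
    service ceph stop osd.12
    ln -sfn /dev/disk/by-partuuid/<journal-partuuid> /var/lib/ceph/osd/ceph-12/journal
    service ceph start osd.12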

Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-15 Thread Mariusz Gronczewski
Usually removing an OSD without removing the host happens when you remove/replace dead drives. Hosts are in the map so * CRUSH won't put 2 copies on the same node * you can balance around network interface speed The question should be "why would you remove all OSDs if you are going to remove the host anyway"
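
For the single-dead-drive case, a sketch of the usual removal sequence that leaves the host bucket in place (the id is an example):

    ceph osd out 7
    ceph osd crush remove osd.7
    ceph auth del osd.7
    ceph osd rm 7
    # the host bucket stays in the CRUSH map, ready for the replacement disk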

Re: [ceph-users] OSDs are crashing with "Cannot fork" or "cannot create thread" but plenty of memory is left

2014-09-12 Thread Mariusz Gronczewski
m: > Linux ceph-osd-bs04 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1 > (2014-07-13) x86_64 GNU/Linux > > Since this is happening on other Hardware as well, I don't think it's > Hardware related. I have no Idea if this is an OS issue (which would be > seriously str
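
Fork/thread-creation failures with plenty of free RAM usually point at process-count or mapping limits rather than memory; a sketch of the usual suspects to check, with an illustrative value for raising one of them:

    sysctl kernel.pid_max kernel.threads-max vm.max_map_count
    ulimit -u                          # per-user process/thread limit in the current session
    sysctl -w kernel.pid_max=4194303   # example: raise the system-wide pid/thread ceiling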

Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s

2014-08-14 Thread Mariusz Gronczewski
host = cephosd03 > >>> devs = /dev/sdd1 > >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal > >>> > >>> [osd.20] > >>> host = cephosd03 > >>> devs = /dev/sdf1 > >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal > >

Re: [ceph-users] Ceph writes stall for long periods with no disk/network activity

2014-08-07 Thread Mariusz Gronczewski
m X, Y" and except for time there is no other information to correlate that for example op on osd.1 waited for subop on osd.5 and that subop on osd.5 was slow because of y -- Mariusz Gronczewski, Administrator Efigence S. A. ul. Wołoska 9a, 02-583 Warszawa T: [+48] 22 380 13 13 F: [+48] 2

Re: [ceph-users] Ceph writes stall for long periods with no disk/network activity

2014-08-05 Thread Mariusz Gronczewski
On Mon, 04 Aug 2014 15:32:50 -0500, Mark Nelson wrote: > On 08/04/2014 03:28 PM, Chris Kitzmiller wrote: > > On Aug 1, 2014, at 1:31 PM, Mariusz Gronczewski wrote: > >> I got weird stalling during writes, sometimes I got same write speed > >> for few minutes an

[ceph-users] Ceph writes stall for long periods with no disk/network activity

2014-08-01 Thread Mariusz Gronczewski
even after stopping the bench it does not unlock, just hangs on: HEALTH_WARN 16 requests are blocked > 32 sec; 2 osds have slow requests 16 ops are blocked > 524.288 sec 6 ops are blocked > 524.288 sec on osd.0 10 ops are blocked > 524.288 sec on osd.2 2 osds have slow requests -- Mariusz
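
When requests sit blocked like this, the admin socket of the implicated OSDs is usually the quickest way to see what they are waiting on; a sketch, assuming default socket paths and using osd.0 from the output above:

    ceph health detail                     # names the OSDs with slow requests
    ceph daemon osd.0 dump_ops_in_flight   # ops currently stuck, with their last recorded state
    ceph daemon osd.0 dump_historic_ops    # recently completed slow ops with per-step timings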

Re: [ceph-users] Ceph networks, to bond or not to bond?

2014-06-06 Thread Mariusz Gronczewski
consists of two gigabit switches, capable > >> of LACP, but not stackable. For redundancy, I'd like to have my > >> links spread evenly over both switches. > >> > > >> > My question where I didn't find a conclusive answer

Re: [ceph-users] Ceph networks, to bond or not to bond?

2014-06-05 Thread Mariusz Gronczewski
ommend reading the kernel docs ( linux-2.6/Documentation/networking/bonding.txt ); they have an example multi-switch architecture (section 12) -- Mariusz Gronczewski, Administrator Efigence S. A. ul. Wołoska 9a, 02-583 Warszawa T: [+48] 22 380 13 13 F: [+48] 22 380 13 14 E: mariusz.gronczew..
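
Section 12 of that file covers the two-switch case; a minimal active-backup (mode 1) sketch with iproute2, the interface names being assumptions:

    modprobe bonding
    ip link add bond0 type bond mode active-backup miimon 100
    ip link set eth2 down && ip link set eth2 master bond0
    ip link set eth3 down && ip link set eth3 master bond0
    ip link set bond0 up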

Re: [ceph-users] Openstack Nova not removing RBD volumes after removing of instance

2014-04-04 Thread Mariusz Gronczewski
> Regards > > Mark > > On 04/04/14 20:56, Mariusz Gronczewski wrote: > > Nope, one from RDO packages http://openstack.redhat.com/Main_Page > > > > On Thu, 3 Apr 2014 23:22:15 +0200, Sebastien Han > > wrote: > > > >> Are you running Havan

Re: [ceph-users] Openstack Nova not removing RBD volumes after removing of instance

2014-04-04 Thread Mariusz Gronczewski
> > "Always give 100%. Unless you're giving blood.” > > Phone: +33 (0)1 49 70 99 72 > Mail: sebastien@enovance.com > Address : 11 bis, rue Roquépine - 75008 Paris > Web : www.enovance.com - Twitter : @enovance > > On 03 Apr 2014, at 13:24, Mariusz Gro

[ceph-users] Openstack Nova not removing RBD volumes after removing of instance

2014-04-03 Thread Mariusz Gronczewski
-60a3-48ef-ba40-dfb5946a6a1d volume-ecf26742-e79e-4d7a-b8a4-9b4dc85dd41f Is that something specific to the RBD backend, or is it just nova not deleting volumes after instance deletion? -- Mariusz Gronczewski, Administrator Efigence S. A. ul. Wołoska 9a, 02-583 Warszawa T: [+48] 22 380 13 13 F:
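
Either way, leftover images are easy to spot and clean by hand from the Ceph side; a sketch where the pool name "volumes" is an assumption and the image name is taken from the listing above:

    rbd -p volumes ls                                                 # compare against what Nova/Cinder still know about
    rbd -p volumes info volume-ecf26742-e79e-4d7a-b8a4-9b4dc85dd41f
    rbd -p volumes rm volume-ecf26742-e79e-4d7a-b8a4-9b4dc85dd41f     # only once nothing references it any more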

Re: [ceph-users] Fluctuating I/O speed degrading over time

2014-03-10 Thread Mariusz Gronczewski
ctly good data, but latency occasionally spiked up to seconds, making the whole server lag (it was RAID6 on a backup server) -- Mariusz Gronczewski, Administrator efigence S. A. ul. Wołoska 9a, 02-583 Warszawa T: [+48] 22 380 13 13 F: [+48] 22 380 13 14 E: mariusz.groncze

Re: [ceph-users] Dell H310

2014-03-10 Thread Mariusz Gronczewski
g" i mean completely disconnecting it from OS, which was problematic. IT mode benefit is that all IO errors are send to OS "as is" so whatever you use can handle errors. In case of Linux software raid it can remap some sectors before completely failing and at least lets you get

Re: [ceph-users] Fluctuating I/O speed degrading over time

2014-03-07 Thread Mariusz Gronczewski
l problems, for example we found failing-but-not-yet-dead disks that sorta kinda worked but their latency was 10x higher than all the other disks in the machine. Mariusz Gronczewski, Administrator efigence S. A. ul. Wołoska 9a, 02-583 Warszawa T: [+48] 22 380 13 13 F: [+48] 22 380 13 14 E: m
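
A sketch of the checks that tend to make such a half-dead disk stand out (the device name is an example):

    ceph osd perf          # per-OSD commit/apply latency; a single outlier usually means one sick disk
    iostat -x 5            # watch await/%util for the suspect device at the block layer
    smartctl -a /dev/sdX   # pending/reallocated sectors often explain the extra latency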

Re: [ceph-users] Dell H310

2014-03-07 Thread Mariusz Gronczewski
s that. > You can flash it with any firmware, our IBM controller uses firmware from Supermicro (because IBM doesn't provide IT FW for it) I wouldn't use it for a bunch of SSDs, but for spinners the performance is fine -- Mariusz Gronczewski, Administrator efigence S. A. ul. Wołosk

Re: [ceph-users] Impact of disk encryption and recommendations?

2014-03-03 Thread Mariusz Gronczewski
7 MiB/s aes-xts 512b 1406,2 MiB/s 1411,7 MiB/s serpent-xts 512b 313,7 MiB/s 295,6 MiB/s twofish-xts 512b 347,8 MiB/s 350,2 MiB/s so if you have a lot of OSDs per machine and do a lot of sequential IO you might hit a CPU wall on performance -- Mariusz Gronczewski, Admin
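
Those figures look like the encryption/decryption columns of cryptsetup's built-in benchmark, which is an easy way to estimate the same ceiling on your own CPUs (purely in-memory, no disks involved):

    cryptsetup benchmark                                  # all common ciphers, MiB/s for encryption and decryption
    cryptsetup benchmark --cipher aes-xts --key-size 512  # just the mode typically used for dm-crypt OSDs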

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Mariusz Gronczewski
hat out of 600 disks happened to be on those 3) of data, not everything that is on a given array. And again, that's only in the case where those disks die at exactly the same moment, with no time to recover. Even 60 min between failures will let most of the data replicate. And in the worst case, there is always da

Re: [ceph-users] ceph-deploy issues on RHEL6.4

2013-09-27 Thread Mariusz Gronczewski
On 2013-09-27, at 15:30:21, Guang wrote: > Hi ceph-users, > I recently deployed a ceph cluster using the *ceph-deploy* utility > on RHEL6.4, and along the way I came across a couple of issues / > questions which I would like to ask for your help. > > 1. ceph-deploy does not help to

[ceph-users] ceph freezes for 10+ seconds during benchmark

2013-09-02 Thread Mariusz Gronczewski
h osd.0 but there doesn't seem to be anything wrong with the machine itself (bonnie++ and dd on the machine do not show any lockups) -- Mariusz Gronczewski, Administrator Efigence Sp. z o. o. ul. Wołoska 9a, 02-583 Warszawa T: [+48] 22 380 13 13 F: [+48] 22 380 13 14 E: mariusz.groncze
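
For reproducing the stall while watching the cluster, a sketch (the pool name is an example):

    rados bench -p rbd 60 write --no-cleanup   # sustained write load against one pool
    ceph -w                                    # in a second terminal: watch for "slow request" warnings as the stall hits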