2016-02-08 03:39:28.311124)
(turned out to be a bad NIC, fuck Emulex)
Is there anything that could dump things like "failed heartbeats in the
last 10 minutes" or similar stats?
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
ow the reason for so much RAM use is b/c of tcmalloc not freeing
> unused memory. Right?
Note that I only did it after most of the PGs were recovered.
> Here is a related "urgent" and "won't fix" bug that applies here:
> http://tracker.ceph.com/issues/12681
> > Mark
> >
> > On 09/09/2015 05:56 AM, Jan Schermer wrote:
> >> Sorry if I wasn't clear.
> >> Going from 2GB to 8GB is not normal, although some slight bloating is
> >> expected. In your case it just got much worse than usual for reasons yet
down will not
> release the memory until you do "heap release".
>
> Jan
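(For reference, a minimal sketch of the heap commands mentioned above; they go over the admin socket, so substitute your own OSD ids.)

    # show how much memory tcmalloc is holding vs. actually using
    ceph tell osd.0 heap stats
    # ask tcmalloc to hand freed-but-held pages back to the OS
    ceph tell osd.0 heap release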
>
>
> > On 09 Sep 2015, at 12:05, Mariusz Gronczewski
> > wrote:
> >
> > On Tue, 08 Sep 2015 16:14:15 -0500, Chad William Seys
> > wrote:
> >
> >>
profiling/
>
> Shinobu
>
> - Original Message -
> From: "Chad William Seys"
> To: "Mariusz Gronczewski" , "Shinobu Kinjo"
> , ceph-users@lists.ceph.com
> Sent: Wednesday, September 9, 2015 6:14:15 AM
> Subject: Re: Huge memory usage spike in
n state at the moment. But I
didn't know that one, thanks.
High memory usage stopped once the cluster rebuilt, but I had planned the
cluster to have 2GB per OSD, so I needed to add RAM to even get to the
point of Ceph starting to rebuild, as some OSDs ate up to 8 GB during
recovery.
--
Mariusz Gronczewski,
that will stop once recovery finishes.
On Tue, 8 Sep 2015 12:31:03 +0200, Jan Schermer
wrote:
> YMMV, same story like SSD selection.
> Intels have their own problems :-)
>
> Jan
>
> > On 08 Sep 2015, at 12:09, Mariusz Gronczewski
> > wrote:
> >
> > For t
traffic came from?
> >
> > Shinobu
> >
> > - Original Message -
> > From: "Jan Schermer"
> > To: "Mariusz Gronczewski"
> > Cc: ceph-users@lists.ceph.com
> > Sent: Monday, September 7, 2015 9:17:04 PM
> > Subject: Re:
>
> - Original Message -
> From: "Mariusz Gronczewski"
> To: "Shinobu Kinjo"
> Cc: "Jan Schermer" , ceph-users@lists.ceph.com
> Sent: Monday, September 7, 2015 10:19:23 PM
> Subject: Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant
yes
On Mon, 7 Sep 2015 09:15:55 -0400 (EDT), Shinobu Kinjo
wrote:
> > master/slave
>
> Meaning that you are using bonding?
>
> - Original Message -
> From: "Mariusz Gronczewski"
> To: "Shinobu Kinjo"
> Cc: "Jan Schermer" , ce
Nope, master/slave; that's why the graph only shows traffic on eth2.
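(A quick way to confirm the bonding mode and which slave is carrying the traffic, assuming the bond device is named bond0:)

    # prints "Bonding Mode: fault-tolerance (active-backup)" and the currently active slave
    grep -iE 'bonding mode|currently active slave' /proc/net/bonding/bond0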
On Mon, 7 Sep 2015 09:01:53 -0400 (EDT), Shinobu Kinjo
wrote:
> Are you using LACP on the 10G interfaces?
>
> - Original Message -
> From: "Mariusz Gronczewski"
> To: "Shinobu Kinjo&
traffic was?
> >
> > Have you tried to capture the traffic between the cluster and public networks
> > to see where such a large amount of traffic came from?
> >
> > Shinobu
> >
> > ----- Original Message -
> > From: "Jan Schermer"
> > To:
nd public network
> to see where such a large amount of traffic came from?
>
> Shinobu
>
> - Original Message -
> From: "Jan Schermer"
> To: "Mariusz Gronczewski"
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, September 7, 2015 9:17:04 PM
>
ork traffic went up.
> Nothing in logs on the mons which started 9/4 ~6 AM?
>
> Jan
>
> > On 07 Sep 2015, at 14:11, Mariusz Gronczewski
> > wrote:
> >
> > On Mon, 7 Sep 2015 13:44:55 +0200, Jan Schermer wrote:
> >
> >> Maybe some configuration cha
occurred on all OSDs, and it looked like this:
http://imgur.com/IIMIyRG
Sadly I was on vacation so I didn't manage to catch it before ;/ but I'm
sure there was no config change.
> > On 07 Sep 2015, at 13:40, Mariusz Gronczewski
> > wrote:
> >
> > On Mon, 7 Sep 2015 13:02:
better (~2GB
> > per OSD) but still much higher usage than before.
> >
> > Any ideas what could be the reason for that? Logs are mostly full of
> > OSDs trying to recover and timed-out heartbeats.
> >
> > --
> > Mariusz Gronczewski, Administrator
OSDs down to unusability.
I then upgraded one of the OSDs to hammer, which made it a bit better (~2GB
per OSD) but still much higher usage than before.
Any ideas what could be the reason for that? Logs are mostly full of
OSDs trying to recover and timed-out heartbeats.
--
Mariusz Gronczewski
CentOS 6 will be supported until 2020, and CentOS 7 was released a
year ago, so I'd imagine a lot of people haven't migrated yet, and the
migration process is nontrivial if you already did some modifications
to C6 (read: fixing broken as fuck init scripts for a few apps).
--
Mariusz Gronczewski, Administrator
bunch of relatively weak nodes;
again, having an active-active setup with MDS will be more interesting to
him than to someone who can just buy a new fast machine for it.
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
have Red Pro, which are basically Reds but 7200 RPM and
slightly less expensive than Re. We've been replacing our Seagate
Barracuda DM001s with these (don't get those Seagates, they are
horrible).
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
problems with disk timeouts (disks were shitty Seagate *DM001s, no
TLER) so it dropped the whole drive from the RAID. And using MegaCli for
everything is not exactly ergonomic.
But yeah, 72 drives in 4U only makes sense if you use it for bulk
storage.
--
Mariusz Gronczewski, Administrator
Efigence S. A.
u
hout just removing all the OSDs and re-adding
> > them? I thought about recreating the symlinks to point at the new SSD
> > labels, but I figured I'd check here first. Thanks!
> >
> > -Steve
> >
> > --
> > Steve Anthony
> > LTS HPC Support Specialis
Usually removing an OSD without removing the host happens when you
remove/replace dead drives.
Hosts are in the CRUSH map so:
* CRUSH won't put 2 copies on the same node
* you can balance around network interface speed
The question should be "why would you remove all the OSDs if you are going to
remove the host anyway?"
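(For context, a hedged sketch of the usual per-OSD removal sequence when swapping a dead drive, assuming osd.12 is the one being replaced; the host bucket itself stays in the CRUSH map.)

    ceph osd out 12                  # stop mapping new data to it
    ceph osd crush remove osd.12     # drop it from the CRUSH map
    ceph auth del osd.12             # remove its auth key
    ceph osd rm 12                   # finally delete the OSD id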
m:
> Linux ceph-osd-bs04 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1
> (2014-07-13) x86_64 GNU/Linux
>
> Since this is happening on other hardware as well, I don't think it's
> hardware related. I have no idea if this is an OS issue (which would be
> seriously str
host = cephosd03
> >>> devs = /dev/sdd1
> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
> >>>
> >>> [osd.20]
> >>> host = cephosd03
> >>> devs = /dev/sdf1
> >>> osd_journal = /mnt/ramdisk/$cluster-$id-journal
> >
m X, Y" and except for time there is no other
information to correlate that for example op on osd.1 waited for subop on osd.5
and that subop on osd.5 was slow because of y
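(One hedged suggestion for digging into that: the admin socket keeps the recent slowest ops with a per-event timestamp list, including when an op started waiting for subops, which at least lets you line up the timelines by hand.)

    # recent slow ops on osd.1, each with a timestamped list of events
    ceph daemon osd.1 dump_historic_ops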
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
On Mon, 04 Aug 2014 15:32:50 -0500, Mark Nelson
wrote:
> On 08/04/2014 03:28 PM, Chris Kitzmiller wrote:
> > On Aug 1, 2014, at 1:31 PM, Mariusz Gronczewski wrote:
> >> I got weird stalling during writes; sometimes I got the same write speed
> >> for a few minutes an
even after
stopping the bench it does not unlock, it just hangs on:
HEALTH_WARN 16 requests are blocked > 32 sec; 2 osds have slow requests
16 ops are blocked > 524.288 sec
6 ops are blocked > 524.288 sec on osd.0
10 ops are blocked > 524.288 sec on osd.2
2 osds have slow requests
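(A hedged sketch of how to see which ops those actually are; osd.0 and osd.2 are the ones flagged above.)

    ceph health detail                      # lists the OSDs with blocked requests
    ceph daemon osd.0 dump_ops_in_flight    # shows the stuck ops and their current state
    ceph daemon osd.2 dump_ops_in_flight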
--
Mariusz
consists of two gigabit switches, capable
> >> of LACP, but not stackable. For redundancy, I'd like to have my
> >> links spread evenly over both switches.
> >> >
> >> > My question where I didn't find a conclusive answer
recommend reading the kernel docs
(linux-2.6/Documentation/networking/bonding.txt); they have an example
multi-switch architecture (section 12).
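(For illustration only, a minimal active-backup bond in CentOS-style ifcfg files, one slave per switch; the interface names and addresses here are made up.)

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    BONDING_OPTS="mode=active-backup miimon=100"
    IPADDR=10.0.0.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth2 (same for eth3 on the second switch)
    DEVICE=eth2
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none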
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczew..
> Regards
>
> Mark
>
> On 04/04/14 20:56, Mariusz Gronczewski wrote:
> > Nope, one from RDO packages http://openstack.redhat.com/Main_Page
> >
> > On Thu, 3 Apr 2014 23:22:15 +0200, Sebastien Han
> > wrote:
> >
> >> Are you running Havana?
>
> "Always give 100%. Unless you're giving blood.”
>
> Phone: +33 (0)1 49 70 99 72
> Mail: sebastien@enovance.com
> Address : 11 bis, rue Roquépine - 75008 Paris
> Web : www.enovance.com - Twitter : @enovance
>
> On 03 Apr 2014, at 13:24, Mariusz Gro
-60a3-48ef-ba40-dfb5946a6a1d
volume-ecf26742-e79e-4d7a-b8a4-9b4dc85dd41f
Is that something specific to the RBD backend, or is it just Nova not deleting
volumes after instance deletion?
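(If it helps, a hedged way to compare what Ceph still holds against what Cinder thinks exists; "volumes" is the conventional Cinder pool name and may differ in your setup.)

    # RBD images still present in the Cinder pool
    rbd -p volumes ls
    # cross-check against Cinder's own view
    cinder list --all-tenants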
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F:
ctly good data, but latency occasionally
spiked up to seconds, making the whole server lag (it was RAID6 on a backup
server).
--
Mariusz Gronczewski, Administrator
efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.groncze
g" i mean completely disconnecting it from OS, which was
problematic.
IT mode benefit is that all IO errors are send to OS "as is" so whatever you
use can handle errors. In case of Linux software raid it can remap some sectors
before completely failing and at least lets you get
l problems, for example we found
failing-but-not-yet-dead disks that sorta kinda worked but their latency was
10x higher than all the other disks in the machine.
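(A hedged way to spot that kind of disk, assuming sysstat is installed: watch per-device latency and compare siblings in the same box.)

    # extended per-device stats every 5 seconds; an outlier in await
    # (and a permanently high %util) usually points at the sick disk
    iostat -x 5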
Mariusz Gronczewski, Administrator
efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: m
s that.
>
You can flash it with any firmware; our IBM controller uses firmware from
Supermicro (because IBM doesn't provide IT FW for it).
I wouldn't use it for a bunch of SSDs, but for spinners the performance is fine.
--
Mariusz Gronczewski, Administrator
efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
7 MiB/s
aes-xts 512b 1406,2 MiB/s 1411,7 MiB/s
serpent-xts 512b 313,7 MiB/s 295,6 MiB/s
twofish-xts 512b 347,8 MiB/s 350,2 MiB/s
So if you have a lot of OSDs per machine and do a lot of sequential IO,
you might hit a CPU wall in performance.
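(Those numbers look like cryptsetup's built-in in-memory benchmark, so as a rough sketch they should be reproducible with:)

    # in-memory cipher benchmark, one line per cipher/key-size combination
    cryptsetup benchmark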
--
Mariusz Gronczewski, Administrator
hat out of 600 disks happened to be on those 3) of data, not
everything that is on a given array.
And again, that's only in case those disks die at exactly the same moment,
with no time to recover. Even 60 min between failures will let most of
the data replicate. And in the worst case, there is always da
On 2013-09-27, at 15:30:21,
Guang wrote:
> Hi ceph-users,
> I recently deployed a ceph cluster with use of *ceph-deploy* utility,
> on RHEL6.4, during the time, I came across a couple of issues /
> questions which I would like to ask for your help.
>
> 1. ceph-deploy does not help to
h osd.0 but there doesn't seem to be anything wrong with
the machine itself (bonnie++ and dd on the machine do not show any lockups).
--
Mariusz Gronczewski, Administrator
Efigence Sp. z o. o.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.groncze