Re: [ceph-users] Typical 10GbE latency

2014-11-12 Thread Alexandre DERUMIER
>>Is this with an 8192-byte payload? Oh, sorry, it was with 1500. I'll try to send a report with 8192 tomorrow. - Original message - From: "Robert LeBlanc" To: "Alexandre DERUMIER" Cc: "Wido den Hollander" , ceph-users@lists.ceph.com Sent: Tuesday 11 November 2014 23:13:17 Subject: Re: [ceph
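
For anyone repeating the measurement, a minimal latency check (a sketch only; the target address is a placeholder, and payloads above the MTU will fragment unless jumbo frames are enabled end to end):

    $ ping -c 1000 -i 0.2 -s 1472 10.0.0.2   # fits in a standard 1500-byte frame
    $ ping -c 1000 -i 0.2 -s 8192 10.0.0.2   # needs MTU >= 9000 to avoid fragmentation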

[ceph-users] Help regarding Installing ceph on a single machine with ceph-deploy on ubuntu 14.04 64 bit

2014-11-12 Thread tej ak
Hi, I am new to ceph and desperately trying to figure out how to install and deploy ceph on a single machine with ceph-deploy. I have ubuntu 14.04 - 64 bit installed in a virtual machine (on windows 8.1 through VMware player) and have installed devstack on ubuntu. I am trying to install ceph
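
For reference, a single-node quickstart along these lines (a hedged sketch; "ceph-node" and the disk names are placeholders, and a one-host cluster needs the chooseleaf tweak so replicas can land on the same host):

    $ ceph-deploy new ceph-node
    $ echo "osd crush chooseleaf type = 0" >> ceph.conf
    $ ceph-deploy install ceph-node
    $ ceph-deploy mon create-initial
    $ ceph-deploy osd create ceph-node:sdb ceph-node:sdc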

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-12 Thread Jasper Siero
Hello Greg, The specific PG was always deep scrubbing (ceph pg dump all showed that the last deep scrub of this PG was in August), but now when I look at it again the deep scrub is finished and everything is healthy. Maybe it was solved because the mds is running fine now and it unlocked something. Th
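
For anyone checking the same thing, the scrub timestamps can be inspected and a deep scrub kicked off manually (the PG id below is a placeholder; column layout varies a little by version):

    $ ceph pg dump all | grep ^2.5f    # includes last_scrub_stamp / last_deep_scrub_stamp columns
    $ ceph pg deep-scrub 2.5f          # queue a deep scrub of that PG now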

Re: [ceph-users] Typical 10GbE latency

2014-11-12 Thread Wido den Hollander
(back to list) On 11/10/2014 06:57 PM, Gary M wrote: > Hi Wido, > > That is a bit weird.. I'd also check the Ethernet controller firmware > version and settings between the other configurations. There must be > something different. > Indeed, there must be something! But I can't figure it out ye
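
Two quick things worth comparing between the boxes (interface name is an example):

    $ ethtool -i eth2    # driver and firmware-version
    $ ethtool -c eth2    # interrupt coalescing; rx-usecs differences show up directly as latency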

[ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-12 Thread SCHAER Frederic
Hi, I'm used to RAID software giving me the slots of failing disks, and most often blinking the disks in the disk bays. I recently installed a DELL "6GB HBA SAS" JBOD card, said to be an LSI 2008 one, and I now have to identify 3 pre-failed disks (so says S.M.A.R.T.). Since this is an LSI, I tho
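
One way to at least enumerate which devices report a failing health status (a sketch; assumes the disks appear as plain /dev/sd? devices behind the HBA):

    $ for d in /dev/sd?; do
    >   echo -n "$d: "; smartctl -H $d | awk -F: '/overall-health/ {print $2}'
    > done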

[ceph-users] The strategy of auto-restarting crashed OSD

2014-11-12 Thread David Z
Hi Guys, We are experiencing some OSD crashing issues recently, like messenger crashes and some strange crashes (still being investigated), etc. Those crashes don't seem to reproduce after restarting the OSD. So we are thinking about a strategy of auto-restarting a crashed OSD 1 or 2 times, then leav
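
Something along these lines could work as a supervisor (purely a hypothetical sketch using the sysvinit service script; the retry count, sleep, and the crude pgrep match are all placeholders):

    #!/bin/sh
    # restart osd.$1 at most twice, then leave it down for investigation
    id=$1; tries=0
    while [ $tries -lt 2 ]; do
        service ceph start osd.$id
        sleep 60
        pgrep -f "ceph-osd -i $id " >/dev/null && exit 0   # daemon stayed up
        tries=$((tries+1))
    done
    logger "osd.$id crashed repeatedly; leaving it down"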

Re: [ceph-users] v0.87 Giant released

2014-11-12 Thread debian Only
Dear experts, could you provide some guidance on upgrading Ceph from firefly to giant? Many thanks! 2014-10-30 15:37 GMT+07:00 Joao Eduardo Luis : > On 10/30/2014 05:54 AM, Sage Weil wrote: > >> On Thu, 30 Oct 2014, Nigel Williams wrote: >> >>> On 30/10/2014 8:56 AM, Sage Weil wrote: >>> *
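
The usual rolling-upgrade order applies (a sketch, not the official procedure — check the giant release notes first): upgrade and restart all monitors, then OSDs, then MDS/RGW.

    $ ceph osd set noout                        # avoid rebalancing while OSDs restart
    $ apt-get update && apt-get install ceph    # after pointing apt at the giant repo
    $ service ceph restart mon
    $ service ceph restart osd
    $ ceph osd unset noout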

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-12 Thread JF Le Fillâtre
Hi, May or may not work depending on your JBOD and the way it's identified and set up by the LSI card and the kernel: cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier The weird path and the wildcards are due to the way the sysfs is set up. That works with a Dell R520, 6GB
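
To map every disk to its bay in one pass (same caveats as above; a sketch):

    $ for d in /sys/block/sd*; do
    >   echo -n "${d##*/}: bay "
    >   cat $d/../../../../sas_device/end_device-*/bay_identifier 2>/dev/null || echo '?'
    > done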

[ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread Pieter Koorts
Hi, A while back on a blog I saw it mentioned that Ceph should not be run on compute nodes and in general should be on dedicated hardware. Does this really still apply? An example: if you have nodes comprising 16+ cores, 256GB+ RAM, dual 10GbE network, 2+8 OSDs (SSD log + HDD store), I unde

Re: [ceph-users] Stackforge Puppet Module

2014-11-12 Thread Nick Fisk
Hi David, Many thanks for your reply. I must admit I have only just started looking at puppet, but a lot of what you said makes sense to me and understand the reason for not having the module auto discover disks. I'm currently having a problem with the ceph::repo class when trying to push this o

Re: [ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread Mark Nelson
Technically there's no reason it shouldn't work, but it does complicate things. Probably the biggest worry would be that if something bad happens on the compute side (say it goes nuts with network or memory transfers) it could slow things down enough that OSDs start failing heartbeat checks ca

Re: [ceph-users] Stackforge Puppet Module

2014-11-12 Thread David Moreau Simard
What comes to mind is that you need to make sure that you've cloned the git repository to /etc/puppet/modules/ceph and not /etc/puppet/modules/puppet-ceph. Feel free to hop on IRC to discuss about puppet-ceph on freenode in #puppet-openstack. You can find me there as dmsimard. -- David Moreau S
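
i.e. (repo URL as it was on stackforge at the time):

    $ git clone https://github.com/stackforge/puppet-ceph.git /etc/puppet/modules/ceph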

Re: [ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread Haomai Wang
Actually, all of our production clusters (up to ten) run ceph-osd on compute nodes (KVM). The primary precaution is that you need to constrain CPU and memory. For example, you can allocate a ceph cpuset and memory group, and let ceph-osd run within limited cores and memory. Another risk i
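
A concrete way to do that with libcgroup (a hypothetical sketch; the core list, memory cap, and OSD id are placeholders):

    $ cgcreate -g cpuset,memory:ceph
    $ cgset -r cpuset.cpus=0-3 -r cpuset.mems=0 ceph
    $ cgset -r memory.limit_in_bytes=8G ceph
    $ cgexec -g cpuset,memory:ceph /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf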

Re: [ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread Andrey Korolyov
On Wed, Nov 12, 2014 at 5:30 PM, Haomai Wang wrote: > Actually, our production cluster(up to ten) all are that ceph-osd ran > on compute-node(KVM). > > The primary action is that you need to constrain the cpu and memory. > For example, you can alloc a ceph cpu-set and memory group, let > ceph-osd

Re: [ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread Pieter Koorts
Hi, Thanks for the replies. We likely will not choose this method, but I wanted to make sure there was a good technical reason rather than just a "best practice". I did not quite think of "conntracker" at the time, so this is a good one to consider. Thanks Pieter On 12 November 2014 14:30, Haomai Wa
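
For anyone hitting the conntrack angle, the table usage is easy to check and the limit easy to raise (sysctl names as in mainline kernels; the value is an example):

    $ cat /proc/sys/net/netfilter/nf_conntrack_count
    $ sysctl net.netfilter.nf_conntrack_max
    $ sysctl -w net.netfilter.nf_conntrack_max=1048576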

Re: [ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread Robert van Leeuwen
> A while back on a blog I saw mentioned that Ceph should not be run on compute > nodes and in the general > sense should be on dedicated hardware. Does this really still apply? In my opinion storage needs to be rock-solid. Running other (complex) software on a Ceph node increases the chances of

Re: [ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread gaoxingxing
I think you may also need to consider risks like kernel crashes, etc., since storage and compute are sharing the same box.

[ceph-users] rados -p cache-flush-evict-all surprisingly slow

2014-11-12 Thread Martin Millnert
Dear Cephers, I have a lab setup with 6x dual-socket hosts (48GB RAM, 2x 10Gbps links), each equipped with 2x S3700 100GB SSDs and 4x 500GB HDDs, where the HDDs are mapped in a tree under a 'platter' root, similar to the guidance from Seb at http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-
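
For context, the eviction in question is driven like this (standard cache-tiering commands; pool names are examples):

    $ ceph osd tier add rbd cache
    $ ceph osd tier cache-mode cache writeback
    $ ceph osd tier set-overlay rbd cache
    $ rados -p cache cache-flush-evict-all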

Re: [ceph-users] The strategy of auto-restarting crashed OSD

2014-11-12 Thread Adeel Nazir
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > David Z > Sent: Wednesday, November 12, 2014 8:16 AM > To: Ceph Community; Ceph-users > Subject: [ceph-users] The strategy of auto-restarting crashed OSD > > Hi Guys, > > We are experiencin

Re: [ceph-users] PG's incomplete after OSD failure

2014-11-12 Thread Chad Seys
Would love to hear if you discover a way to zap incomplete PGs! Perhaps this is a common enough problem to warrant opening a tracker issue? Chad.

[ceph-users] Solaris 10 VMs extremely slow in KVM on Ceph RBD Devices

2014-11-12 Thread Christoph Adomeit
Hi, I installed a Ceph cluster with 50 OSDs on 4 hosts and finally I am really happy with it. Linux and Windows VMs run really fast in KVM on the Ceph storage. Only my Solaris 10 guests are terribly slow on ceph rbd storage. A Solaris guest on Ceph storage needs 15 minutes to boot. When I move the S
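
One thing worth ruling out (a hedged guess, not a confirmed fix): Solaris 10 ships no virtio drivers, so KVM falls back to emulated IDE and its small synchronous I/Os; enabling writeback caching on the drive can soften that. Pool/image names are examples:

    $ qemu-system-x86_64 ... \
        -drive file=rbd:rbd/solaris10,format=raw,if=ide,cache=writeback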

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-12 Thread Erik Logtenberg
I have no experience with the DELL SAS controller, but usually the advantage of using a simple controller (instead of a RAID card) is that you can use full SMART directly.

    $ sudo smartctl -a /dev/sda
    === START OF INFORMATION SECTION ===
    Device Model: INTEL SSDSA2BW300G3H
    Serial Number: PEP
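
The attributes that usually flag a pre-failing mechanical disk can be pulled out directly (attribute names per smartmontools):

    $ smartctl -A /dev/sda | egrep 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'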

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-12 Thread Scottix
I would say it depends on your system and where the drives are connected. Some HBAs have a CLI tool to manage the connected drives like a RAID card would. One other method I found is that sometimes the kernel will expose the LEDs for you; http://fabiobaltieri.com/2011/09/21/linux-led-subsystem/ has an article o
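
If the enclosure supports SES/SGPIO, the ledmon package can blink a bay without any vendor tool (device name is an example):

    $ ledctl locate=/dev/sdk       # start blinking that drive's bay
    $ ledctl locate_off=/dev/sdk   # stop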

Re: [ceph-users] Triggering shallow scrub on OSD where scrub is already in progress

2014-11-12 Thread Gregory Farnum
Yes, this is expected behavior. You're telling the OSD to scrub every PG it holds, and it is doing so. The list of PGs to scrub is getting reset each time, but none of the individual scrubs are getting restarted. (I believe that if you instruct a PG to scrub while it's already doing so, nothing hap
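
For reference, the two commands involved (the PG id is a placeholder):

    $ ceph osd scrub 3       # queues a scrub of every PG for which osd.3 is primary
    $ ceph pg scrub 2.5f     # scrubs a single PG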

Re: [ceph-users] rados -p cache-flush-evict-all surprisingly slow

2014-11-12 Thread Gregory Farnum
My recollection is that the RADOS tool is issuing a special eviction command on every object in the cache tier using primitives we don't use elsewhere. Their existence is currently vestigial from our initial tiering work (rather than the present caching), but I have some hope we'll extend them agai

Re: [ceph-users] Deep scrub, cache pools, replica 1

2014-11-12 Thread Gregory Farnum
On Tue, Nov 11, 2014 at 2:32 PM, Christian Balzer wrote: > On Tue, 11 Nov 2014 10:21:49 -0800 Gregory Farnum wrote: > >> On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer wrote: >> > >> > Hello, >> > >> > One of my clusters has become busy enough (I'm looking at you, evil >> > Window VMs that I

Re: [ceph-users] Log reading/how do I tell what an OSD is trying to connect to

2014-11-12 Thread Gregory Farnum
On Tue, Nov 11, 2014 at 6:28 PM, Scott Laird wrote: > I'm having a problem with my cluster. It's running 0.87 right now, but I > saw the same behavior with 0.80.5 and 0.80.7. > > The problem is that my logs are filling up with "replacing existing (lossy) > channel" log lines (see below), to the p

Re: [ceph-users] Federated gateways

2014-11-12 Thread Aaron Bassett
In playing around with this a bit more, I noticed that the two users on the secondary node can't see each other's buckets. Is this a problem? > On Nov 11, 2014, at 6:56 PM, Craig Lewis wrote: > >> I see you're running 0.80.5. Are you using Apache 2.4? There is a known >> issue with Apache 2.4 o

Re: [ceph-users] Log reading/how do I tell what an OSD is trying to connect to

2014-11-12 Thread Scott Laird
Here are the first 33k lines or so: https://dl.dropboxusercontent.com/u/104949139/ceph-osd-log.txt This is a different (but more or less identical) machine from the past set of logs. This system doesn't have quite as many drives in it, so I couldn't spot a same-host error burst, but it's logging

Re: [ceph-users] Typical 10GbE latency

2014-11-12 Thread Udo Lembke
Hi Wido, On 12.11.2014 12:55, Wido den Hollander wrote: > (back to list) > > > Indeed, there must be something! But I can't figure it out yet. Same > controllers, tried the same OS, direct cables, but the latency is 40% > higher. > Perhaps something with PCIe ordering / interrupts? Have you checked
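
Quick ways to compare the interrupt/PCIe side of the two boxes (interface name is an example):

    $ grep eth2 /proc/interrupts                  # IRQ spread across cores
    $ cat /sys/class/net/eth2/device/numa_node    # which NUMA node the NIC sits on
    $ lspci -tv                                   # PCIe topology / slot placement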

Re: [ceph-users] Federated gateways

2014-11-12 Thread Craig Lewis
http://tracker.ceph.com/issues/9206 My post to the ML: http://www.spinics.net/lists/ceph-users/msg12665.html IIRC, the system users didn't see the other user's bucket in a bucket listing, but they could read and write the objects fine. On Wed, Nov 12, 2014 at 11:16 AM, Aaron Bassett wrote: >

[ceph-users] incorrect pool size, wrong ruleset?

2014-11-12 Thread houmles
Hi, I have 2 hosts with 8 2TB drives each. I want 2 replicas between the hosts and then 2 replicas between the OSDs on each host. That way, even if I lose one host I still have 2 replicas. Currently I have this ruleset: rule repl { ruleset 5 type replicated min_s
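
For what it's worth, one plausible complete rule for that layout (a sketch only; assumes a pool size of 4 and a CRUSH root named "default"):

    rule repl {
            ruleset 5
            type replicated
            min_size 4
            max_size 4
            step take default
            step choose firstn 2 type host
            step chooseleaf firstn 2 type osd
            step emit
    }

    $ ceph osd pool set rbd size 4   # pool name is an example; size must match the rule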

[ceph-users] ceph-osd mkfs mkkey hangs on ARM

2014-11-12 Thread Harm Weites
Hi, When trying to add a new OSD to my cluster the ceph-osd process hangs: # ceph-osd -i $id --mkfs --mkkey At this point I have to explicitly kill -9 the ceph-osd since it doesn't respond to anything. It also didn't adhere to my foreground debug log request; the logs are empty. Stracing the ce

[ceph-users] Problem with radosgw-admin subuser rm

2014-11-12 Thread Seth Mason
Hi -- I'm trying to remove a subuser but it's not removing the S3 keys when I pass in --purge-keys. First I create a sub-user:

    $ radosgw-admin subuser create --uid=smason --subuser='smason:test' \
        --access=full --key-type=s3 --gen-secret
    "subusers": [
        { "id": "smason:test",
          "pe
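
The removal that should pair with that (a hedged sketch; the explicit key rm is a fallback, and the access key is a placeholder):

    $ radosgw-admin subuser rm --uid=smason --subuser=smason:test --purge-keys
    $ radosgw-admin key rm --uid=smason --key-type=s3 --access-key=<key>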

Re: [ceph-users] ceph-osd mkfs mkkey hangs on ARM

2014-11-12 Thread Sage Weil
On Wed, 12 Nov 2014, Harm Weites wrote: > Hi, > > When trying to add a new OSD to my cluster the ceph-osd process hangs: > > # ceph-osd -i $id --mkfs --mkkey > > > At this point I have to explicitly kill -9 the ceph-osd since it doesn't > respond to anything. It also didn't adhere to my foregro