Re: [ceph-users] Upgrade to 0.80.7-0.el6 from 0.80.1-0.el6, OSD crashes on startup

2014-11-13 Thread Joshua McClintock
AH! Sorry for the false alarm, I clearly have a hard drive problem: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata2.00: BMDMA stat 0x24 ata2.00: failed command: READ DMA ata2.00: cmd c8/00:08:38:bd:70/00:00:00:00:00/ef tag 0 dma 4096 in res 51/40:00:3f:bd:70/40:00:2

Re: [ceph-users] Upgrade to 0.80.7-0.el6 from 0.80.1-0.el6, OSD crashes on startup

2014-11-13 Thread Sage Weil
Hmm, looks like leveldb is hitting a problem. Is there anything in the kernel log (dmesg) that suggests a disk or file system problem? Are you able to, say, tar up the current/omap directory without problems? This is a single OSD, right? None of the others have been upgraded yet? sage On T
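Sage's two checks can be sketched as a small shell snippet. This is a minimal sketch, not from the thread: the OSD path matches Joshua's later listing, but the stand-in directory and backup path are assumptions so the commands can be tried anywhere.

```shell
# Minimal sketch of the two checks: scan the kernel log for disk errors,
# then force a full read of the OSD's leveldb (omap) directory via tar.
# OSD_DIR would be /var/lib/ceph/osd/us-west01-0 on the affected node;
# a scratch stand-in is used here so the snippet runs anywhere.
OSD_DIR=/tmp/fake-osd
mkdir -p "$OSD_DIR/current/omap"
echo placeholder > "$OSD_DIR/current/omap/000003.log"   # stand-in leveldb file

# 1. Disk or filesystem errors usually show up in dmesg (needs root on a real host):
# dmesg | grep -iE 'ata[0-9]|i/o error|xfs|ext4' | tail -n 20

# 2. tar reads every byte under omap; a failing disk surfaces as an I/O error here.
tar -czf /tmp/omap-backup.tar.gz -C "$OSD_DIR/current" omap && echo "omap readable"
```

If tar completes cleanly, the bug is more likely in leveldb or the upgrade itself than in the disk.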

Re: [ceph-users] Upgrade to 0.80.7-0.el6 from 0.80.1-0.el6, OSD crashes on startup

2014-11-13 Thread Joshua McClintock
[root@ceph-node20 ~]# ls /var/lib/ceph/osd/us-west01-0/current 0.10_head 0.1a_head 0.23_head 0.2c_head 0.37_head 0.3_head 0.b_head 1.10_head 1.1b_head 1.24_head 1.2c_head 1.3a_head 1.b_head 2.16_head 2.1_head 2.2a_head 2.32_head 2.3a_head 2.a_head omap 0.11_head 0.1d_head

Re: [ceph-users] Multiple rules in a ruleset: any examples? Which rule wins?

2014-11-13 Thread Gregory Farnum
On Thu, Nov 13, 2014 at 3:11 PM, Anthony Alba wrote: > Thanks! What happens when the lone rule fails? Is there a fallback > rule that will place the blob in a random PG? Say I misconfigure, and > my choose/chooseleaf don't add up to pool min size. There's no built-in fallback rule or anything li

Re: [ceph-users] Upgrade to 0.80.7-0.el6 from 0.80.1-0.el6, OSD crashes on startup

2014-11-13 Thread Sage Weil
On Thu, 13 Nov 2014, Joshua McClintock wrote: > I upgraded my mons to the latest version and they appear to work, I then > upgraded my mds and it seems fine.   > I then upgraded one OSD node and the OSD fails to start with the following > dump, any help is appreciated: > > --- begin dump of recent

Re: [ceph-users] calamari build failure

2014-11-13 Thread idzzy
Hi, Sure, thanks. As described in the link, is the only way to avoid this issue to downgrade to 2014.1.10? vagrant@precise64:~$ dpkg -l | grep salt ii  salt-common    2014.1.13-1precise1   Shared libraries that salt requires for all packages ii  salt-minion       2014.1.13-1precise1   This packag
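One way to do the downgrade on precise, assuming the 2014.1.10 packages are still published in the same repository (the exact version string below is an assumption and may differ):

```shell
# Downgrade salt to a version without the build bug (version string assumed):
apt-get install salt-common=2014.1.10-1precise1 \
                salt-minion=2014.1.10-1precise1
# and hold the packages so apt does not upgrade straight back:
apt-mark hold salt-common salt-minion
```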

[ceph-users] Upgrade to 0.80.7-0.el6 from 0.80.1-0.el6, OSD crashes on startup

2014-11-13 Thread Joshua McClintock
I upgraded my mons to the latest version and they appear to work, I then upgraded my mds and it seems fine. I then upgraded one OSD node and the OSD fails to start with the following dump, any help is appreciated: --- begin dump of recent events --- 0> 2014-11-13 18:20:15.625793 7fbd973ce7a

Re: [ceph-users] calamari build failure

2014-11-13 Thread Mark Loza
Hi, Currently, there's a bug in that salt version for ubuntu precise. See https://github.com/saltstack/salt/issues/17227 On 11/14/14 10:07 AM, idzzy wrote: Hi, vagrant@precise64:/git$ salt-call --version salt-call 2014.1.13 (Hydrogen) As of now I can see the calamari-client pkg in vagrant

Re: [ceph-users] calamari build failure

2014-11-13 Thread idzzy
Hi, vagrant@precise64:/git$ salt-call --version salt-call 2014.1.13 (Hydrogen) As of now I can see the calamari-client pkg in the vagrant:/git directory. Does this mean the package build succeeded? Then what about the error message I sent in my previous mail? vagrant@precise64:/git$ ls -l total 3340

Re: [ceph-users] calamari build failure

2014-11-13 Thread Mark Loza
Hi, Which version are you currently running? # salt-call --version On 11/14/14 9:34 AM, idzzy wrote: Hello, I'm trying to set up calamari with reference to http://ceph.com/category/ceph-gui/. I could create the calamari server package, but the creation of the calamari client package failed. Foll

[ceph-users] calamari build failure

2014-11-13 Thread idzzy
Hello, I’m trying to set up calamari with reference to http://ceph.com/category/ceph-gui/. I could create the calamari server package, but the creation of the calamari client package failed. The following is the procedure; the build process failed. How can I fix this? # git clone https://github.com/ceph/cal

Re: [ceph-users] Solaris 10 VMs extremely slow in KVM on Ceph RBD Devices

2014-11-13 Thread Christian Balzer
Hello, On Wed, 12 Nov 2014 17:29:43 +0100 Christoph Adomeit wrote: > Hi, > > i installed a Ceph Cluster with 50 OSDs on 4 Hosts and finally I am > really happy with it. > > Linux and Windows VMs run really fast in KVM on the Ceph Storage. > > Only my Solaris 10 guests are terribly slow on cep

Re: [ceph-users] Multiple rules in a ruleset: any examples? Which rule wins?

2014-11-13 Thread Anthony Alba
Thanks! What happens when the lone rule fails? Is there a fallback rule that will place the blob in a random PG? Say I misconfigure, and my choose/chooseleaf don't add up to pool min size. (This also explains why all examples in the wild use only 1 rule per ruleset.) On Fri, Nov 14, 2014 at 7:

Re: [ceph-users] Multiple rules in a ruleset: any examples? Which rule wins?

2014-11-13 Thread Gregory Farnum
On Thu, Nov 13, 2014 at 2:58 PM, Anthony Alba wrote: > Hi list, > > > When there are multiple rules in a ruleset, is it the case that "first > one wins"? > > When a rule fails, does it fall through to the next rule? > Are min_size, max_size the only determinants? > > Are there any examples?

Re: [ceph-users] Solaris 10 VMs extremely slow in KVM on Ceph RBD Devices

2014-11-13 Thread Smart Weblications GmbH - Florian Wiessner
Hi Christoph, Am 12.11.2014 17:29, schrieb Christoph Adomeit: > Hi, > > i installed a Ceph Cluster with 50 OSDs on 4 Hosts and finally I am really > happy with it. > > Linux and Windows VMs run really fast in KVM on the Ceph Storage. > > Only my Solaris 10 guests are terribly slow on ceph rbd

[ceph-users] Multiple rules in a ruleset: any examples? Which rule wins?

2014-11-13 Thread Anthony Alba
Hi list, When there are multiple rules in a ruleset, is it the case that "first one wins"? When a rule fails, does it fall through to the next rule? Are min_size, max_size the only determinants? Are there any examples? The only examples I've seen put one rule per ruleset (e.g. the docs ha
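For reference, the layout Anthony is asking about looks like the fragment below in a decompiled crushmap. It is purely illustrative (rule names and the bucket tree are invented), but it shows the only selection mechanism available in this era of Ceph: within a ruleset, a rule applies when the pool's replica count falls between its min_size and max_size.

```
# Illustrative fragment of a decompiled crushmap (names invented):
# two rules share ruleset 1 and are selected by the pool's size.
rule replicated_small {
        ruleset 1
        type replicated
        min_size 1
        max_size 2
        step take default
        step chooseleaf firstn 0 type osd
        step emit
}
rule replicated_big {
        ruleset 1
        type replicated
        min_size 3
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
```

A pool of size 2 matches the first rule, sizes 3-10 the second; a size outside both ranges matches nothing, which is why misconfigured size bounds break placement rather than falling back to another rule.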

Re: [ceph-users] ceph-osd mkfs mkkey hangs on ARM

2014-11-13 Thread Sage Weil
This appears to be a buggy libtcmalloc. Ceph hasn't gotten to main() yet from the looks of things; tcmalloc is still initializing. Hopefully fedora has a newer version of the package? sage On Thu, 13 Nov 2014, Harm Weites wrote: > Hi Sage, > > Here you go: http://paste.openstack.org/show/1

Re: [ceph-users] ceph-osd mkfs mkkey hangs on ARM

2014-11-13 Thread Harm Weites
Hi Sage, Here you go: http://paste.openstack.org/show/132936/ Harm Op 13-11-14 om 00:44 schreef Sage Weil: > On Wed, 12 Nov 2014, Harm Weites wrote: >> Hi, >> >> When trying to add a new OSD to my cluster the ceph-osd process hangs: >> >> # ceph-osd -i $id --mkfs --mkkey >> >> >> At this point

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-13 Thread Mike Christie
On 11/13/2014 10:17 AM, David Moreau Simard wrote: > Running into weird issues here as well in a test environment. I don't have a > solution either but perhaps we can find some things in common.. > > Setup in a nutshell: > - Ceph cluster: Ubuntu 14.04, Kernel 3.16.7, Ceph 0.87-1 (OSDs with separa

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-13 Thread German Anders
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-13 Thread David Moreau Simard
That's interesting. Although I'm running 3.16.7 and I'd expect the patch to be in already, I'll downgrade to the "working" 3.16.0 kernel and report back if this fixes the issue. Thanks for the pointer. -- David Moreau Simard > On Nov 13, 2014, at 1:15 PM, German Anders wrote: > > Is po

Re: [ceph-users] Typical 10GbE latency

2014-11-13 Thread German Anders
Any special parameters (or best practices) regarding the offload settings for the NICs? I got two ports: p4p1 (public net) and p4p2 (cluster internal); the cluster-internal one has MTU 9000 across all the OSD servers and of course on the switch ports: ceph@cephosd01:~$ ethtool -k p4p1 Features for p4p1

Re: [ceph-users] Typical 10GbE latency

2014-11-13 Thread Stephan Seitz
> >> Indeed, there must be something! But I can't figure it out yet. Same > >> controllers, tried the same OS, direct cables, but the latency is 40% > >> higher. Wido, just an educated guess: did you check the offload settings of your NIC? Could you provide the output of ethtool -k? - Stepha
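The check Stephan is suggesting looks roughly like this. The interface name is an example, and which offloads matter is hardware-dependent, so treat the toggling line as a starting point for comparison rather than a recommendation:

```shell
IFACE=p4p1                  # example interface name; substitute your own
ethtool -k "$IFACE"         # show current offload settings (GRO, GSO, TSO, ...)
# To compare latency with the common offloads disabled (needs root):
# ethtool -K "$IFACE" gro off gso off tso off
# then re-measure, e.g.: ping -c 1000 -i 0.01 <peer-ip>
```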

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-13 Thread German Anders

Re: [ceph-users] Very Basic question

2014-11-13 Thread Luca Mazzaferro
Moreover if I restart the service on the ceph-node1, which is the initial monitor and has an osd and mds: [root@ceph-node1 ~]# service ceph restart === mon.ceph-node1 === === mon.ceph-node1 === Stopping Ceph mon.ceph-node1 on ceph-node1...kill 1215...done === mon.ceph-node1 === Starting Ceph mon

Re: [ceph-users] mds continuously crashing on Firefly

2014-11-13 Thread Lincoln Bryant
Hi all, Just providing an update to this -- I started the mds daemon on a new server and rebooted a box with a hung CephFS mount (from the first crash) and the problem seems to have gone away. I'm still not sure why the mds was shutting down with a "Caught signal", though. Cheers, Lincoln

Re: [ceph-users] Very Basic question

2014-11-13 Thread Luca Mazzaferro
Hi, thank you for your answer. On 11/13/2014 06:17 PM, Gregory Farnum wrote: > What does "ceph -s" output when things are working? Does the ceph.conf on your admin node... BEFORE the problem (from ceph -w, because I don't have ceph -s): [rzgceph@admin-node my-cluster]$ ceph -w cluster 6fa39bb3-de

Re: [ceph-users] Very Basic question

2014-11-13 Thread Gregory Farnum
What does "ceph -s" output when things are working? Does the ceph.conf on your admin node contain the address of each monitor? (Paste in the relevant lines.) It will need to, or the ceph tool won't be able to find the monitors even though the system is working. -Greg On Thu, Nov 13, 2014 at 9:11 AM
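A minimal sketch of the ceph.conf lines Greg is asking about (hostnames and addresses are placeholders): the client needs all three monitors listed so it can still reach a quorum when ceph-node1 is down.

```
[global]
fsid = <cluster fsid>
mon initial members = ceph-node1, ceph-node2, ceph-node3
mon host = 192.0.2.11, 192.0.2.12, 192.0.2.13
```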

Re: [ceph-users] Very Basic question

2014-11-13 Thread Luca Mazzaferro
Hi, On 11/13/2014 06:05 PM, Artem Silenkov wrote: Hello! Only 1 monitor instance? It won't work in most cases. Make more and ensure a quorum to reach survivability. No, three monitor instances, one for each ceph-node, as designed in the quick-ceph-deploy guide. I tried to kill one of them (the in

Re: [ceph-users] Very Basic question

2014-11-13 Thread Artem Silenkov
Hello! Only 1 monitor instance? It won't work in most cases. Make more and ensure a quorum to reach survivability. Regards, Silenkov Artem --- artem.silen...@gmail.com 2014-11-13 20:02 GMT+03:00 Luca Mazzaferro : > Dear Users, > I followed the instruction of the storage cluster quick start here

[ceph-users] mds continuously crashing on Firefly

2014-11-13 Thread Lincoln Bryant
Hi Cephers, Over night, our MDS crashed, failing over to the standby which also crashed! Upon trying to restart them this morning, I find that they no longer start and always seem to crash on the same file in the logs. I've pasted part of a "ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms

[ceph-users] Very Basic question

2014-11-13 Thread Luca Mazzaferro
Dear Users, I followed the instructions of the storage cluster quick start here: http://ceph.com/docs/master/start/quick-ceph-deploy/ I simulated a small storage cluster with 4 VMs: ceph-node[1,2,3] and an admin-node. Everything worked fine until I shut down the initial monitor node (ceph-node1). Also

Re: [ceph-users] CephFS, file layouts pools and rados df

2014-11-13 Thread Thomas Lemarchand
Hi Sage, Thank you for your answer. So there is no anticipated problem with what I did? Does the 'data' pool's performance directly affect my filesystem performance, even if there are no files in it? Do I need the same performance policy on the 'data' pool as on the other pools? Can I us

Re: [ceph-users] CephFS, file layouts pools and rados df

2014-11-13 Thread Sage Weil
On Thu, 13 Nov 2014, Thomas Lemarchand wrote: > Hi Ceph users, > > I need to have different filesystem trees in different pools, mainly for > security reasons. > > So I have ceph users (cephx) with specific access on specific pools. > > I have one metadata pool ('metadata') and three data pools (

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-13 Thread David Moreau Simard
Running into weird issues here as well in a test environment. I don't have a solution either but perhaps we can find some things in common.. Setup in a nutshell: - Ceph cluster: Ubuntu 14.04, Kernel 3.16.7, Ceph 0.87-1 (OSDs with separate public/cluster network in 10 Gbps) - iSCSI Proxy node (ta

[ceph-users] CephFS, file layouts pools and rados df

2014-11-13 Thread Thomas Lemarchand
Hi Ceph users, I need to have different filesystem trees in different pools, mainly for security reasons. So I have ceph users (cephx) with specific access on specific pools. I have one metadata pool ('metadata') and three data pools ('data', 'wimi-files', 'wimi-recette-files'). I used file layou
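The setup Thomas describes can be sketched with Firefly-era commands. The pool and client names follow the mail, but the mountpoint and the exact cap strings are assumptions; directory layouts are set through virtual extended attributes once the pool is registered with the MDS.

```shell
# Register the extra pool with CephFS, point a directory at it, and create
# a client key restricted to that pool (mountpoint and caps are assumed).
ceph mds add_data_pool wimi-files
setfattr -n ceph.dir.layout.pool -v wimi-files /mnt/cephfs/wimi
ceph auth get-or-create client.wimi \
    mon 'allow r' mds 'allow' osd 'allow rw pool=wimi-files'
```

New files created under the directory then land in 'wimi-files'; existing files keep whatever layout they were written with.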

[ceph-users] Negative number of objects degraded for extended period of time

2014-11-13 Thread Fred Yang
Hi, The Ceph cluster we are running had a few OSDs approaching 95% usage 1+ weeks ago, so I ran a reweight to balance it out and, in the meantime, instructed the application to purge data that was not required. But after the large data purge from the application side (all OSDs' usage dropped below 20%), the cl

Re: [ceph-users] Reusing old journal block device w/ data causes FAILED assert(0)

2014-11-13 Thread Dan van der Ster
Hi, On Thu Nov 13 2014 at 3:35:55 PM Anthony Alba wrote: > Ah no. > On 13 Nov 2014 21:49, "Dan van der Ster" > wrote: > >> Hi, >> Did you mkjournal the reused journal? >> >>ceph-osd -i $ID --mkjournal >> >> Cheers, Dan >> > > No - however the man page states that "--mkjournal" is for : > "

Re: [ceph-users] Reusing old journal block device w/ data causes FAILED assert(0)

2014-11-13 Thread Anthony Alba
Ah no. On 13 Nov 2014 21:49, "Dan van der Ster" wrote: > Hi, > Did you mkjournal the reused journal? > >ceph-osd -i $ID --mkjournal > > Cheers, Dan > No - however the man page states that "--mkjournal" is for : "Create a new journal file to match an existing object repository. This is usef

Re: [ceph-users] Reusing old journal block device w/ data causes FAILED assert(0)

2014-11-13 Thread Dan van der Ster
Hi, Did you mkjournal the reused journal? ceph-osd -i $ID --mkjournal Cheers, Dan On Thu Nov 13 2014 at 2:34:51 PM Anthony Alba wrote: > When I create a new OSD with a block device as its journal that has > existing data on it, ceph is causing a FAILED assert. The block device > is a journal fr

[ceph-users] Reusing old journal block device w/ data causes FAILED assert(0)

2014-11-13 Thread Anthony Alba
When I create a new OSD with a block device as its journal that has existing data on it, ceph hits a FAILED assert. The block device is a journal from a previous experiment and can safely be overwritten. If I zero the block device with dd if=/dev/zero bs=512 count=1000 of=MyJournalDev then the a
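The wipe-and-recreate sequence can be sketched as below. JOURNAL here is a scratch file standing in for the block device (the real dd is destructive), and the 512000-byte region simply mirrors the bs=512 count=1000 from the mail.

```shell
# Sketch of wiping a previously-used journal before giving it to a new OSD.
# JOURNAL is a scratch file standing in for the real block device; on a real
# node it would be e.g. /dev/sdb1 and the dd below is destructive.
JOURNAL=/tmp/fake-journal
dd if=/dev/urandom of="$JOURNAL" bs=512 count=1000 2>/dev/null  # simulate stale journal data

# Zero the same region the mail describes, so no stale journal header survives
dd if=/dev/zero of="$JOURNAL" bs=512 count=1000 conv=notrunc 2>/dev/null

# On the real node, recreate the journal for the OSD afterwards:
# ceph-osd -i $ID --mkjournal

echo "journal region zeroed"
```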

Re: [ceph-users] Typical 10GbE latency

2014-11-13 Thread Wido den Hollander
On 12-11-14 21:12, Udo Lembke wrote: > Hi Wido, > On 12.11.2014 12:55, Wido den Hollander wrote: >> (back to list) >> >> >> Indeed, there must be something! But I can't figure it out yet. Same >> controllers, tried the same OS, direct cables, but the latency is 40% >> higher. >> >> > perhaps someth

Re: [ceph-users] Stackforge Puppet Module

2014-11-13 Thread Nick Fisk
Hi David, Yes, it's cloned to the ceph folder. It's only that module that seems to complain, which is a bit odd. I might try and pop onto IRC at some point. Many thanks, Nick -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of David Moreau Simard Se