Re: [ceph-users] Problems during first install

2014-08-06 Thread Tijn Buijs

Hello Pratik,

Thanks for this tip. It was the golden one :). I just deleted all my VMs 
again and started over with (again) CentOS 6.5 and 1 OSD disk per data 
VM of 20 GB dynamically allocated. And this time everything worked 
correctly like they mentioned in the documentation :). I went on my way 
and added a second OSD disk to each of the data nodes (also 20 GB 
dynamically) and added that to my Ceph cluster. And this also worked:

[ceph@ceph-admin testcluster]$ ceph health
HEALTH_OK
[ceph@ceph-admin testcluster]$ ceph -s
cluster 4125efe2-caa1-4bf8-8c6d-f10b2c71bf27
 health HEALTH_OK
 monmap e1: 1 mons at {ceph-mon1=10.28.28.71:6789/0}, election 
epoch 1, quorum 0 ceph-mon1

 osdmap e54: 6 osds: 6 up, 6 in
  pgmap v104: 192 pgs, 3 pools, 0 bytes data, 0 objects
210 MB used, 91883 MB / 92093 MB avail
 192 active+clean

This is what I want to see :). All that is left to do now is increase 
the number of monitors from 1 to 3, and then I have a nice test 
environment which resembles our production environment closely enough :). 
I've already started this process and it hasn't worked yet, but I will 
play around with it some more. If I can't get it to work I will start a 
new thread :).
Also, I would like to understand why 10 GB per OSD isn't enough to store 
anything, but 20 GB per OSD is :).
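
For reference, this is roughly what I am trying for the extra monitors (a 
sketch with ceph-deploy; ceph-mon2/ceph-mon3 and the 10.28.28.72 address 
are just the names/addresses in my setup, and 'mon add' may need the 
explicit --address if no 'public network' is set in ceph.conf):

ceph-deploy install ceph-mon2 ceph-mon3
ceph-deploy mon add ceph-mon2 --address 10.28.28.72
ceph-deploy mon add ceph-mon3
ceph quorum_status --format json-pretty    # should list all three mons in the quorum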


Thnx everybody for your help!

Met vriendelijke groet/With kind regards,

Tijn Buijs


t...@cloud.nl  | T. 0800-CLOUDNL / +31 (0)162 820 
000 | F. +31 (0)162 820 001
Cloud.nl B.V. | Minervum 7092D | 4817 ZK Breda | www.cloud.nl 


On 05/08/14 12:13, Pratik Rupala wrote:

Hi Tijn,

I also created my first Ceph storage cluster in much the same way as 
you have: 3 VMs for OSD nodes and 1 VM for the monitor node. All 3 OSD 
VMs had a single 10 GB virtual disk, so I hit almost the same problem 
you are facing right now.

Changing the disk size from 10 GB to 20 GB solved my problem.

I don't know whether dynamic disks cause any problems, but instead of 
having 6 OSDs you could run 3 OSDs, one OSD per VM, and increase the 
disk size from 10 GB to 20 GB for those 3 OSDs.


I don't know whether this will solve your problem, but it is worth a 
try; 3 OSDs are enough for initial testing.


Regards,
Pratik


On 8/5/2014 12:37 PM, Tijn Buijs wrote:

Hello Pratik,

I'm using virtual disks as OSDs. I prefer virtual disks over 
directories because this resembles the production environment a bit 
better.
I'm using VirtualBox for virtualisation. The OSDs are dynamic disks, 
not pre-allocated, but this shouldn't be a problem, right? I don't 
have the disk space on my iMac to have all 6 OSDs pre-allocated :). 
I've made the virtual OSD disks 10 GB each, by the way, so that 
should be enough for a first test, imho.


Met vriendelijke groet/With kind regards,

Tijn Buijs


t...@cloud.nl  | T. 0800-CLOUDNL / +31 (0)162 
820 000 | F. +31 (0)162 820 001
Cloud.nl B.V. | Minervum 7092D | 4817 ZK Breda | www.cloud.nl 


On 04/08/14 14:51, Pratik Rupala wrote:

Hi,

You mentioned that you have 3 hosts which are VMs. Are you using 
simple directories as OSDs or virtual disks as OSDs?


I had the same problem a few days back, where the OSDs were not 
providing enough space for the cluster.


If you are using virtual disks, try increasing their size. If you are 
using directories as OSDs, check whether you have enough space on the 
root device with 'df -h' on the OSD node.


Regards,
Pratik

On 8/4/2014 4:11 PM, Tijn Buijs wrote:

Hi Everybody,

My idea was that maybe I was impatient or something, so I left my 
Ceph cluster running over the weekend. So from Friday 15:00 until 
now (it is Monday morning 11:30 here now) it kept on running. And 
it didn't help :). It still needs to create 192 PGs.
I've reinstalled my entire cluster a few times now. I switched over 
from CentOS 6.5 to Ubuntu 14.04.1 LTS and back to CentOS again, and 
every time I get exactly the same results. The PGs are getting stuck 
in the incomplete, stuck inactive, stuck unclean state. What am I 
doing wrong? :).


For the moment I'm running with 6 OSDs evenly divided over 3 hosts 
(so each host has 2 OSDs). I've only got 1 monitor configured in my 
current cluster. I hit some other problem when trying to add 
monitor 2 and 3 again. And to not complicate things with multiple 
problems at the same time I've switched back to only 1 monitor. The 
cluster should work that way, right?


To make things clear for everybody, here is the output of ceph 
health and ceph -s:

$ ceph health
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs 
stuck unclean

$ ceph -s
cluster 43d5f48b-d034-4f50-bec8-5c4f3ad8276f
 health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 
192 pgs stuck unclean
 monmap e1: 1 mons at {ceph-mon1=10.28.28.71:6789/0}, election 
epoch 1, quorum 0 ceph-mon1

 osdmap e20: 6 os

Re: [ceph-users] Problems during first install

2014-08-06 Thread Christian Balzer
On Wed, 06 Aug 2014 09:18:13 +0200 Tijn Buijs wrote:

> Hello Pratik,
> 
> Thanks for this tip. It was the golden one :). I just deleted all my VMs 
> again and started over with (again) CentOS 6.5 and 1 OSD disk per data 
> VM of 20 GB dynamically allocated. And this time everything worked 
> correctly like they mentioned in the documentation :). I went on my way 
> and added a second OSD disk to each of the data nodes (also 20 GB 
> dynamically) and added that to my Ceph cluster. And this also worked:
> [ceph@ceph-admin testcluster]$ ceph health
> HEALTH_OK
> [ceph@ceph-admin testcluster]$ ceph -s
>  cluster 4125efe2-caa1-4bf8-8c6d-f10b2c71bf27
>   health HEALTH_OK
>   monmap e1: 1 mons at {ceph-mon1=10.28.28.71:6789/0}, election 
> epoch 1, quorum 0 ceph-mon1
>   osdmap e54: 6 osds: 6 up, 6 in
>pgmap v104: 192 pgs, 3 pools, 0 bytes data, 0 objects
>  210 MB used, 91883 MB / 92093 MB avail
>   192 active+clean
> 
> This is what I want to see :). All that is left to do now is increase 
> the number of monitors from 1 to 3 and I have a nice test environment 
> which resembles our production environment closely enough :). I started 
> this process already and it didn't work yet, but I will play around with 
> it some more. If I can't get it to work I will start a new thread :).
> Also I would like to understand why 10 GB per OSD isn't enough to store 
> nothing, but 20 GB per OSD is :).
> 
My guess would be that the journal (default of 5GB and definitely not
"nothing" ^o^) and all the other bits initially created are too much for
comfort in a 10GB disk.
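
For a toy cluster the journal can be shrunk before the OSDs are created; a 
sketch of the ceph.conf snippet (1 GB journals, only sensible for tiny test 
disks, and it has to be in place before 'ceph-deploy osd create' runs):

[osd]
osd journal size = 1024    # in MB; the default is 5120 (5 GB)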

Regards,

Christian

> Thnx everybody for your help!
> 
> Met vriendelijke groet/With kind regards,
> 
> Tijn Buijs
> 
> Cloud.nl logo
> 
> t...@cloud.nl  | T. 0800-CLOUDNL / +31 (0)162 820 
> 000 | F. +31 (0)162 820 001
> Cloud.nl B.V. | Minervum 7092D | 4817 ZK Breda | www.cloud.nl 
> 
> On 05/08/14 12:13, Pratik Rupala wrote:
> > Hi Tijn,
> >
> > I had also created my first CEPH storage cluster almost as you have 
> > created. I had 3 VMs for OSD nodes and 1 VM for Monitor node.
> > All 3 OSD VMs were having one 10 GB virtual disks. so I faced almost 
> > same problem as you are facing right now.
> > Then changing disk space from 10 GB to 20 GB solved my problem.
> >
> > I don't know if dynamic disks will create any problem. But I think 
> > instead of having 6 OSDs you can have 3 OSDs, one OSD per VM and can 
> > increase disk size from 10 GB to 20 GB for 3 OSDs.
> >
> > I don't know this will solve your problem or not but worthy to try. I 
> > mean 3 OSDs are enough for testing purpose initially.
> >
> > Regards,
> > Pratik
> >
> >
> > On 8/5/2014 12:37 PM, Tijn Buijs wrote:
> >> Hello Pratik,
> >>
> >> I'm using virtual disks as OSDs. I prefer virtual disks over 
> >> directories because this resembles the production environment a bit 
> >> better.
> >> I'm using VirtualBox for virtualisation. The OSDs are dynamic disks, 
> >> not pre-allocated, but this shouldn't be a problem, right? I don't 
> >> have the diskspace on my iMac to have all 6 OSDs pre-allocated :). 
> >> I've made the virtual OSD disks 10 GB each, by the way, so that 
> >> should be enough for a first test, imho.
> >>
> >> Met vriendelijke groet/With kind regards,
> >>
> >> Tijn Buijs
> >>
> >> Cloud.nl logo
> >>
> >> t...@cloud.nl  | T. 0800-CLOUDNL / +31 (0)162 
> >> 820 000 | F. +31 (0)162 820 001
> >> Cloud.nl B.V. | Minervum 7092D | 4817 ZK Breda | www.cloud.nl 
> >> 
> >> On 04/08/14 14:51, Pratik Rupala wrote:
> >>> Hi,
> >>>
> >>> You mentioned that you have 3 hosts which are VMs. Are you using 
> >>> simple directories as OSDs or virtual disks as OSDs?
> >>>
> >>> I had same problem few days back where enough space was not 
> >>> available from OSD for the cluster.
> >>>
> >>> Try to increase the size of disks if you are using virtual disks and 
> >>> if you are using directories as OSDs then check whether you have 
> >>> enough space on root device using df -h command on OSD node.
> >>>
> >>> Regards,
> >>> Pratik
> >>>
> >>> On 8/4/2014 4:11 PM, Tijn Buijs wrote:
>  Hi Everybody,
> 
>  My idea was that maybe I was inpatient or something, so I let my 
>  Ceph cluster running over the weekend. So from friday 15:00 until 
>  now (it is monday morning 11:30 here now) it kept on running. And 
>  it didn't help :). It still needs to create 192 PGs.
>  I've reinstalled my entier cluster a few times now. I switched over 
>  from CentOS 6.5 to Ubuntu 14.04.1 LTS and back to CentOS again, and 
>  every time I get exactly the same results. The PGs are getting in 
>  the incomplete, stuck inactive, stuk unclean state. What am I doing 
>  wrong? :).
> 
>  For the moment I'm running with 6 OSDs evenly divided over 3 hosts 
>  (so each host has 2 OSDs). I've only got 1 monitor configured in my 
> >>

Re: [ceph-users] Openstack Havana root fs resize don't work

2014-08-06 Thread Hauke Bruno Wollentin
Hi,

1) I have flavors like 1 vCPU, 2GB memory, 20GB root disk. No swap + no 
ephemeral disk. Then I just create an instance via horizon choosing an image + 
a flavor.

2) OpenStack itself runs on Ubuntu 12.04.4 LTS; for the instances I have some 
Ubuntu 12.04/14.04s, Debians and CentOS'.

3) In the spawned instances I see that the partition wasn't resized. 
/proc/partitions + fdisk -l show the size of the image partition, not the 
instance partition specified by the flavor.
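
For completeness, this is roughly how I check and grow it by hand inside a 
guest (a sketch assuming a single ext4 root partition on /dev/vda1 and the 
cloud-utils growpart tool; device names differ per distro/flavor):

fdisk -l /dev/vda       # does the disk itself already show the flavor size?
growpart /dev/vda 1     # grow partition 1 to fill the disk
resize2fs /dev/vda1     # then grow the filesystem
df -h /

If fdisk already shows the flavor size but the partition is still small, the 
problem is inside the guest (growroot); if the disk itself still shows the 
image size, the resize never happened on the nova/rbd side.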



---
original message
timestamp: Tuesday, August 05, 2014 03:50:55 PM
from: Jeremy Hanmer 
to: Dinu Vlad 
cc: ceph-users@lists.ceph.com 
subject: Re: [ceph-users] Openstack Havana root fs resize don't work
message id: 

> This is *not* a case of that bug.  That LP bug is referring to an
> issue with the 'nova resize' command and *not* with an instance
> resizing its own root filesystem.  I can confirm that the latter case
> works perfectly fine in Havana if you have things configured properly.
> 
> A few questions:
> 
> 1) What workflow are you using?  (Create a volume from an image ->
> boot from that volume, ceph-backed ephemeral, or some other path?)
> 2) What OS/release are you running?  I've gotten it to work with
> recent versions of CentOS, Debian, Fedora, and Ubuntu.
> 3) What are you actually seeing on the image?  Is the *partition* not
> being resized at all (as referenced by /proc/partitions), or is it just
> the filesystem that isn't being resized (as referenced by df)?
> 
> On Tue, Aug 5, 2014 at 3:41 PM, Dinu Vlad  wrote:
> > There’s a known issue with Havana’s rbd driver in nova and it has nothing
> > to do with ceph. Unfortunately, it is only fixed in icehouse. See
> > https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1219658 for more
> > details.
> > 
> > I can confirm that applying the patch manually works.
> > 
> > On 05 Aug 2014, at 11:00, Hauke Bruno Wollentin  wrote:
> >> Hi folks,
> >> 
> >> we use Ceph Dumpling as storage backend for Openstack Havana. However our
> >> instances are not able to resize its root filesystem.
> >> 
> >> This issue just occurs for the virtual root disk. If we start instances
> >> with an attached volume, the virtual volume disks size is correct.
> >> 
> >> Our infrastructure:
> >> - 1 OpenStack Controller
> >> - 1 OpenStack Neutron Node
> >> - 1 OpenStack Cinder Node
> >> - 4 KVM Hypervisors
> >> - 4 Ceph-Storage Nodes including mons
> >> - 1 dedicated mon
> >> 
> >> As OS we use Ubuntu 12.04.
> >> 
> >> Our cinder.conf on Cinder Node:
> >> 
> >> volume_driver = cinder.volume.driver.RBDDriver
> >> rbd_pool = volumes
> >> rbd_secret = SECRET
> >> rbd_user = cinder
> >> rbd_ceph_conf = /etc/ceph/ceph.conf
> >> rbd_max_clone_depth = 5
> >> glance_api_version = 2
> >> 
> >> Our nova.conf on hypervisors:
> >> 
> >> libvirt_images_type=rbd
> >> libvirt_images_rbd_pool=volumes
> >> libvirt_images_rbd_ceph_conf=/etc/ceph/ceph.conf
> >> rbd_user=admin
> >> rbd_secret_uuid=SECRET
> >> libvirt_inject_password=false
> >> libvirt_inject_key=false
> >> libvirt_inject_partition=-2
> >> 
> >> In our instances we see that the virtual disk isn't _updated_ in its
> >> size. It still uses the size specified in the images.
> >> 
> >> We use growrootfs in our images as described in the documentation +
> >> verified its functionality (we switched temporarly to LVM as the storage
> >> backend, that works).
> >> 
> >> Our images are manually created regarding the documention (means only 1
> >> partition, no swap, cloud-utils etc.).
> >> 
> >> Does anyone has some hints how to solve this issue?
> >> 
> >> Cheers,
> >> Hauke
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-- 
Hauke Bruno Wollentin
(Infrastructure Engineer Cloud)


iNNOVO Cloud GmbH

Düsseldorfer Straße 40a

65760 Eschborn (Taunus)

Tel. 069/ 24 747 18-26

Fax. 069/ 24 747 18-1022

Mail. hauke-bruno.wollen...@innovo-cloud.de



Geschäftsführung: Dr. Sebastian Ritz, Stefan
Sickenberger

Registergericht Frankfurt a.M., HRB 95751/USt.-IdNr.: DE2870 34448
Frankfurter Volksbank eG (Blz. 501 900 00) Konto 600 200 9917
IBAN DE9450196002009917 BIC: FFVBDEFF



Informationen (einschließlich Pflichtangaben) zu einzelnen, innerhalb der
EU tätigen Gesellschaften und Zweigniederlassungen der iNNOVO Cloud gmbH
finden Sie unter http://www.innovo-cloud.de/pflichtangaben.htm. Diese
E-Mail enthält vertrauliche und/ oder rechtlich geschützte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtümlich
erhalten haben, informieren Sie bitte sofort den Absender und vernichten
Sie diese

[ceph-users] slow OSD brings down the cluster

2014-08-06 Thread Luis Periquito
Hi,

In the last few days I've had some issues with the radosgw in which all
requests would just stop being served.

After some investigation I would go for a single slow OSD. I just restarted
that OSD and everything would just go back to work. Every single time there
was a deep scrub running on that OSD.

This has happened in several different OSDs, running in different machines.
I currently have 32 OSDs on this cluster, with 4 OSD per host.

First thing is should this happen? A single OSD with issues/slowness
shouldn't bring the whole cluster to a crawl...

How can I make it stop happening? What kind of debug information can I
gather to stop this from happening?

any further thoughts?

I'm still running Emperor (0.72.2).

-- 

Luis Periquito

Unix Engineer

Ocado.com 

Head Office, Titan Court, 3 Bishop Square, Hatfield Business Park,
Hatfield, Herts AL10 9NE

-- 


Notice:  This email is confidential and may contain copyright material of 
members of the Ocado Group. Opinions and views expressed in this message 
may not necessarily reflect the opinions and views of the members of the 
Ocado Group.

If you are not the intended recipient, please notify us immediately and 
delete all copies of this message. Please note that it is your 
responsibility to scan this message for viruses.  

References to the “Ocado Group” are to Ocado Group plc (registered in 
England and Wales with number 7098618) and its subsidiary undertakings (as 
that expression is defined in the Companies Act 2006) from time to time.  
The registered office of Ocado Group plc is Titan Court, 3 Bishops Square, 
Hatfield Business Park, Hatfield, Herts. AL10 9NE.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados bench no clean cleanup

2014-08-06 Thread Kenneth Waegeman

Hi,

I did a test with 'rados -p ecdata bench 10 write' on an EC pool  
with a replicated cache pool in front of it (ceph 0.83).
The benchmark wrote about 12TB of data. After the 10-second run,  
rados started deleting its benchmark objects.
But only about 2.5TB got deleted before rados returned. I then tried the  
cleanup function 'rados -p ecdata cleanup --prefix bench'

and after a lot of time, it returned:

 Warning: using slow linear search
 Removed 2322000 objects

But 'rados df' showed the same statistics as before.
I ran it again, and it again showed 'Removed 2322000 objects', without  
any change in the 'rados df' statistics.
It is probably the 'lazy deletion', because if I do a 'rados get' on one  
of these objects I get 'No such file or directory'. But I still see the  
objects when I do 'rados -p ecdata ls'.


Is this indeed because of the lazy deletion? Is there a way to see how  
many undeleted objects are left in the pool? And is there a reason why  
rados did remove the first 2.5TB, or is this just a rados bench issue? :)
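
The closest I have come to counting what is left is something like this (a 
sketch; 'benchmark_data' is the default object name prefix rados bench uses, 
and with a cache tier the objects may show up in the cache pool as well):

rados -p ecdata ls | grep -c '^benchmark_data'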


Thanks again!

Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow OSD brings down the cluster

2014-08-06 Thread Wido den Hollander

On 08/06/2014 10:43 AM, Luis Periquito wrote:

Hi,

In the last few days I've had some issues with the radosgw in which all
requests would just stop being served.

After some investigation I would go for a single slow OSD. I just
restarted that OSD and everything would just go back to work. Every
single time there was a deep scrub running on that OSD.

This has happened in several different OSDs, running in different
machines. I currently have 32 OSDs on this cluster, with 4 OSD per host.

First thing is should this happen? A single OSD with issues/slowness
shouldn't bring the whole cluster to a crawl...



So it's not the whole cluster that is slow; the RGW is requesting 
objects that live in PGs for which that OSD is currently the primary.


For you it seems like the whole cluster is down, but it's just 'bad 
luck' in this case.


Have you checked if there is anything wrong with the backing disk? 100% 
busy? Read errors?


You can also simply mark the OSD as 'out' and leave it out of the 
cluster, re-format the whole OSD and see if it comes back.
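
A rough sketch of those steps (assuming the suspect OSD is osd.12 on 
/dev/sdd; substitute your own id and device):

iostat -x 1 sdd          # ~100% util / high await during the stalls?
smartctl -a /dev/sdd     # reallocated or pending sectors, read errors?
ceph osd out 12          # take it out and let the cluster rebalance
# re-create the OSD, then: ceph osd in 12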


Are you using btrfs by any chance?

Wido


How can I make it stop happening? What kind of debug information can I
gather to stop this from happening?

any further thoughts?

I'm still running Emperor (0.72.2).

--

Luis Periquito

Unix Engineer


Ocado.com 


Head Office, Titan Court, 3 Bishop Square, Hatfield Business Park,
Hatfield, Herts AL10 9NE


Notice:  This email is confidential and may contain copyright material
of members of the Ocado Group. Opinions and views expressed in this
message may not necessarily reflect the opinions and views of the
members of the Ocado Group.

If you are not the intended recipient, please notify us immediately and
delete all copies of this message. Please note that it is your
responsibility to scan this message for viruses.

References to the “Ocado Group” are to Ocado Group plc (registered in
England and Wales with number 7098618) and its subsidiary undertakings
(as that expression is defined in the Companies Act 2006) from time to
time.  The registered office of Ocado Group plc is Titan Court, 3
Bishops Square, Hatfield Business Park, Hatfield, Herts. AL10 9NE.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow OSD brings down the cluster

2014-08-06 Thread Luis Periquito
Hi Wido,

as the backing disk is running a deep scrub it's constantly 100% busy, no
errors though...

I'm running everything on XFS.

I had a similar feeling that it was the OSD slowing down those requests.
Which pool would be the affected one? ".rgw"?

thanks,


On 6 August 2014 10:08, Wido den Hollander  wrote:

> On 08/06/2014 10:43 AM, Luis Periquito wrote:
>
>> Hi,
>>
>> In the last few days I've had some issues with the radosgw in which all
>> requests would just stop being served.
>>
>> After some investigation I would go for a single slow OSD. I just
>> restarted that OSD and everything would just go back to work. Every
>> single time there was a deep scrub running on that OSD.
>>
>> This has happened in several different OSDs, running in different
>> machines. I currently have 32 OSDs on this cluster, with 4 OSD per host.
>>
>> First thing is should this happen? A single OSD with issues/slowness
>> shouldn't bring the whole cluster to a crawl...
>>
>>
> So, it's not the whole cluster which is slow, but the RGW is requesting
> objects which are in a PG where that OSD is currently primary for.
>
> For you it seems like the whole cluster is down, but it's just 'bad luck'
> in this case.
>
> Have you checked if there is anything wrong with the backing disk? 100%
> busy? Read errors?
>
> You can also simply mark the osd as 'out' leave it out of the cluster.
> Re-format the whole OSD and see if it comes back.
>
> Are you using btrfs by any chance?
>
> Wido
>
>  How can I make it stop happening? What kind of debug information can I
>> gather to stop this from happening?
>>
>> any further thoughts?
>>
>> I'm still running Emperor (0.72.2).
>>
>> --
>>
>> Luis Periquito
>>
>> Unix Engineer
>>
>>
>> Ocado.com 
>>
>>
>>
>> Head Office, Titan Court, 3 Bishop Square, Hatfield Business Park,
>> Hatfield, Herts AL10 9NE
>>
>>
>> Notice:  This email is confidential and may contain copyright material
>> of members of the Ocado Group. Opinions and views expressed in this
>> message may not necessarily reflect the opinions and views of the
>> members of the Ocado Group.
>>
>> If you are not the intended recipient, please notify us immediately and
>> delete all copies of this message. Please note that it is your
>> responsibility to scan this message for viruses.
>>
>> References to the “Ocado Group” are to Ocado Group plc (registered in
>> England and Wales with number 7098618) and its subsidiary undertakings
>> (as that expression is defined in the Companies Act 2006) from time to
>> time.  The registered office of Ocado Group plc is Titan Court, 3
>> Bishops Square, Hatfield Business Park, Hatfield, Herts. AL10 9NE.
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Luis Periquito

Unix Engineer

Ocado.com 

Head Office, Titan Court, 3 Bishop Square, Hatfield Business Park,
Hatfield, Herts AL10 9NE

-- 


Notice:  This email is confidential and may contain copyright material of 
members of the Ocado Group. Opinions and views expressed in this message 
may not necessarily reflect the opinions and views of the members of the 
Ocado Group.

If you are not the intended recipient, please notify us immediately and 
delete all copies of this message. Please note that it is your 
responsibility to scan this message for viruses.  

References to the “Ocado Group” are to Ocado Group plc (registered in 
England and Wales with number 7098618) and its subsidiary undertakings (as 
that expression is defined in the Companies Act 2006) from time to time.  
The registered office of Ocado Group plc is Titan Court, 3 Bishops Square, 
Hatfield Business Park, Hatfield, Herts. AL10 9NE.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems during first install

2014-08-06 Thread Dennis Jacobfeuerborn
On 06.08.2014 09:25, Christian Balzer wrote:
> On Wed, 06 Aug 2014 09:18:13 +0200 Tijn Buijs wrote:
> 
>> Hello Pratik,
>>
>> Thanks for this tip. It was the golden one :). I just deleted all my VMs 
>> again and started over with (again) CentOS 6.5 and 1 OSD disk per data 
>> VM of 20 GB dynamically allocated. And this time everything worked 
>> correctly like they mentioned in the documentation :). I went on my way 
>> and added a second OSD disk to each of the data nodes (also 20 GB 
>> dynamically) and added that to my Ceph cluster. And this also worked:
>> [ceph@ceph-admin testcluster]$ ceph health
>> HEALTH_OK
>> [ceph@ceph-admin testcluster]$ ceph -s
>>  cluster 4125efe2-caa1-4bf8-8c6d-f10b2c71bf27
>>   health HEALTH_OK
>>   monmap e1: 1 mons at {ceph-mon1=10.28.28.71:6789/0}, election 
>> epoch 1, quorum 0 ceph-mon1
>>   osdmap e54: 6 osds: 6 up, 6 in
>>pgmap v104: 192 pgs, 3 pools, 0 bytes data, 0 objects
>>  210 MB used, 91883 MB / 92093 MB avail
>>   192 active+clean
>>
>> This is what I want to see :). All that is left to do now is increase 
>> the number of monitors from 1 to 3 and I have a nice test environment 
>> which resembles our production environment closely enough :). I started 
>> this process already and it didn't work yet, but I will play around with 
>> it some more. If I can't get it to work I will start a new thread :).
>> Also I would like to understand why 10 GB per OSD isn't enough to store 
>> nothing, but 20 GB per OSD is :).
>>
> My guess would be that the journal (default of 5GB and definitely not
> "nothing" ^o^) and all the other bits initially created are too much for
> comfort in a 10GB disk.

My guess is that with 10G OSDs you run into this bug:
http://tracker.ceph.com/issues/8551

Ceph calculates the weights on the basis of the OSD size by dividing the
size in bytes by 1T, so that a 1T disk results in a weight of 1.0. A 10G
disk would result in a weight of 0.01, but in your case, if you assigned
10G, you probably have some filesystem overhead which makes the weight
closer to 0.009. The problem is that the code calculating the weight cuts
the number off after the second decimal digit, so you end up with a weight
of 0.00.
The result is that Ceph will not put any data on these OSDs, which means
that all PGs will stay in state incomplete until the weight gets fixed.

You can verify this by dumping the crush map. If all the OSDs show a
weight of 0 you know this is your problem and you can fix it by
adjusting the weight to something more reasonable.
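
A quick sketch of that check and fix (the OSD id and value are examples):

ceph osd tree                          # a WEIGHT of 0 on the small OSDs confirms it
ceph osd crush reweight osd.0 0.01     # ~10 GB expressed in TB; repeat per affected OSD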

Regards,
  Dennis

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] What is difference in storing data between rbd and rados ?

2014-08-06 Thread debian Only
I am confused about how files are stored in Ceph.

I did two tests. Where is the file, or the object for the file?

①rados put Python.msi Python.msi -p data
②rbd -p testpool create fio_test --size 2048

Does the rados command in ① mean using Ceph as object storage?
Does the rbd command in ② mean using Ceph as block storage?

As I understand it, an object in Ceph is 4M by default, and this object is
placed in a PG. So I did the test below: the fio_test image is stored in
Ceph as 512 objects, and 512 (objects) * 4 = 2048.
And I can see the objects in testpool.

# rbd -p testpool info fio_test
rbd image 'fio_test':
size 2048 MB in 512 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1b6f.2ae8944a
format: 1
# rados -p testpool ls |grep rb.0.1b6f.2ae8944a |wc -l
512


But when I check the data pool, there is only one file, Python.msi (26M).
Why isn't Python.msi split into many (4M) objects?

# rados ls -p pool-B
python.msi
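
Or is it that 'rados put' always stores the whole file as a single RADOS 
object, and the 4 MB striping only happens in the layers above RADOS (RBD, 
CephFS, radosgw)? A way to check the object (a sketch; names taken from the 
test above):

# rados -p data stat Python.msi     # should report a single object of ~26 MB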
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is possible to use Ramdisk for Ceph journal ?

2014-08-06 Thread debian Only
Thanks for your reply.
I found and tested a way myself, and now I'm sharing it with others.


>Begin>>>  On Debian >>>
root@ceph01-vm:~# modprobe brd rd_nr=1 rd_size=4194304 max_part=0
root@ceph01-vm:~# mkdir /mnt/ramdisk
root@ceph01-vm:~# mkfs.btrfs /dev/ram0

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/ram0
nodesize 4096 leafsize 4096 sectorsize 4096 size 4.00GB
Btrfs Btrfs v0.19
root@ceph01-vm:~# mount /dev/ram0 /mnt/ramdisk/
root@ceph01-vm:~# update-rc.d ramdisk defaults 10 99
cd /etc/rc0.d/
 mv K01ramdisk K99ramdisk
 cd ../rc1.d/
 mv K01ramdisk K99ramdisk
cd ../rc6.d/
mv K01ramdisk K99ramdisk
 cd ../rc2.d/
mv S17ramdisk S08ramdisk
cd ../rc3.d/
mv S17ramdisk S08ramdisk
 cd ../rc4.d/
 mv S17ramdisk S08ramdisk
 cd ../rc5.d/
 mv S17ramdisk S08ramdisk
update-rc.d: using dependency based boot sequencing
root@ceph01-vm:~# cd /etc/rc0.d/
root@ceph01-vm:/etc/rc0.d#  mv K01ramdisk K99ramdisk
root@ceph01-vm:/etc/rc0.d#  cd ../rc1.d/
root@ceph01-vm:/etc/rc1.d#  mv K01ramdisk K99ramdisk
root@ceph01-vm:/etc/rc1.d# cd ../rc6.d/
root@ceph01-vm:/etc/rc6.d# mv K01ramdisk K99ramdisk
root@ceph01-vm:/etc/rc6.d#  cd ../rc2.d/
root@ceph01-vm:/etc/rc2.d# mv S17ramdisk S08ramdisk
root@ceph01-vm:/etc/rc2.d# cd ../rc3.d/
root@ceph01-vm:/etc/rc3.d# mv S17ramdisk S08ramdisk
root@ceph01-vm:/etc/rc3.d#  cd ../rc4.d/
root@ceph01-vm:/etc/rc4.d#  mv S17ramdisk S08ramdisk
root@ceph01-vm:/etc/rc4.d#  cd ../rc5.d/
root@ceph01-vm:/etc/rc5.d#  mv S17ramdisk S08ramdisk
root@ceph01-vm:/etc/rc5.d# service ceph status
=== mon.ceph01-vm ===
mon.ceph01-vm: running {"version":"0.80.5"}
=== osd.2 ===
osd.2: running {"version":"0.80.5"}
=== mds.ceph01-vm ===
mds.ceph01-vm: running {"version":"0.80.5"}
root@ceph01-vm:/etc/rc5.d# service ceph stop osd.2
=== osd.2 ===
Stopping Ceph osd.2 on ceph01-vm...kill 10457...done
root@ceph01-vm:/etc/rc5.d# ceph-osd -i 2 --flush-journal
sh: 1: /sbin/hdparm: not found
2014-08-04 00:40:44.544251 7f5438b7a780 -1 journal _check_disk_write_cache:
pclose failed: (61) No data available
sh: 1: /sbin/hdparm: not found
2014-08-04 00:40:44.568660 7f5438b7a780 -1 journal _check_disk_write_cache:
pclose failed: (61) No data available
2014-08-04 00:40:44.570047 7f5438b7a780 -1 flushed journal
/var/lib/ceph/osd/ceph-2/journal for object store /var/lib/ceph/osd/ceph-2
root@ceph01-vm:/etc/rc5.d# vi /etc/ceph/ceph.conf

put this config in to /etc/ceph/ceph.conf

[osd]
journal dio = false
osd journal size = 3072
[osd.2]
host = ceph01-vm
osd journal = /mnt/ramdisk/journal


root@ceph01-vm:/etc/rc5.d# ceph-osd -c /etc/ceph/ceph.conf -i 2 --mkjournal
2014-08-04 00:41:37.706925 7fa84b9dd780 -1 journal FileJournal::_open: aio
not supported without directio; disabling aio
2014-08-04 00:41:37.707975 7fa84b9dd780 -1 journal FileJournal::_open_file
: unable to preallocation journal to 5368709120 bytes: (28) No space left
on device
2014-08-04 00:41:37.708020 7fa84b9dd780 -1
filestore(/var/lib/ceph/osd/ceph-2) mkjournal error creating journal on
/mnt/ramdisk/journal: (28) No space left on device
2014-08-04 00:41:37.708050 7fa84b9dd780 -1  ** ERROR: error creating fresh
journal /mnt/ramdisk/journal for object store /var/lib/ceph/osd/ceph-2:
(28) No space left on device
root@ceph01-vm:/etc/rc5.d# ceph-osd -c /etc/ceph/ceph.conf -i 2 --mkjournal
2014-08-04 00:41:39.033908 7fd7e7627780 -1 journal FileJournal::_open: aio
not supported without directio; disabling aio
2014-08-04 00:41:39.034067 7fd7e7627780 -1 journal check: ondisk fsid
00000000-0000-0000-0000-000000000000 doesn't match expected
6b619888-6ce4-4028-b7b3-a3af2cf0c6c9, invalid (someone else's?) journal
2014-08-04 00:41:39.034252 7fd7e7627780 -1 created new journal
/mnt/ramdisk/journal for object store /var/lib/ceph/osd/ceph-2
root@ceph01-vm:/etc/rc5.d# service ceph start osd.2
=== osd.2 ===
create-or-move updated item name 'osd.2' weight 0.09 at location
{host=ceph01-vm,root=default} to crush map
Starting Ceph osd.2 on ceph01-vm...
starting osd.2 at :/0 osd_data /var/lib/ceph/osd/ceph-2 /mnt/ramdisk/journal
root@ceph01-vm:/etc/rc5.d# service ceph status
=== mon.ceph01-vm ===
mon.ceph01-vm: running {"version":"0.80.5"}
=== osd.2 ===
osd.2: running {"version":"0.80.5"}
=== mds.ceph01-vm ===
mds.ceph01-vm: running {"version":"0.80.5"}
=== osd.2 ===
osd.2: running {"version":"0.80.5"}

<<<End<<<


2014-08-06 7:14 GMT+07:00 Craig Lewis :

> Try this (adjust the size param as needed):
> mount -t tmpfs -o size=256m tmpfs /mnt/ramdisk
> ceph-deploy osd  prepare ceph04-vm:/dev/sdb:/mnt/ramdisk/journal.osd0
>
>
>
> On Sun, Aug 3, 2014 at 7:13 PM, debian Only  wrote:
>
>> anyone can help?
>>
>>
>> 2014-07-31 23:55 GMT+07:00 debian Only :
>>
>> Dear ,
>>>
>>> i have one test environment  Ceph Firefly 0.80.4, on Debian 7.5 .
>>> i do not have enough  SSD for each OSD.
>>> I want to test speed Ceph perfermance by put journal in a Ramdisk or
>>> tmpfs, but when to add new osd use separate disk for OSD data an

Re: [ceph-users] Install Ceph nodes without network proxy access

2014-08-06 Thread Alfredo Deza
On Tue, Aug 5, 2014 at 10:47 PM, O'Reilly, Dan  wrote:
> Final update: after a good deal of messing about, I did finally get this to 
> work.  Many thanks for the help

Would you mind sharing what changed so this would end up working? Just
want to make sure that it is not something on ceph-deploy's end
>
> 
> From: ceph-users [ceph-users-boun...@lists.ceph.com] On Behalf Of O'Reilly, 
> Dan [daniel.orei...@dish.com]
> Sent: Tuesday, August 05, 2014 3:04 PM
> To: 'Alfredo Deza'
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Install Ceph nodes without network proxy access
>
> OK, I’m getting farther.  I changed the ceph-deploy command line to be:
>
> ceph-deploy install --no-adjust-repos  tm1cldmonl01
>
> That keeps me from needing to grab keys.
>
> But now I’m getting :
>
> [tm1cldmonl01][WARNIN] Public key for leveldb-1.7.0-2.el6.x86_64.rpm is not 
> installed
>
> The repo definitions that contain that file are:
>
> [extras_noarch]
> name=Ceph Extras noarch releasee
> baseurl=file:///unixdepot/cloud/openstack/repos/ceph/ceph_extras_noarch/
> gpgcheck=0
> enabled=1
> gpgkey=file:///unixdepot/cloud/openstack/repos/ceph/release.asc
>
> [extras_x86_64]
> name=Ceph Extras x86_64 releasee
> baseurl=file:///unixdepot/cloud/openstack/repos/ceph/ceph_extras_x86_64/
> gpgcheck=0
> enabled=1
> gpgkey=file:///unixdepot/cloud/openstack/repos/ceph/release.asc
>
> [noarch]
> name=Ceph noarch releasee
> baseurl=file:///unixdepot/cloud/openstack/repos/ceph/noarch/
> gpgcheck=0
> enabled=1
> gpgkey=file:///unixdepot/cloud/openstack/repos/ceph/release.asc
>
> [x86_64]
> name=Ceph x86_64 releasee
> baseurl=file:///unixdepot/cloud/openstack/repos/ceph/x86_64/
> gpgcheck=0
> enabled=1
> gpgkey=file:///unixdepot/cloud/openstack/repos/ceph/release.asc
>
> Any more ideas?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph writes stall for long perioids with no disk/network activity

2014-08-06 Thread Chris Kitzmiller
On Aug 5, 2014, at 12:43 PM, Mark Nelson wrote:
> On 08/05/2014 08:42 AM, Mariusz Gronczewski wrote:
>> On Mon, 04 Aug 2014 15:32:50 -0500, Mark Nelson  
>> wrote:
>>> On 08/04/2014 03:28 PM, Chris Kitzmiller wrote:
 On Aug 1, 2014, at 1:31 PM, Mariusz Gronczewski wrote:
> I got weird stalling during writes, sometimes I got same write speed
> for few minutes and after some time it starts stalling with 0 MB/s for
> minutes
 
 I'm getting very similar behavior on my cluster. My writes start well but 
 then just kinda stop for a while and then bump along slowly until the 
 bench finishes. I've got a thread about it going here called "Ceph runs 
 great then falters".
>>> 
>>> This kind of behaviour often results when the journal can write much
>>> faster than the OSD data disks.  Initially the journals will be able to
>>> absorb the data and things will run along well, but eventually ceph will
>>> need to stall writes if things get too out of sync.  You may want to
>>> take a look at what's happening on the data disks during your tests to
>>> see if there's anything that looks suspect.  Checking the admin socket
>>> for dump_historic_ops might provide some clues as well.
>>> 
>>> Mark
>> 
>> I did check journals already, they are on same disk as data (separate
>> partition) and during stalls there is no traffic to both of them (like
>> 8 iops on average with 0% io wait).
> 
> This may indicate that 1 OSD could be backing up with possibly most if not 
> all IOs waiting on it.  The idea here is that because data placement is 
> deterministic, if 1 OSD is slow, over time just by random chance all 
> outstanding client operations will back up on it.  Having more concurrency 
> gives you more wiggle room but may not ultimately solve it.
> 
> It's also possible that something else may be causing the OSDs to wait.  
> dump_historic_ops might help.

This turns out to have been my problem. Monitoring my cluster with atop 
(thanks, Christian Balzer) during one of these incidents found that a single 
HDD (out of 90) was pegged to 100% utilization. I replaced the drive and have 
since written over 20TB of data to my RBD device without issue.

I'm not sure I fully understand what's going on when this happens but it is 
pretty clear that it isn't happening any more. It would be great to have some 
sort of warning to say that the load on a single disk is disproportionate to 
the rest of the cluster.

>> I'm 99.99% sure they are ok as they worked few months in cassandra
>> cluster before, and when I was doing some ceph rebalancing and
>> adding/removing OSDs they worked constantly with no stalls
>> 
>> If I look into logs of osds that have slow ops I get something like that:
>> 
>> 2014-08-05 15:31:33.461481 7fbff4fd3700  0 log [WRN] : slow request 
>> 30.147566 seconds old, received at 2014-08-05 15:31:03.313858: 
>> osd_op(client.190830.0:176 
>> benchmark_data_blade103.non.3dart.com_31565_object175 [write 0~4194304] 
>> 7.16da2754 ack+ondisk+write e864) v4 currently waiting for subops from 2,6
>> 2014-08-05 15:32:03.467775 7fbff4fd3700  0 log [WRN] : 1 slow requests, 
>> 1 included below; oldest blocked for > 60.153871 secs
>> 2014-08-05 15:32:03.467794 7fbff4fd3700  0 log [WRN] : slow request 
>> 60.153871 seconds old, received at 2014-08-05 15:31:03.313858: 
>> osd_op(client.190830.0:176 
>> benchmark_data_blade103.non.3dart.com_31565_object175 [write 0~4194304] 
>> 7.16da2754 ack+ondisk+write e864) v4 currently waiting for subops from 2,6
>> 2014-08-05 15:33:03.481163 7fbff4fd3700  0 log [WRN] : 1 slow requests, 
>> 1 included below; oldest blocked for > 120.167272 secs
>> 2014-08-05 15:33:03.481170 7fbff4fd3700  0 log [WRN] : slow request 
>> 120.167272 seconds old, received at 2014-08-05 15:31:03.313858: 
>> osd_op(client.190830.0:176 
>> benchmark_data_blade103.non.3dart.com_31565_object175 [write 0~4194304] 
>> 7.16da2754 ack+ondisk+write e864) v4 currently waiting for subops from 2,6
>> 
>> but when I look on osd.2 I only get cryptic messages like:
>> 
>> 2014-08-05 14:39:34.708788 7ff5d36f4700  0 -- 10.100.245.22:6800/3540 >> 
>> 10.100.245.24:6800/3540 pipe(0x54ada00 sd=157 :6800 s=2 pgs=26 cs=5 l=0 
>> c=0x5409340).fault with nothing to send, going to standby
>> 2014-08-05 14:39:35.594447 7ff5d0ecc700  0 -- 10.100.245.22:6800/3540 >> 
>> 10.100.245.25:6800/3551 pipe(0x41fd500 sd=141 :60790 s=2 pgs=21 cs=5 l=0 
>> c=0x51bd960).fault with nothing to send, going to standby
>> 2014-08-05 14:39:37.594901 7ff5d40fe700  0 -- 10.100.245.22:6800/3540 >> 
>> 10.100.245.25:6802/3709 pipe(0x54adc80 sd=149 :35693 s=2 pgs=24 cs=5 l=0 
>> c=0x51bee00).fault with nothing to send, going to standby
>> 2014-08-05 14:39:53.891172 7ff5d15d3700  0 -- 10.100.245.22:6800/3540 >> 
>> 10.100.245.24:6802/3694 pipe(0x54a8500 sd=137 :60823 s=2 pgs=24 cs=5 l=0 
>> c=0x5409080).fault with nothing to send, going to standby
>> 2014-08-05 14:40:01.410307 7ff5d4905700  0 

Re: [ceph-users] Is possible to use Ramdisk for Ceph journal ?

2014-08-06 Thread Daniel Swarbrick
On 06/08/14 13:07, debian Only wrote:
> Thanks for your reply.
> I have found and test a way myself.. and now share to others
>
>
> >Begin>>>  On Debian >>>
> root@ceph01-vm:~# modprobe brd rd_nr=1 rd_size=4194304 max_part=0
> root@ceph01-vm:~# mkdir /mnt/ramdisk
> root@ceph01-vm:~# mkfs.btrfs /dev/ram0

You should avoid creating filesystems on top of the ramdisk for the
journal. Either create a single ramdisk, and create partitions on it for
each journal, or create multiple ramdisks and use each ramdisk whole.
Put the journals on these raw partitions / block devs. Using a
filesystem (especially one such as btrfs) on top of a ramdisk just
creates unnecessary overhead.

Alternatively use a tmpfs instead of ramdisk. Just create a single tmpfs
that is big enough to hold all your journals, then symlink each
/var/lib/ceph/osd/ceph-XX/journal to a unique file on the tmpfs.

Beware however that tmpfs _can_ be swapped out to disk, if the system
starts to run low on physical memory.
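
A minimal sketch of the tmpfs variant (the OSD id and size are examples; a 
journal on tmpfs is lost on every reboot, so this is only suitable for 
throwaway benchmarking):

mount -t tmpfs -o size=6g tmpfs /mnt/ramdisk   # big enough for a 5 GB journal
service ceph stop osd.2
ceph-osd -i 2 --flush-journal
ln -sf /mnt/ramdisk/journal-osd.2 /var/lib/ceph/osd/ceph-2/journal
ceph-osd -i 2 --mkjournal
service ceph start osd.2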

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph --status Missing keyring

2014-08-06 Thread O'Reilly, Dan
Any idea what may be the issue here?

[ceph@tm1cldcphal01 ~]$ ceph --status
2014-08-06 07:53:21.767255 7fe31fd1e700 -1 monclient(hunting): ERROR: missing 
keyring, cannot use cephx for authentication
2014-08-06 07:53:21.767263 7fe31fd1e700  0 librados: client.admin 
initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound
[ceph@tm1cldcphal01 ~]$ ll
total 372
-rw--- 1 ceph ceph 71 Aug  5 21:07 ceph.bootstrap-mds.keyring
-rw--- 1 ceph ceph 71 Aug  5 21:07 ceph.bootstrap-osd.keyring
-rw--- 1 ceph ceph 63 Aug  5 21:07 ceph.client.admin.keyring
-rw--- 1 ceph ceph289 Aug  5 21:01 ceph.conf
-rw--- 1 ceph ceph 355468 Aug  6 07:53 ceph.log
-rw--- 1 ceph ceph 73 Aug  5 21:01 ceph.mon.keyring
[ceph@tm1cldcphal01 ~]$ cat ceph.conf
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 10.18.201.110,10.18.201.76,10.18.201.77
mon_initial_members = tm1cldmonl01, tm1cldmonl02, tm1cldmonl03
fsid = 474a8905-7537-42a6-8edc-1ab9fd2ca5e4

[ceph@tm1cldcphal01 ~]$
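
My working assumption is that the client looks for the admin keyring in the 
default locations (e.g. /etc/ceph/ceph.client.admin.keyring) rather than in 
the current directory, so copying it there or pointing the command at it 
should help; a sketch of what I plan to try:

sudo cp ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring
# or, without copying:
ceph --conf ceph.conf --keyring ceph.client.admin.keyring --status
# or let ceph-deploy distribute it: ceph-deploy admin tm1cldcphal01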

Dan O'Reilly
UNIX Systems Administration
9601 S. Meridian Blvd.
Englewood, CO 80112
720-514-6293


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-community] Remote replication

2014-08-06 Thread Sage Weil
On Tue, 5 Aug 2014, Craig Lewis wrote:
> There currently isn't a backup tool for CephFS.  CephFS is a POSIX
> filesystem, so your normal tools should work.  It's a really large POSIX
> filesystem though, so normal tools may not scale well.

Note that CephFS does have one feature that should make efficient 
incremental backup possible: there is an 'rctime' (recursive ctime) 
attribute on all directories that will let you skip entire directory trees 
that haven't seen a modification since the last backup pass.  This is a 
ceph specific feature (nobody else has anything like it that I know of), 
so I suspect getting it supported in tools like rsync will be challenging, 
but I suspect a pretty simple tool can be constructed that makes this work 
and can be composed in a unix-ey way with other tools into a full 
solution...
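
For anyone who wants to experiment, the attribute is exposed as a virtual 
xattr on CephFS directories; a rough sketch of reading it (the mount point 
is an example):

getfattr -n ceph.dir.rctime /mnt/cephfs/some/dir
# the value is a ctime; a backup tool can skip any directory tree whose
# rctime predates the previous backup run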

sage




> 
> There's no generic replication tool for RADOS itself.  If you're using
> librados directly, you'll have to build your own replication system.
> 
> 
> 
> On Mon, Aug 4, 2014 at 10:16 AM, Patrick McGarry 
> wrote:
>   This is probably a question best asked on the ceph-user list.  I
>   have
>   added it here.
> 
> 
>   Best Regards,
> 
>   Patrick McGarry
>   Director Ceph Community || Red Hat
>   http://ceph.com  ||  http://community.redhat.com
>   @scuttlemonkey || @ceph
> 
> 
>   On Mon, Aug 4, 2014 at 2:17 AM, Santhosh Fernandes
>wrote:
>   > Hi all,
>   >
>   > Do we have continuous access or remote replication  feature in
>   ceph ? When
>   > we can get this functionality  implemented?
>   >
>   > Thank you.
>   >
>   > Regards,
>   > Santhosh
>   >
>   >
>   > ___
>   > Ceph-community mailing list
>   > ceph-commun...@lists.ceph.com
>   > http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com
>   >
>   ___
>   ceph-users mailing list
>   ceph-users@lists.ceph.com
>   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow OSD brings down the cluster

2014-08-06 Thread Sage Weil
You can use the

 ceph osd perf

command to get recent queue latency stats for all OSDs.  With a bit 
of sorting this should quickly tell you if any OSDs are going 
significantly slower than the others.
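
For example, something like this (a sketch; the exact column layout varies 
a bit between releases):

ceph osd perf | sort -n -k2 | tail
# sorts by fs_commit_latency(ms), so the slowest OSDs end up at the bottom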

We'd like to automate this in calamari or perhaps even in the monitor, but 
it is not immediately clear what thresholds would provide a useful 
signal without generating noise...

sage


On Wed, 6 Aug 2014, Luis Periquito wrote:

> Hi Wido,
> 
> as the backing disk is running a deep scrub it's constantly 100% busy, no
> errors though...
> 
> I'm running everything on XFS.
> 
> I had a similar feeling that was the OSD slowing down those requests. What
> would be the affected pool? ".rgw"?
> 
> thanks,
> 
> 
> On 6 August 2014 10:08, Wido den Hollander  wrote:
>   On 08/06/2014 10:43 AM, Luis Periquito wrote:
> Hi,
> 
> In the last few days I've had some issues with the
> radosgw in which all
> requests would just stop being served.
> 
> After some investigation I would go for a single
> slow OSD. I just
> restarted that OSD and everything would just go back
> to work. Every
> single time there was a deep scrub running on that
> OSD.
> 
> This has happened in several different OSDs, running
> in different
> machines. I currently have 32 OSDs on this cluster,
> with 4 OSD per host.
> 
> First thing is should this happen? A single OSD with
> issues/slowness
> shouldn't bring the whole cluster to a crawl...
> 
> 
> So, it's not the whole cluster which is slow, but the RGW is
> requesting objects which are in a PG where that OSD is currently
> primary for.
> 
> For you it seems like the whole cluster is down, but it's just 'bad
> luck' in this case.
> 
> Have you checked if there is anything wrong with the backing disk?
> 100% busy? Read errors?
> 
> You can also simply mark the osd as 'out' leave it out of the cluster.
> Re-format the whole OSD and see if it comes back.
> 
> Are you using btrfs by any chance?
> 
> Wido
> 
>   How can I make it stop happening? What kind of debug
>   information can I
>   gather to stop this from happening?
> 
>   any further thoughts?
> 
>   I'm still running Emperor (0.72.2).
> 
>   --
> 
>   Luis Periquito
> 
>   Unix Engineer
> 
> 
> Ocado.com 
> 
> 
> Head Office, Titan Court, 3 Bishop Square, Hatfield Business
> Park,
> Hatfield, Herts AL10 9NE
> 
> 
> Notice:  This email is confidential and may contain copyright
> material
> of members of the Ocado Group. Opinions and views expressed in
> this
> message may not necessarily reflect the opinions and views of
> the
> members of the Ocado Group.
> 
> If you are not the intended recipient, please notify us
> immediately and
> delete all copies of this message. Please note that it is your
> responsibility to scan this message for viruses.
> 
> References to the "Ocado Group" are to Ocado Group plc (registered
> in
> England and Wales with number 7098618) and its subsidiary
> undertakings
> (as that expression is defined in the Companies Act 2006) from
> time to
> time.  The registered office of Ocado Group plc is Titan Court,
> 3
> Bishops Square, Hatfield Business Park, Hatfield, Herts. AL10
> 9NE.
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
> --
> 
> Luis Periquito
> 
> Unix Engineer
> 
> 
> Ocado.com
> 
> 
> Head Office, Titan Court, 3 Bishop Square, Hatfield Business Park, Hatfield,
> Herts AL10 9NE
> 
> 
> Notice:  This email is confidential and may contain copyright material of
> members of the Ocado Group. Opinions and views expressed in this message may
> not necessarily reflect the opinions and views of the members of the Ocado
> Group.
> 
> If you are not the intended recipient, please notify us immediately and
> delete all copies of this message. Please note that it is your
> responsibility to scan this message for viruses. 
> 
> References to the "Ocado Group" are to Ocado Group plc (registered in England
> and Wales with number 7098618) and its subsidiary undertakings (as that
> expression is defined in the Companies Act 2006) from time to time.  The
> registered office of Ocado Group plc is Titan Court, 3 Bishops Square,
> Hatfield Business Park, Hatfield, Herts. AL10 9NE.
> 
> 
> ___

Re: [ceph-users] Ceph writes stall for long perioids with no disk/network activity

2014-08-06 Thread Christian Balzer
On Wed, 6 Aug 2014 09:19:57 -0400 Chris Kitzmiller wrote:

> On Aug 5, 2014, at 12:43 PM, Mark Nelson wrote:
> > On 08/05/2014 08:42 AM, Mariusz Gronczewski wrote:
> >> On Mon, 04 Aug 2014 15:32:50 -0500, Mark Nelson
> >>  wrote:
> >>> On 08/04/2014 03:28 PM, Chris Kitzmiller wrote:
>  On Aug 1, 2014, at 1:31 PM, Mariusz Gronczewski wrote:
> > I got weird stalling during writes, sometimes I got same write
> > speed for few minutes and after some time it starts stalling with
> > 0 MB/s for minutes
>  
>  I'm getting very similar behavior on my cluster. My writes start
>  well but then just kinda stop for a while and then bump along
>  slowly until the bench finishes. I've got a thread about it going
>  here called "Ceph runs great then falters".
> >>> 
> >>> This kind of behaviour often results when the journal can write much
> >>> faster than the OSD data disks.  Initially the journals will be able
> >>> to absorb the data and things will run along well, but eventually
> >>> ceph will need to stall writes if things get too out of sync.  You
> >>> may want to take a look at what's happening on the data disks during
> >>> your tests to see if there's anything that looks suspect.  Checking
> >>> the admin socket for dump_historic_ops might provide some clues as
> >>> well.
> >>> 
> >>> Mark
> >> 
> >> I did check journals already, they are on same disk as data (separate
> >> partition) and during stalls there is no traffic to both of them (like
> >> 8 iops on average with 0% io wait).
> > 
> > This may indicate that 1 OSD could be backing up with possibly most if
> > not all IOs waiting on it.  The idea here is that because data
> > placement is deterministic, if 1 OSD is slow, over time just by random
> > chance all outstanding client operations will back up on it.  Having
> > more concurrency gives you more wiggle room but may not ultimately
> > solve it.
> > 
> > It's also possible that something else may be causing the OSDs to
> > wait.  dump_historic_ops might help.
> 
> This turns out to have been my problem. Monitoring my cluster with atop
> (thanks, Christian Balzer) during one of these incidents found that a
> single HDD (out of 90) was pegged to 100% utilization. I replaced the
> drive and have since written over 20TB of data to my RBD device without
> issue.
> 
No worries, I'm happy that it helped and turned out to be the most likely
suspect. 

Now your disks don't have these SMART parameters that my equivalent
Toshiba ones have:
---
# smartctl -a /dev/sdg |grep Perfor
  2 Throughput_Performance  0x0005   139   139   054Pre-fail  Offline  
-   72
  8 Seek_Time_Performance   0x0005   117   117   020Pre-fail  Offline  
-   36
---

And I wouldn't trust them entirely (as in, base my judgment of a disk just
on those), but they are a good start to see whether a disk is probably
underperforming or not.

With the previous generation of Seagates I had disks that showed no signs
of trouble with the available SMART parameters, but when testing them they
were performing at as little as 60% of a "healthy" one.

You might want to cobble up a script that (when your cluster is at idle,
at steady state or offline) tests the speeds of each and every disk.

> I'm not sure I fully understand what's going on when this happens but it
> is pretty clear that it isn't happening any more. 

> It would be great to
> have some sort of warning to say that the load on a single disk is
> disproportionate to the rest of the cluster.

While I agree, doing that in a "sensible" way might be quite hard. 
The high load of a single OSD is likely to be the cause of some problem
(disk, link, controller, etc) but it  could also be just bad luck at the
poker table that is CRUSH. 
As in, by pure chance your most I/O intensive VMs or RGW or whatever are
hitting the same PG(s), creating a hot spot. If all the action happens
within a 4MB Ceph object size, not THAT unlikely either. 

If you scour the archives of this ML you'll find some people graphing each
and every OSD, node and more. 
Tedious, but at 90 HDDs or more probably a very good idea.

[snip]
> >> 
> >> I've checked for network or IO load on every node and they are just
> >> not doing anything, no kernel errors, and those nodes worked fine
> >> under load when they were us
> > 
> > I'm guessing the issue is probably going to be more subtle than that
> > unfortunately.  At least based on prior issues, it seems like often
> > something is causing latency in some part of the system and when that
> > happens it can have very far-reaching effects.
> 
> 
> I've often wished for some sort of bottleneck finder for ceph. An easy
> way for the system to say where it is experiencing critical latencies
> e.g. network, journals, osd data disks, etc. This would assist
> troubleshooting and initial deployments immensely.

As mentioned above, it's tricky. 
Most certainly desirable, but the ole Mark I eyeball and wetware is quite
good at spotting thes

[ceph-users] librados: client.admin authentication error

2014-08-06 Thread O'Reilly, Dan
Anybody know why this error occurs, and a solution?

[ceph@tm1cldcphal01 ~]$ ceph --version
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
[ceph@tm1cldcphal01 ~]$ ceph --status
2014-08-06 08:55:13.168770 7f5527929700  0 librados: client.admin 
authentication error (95) Operation not supported
Error connecting to cluster: Error

Dan O'Reilly
UNIX Systems Administration
9601 S. Meridian Blvd.
Englewood, CO 80112
720-514-6293


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] librbd tuning?

2014-08-06 Thread Mark Nelson

On 08/05/2014 06:19 PM, Mark Kirkwood wrote:

On 05/08/14 23:44, Mark Nelson wrote:

On 08/05/2014 02:48 AM, Mark Kirkwood wrote:

On 05/08/14 03:52, Tregaron Bayly wrote:

Does anyone have any insight on how we can tune librbd to perform
closer
to the level of the rbd kernel module?

In our lab we have a four node cluster with 1GbE public network and
10GbE cluster network.  A client node connects to the public network
with 10GbE.

When doing benchmarks on the client using the kernel module we get
decent performance and can cause the OSD nodes to max out their 1GbE
link at peak servicing the requests:

 tx  rx
max  833.66 Mbit/s  |   639.44 Mbit/s
max  938.06 Mbit/s  |   707.35 Mbit/s
max  846.78 Mbit/s  |   702.04 Mbit/s
max  790.66 Mbit/s  |   621.92 Mbit/s

However, using librbd we only get about 30% of performance and I can
see
that it doesn't seem to generate requests fast enough to max out the
links on OSD nodes:

max  309.74 Mbit/s  |   196.77 Mbit/s
max  300.15 Mbit/s  |   154.38 Mbit/s
max  263.06 Mbit/s  |   154.38 Mbit/s
max  368.91 Mbit/s  |   234.38 Mbit/s

I know that I can play with cache settings to help give the client
better service on hits, but I'm wondering how I can soup up librbd so
that it can take advantage of more of the speed available in the
cluster.  It seems like using librbd will leave a lot of the resources
idle.



Hi Tregaron,

I'm guessing that in the librbd case you are injecting the volume into a
VM before running your tests - might be interesting to see your libvirt
XML for the VM... in particular the 'cache' setting for the rbd volume.
If this is not set, or is 'default', then changing it to 'none' will
probably be significantly faster. In addition adding:

io='native'

may give a bit of  a boost too!


Oh, that reminds me, also make sure to use the virtio bus instead of ide
or something else.  That can make a very large performance difference.
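
As a concrete illustration of the cache, io and bus settings mentioned above,
a libvirt disk stanza might look roughly like this (a sketch only -- pool,
image, monitor address and secret UUID are placeholders for your own setup):

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source protocol='rbd' name='rbd/myimage'>
        <host name='192.168.0.1' port='6789'/>
      </source>
      <auth username='libvirt'>
        <secret type='ceph' uuid='REPLACE-WITH-SECRET-UUID'/>
      </auth>
      <target dev='vda' bus='virtio'/>
    </disk>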



Yes, good point Mark (man this plethora of Marks is confusing...). That
reminds me, we currently have some libvirt configs in the docs that use

bus='ide'

...we should probably weed 'em out - or at least mention that virtio is
the preferred bus (e.g http://ceph.com/docs/master/rbd/libvirt/#summary)


ugh, I thought we had gotten rid of all of those.  Good catch.

Mark




Cheers

Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] librbd tuning?

2014-08-06 Thread Sage Weil
On Wed, 6 Aug 2014, Mark Nelson wrote:
> On 08/05/2014 06:19 PM, Mark Kirkwood wrote:
> > On 05/08/14 23:44, Mark Nelson wrote:
> > > On 08/05/2014 02:48 AM, Mark Kirkwood wrote:
> > > > On 05/08/14 03:52, Tregaron Bayly wrote:
> > > > > Does anyone have any insight on how we can tune librbd to perform
> > > > > closer
> > > > > to the level of the rbd kernel module?
> > > > > 
> > > > > In our lab we have a four node cluster with 1GbE public network and
> > > > > 10GbE cluster network.  A client node connects to the public network
> > > > > with 10GbE.
> > > > > 
> > > > > When doing benchmarks on the client using the kernel module we get
> > > > > decent performance and can cause the OSD nodes to max out their 1GbE
> > > > > link at peak servicing the requests:
> > > > > 
> > > > >  tx  rx
> > > > > max  833.66 Mbit/s  |   639.44 Mbit/s
> > > > > max  938.06 Mbit/s  |   707.35 Mbit/s
> > > > > max  846.78 Mbit/s  |   702.04 Mbit/s
> > > > > max  790.66 Mbit/s  |   621.92 Mbit/s
> > > > > 
> > > > > However, using librbd we only get about 30% of performance and I can
> > > > > see
> > > > > that it doesn't seem to generate requests fast enough to max out the
> > > > > links on OSD nodes:
> > > > > 
> > > > > max  309.74 Mbit/s  |   196.77 Mbit/s
> > > > > max  300.15 Mbit/s  |   154.38 Mbit/s
> > > > > max  263.06 Mbit/s  |   154.38 Mbit/s
> > > > > max  368.91 Mbit/s  |   234.38 Mbit/s
> > > > > 
> > > > > I know that I can play with cache settings to help give the client
> > > > > better service on hits, but I'm wondering how I can soup up librbd so
> > > > > that it can take advantage of more of the speed available in the
> > > > > cluster.  It seems like using librbd will leave a lot of the resources
> > > > > idle.
> > > > 
> > > > 
> > > > Hi Tregaron,
> > > > 
> > > > I'm guessing that in the librbd case you are injecting the volume into a
> > > > VM before running your tests - might be interesting to see your libvirt
> > > > XML for the VM... in particular the 'cache' setting for the rbd volume.
> > > > If this are not set or is 'default' then changing to 'none' will
> > > > probably be significantly faster. In addition adding:
> > > > 
> > > > io='native'
> > > > 
> > > > may give a bit of  a boost too!
> > > 
> > > Oh, that reminds me, also make sure to use the virtio bus instead of ide
> > > or something else.  That can make a very large performance difference.
> > > 
> > 
> > Yes, good point Mark (man this plethora of Marks is confusing...). That
> > reminds me, we currently have some libvirt configs in the docs that use
> > 
> > bus='ide'
> > 
> > ...we should probably weed 'em out - or at least mention that vertio is
> > the preferred bus (e.g http://ceph.com/docs/master/rbd/libvirt/#summary)
> 
> ugh, I thought we had gotten rid of all of those.  Good catch.

BTW, do we still need to use something != virtio in order for 
trim/discard?

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] librbd tuning?

2014-08-06 Thread Christian Balzer
On Wed, 6 Aug 2014 08:05:33 -0700 (PDT) Sage Weil wrote:

> On Wed, 6 Aug 2014, Mark Nelson wrote:
> > On 08/05/2014 06:19 PM, Mark Kirkwood wrote:
> > > On 05/08/14 23:44, Mark Nelson wrote:
> > > > On 08/05/2014 02:48 AM, Mark Kirkwood wrote:
> > > > > On 05/08/14 03:52, Tregaron Bayly wrote:
> > > > > > Does anyone have any insight on how we can tune librbd to
> > > > > > perform closer
> > > > > > to the level of the rbd kernel module?
> > > > > > 
> > > > > > In our lab we have a four node cluster with 1GbE public
> > > > > > network and 10GbE cluster network.  A client node connects to
> > > > > > the public network with 10GbE.
> > > > > > 
> > > > > > When doing benchmarks on the client using the kernel module we
> > > > > > get decent performance and can cause the OSD nodes to max out
> > > > > > their 1GbE link at peak servicing the requests:
> > > > > > 
> > > > > >  tx  rx
> > > > > > max  833.66 Mbit/s  |   639.44 Mbit/s
> > > > > > max  938.06 Mbit/s  |   707.35 Mbit/s
> > > > > > max  846.78 Mbit/s  |   702.04 Mbit/s
> > > > > > max  790.66 Mbit/s  |   621.92 Mbit/s
> > > > > > 
> > > > > > However, using librbd we only get about 30% of performance and
> > > > > > I can see
> > > > > > that it doesn't seem to generate requests fast enough to max
> > > > > > out the links on OSD nodes:
> > > > > > 
> > > > > > max  309.74 Mbit/s  |   196.77 Mbit/s
> > > > > > max  300.15 Mbit/s  |   154.38 Mbit/s
> > > > > > max  263.06 Mbit/s  |   154.38 Mbit/s
> > > > > > max  368.91 Mbit/s  |   234.38 Mbit/s
> > > > > > 
> > > > > > I know that I can play with cache settings to help give the
> > > > > > client better service on hits, but I'm wondering how I can
> > > > > > soup up librbd so that it can take advantage of more of the
> > > > > > speed available in the cluster.  It seems like using librbd
> > > > > > will leave a lot of the resources idle.
> > > > > 
> > > > > 
> > > > > Hi Tregaron,
> > > > > 
> > > > > I'm guessing that in the librbd case you are injecting the
> > > > > volume into a VM before running your tests - might be
> > > > > interesting to see your libvirt XML for the VM... in particular
> > > > > the 'cache' setting for the rbd volume. If this are not set or
> > > > > is 'default' then changing to 'none' will probably be
> > > > > significantly faster. In addition adding:
> > > > > 
> > > > > io='native'
> > > > > 
> > > > > may give a bit of  a boost too!
> > > > 
> > > > Oh, that reminds me, also make sure to use the virtio bus instead
> > > > of ide or something else.  That can make a very large performance
> > > > difference.
> > > > 
> > > 
> > > Yes, good point Mark (man this plethora of Marks is confusing...).
> > > That reminds me, we currently have some libvirt configs in the docs
> > > that use
> > > 
> > > bus='ide'
> > > 
> > > ...we should probably weed 'em out - or at least mention that vertio
> > > is the preferred bus (e.g
> > > http://ceph.com/docs/master/rbd/libvirt/#summary)
> > 
> > ugh, I thought we had gotten rid of all of those.  Good catch.
> 
> BTW, do we still need to use something != virtio in order for 
> trim/discard?
> 

AFAIK only IDE and virtio-scsi work with/for TRIM and DISCARD.

Never mind the sorry state of the kernelspace interface. 
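
A hedged sketch of the virtio-scsi variant, for anyone who needs discard
(attribute support depends on a reasonably recent qemu/libvirt; the names are
placeholders):

    <controller type='scsi' model='virtio-scsi'/>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <source protocol='rbd' name='rbd/myimage'/>
      <target dev='sda' bus='scsi'/>
    </disk>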

Christian


> sage
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] librbd tuning?

2014-08-06 Thread Tregaron Bayly
On Wed, 2014-08-06 at 08:05 -0700, Sage Weil wrote:
> BTW, do we still need to use something != virtio in order for 
> trim/discard?

This was also my first concern when virtio was suggested.  We were using
ide primarily so we could take advantage of discard.  The vms we will be
supporting are more pets than cattle so we'll want to reclaim any space
they're not using.

For the record I am getting significantly better performance on my fio
tests with io='native' and bus='virtio' changes in the guest.  vnstat on
the OSD nodes still doesn't show the peak throughput numbers that I get
using the rbd kernel module, but it's in the neighborhood at any rate.

I see that the latest version of fio can use rbd directly so maybe I'll
run my tests that way so I can separate librbd performance from qemu
performance.
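
For anyone wanting to try the same thing, a minimal fio job file along these
lines should exercise librbd directly (assumes an fio built with rbd support
and an existing test image; pool, image and client names are placeholders):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio-test
    rw=randwrite
    bs=4k
    iodepth=32

    [rbd-librbd-test]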

Thanks for all the help!

Tregaron

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy disk activate error msg

2014-08-06 Thread German Anders

Hi to all,
 I'm having some issues while trying to deploy an OSD with btrfs:

ceph@cephdeploy01:~/ceph-deploy$ ceph-deploy disk activate --fs-type 
btrfs cephosd02:sdd1:/dev/sde1
[ceph_deploy.cli][INFO  ] Invoked (1.4.0): /usr/bin/ceph-deploy disk 
activate --fs-type btrfs cephosd02:sdd1:/dev/sde1
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks 
cephosd02:/dev/sdd1:/dev/sde1

[cephosd02][DEBUG ] connected to host: cephosd02
[cephosd02][DEBUG ] detect platform information from remote host
[cephosd02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] activating host cephosd02 disk /dev/sdd1
[ceph_deploy.osd][DEBUG ] will use init type: upstart
[cephosd02][INFO  ] Running command: sudo ceph-disk-activate 
--mark-init upstart --mount /dev/sdd1
[cephosd02][WARNIN] 2014-08-06 11:22:02.106327 7f0188c96700  0 
librados: client.bootstrap-osd authentication error (1) Operation not 
permitted

[cephosd02][WARNIN] Error connecting to cluster: PermissionError
[cephosd02][WARNIN] ERROR:ceph-disk:Failed to activate
[cephosd02][WARNIN] ceph-disk: Error: ceph osd create failed: Command 
'/usr/bin/ceph' returned non-zero exit status 1:
[cephosd02][ERROR ] RuntimeError: command returned non-zero exit 
status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: 
ceph-disk-activate --mark-init upstart --mount /dev/sdd1


It seems that it has something to do with the permissions. I've also 
tried to run the command manually on the OSD server, but I get the 
same error message. Any ideas?


Thanks in advance,

Best regards,


German Anders

















___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy disk activate error msg

2014-08-06 Thread Alfredo Deza
On Wed, Aug 6, 2014 at 11:23 AM, German Anders  wrote:
> Hi to all,
>   I'm having some issues while trying to deploy a osd with btrfs:
>
> ceph@cephdeploy01:~/ceph-deploy$ ceph-deploy disk activate --fs-type btrfs
> cephosd02:sdd1:/dev/sde1
> [ceph_deploy.cli][INFO  ] Invoked (1.4.0): /usr/bin/ceph-deploy disk
> activate --fs-type btrfs cephosd02:sdd1:/dev/sde1
> [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
> cephosd02:/dev/sdd1:/dev/sde1
> [cephosd02][DEBUG ] connected to host: cephosd02
> [cephosd02][DEBUG ] detect platform information from remote host
> [cephosd02][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
> [ceph_deploy.osd][DEBUG ] activating host cephosd02 disk /dev/sdd1
> [ceph_deploy.osd][DEBUG ] will use init type: upstart
> [cephosd02][INFO  ] Running command: sudo ceph-disk-activate --mark-init
> upstart --mount /dev/sdd1
> [cephosd02][WARNIN] 2014-08-06 11:22:02.106327 7f0188c96700  0 librados:
> client.bootstrap-osd authentication error (1) Operation not permitted
> [cephosd02][WARNIN] Error connecting to cluster: PermissionError
> [cephosd02][WARNIN] ERROR:ceph-disk:Failed to activate
> [cephosd02][WARNIN] ceph-disk: Error: ceph osd create failed: Command
> '/usr/bin/ceph' returned non-zero exit status 1:
> [cephosd02][ERROR ] RuntimeError: command returned non-zero exit status: 1
> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command:
> ceph-disk-activate --mark-init upstart --mount /dev/sdd1

Can you try with the latest ceph-deploy (1.5.10 as of this writing) ?

And then paste the output of that, hopefully this is something that
was addressed!
>
> It seems that it has something to do with the permissions, I've also try to
> run the command manually on the osd server, but getting the same error
> message. Any ideas?
>
> Thanks in advance,
>
> Best regards,
>
>
> German Anders
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow OSD brings down the cluster

2014-08-06 Thread Mark Nelson

On 08/06/2014 03:43 AM, Luis Periquito wrote:

Hi,

In the last few days I've had some issues with the radosgw in which all
requests would just stop being served.

After some investigation I would go for a single slow OSD. I just
restarted that OSD and everything would just go back to work. Every
single time there was a deep scrub running on that OSD.

This has happened in several different OSDs, running in different
machines. I currently have 32 OSDs on this cluster, with 4 OSD per host.

First thing is should this happen? A single OSD with issues/slowness
shouldn't bring the whole cluster to a crawl...


When a client is writing data out to the cluster, it will issue some 
number of operations that it can have in flight at once.  This has to be 
bound at some level to avoid running out of memory.  Ceph will 
distribute those writes to some number of PGs in a pseudo-random 
way, and those PGs map to specific OSDs where the data will be placed. 
One of the big advantages of crush is that it lets the client figure 
this mapping out itself based on the object name and the cluster 
topology, so you remove a centralized allocation table lookup from the 
data path which can be a huge win vs other large-scale distributed systems.


The downside is that it means that in a setup where you have 1 disk 
behind each OSD (typically the best setup for Ceph right now), every 
disk will receive a relatively even (or potentially weighted) percentage 
of the writes regardless of how fast/slow/busy it is.  If a single OSD 
is slower than the others, over time it is likely to accumulate enough 
outstanding IOs that eventually nearly every client IO will be waiting 
on that OSD.  The rest of the OSDs in the cluster will only get new IOs 
once an IO completes on the slow one.


Some day, maybe after the keyfilestore is implemented, I think it would 
be a very interesting experiment to try a hybrid approach where you use 
crush to distribute data to nodes, but behind the OSDs you use something 
like an allocation table and dynamically change the ratio of writes to 
different filesystems or key/value stores based on how slow/busy they 
are (especially during compaction, directory splitting, scrub, or if 
there's a really hot object on a specific disk).  You can still avoid 
the network allocation table lookup, but potentially within the node, if 
you can do it fast enough, you might be able to gain some level of 
adaptability and (hopefully) more consistent throughput.


Mark
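
On the question further down about what debug information to gather, a hedged
starting point is to look for the outlier OSD and its in-flight ops (osd.12
is only an example id; the admin socket commands must run on that OSD's host):

    ceph health detail                     # which requests are slow, and on which OSDs
    ceph osd perf                          # per-OSD commit/apply latency
    ceph daemon osd.12 dump_ops_in_flight  # what the suspect OSD is working on right now
    ceph daemon osd.12 dump_historic_ops   # the slowest recent ops and where they spent time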



How can I make it stop happening? What kind of debug information can I
gather to stop this from happening?

any further thoughts?

I'm still running Emperor (0.72.2).

--

Luis Periquito

Unix Engineer


Ocado.com 


Head Office, Titan Court, 3 Bishop Square, Hatfield Business Park,
Hatfield, Herts AL10 9NE


Notice:  This email is confidential and may contain copyright material
of members of the Ocado Group. Opinions and views expressed in this
message may not necessarily reflect the opinions and views of the
members of the Ocado Group.

If you are not the intended recipient, please notify us immediately and
delete all copies of this message. Please note that it is your
responsibility to scan this message for viruses.

References to the “Ocado Group” are to Ocado Group plc (registered in
England and Wales with number 7098618) and its subsidiary undertakings
(as that expression is defined in the Companies Act 2006) from time to
time.  The registered office of Ocado Group plc is Titan Court, 3
Bishops Square, Hatfield Business Park, Hatfield, Herts. AL10 9NE.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy disk activate error msg

2014-08-06 Thread Alfredo Deza
Adding ceph-users, back to the discussion.

Can you tell me if `ceph-deploy admin cephosd02` was what worked or if
it was the scp'ing of keys?

On Wed, Aug 6, 2014 at 12:36 PM, German Anders  wrote:
> It work!!! :) thanks a lot Alfredo. I want to ask also if you know how can I
> remove a osd server from the osd tree:
>
> ceph@cephmon01:~$ ceph osd tree
> # id    weight    type name          up/down    reweight
> -1      24.57     root default
> -2      21.84         host cephosd01
> 0       2.73              osd.0      down       0
> 1       2.73              osd.1      down       0
> 2       2.73              osd.2      down       0
> 3       2.73              osd.3      down       0
> 4       2.73              osd.4      down       0
> 5       2.73              osd.5      down       0
> 6       2.73              osd.6      down       0
> 7       2.73              osd.7      down       0
> -3      0             host cephosd03
> -4      2.73          host cephosd02
> 8       2.73              osd.8      down       0
>
> I want to remove host "cephosd03" from the tree
>
> Thanks a lot!!
>
> Best regards,
>
>
> German Anders
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --- Original message ---
> Asunto: Re: [ceph-users] ceph-deploy disk activate error msg
> De: Alfredo Deza 
> Para: German Anders 
> Fecha: Wednesday, 06/08/2014 13:32
>
> On Wed, Aug 6, 2014 at 12:23 PM, German Anders  wrote:
>
> Unfortunatly, after upgrade the ceph-deploy version I'm still facing the
> problem:
>
>
> It is possible that you may have invalid keyrings... have you
> tried/retried the setup more than once? Or is that one host
> complaining
> from scratch?
>
> You could try and see if copying the keys from the monitor node helps:
>
> 1) scp /etc/ceph/ceph.client.admin.keyring cephosd02:/etc/ceph
> 2) scp /var/lib/ceph/bootstrap-osd/ceph.keyring
> cephosd02:/var/lib/ceph/bootstrap-osd
>
> I think you could try with ceph-deploy as well, with `ceph-deploy
> admin cephosd02`
>
>
>
>
> ceph@cephdeploy01:~/ceph-deploy$ sudo dpkg -s ceph-deploy
>
> Package: ceph-deploy
> Status: install ok installed
> Priority: optional
> Section: admin
> Installed-Size: 437
> Maintainer: Sage Weil 
> Architecture: all
> Version: 1.5.10trusty
> Depends: python (>= 2.7), python-argparse, python-setuptools, python (<<
> 2.8), python:any (>= 2.7.1-0ubuntu2), python-pkg-resources
> Description: Ceph-deploy is an easy to use configuration tool
>
>for the Ceph distributed storage system.
>.
>This package includes the programs and libraries to support
>simple ceph cluster deployment.
> Homepage: http://ceph.com/
>
>
>
> [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
> cephosd02:/dev/sdd1:/dev/sde1
> [cephosd02][DEBUG ] connected to host: cephosd02
> [cephosd02][DEBUG ] detect platform information from remote host
> [cephosd02][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO ] Distro info: Ubuntu 14.04 trusty
> [ceph_deploy.osd][DEBUG ] activating host cephosd02 disk /dev/sdd1
> [ceph_deploy.osd][DEBUG ] will use init type: upstart
> [cephosd02][INFO ] Running command: sudo ceph-disk -v activate --mark-init
> upstart --mount /dev/sdd1
> [cephosd02][WARNIN] INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE
> -ovalue -- /dev/sdd1
> [cephosd02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf
> --cluster=ceph --name=osd. --lookup osd_mount_options_btrfs
> [cephosd02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf
> --cluster=ceph --name=osd. --lookup osd_fs_mount_options_btrfs
> [cephosd02][WARNIN] DEBUG:ceph-disk:Mounting /dev/sdd1 on
> /var/lib/ceph/tmp/mnt.tG9uYV with options noatime,user_subvol_rm_allowed
> [cephosd02][WARNIN] INFO:ceph-disk:Running command: /bin/mount -t btrfs -o
> noatime,user_subvol_rm_allowed -- /dev/sdd1 /var/lib/ceph/tmp/mnt.tG9uYV
> [cephosd02][WARNIN] DEBUG:ceph-disk:Cluster uuid is
> 40137481-b22c-4b47-b6f7-9f160e81d896
> [cephosd02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd
> --cluster=ceph --show-config-value=fsid
> [cephosd02][WARNIN] DEBUG:ceph-disk:Cluster name is ceph
> [cephosd02][WARNIN] DEBUG:ceph-disk:OSD uuid is
> 2996a04b-3966-4a9c-ac91-5639c998b40a
> [cephosd02][WARNIN] DEBUG:ceph-disk:Allocating OSD id...
> [cephosd02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster
> ceph --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise
> 2996a04b-3966-4a9c-ac91-5639c998b40a
> [cephosd02][WARNIN] 2014-08-06 12:22:15.287950 7f2bfc436700 0 librados:
> client.bootstrap-osd authentication error (1) Operation not permitted
>
> [cephosd02][WARNIN] Error connecting to cluster: PermissionError
> [cephosd02][WARNIN] ERROR:ceph-disk:Failed to activate
> [cephosd02][WARNIN] DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.tG9uYV
> [cephosd02][WARNIN] INFO:ceph-disk:Running command: /bin/umount --
> /var/lib/ceph/tmp/mnt.tG9uYV
>
> [cephosd02][WARNIN] ceph-disk: Error: ceph osd create failed: Command
> '/usr/bin/ceph' returned non-zero exit status 1:
> [cephosd02][ERROR ] RuntimeError: command returned non-zero exit status: 1
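
On the osd-tree question quoted above: an empty host bucket like cephosd03
can be dropped directly from the CRUSH map, and dead OSDs on hosts being
retired are removed first (hedged -- the ids are only examples and behaviour
varies a little by release):

    # an empty host bucket can be removed directly
    ceph osd crush remove cephosd03
    # for each dead OSD that should go away for good (example id 0)
    ceph osd out 0
    ceph osd crush remove osd.0
    ceph auth del osd.0
    ceph osd rm 0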

Re: [ceph-users] ceph-deploy disk activate error msg

2014-08-06 Thread German Anders
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 10th Anniversary T-Shirts for Contributors

2014-08-06 Thread Patrick McGarry
Hey cephers,

Just wanted to let folks know that as a way of saying thank you for 10
years of contributions and growth on the Ceph project we'll be
shipping a free limited edition 10th anniversary t-shirt to anyone who
has contributed to the project (and wants one).  All you have to do to
get your shirt (with all the names of the contributors on the back) is
fill out this google form:

https://docs.google.com/forms/d/1Pzs-bp7g1Q52rqCCNOE-i5GDvkgBgde8-foX9O2gLUg/viewform?usp=send_form

Please let me know if you have any questions or problems getting your
info up there.  Thanks!




Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Openstack Havana root fs resize don't work

2014-08-06 Thread Jeremy Hanmer
And you're using cloud-init in these cases, or are you executing
growrootfs via some other means?

If you're using cloud-init, you should see some useful messages in
/var/log/cloud-init.log (particularly on debian/ubuntu; I've found
centos' logs to not be as helpful).

Also, if you're using cloud-init, you want to make sure that you've
got it configured to do the right thing.  That changes a bit depending
on the version, but with debian wheezy, I make sure to have "growpart"
and "resizefs" in cloud_init_modules.  Also, I don't believe it's
strictly necessary, but I add this too because I'm not a big fan of
implicit defaults:

growpart:
  mode: auto
  devices: ['/']

resize_rootfs: True

On Wed, Aug 6, 2014 at 12:45 AM, Hauke Bruno Wollentin
 wrote:
> Hi,
>
> 1) I have flavors like 1 vCPU, 2GB memory, 20GB root disk. No swap + no
> ephemeral disk. Then I just create an instance via horizon choosing an image +
> a flavor.
>
> 2) OpenStack itselfs runs on Ubuntu 12.04.4 LTS, for the instances I have some
> Ubuntu 12.04/14.04s, Debians and CentOS'.
>
> 3) In the spawned instances I see that the partition wasn't resized.
> /proc/partions + fdisk -l show the size of the image partition, not the
> instance partition specified by the flavor.
>
>
>
> ---
> original message
> timestamp: Tuesday, August 05, 2014 03:50:55 PM
> from: Jeremy Hanmer 
> to: Dinu Vlad 
> cc: ceph-users@lists.ceph.com 
> subject: Re: [ceph-users] Openstack Havana root fs resize don't work
> message id:  ch8qy...@mail.gmail.com>
>
>> This is *not* a case of that bug.  That LP bug is referring to an
>> issue with the 'nova resize' command and *not* with an instance
>> resizing its own root filesystem.  I can confirm that the latter case
>> works perfectly fine in Havana if you have things configured properly.
>>
>> A few questions:
>>
>> 1) What workflow are you using?  (Create a volume from an image ->
>> boot from that volume, ceps-backed ephemeral, or some other patch?)
>> 2) What OS/release are you running?  I've gotten it to work with
>> recent versions Centos, Debian, Fedora, and Ubuntu.
>> 3) What are you actually seeing on the image?  Is the *partition* not
>> being resized at all (as referenced by /proc/partions), or is it just
>> the filesystem that isn't being resized (as referenced by df)?
>>
>> On Tue, Aug 5, 2014 at 3:41 PM, Dinu Vlad  wrote:
>> > There’s a known issue with Havana’s rbd driver in nova and it has nothing
>> > to do with ceph. Unfortunately, it is only fixed in icehouse. See
>> > https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1219658 for more
>> > details.
>> >
>> > I can confirm that applying the patch manually works.
>> >
>> > On 05 Aug 2014, at 11:00, Hauke Bruno Wollentin  bruno.wollen...@innovo-cloud.de> wrote:
>> >> Hi folks,
>> >>
>> >> we use Ceph Dumpling as storage backend for Openstack Havana. However our
>> >> instances are not able to resize its root filesystem.
>> >>
>> >> This issue just occurs for the virtual root disk. If we start instances
>> >> with an attached volume, the virtual volume disks size is correct.
>> >>
>> >> Our infrastructure:
>> >> - 1 OpenStack Controller
>> >> - 1 OpenStack Neutron Node
>> >> - 1 OpenStack Cinder Node
>> >> - 4 KVM Hypervisors
>> >> - 4 Ceph-Storage Nodes including mons
>> >> - 1 dedicated mon
>> >>
>> >> As OS we use Ubuntu 12.04.
>> >>
>> >> Our cinder.conf on Cinder Node:
>> >>
>> >> volume_driver = cinder.volume.driver.RBDDriver
>> >> rbd_pool = volumes
>> >> rbd_secret = SECRET
>> >> rbd_user = cinder
>> >> rbd_ceph_conf = /etc/ceph/ceph.conf
>> >> rbd_max_clone_depth = 5
>> >> glance_api_version = 2
>> >>
>> >> Our nova.conf on hypervisors:
>> >>
>> >> libvirt_images_type=rbd
>> >> libvirt_images_rbd_pool=volumes
>> >> libvirt_images_rbd_ceph_conf=/etc/ceph/ceph.conf
>> >> rbd_user=admin
>> >> rbd_secret_uuid=SECRET
>> >> libvirt_inject_password=false
>> >> libvirt_inject_key=false
>> >> libvirt_inject_partition=-2
>> >>
>> >> In our instances we see that the virtual disk isn't _updated_ in its
>> >> size. It still uses the size specified in the images.
>> >>
>> >> We use growrootfs in our images as described in the documentation +
>> >> verified its functionality (we switched temporarly to LVM as the storage
>> >> backend, that works).
>> >>
>> >> Our images are manually created regarding the documention (means only 1
>> >> partition, no swap, cloud-utils etc.).
>> >>
>> >> Does anyone has some hints how to solve this issue?
>> >>
>> >> Cheers,
>> >> Hauke
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi

[ceph-users] Ceph can't seem to forget.

2014-08-06 Thread Sean Sullivan
I forgot to register before posting so reposting.

I think I have a split issue or I can't seem to get rid of these objects.
How can I tell ceph to forget the objects and revert?

How this happened: due to the python 2.7.8/ceph bug, a whole rack of ceph
went down (it had ubuntu 14.10, which seemed to pick up 2.7.8 before 14.04
did). I didn't know what was going on and tried re-installing, which killed
the vast majority of the data (about 2/3). The drives are gone and the data
on them is lost now.

I tried deleting them via rados but that didn't seem to work either and
just froze there.  Any help would be much appreciated.


Pastebin data below
http://pastebin.com/HU8yZ1ae


cephuser@host:~/CephPDC$ ceph --version
ceph version 0.82-524-gbf04897 (bf048976f50bd0142f291414ea893ef0f205b51a)

cephuser@host:~/CephPDC$ ceph -s
cluster 9e0a4a8e-91fa-4643-887a-c7464aa3fd14
 health HEALTH_WARN 2 pgs recovering; 2 pgs stuck unclean; 5 requests
are blocked > 32 sec; recovery 478/15386946 objects degraded (0.003%);
23/5128982 unfound (0.000%)
 monmap e9: 5 mons at {kg37-12=
10.16.0.124:6789/0,kg37-17=10.16.0.129:6789/0,kg37-23=10.16.0.135:6789/0,kg37-28=10.16.0.140:6789/0,kg37-5=10.16.0.117:6789/0},
election epoch 1450, quorum 0,1,2,3,4 kg37-5,kg37-12,kg37-17,kg37-23,kg37-28
 mdsmap e100: 1/1/1 up {0=kg37-5=up:active}
 osdmap e46061: 245 osds: 245 up, 245 in
  pgmap v3268915: 22560 pgs, 19 pools, 20020 GB data, 5008 kobjects
61956 GB used, 830 TB / 890 TB avail
478/15386946 objects degraded (0.003%); 23/5128982 unfound
(0.000%)
                   22558 active+clean
                       2 active+recovering
  client io 95939 kB/s rd, 80854 B/s wr, 795 op/s


cephuser@host:~/CephPDC$ ceph health detail
HEALTH_WARN 2 pgs recovering; 2 pgs stuck unclean; 5 requests are blocked >
32 sec; 1 osds have slow requests; recovery 478/15386946 objects degraded
(0.003%); 23/5128982 unfound (0.000%)
pg 5.f4f is stuck unclean since forever, current state active+recovering,
last acting [279,115,78]
pg 5.27f is stuck unclean since forever, current state active+recovering,
last acting [213,0,258]
pg 5.f4f is active+recovering, acting [279,115,78], 10 unfound
pg 5.27f is active+recovering, acting [213,0,258], 13 unfound
5 ops are blocked > 67108.9 sec
5 ops are blocked > 67108.9 sec on osd.279
1 osds have slow requests
recovery 478/15386946 objects degraded (0.003%); 23/5128982 unfound (0.000%)

cephuser@host:~/CephPDC$ ceph pg 5.f4f mark_unfound_lost revert
2014-08-06 12:59:42.282672 7f7d4a6fb700  0 -- 10.16.0.117:0/1005129 >>
10.16.64.29:6844/718 pipe(0x7f7d4005c120 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7f7d4005c3b0).fault
2014-08-06 12:59:51.890574 7f7d4a4f9700  0 -- 10.16.0.117:0/1005129 >>
10.16.64.29:6806/7875 pipe(0x7f7d4005f180 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7f7d4005fae0).fault
pg has no unfound objects
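
If the cluster keeps reporting unfound objects while mark_unfound_lost claims
there are none, it may help to see exactly what each PG believes is missing
before retrying (standard commands, though behaviour on a 0.82 development
build may differ, and the delete variant only exists on newer releases):

    ceph pg 5.f4f query            # peering/recovery state, which OSDs were probed
    ceph pg 5.f4f list_missing     # the specific unfound objects and why
    ceph pg 5.f4f mark_unfound_lost revert
    ceph pg 5.f4f mark_unfound_lost delete   # if there is no older version to revert to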
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fresh deploy of ceph 0.83 has OSD down

2014-08-06 Thread Mark Kirkwood

Hi,

I'm doing a fresh install of ceph 0.83 (src build) to an Ubuntu 14.04 VM 
using ceph-deploy 1.5.9. Everything goes well until the OSD creation, 
which fails to start with a journal open error. The steps are shown 
below (ceph1 is the deploy target host):



(ceph1) $ uname -a
Linux ceph1 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 
2014 x86_64 x86_64 x86_64 GNU/Linux

(ceph1) $ ceph -v
ceph version 0.83-399-gf77449c (f77449cb4bc6dff36264af6983d345bab3b95c81)

$ ceph-deploy --version
1.5.9
$ ceph-deploy -v new ceph1
$ vi ceph.conf
$ cat ceph.conf
[global]
fsid = 624d6d49-c090-4bfe-a71d-a54b7e13c037
mon_initial_members = ceph1
mon_host = 192.168.122.21
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_journal_size = 2048

$ ceph-deploy -v mon create ceph1
$ ceph-deploy -v gatherkeys ceph1

$ ceph-deploy -v disk zap ceph1:/dev/vdb
$ ceph-deploy -v disk zap ceph1:/dev/vdc

$ ceph-deploy -v osd create ceph1:/dev/vdb:/dev/vdc
...
[ceph1][WARNIN] there is 1 OSD down
[ceph1][WARNIN] there is 1 OSD out

$ tail ceph.osd.0.log
2014-08-07 10:47:45.350623 7ffe95e05800  1 journal _open 
/var/lib/ceph/osd/ceph-0/journal fd 20: 2147483648 bytes, block size 
4096 bytes, directio = 1, aio = 1
2014-08-07 10:47:45.351364 7ffe95e05800 -1 journal read_header error 
decoding journal header
2014-08-07 10:47:45.351398 7ffe95e05800 -1 
filestore(/var/lib/ceph/osd/ceph-0) mount failed to open journal 
/var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument
2014-08-07 10:47:38.506876 7f36e482d800  0 ceph version 0.83-399-gf77449c (f77449cb4bc6dff36264af6983d345bab3b95c81), process ceph-osd, pid 1765
2014-08-07 10:47:38.521928 7f36e482d800  1 journal _open /dev/vdc1 fd 4: 2147483648 bytes, block size 4096 bytes, directio = 0, aio = 0
2014-08-07 10:47:41.934775 7f9544ffe800  0 ceph version 0.83-399-gf77449c (f77449cb4bc6dff36264af6983d345bab3b95c81), process ceph-osd, pid 1885
2014-08-07 10:47:41.939595 7f9544ffe800  1 filestore(/var/lib/ceph/tmp/mnt.UXqBUb) mkfs in /var/lib/ceph/tmp/mnt.UXqBUb
2014-08-07 10:47:41.939647 7f9544ffe800  1 filestore(/var/lib/ceph/tmp/mnt.UXqBUb) mkfs fsid is already set to 60c287fc-3544-4830-bec8-2d2dc4e449e5
2014-08-07 10:47:42.005465 7f9544ffe800  0 filestore(/var/lib/ceph/tmp/mnt.UXqBUb) backend xfs (magic 0x58465342)
2014-08-07 10:47:42.005476 7f9544ffe800  1 filestore(/var/lib/ceph/tmp/mnt.UXqBUb)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2014-08-07 10:47:42.065586 7f9544ffe800  1 filestore(/var/lib/ceph/tmp/mnt.UXqBUb) leveldb db exists/created
2014-08-07 10:47:42.067012 7f9544ffe800  1 journal _open /var/lib/ceph/tmp/mnt.UXqBUb/journal fd 10: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-08-07 10:47:42.067312 7f9544ffe800 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 60c287fc-3544-4830-bec8-2d2dc4e449e5, invalid (someone else's?) journal
2014-08-07 10:47:42.068530 7f9544ffe800  1 journal _open /var/lib/ceph/tmp/mnt.UXqBUb/journal fd 10: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-08-07 10:47:42.072536 7f9544ffe800  0 filestore(/var/lib/ceph/tmp/mnt.UXqBUb) mkjournal created journal on /var/lib/ceph/tmp/mnt.UXqBUb/journal
2014-08-07 10:47:42.072574 7f9544ffe800  1 filestore(/var/lib/ceph/tmp/mnt.UXqBUb) mkfs done in /var/lib/ceph/tmp/mnt.UXqBUb
2014-08-07 10:47:42.072688 7f9544ffe800  0 filestore(/var/lib/ceph/tmp/mnt.UXqBUb) backend xfs (magic 0x58465342)
2014-08-07 10:47:42.095571 7f9544ffe800  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.UXqBUb) detect_features: FIEMAP ioctl is supported and appears to work
2014-08-07 10:47:42.095595 7f9544ffe800  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.UXqBUb) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-08-07 10:47:42.107497 7f9544ffe800  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.UXqBUb) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2014-08-07 10:47:42.107571 7f9544ffe800  0 xfsfilestorebackend(/var/lib/ceph/tmp/mnt.UXqBUb) detect_feature: extsize is disabled by conf
2014-08-07 10:47:42.144479 7f9544ffe800  0 filestore(/var/lib/ceph/tmp/mnt.UXqBUb) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-08-07 10:47:42.147178 7f9544ffe800  1 journal _open /var/lib/ceph/tmp/mnt.UXqBUb/journal fd 16: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-08-07 10:47:42.150303 7f9544ffe800  1 journal _open /var/lib/ceph/tmp/mnt.UXqBUb/journal fd 16: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-08-07 10:47:42.150665 7f9544ffe800 -1 filestore(/var/lib/ceph/tmp/mnt.UXqBUb) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2014-08-07 10:47:42.276297 7f9544ffe800  1 journal close /var/lib/ceph/tmp/mnt.UXqBUb/journal
2014-08-07 10:47:42.276790 7f9544ffe800 -1 created object store /var/lib/ceph/tmp/mnt.UXqBUb jo
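
A hedged workaround rather than a diagnosis: on a throwaway VM like this, the
quickest thing to try is usually to stop the OSD, re-zap both devices and
re-run osd create so ceph-disk lays down a fresh journal header; if the
filestore itself mounts fine, recreating just the journal for that osd id may
also do (commands assume the setup shown above):

    sudo stop ceph-osd id=0                # upstart on Ubuntu 14.04
    ceph-deploy disk zap ceph1:/dev/vdb ceph1:/dev/vdc
    ceph-deploy -v osd create ceph1:/dev/vdb:/dev/vdc
    # alternative, with the daemon stopped:
    sudo ceph-osd -i 0 --mkjournal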

Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-08-06 Thread Kyle Bader
> Can you paste me the whole output of the install? I am curious why/how you 
> are getting el7 and el6 packages.

priority=1 required in /etc/yum.repos.d/ceph.repo entries
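
For reference, a hedged sketch of such an entry (the baseurl and gpgkey are
only illustrative for a Firefly/el7 setup; priority only takes effect with
yum-plugin-priorities installed):

    # /etc/yum.repos.d/ceph.repo
    [ceph]
    name=Ceph packages for $basearch
    baseurl=http://ceph.com/rpm-firefly/el7/$basearch
    enabled=1
    gpgcheck=1
    gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
    priority=1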

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph rbd volume can't remove because image still has watchers

2014-08-06 Thread 杨万元
Hi all:
We use ceph rbd with OpenStack. Recently some dirty data showed up in
my cinder-volume database, such as volumes stuck in error-deleting status, so
we need to delete these volumes manually.
But when I delete the volume on the ceph node, ceph gives me this error:

  [root@ceph-node3 ~]# rbd -p glance rm
volume-17d9397b-d6e5-45e0-80fa-4bc7b7998842
Removing image: 99% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed.
Try again after   closing/unmapping it or waiting 30s for the crashed
client to timeout.
2014-08-07 11:25:42.793275 7faf8c58b760 -1 librbd: error removing
header: (16) Device or resource busy


   I googled this problem and found this:
http://comments.gmane.org/gmane.comp.file-systems.ceph.user/9767
   I followed it and got this:

 [root@ceph-node3 ~]# rbd info -p glance
volume-17d9397b-d6e5-45e0-80fa-4bc7b7998842
rbd image 'volume-17d9397b-d6e5-45e0-80fa-4bc7b7998842':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.3b1464f8e96d5
format: 2
features: layering
 [root@ceph-node3 ~]# rados -p glance listwatchers
rbd_header.3b1464f8e96d5
watcher=192.168.39.116:0/1032797 client.252302 cookie=1

  192.168.39.116 is my nova compute node, so I can't reboot this server.
  What can I do to delete this volume without rebooting my compute node?

  my ceph version is 0.72.1.

 thanks very much!
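
One hedged option that avoids a reboot, assuming nothing on that compute node
should still be using the image: temporarily blacklist the stale watcher,
remove the image, then clear the blacklist. Note that blacklisting cuts off
every librados client connecting from that address, so it is not risk-free on
a busy hypervisor.

    # address:port/nonce exactly as reported by listwatchers
    ceph osd blacklist add 192.168.39.116:0/1032797
    rbd -p glance rm volume-17d9397b-d6e5-45e0-80fa-4bc7b7998842
    ceph osd blacklist rm 192.168.39.116:0/1032797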
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com