Re: [ceph-users] Different flavors of storage?

2015-01-23 Thread Jason King
Hi Don,

Take a look at CRUSH settings.
http://ceph.com/docs/master/rados/operations/crush-map/

Jason

2015-01-22 2:41 GMT+08:00 Don Doerner :

> OK, I've set up 'giant' in a single-node cluster, played with a replicated
> pool and an EC pool.  All goes well so far.  Question: I have two different
> kinds of HDD in my server - some fast, 15K RPM SAS drives and some big,
> slow (5400 RPM!) SATA drives.
>
> Right now, I have OSDs on all, and when I created my pool, it got spread
> over all of these drives like peanut butter.
>
> The documentation (e.g., the documentation on cache tiering) hints that
> it's possible to differentiate fast from slow devices, but for the life of
> me, I can't see how to create a pool on specific OSDs.  So it must be done
> some other way...
>
> Can someone please provide a pointer?
>
> Regards,
>
> -don-
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Different flavors of storage?

2015-01-23 Thread Luis Periquito
You have a nice howto here on how to do this with CRUSH rules:
http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/
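As an editor's sketch of what those CRUSH rules involve: the bucket names, rule number, weights, OSD numbers, and pool name below are invented for illustration; the blog post above covers the details.

```shell
# Extract and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# In crushmap.txt, add one root per drive class and a rule per root, e.g.:
#   root sas {
#       id -10
#       alg straw
#       hash 0
#       item osd.0 weight 1.000
#       item osd.1 weight 1.000
#   }
#   rule sas {
#       ruleset 3
#       type replicated
#       min_size 1
#       max_size 10
#       step take sas
#       step chooseleaf firstn 0 type osd
#       step emit
#   }

# Recompile the map, inject it, and bind a pool to the new rule
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
ceph osd pool create fast 128 128
ceph osd pool set fast crush_ruleset 3
```

With a second root and rule for the SATA drives, each pool lands only on the OSDs under the root its rule takes.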

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] remote storage

2015-01-23 Thread Robert Duncan
Hi All,

This is my first post. I have been using Ceph OSDs in OpenStack Icehouse as part
of the Mirantis distribution with Fuel; this is my only experience with Ceph,
so as you can imagine it works, but I don't really understand all of the
technical details. I work for a college in Ireland, and we are planning to
deploy a larger private cloud this year using the Kilo release of OpenStack
once it matures. I am architecting the physical components, and storage has
become quite complex. Currently we use a Dell EqualLogic array and have
configured the Cinder service to use the driver provided by Dell; the nodes in
the data centre don't have a lot of local storage. So here is my Ceph ignorance
laid bare:


1-  I have enough compute nodes to run a ceph cluster, radosgw etc. as per 
http://ceph.com/docs/master/radosgw/

2-  I have no available local disks * - this is the problem

3-  I have a Dell EqualLogic SAN and fabric (7.5k NL-SAS)

4-  I have access to storage as a service from our ISP - this is an IBM
Storwize V7000. I can provision block storage and mount iSCSI volumes; it's
across town, but we have a point-to-point layer-2 connection

The use case will be students on a Masters in data analytics using OpenStack
Sahara and S3 for data sets. So if I mounted remote storage or network-attached
storage, would it work? Can I put Ceph directly in front of my EqualLogic array
and use Ceph for Cinder, Glance, Nova and S3? Does anyone have thoughts or
experience on this? Thanks for taking the time to read this; any input
would be greatly appreciated.

All the best,
Rob.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD backup and snapshot

2015-01-23 Thread Frank Yu
I'm also interested in this question. Can anybody offer a viewpoint?
I wonder: do we really need to back up the image through snapshots in a
distributed file system with excellent performance, reliability and scalability?


2015-01-19 18:58 GMT+08:00 Luis Periquito :

> Hi,
>
> I'm currently creating a business case around ceph RBD, and one of the
> issues revolves around backup.
>
> After having a look at
> http://ceph.com/dev-notes/incremental-snapshots-with-rbd/ I was thinking
> on creating hourly snapshots (corporate policy) on the original cluster
> (replicated pool), and then copying these snapshots to a replica cluster
> (EC pool) located offsite.
>
> After some time we would delete the original snapshot (weeks to months),
> but we would need to maintain the replica for a lot more time (years).
>
> Will this solution work properly? Can we keep a huge number of snapshots
> of an RBD? Potentially tens of thousands per RBD, with hundreds of
> thousands possible? Has anyone been through these scenarios?
>
>
>

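For reference, the incremental workflow from the blog post Luis links might be sketched as follows; the pool, image, and snapshot names are invented, and the replica cluster is assumed reachable via its own config file:

```shell
# On the primary cluster: take the hourly snapshot
rbd snap create rbd/vm-disk@hour-02

# Ship only the delta since the previous snapshot to the replica cluster.
# export-diff writes a compact diff to stdout; import-diff applies it and
# recreates the @hour-02 snapshot on the destination image.
rbd export-diff --from-snap hour-01 rbd/vm-disk@hour-02 - \
  | rbd -c /etc/ceph/backup.conf import-diff - rbd/vm-disk

# The very first transfer has no earlier snapshot, so it is a full export:
#   rbd export-diff rbd/vm-disk@hour-01 - \
#     | rbd -c /etc/ceph/backup.conf import-diff - rbd/vm-disk
```

Whether tens of thousands of retained snapshots per image scale well is exactly the open question in this thread; the commands above only show the mechanics.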

-- 
Regards
Frank Yu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph, LIO, VMWARE anyone?

2015-01-23 Thread Zoltan Arnold Nagy
Just to chime in: it will look fine and feel fine, but underneath it's quite
easy to get VMFS corruption. That happened in our tests. Also, if you're
running LIO, expect a kernel panic from time to time (I haven't tried the
latest upstream, as I've been using Ubuntu 14.04 on my "export" hosts for the
tests, so it might have improved).

As of now I would not recommend this setup without being aware of the
risks involved.

There have been a few upstream patches getting the LIO code into better
cluster-aware shape, but I have no idea whether they have been merged yet.
I know Red Hat has a guy on this.

On 01/21/2015 02:40 PM, Nick Fisk wrote:


Hi Jake,

Thanks for this. I have been going through it and have a pretty good
idea of what you are doing now. However, I may be missing something
looking through your scripts, but I'm still not quite understanding
how you are managing to make sure locking happens with the ESXi
ATS SCSI command.


From this slide

http://xo4t.mjt.lu/link/xo4t/gzyhtx3/1/_9gJVMUrSdvzGXYaZfCkVA/aHR0cHM6Ly93aWtpLmNlcGguY29tL0BhcGkvZGVraS9maWxlcy8zOC9oYW1tZXItY2VwaC1kZXZlbC1zdW1taXQtc2NzaS10YXJnZXQtY2x1c3RlcmluZy5wZGY 
(Page 8)


It seems to indicate that for a true active/active setup the two 
targets need to be aware of each other and exchange locking 
information for it to work reliably, I’ve also watched the video from 
the Ceph developer summit where this is discussed and it seems that 
Ceph+Kernel need changes to allow this locking to be pushed back to 
the RBD layer so it can be shared, from what I can see browsing 
through the Linux Git Repo, these patches haven’t made the mainline 
kernel yet.


Can you shed any light on this? As tempting as having active/active 
is, I’m wary about using the configuration until I understand how the 
locking is working and if fringe cases involving multiple ESXi hosts 
writing to the same LUN on different targets could spell disaster.


Many thanks,

Nick

*From:*Jake Young [mailto:jak3...@gmail.com]
*Sent:* 14 January 2015 16:54
*To:* Nick Fisk
*Cc:* Giuseppe Civitella; ceph-users
*Subject:* Re: [ceph-users] Ceph, LIO, VMWARE anyone?

Yes, it's active/active and I found that VMWare can switch from path 
to path with no issues or service impact.


I posted some config files here: github.com/jak3kaj/misc


One set is from my LIO nodes, both the primary and secondary configs 
so you can see what I needed to make unique.  The other set 
(targets.conf) are from my tgt nodes.  They are both 4 LUN configs.


Like I said in my previous email, there is no performance difference 
between LIO and tgt.  The only service I'm running on these nodes is a 
single iscsi target instance (either LIO or tgt).


Jake

On Wed, Jan 14, 2015 at 8:41 AM, Nick Fisk wrote:


Hi Jake,

I can’t remember the exact details, but it was something to do
with a potential problem when using the pacemaker resource agents.
I think it was to do with a potential hanging issue when one LUN
on a shared target failed and then it tried to kill all the other
LUNS to fail the target over to another host. This then leaves the
TCM part of LIO locking the RBD which also can’t fail over.

That said I did try multiple LUNS on one target as a test and
didn’t experience any problems.

I’m interested in the way you have your setup configured though.
Are you saying you effectively have an active/active configuration
with a path going to either host, or are you failing the iSCSI IP
between hosts? If it’s the former, have you had any problems with
scsi locking/reservations…etc between the two targets?

I can see the advantage to that configuration as you
reduce/eliminate a lot of the troubles I have had with resources
failing over.

Nick

    *From:*Jake Young [mailto:jak3...@gmail.com]
*Sent:* 14 January 2015 12:50
*To:* Nick Fisk
*Cc:* Giuseppe Civitella; ceph-users
*Subject:* Re: [ceph-users] Ceph, LIO, VMWARE anyone?

Nick,

Where did you read that having more than 1 LUN per target causes
stability problems?

I am running 4 LUNs per target.

For HA I'm running two linux iscsi target servers that map the
same 4 rbd images. The two targets have the same serial numbers,
T10 address, etc.  I copy the primary's config to the backup and
change IPs. This way VMWare thinks they are different target IPs
on the same host. This has worked very well for me.

One suggestion I have is to try using rbd enabled tgt. The
performance is equivalent to LIO, but I found it is much better at
recovering from a cluster outage. I've had LIO lock up the kernel
or simply not recognize that the rbd images are available; where
tgt will eventually present the rbd images again.

I have been slowly adding servers and am expanding my

Re: [ceph-users] Different flavors of storage?

2015-01-23 Thread Don Doerner
These were exactly the pointers I needed, thank you both very much.

 
Regards,

-don-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Enabling non default region on existing cluster - data migration

2015-01-23 Thread Yehuda Sadeh
On Wed, Jan 21, 2015 at 7:24 PM, Mark Kirkwood
 wrote:
> I've been looking at the steps required to enable (say) multi-region
> metadata sync where there is an existing RGW that has been in use (i.e. a
> non-trivial number of buckets and objects) and which was set up without any
> region parameters.
>
> Now, given that the existing objects are all in the pools corresponding to
> the default (lack of) region - *not* the new region-prefixed ones - is
> there a migration procedure to get them into the *new* ones?
>

One way to do it would be to define the new region (with everything
set to the default params), and then manually modify the bucket
metadata to reflect that the buckets reside in the new region. There are
radosgw-admin tools that allow modification of bucket metadata
(radosgw-admin metadata get bucket:<name>, radosgw-admin metadata
get bucket.instance:<name>:<instance_id>, etc.).
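An illustrative sketch of that metadata round-trip; the bucket name and instance id here are hypothetical, and the exact JSON fields to edit depend on the RGW version, so dump and inspect the records before changing anything:

```shell
# Dump the bucket entry point and the bucket instance metadata
radosgw-admin metadata get bucket:mybucket > bucket.json
radosgw-admin metadata get \
    bucket.instance:mybucket:default.12345.1 > instance.json

# Edit the JSON so the region/placement fields reference the new region,
# then write the records back
radosgw-admin metadata put bucket:mybucket < bucket.json
radosgw-admin metadata put \
    bucket.instance:mybucket:default.12345.1 < instance.json
```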

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph, LIO, VMWARE anyone?

2015-01-23 Thread Nick Fisk
Thanks for your responses guys,

 

I've been spending a lot of time looking at this recently, and I think I'm
even more confused than when I started.

I have been looking at adapting a resource agent made by Tiger Computing
(https://github.com/tigercomputing/ocf-lio) to create an HA LIO failover
target. Instead of going with the virtual-IP failover method, it manipulates
the ALUA states to present active/standby paths. It's very complicated, and
I am close to giving up.

What do you reckon: accept defeat and go with a much simpler tgt and
virtual-IP failover solution for the time being, until the Red Hat patches
make their way into the kernel?

 


Re: [ceph-users] RGW Enabling non default region on existing cluster - data migration

2015-01-23 Thread Yehuda Sadeh
Also, one more point to consider. A bucket that was created at the
default region before a region was set is considered to belong to the
master region.

Yehuda

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph, LIO, VMWARE anyone?

2015-01-23 Thread Jake Young
Thanks for the feedback Nick and Zoltan,

I have been seeing periodic kernel panics when I used LIO.  It was either
due to LIO or the kernel rbd mapping.  I have seen this on Ubuntu precise
with kernel 3.14.14 and again in Ubuntu trusty with the utopic kernel
(currently 3.16.0-28).  Ironically, this is the primary reason I started
exploring a redundancy solution for my iSCSI proxy node.  So, yes, these
crashes have nothing to do with running the Active/Active setup.

I am moving my entire setup from LIO to rbd enabled tgt, which I've found
to be much more stable and gives equivalent performance.
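As a hedged illustration of the rbd-backed tgt setup Jake describes (it requires a tgt build with the Ceph/rbd backing store; the IQN, pool, and image names below are invented):

```shell
# /etc/tgt/targets.conf fragment; tgtd talks to the cluster directly via
# librbd, so no kernel rbd mapping is involved:
#
#   <target iqn.2015-01.com.example:rbd-gw>
#       driver iscsi
#       bs-type rbd
#       backing-store rbd/vm-disk
#   </target>

# Equivalent runtime configuration with tgtadm:
tgtadm --lld iscsi --mode target --op new --tid 1 \
       --targetname iqn.2015-01.com.example:rbd-gw
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 \
       --bstype rbd --backing-store rbd/vm-disk
tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address ALL
```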

I've been testing active/active LIO since July of 2014 with VMWare and I've
never seen any vmfs corruption.  I am now convinced (thanks Nick) that it
is possible.  The reason I have not seen any corruption may have to do with
how VMWare happens to be configured.

Originally, I had made a point to use round robin path selection in the
VMware hosts; but as I did performance testing, I found that it actually
didn't help performance.  When the host switches iSCSI targets there is a
short "spin up time" for LIO to get to 100% IO capability.  Since round
robin switches targets every 30 seconds (60 seconds? I forget), this seemed
to be significant.  A secondary goal for me was to end up with a config
that required minimal tuning from VMWare and the target software; so the
obvious choice is to leave VMWare's path selection at the default which is
Fixed and picks the first target in ASCII-betical order.  That means I am
actually functioning in Active/Passive mode.

Jake





[ceph-users] Having an issue with: 7 pgs stuck inactive; 7 pgs stuck unclean; 71 requests are blocked > 32

2015-01-23 Thread Glen Aidukas
Hello fellow ceph users,

I ran into a major issue where two KVM hosts will not start due to issues with
my Ceph cluster.

Here are some details:

Running ceph version 0.87.  There are 10 hosts with 6 drives each for 60 OSDs.

# ceph -s
cluster 1431e336-faa2-4b13-b50d-c1d375b4e64b
 health HEALTH_WARN 7 pgs incomplete; 7 pgs stuck inactive; 7 pgs stuck 
unclean; 71 requests are blocked > 32 sec; pool rbd-b has too few pgs
 monmap e1: 3 mons at {xx}, 
election epoch 92, quorum 0,1,2 ceph-b01,ceph-b02,ceph-b03
 mdsmap e49: 1/1/1 up {0=pmceph-b06=up:active}, 1 up:standby
 osdmap e10023: 60 osds: 60 up, 60 in
  pgmap v19851672: 45056 pgs, 22 pools, 13318 GB data, 3922 kobjects
39863 GB used, 178 TB / 217 TB avail
   45049 active+clean
   7 incomplete
  client io 954 kB/s rd, 386 kB/s wr, 78 op/s

# ceph health detail
HEALTH_WARN 7 pgs incomplete; 7 pgs stuck inactive; 7 pgs stuck unclean; 69 
requests are blocked > 32 sec; 5 osds have slow requests; pool rbd-b has too 
few pgs
pg 3.38b is stuck inactive since forever, current state incomplete, last acting 
[48,35,2]
pg 1.541 is stuck inactive since forever, current state incomplete, last acting 
[48,20,2]
pg 3.57d is stuck inactive for 15676.967208, current state incomplete, last 
acting [55,48,2]
pg 3.5c9 is stuck inactive since forever, current state incomplete, last acting 
[48,2,15]
pg 3.540 is stuck inactive for 15676.959093, current state incomplete, last 
acting [57,48,2]
pg 3.5a5 is stuck inactive since forever, current state incomplete, last acting 
[2,48,57]
pg 3.305 is stuck inactive for 15676.855987, current state incomplete, last 
acting [39,2,48]
pg 3.38b is stuck unclean since forever, current state incomplete, last acting 
[48,35,2]
pg 1.541 is stuck unclean since forever, current state incomplete, last acting 
[48,20,2]
pg 3.57d is stuck unclean for 15676.971318, current state incomplete, last 
acting [55,48,2]
pg 3.5c9 is stuck unclean since forever, current state incomplete, last acting 
[48,2,15]
pg 3.540 is stuck unclean for 15676.963204, current state incomplete, last 
acting [57,48,2]
pg 3.5a5 is stuck unclean since forever, current state incomplete, last acting 
[2,48,57]
pg 3.305 is stuck unclean for 15676.860098, current state incomplete, last 
acting [39,2,48]
pg 3.5c9 is incomplete, acting [48,2,15] (reducing pool rbd-b min_size from 2 
may help; search ceph.com/docs for 'incomplete')
pg 3.5a5 is incomplete, acting [2,48,57] (reducing pool rbd-b min_size from 2 
may help; search ceph.com/docs for 'incomplete')
pg 3.57d is incomplete, acting [55,48,2] (reducing pool rbd-b min_size from 2 
may help; search ceph.com/docs for 'incomplete')
pg 3.540 is incomplete, acting [57,48,2] (reducing pool rbd-b min_size from 2 
may help; search ceph.com/docs for 'incomplete')
pg 1.541 is incomplete, acting [48,20,2] (reducing pool metadata min_size from 
2 may help; search ceph.com/docs for 'incomplete')
pg 3.38b is incomplete, acting [48,35,2] (reducing pool rbd-b min_size from 2 
may help; search ceph.com/docs for 'incomplete')
pg 3.305 is incomplete, acting [39,2,48] (reducing pool rbd-b min_size from 2 
may help; search ceph.com/docs for 'incomplete')
20 ops are blocked > 2097.15 sec
49 ops are blocked > 1048.58 sec
13 ops are blocked > 2097.15 sec on osd.2
7 ops are blocked > 2097.15 sec on osd.39
3 ops are blocked > 1048.58 sec on osd.39
41 ops are blocked > 1048.58 sec on osd.48
4 ops are blocked > 1048.58 sec on osd.55
1 ops are blocked > 1048.58 sec on osd.57
5 osds have slow requests
pool rbd-b objects per pg (1084) is more than 12.1798 times cluster average (89)

I ran the following, but it did not help:

# ceph health detail | grep ^pg | cut -c4-9 | while read i; do ceph pg repair 
${i} ; done
instructing pg 3.38b on osd.48 to repair
instructing pg 1.541 on osd.48 to repair
instructing pg 3.57d on osd.55 to repair
instructing pg 3.5c9 on osd.48 to repair
instructing pg 3.540 on osd.57 to repair
instructing pg 3.5a5 on osd.2 to repair
instructing pg 3.305 on osd.39 to repair
instructing pg 3.38b on osd.48 to repair
instructing pg 1.541 on osd.48 to repair
instructing pg 3.57d on osd.55 to repair
instructing pg 3.5c9 on osd.48 to repair
instructing pg 3.540 on osd.57 to repair
instructing pg 3.5a5 on osd.2 to repair
instructing pg 3.305 on osd.39 to repair
instructing pg 3.5c9 on osd.48 to repair
instructing pg 3.5a5 on osd.2 to repair
instructing pg 3.57d on osd.55 to repair
instructing pg 3.540 on osd.57 to repair
instructing pg 1.541 on osd.48 to repair
instructing pg 3.38b on osd.48 to repair
instructing pg 3.305 on osd.39 to repair

Also, if I run the following cmd, it seems to just hang.

rbd -p rbd-b info vm-50193-disk-1    <-- hangs until I press Ctrl-C...


Any help would be greatly appreciated!

Glen Aidukas
Manager IT Infrastructure
t: 610.813.2815


Re: [ceph-users] Having an issue with: 7 pgs stuck inactive; 7 pgs stuck unclean; 71 requests are blocked > 32

2015-01-23 Thread Jean-Charles Lopez
Hi Glen

Run "ceph pg {id} query" on one of your stuck PGs to find out what the PG
is waiting for in order to complete.

Rgds
JC
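
Concretely, using a PG id from Glen's output (an editor's sketch; lowering min_size is the cluster's own suggestion in "ceph health detail" and should be reverted once the PGs recover):

```shell
# Dump the full peering state; look at recovery_state, especially
# "blocked_by" and "down_osds_we_would_probe"
ceph pg 3.38b query

# Cross-check the OSDs the PG currently maps to
ceph pg map 3.38b

# If peering is blocked only on min_size, this may let the PGs go active:
#   ceph osd pool set rbd-b min_size 1
```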


> from 2 may help; search ceph.com/docs for 'incomplete')
>
> pg 3.305 is incomplete, acting [39,2,48] (reducing pool rbd-b min_size
> from 2 may help; search ceph.com/docs for 'incomplete')
>
> 20 ops are blocked > 2097.15 sec
>
> 49 ops are blocked > 1048.58 sec
>
> 13 ops are blocked > 2097.15 sec on osd.2
>
> 7 ops are blocked > 2097.15 sec on osd.39
>
> 3 ops are blocked > 1048.58 sec on osd.39
>
> 41 ops are blocked > 1048.58 sec on osd.48
>
> 4 ops are blocked > 1048.58 sec on osd.55
>
> 1 ops are blocked > 1048.58 sec on osd.57
>
> 5 osds have slow requests
>
> pool rbd-b objects per pg (1084) is more than 12.1798 times cluster
> average (89)
>
>
>
> I ran the following but did not help:
>
>
>
> # ceph health detail | grep ^pg | cut -c4-9 | while read i; do ceph pg
> repair ${i} ; done
>
> instructing pg 3.38b on osd.48 to repair
>
> instructing pg 1.541 on osd.48 to repair
>
> instructing pg 3.57d on osd.55 to repair
>
> instructing pg 3.5c9 on osd.48 to repair
>
> instructing pg 3.540 on osd.57 to repair
>
> instructing pg 3.5a5 on osd.2 to repair
>
> instructing pg 3.305 on osd.39 to repair
>
> instructing pg 3.38b on osd.48 to repair
>
> instructing pg 1.541 on osd.48 to repair
>
> instructing pg 3.57d on osd.55 to repair
>
> instructing pg 3.5c9 on osd.48 to repair
>
> instructing pg 3.540 on osd.57 to repair
>
> instructing pg 3.5a5 on osd.2 to repair
>
> instructing pg 3.305 on osd.39 to repair
>
> instructing pg 3.5c9 on osd.48 to repair
>
> instructing pg 3.5a5 on osd.2 to repair
>
> instructing pg 3.5
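
As far as I know, `ceph pg repair` targets inconsistent (scrub-error) PGs, which is why the loop above has no effect on PGs stuck incomplete; the health output instead points at min_size. A minimal, reversible experiment along those lines (pool names taken from the report; verify the current values first):

```shell
# Check the current replication settings of the affected pools.
ceph osd pool get rbd-b size
ceph osd pool get rbd-b min_size
ceph osd pool get metadata min_size

# Temporarily lower min_size so the incomplete PGs can peer with the
# surviving copies, then restore it once the PGs are active+clean.
ceph osd pool set rbd-b min_size 1
ceph osd pool set metadata min_size 1
# ... wait for the PGs to recover ...
ceph osd pool set rbd-b min_size 2
ceph osd pool set metadata min_size 2
```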

[ceph-users] Ceph with IB and ETH

2015-01-23 Thread German Anders

Hi to all,

 I have a question regarding Ceph and IB: we plan to migrate our
Ethernet Ceph cluster to an InfiniBand FDR 56Gb/s architecture. We are
going to use 2x Mellanox IB SX6036G switches for the public network and
2x IB SX6018F switches for the cluster network, with Mellanox FDR
dual-port adapters in each MON and OSD server; one port of each node
will go to the public network and the other to the cluster network, and
both switches will also be connected together. We also have our existing
Ethernet network, so the idea is to connect the 10GbE Ethernet switches
to the IB SX6036G switches with LAG, so our existing Ethernet clients
can communicate with the IB clients... now, is there any specification
or consideration regarding this type of configuration in terms of Ceph?
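
For what it's worth, from Ceph's point of view the public/cluster split is just addressing: whichever fabric (IPoIB or Ethernet) carries each role is declared in ceph.conf. A sketch with placeholder subnets:

```ini
[global]
# client-facing traffic (the SX6036G side in the plan above)
public network = 10.0.10.0/24
# replication/heartbeat traffic (the SX6018F side)
cluster network = 10.0.20.0/24
```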


Thanks in advance,

Regards,



German Anders


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH Expansion

2015-01-23 Thread Craig Lewis
It depends.  There are a lot of variables: how many nodes and disks you
currently have, whether you are using journals on SSD, how much data is
already in the cluster, and what the client load is on the cluster.

Since you only have 40 GB in the cluster, it shouldn't take long to
backfill.  You may find that it finishes backfilling faster than you can
format the new disks.


Since you only have a single OSD node, you must've changed the crushmap to
allow replication over OSDs instead of hosts.  After you get the new node
in would be the best time to switch back to host level replication.  The
more data you have, the more painful that change will become.






On Sun, Jan 18, 2015 at 10:09 AM, Georgios Dimitrakakis <
gior...@acmac.uoc.gr> wrote:

> Hi Jiri,
>
> thanks for the feedback.
>
> My main concern is if it's better to add each OSD one-by-one and wait for
> the cluster to rebalance every time or do it all-together at once.
>
> Furthermore an estimate of the time to rebalance would be great!
>
> Regards,
>


Re: [ceph-users] Ceph, LIO, VMWARE anyone?

2015-01-23 Thread Jake Young
I would go with tgt regardless of your HA solution. I tried to use LIO for
a long time and am glad I finally seriously tested tgt. Two big reasons:

1) the latest rbd code will be in tgt
2) two fewer sources of kernel panics in the proxy node (rbd and iscsi)

For me, I'm comfortable with how my system is configured with the
Active/Passive config. This only because of the network architecture and
the fact that I administer the ESXi hosts. I also have separate rbd disks
for each environment, so if I do get VMFS corruption, it is isolated to one
system.

Another thing I forgot is that I disabled all the VAAI acceleration based
on this advice when using tgt:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-May/039670.html
I was having poor performance with VAAI turned on and tgt. LIO performed
the same with or without VAAI for my workload.  I'm not sure if that
changes the way VMFS locking works enough to sidestep the issue. I think
that I'm falling back to just persistent SCSI reservations instead of ATS.
I think I'm still open to corruption for the same reason.  See here if you
haven't already for more details on VMFS locking:
http://blogs.vmware.com/vsphere/2012/05/vmfs-locking-uncovered.html
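
For reference, my understanding is that the VAAI primitives are toggled through the ESXi advanced settings below (double-check the option names against the linked thread before relying on them):

```shell
# Run in the ESXi shell: disable the three block-storage VAAI primitives.
# ATS locking is the /VMFS3 one; the /DataMover pair cover XCOPY and
# WRITE_SAME offload.
esxcli system settings advanced set -o /DataMover/HardwareAcceleratedMove -i 0
esxcli system settings advanced set -o /DataMover/HardwareAcceleratedInit -i 0
esxcli system settings advanced set -o /VMFS3/HardwareAcceleratedLocking -i 0
```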

Jake

On Friday, January 23, 2015, Nick Fisk  wrote:

> Thanks for your responses guys,
>
>
>
> I’ve been spending a lot of time looking at this recently and I think I’m
> even more confused than when I started.
>
>
>
> I've been looking at trying to adapt a resource agent made by Tiger
> Computing (
> http://xo4t.mjt.lu/link/xo4t/gv9y7rs/1/7MG13jwJZd0R-D8FrJljFA/aHR0cHM6Ly9naXRodWIuY29tL3RpZ2VyY29tcHV0aW5nL29jZi1saW8)
>  to create an HA LIO failover target. Instead of going with the virtual-IP
> failover method it manipulates the ALUA states to present active/standby
> paths. It's very complicated and I am close to giving up.
>
>
>
> What do you reckon: accept defeat and go with a much simpler tgt and
> virtual-IP failover solution for the time being, until the Red Hat patches
> make their way into the kernel?
>
>
>
> *From:* Jake Young [mailto:jak3...@gmail.com
> ]
> *Sent:* 23 January 2015 16:46
> *To:* Zoltan Arnold Nagy
> *Cc:* Nick Fisk; ceph-users
> *Subject:* Re: [ceph-users] Ceph, LIO, VMWARE anyone?
>
>
>
> Thanks for the feedback Nick and Zoltan,
>
>
>
> I have been seeing periodic kernel panics when I used LIO.  It was either
> due to LIO or the kernel rbd mapping.  I have seen this on Ubuntu precise
> with kernel 3.14.14 and again in Ubuntu trusty with the utopic kernel
> (currently 3.16.0-28).  Ironically, this is the primary reason I started
> exploring a redundancy solution for my iSCSI proxy node.  So, yes, these
> crashes have nothing to do with running the Active/Active setup.
>
>
>
> I am moving my entire setup from LIO to rbd enabled tgt, which I've found
> to be much more stable and gives equivalent performance.
>
>
>
> I've been testing active/active LIO since July of 2014 with VMWare and
> I've never seen any vmfs corruption.  I am now convinced (thanks Nick) that
> it is possible.  The reason I have not seen any corruption may have to do
> with how VMWare happens to be configured.
>
>
>
> Originally, I had made a point to use round robin path selection in the
> VMware hosts; but as I did performance testing, I found that it actually
> didn't help performance.  When the host switches iSCSI targets there is a
> short "spin up time" for LIO to get to 100% IO capability.  Since round
> robin switches targets every 30 seconds (60 seconds? I forget), this seemed
> to be significant.  A secondary goal for me was to end up with a config
> that required minimal tuning from VMWare and the target software; so the
> obvious choice is to leave VMWare's path selection at the default which is
> Fixed and picks the first target in ASCII-betical order.  That means I am
> actually functioning in Active/Passive mode.
>
>
>
> Jake
>
>
>
>
>
>
>
>
>
> On Fri, Jan 23, 2015 at 8:46 AM, Zoltan Arnold Nagy <
> zol...@linux.vnet.ibm.com
> > wrote:
>
> Just to chime in: it will look fine, feel fine, but underneath it's quite
> easy to get VMFS corruption. Happened in our tests.
> Also if you're running LIO, from time to time expect a kernel panic
> (haven't tried with the latest upstream, as I've been using
> Ubuntu 14.04 on my "export" hosts for the test, so might have improved...).
>
> As of now I would not recommend this setup without being aware of the
> risks involved.
>
> There have been a few upstream patches getting the LIO code in better
> cluster-aware shape, but no idea if they have been merged
> yet. I know RedHat has a guy on this.
>
> On 01/21/2015 02:40 PM, Nick Fisk wrote:
>
> Hi Jake,
>
>
>
> Thanks for this, I have been going through this and have a pretty good
> idea on what you are doing now, however I maybe missing something looking
> through your scripts, but I’m still not quite understanding how you are
> managing to make sure locking is happening with the ESXi ATS SCSI command.
>
>
>
> From this slid

Re: [ceph-users] Ceph, LIO, VMWARE anyone?

2015-01-23 Thread Zoltan Arnold Nagy
Correct me if I'm wrong, but tgt doesn't have full SCSI-3 persistence
support when _not_ using the LIO backend for it, right?

AFAIK you can either run tgt with its own iSCSI implementation or you
can use tgt to manage your LIO targets.


I assume when you're running tgt with the rbd backend code you're
skipping all the in-kernel LIO parts (in which case the RedHat patches
won't help a bit), and you won't have proper active-active support,
since the initiators have no way to synchronize state (and more
importantly, no way to synchronize write caching! [I can think of some
really ugly hacks to get around that, tho...]).

On 01/23/2015 05:46 PM, Jake Young wrote:

Thanks for the feedback Nick and Zoltan,

I have been seeing periodic kernel panics when I used LIO. It was 
either due to LIO or the kernel rbd mapping.  I have seen this on 
Ubuntu precise with kernel 3.14.14 and again in Ubuntu trusty with the 
utopic kernel (currently 3.16.0-28). Ironically, this is the primary 
reason I started exploring a redundancy solution for my iSCSI proxy 
node.  So, yes, these crashes have nothing to do with running the 
Active/Active setup.


I am moving my entire setup from LIO to rbd enabled tgt, which I've 
found to be much more stable and gives equivalent performance.


I've been testing active/active LIO since July of 2014 with VMWare and 
I've never seen any vmfs corruption.  I am now convinced (thanks Nick) 
that it is possible.  The reason I have not seen any corruption may 
have to do with how VMWare happens to be configured.


Originally, I had made a point to use round robin path selection in 
the VMware hosts; but as I did performance testing, I found that it 
actually didn't help performance. When the host switches iSCSI targets 
there is a short "spin up time" for LIO to get to 100% IO capability.  
Since round robin switches targets every 30 seconds (60 seconds? I 
forget), this seemed to be significant.  A secondary goal for me was 
to end up with a config that required minimal tuning from VMWare and 
the target software; so the obvious choice is to leave VMWare's path 
selection at the default which is Fixed and picks the first target in 
ASCII-betical order.  That means I am actually functioning in 
Active/Passive mode.


Jake




On Fri, Jan 23, 2015 at 8:46 AM, Zoltan Arnold Nagy 
mailto:zol...@linux.vnet.ibm.com>> wrote:


Just to chime in: it will look fine, feel fine, but underneath
it's quite easy to get VMFS corruption. Happened in our tests.
Also if you're running LIO, from time to time expect a kernel
panic (haven't tried with the latest upstream, as I've been using
Ubuntu 14.04 on my "export" hosts for the test, so might have
improved...).

As of now I would not recommend this setup without being aware of
the risks involved.

There have been a few upstream patches getting the LIO code in
better cluster-aware shape, but no idea if they have been merged
yet. I know RedHat has a guy on this.

On 01/21/2015 02:40 PM, Nick Fisk wrote:


Hi Jake,

Thanks for this, I have been going through this and have a pretty
good idea on what you are doing now, however I maybe missing
something looking through your scripts, but I’m still not quite
understanding how you are managing to make sure locking is
happening with the ESXi ATS SCSI command.

From this slide


http://xo4t.mjt.lu/link/xo4t/gzyhtx3/1/_9gJVMUrSdvzGXYaZfCkVA/aHR0cHM6Ly93aWtpLmNlcGguY29tL0BhcGkvZGVraS9maWxlcy8zOC9oYW1tZXItY2VwaC1kZXZlbC1zdW1taXQtc2NzaS10YXJnZXQtY2x1c3RlcmluZy5wZGY
(Page 8)

It seems to indicate that for a true active/active setup the two
targets need to be aware of each other and exchange locking
information for it to work reliably, I’ve also watched the video
from the Ceph developer summit where this is discussed and it
seems that Ceph+Kernel need changes to allow this locking to be
pushed back to the RBD layer so it can be shared, from what I can
see browsing through the Linux Git Repo, these patches haven’t
made the mainline kernel yet.

Can you shed any light on this? As tempting as having
active/active is, I’m wary about using the configuration until I
understand how the locking is working and if fringe cases
involving multiple ESXi hosts writing to the same LUN on
different targets could spell disaster.

Many thanks,

Nick

*From:*Jake Young [mailto:jak3...@gmail.com]
*Sent:* 14 January 2015 16:54


*To:* Nick Fisk
*Cc:* Giuseppe Civitella; ceph-users
*Subject:* Re: [ceph-users] Ceph, LIO, VMWARE anyone?

Yes, it's active/active and I found that VMWare can switch from
path to path with no issues or service impact.

I posted some config files here: github.com/jak3kaj/misc



One set is from my LIO nodes, both the primary and second

Re: [ceph-users] CEPH Expansion

2015-01-23 Thread Georgios Dimitrakakis

Hi Craig!


For the moment I have only one node with 10 OSDs.
I want to add a second one with 10 more OSDs.

Each OSD in every node is a 4TB SATA drive. No SSD disks!

The data are approximately 40 GB, and I will do my best to have zero
or at least very, very low load during the expansion process.

To be honest I haven't touched the crushmap. I wasn't aware that I
should have changed it. Therefore, it is still the default one.
Is that OK? Where can I read about host-level replication in the CRUSH
map, so that I can make sure it's applied, and how can I check whether
it is already enabled?


Any other things that I should be aware of?

All the best,


George



It depends.  There are a lot of variables, like how many nodes and
disks you currently have.  Are you using journals on SSD.  How much
data is already in the cluster.  What the client load is on the
cluster.

Since you only have 40 GB in the cluster, it shouldn't take long to
backfill.  You may find that it finishes backfilling faster than you
can format the new disks.

Since you only have a single OSD node, you must've changed the crushmap
to allow replication over OSDs instead of hosts.  After you get the
new node in would be the best time to switch back to host level
replication.  The more data you have, the more painful that change
will become.

On Sun, Jan 18, 2015 at 10:09 AM, Georgios Dimitrakakis  wrote:


Hi Jiri,

thanks for the feedback.

My main concern is if it's better to add each OSD one-by-one and
wait for the cluster to rebalance every time or do it all-together
at once.

Furthermore an estimate of the time to rebalance would be great!

Regards,



Links:
--
[1] mailto:gior...@acmac.uoc.gr


--


Re: [ceph-users] CEPH Expansion

2015-01-23 Thread Craig Lewis
You've either modified the crushmap, or changed the pool size to 1.  The
defaults create 3 replicas on different hosts.

What does `ceph osd dump | grep ^pool` output?  If the size param is 1,
then you reduced the replica count.  If the size param is > 1, you must've
adjusted the crushmap.

Either way, after you add the second node would be the ideal time to change
that back to the default.


Given that you only have 40GB of data in the cluster, you shouldn't have a
problem adding the 2nd node.


On Fri, Jan 23, 2015 at 3:58 PM, Georgios Dimitrakakis  wrote:

> Hi Craig!
>
>
> For the moment I have only one node with 10 OSDs.
> I want to add a second one with 10 more OSDs.
>
> Each OSD in every node is a 4TB SATA drive. No SSD disks!
>
> The data are approximately 40 GB and I will do my best to have zero
> or at least very very low load during the expansion process.
>
> To be honest I haven't touched the crushmap. I wasn't aware that I
> should have changed it. Therefore, it still is with the default one.
> Is that OK? Where can I read about the host level replication in CRUSH map
> in order
> to make sure that it's applied or how can I find if this is already
> enabled?
>
> Any other things that I should be aware of?
>
> All the best,
>
>
> George
>
>
>  It depends.  There are a lot of variables, like how many nodes and
>> disks you currently have.  Are you using journals on SSD.  How much
>> data is already in the cluster.  What the client load is on the
>> cluster.
>>
>> Since you only have 40 GB in the cluster, it shouldn't take long to
>> backfill.  You may find that it finishes backfilling faster than you
>> can format the new disks.
>>
>> Since you only have a single OSD node, you must've changed the crushmap
>> to allow replication over OSDs instead of hosts.  After you get the
>> new node in would be the best time to switch back to host level
>> replication.  The more data you have, the more painful that change
>> will become.
>>
>> On Sun, Jan 18, 2015 at 10:09 AM, Georgios Dimitrakakis  wrote:
>>
>>  Hi Jiri,
>>>
>>> thanks for the feedback.
>>>
>>> My main concern is if it's better to add each OSD one-by-one and
>>> wait for the cluster to rebalance every time or do it all-together
>>> at once.
>>>
>>> Furthermore an estimate of the time to rebalance would be great!
>>>
>>> Regards,
>>>
>>
>>
>> Links:
>> --
>> [1] mailto:gior...@acmac.uoc.gr
>>
>
> --
>