Re: [ceph-users] latency when OSD falls out of cluster

2013-07-12 Thread Wido den Hollander

Hi Edwin,

On 07/12/2013 08:03 AM, Edwin Peer wrote:

Hi there,

We've been noticing nasty multi-second cluster wide latencies if an OSD
drops out of an active cluster (due to power failure, or even being
stopped cleanly). We've also seen this problem occur when an OSD is
inserted back into the cluster.



You will probably see that Placement Groups (PGs) go into a state other
than active+clean.


What does ceph -s tell you in such a case?
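
For example, something like the following, run while the OSD is down, should
show which PGs are actually affected (this assumes an admin keyring on the
node you run it from):

ceph -s                      # overall status and a summary of PG states
ceph health detail           # lists the PGs that are not active+clean
ceph osd tree                # shows which OSDs are marked down or out
ceph pg dump_stuck inactive  # PGs that block I/O while they peer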


Obviously, this has the effect of freezing all VMs doing I/O across the
cluster for several seconds when a single node fails. Is this behaviour
expected? Or have I perhaps got something configured wrong?



Not really the expected behavior, but it could be CPU power limitations
on the OSDs. I notice this latency with an Atom cluster as well, but
that's mainly because the Atoms aren't fast enough to
figure out what's happening.


Faster AMD or Intel CPUs don't suffer from this. There will be a very 
short I/O stall for certain PGs when an OSD goes down, but that should 
be very short and not every VM should suffer.


How many OSDs do you have with how many PGs per pool?

Wido


We're trying very hard to eliminate all single points of failure in our
architecture; is there anything that can be done about this?

Regards,
Edwin Peer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Tuning options for 10GE ethernet and ceph

2013-07-12 Thread Mihály Árva-Tóth
2013/7/11 Mark Nelson 

> On 07/11/2013 10:27 AM, Mihály Árva-Tóth wrote:
>
>> 2013/7/11 Mark Nelson:
>>
>>
>> On 07/11/2013 10:04 AM, Mihály Árva-Tóth wrote:
>>
>> Hello,
>>
>> We are planning to use Intel 10 GE ethernet between nodes of
>> OSDs. Host
>> operation system will be Ubuntu 12.04 x86_64. Are there any
>> recommendations available to tuning options (ex. sysctl and ceph)?
>>
>> Thank you,
>> Mihaly
>>
>>
>> Hi,
>>
>> Generally if performance and latency look good with something like
>> iperf and a couple of parallel streams you should be able to get
>> good performance with Ceph.  You may find that using jumbo frames
>> can help in some circumstances.  In some cases we've seen that TCP
>> autotuning can cause issues (primarily with reads!), but I think
>> we've basically got that solved through a ceph tunable now.
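>>
>> As a rough sketch of that kind of baseline check (the interface name and
>> peer address below are just placeholders):
>>
>> # raw network baseline between two OSD hosts
>> iperf -s                          # on the first host
>> iperf -c 10.0.0.2 -P 4 -t 30      # on the second host, 4 parallel streams
>>
>> # jumbo frames, only if every switch in the path supports them end to end
>> ip link set dev eth2 mtu 9000
>>
>> # typical 10GbE socket buffer bumps (example values, not recommendations)
>> sysctl -w net.core.rmem_max=16777216
>> sysctl -w net.core.wmem_max=16777216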
>>
>>
>> Hi Mark,
>>
>> Thank you. So are there no Ceph-related configuration options which I can
>> tune for good performance on a 10GE network? Where can I read more about
>> TCP autotuning issues?
>>
>
> Nothing really comes to mind as far as Ceph goes.  You may want to use a
> separate front and back network if you have the ports/switches available.
>  Having said that, I've got a test setup where I used a bonded 10GbE
> interface, and with RADOS bench was able to achieve 2GB/s with no special
> Ceph network options beyond specifying that I wanted to use the 10GbE
> network.  Of course you'll need the clients, concurrency, and backend disks
> to really get that.
>
> The tcp autotuning issues were first discovered by Jim Schutt about a year
> ago and reported on ceph-devel:
>
> http://www.spinics.net/lists/ceph-devel/msg05049.html
>
> And our workaround:
>
> http://tracker.ceph.com/issues/2100
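>
> If I recall correctly the workaround ended up as a messenger option; something
> along these lines in ceph.conf pins the receive buffer instead of letting the
> kernel autotune it (double-check the tracker issue for the exact name and a
> sensible value before relying on this):
>
> [global]
>     ; cap the messenger's TCP receive buffer, in bytes; 0 leaves autotuning alone
>     ms tcp rcvbuf = 262144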
>

Hi Mark,

Thank you, great news.

Regards,
Mihaly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] latency when OSD falls out of cluster

2013-07-12 Thread Edwin Peer

On 07/12/2013 09:21 AM, Wido den Hollander wrote:


You will probably see that Placement Groups (PGs) go into a state other
than active+clean.


Indeed, the cluster goes into a health warning state and starts to 
resync the data for the affected OSDs. Nothing is missing, just degraded 
(redundancy level is 2).



Not really the expected behavior, but it could be CPU power limitations
on the OSDs. I notice this latency with an Atom cluster as well, but
that's mainly because the Atoms aren't fast enough to
figure out what's happening.


They are fairly meaty hosts - all of them quad core 3.2 GHz Xeons, 
however, we do run VMs on the same boxes (contrary to recommended 
practice). The hosts are lightly loaded though, with load averages 
seldom heading north of 1.0.



Faster AMD or Intel CPUs don't suffer from this. There will be a very
short I/O stall for certain PGs when an OSD goes down, but that should
be very short and not every VM should suffer.

How many OSDs do you have with how many PGs per pool?


1000 PGs, 10 OSDs (2 per host). The number of PGs may be a little high, 
but we plan to add more hosts and consequently OSDs to the cluster as 
time goes on and I was worried about splitting PGs later.


I guess it may be limited to only the affected PGs, I'm not sure, but 
every VM I've cared about (or have been watching) so far has been affected.


Seconds of downtime are quite severe, especially when it is a planned
shutdown or rejoin. I can understand that if an OSD just disappears,
some requests might be directed to the now-gone node, but I see similar
latency hiccups on scheduled shutdowns and rejoins too?
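
For the planned case, is the usual advice still something like the sequence
below? (The noout flag and init commands here assume a cuttlefish-era,
sysvinit-style setup; osd.3 is just an example.)

ceph osd set noout            # don't mark the OSD out and start backfill
service ceph stop osd.3       # stop the daemon on its host
# ... maintenance ...
service ceph start osd.3
ceph osd unset noout          # let normal failure handling resume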


Regards,
Edwin Peer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with tgt with ceph support

2013-07-12 Thread Toni F. [ackstorm]

Yes! It seems it wasn't compiled with rbd support.

System:
State: ready
debug: off
LLDs:
iscsi: ready
Backing stores:
bsg
sg
null
ssc
aio
rdwr (bsoflags sync:direct)
Device types:
disk
cd/dvd
osd
controller
changer
tape
passthrough
iSNS:
iSNS=Off
iSNSServerIP=
iSNSServerPort=3205
iSNSAccessControl=Off

I'm going to recompile it
Thanks a lot!
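
The rebuild I have in mind looks roughly like this (untested; the repo URL and
the dev package names are my assumptions, the blog post quoted below has the
authoritative steps):

# librbd/librados headers are needed for the rbd backing store
apt-get install librbd-dev librados-dev

git clone https://github.com/fujita/tgt.git
cd tgt
CEPH_RBD=1 make
CEPH_RBD=1 make install

# "rbd" should now show up under "Backing stores:" here
tgtadm --lld iscsi --op show --mode system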

On 11/07/13 07:45, Dan Mick wrote:



On 07/10/2013 04:12 AM, Toni F. [ackstorm] wrote:

Hi all,

I have installed tgt v0.37.

To test this feature I followed the
http://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/ guide.

When I launch the command:

tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 0
--backing-store iscsi-image --bstype rbd

it fails.

At first I thought the LUN could not be 0, because LUN 0 is used by the
controller (created by the previous command).


This worked when I first wrote the backend, but tgt may have changed; 
I'll investigate and change the blog entry if so.  Thanks.




If I launch the corrected command with LUN 1, I get this error:

tgtadm: invalid request

In syslog:

Jul 10 12:54:03 datastore-lnx001 tgtd: device_mgmt(245) sz:28
params:path=iscsi-image,bstype=rbd
Jul 10 12:54:03 datastore-lnx001 tgtd: tgt_device_create(532) failed to
find bstype, rbd

What's wrong? Is rbd not supported?



Where did you get your tgtd?  Was it built with rbd support (CEPH_RBD defined
in the environment for make)?

sudo ./tgtadm --lld iscsi --op show --mode system

should tell you.

How did you set up access to ceph.conf?





--

Toni Fuentes Rico
toni.fuen...@ackstorm.es
Administración de Sistemas

Oficina central: 902 888 345

ACK STORM, S.L.
ISO 9001:2008 (Cert.nº. 536932)
http://ackstorm.es

This electronic message contains information from ACK STORM, S.L. that is
private and confidential, intended for the exclusive use of the person(s) or
entities mentioned above. If you are not the intended recipient, please note
that any disclosure, copying, distribution or use of the contents is
prohibited. If you have received this message in error, please delete its
contents and notify us at ackst...@ackstorm.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with tgt with ceph support

2013-07-12 Thread Toni F. [ackstorm]

It works!

Thanks for all

On 12/07/13 11:23, Toni F. [ackstorm] wrote:

Yes! It seems it wasn't compiled with rbd support.

System:
State: ready
debug: off
LLDs:
iscsi: ready
Backing stores:
bsg
sg
null
ssc
aio
rdwr (bsoflags sync:direct)
Device types:
disk
cd/dvd
osd
controller
changer
tape
passthrough
iSNS:
iSNS=Off
iSNSServerIP=
iSNSServerPort=3205
iSNSAccessControl=Off

I'm going to recompile it
Thanks a lot!

On 11/07/13 07:45, Dan Mick wrote:



On 07/10/2013 04:12 AM, Toni F. [ackstorm] wrote:

Hi all,

I have installed tgt v0.37.

To test this feature I followed the
http://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/ guide.

When I launch the command:

tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 0
--backing-store iscsi-image --bstype rbd

it fails.

At first I thought the LUN could not be 0, because LUN 0 is used by the
controller (created by the previous command).


This worked when I first wrote the backend, but tgt may have changed; 
I'll investigate and change the blog entry if so.  Thanks.




If I launch the corrected command with LUN 1, I get this error:

tgtadm: invalid request

In syslog:

Jul 10 12:54:03 datastore-lnx001 tgtd: device_mgmt(245) sz:28
params:path=iscsi-image,bstype=rbd
Jul 10 12:54:03 datastore-lnx001 tgtd: tgt_device_create(532) failed to
find bstype, rbd

What's wrong? Is rbd not supported?



Where did you get your tgtd?  Was it built with rbd support (CEPH_RBD defined
in the environment for make)?

sudo ./tgtadm --lld iscsi --op show --mode system

should tell you.

How did you set up access to ceph.conf?








--

Toni Fuentes Rico
toni.fuen...@ackstorm.es
Administración de Sistemas

Oficina central: 902 888 345

ACK STORM, S.L.
ISO 9001:2008 (Cert.nº. 536932)
http://ackstorm.es


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OCFS2 or GFS2 for cluster filesystem?

2013-07-12 Thread Tom Verdaat
Hi Darryl,

Would love to do that too but only if we can configure nova to do this
automatically. Any chance you could dig up and share how you guys
accomplished this?

From everything I've read so far Grizzly is not up for the task yet. If
I can't set it in nova.conf then it probably won't work with 3rd party
tools like Hostbill and will break the user self-service functionality
that we're aiming for with a public cloud concept. I think we'll need
this and this blueprint implemented to be able to achieve this, and of
course this one for the dashboard would be nice too.

I'll do some more digging into Openstack and see how far we can get with
this.

In the mean time I've done some more research and figured out that:

  * There is a bunch of other cluster file systems but GFS2 and
OCFS2 are the only open source ones I could find, and I believe
the only ones that are integrated in the Linux kernel.
  * OCFS2 seems to have a lot more public information than GFS2. It
has more documentation and a living - though not very active -
mailing list.
  * OCFS2 seems to be in active use by its sponsor Oracle, while I
can't find much on GFS2 from its sponsor RedHat.
  * OCFS2 documentation indicates a node soft limit of 256 versus 16
for GFS2, and there are actual deployments of stable 45 TB+
production clusters.
  * Performance tests from 2010 indicate OCFS2 clearly beating GFS2,
though of course newer versions have been released since.
  * GFS2 has more fencing options than OCFS2.


There is not much info from the last 12 months so it's hard to get an
accurate picture. If we have to go with the shared storage approach
OCFS2 looks like the preferred option based on the info I've gathered so
far though.
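
If we do go that way, my rough sketch of the per-node setup would be something
like the following (image and label names are placeholders, the default rbd
pool is assumed, and the o2cb cluster in /etc/ocfs2/cluster.conf still has to
be configured separately):

rbd create nova-instances --size 1048576    # shared 1 TB image (size is in MB)
rbd map nova-instances                      # kernel client; typically /dev/rbd0
mkfs.ocfs2 -N 16 -L nova-inst /dev/rbd0     # run once, on a single node only
mount -t ocfs2 /dev/rbd0 /var/lib/nova/instances   # on every compute node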

Tom



Darryl Bond wrote on Fri 12-07-2013 at 10:04 [+1000]:

> Tom,
> I'm no expert as I didn't set it up, but we are using Openstack
> Grizzly with KVM/QEMU and RBD volumes for VM's.
> We boot the VMs from the RBD volumes and it all seems to work just
> fine. 
> Migration works perfectly, although live (no-break) migration only
> works from the command-line tools. The GUI uses the pause, migrate,
> then un-pause mode.
> Layered snapshot/cloning works just fine through the GUI. I would say
> Grizzly has pretty good integration with CEPH.
> 
> Regards
> Darryl
> 
> 
> On 07/12/13 09:41, Tom Verdaat wrote:
> 
> > Hi Alex, 
> > 
> > 
> > 
> > We're planning to deploy OpenStack Grizzly using KVM. I agree that
> > running every VM directly from RBD devices would be preferable, but
> > booting from volumes is not one of OpenStack's strengths and
> > configuring nova to make boot from volume the default method that
> > works automatically is not really feasible yet.
> > 
> > 
> > So the alternative is to mount a shared filesystem
> > on /var/lib/nova/instances of every compute node. Hence the RBD +
> > OCFS2/GFS2 question.
> > 
> > 
> > Tom
> > 
> > 
> > p.s. yes I've read the rbd-openstack page which covers images and
> > persistent volumes, not running instances which is what my question
> > is about.
> > 
> > 
> > 
> > 2013/7/12 Alex Bligh 
> > 
> > Tom,
> > 
> > 
> > On 11 Jul 2013, at 22:28, Tom Verdaat wrote:
> > 
> > > Actually I want my running VMs to all be stored on the
> > same file system, so we can use live migration to move them
> > between hosts.
> > >
> > > QEMU is not going to help because we're not using it in
> > our virtualization solution.
> > 
> > 
> > 
> > Out of interest, what are you using in your virtualization
> > solution? Most things (including modern Xen) seem to use
> > Qemu for the back end. If your virtualization solution does
> > not use qemu as a back end, you can use kernel rbd devices
> > straight which I think will give you better performance than
> > OCFS2 on RBD devices.
> > 
> > 
> > A
> > 
> > >
> > > 2013/7/11 Alex Bligh 
> > >
> > > On 11 Jul 2013, at 19:25, Gilles Mocellin wrote:
> > >
> > > > Hello,
> > > >
> > > > Yes, you missed that qemu can use directly RADOS volume.
> > > > Look here :
> > > > http://ceph.com/docs/master/rbd/qemu-rbd/
> > > >
> > > > Create :
> > > > qemu-img create -f rbd rbd:data/squeeze 10G
> > > >
> > > > Use :
> > > >
> > > > qemu -m 1024 -drive format=raw,file=rbd:data/squeeze
> > >
> > > I don't think he did. As I read it he wants his VMs to all
> > access the same filing system, and doesn't want to use
> > cephfs.
> > >
> > > OCFS2 on RBD I suppose is a reasonable choice for that.
> > >
> > > --
> > > Alex Bligh
> > >
> > >
> > >
> > >

Re: [ceph-users] OCFS2 or GFS2 for cluster filesystem?

2013-07-12 Thread Alex Bligh

On 12 Jul 2013, at 13:21, Tom Verdaat wrote:

> In the mean time I've done some more research and figured out that:
>   • There is a bunch of other cluster file systems but GFS2 and OCFS2 are 
> the only open source ones I could find, and I believe the only ones that are 
> integrated in the Linux kernel.
>   • OCFS2 seems to have a lot more public information than GFS2. It has 
> more documentation and a living - though not very active - mailing list.
>   • OCFS2 seems to be in active use by its sponsor Oracle, while I can't 
> find much on GFS2 from its sponsor RedHat.
>   • OCFS2 documentation indicates a node soft limit of 256 versus 16 for 
> GFS2, and there are actual deployments of stable 45 TB+ production clusters.
>   • Performance tests from 2010 indicate OCFS2 clearly beating GFS2, 
> though of course newer versions have been released since.
>   • GFS2 has more fencing options than OCFS2.

FWIW: For VM images (i.e. large files accessed by only one client at once) 
OCFS2 seems to perform better than GFS2. I seem to remember some performance 
issues with small files, and large directories with a lot of contention 
(multiple readers and writers of files or file metadata). You may need to 
forward port some of the more modern tools to your distro.

-- 
Alex Bligh




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy Intended Purpose

2013-07-12 Thread Edward Huyer
I'm working on deploying a multi-machine (possibly as many as 7) ceph (61.4) 
cluster for experimentation.  I'm trying to deploy using ceph-deploy on Ubuntu, 
but it seems...flaky.  For instance, I tried to deploy additional monitors and 
ran into the bug(?) where the additional monitors don't work if you don't have 
"public network" defined in ceph.conf, but by the time I found that bit of info 
I had already blown up the cluster.
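
(For reference, the bit of info in question was that the generated ceph.conf
needs the networks defined before creating the extra monitors, roughly like
this; the subnets below are just examples:)

[global]
    public network  = 192.168.10.0/24   ; monitor/client-facing network
    cluster network = 192.168.20.0/24   ; optional, OSD replication traffic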

So my question is, is ceph-deploy the preferred method for deploying larger 
clusters, particularly in production, or is it a quick-and-dirty 
get-something-going-to-play-with tool and manual configuration is preferred for 
"real" clusters?  I've seen documentation suggesting it's not intended for use 
in real clusters, but a lot of other documentation seems to assume it's the 
default deploy tool.

-
Edward Huyer
School of Interactive Games and Media
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erh...@rit.edu

Obligatory Legalese:
The information transmitted, including attachments, is intended only for the 
person(s) or entity to which it is addressed and may contain confidential 
and/or privileged material. Any review, retransmission, dissemination or other 
use of, or taking of any action in reliance upon this information by persons or 
entities other than the intended recipient is prohibited. If you received this 
in error, please contact the sender and destroy any copies of this information.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OCFS2 or GFS2 for cluster filesystem?

2013-07-12 Thread Wolfgang Hennerbichler
FYI: I'm using OCFS2 as you plan to (/var/lib/nova/instances/). It is stable,
but performance isn't blazing.

--
Sent from my mobile device

On 12.07.2013, at 14:21, "Tom Verdaat" <t...@server.biz> wrote:

Hi Darryl,

Would love to do that too but only if we can configure nova to do this 
automatically. Any chance you could dig up and share how you guys accomplished 
this?

From everything I've read so far Grizzly is not up for the task yet. If I
can't set it in nova.conf then it probably won't work with 3rd party tools
like Hostbill and will break the user self-service functionality that we're
aiming for with a public cloud concept. I think we'll need this and this
blueprint implemented to be able to achieve this, and of course this one for
the dashboard would be nice too.

I'll do some more digging into Openstack and see how far we can get with this.

In the mean time I've done some more research and figured out that:

  *   There is a bunch of other cluster file systems but GFS2 and OCFS2 are the 
only open source ones I could find, and I believe the only ones that are 
integrated in the Linux kernel.
  *   OCFS2 seems to have a lot more public information than GFS2. It has more 
documentation and a living - though not very active - mailing list.
  *   OCFS2 seems to be in active use by its sponsor Oracle, while I can't find 
much on GFS2 from its sponsor RedHat.
  *   OCFS2 documentation indicates a node soft limit of 256 versus 16 for 
GFS2, and there are actual deployments of stable 45 TB+ production clusters.
  *   Performance tests from 2010 indicate OCFS2 clearly beating GFS2, though 
of course newer versions have been released since.
  *   GFS2 has more fencing options than OCFS2.

There is not much info from the last 12 months so it's hard to get an accurate 
picture. If we have to go with the shared storage approach OCFS2 looks like the 
preferred option based on the info I've gathered so far though.

Tom



Darryl Bond wrote on Fri 12-07-2013 at 10:04 [+1000]:
Tom,
I'm no expert as I didn't set it up, but we are using Openstack Grizzly with 
KVM/QEMU and RBD volumes for VM's.
We boot the VMs from the RBD volumes and it all seems to work just fine.
Migration works perfectly, although live (no-break) migration only works from
the command-line tools. The GUI uses the pause, migrate, then un-pause mode.
Layered snapshot/cloning works just fine through the GUI. I would say Grizzly 
has pretty good integration with CEPH.

Regards
Darryl

On 07/12/13 09:41, Tom Verdaat wrote:

Hi Alex,


We're planning to deploy OpenStack Grizzly using KVM. I agree that running 
every VM directly from RBD devices would be preferable, but booting from 
volumes is not one of OpenStack's strengths and configuring nova to make boot 
from volume the default method that works automatically is not really feasible 
yet.


So the alternative is to mount a shared filesystem on /var/lib/nova/instances 
of every compute node. Hence the RBD + OCFS2/GFS2 question.


Tom


p.s. yes I've read the 
rbd-openstack page which covers 
images and persistent volumes, not running instances which is what my question 
is about.


2013/7/12 Alex Bligh <a...@alex.org.uk>
Tom,

On 11 Jul 2013, at 22:28, Tom Verdaat wrote:

> Actually I want my running VMs to all be stored on the same file system, so 
> we can use live migration to move them between hosts.
>
> QEMU is not going to help because we're not using it in our virtualization 
> solution.


Out of interest, what are you using in your virtualization solution? Most 
things (including modern Xen) seem to use Qemu for the back end. If your 
virtualization solution does not use qemu as a back end, you can use kernel rbd 
devices straight which I think will give you better performance than OCFS2 on 
RBD devices.

A

>
> 2013/7/11 Alex Bligh <a...@alex.org.uk>
>
> On 11 Jul 2013, at 19:25, Gilles Mocellin wrote:
>
> > Hello,
> >
> > Yes, you missed that qemu can use directly RADOS volume.
> > Look here :
> > http://ceph.com/docs/master/rbd/qemu-rbd/
> >
> > Create :
> > qemu-img create -f rbd rbd:data/squeeze 10G
> >
> > Use :
> >
> > qemu -m 1024 -drive format=raw,file=rbd:data/squeeze
>
> I don't think he did. As I read it he wants his VMs to all access the same 
> filing system, and doesn't want to use cephfs.
>
> OCFS2 on RBD I suppose is a reasonable choice for that.
>
> --
> Alex Bligh
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


--
Alex Bligh











[ceph-users] slow request problem

2013-07-12 Thread Stefan Priebe - Profihost AG
Hello list,

Is there anyone else here who always has problems bringing an offline OSD
back? Since Cuttlefish I'm seeing slow requests for the first 2-5 minutes
after bringing an OSD online again, and that's long enough that the VMs
crash because they think their disk is offline...

Under Bobtail I never had any problems with that.

Please HELP!
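
(For reference, the recovery and backfill throttles that usually get suggested
for this look like the following; option names are as of cuttlefish and the
values are only a starting point:)

[osd]
    osd max backfills = 1
    osd recovery max active = 1
    osd recovery op priority = 1

; or, injected at runtime for a single OSD while testing:
; ceph tell osd.0 injectargs '--osd-max-backfills 1'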

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Num of PGs

2013-07-12 Thread Mark Nelson

On 07/12/2013 01:45 AM, Stefan Priebe - Profihost AG wrote:

Hello,

is this calculation for the number of PGs correct?

36 OSDs, Replication Factor 3

36 * 100 / 3 => 1200 PGs

But I then read that it should be a power of two, so it should be 2048?
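
In other words (quick shell arithmetic, just to make the rounding explicit):

osds=36; size=3
pgs=$(( osds * 100 / size ))                        # 1200
p=1; while [ "$p" -lt "$pgs" ]; do p=$(( p * 2 )); done
echo "$pgs raw, next power of two is $p"            # prints 2048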


At large numbers of PGs it may not matter very much, but I don't think 
it would hurt either!


Basically this has to do with how ceph_stable_mod works.  At 
non-power-of-two values, the bucket counts aren't even, but that's only 
a small part of the story and may ultimately only have a small effect on 
the distribution unless the PG count is small.




Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Num of PGs

2013-07-12 Thread Gandalf Corvotempesta
2013/7/12 Mark Nelson :
> At large numbers of PGs it may not matter very much, but I don't think it
> would hurt either!
>
> Basically this has to do with how ceph_stable_mod works.  At
> non-power-of-two values, the bucket counts aren't even, but that's only a
> small part of the story and may ultimately only have a small effect on the
> distribution unless the PG count is small.

In the case of 12 OSDs per node, and a cluster made of 18 storage
nodes, are you suggesting:

(12*18*100) / 3 = 7200 PGs, which rounded to a power of two means 8192?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Num of PGs

2013-07-12 Thread Mark Nelson

On 07/12/2013 09:53 AM, Gandalf Corvotempesta wrote:

2013/7/12 Mark Nelson :

At large numbers of PGs it may not matter very much, but I don't think it
would hurt either!

Basically this has to do with how ceph_stable_mod works.  At
non-power-of-two values, the bucket counts aren't even, but that's only a
small part of the story and may ultimately only have a small effect on the
distribution unless the PG count is small.


In the case of 12 OSDs per node, and a cluster made of 18 storage
nodes, are you suggesting:

(12*18*100) / 3 = 7200 PGs, which rounded to a power of two means 8192?



Well, our official recommendation on the website is PGS = OSDS * 100 /
replicas.  I think the thought is that with sufficient numbers of OSDs
the behaviour of ceph_stable_mod shouldn't matter (much).  At some point
I'd like to do a somewhat more involved analysis to see how PG
distribution changes, but for now I wouldn't really expect a dramatic
difference between 7200 and 8192 PGs.


Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cuttlefish VS Bobtail performance series

2013-07-12 Thread Mark Nelson

Part 4 has been released!  Get your 4MB fio results while they are hot!

http://ceph.com/performance-2/ceph-cuttlefish-vs-bobtail-part-4-4m-rbd-performance/

Mark

On 07/11/2013 09:56 AM, Mark Nelson wrote:

And We've now got part 3 out showing 128K FIO results:

http://ceph.com/performance-2/ceph-cuttlefish-vs-bobtail-part-3-128k-rbd-performance/


Mark

On 07/10/2013 11:01 AM, Mark Nelson wrote:

Hello again!

Part 2 is now out!  We've got a whole slew of results for 4K FIO tests
on RBD:

http://ceph.com/performance-2/ceph-cuttlefish-vs-bobtail-part-2-4k-rbd-performance/



Mark

On 07/09/2013 08:41 AM, Mark Nelson wrote:

Hi Guys,

Just wanted to let everyone know that we've released part 1 of a series
of performance articles that looks at Cuttlefish vs Bobtail on our
Supermicro test chassis.  We'll be looking at both RADOS bench and RBD
performance with a variety of IO sizes, IO patterns, concurrency levels,
file systems, and more!

Every day this week we'll be releasing a new part in the series.  Here's
a link to part 1:

http://ceph.com/performance-2/ceph-cuttlefish-vs-bobtail-part-1-introduction-and-rados-bench/




Thanks!
Mark






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with tgt with ceph support

2013-07-12 Thread Toni F. [ackstorm]

It works, but the performance is very poor. 100MB/s or less

What is your experience with performance?

Regards

On 12/07/13 13:56, Toni F. [ackstorm] wrote:

It works!

Thanks for all

On 12/07/13 11:23, Toni F. [ackstorm] wrote:

Yes! It seems it wasn't compiled with rbd support.

System:
State: ready
debug: off
LLDs:
iscsi: ready
Backing stores:
bsg
sg
null
ssc
aio
rdwr (bsoflags sync:direct)
Device types:
disk
cd/dvd
osd
controller
changer
tape
passthrough
iSNS:
iSNS=Off
iSNSServerIP=
iSNSServerPort=3205
iSNSAccessControl=Off

I'm going to recompile it
Thanks a lot!

On 11/07/13 07:45, Dan Mick wrote:



On 07/10/2013 04:12 AM, Toni F. [ackstorm] wrote:

Hi all,

I have installed tgt v0.37.

To test this feature I followed the
http://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/ guide.

When I launch the command:

tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 0
--backing-store iscsi-image --bstype rbd

it fails.

At first I thought the LUN could not be 0, because LUN 0 is used by the
controller (created by the previous command).


This worked when I first wrote the backend, but tgt may have 
changed; I'll investigate and change the blog entry if so.  Thanks.




If I launch the corrected command with LUN 1, I get this error:

tgtadm: invalid request

In syslog:

Jul 10 12:54:03 datastore-lnx001 tgtd: device_mgmt(245) sz:28
params:path=iscsi-image,bstype=rbd
Jul 10 12:54:03 datastore-lnx001 tgtd: tgt_device_create(532) failed to
find bstype, rbd

What's wrong? Is rbd not supported?



Where did you get your tgtd?  Was it built with rbd support (CEPH_RBD defined
in the environment for make)?

sudo ./tgtadm --lld iscsi --op show --mode system

should tell you.

How did you set up access to ceph.conf?











--

Toni Fuentes Rico
toni.fuen...@ackstorm.es
Administración de Sistemas

Oficina central: 902 888 345

ACK STORM, S.L.
ISO 9001:2008 (Cert.nº. 536932)
http://ackstorm.es


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with tgt with ceph support

2013-07-12 Thread Dan Mick
Ceph performance is a very very complicated subject. How does that compare
to other access methods?  Say, rbd import/export for an easy test?
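
For example, something along these lines gives a rough baseline to compare
against (untested here; /tmp/testfile and the image name are placeholders, and
the default rbd pool is assumed):

# raw RADOS throughput from the tgt host: 16 concurrent 4 MB writes for 30 s
rados bench -p rbd 30 write -t 16

# librbd path: time a plain import and export of a test file
time rbd import /tmp/testfile rbd/import-test
time rbd export rbd/import-test /tmp/testfile.out
rbd rm rbd/import-test
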
On Jul 12, 2013 8:22 AM, "Toni F. [ackstorm]" 
wrote:

> It works, but the performance is very poor. 100MB/s or less
>
> What is your experience with performance?
>
> Regards
>
> On 12/07/13 13:56, Toni F. [ackstorm] wrote:
>
>> It works!
>>
>> Thanks for all
>>
>> On 12/07/13 11:23, Toni F. [ackstorm] wrote:
>>
>>> Yes! It seems it wasn't compiled with rbd support.
>>>
>>> System:
>>> State: ready
>>> debug: off
>>> LLDs:
>>> iscsi: ready
>>> Backing stores:
>>> bsg
>>> sg
>>> null
>>> ssc
>>> aio
>>> rdwr (bsoflags sync:direct)
>>> Device types:
>>> disk
>>> cd/dvd
>>> osd
>>> controller
>>> changer
>>> tape
>>> passthrough
>>> iSNS:
>>> iSNS=Off
>>> iSNSServerIP=
>>> iSNSServerPort=3205
>>> iSNSAccessControl=Off
>>>
>>> I'm going to recompile it
>>> Thanks a lot!
>>>
>>> On 11/07/13 07:45, Dan Mick wrote:
>>>


 On 07/10/2013 04:12 AM, Toni F. [ackstorm] wrote:

> Hi all,
>
> I have installed tgt v0.37.
>
> To test this feature I followed the
> http://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/ guide.
>
> When I launch the command:
>
> tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 0
> --backing-store iscsi-image --bstype rbd
>
> it fails.
>
> At first I thought the LUN could not be 0, because LUN 0 is used by the
> controller (created by the previous command).
>

 This worked when I first wrote the backend, but tgt may have changed;
 I'll investigate and change the blog entry if so.  Thanks.


> If I launch the corrected command with LUN 1, I get this error:
>
> tgtadm: invalid request
>
> In syslog:
>
> Jul 10 12:54:03 datastore-lnx001 tgtd: device_mgmt(245) sz:28
> params:path=iscsi-image,bstype=rbd
> Jul 10 12:54:03 datastore-lnx001 tgtd: tgt_device_create(532) failed to
> find bstype, rbd
>
> What's wrong? Is rbd not supported?
>


 Where did you get your tgtd?  Was it built with rbd support (CEPH_RBD defined
 in the environment for make)?

 sudo ./tgtadm --lld iscsi --op show --mode system

 should tell you.

 How did you set up access to ceph.conf?



>>>
>>>
>>
>>
>
> --
>
> Toni Fuentes Rico
> toni.fuen...@ackstorm.es
> Administración de Sistemas
>
> Oficina central: 902 888 345
>
> ACK STORM, S.L.
> ISO 9001:2008 (Cert.nº. 536932)
> http://ackstorm.es
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy

2013-07-12 Thread Scottix
Make sure you understand the ceph architecture
http://ceph.com/docs/next/architecture/ and then go through the ceph-deploy
docs here http://ceph.com/docs/master/rados/deployment/ceph-deploy-new/
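
As for the specific terms, a minimal sequence (hostnames and the disk below
are placeholders) usually looks like:

ceph-deploy new mon1                   # writes an initial ceph.conf listing mon1 as a monitor
ceph-deploy install mon1 osd1 osd2     # installs the ceph packages on those hosts
ceph-deploy mon create mon1            # actually creates and starts the monitor daemon on mon1
ceph-deploy gatherkeys mon1            # pulls the bootstrap keys back to the admin host
ceph-deploy osd create osd1:/dev/sdb osd2:/dev/sdb   # example data disks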


On Thu, Jul 11, 2013 at 8:04 PM, SUNDAY A. OLUTAYO wrote:

> I am on my first exploration of Ceph and need help understanding these terms:
> ceph-deploy new HOST, ceph-deploy new MON HOST, and ceph-deploy mon create
> HOST. I would appreciate your help.
>
> Sent from my LG Mobile
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Follow Me: @Scottix 
scot...@gmail.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Num of PGs

2013-07-12 Thread Stefan Priebe - Profihost AG
Right now I have 4096. 36*100/3 => 1200. As recovery takes ages I thought this
might be the reason.

Stefan

This mail was sent with my iPhone.

On 12.07.2013 at 17:03, Mark Nelson wrote:

> On 07/12/2013 09:53 AM, Gandalf Corvotempesta wrote:
>> 2013/7/12 Mark Nelson :
>>> At large numbers of PGs it may not matter very much, but I don't think it
>>> would hurt either!
>>> 
>>> Basically this has to do with how ceph_stable_mod works.  At
>>> non-power-of-two values, the bucket counts aren't even, but that's only a
>>> small part of the story and may ultimately only have a small effect on the
>>> distribution unless the PG count is small.
>> 
>> In case of 12 OSDs for each node, and a cluster made with 18 storage
>> nodes are you suggesting:
>> 
>> (12*18*100) / 3 = 7200 PGs, that rounded to an exponent of 2 means 8192 ?
> 
> Well, our official recommendation on the website is PGS = OSDS * 100 / 
> replicas.  I think the thought is that with sufficient numbers of OSDs the 
> behaviour of ceph_stable_mod shouldn't matter (much).  At some point I'd like 
> to do a little more involved of an analysis to see how PG distribution 
> changes, but for now I wouldn't really expect a dramatic difference between 
> 7200 and 8192 PGs.
> 
> Mark
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Num of PGs

2013-07-12 Thread Mark Nelson

On 07/12/2013 02:19 PM, Stefan Priebe - Profihost AG wrote:

Right now I have 4096. 36*100/3 => 1200. As recovery takes ages I thought this
might be the reason.


Are you seeing any craziness on the mons?



Stefan

This mail was sent with my iPhone.

On 12.07.2013 at 17:03, Mark Nelson wrote:


On 07/12/2013 09:53 AM, Gandalf Corvotempesta wrote:

2013/7/12 Mark Nelson :

At large numbers of PGs it may not matter very much, but I don't think it
would hurt either!

Basically this has to do with how ceph_stable_mod works.  At
non-power-of-two values, the bucket counts aren't even, but that's only a
small part of the story and may ultimately only have a small effect on the
distribution unless the PG count is small.


In case of 12 OSDs for each node, and a cluster made with 18 storage
nodes are you suggesting:

(12*18*100) / 3 = 7200 PGs, that rounded to an exponent of 2 means 8192 ?


Well, our official recommendation on the website is PGS = OSDS * 100 / 
replicas.  I think the thought is that with sufficient numbers of OSDs the 
behaviour of ceph_stable_mod shouldn't matter (much).  At some point I'd like 
to do a little more involved of an analysis to see how PG distribution changes, 
but for now I wouldn't really expect a dramatic difference between 7200 and 
8192 PGs.

Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Num of PGs

2013-07-12 Thread Stefan Priebe - Profihost AG


On 12.07.2013 at 21:23, Mark Nelson wrote:

> On 07/12/2013 02:19 PM, Stefan Priebe - Profihost AG wrote:
>> Right now I have 4096. 36*100/3 => 1200. As recovery takes ages I thought
>> this might be the reason.
> 
> Are you seeing any craziness on the mons?

What could this be? Nothing noticed.

Stefan 



> 
>> 
>> Stefan
>> 
>> This mail was sent with my iPhone.
>> 
>> On 12.07.2013 at 17:03, Mark Nelson wrote:
>> 
>>> On 07/12/2013 09:53 AM, Gandalf Corvotempesta wrote:
 2013/7/12 Mark Nelson :
> At large numbers of PGs it may not matter very much, but I don't think it
> would hurt either!
> 
> Basically this has to do with how ceph_stable_mod works.  At
> non-power-of-two values, the bucket counts aren't even, but that's only a
> small part of the story and may ultimately only have a small effect on the
> distribution unless the PG count is small.
 
 In case of 12 OSDs for each node, and a cluster made with 18 storage
 nodes are you suggesting:
 
 (12*18*100) / 3 = 7200 PGs, that rounded to an exponent of 2 means 8192 ?
>>> 
>>> Well, our official recommendation on the website is PGS = OSDS * 100 / 
>>> replicas.  I think the thought is that with sufficient numbers of OSDs the 
>>> behaviour of ceph_stable_mod shouldn't matter (much).  At some point I'd 
>>> like to do a little more involved of an analysis to see how PG distribution 
>>> changes, but for now I wouldn't really expect a dramatic difference between 
>>> 7200 and 8192 PGs.
>>> 
>>> Mark
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy

2013-07-12 Thread SUNDAY A. OLUTAYO
Thanks, I will go through the link

Sent from my LG Mobile

Scottix  wrote:

Make sure you understand the ceph architecture
http://ceph.com/docs/next/architecture/ and then go through the ceph-deploy
docs here http://ceph.com/docs/master/rados/deployment/ceph-deploy-new/


On Thu, Jul 11, 2013 at 8:04 PM, SUNDAY A. OLUTAYO wrote:

> I am on my first exploration of Ceph and need help understanding these terms:
> ceph-deploy new HOST, ceph-deploy new MON HOST, and ceph-deploy mon create
> HOST. I would appreciate your help.
>
> Sent from my LG Mobile
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Follow Me: @Scottix 
scot...@gmail.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy Intended Purpose

2013-07-12 Thread Neil Levine
It's the default tool for getting something up and running quickly for
tests/PoC. If you don't want to make too many custom settings changes and are
happy with SSH access to all boxes, then it's fine, but if you want more
granular control then we advise using something like Chef or Puppet.
There are example Chef configs in the GitHub repo.

Neil


On Fri, Jul 12, 2013 at 5:54 AM, Edward Huyer  wrote:

> I’m working on deploying a multi-machine (possibly as many as 7) ceph
> (61.4) cluster for experimentation.  I’m trying to deploy using ceph-deploy
> on Ubuntu, but it seems…flaky.  For instance, I tried to deploy additional
> monitors and ran into the bug(?) where the additional monitors don’t work
> if you don’t have “public network” defined in ceph.conf, but by the time I
> found that bit of info I had already blown up the cluster.
>
>
> So my question is, is ceph-deploy the preferred method for deploying
> larger clusters, particularly in production, or is it a quick-and-dirty
> get-something-going-to-play-with tool and manual configuration is preferred
> for “real” clusters?  I’ve seen documentation suggesting it’s not intended
> for use in real clusters, but a lot of other documentation seems to assume
> it’s the default deploy tool.
>
>
> -
>
> Edward Huyer
>
> School of Interactive Games and Media
>
> Golisano 70-2373
>
> 152 Lomb Memorial Drive
>
> Rochester, NY 14623
>
> 585-475-6651
>
> erh...@rit.edu
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Including pool_id in the crush hash ? FLAG_HASHPSPOOL ?

2013-07-12 Thread Gregory Farnum
On Thu, Jul 11, 2013 at 6:06 AM, Sylvain Munaut wrote:
> Hi,
>
>
> I'd like the pool_id to be included in the hash used for the PG, to
> try and improve the data distribution. (I have 10 pools.)
>
> I see that there is a flag named FLAG_HASHPSPOOL. Is it possible to
> enable it on existing pool ?

Hmm, right now there is not. :( I've made a ticket:
http://tracker.ceph.com/issues/5614
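
(In the meantime, whether a given pool already carries the flag should be
visible in the osd map dump, e.g.:)

ceph osd dump | grep ^pool   # pools with the flag list "hashpspool" in their flags
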
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Possible bug with image.list_lockers()

2013-07-12 Thread Gregory Farnum
On Thu, Jul 11, 2013 at 4:38 PM, Mandell Degerness wrote:
> I'm not certain what the correct behavior should be in this case, so
> maybe it is not a bug, but here is what is happening:
>
> When an OSD becomes full, a process fails and we unmount the rbd and
> attempt to remove the lock associated with the rbd for the process.
> The unmount works fine, but removing the lock is failing right now
> because the list_lockers() function call never returns.
>
> Here is a code snippet I tried with a fake rbd lock on a test cluster:
>
> import rbd
> import rados
> with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
>   with cluster.open_ioctx('rbd') as ioctx:
> with rbd.Image(ioctx, 'msd1') as image:
>   image.list_lockers()
>
> The process never returns, even after the ceph cluster is returned to
> healthy.  The only indication of the error is an error in the
> /var/log/messages file:
>
> Jul 11 23:25:05 node-172-16-0-13 python: 2013-07-11 23:25:05.826793
> 7ffc66d72700  0 client.6911.objecter  FULL, paused modify
> 0x7ffc687c6050 tid 2
>
> Any help would be greatly appreciated.
>
> ceph version:
>
> ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)

Interesting. Updating the lock state requires write access to the
object, which is why it blocks when the cluster gets full — removing
that would be a lot of work for very little gain. However, the request
should get woken up once the cluster is no longer full! Here's a
ticket: http://tracker.ceph.com/issues/5615
Josh or Yehuda, do you have any thoughts on obvious causes before we
dig into librados/objecter code?
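
As an operational stopgap, freeing space or temporarily nudging the full ratio
(carefully) should let the blocked write drain; e.g. something like:

ceph health detail            # shows which OSD(s) are full or near full
rados df                      # per-pool usage, to decide what can be deleted
ceph pg set_full_ratio 0.97   # temporary bump above the 0.95 default
# ... remove the lock / free space ...
ceph pg set_full_ratio 0.95   # restore the safety margin
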
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com