[ceph-users] Taking down one OSD node (10 OSDs) for maintenance - best practice?

2014-06-13 Thread David
Hi,

We’re going to take down one OSD node for maintenance (add cpu + ram) which 
might take 10-20 minutes.
What’s the best practice here in a production cluster running dumpling 
0.67.7-1~bpo70+1?

Kind Regards,
David Majchrzak



Re: [ceph-users] Taking down one OSD node (10 OSDs) for maintenance - best practice?

2014-06-13 Thread Wido den Hollander

On 06/13/2014 10:56 AM, David wrote:

Hi,

We’re going to take down one OSD node for maintenance (add cpu + ram) which 
might take 10-20 minutes.
What’s the best practice here in a production cluster running dumpling 
0.67.7-1~bpo70+1?



I suggest:

$ ceph osd set noout

This way NO OSD will be marked as out, which prevents data re-distribution.

After the OSDs are back up and synced:

$ ceph osd unset noout
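
A minimal sketch of the whole maintenance window, assuming the sysvinit init script used by dumpling on Debian (the exact service invocation may differ on your install):

$ ceph osd set noout
# on the OSD node, before powering it down:
$ sudo service ceph stop osd
# ... do the hardware work and boot the node; the OSDs normally start again on boot ...
$ ceph -s                  # wait until all PGs are active+clean again
$ ceph osd unset noout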


Kind Regards,
David Majchrzak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] Taking down one OSD node (10 OSDs) for maintenance - best practice?

2014-06-13 Thread David
Thanks Wido,

So with noout set, data will be degraded but not resynced, which won't interrupt
operations (we're running the default 3 replicas and a normal map, so each OSD node
only holds 1 replica of the data).
Do we need to do anything after bringing the node up again, or will it resync
automatically?

Kind Regards,
David Majchrzak

13 jun 2014 kl. 11:13 skrev Wido den Hollander :

> On 06/13/2014 10:56 AM, David wrote:
>> Hi,
>> 
>> We’re going to take down one OSD node for maintenance (add cpu + ram) which 
>> might take 10-20 minutes.
>> What’s the best practice here in a production cluster running dumpling 
>> 0.67.7-1~bpo70+1?
>> 
> 
> I suggest:
> 
> $ ceph osd set noout
> 
> This way NO OSD will be marked as out and prevent data re-distribution.
> 
> After the OSDs are back up and synced:
> 
> $ ceph osd unset noout
> 
>> Kind Regards,
>> David Majchrzak
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
> 
> -- 
> Wido den Hollander
> 42on B.V.
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Taking down one OSD node (10 OSDs) for maintenance - best practice?

2014-06-13 Thread Wido den Hollander

On 06/13/2014 11:18 AM, David wrote:

Thanks Wido,

So during no out data will be degraded but not resynced, which won’t interrupt 
operations ( running default 3 replicas and a normal map, so each osd node only 
has 1 replica of the data)
Do we need to do anything after bringing the node up again or will it resynch 
automatically?



Correct. The OSDs will be marked as down, so that will cause the PGs to 
go into a degraded state, but they will stay marked as "in", not 
triggering data re-distribution.


You don't have to do anything. Just let the machine and OSDs boot and 
Ceph will take care of the rest (assuming it's all configured properly).


Afterwards unset the noout flag.

Wido
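
While the node is down, a quick sanity check (just the standard status commands, not something from this thread) is to confirm the PGs are only degraded and nothing is being backfilled:

$ ceph -s
$ ceph health detail | grep -i degraded
$ ceph osd tree            # the node's OSDs should show as down but still in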


Kind Regards,
David Majchrzak

13 jun 2014 kl. 11:13 skrev Wido den Hollander :


On 06/13/2014 10:56 AM, David wrote:

Hi,

We’re going to take down one OSD node for maintenance (add cpu + ram) which 
might take 10-20 minutes.
What’s the best practice here in a production cluster running dumpling 
0.67.7-1~bpo70+1?



I suggest:

$ ceph osd set noout

This way NO OSD will be marked as out and prevent data re-distribution.

After the OSDs are back up and synced:

$ ceph osd unset noout


Kind Regards,
David Majchrzak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] Taking down one OSD node (10 OSDs) for maintenance - best practice?

2014-06-13 Thread David
Alright, thanks! :)

Kind Regards,
David Majchrzak

13 jun 2014 kl. 11:21 skrev Wido den Hollander :

> On 06/13/2014 11:18 AM, David wrote:
>> Thanks Wido,
>> 
>> So during no out data will be degraded but not resynced, which won’t 
>> interrupt operations ( running default 3 replicas and a normal map, so each 
>> osd node only has 1 replica of the data)
>> Do we need to do anything after bringing the node up again or will it 
>> resynch automatically?
>> 
> 
> Correct. The OSDs will be marked as down, so that will cause the PGs to go 
> into a degraded state, but they will stay marked as "in", not triggering data 
> re-distribution.
> 
> You don't have to do anything. Just let the machine and OSDs boot and Ceph 
> will take care of the rest (assuming it's all configured properly).
> 
> Afterwards unset the noout flag.
> 
> Wido
> 
>> Kind Regards,
>> David Majchrzak
>> 
>> 13 jun 2014 kl. 11:13 skrev Wido den Hollander :
>> 
>>> On 06/13/2014 10:56 AM, David wrote:
 Hi,
 
 We’re going to take down one OSD node for maintenance (add cpu + ram) 
 which might take 10-20 minutes.
 What’s the best practice here in a production cluster running dumpling 
 0.67.7-1~bpo70+1?
 
>>> 
>>> I suggest:
>>> 
>>> $ ceph osd set noout
>>> 
>>> This way NO OSD will be marked as out and prevent data re-distribution.
>>> 
>>> After the OSDs are back up and synced:
>>> 
>>> $ ceph osd unset noout
>>> 
 Kind Regards,
 David Majchrzak
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
>>> 
>>> 
>>> --
>>> Wido den Hollander
>>> 42on B.V.
>>> 
>>> Phone: +31 (0)20 700 9902
>>> Skype: contact42on
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
> 
> -- 
> Wido den Hollander
> 42on B.V.
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on



Re: [ceph-users] pid_max value?

2014-06-13 Thread Kaifeng Yao
The thread count depends on the number of OSDs per host as well as the
cluster size. You really have a lot (40!!) of OSDs on a single node, but the
good part is that you've got a small cluster (only 4 nodes).

If you have already run into the problem, then the only way is to increase
pid_max. Remember to reserve at least a 2x or 3x buffer. During recovery it
may create many more threads than usual, especially at large scale. Using a
large pid_max number doesn't hurt; the messenger system reaps inactive
threads.

On such a high-density system you may also see thread scheduling consume
too much CPU time; sometimes OSDs are unable to send or process heartbeat
messages and they are marked out. Newer kernel versions do a much better
thread scheduling job, so you can try a kernel upgrade if that happens.
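
A quick sketch of checking and raising the limit (the value is the one quoted elsewhere in this thread, not a specific recommendation):

$ cat /proc/sys/kernel/pid_max
$ ps -eLf | wc -l                                 # rough count of threads currently in use
$ sudo sysctl -w kernel.pid_max=4194303           # apply immediately
$ echo 'kernel.pid_max = 4194303' | sudo tee -a /etc/sysctl.conf    # persist across reboots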

 

On 6/12/14, 2:47 AM, "Maciej Bonin"  wrote:

>We have not experienced any downsides to this approach, performance- or
>stability-wise; if you prefer you can experiment with the values, but I
>see no real advantage in doing so.
>
>Regards,
>Maciej Bonin
>Systems Engineer | M247 Limited
>M247.com  Connected with our Customers
>Contact us today to discuss your hosting and connectivity requirements
>ISO 27001 | ISO 9001 | Deloitte Technology Fast 50 | Deloitte Technology
>Fast 500 EMEA | Sunday Times Tech Track 100
>M247 Ltd, registered in England & Wales #4968341. 1 Ball Green, Cobra
>Court, Manchester, M32 0QT
> 
>ISO 27001 Data Protection Classification: A - Public
> 
>
>
>-Original Message-
>From: Cao, Buddy [mailto:buddy@intel.com]
>Sent: 11 June 2014 17:00
>To: Maciej Bonin; ceph-users@lists.ceph.com
>Subject: RE: pid_max value?
>
>Thanks Bonin.  Do you have totally 48 OSDs or there are 48 OSDs on each
>storage node?  Do you think "kernel.pid_max = 4194303" is reasonable
>since it increase a lot from the default OS setting.
>
>
>Wei Cao (Buddy)
>
>-Original Message-
>From: Maciej Bonin [mailto:maciej.bo...@m247.com]
>Sent: Wednesday, June 11, 2014 10:07 PM
>To: Cao, Buddy; ceph-users@lists.ceph.com
>Subject: RE: pid_max value?
>
>Hello,
>
>The values we use are as follows:
># sysctl -p
>net.ipv4.ip_local_port_range = 1024 65535
>net.core.netdev_max_backlog = 3
>net.core.somaxconn = 16384
>net.ipv4.tcp_max_syn_backlog = 252144
>net.ipv4.tcp_max_tw_buckets = 36
>net.ipv4.tcp_fin_timeout = 3
>net.ipv4.tcp_max_orphans = 262144
>net.ipv4.tcp_synack_retries = 2
>net.ipv4.tcp_syn_retries = 2
>net.core.rmem_max = 8388608
>net.core.wmem_max = 8388608
>net.core.rmem_default = 65536
>net.core.wmem_default = 65536
>net.ipv4.tcp_rmem = 4096 87380 8388608
>net.ipv4.tcp_wmem = 4096 65536 8388608
>net.ipv4.tcp_mem = 8388608 8388608 8388608
>net.ipv4.route.flush = 1
>kernel.pid_max = 4194303
>
>The timeouts don't really make sense without tw reuse/recycling but we
>found increasing the max and letting the old ones hang gives better
>performance.
>Somaxconn was the most important value we had to increase as with 3 mons,
>3 storage nodes, 3 vm hypervisors, 16vms and 48 OSDs we've started
>running into major problems with servers dying left and right.
>Most of those values are lifted from some openstack python script IIRC,
>please let us know if you find a more efficient/stable configuration,
>however we're quite happy with this one.
>
>Regards,
>Maciej Bonin
>Systems Engineer | M247 Limited
>M247.com  Connected with our Customers
>Contact us today to discuss your hosting and connectivity requirements
>ISO 27001 | ISO 9001 | Deloitte Technology Fast 50 | Deloitte Technology
>Fast 500 EMEA | Sunday Times Tech Track 100
>M247 Ltd, registered in England & Wales #4968341. 1 Ball Green, Cobra
>Court, Manchester, M32 0QT
> 
>ISO 27001 Data Protection Classification: A - Public
> 
>
>From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>Cao, Buddy
>Sent: 11 June 2014 15:00
>To: ceph-users@lists.ceph.com
>Subject: [ceph-users] pid_max value?
>
>Hi, what is the recommended value for /proc/sys/kernel/pid_max? Is 32768
>enough for Ceph cluster with 4 nodes (40 1T OSDs on each node)? My ceph
>node already run into "create thread fail" problem in osd log which root
>cause at pid_max.
>
>
>Wei Cao (Buddy)
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Moving Ceph cluster to different network segment

2014-06-13 Thread Fred Yang
Thanks, John.

That seems like it will take care of the monitors; how about the OSDs? Any idea
how to change the IP addresses without triggering a resync?

Fred

Sent from my Samsung Galaxy S3
On Jun 12, 2014 1:21 PM, "John Wilkins"  wrote:

> Fred,
>
> I'm not sure it will completely answer your question, but I would
> definitely have a look at:
> http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address
>
> There are some important steps in there for monitors.
>
>
> On Wed, Jun 11, 2014 at 12:08 PM, Fred Yang 
> wrote:
>
>> We need to move Ceph cluster to different network segment for
>> interconnectivity between mon and osc, anybody has the procedure regarding
>> how that can be done? Note that the host name reference will be changed, so
>> originally the osd host referenced as cephnode1, in the new segment it will
>> be cephnode1-n.
>>
>> Thanks,
>> Fred
>>
>> Sent from my Samsung Galaxy S3
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
> John Wilkins
> Senior Technical Writer
> Intank
> john.wilk...@inktank.com
> (415) 425-9599
> http://inktank.com
>


Re: [ceph-users] Moving Ceph cluster to different network segment

2014-06-13 Thread Wido den Hollander

On 06/13/2014 01:41 PM, Fred Yang wrote:

Thanks, John.

That seems will take care of monitors, how about osd? Any idea how to
change IP addresses without triggering a resync?



The IPs of OSDs are dynamic; their IP is not part of the data distribution.
Simply renumber them and restart the daemons.


I suggest:

1. Stop OSD(s)
2. Renumber machine
3. Start OSD(s)

That should be all. There will be some recovery due to I/Os which 
occurred between 1 and 3.


Wido
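
A rough sketch of those three steps on one node, assuming the sysvinit script (the interface/hostname changes themselves are distribution-specific):

$ sudo service ceph stop osd           # stop all OSDs on the node
# change the node's IP/hostname (e.g. /etc/network/interfaces, /etc/hosts) and make
# sure ceph.conf still lists monitor addresses reachable from the new segment
$ sudo service ceph start osd
$ ceph -s                              # watch the small recovery for writes made in between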


Fred

Sent from my Samsung Galaxy S3

On Jun 12, 2014 1:21 PM, "John Wilkins" mailto:john.wilk...@inktank.com>> wrote:

Fred,

I'm not sure it will completely answer your question, but I would
definitely have a look at:

http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address

There are some important steps in there for monitors.


On Wed, Jun 11, 2014 at 12:08 PM, Fred Yang mailto:frederic.y...@gmail.com>> wrote:

We need to move Ceph cluster to different network segment for
interconnectivity between mon and osc, anybody has the procedure
regarding how that can be done? Note that the host name
reference will be changed, so originally the osd host referenced
as cephnode1, in the new segment it will be cephnode1-n.

Thanks,
Fred

Sent from my Samsung Galaxy S3


___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
John Wilkins
Senior Technical Writer
Intank
john.wilk...@inktank.com 
(415) 425-9599 
http://inktank.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] Moving Ceph cluster to different network segment

2014-06-13 Thread Fred Yang
Wido,
So does the cluster reference an OSD based on the hostname, or on the GUID (hopefully)?
Note that, as I mentioned in the original email, the hostname associated with the IP
will also be changed; will it be as simple as changing the IP and
restarting the OSD? I remember I tested this in Dumpling a while ago and it didn't
work. This cluster is running Emperor and I'm not sure whether that will
make any difference.

Fred
On Jun 13, 2014 7:51 AM, "Wido den Hollander"  wrote:

> On 06/13/2014 01:41 PM, Fred Yang wrote:
>
>> Thanks, John.
>>
>> That seems will take care of monitors, how about osd? Any idea how to
>> change IP addresses without triggering a resync?
>>
>>
> IPs of OSDs are dynamic. Their IP is no part of the data distribution.
> Simply renumber them and restart the daemon.
>
> I suggest:
>
> 1. Stop OSD(s)
> 2. Renumber machine
> 3. Start OSD(s)
>
> That should be all. There will be some recovery due to I/Os which occurred
> between 1 and 3.
>
> Wido
>
>  Fred
>>
>> Sent from my Samsung Galaxy S3
>>
>> On Jun 12, 2014 1:21 PM, "John Wilkins" > > wrote:
>>
>> Fred,
>>
>> I'm not sure it will completely answer your question, but I would
>> definitely have a look at:
>> http://ceph.com/docs/master/rados/operations/add-or-rm-
>> mons/#changing-a-monitor-s-ip-address
>>
>> There are some important steps in there for monitors.
>>
>>
>> On Wed, Jun 11, 2014 at 12:08 PM, Fred Yang > > wrote:
>>
>> We need to move Ceph cluster to different network segment for
>> interconnectivity between mon and osc, anybody has the procedure
>> regarding how that can be done? Note that the host name
>> reference will be changed, so originally the osd host referenced
>> as cephnode1, in the new segment it will be cephnode1-n.
>>
>> Thanks,
>> Fred
>>
>> Sent from my Samsung Galaxy S3
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>>
>> --
>> John Wilkins
>> Senior Technical Writer
>> Intank
>> john.wilk...@inktank.com 
>> (415) 425-9599 
>> http://inktank.com
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] Moving Ceph cluster to different network segment

2014-06-13 Thread Jake Young
I recently changed IP and hostname of an osd node running dumpling and had
no problems.

You do need to have your ceph.conf file built correctly or your osds won't
start. Make sure the new IPs and new hostname are in there before you
change the IP.

The crushmap showed a new bucket (host name) containing the osds that were
moved and the original bucket remained in the crushmap, but with no
children. I was able to unlink the original bucket with no problem.

Jake
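
A sketch of that cleanup, using the example hostname from earlier in the thread (only once the old bucket really has no children):

$ ceph osd tree                        # confirm the old host bucket is empty
$ ceph osd crush unlink cephnode1      # or: ceph osd crush remove cephnode1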

On Friday, June 13, 2014, Fred Yang  wrote:

> Wido,
> So the cluster reference osd based on the hostname, or the
> GUID(hopefully)? Note that I mentioned in original email the hostname
> associated to the IP will also be changed as well, it will be as simple as
> changing IP and restart osd? I remembered I tested in Dumpling a while ago
> and it didn't work, this cluster is running on Emperor and not sure whether
> that will make any difference.
>
> Fred
> On Jun 13, 2014 7:51 AM, "Wido den Hollander"  > wrote:
>
>> On 06/13/2014 01:41 PM, Fred Yang wrote:
>>
>>> Thanks, John.
>>>
>>> That seems will take care of monitors, how about osd? Any idea how to
>>> change IP addresses without triggering a resync?
>>>
>>>
>> IPs of OSDs are dynamic. Their IP is no part of the data distribution.
>> Simply renumber them and restart the daemon.
>>
>> I suggest:
>>
>> 1. Stop OSD(s)
>> 2. Renumber machine
>> 3. Start OSD(s)
>>
>> That should be all. There will be some recovery due to I/Os which
>> occurred between 1 and 3.
>>
>> Wido
>>
>>  Fred
>>>
>>> Sent from my Samsung Galaxy S3
>>>
>>> On Jun 12, 2014 1:21 PM, "John Wilkins" >> 
>>> >> >> wrote:
>>>
>>> Fred,
>>>
>>> I'm not sure it will completely answer your question, but I would
>>> definitely have a look at:
>>> http://ceph.com/docs/master/rados/operations/add-or-rm-
>>> mons/#changing-a-monitor-s-ip-address
>>>
>>> There are some important steps in there for monitors.
>>>
>>>
>>> On Wed, Jun 11, 2014 at 12:08 PM, Fred Yang >> 
>>> >> >> wrote:
>>>
>>> We need to move Ceph cluster to different network segment for
>>> interconnectivity between mon and osc, anybody has the procedure
>>> regarding how that can be done? Note that the host name
>>> reference will be changed, so originally the osd host referenced
>>> as cephnode1, in the new segment it will be cephnode1-n.
>>>
>>> Thanks,
>>> Fred
>>>
>>> Sent from my Samsung Galaxy S3
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>>  >> ceph-users@lists.ceph.com
>>> >
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>>
>>> --
>>> John Wilkins
>>> Senior Technical Writer
>>> Intank
>>> john.wilk...@inktank.com
>>>  >> john.wilk...@inktank.com
>>> >
>>> (415) 425-9599 
>>> http://inktank.com
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>


Re: [ceph-users] Strange qemu-rbd I/O behavior when booting Windows VM

2014-06-13 Thread Andrey Korolyov
On Fri, Jun 13, 2014 at 7:09 AM, Ke-fei Lin  wrote:
> Hi list,
>
> I deployed a Windows 7 VM with qemu-rbd disk, and got an unexpected booting
> phase performance.
>
> I discovered that when booting the Windows VM up, there are consecutive ~2
> minutes that `ceph -w` gives me an interesting log like: "... 567 KB/s rd,
> 567 op/s", "... 789 KB/s rd, 789 op/s" and so on.
>
> e.g.
> 2014-06-05 15:47:43.125441 mon.0 [INF] pgmap v18095: 320 pgs: 320
> active+clean; 86954 MB data, 190 GB used, 2603 GB / 2793 GB avail; 765 kB/s
> rd, 765 op/s
> 2014-06-05 15:47:44.240662 mon.0 [INF] pgmap v18096: 320 pgs: 320
> active+clean; 86954 MB data, 190 GB used, 2603 GB / 2793 GB avail; 568 kB/s
> rd, 568 op/s
> ... (skipped)
> 2014-06-05 15:50:02.441523 mon.0 [INF] pgmap v18186: 320 pgs: 320
> active+clean; 86954 MB data, 190 GB used, 2603 GB / 2793 GB avail; 412 kB/s
> rd, 412 op/s
>
> Which shows the number of rps is always the same as the number of ops, i.e.
> every operation is nearly 1KB, and I think this leads a very long boot time
> (takes 2 mins to enter desktop). But I can't understand why, is it an issue
> of my Ceph cluster? Or just some special I/O patterns in Windows VM booting
> process?
>
> In addition, I know that there are no qemu-rbd caching benefits during boot
> phase since the cache is not persistent (please corrects me), so is it
> possible to enlarge the read_ahead size in qemu-rbd driver? And does this
> make any sense?
>
> And finally, how can I tune up my Ceph cluster for this workload (booting
> Windows VM)?
>
> Any advice and suggestions will be greatly appreciated.
>
>
> Context:
>
> 4 OSDs (7200rpm/750GB/SATA) with replication factor 2.
>
> The system disk in Windows VM is NTFS formatted with default 4K block size.
>
> $ uname -a
> Linux ceph-consumer 3.11.0-22-generic #38~precise1-Ubuntu SMP Fri May 16
> 20:47:57 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> $ ceph --version
> ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
>
> $ dpkg -l | grep rbd
> ii  librbd-dev   0.80.1-1precise
> RADOS block device client library (development files)
> ii  librbd1  0.80.1-1precise
> RADOS block device client library
>
> $ virsh version
> Compiled against library: libvir 0.9.8
> Using library: libvir 0.9.8
> Using API: QEMU 0.9.8
> Running hypervisor: QEMU 1.7.1 ()
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

Hi,

If you are able to leave only this VM running in the cluster to check,
you could perhaps use the accumulated values from virsh domblkstat to
compare the real number of operations.


Re: [ceph-users] Strange qemu-rbd I/O behavior when booting Windows VM

2014-06-13 Thread Ke-fei Lin
2014-06-13 21:23 GMT+08:00 Andrey Korolyov :

> On Fri, Jun 13, 2014 at 7:09 AM, Ke-fei Lin  wrote:
> > Hi list,
> >
> > I deployed a Windows 7 VM with qemu-rbd disk, and got an unexpected
> booting
> > phase performance.
> >
> > I discovered that when booting the Windows VM up, there are consecutive
> ~2
> > minutes that `ceph -w` gives me an interesting log like: "... 567 KB/s
> rd,
> > 567 op/s", "... 789 KB/s rd, 789 op/s" and so on.
> >
> > e.g.
> > 2014-06-05 15:47:43.125441 mon.0 [INF] pgmap v18095: 320 pgs: 320
> > active+clean; 86954 MB data, 190 GB used, 2603 GB / 2793 GB avail; 765
> kB/s
> > rd, 765 op/s
> > 2014-06-05 15:47:44.240662 mon.0 [INF] pgmap v18096: 320 pgs: 320
> > active+clean; 86954 MB data, 190 GB used, 2603 GB / 2793 GB avail; 568
> kB/s
> > rd, 568 op/s
> > ... (skipped)
> > 2014-06-05 15:50:02.441523 mon.0 [INF] pgmap v18186: 320 pgs: 320
> > active+clean; 86954 MB data, 190 GB used, 2603 GB / 2793 GB avail; 412
> kB/s
> > rd, 412 op/s
> >
> > Which shows the number of rps is always the same as the number of ops,
> i.e.
> > every operation is nearly 1KB, and I think this leads a very long boot
> time
> > (takes 2 mins to enter desktop). But I can't understand why, is it an
> issue
> > of my Ceph cluster? Or just some special I/O patterns in Windows VM
> booting
> > process?
> >
> > In addition, I know that there are no qemu-rbd caching benefits during
> boot
> > phase since the cache is not persistent (please corrects me), so is it
> > possible to enlarge the read_ahead size in qemu-rbd driver? And does this
> > make any sense?
> >
> > And finally, how can I tune up my Ceph cluster for this workload (booting
> > Windows VM)?
> >
> > Any advice and suggestions will be greatly appreciated.
> >
> >
> > Context:
> >
> > 4 OSDs (7200rpm/750GB/SATA) with replication factor 2.
> >
> > The system disk in Windows VM is NTFS formatted with default 4K block
> size.
> >
> > $ uname -a
> > Linux ceph-consumer 3.11.0-22-generic #38~precise1-Ubuntu SMP Fri
> May 16
> > 20:47:57 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> >
> > $ ceph --version
> > ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
> >
> > $ dpkg -l | grep rbd
> > ii  librbd-dev   0.80.1-1precise
> > RADOS block device client library (development files)
> > ii  librbd1  0.80.1-1precise
> > RADOS block device client library
> >
> > $ virsh version
> > Compiled against library: libvir 0.9.8
> > Using library: libvir 0.9.8
> > Using API: QEMU 0.9.8
> > Running hypervisor: QEMU 1.7.1 ()
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> Hi,
>
> If you are able to leave only this VM in cluster scope to check,
> you`ll perhaps may use virsh domblkstat accumulated values to compare
> real number of operations.
>

Thanks, Andrey.

I tried `virsh domblkstat  hda` (only this VM in whole cluster) and got
these values:

hda rd_req 70682
hda rd_bytes 229894656
hda wr_req 1067
hda wr_bytes 12645888
hda flush_operations 0

(These values became stable after ~2 mins)

While the output of `ceph -w` is attached at: http://pastebin.com/Uhdj9drV

Any advice?


Re: [ceph-users] Strange qemu-rbd I/O behavior when booting Windows VM

2014-06-13 Thread Andrey Korolyov
On Fri, Jun 13, 2014 at 5:50 PM, Ke-fei Lin  wrote:
> 2014-06-13 21:23 GMT+08:00 Andrey Korolyov :
>
>> On Fri, Jun 13, 2014 at 7:09 AM, Ke-fei Lin  wrote:
>> > Hi list,
>> >
>> > I deployed a Windows 7 VM with qemu-rbd disk, and got an unexpected
>> > booting
>> > phase performance.
>> >
>> > I discovered that when booting the Windows VM up, there are consecutive
>> > ~2
>> > minutes that `ceph -w` gives me an interesting log like: "... 567 KB/s
>> > rd,
>> > 567 op/s", "... 789 KB/s rd, 789 op/s" and so on.
>> >
>> > e.g.
>> > 2014-06-05 15:47:43.125441 mon.0 [INF] pgmap v18095: 320 pgs: 320
>> > active+clean; 86954 MB data, 190 GB used, 2603 GB / 2793 GB avail; 765
>> > kB/s
>> > rd, 765 op/s
>> > 2014-06-05 15:47:44.240662 mon.0 [INF] pgmap v18096: 320 pgs: 320
>> > active+clean; 86954 MB data, 190 GB used, 2603 GB / 2793 GB avail; 568
>> > kB/s
>> > rd, 568 op/s
>> > ... (skipped)
>> > 2014-06-05 15:50:02.441523 mon.0 [INF] pgmap v18186: 320 pgs: 320
>> > active+clean; 86954 MB data, 190 GB used, 2603 GB / 2793 GB avail; 412
>> > kB/s
>> > rd, 412 op/s
>> >
>> > Which shows the number of rps is always the same as the number of ops,
>> > i.e.
>> > every operation is nearly 1KB, and I think this leads a very long boot
>> > time
>> > (takes 2 mins to enter desktop). But I can't understand why, is it an
>> > issue
>> > of my Ceph cluster? Or just some special I/O patterns in Windows VM
>> > booting
>> > process?
>> >
>> > In addition, I know that there are no qemu-rbd caching benefits during
>> > boot
>> > phase since the cache is not persistent (please corrects me), so is it
>> > possible to enlarge the read_ahead size in qemu-rbd driver? And does
>> > this
>> > make any sense?
>> >
>> > And finally, how can I tune up my Ceph cluster for this workload
>> > (booting
>> > Windows VM)?
>> >
>> > Any advice and suggestions will be greatly appreciated.
>> >
>> >
>> > Context:
>> >
>> > 4 OSDs (7200rpm/750GB/SATA) with replication factor 2.
>> >
>> > The system disk in Windows VM is NTFS formatted with default 4K block
>> > size.
>> >
>> > $ uname -a
>> > Linux ceph-consumer 3.11.0-22-generic #38~precise1-Ubuntu SMP Fri
>> > May 16
>> > 20:47:57 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>> >
>> > $ ceph --version
>> > ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
>> >
>> > $ dpkg -l | grep rbd
>> > ii  librbd-dev   0.80.1-1precise
>> > RADOS block device client library (development files)
>> > ii  librbd1  0.80.1-1precise
>> > RADOS block device client library
>> >
>> > $ virsh version
>> > Compiled against library: libvir 0.9.8
>> > Using library: libvir 0.9.8
>> > Using API: QEMU 0.9.8
>> > Running hypervisor: QEMU 1.7.1 ()
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>> Hi,
>>
>> If you are able to leave only this VM in cluster scope to check,
>> you`ll perhaps may use virsh domblkstat accumulated values to compare
>> real number of operations.
>
>
> Thanks, Andrey.
>
> I tried `virsh domblkstat  hda` (only this VM in whole cluster) and got
> these values:
>
> hda rd_req 70682
> hda rd_bytes 229894656
> hda wr_req 1067
> hda wr_bytes 12645888
> hda flush_operations 0
>
> (These values became stable after ~2 mins)
>
> While the output of `ceph -w` is attached at: http://pastebin.com/Uhdj9drV
>
> Any advices?


Thanks. A poor man's analysis shows that it can be true: assuming a
median heartbeat value of 1.2s, overall read ops are about 40k, which
is close enough to what the qemu stats say, given the floating
heartbeat interval. Because ceph -w was never meant as a precise
measurement tool, I suggest measuring the block stats difference over
smaller intervals, about 1s or so, and comparing the values then. By the
way, which driver do you use in qemu for the block device?
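
A simple way to sample those deltas once per second (a sketch; the domain name and device are placeholders):

$ for i in $(seq 10); do virsh domblkstat win7 hda | egrep 'rd_(req|bytes)'; sleep 1; done

Dividing the difference in rd_bytes by the difference in rd_req between two consecutive samples gives the average read request size.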


Re: [ceph-users] Strange qemu-rbd I/O behavior when booting Windows VM

2014-06-13 Thread Ke-fei Lin
2014-06-13 22:04 GMT+08:00 Andrey Korolyov :
>
> On Fri, Jun 13, 2014 at 5:50 PM, Ke-fei Lin  wrote:
> > Thanks, Andrey.
> >
> > I tried `virsh domblkstat  hda` (only this VM in whole cluster) and
got
> > these values:
> >
> > hda rd_req 70682
> > hda rd_bytes 229894656
> > hda wr_req 1067
> > hda wr_bytes 12645888
> > hda flush_operations 0
> >
> > (These values became stable after ~2 mins)
> >
> > While the output of `ceph -w` is attached at:
http://pastebin.com/Uhdj9drV
> >
> > Any advices?
>
>
> Thanks, poor man`s analysis shows that it can be true - assuming
> median heartbeat value as 1.2s, overall read ops are about 40k, which
> is close enough to what qemu stats saying, regarding floating
> heartbeat interval. Because ceph -w never had such value as a precise
> measurement tool, I may suggest to measure block stats difference on
> smaller intervals, about 1s or so, and compare values then. By the
> way, what driver do you use in qemu for a block device?

OK, this time I captured the blkstat difference over a smaller interval
(less than 1s), and a simple calculation gives me this result:

(19531264-19209216)/(38147-37518) = 512
...
(20158976-19531264)/(39373-38147) = 512

Which means that at the beginning of the boot phase, every read request
from the VM is just *512 bytes*. Maybe this is why `ceph -w` shows me
every operation is about 1 KB (in my first post)?

So it seems this is an inherent problem of the Windows VM, but can I do
something in my Ceph cluster's configuration to improve this?

By the way, the relevant part of my VM definition (libvirt XML, mostly stripped by the list archive) shows the emulator as /usr/bin/kvm, with the system disk attached as IDE device hda.

Thanks.


[ceph-users] Why is librbd1 / librados2 from Firefly 20% slower than the one from dumpling?

2014-06-13 Thread Stefan Priebe

Hi,

While testing firefly I came into the situation where I had a client where
the latest dumpling packages were installed (0.67.9).

As my pool has hashpspool set to false and the tunables are set to default, it
can talk to my firefly Ceph storage.

For random 4k writes I'm using fio with librbd, 32 jobs and an iodepth of 32.

I get these results:

librbd / librados2 from dumpling:
  write: io=3020.9MB, bw=103083KB/s, iops=25770, runt= 30008msec
  WRITE: io=3020.9MB, aggrb=103082KB/s, minb=103082KB/s, 
maxb=103082KB/s, mint=30008msec, maxt=30008msec


librbd / librados2 from firefly:
  write: io=7344.3MB, bw=83537KB/s, iops=20884, runt= 90026msec
  WRITE: io=7344.3MB, aggrb=83537KB/s, minb=83537KB/s, maxb=83537KB/s, 
mint=90026msec, maxt=90026msec
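
For reference, an fio invocation along these lines matches that workload (a sketch only; the actual job file isn't shown here, and the pool/image/client names are placeholders):

$ fio --name=rbd-4k-randwrite --ioengine=rbd --clientname=admin --pool=rbd \
      --rbdname=testimg --rw=randwrite --bs=4k --numjobs=32 --iodepth=32 \
      --direct=1 --time_based --runtime=30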


Stefan


Re: [ceph-users] Strange qemu-rbd I/O behavior when booting Windows VM

2014-06-13 Thread Sage Weil
Right now, no.

We could add a minimum read size to librbd when caching is enabled...  
that would not be particularly difficult.

sage


On Fri, 13 Jun 2014, Ke-fei Lin wrote:

> 2014-06-13 22:04 GMT+08:00 Andrey Korolyov :
> >
> > On Fri, Jun 13, 2014 at 5:50 PM, Ke-fei Lin  wrote:
> > > Thanks, Andrey.
> > >
> > > I tried `virsh domblkstat  hda` (only this VM in whole cluster) and
> got
> > > these values:
> > >
> > > hda rd_req 70682
> > > hda rd_bytes 229894656
> > > hda wr_req 1067
> > > hda wr_bytes 12645888
> > > hda flush_operations 0
> > >
> > > (These values became stable after ~2 mins)
> > >
> > > While the output of `ceph -w` is attached at:
> http://pastebin.com/Uhdj9drV
> > >
> > > Any advices?
> >
> >
> > Thanks, poor man`s analysis shows that it can be true - assuming
> > median heartbeat value as 1.2s, overall read ops are about 40k, which
> > is close enough to what qemu stats saying, regarding floating
> > heartbeat interval. Because ceph -w never had such value as a precise
> > measurement tool, I may suggest to measure block stats difference on
> > smaller intervals, about 1s or so, and compare values then. By the
> > way, what driver do you use in qemu for a block device?
> 
> OK, this time I capture the blkstat difference in a smaller interval (less
> than 1s).
> And a simple calculation gives me some result:
> 
> (19531264-19209216)/(38147-37518) = 512
> ...
> (20158976-19531264)/(39373-38147) = 512
> 
> Which means in the beginning of boot phase, every read request from VM is
> just *512 byte*.
> Maybe this is why `ceph -w` shows me every operation is about 1KB (in my
> first post)?
> 
> So seems this is the inherent problem of Windows VM, but can I do something
> in my Ceph
> cluster's configuration to improve this?
> 
> By the way the related part of my VM definition are:
> 
>     /usr/bin/kvm
>     
>       
>       
>         
>       
>       
>       
>     
> 
> Thanks.
> 
> ___


Re: [ceph-users] Strange qemu-rbd I/O behavior when booting Windows VM

2014-06-13 Thread Andrey Korolyov
In my belief, a lot of sequential small reads will be aggregated after
all when targeting filestore contents (of course, only if issuing the
next one does not depend on the status of the previous read; otherwise
they'll be separated in time in such a way that the rotating-media
scheduler will not be able to combine the requests), am I wrong? If so,
this case only affects OSD CPU consumption (at very large scale).
Ke-fei, is there any real reason behind staying on IDE rather than LSI
SCSI emulation/virtio?

On Fri, Jun 13, 2014 at 8:11 PM, Sage Weil  wrote:
> Right now, no.
>
> We could add a minimum read size to librbd when caching is enabled...
> that would not be particularly difficult.
>
> sage
>
>
> On Fri, 13 Jun 2014, Ke-fei Lin wrote:
>
>> 2014-06-13 22:04 GMT+08:00 Andrey Korolyov :
>> >
>> > On Fri, Jun 13, 2014 at 5:50 PM, Ke-fei Lin  wrote:
>> > > Thanks, Andrey.
>> > >
>> > > I tried `virsh domblkstat  hda` (only this VM in whole cluster) and
>> got
>> > > these values:
>> > >
>> > > hda rd_req 70682
>> > > hda rd_bytes 229894656
>> > > hda wr_req 1067
>> > > hda wr_bytes 12645888
>> > > hda flush_operations 0
>> > >
>> > > (These values became stable after ~2 mins)
>> > >
>> > > While the output of `ceph -w` is attached at:
>> http://pastebin.com/Uhdj9drV
>> > >
>> > > Any advices?
>> >
>> >
>> > Thanks, poor man`s analysis shows that it can be true - assuming
>> > median heartbeat value as 1.2s, overall read ops are about 40k, which
>> > is close enough to what qemu stats saying, regarding floating
>> > heartbeat interval. Because ceph -w never had such value as a precise
>> > measurement tool, I may suggest to measure block stats difference on
>> > smaller intervals, about 1s or so, and compare values then. By the
>> > way, what driver do you use in qemu for a block device?
>>
>> OK, this time I capture the blkstat difference in a smaller interval (less
>> than 1s).
>> And a simple calculation gives me some result:
>>
>> (19531264-19209216)/(38147-37518) = 512
>> ...
>> (20158976-19531264)/(39373-38147) = 512
>>
>> Which means in the beginning of boot phase, every read request from VM is
>> just *512 byte*.
>> Maybe this is why `ceph -w` shows me every operation is about 1KB (in my
>> first post)?
>>
>> So seems this is the inherent problem of Windows VM, but can I do something
>> in my Ceph
>> cluster's configuration to improve this?
>>
>> By the way the related part of my VM definition are:
>>
>> /usr/bin/kvm
>> 
>>   
>>   
>> 
>>   
>>   
>>   
>> 
>>
>> Thanks.
>>
>>


Re: [ceph-users] Strange qemu-rbd I/O behavior when booting Windows VM

2014-06-13 Thread Ke-fei Lin
2014-06-14 0:11 GMT+08:00 Sage Weil :
> Right now, no.
>
> We could add a minimum read size to librbd when caching is enabled...
> that would not be particularly difficult.
>
> sage

Thanks, so is it possible to set some option like *readahead* in librbd
or QEMU? It seems no docs mention this...

By the way I found a discussion (1 year ago) about persistent caching
on QEMU's mailing list:
https://lists.gnu.org/archive/html/qemu-devel/2013-06/msg03649.html
Is there any work currently in progress?


Re: [ceph-users] Strange qemu-rbd I/O behavior when booting Windows VM

2014-06-13 Thread Ke-fei Lin
2014-06-14 0:25 GMT+08:00 Andrey Korolyov :
> In my belief, lot of sequential small reads will be aggregated after
> all when targeting filestore contents (of course if the moment of issuing
> next one is not dependent on status of previous read, otherwise
> they`ll be separated in time in such way that the rotating media
> scheduler will not be able to combine requests), am I wrong? If so,
I think so too.
> this case only affects OSD CPU consumption (on very large scale).
> Ke-fei, is there any real reasons behind staying on IDE and not LSI
> SCSI emulation/virtio?
My bad! Will perform another test on scsi/virtio next week. Thanks.


Re: [ceph-users] Strange qemu-rbd I/O behavior when booting Windows VM

2014-06-13 Thread Sage Weil
On Sat, 14 Jun 2014, Ke-fei Lin wrote:
> 2014-06-14 0:11 GMT+08:00 Sage Weil :
> > Right now, no.
> >
> > We could add a minimum read size to librbd when caching is enabled...
> > that would not be particularly difficult.
> >
> > sage
> 
> Thanks, so is it possible to set some options like *readahead* in librbd
> or QEMU? Seems no docs mentioned this...

We've stayed away from readahead because this is normally done by the fs 
sitting on top of RBD.

> By the way I found a discussion (1 year ago) about persistent caching
> on QEMU's mailing list:
> https://lists.gnu.org/archive/html/qemu-devel/2013-06/msg03649.html
> Is there any work currently in progress?

Nope!  There was a blueprint on the subject a few CDS's ago, though:

http://wiki.ceph.com/Planning/Sideboard/rbd%3A_shared_read_cache
http://pad.ceph.com/p/rbd-shared-read-cache
http://www.youtube.com/watch?v=SVgBdUv_Lv4&t=70m11s

sage


Re: [ceph-users] Slow IOPS on RBD compared to journal and backing devices

2014-06-13 Thread Josef Johansson

Hey,

I did try this, but it didn't work, so I think I still have to patch
the kernel, as user_xattr is not allowed on tmpfs.


Thanks for the description though.

I think the next step is to do it all virtually, maybe on the same
hardware to avoid the network.
Any problems with doing it all virtually? If it's just memory and the same
machine, we should see the pure Ceph performance, right?


Anyone done this?

Cheers,
Josef

Stefan Priebe - Profihost AG skrev 2014-05-15 09:58:

Am 15.05.2014 09:56, schrieb Josef Johansson:

On 15/05/14 09:11, Stefan Priebe - Profihost AG wrote:

Am 15.05.2014 00:26, schrieb Josef Johansson:

Hi,

So, apparently tmpfs does not support non-root xattr due to a possible
DoS-vector. There's configuration set for enabling it as far as I can see.

CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y

Anyone know a way around it? Saw that there's a patch for enabling it,
but recompiling my kernel is out of reach right now ;)

I would create an empty file in tmpfs and then format that file as a
block device.

How do you mean exactly? Creating with dd and mounting with losetup?

mount -t tmpfs -o size=4G /mnt /mnt
dd if=/dev/zero of=/mnt/blockdev_a bs=1M count=4000
mkfs.xfs -f /mnt/blockdev_a
mount -o loop /mnt/blockdev_a /ceph/osd.X

Then use /mnt/blockdev_a as the OSD device.


Cheers,
Josef

Created the osd with following:

root@osd1:/# dd seek=6G if=/dev/zero of=/dev/shm/test-osd/img bs=1 count=1
root@osd1:/# losetup /dev/loop0 /dev/shm/test-osd/img
root@osd1:/# mkfs.xfs /dev/loop0
root@osd1:/# ceph osd create
50
root@osd1:/# mkdir /var/lib/ceph/osd/ceph-50
root@osd1:/# mount -t xfs /dev/loop0 /var/lib/ceph/osd/ceph-50
root@osd1:/# ceph-osd --debug_ms 50 -i 50 --mkfs --mkkey
--osd-journal=/dev/sdc7 --mkjournal
2014-05-15 00:20:29.796822 7f40063bb780 -1 journal FileJournal::_open:
aio not supported without directio; disabling aio
2014-05-15 00:20:29.798583 7f40063bb780 -1 journal check: ondisk fsid
bc14ff30-e016-4e0d-9672-96262ee5f07e doesn't match expected
b3f5b98b-e024-4153-875d-5c758a6060eb, invalid (someone else's?) journal
2014-05-15 00:20:29.802155 7f40063bb780 -1 journal FileJournal::_open:
aio not supported without directio; disabling aio
2014-05-15 00:20:29.807237 7f40063bb780 -1
filestore(/var/lib/ceph/osd/ceph-50) could not find
23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2014-05-15 00:20:29.809083 7f40063bb780 -1 created object store
/var/lib/ceph/osd/ceph-50 journal /dev/sdc7 for osd.50 fsid
c51a2683-55dc-4634-9d9d-f0fec9a6f389
2014-05-15 00:20:29.809121 7f40063bb780 -1 auth: error reading file:
/var/lib/ceph/osd/ceph-50/keyring: can't open
/var/lib/ceph/osd/ceph-50/keyring: (2) No such file or directory
2014-05-15 00:20:29.809179 7f40063bb780 -1 created new key in keyring
/var/lib/ceph/osd/ceph-50/keyring
root@osd1:/# ceph-osd --debug_ms 50 -i 50 --mkfs --mkkey
--osd-journal=/dev/sdc7 --mkjournal
2014-05-15 00:20:51.122716 7ff813ba4780 -1 journal FileJournal::_open:
aio not supported without directio; disabling aio
2014-05-15 00:20:51.126275 7ff813ba4780 -1 journal FileJournal::_open:
aio not supported without directio; disabling aio
2014-05-15 00:20:51.129532 7ff813ba4780 -1 provided osd id 50 !=
superblock's -1
2014-05-15 00:20:51.129845 7ff813ba4780 -1  ** ERROR: error creating
empty object store in /var/lib/ceph/osd/ceph-50: (22) Invalid argument

Cheers,
Josef

Christian Balzer skrev 2014-05-14 14:33:

Hello!

On Wed, 14 May 2014 11:29:47 +0200 Josef Johansson wrote:


Hi Christian,

I missed this thread, haven't been reading the list that well the last
weeks.

You already know my setup, since we discussed it in an earlier thread. I
don't have a fast backing store, but I see the slow IOPS when doing
randwrite inside the VM, with rbd cache. Still running dumpling here
though.


Nods, I do recall that thread.


A thought struck me that I could test with a pool that consists of OSDs
that have tempfs-based disks, think I have a bit more latency than your
IPoIB but I've pushed 100k IOPS with the same network devices before.
This would verify if the problem is with the journal disks. I'll also
try to run the journal devices in tempfs as well, as it would test
purely Ceph itself.


That would be interesting indeed.
Given what I've seen (with the journal at 20% utilization and the actual
filestore ataround 5%) I'd expect Ceph to be the culprit.
  

I'll get back to you with the results, hopefully I'll manage to get them
done during this night.


Looking forward to that. ^^


Christian

Cheers,
Josef

On 13/05/14 11:03, Christian Balzer wrote:

I'm clearly talking to myself, but whatever.

For Greg, I've played with all the pertinent journal and filestore
options and TCP nodelay, no changes at all.

Is there anybody on this ML who's running a Ceph cluster with a fast
network and FAST filestore, so like me with a big HW cache in front of
a RAID/JBODs or using SSDs for final storage?

If so, what results do you get out of the fio statement below per OSD?

Re: [ceph-users] Slow IOPS on RBD compared to journal and backing devices

2014-06-13 Thread Josef Johansson

Hey,

That sounds awful. Have you had any luck in increasing the performance?

Cheers,
Josef

Christian Balzer skrev 2014-05-23 17:57:

For what it's worth (very little in my case)...

Since the cluster wasn't in production yet and Firefly (0.80.1) did hit
Debian Jessie today I upgraded it.

Big mistake...

I did the recommended upgrade song and dance, MONs first, OSDs after that.

Then applied "ceph osd crush tunables default" as per the update
instructions and since "ceph -s" was whining about it.

Lastly I did a "ceph osd pool set rbd hashpspool true" and after that was
finished (people with either a big cluster or slow network probably should
avoid this like the plague) I re-ran the below fio from a VM (old or new
client libraries made no difference) again.

The result, 2800 write IOPS instead of 3200 with Emperor.

So much for improved latency and whatnot...

Christian

On Wed, 14 May 2014 21:33:06 +0900 Christian Balzer wrote:


Hello!

On Wed, 14 May 2014 11:29:47 +0200 Josef Johansson wrote:


Hi Christian,

I missed this thread, haven't been reading the list that well the last
weeks.

You already know my setup, since we discussed it in an earlier thread.
I don't have a fast backing store, but I see the slow IOPS when doing
randwrite inside the VM, with rbd cache. Still running dumpling here
though.


Nods, I do recall that thread.


A thought struck me that I could test with a pool that consists of OSDs
that have tempfs-based disks, think I have a bit more latency than your
IPoIB but I've pushed 100k IOPS with the same network devices before.
This would verify if the problem is with the journal disks. I'll also
try to run the journal devices in tempfs as well, as it would test
purely Ceph itself.


That would be interesting indeed.
Given what I've seen (with the journal at 20% utilization and the actual
filestore ataround 5%) I'd expect Ceph to be the culprit.
  

I'll get back to you with the results, hopefully I'll manage to get
them done during this night.


Looking forward to that. ^^


Christian

Cheers,
Josef

On 13/05/14 11:03, Christian Balzer wrote:

I'm clearly talking to myself, but whatever.

For Greg, I've played with all the pertinent journal and filestore
options and TCP nodelay, no changes at all.

Is there anybody on this ML who's running a Ceph cluster with a fast
network and FAST filestore, so like me with a big HW cache in front
of a RAID/JBODs or using SSDs for final storage?

If so, what results do you get out of the fio statement below per
OSD? In my case with 4 OSDs and 3200 IOPS that's about 800 IOPS per
OSD, which is of course vastly faster than the normal indvidual HDDs
could do.

So I'm wondering if I'm hitting some inherent limitation of how fast
a single OSD (as in the software) can handle IOPS, given that
everything else has been ruled out from where I stand.

This would also explain why none of the option changes or the use of
RBD caching has any measurable effect in the test case below.
As in, a slow OSD aka single HDD with journal on the same disk would
clearly benefit from even the small 32MB standard RBD cache, while in
my test case the only time the caching becomes noticeable is if I
increase the cache size to something larger than the test data size.
^o^

On the other hand if people here regularly get thousands or tens of
thousands IOPS per OSD with the appropriate HW I'm stumped.

Christian

On Fri, 9 May 2014 11:01:26 +0900 Christian Balzer wrote:


On Wed, 7 May 2014 22:13:53 -0700 Gregory Farnum wrote:


Oh, I didn't notice that. I bet you aren't getting the expected
throughput on the RAID array with OSD access patterns, and that's
applying back pressure on the journal.


In the a "picture" being worth a thousand words tradition, I give
you this iostat -x output taken during a fio run:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          50.82    0.00   19.43    0.17    0.00   29.58

Device:  rrqm/s  wrqm/s    r/s      w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0.00   51.50   0.00  1633.50     0.00   7460.00     9.13     0.18   0.11    0.00    0.11   0.01   1.40
sdb        0.00    0.00   0.00  1240.50     0.00   5244.00     8.45     0.30   0.25    0.00    0.25   0.02   2.00
sdc        0.00    5.00   0.00  2468.50     0.00  13419.00    10.87     0.24   0.10    0.00    0.10   0.09  22.00
sdd        0.00    6.50   0.00  1913.00     0.00  10313.00    10.78     0.20   0.10    0.00    0.10   0.09  16.60

The %user CPU utilization is pretty much entirely the 2 OSD
processes, note the nearly complete absence of iowait.

sda and sdb are the OSDs RAIDs, sdc and sdd are the journal SSDs.
Look at these numbers, the lack of queues, the low wait and service
times (this is in ms) plus overall utilization.

The only conclusion I can draw from these numbers and the network
results below is that the latency happens within the OSD processes.

Regards,

Christian

When I suggested other tests, I meant with 

Re: [ceph-users] bootstrap-mds, bootstrap-osd and admin keyring not found

2014-06-13 Thread Zhe Zhang
Shayan Saeed  writes:

> 
> 
> 
> Hi,
> I am following the standard deployment guide for ceph firefly. When I try
to do the step 5 for collecting the key, it gives me warnings saying that
keyrings not found for bootstrap-mds, bootstrap-osd and admin due to which
the next step for deploying osds fail. Other people on this forum have had a
similar problem in the past. How can this problem be solved?
> 
> 
> 
> Regards,Shayan Saeed
> 
> 
> 
> 
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@...
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

You could create the keyrings with ceph-make-keys. Did you build Ceph from
source?
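
If the cluster was deployed with ceph-deploy, re-gathering the keys from a monitor is usually enough; a sketch (the monitor hostname is a placeholder):

$ ceph-deploy gatherkeys mon1
# or, if the bootstrap keys were never generated on the monitors:
$ ceph-deploy mon create-initial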





[ceph-users] Run ceph from source code

2014-06-13 Thread Zhe Zhang
Hello, there,

I am trying to run Ceph from source code. configure, make and make install
worked fine, but after doing these steps I can't see an init script in
/etc/init.d/. My current OS is CentOS 6.5. I also tried Ubuntu 12.04; the same
issue occurred, saying "unknown job ceph..." when I tried to use upstart to
run the monitors and OSDs. How should I start Ceph built from source? Basically
I hope I can modify the code and run it from there.

Zhe


Re: [ceph-users] Run ceph from source code

2014-06-13 Thread Gregory Farnum
I don't know anybody who makes much use of "make install", so it's
probably not putting the init system scripts into place. So make sure
they aren't there, copy them from the source tree, and try again?
Patches to fix are welcome! :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
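
A sketch of that, assuming an autotools build where ./configure has generated the sysvinit script as src/init-ceph in the source tree:

$ sudo install -m 0755 src/init-ceph /etc/init.d/ceph
$ sudo /etc/init.d/ceph start          # or: sudo service ceph start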


On Fri, Jun 13, 2014 at 1:41 PM, Zhe Zhang  wrote:
> Hello, there,
>
>
>
> I am trying to run ceph from source code. configure, make and make install
> worked fine. But after done these steps, I can't see the binary files in
> /etc/init.d/. My current OS is Centos6.5. I also tried Ubuntu 12.04, the
> same issue occurred which said "unknown job ceph..." when I tried to use
> upstart to run monitors and osds. How should I start ceph with source code?
> basically I hope I could modified the code and run it from there.
>
>
>
> Zhe
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


[ceph-users] OSD turned itself off

2014-06-13 Thread Josef Johansson

Hey,

Just examining what happened to an OSD that just turned itself off. Data has
been moved away from it, so I'm hesitant to turn it back on.


Got the below in the logs, any clues to what the assert talks about?

Cheers,
Josef

-1 os/FileStore.cc: In function 'virtual int FileStore::read(coll_t, 
const hobject_t&, uint64_t, size_t, ceph::bufferlist&, bool)' thread 
7fdacb88

c700 time 2014-06-11 21:13:54.036982
os/FileStore.cc: 2992: FAILED assert(allow_eio || !m_filestore_fail_eio 
|| got != -5)


 ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73)
 1: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned 
long, ceph::buffer::list&, bool)+0x653) [0x8ab6c3]
 2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, 
std::vector >&)+0x350) [0x708230]
 3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x86) 
[0x713366]
 4: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x3095) 
[0x71acb5]
 5: (PG::do_request(std::tr1::shared_ptr, 
ThreadPool::TPHandle&)+0x3f0) [0x812340]
 6: (OSD::dequeue_op(boost::intrusive_ptr, 
std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x2ea) [0x75c80a]
 7: (OSD::OpWQ::_process(boost::intrusive_ptr, 
ThreadPool::TPHandle&)+0x198) [0x770da8]
 8: (ThreadPool::WorkQueueVal, 
std::tr1::shared_ptr >, boost::intrusive_ptr 
>::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x7a89

ce]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x9b5dea]
 10: (ThreadPool::WorkThread::entry()+0x10) [0x9b7040]
 11: (()+0x6b50) [0x7fdadffdfb50]
 12: (clone()+0x6d) [0x7fdade53b0ed]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.





Re: [ceph-users] OSD turned itself off

2014-06-13 Thread Gregory Farnum
The OSD did a read off of the local filesystem and it got back the EIO
error code. That means the store got corrupted or something, so it
killed itself to avoid spreading bad data to the rest of the cluster.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
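
Before deciding whether to bring the OSD back, it's usually worth checking the underlying disk and filesystem; a sketch (device names are placeholders):

$ dmesg | grep -i 'i/o error' | tail
$ sudo smartctl -a /dev/sdX | egrep -i 'reallocated|pending|uncorrect'
$ sudo xfs_repair -n /dev/sdX1         # read-only check, with the OSD filesystem unmounted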


On Fri, Jun 13, 2014 at 5:16 PM, Josef Johansson  wrote:
> Hey,
>
> Just examing what happened to an OSD, that was just turned off. Data has
> been moved away from it, so hesitating to turned it back on.
>
> Got the below in the logs, any clues to what the assert talks about?
>
> Cheers,
> Josef
>
> -1 os/FileStore.cc: In function 'virtual int FileStore::read(coll_t, const
> hobject_t&, uint64_t, size_t, ceph::bufferlist&, bool)' thread 7fdacb88
> c700 time 2014-06-11 21:13:54.036982
> os/FileStore.cc: 2992: FAILED assert(allow_eio || !m_filestore_fail_eio ||
> got != -5)
>
>  ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73)
>  1: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned long,
> ceph::buffer::list&, bool)+0x653) [0x8ab6c3]
>  2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector std::allocator >&)+0x350) [0x708230]
>  3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x86)
> [0x713366]
>  4: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x3095) [0x71acb5]
>  5: (PG::do_request(std::tr1::shared_ptr,
> ThreadPool::TPHandle&)+0x3f0) [0x812340]
>  6: (OSD::dequeue_op(boost::intrusive_ptr,
> std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x2ea) [0x75c80a]
>  7: (OSD::OpWQ::_process(boost::intrusive_ptr,
> ThreadPool::TPHandle&)+0x198) [0x770da8]
>  8: (ThreadPool::WorkQueueVal,
> std::tr1::shared_ptr >, boost::intrusive_ptr
>>::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x7a89
> ce]
>  9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x9b5dea]
>  10: (ThreadPool::WorkThread::entry()+0x10) [0x9b7040]
>  11: (()+0x6b50) [0x7fdadffdfb50]
>  12: (clone()+0x6d) [0x7fdade53b0ed]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to
> interpret this.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD turned itself off

2014-06-13 Thread Josef Johansson

Hi Greg,

Thanks for the clarification. I believe the OSD was in the middle of a 
deep scrub (sorry for not mentioning this straight away), so it could 
have been a silent error that surfaced during the scrub?


What's best practice when the store is corrupted like this?

Cheers,
Josef

Gregory Farnum skrev 2014-06-14 02:21:

The OSD did a read off of the local filesystem and it got back the EIO
error code. That means the store got corrupted or something, so it
killed itself to avoid spreading bad data to the rest of the cluster.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Fri, Jun 13, 2014 at 5:16 PM, Josef Johansson  wrote:

Hey,

Just examing what happened to an OSD, that was just turned off. Data has
been moved away from it, so hesitating to turned it back on.

Got the below in the logs, any clues to what the assert talks about?

Cheers,
Josef

-1 os/FileStore.cc: In function 'virtual int FileStore::read(coll_t, const
hobject_t&, uint64_t, size_t, ceph::bufferlist&, bool)' thread 7fdacb88
c700 time 2014-06-11 21:13:54.036982
os/FileStore.cc: 2992: FAILED assert(allow_eio || !m_filestore_fail_eio ||
got != -5)

  ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73)
  1: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned long,
ceph::buffer::list&, bool)+0x653) [0x8ab6c3]
  2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector >&)+0x350) [0x708230]
  3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x86)
[0x713366]
  4: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x3095) [0x71acb5]
  5: (PG::do_request(std::tr1::shared_ptr,
ThreadPool::TPHandle&)+0x3f0) [0x812340]
  6: (OSD::dequeue_op(boost::intrusive_ptr,
std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x2ea) [0x75c80a]
  7: (OSD::OpWQ::_process(boost::intrusive_ptr,
ThreadPool::TPHandle&)+0x198) [0x770da8]
  8: (ThreadPool::WorkQueueVal,
std::tr1::shared_ptr >, boost::intrusive_ptr

::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x7a89

ce]
  9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x9b5dea]
  10: (ThreadPool::WorkThread::entry()+0x10) [0x9b7040]
  11: (()+0x6b50) [0x7fdadffdfb50]
  12: (clone()+0x6d) [0x7fdade53b0ed]
  NOTE: a copy of the executable, or `objdump -rdS ` is needed to
interpret this.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD turned itself off

2014-06-13 Thread Gregory Farnum
On Fri, Jun 13, 2014 at 5:25 PM, Josef Johansson  wrote:
> Hi Greg,
>
> Thanks for the clarification. I believe the OSD was in the middle of a deep
> scrub (sorry for not mentioning this straight away), so then it could've
> been a silent error that got wind during scrub?

Yeah.

>
> What's best practice when the store is corrupted like this?

Remove the OSD from the cluster, and either reformat the disk or
replace as you judge appropriate.
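
For reference, the usual manual removal sequence on a dumpling-era cluster
looks roughly like this (osd.12 is a placeholder id; use whichever stop
command matches your init system):

$ ceph osd out 12                     # if the OSD is not already marked out
$ sudo /etc/init.d/ceph stop osd.12   # sysvinit; on upstart: sudo stop ceph-osd id=12
$ ceph osd crush remove osd.12        # drop it from the CRUSH map
$ ceph auth del osd.12                # delete its cephx key
$ ceph osd rm 12                      # remove it from the osdmap
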
-Greg

>
> Cheers,
> Josef
>
> Gregory Farnum skrev 2014-06-14 02:21:
>
>> The OSD did a read off of the local filesystem and it got back the EIO
>> error code. That means the store got corrupted or something, so it
>> killed itself to avoid spreading bad data to the rest of the cluster.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Fri, Jun 13, 2014 at 5:16 PM, Josef Johansson 
>> wrote:
>>>
>>> Hey,
>>>
>>> Just examing what happened to an OSD, that was just turned off. Data has
>>> been moved away from it, so hesitating to turned it back on.
>>>
>>> Got the below in the logs, any clues to what the assert talks about?
>>>
>>> Cheers,
>>> Josef
>>>
>>> -1 os/FileStore.cc: In function 'virtual int FileStore::read(coll_t,
>>> const
>>> hobject_t&, uint64_t, size_t, ceph::bufferlist&, bool)' thread 7fdacb88
>>> c700 time 2014-06-11 21:13:54.036982
>>> os/FileStore.cc: 2992: FAILED assert(allow_eio || !m_filestore_fail_eio
>>> ||
>>> got != -5)
>>>
>>>   ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73)
>>>   1: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned
>>> long,
>>> ceph::buffer::list&, bool)+0x653) [0x8ab6c3]
>>>   2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*,
>>> std::vector>> std::allocator >&)+0x350) [0x708230]
>>>   3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x86)
>>> [0x713366]
>>>   4: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x3095)
>>> [0x71acb5]
>>>   5: (PG::do_request(std::tr1::shared_ptr,
>>> ThreadPool::TPHandle&)+0x3f0) [0x812340]
>>>   6: (OSD::dequeue_op(boost::intrusive_ptr,
>>> std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x2ea) [0x75c80a]
>>>   7: (OSD::OpWQ::_process(boost::intrusive_ptr,
>>> ThreadPool::TPHandle&)+0x198) [0x770da8]
>>>   8: (ThreadPool::WorkQueueVal,
>>> std::tr1::shared_ptr >, boost::intrusive_ptr

 ::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x7a89
>>>
>>> ce]
>>>   9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x9b5dea]
>>>   10: (ThreadPool::WorkThread::entry()+0x10) [0x9b7040]
>>>   11: (()+0x6b50) [0x7fdadffdfb50]
>>>   12: (clone()+0x6d) [0x7fdade53b0ed]
>>>   NOTE: a copy of the executable, or `objdump -rdS ` is
>>> needed to
>>> interpret this.
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD turned itself off

2014-06-13 Thread Josef Johansson

Thanks for the quick response.

Cheers,
Josef

Gregory Farnum skrev 2014-06-14 02:36:

On Fri, Jun 13, 2014 at 5:25 PM, Josef Johansson  wrote:

Hi Greg,

Thanks for the clarification. I believe the OSD was in the middle of a deep
scrub (sorry for not mentioning this straight away), so then it could've
been a silent error that got wind during scrub?

Yeah.


What's best practice when the store is corrupted like this?

Remove the OSD from the cluster, and either reformat the disk or
replace as you judge appropriate.
-Greg


Cheers,
Josef

Gregory Farnum skrev 2014-06-14 02:21:


The OSD did a read off of the local filesystem and it got back the EIO
error code. That means the store got corrupted or something, so it
killed itself to avoid spreading bad data to the rest of the cluster.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Fri, Jun 13, 2014 at 5:16 PM, Josef Johansson 
wrote:

Hey,

Just examing what happened to an OSD, that was just turned off. Data has
been moved away from it, so hesitating to turned it back on.

Got the below in the logs, any clues to what the assert talks about?

Cheers,
Josef

-1 os/FileStore.cc: In function 'virtual int FileStore::read(coll_t,
const
hobject_t&, uint64_t, size_t, ceph::bufferlist&, bool)' thread 7fdacb88
c700 time 2014-06-11 21:13:54.036982
os/FileStore.cc: 2992: FAILED assert(allow_eio || !m_filestore_fail_eio
||
got != -5)

   ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73)
   1: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned
long,
ceph::buffer::list&, bool)+0x653) [0x8ab6c3]
   2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*,
std::vector >&)+0x350) [0x708230]
   3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x86)
[0x713366]
   4: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x3095)
[0x71acb5]
   5: (PG::do_request(std::tr1::shared_ptr,
ThreadPool::TPHandle&)+0x3f0) [0x812340]
   6: (OSD::dequeue_op(boost::intrusive_ptr,
std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x2ea) [0x75c80a]
   7: (OSD::OpWQ::_process(boost::intrusive_ptr,
ThreadPool::TPHandle&)+0x198) [0x770da8]
   8: (ThreadPool::WorkQueueVal,
std::tr1::shared_ptr >, boost::intrusive_ptr

::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x7a89

ce]
   9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x9b5dea]
   10: (ThreadPool::WorkThread::entry()+0x10) [0x9b7040]
   11: (()+0x6b50) [0x7fdadffdfb50]
   12: (clone()+0x6d) [0x7fdade53b0ed]
   NOTE: a copy of the executable, or `objdump -rdS ` is
needed to
interpret this.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Run ceph from source code

2014-06-13 Thread Mark Kirkwood

I compile and run from the src build quite often. Here is my recipe:

$ ./autogen.sh
$ ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-radosgw

$ time make
$ sudo make install
$ sudo cp src/init-ceph /etc/init.d/ceph
$ sudo cp src/init-radosgw /etc/init.d/radosgw
$ sudo chmod 755 /etc/init.d/radosgw
$ sudo cp src/upstart/* /etc/init
$ sudo cp udev/* /lib/udev/rules.d/
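
With the init script in place the daemons can then be driven through
sysvinit as usual; a sketch, assuming the cluster is already defined in
/etc/ceph/ceph.conf and the keyrings exist:

$ sudo /etc/init.d/ceph start             # start all daemons defined for this host
$ sudo /etc/init.d/ceph start osd.0       # or just one daemon
$ ceph -s                                 # verify the cluster comes up
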

Regards

Mark

On 14/06/14 10:07, Gregory Farnum wrote:

I don't know anybody who makes much use of "make install", so it's
probably not putting the init system scripts into place. So make sure
they aren't there, copy them from the source tree, and try again?
Patches to fix are welcome! :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Fri, Jun 13, 2014 at 1:41 PM, Zhe Zhang  wrote:

Hello, there,



I am trying to run ceph from source code. configure, make and make install
worked fine. But after done these steps, I can't see the binary files in
/etc/init.d/. My current OS is Centos6.5. I also tried Ubuntu 12.04, the
same issue occurred which said "unknown job ceph..." when I tried to use
upstart to run monitors and osds. How should I start ceph with source code?
basically I hope I could modified the code and run it from there.



Zhe


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com