Re: [ceph-users] Ceph with Clos IP fabric

2017-04-24 Thread Aaron Bassett
Agreed. In an ideal world I would have interleaved all my compute, long-term 
storage, and POSIX processing. Unfortunately, business doesn't always work out 
so nicely, so I'm left with buying and building out to match changing needs. In 
this case we are a small part of a larger org and have been allocated X racks 
in the cage, which is at this point landlocked with no room to expand, so it is 
actual floor space that's limited. Hence the necessity to go as dense as 
possible when adding any new capacity. Luckily Ceph is flexible enough to 
function fine when deployed like an EMC solution; it's just much cheaper and 
more fun to operate!

Aaron

On Apr 24, 2017, at 12:59 AM, Richard Hesse <richard.he...@weebly.com> wrote:

It's not a requirement to build out homogeneous racks of ceph gear. Most larger 
places don't do that (it creates weird hot spots).  If you have 5 racks of 
gear, you're better off spreading out servers in those 5 than just a pair of 
racks that are really built up. In Aaron's case, he can easily do that since 
he's not using a cluster network.

Just be sure to dial in your crush map and failure domains with only a pair of 
installed cabinets.
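
For example, a minimal sketch of making the rack the failure domain (rack/host
names, pool name, and ruleset id are illustrative):

ceph osd crush add-bucket rack1 rack
ceph osd crush add-bucket rack2 rack
ceph osd crush move rack1 root=default
ceph osd crush move rack2 root=default
ceph osd crush move node-a1 rack=rack1
ceph osd crush move node-b1 rack=rack2
ceph osd crush rule create-simple replicated_racks default rack
ceph osd pool set rbd crush_ruleset 1   # ruleset id is illustrative; check "ceph osd crush rule dump"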

Thanks for sharing Christian! It's always good to hear about how others are 
using and deploying Ceph, while coming to similar and different conclusions.

Also, when you say datacenter space is expensive, are you referring to power or 
actual floor space? Datacenter space is almost always sold by power, and floor 
space is usually secondary. Are there markets where that's the opposite? If so, 
those are ripe for new entrants!


On Apr 23, 2017 7:56 PM, "Christian Balzer" <ch...@gol.com> wrote:

Hello,

Aaron pretty much stated most of what I was going to write, but to
generalize things and make some points more obvious, I shall pipe up as
well.

On Sat, 22 Apr 2017 21:45:58 -0700 Richard Hesse wrote:

> Out of curiosity, why are you taking a scale-up approach to building your
> ceph clusters instead of a scale-out approach? Ceph has traditionally been
> geared towards a scale-out, simple shared nothing mindset.

While true, scale-out does come at a cost:
a) rack space, which is mighty expensive where we want/need to be and also
of limited availability in those locations.
b) increased costs from having more individual servers: two servers with
6 OSDs each versus one with 12 OSDs will cost you about 30-40% more
at the least (chassis, MB, PSU, NIC).

And then there is the whole scale thing in general: I'm getting the
impression that the majority of Ceph users have small to, at best,
medium-sized clusters, simply because they don't need all that much
capacity (in terms of storage space).

Case in point: our main production Ceph clusters fit into 8-10U with 3-4
HDD-based OSD servers and 2-4 SSD-based cache tiers, obviously at this
size with everything being redundant (switches, PDU, PSU).
They serve hundreds (nearly 600 atm) of VMs, with a planned peak around
800 VMs.
That Ceph cluster will never have to grow beyond this size.
For me Ceph (RBD) was/is a more scalable approach than DRBD, allowing for
n+1 compute node deployments instead of having pairs (where one can't live
migrate to outside of this pair).

>These dual ToR
> deploys remind me of something from EMC, not ceph. Really curious as I'd
> rather have 5-6 racks of single ToR switches as opposed to three racks of
> dual ToR. Is there a specific application or requirement? It's definitely
> adding a lot of complexity; just wondering what the payoff is.
>

If you have plenty of racks, bully for you.
Though personally I'd try to keep failure domains (especially when they
are as large as a full rack!) to something like 10% of the cluster.
We're not using Ethernet for the Ceph network (IPoIB), but if we were it
would be dual ToRs with MC-LAG (and dual PSU, PDU) all the way.
Why have a SPOF that WILL impact your system (a rack's worth of data
movement) in the first place?

Regards,

Christian

> Also, why are you putting your "cluster network" on the same physical
> interfaces but on separate VLANs? Traffic shaping/policing? What's your
> link speed there on the hosts? (25/40gbps?)
>
> On Sat, Apr 22, 2017 at 12:13 PM, Aaron Bassett <aaron.bass...@nantomics.com> wrote:
>
> > FWIW, I use a CLOS fabric with layer 3 right down to the hosts and
> > multiple ToRs to enable HA/ECMP to each node. I'm using Cumulus Linux's
> > "redistribute neighbor" feature, which advertises a /32 for any ARP'ed
> > neighbor. I set up the hosts with an IP on each physical interface and on
> > an aliased loopback: lo:0. I handle the separate cluster network by adding
> > a vlan to each interface and routing those separately on the ToRs with acls
> > to keep traffic apart.
> >
> > Their documentation may help clarify a bit:
> > https://docs.cumulusnetworks.com/display/DOCS/Redistribute+
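
A minimal sketch of the host-side addressing Aaron describes above (addresses,
interface names, and VLAN ID are illustrative):

ip addr add 10.0.0.11/32 dev lo label lo:0   # loopback alias advertised via "redistribute neighbor"
ip addr add 10.1.0.11/31 dev eth0            # point-to-point link to ToR A
ip addr add 10.1.0.13/31 dev eth1            # point-to-point link to ToR B
ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 10.2.0.11/31 dev eth0.100        # cluster-network VLAN, routed separately on the ToRs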

[ceph-users] Hadoop with CephFS

2017-04-24 Thread M Ranga Swami Reddy
Hello,
I am using the Ceph 10.2.5 release version. Does this version's CephFS
support Hadoop cluster requirements? (Is anyone using the same?)

Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Package Repo on Ubuntu Precise(12.04) is broken

2017-04-24 Thread Alfredo Deza
On Mon, Apr 24, 2017 at 2:41 AM, Xiaoxi Chen  wrote:
> Well, I can definitely build my own,
>
> 1. Precise is NOT EOL for the Hammer release, which was confirmed in a
> previous mail thread. So we still need to maintain point-in-time
> Hammer packages for end users.

Ceph Hammer is EOL

>
> 2. It is NOT ONLY 0.94.10 that is missing. Instead, because of how we
> organize the repo index (it only contains the latest package), all 0.94.x
> packages on Precise are now not installable via apt.
>

I think this may be because we didn't build Precise packages for 0.94.10, and
Debian repositories do not support multiple versions of a package. So although
other versions are there, since the latest one isn't, the repository
acts as if there is nothing for Precise.
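
One way to confirm what the index actually advertises for Precise, and what
apt resolves on a client (the URL is the one mentioned in this thread):

curl -s http://download.ceph.com/debian-hammer/dists/precise/main/binary-amd64/Packages | grep '^Package:' | sort -u
apt-cache policy ceph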

I would suggest an upgrade to a newer Ceph version at this point,
although Precise isn't built for any newer Ceph versions, so
effectively you are looking at
upgrading to a newer OS as well.

>
> 2017-04-24 14:02 GMT+08:00 xiaoguang fan :
>> If you need the 0.94.10 deb packages on Precise (12.04), I think you can
>> build them yourself using the script make_deps.sh
>>
>> 2017-04-24 11:35 GMT+08:00 Xiaoxi Chen :
>>>
>>> Hi,
>>>
>>>  The 0.94.10 packages were not built for Ubuntu Precise, till now.
>>> What is worse, the dist description
>>>
>>> (http://download.ceph.com/debian-hammer/dists/precise/main/binary-amd64/Packages)
>>>  doesn't contain any ceph core packages.
>>>
>>>  It makes Precise users unable to provision their ceph clusters/clients.
>>> Could anyone please help to fix it?
>>>
>>> Xiaoxi
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-04-24 Thread Ben Morrice

Hello Orit,

Could it be that something has changed in 10.2.5+ which is related to 
reading the endpoints from the zone/period config?


In my master zone I have specified the endpoint with a trailing 
backslash (which is also escaped), however I do not define the secondary 
endpoint this way. Am I hitting a bug here?


Kind regards,

Ben Morrice

__
Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
EPFL / BBP
Biotech Campus
Chemin des Mines 9
1202 Geneva
Switzerland

On 21/04/17 09:36, Ben Morrice wrote:

Hello Orit,

Please find attached the output from the radosgw commands and the 
relevant section from ceph.conf (radosgw)


bbp-gva-master is running 10.2.5

bbp-gva-secondary is running 10.2.7

Kind regards,

Ben Morrice

__
Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
EPFL / BBP
Biotech Campus
Chemin des Mines 9
1202 Geneva
Switzerland

On 21/04/17 07:55, Orit Wasserman wrote:

Hi Ben,

On Thu, Apr 20, 2017 at 6:08 PM, Ben Morrice  
wrote:

Hi all,

I have tried upgrading one of our RGW servers from 10.2.5 to 10.2.7 
(RHEL7)
and authentication is in a very bad state. This installation is part 
of a
multigw configuration, and I have just updated one host in the 
secondary

zone (all other hosts/zones are running 10.2.5).

On the 10.2.7 server I cannot authenticate as a user (normally 
backed by

OpenStack Keystone), but even worse I can also not authenticate with an
admin user.

Please see [1] for the results of performing a list bucket operation 
with

python boto (script works against rgw 10.2.5)

Also, if I try to authenticate from the 'master' rgw zone with a
"radosgw-admin sync status --rgw-zone=bbp-gva-master" I get:

"ERROR: failed to fetch datalog info"

"failed to retrieve sync info: (13) Permission denied"

The above errors correlate to the errors in the log on the server 
running

10.2.7 (debug level 20) at [2]

I'm not sure what I have done wrong or can try next?

By the way, downgrading the packages from 10.2.7 to 10.2.5 returns
authentication functionality

Can you provide the following info:
radosgw-admin period get
radsogw-admin zonegroup get
radsogw-admin zone get

Can you provide your ceph.conf?

Thanks,
Orit


[1]
boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
encoding="UTF-8"?>SignatureDoesNotMatchtx4-0058f8c86a-3fa2959-bbp-gva-secondary3fa2959-bbp-gva-secondary-bbp-gva 



[2]
/bbpsrvc15.cscs.ch/admin/log
2017-04-20 16:43:04.916253 7ff87c6c0700 15 calculated
digest=Ofg/f/NI0L4eEG1MsGk4PsVscTM=
2017-04-20 16:43:04.916255 7ff87c6c0700 15
auth_sign=qZ3qsy7AuNCOoPMhr8yNoy5qMKU=
2017-04-20 16:43:04.916255 7ff87c6c0700 15 compare=34
2017-04-20 16:43:04.916266 7ff87c6c0700 10 failed to authorize request
2017-04-20 16:43:04.916268 7ff87c6c0700 20 handler->ERRORHANDLER:
err_no=-2027 new_err_no=-2027
2017-04-20 16:43:04.916329 7ff87c6c0700  2 req 354:0.052585:s3:GET
/admin/log:get_obj:op status=0
2017-04-20 16:43:04.916339 7ff87c6c0700  2 req 354:0.052595:s3:GET
/admin/log:get_obj:http status=403
2017-04-20 16:43:04.916343 7ff87c6c0700  1 == req done
req=0x7ff87c6ba710 op status=0 http_status=403 ==
2017-04-20 16:43:04.916350 7ff87c6c0700 20 process_request() 
returned -2027

2017-04-20 16:43:04.916390 7ff87c6c0700  1 civetweb: 0x7ff990015610:
10.80.6.26 - - [20/Apr/2017:16:43:04 +0200] "GET /admin/log 
HTTP/1.1" 403 0

- -
2017-04-20 16:43:04.917212 7ff9777e6700 20
cr:s=0x7ff97000d420:op=0x7ff9703a5440:18RGWMetaSyncShardCR: operate()
2017-04-20 16:43:04.917223 7ff9777e6700 20 rgw meta sync:
incremental_sync:1544: shard_id=20
mdlog_marker=1_1492686039.901886_5551978.1
sync_marker.marker=1_1492686039.901886_5551978.1 period_marker=
2017-04-20 16:43:04.917227 7ff9777e6700 20 rgw meta sync:
incremental_sync:1551: shard_id=20 syncing mdlog for shard_id=20
2017-04-20 16:43:04.917236 7ff9777e6700 20
cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: 
operate()

2017-04-20 16:43:04.917238 7ff9777e6700 20 rgw meta sync: operate:
shard_id=20: init request
2017-04-20 16:43:04.917240 7ff9777e6700 20
cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: 
operate()

2017-04-20 16:43:04.917241 7ff9777e6700 20 rgw meta sync: operate:
shard_id=20: reading shard status
2017-04-20 16:43:04.917303 7ff9777e6700 20 run: stack=0x7ff97000d420 
is io

blocked
2017-04-20 16:43:04.918285 7ff9777e6700 20
cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: 
operate()

2017-04-20 16:43:04.918295 7ff9777e6700 20 rgw meta sync: operate:
shard_id=20: reading shard status complete
2017-04-20 16:43:04.918307 7ff9777e6700 20 rgw meta sync: shard_id=20
marker=1_1492686039.901886_5551978.1 last_update=2017-04-20
13:00:39.0.901886s
2017-04-20 16:43:04.918316 7ff9777e6700 20
cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: 
operate()

2017-04-20 16:43:04.918317 7ff9777e6700 

Re: [ceph-users] Ceph Package Repo on Ubuntu Precise(12.04) is broken

2017-04-24 Thread Alfredo Deza
On Mon, Apr 24, 2017 at 8:53 AM, Alfredo Deza  wrote:
> On Mon, Apr 24, 2017 at 2:41 AM, Xiaoxi Chen  wrote:
>> Well, I can definitely build my own,
>>
>> 1. Precise is NOT EOL for the Hammer release, which was confirmed in a
>> previous mail thread. So we still need to maintain point-in-time
>> Hammer packages for end users.
>
> Ceph Hammer is EOL
>
>>
>> 2. It is NOT ONLY 0.94.10 that is missing. Instead, because of how we
>> organize the repo index (it only contains the latest package), all 0.94.x
>> packages on Precise are now not installable via apt.
>>
>
> I think this may be because we didn't build Precise packages for 0.94.10, and
> Debian repositories do not support multiple versions of a package. So although
> other versions are there, since the latest one isn't, the repository
> acts as if there is nothing for Precise.
>
> I would suggest an upgrade to a newer Ceph version at this point,
> although Precise isn't built for any newer Ceph versions, so
> effectively you are looking at
> upgrading to a newer OS as well.
>
>>
>> 2017-04-24 14:02 GMT+08:00 xiaoguang fan :
>>> If you need the 0.94.10 deb packages on Precise (12.04), I think you can
>>> build them yourself using the script make_deps.sh
>>>
>>> 2017-04-24 11:35 GMT+08:00 Xiaoxi Chen :

 Hi,

  The 0.94.10 packages were not built for Ubuntu Precise, till now.
 What is worse, the dist description

 (http://download.ceph.com/debian-hammer/dists/precise/main/binary-amd64/Packages)
  doesn't contain any ceph core packages.

  It makes Precise users unable to provision their ceph clusters/clients.
 Could anyone please help to fix it?

I've started to build the Precise packages for that version, hopefully
we can get something worked out today or tomorrow. You are right that
Precise wasn't EOL when
that version of Hammer was released, and this was an omission on our end.



 Xiaoxi
 --
 To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v12.0.2 Luminous (dev) released

2017-04-24 Thread Abhishek Lekshmanan
This is the third development checkpoint release of Luminous, the next
long-term stable release.

Major changes from v12.0.1
--
* The original librados rados_objects_list_open (C) and objects_begin
  (C++) object listing API, deprecated in Hammer, has finally been
  removed.  Users of this interface must update their software to use
  either the rados_nobjects_list_open (C) and nobjects_begin (C++) API or
  the new rados_object_list_begin (C) and object_list_begin (C++) API
  before updating the client-side librados library to Luminous.

  Object enumeration (via any API) with the latest librados version
  and pre-Hammer OSDs is no longer supported.  Note that no in-tree
  Ceph services rely on object enumeration via the deprecated APIs, so
  only external librados users might be affected.

  The newest (and recommended) rados_object_list_begin (C) and
  object_list_begin (C++) API is only usable on clusters with the
  SORTBITWISE flag enabled (Jewel and later).  (Note that this flag is
  required to be set before upgrading beyond Jewel.)

* CephFS clients without the 'p' flag in their authentication capability
  string will no longer be able to set quotas or any layout fields.  This
  flag previously only restricted modification of the pool and namespace
  fields in layouts.

* CephFS directory fragmentation (large directory support) is enabled
  by default on new filesystems.  To enable it on existing filesystems
  use "ceph fs set <fs_name> allow_dirfrags" (see the sketch after this list).

* CephFS will generate a health warning if you have fewer standby daemons
  than it thinks you wanted.  By default this will be 1 if you ever had
  a standby, and 0 if you did not.  You can customize this using
  ``ceph fs set <fs_name> standby_count_wanted <count>``.  Setting it
  to zero will effectively disable the health check.

* The "ceph mds tell ..." command has been removed.  It is superseded
  by "ceph tell mds. ..."

* RGW introduces server-side encryption of uploaded objects, with 3 options
  for the management of encryption keys: automatic encryption (only
  recommended for test setups), customer-provided keys similar to the Amazon
  SSE-C specification, and using a key management service (OpenStack
  Barbican), similar to Amazon SSE-KMS.
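
  A short sketch of the CephFS items above, assuming a hypothetical filesystem
  named "cephfs" and a client named "client.foo" (both names are illustrative,
  not part of the release notes):

  # give an existing client the 'p' flag so it may set quotas and layouts
  ceph auth caps client.foo mds 'allow rwp' mon 'allow r' osd 'allow rw pool=cephfs_data'
  # enable directory fragmentation on an existing filesystem
  ceph fs set cephfs allow_dirfrags true
  # warn unless at least one standby MDS is available (0 disables the check)
  ceph fs set cephfs standby_count_wanted 1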

For a more detailed changelog, refer to
http://ceph.com/releases/ceph-v12-0-2-luminous-dev-released/

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-12.0.2.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* For ceph-deploy, see
http://docs.ceph.com/docs/master/install/install-ceph-deploy
* Release sha1: 5a1b6b3269da99a18984c138c23935e5eb96f73e

--
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-04-24 Thread Orit Wasserman
Hi Ben,

On Mon, Apr 24, 2017 at 4:36 PM, Ben Morrice  wrote:
> Hello Orit,
>
> Could it be that something has changed in 10.2.5+ which is related to
> reading the endpoints from the zone/period config?
>

I don't remember any change to the endpoints config, but I will go over
the changes to make sure.
There were a few changes to tenant handling that may have caused this regression.


> In my master zone I have specified the endpoint with a trailing backslash
> (which is also escaped), however I do not define the secondary endpoint this
> way. Am I hitting a bug here?
>

Can you update the secondary endpoint and see if it helps?
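
Something along these lines should do it (the zone name is the one from this
thread; the endpoint URL is illustrative), followed by committing the period:

radosgw-admin zone modify --rgw-zone=bbp-gva-secondary --endpoints=http://hostname:8080
radosgw-admin period update --commit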

Please open a bug in the tracker regarding this issue.
Regards,
Orit

> Kind regards,
>
> Ben Morrice
>
> __
> Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
> EPFL / BBP
> Biotech Campus
> Chemin des Mines 9
> 1202 Geneva
> Switzerland
>
> On 21/04/17 09:36, Ben Morrice wrote:
>>
>> Hello Orit,
>>
>> Please find attached the output from the radosgw commands and the relevant
>> section from ceph.conf (radosgw)
>>
>> bbp-gva-master is running 10.2.5
>>
>> bbp-gva-secondary is running 10.2.7
>>
>> Kind regards,
>>
>> Ben Morrice
>>
>> __
>> Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
>> EPFL / BBP
>> Biotech Campus
>> Chemin des Mines 9
>> 1202 Geneva
>> Switzerland
>>
>> On 21/04/17 07:55, Orit Wasserman wrote:
>>>
>>> Hi Ben,
>>>
>>> On Thu, Apr 20, 2017 at 6:08 PM, Ben Morrice  wrote:

 Hi all,

 I have tried upgrading one of our RGW servers from 10.2.5 to 10.2.7
 (RHEL7)
 and authentication is in a very bad state. This installation is part of
 a
 multigw configuration, and I have just updated one host in the secondary
 zone (all other hosts/zones are running 10.2.5).

 On the 10.2.7 server I cannot authenticate as a user (normally backed by
 OpenStack Keystone), but even worse I can also not authenticate with an
 admin user.

 Please see [1] for the results of performing a list bucket operation
 with
 python boto (script works against rgw 10.2.5)

 Also, if I try to authenticate from the 'master' rgw zone with a
 "radosgw-admin sync status --rgw-zone=bbp-gva-master" I get:

 "ERROR: failed to fetch datalog info"

 "failed to retrieve sync info: (13) Permission denied"

 The above errors correlate to the errors in the log on the server
 running
 10.2.7 (debug level 20) at [2]

 I'm not sure what I have done wrong or can try next?

 By the way, downgrading the packages from 10.2.7 to 10.2.5 returns
 authentication functionality
>>>
>>> Can you provide the following info:
>>> radosgw-admin period get
>>> radsogw-admin zonegroup get
>>> radsogw-admin zone get
>>>
>>> Can you provide your ceph.conf?
>>>
>>> Thanks,
>>> Orit
>>>
 [1]
 boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
 >>>
 encoding="UTF-8"?>SignatureDoesNotMatchtx4-0058f8c86a-3fa2959-bbp-gva-secondary3fa2959-bbp-gva-secondary-bbp-gva

 [2]
 /bbpsrvc15.cscs.ch/admin/log
 2017-04-20 16:43:04.916253 7ff87c6c0700 15 calculated
 digest=Ofg/f/NI0L4eEG1MsGk4PsVscTM=
 2017-04-20 16:43:04.916255 7ff87c6c0700 15
 auth_sign=qZ3qsy7AuNCOoPMhr8yNoy5qMKU=
 2017-04-20 16:43:04.916255 7ff87c6c0700 15 compare=34
 2017-04-20 16:43:04.916266 7ff87c6c0700 10 failed to authorize request
 2017-04-20 16:43:04.916268 7ff87c6c0700 20 handler->ERRORHANDLER:
 err_no=-2027 new_err_no=-2027
 2017-04-20 16:43:04.916329 7ff87c6c0700  2 req 354:0.052585:s3:GET
 /admin/log:get_obj:op status=0
 2017-04-20 16:43:04.916339 7ff87c6c0700  2 req 354:0.052595:s3:GET
 /admin/log:get_obj:http status=403
 2017-04-20 16:43:04.916343 7ff87c6c0700  1 == req done
 req=0x7ff87c6ba710 op status=0 http_status=403 ==
 2017-04-20 16:43:04.916350 7ff87c6c0700 20 process_request() returned
 -2027
 2017-04-20 16:43:04.916390 7ff87c6c0700  1 civetweb: 0x7ff990015610:
 10.80.6.26 - - [20/Apr/2017:16:43:04 +0200] "GET /admin/log HTTP/1.1"
 403 0
 - -
 2017-04-20 16:43:04.917212 7ff9777e6700 20
 cr:s=0x7ff97000d420:op=0x7ff9703a5440:18RGWMetaSyncShardCR: operate()
 2017-04-20 16:43:04.917223 7ff9777e6700 20 rgw meta sync:
 incremental_sync:1544: shard_id=20
 mdlog_marker=1_1492686039.901886_5551978.1
 sync_marker.marker=1_1492686039.901886_5551978.1 period_marker=
 2017-04-20 16:43:04.917227 7ff9777e6700 20 rgw meta sync:
 incremental_sync:1551: shard_id=20 syncing mdlog for shard_id=20
 2017-04-20 16:43:04.917236 7ff9777e6700 20
 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine:
 operate()
 2017-04-20 16:43:04.917238 7ff9777e6700 20 rgw meta sync: operate:
 shard_id=20: init r

[ceph-users] CEPH MON Updates Live

2017-04-24 Thread Ashley Merrick
Hey,

Quick question; I have tried a few Google searches but found nothing concrete.

I am running KVM VMs using KRBD. If I add and remove Ceph mons, are the 
running VMs updated with this information, or do I need to reboot the VMs for 
them to be provided with the change of mons?

Thanks!
Sent from my iPhone
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Maintaining write performance under a steady intake of small objects

2017-04-24 Thread Florian Haas
Hi everyone,

so this will be a long email — it's a summary of several off-list
conversations I've had over the last couple of weeks, but the TL;DR
version is this question:

How can a Ceph cluster maintain near-constant performance
characteristics while supporting a steady intake of a large number of
small objects?

This is probably a very common problem, but we have a bit of a dearth of
truly adequate best practices for it. To clarify, what I'm talking about
is an intake on the order of millions per hour. That might sound like a
lot, but if you consider an intake of 700 objects/s at 20 KiB/object,
that's just 14 MB/s. That's not exactly hammering your cluster — but it
amounts to 2.5 million objects created per hour.

Under those circumstances, two things tend to happen:

(1) There's a predictable decline in insert bandwidth. In other words, a
cluster that may allow inserts at a rate of 2.5M/hr rapidly goes down to
1.8M/hr and then 1.7M/hr ... and by "rapidly" I mean hours, not days. As
I understand it, this is mainly due to the FileStore's propensity to
index whole directories with a readdir() call, which is a linear-time
operation.

(2) FileStore's mitigation strategy for this is to proactively split
directories so they never get so large as for readdir() to become a
significant bottleneck. That's fine, but in a cluster with a steadily
growing number of objects, that tends to lead to lots and lots of
directory splits happening simultaneously — causing inserts to slow to a
crawl.

For (2) there is a workaround: we can initialize a pool with an expected
number of objects, set a pool max_objects quota, and disable on-demand
splitting altogether by setting a negative filestore merge threshold.
That way, all splitting occurs at pool creation time, and before another
split were to happen, you hit the pool quota. So you never hit that
brick wall caused by the thundering herd of directory splits. Of course,
it also means that when you want to insert yet more objects, you need
another pool — but you can handle that at the application level.
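
To make that concrete, here is a minimal sketch of that workaround (pool
name, PG counts, and object count are illustrative; the negative merge
threshold has to be set in ceph.conf on the OSDs before the pool is created):

# ceph.conf, [osd] section:
#   filestore merge threshold = -10
# then create the pool pre-split for the expected object count, and cap it:
ceph osd pool create smallobj 1024 1024 replicated replicated_ruleset 2500000
ceph osd pool set-quota smallobj max_objects 2500000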

It's actually a bit of a dilemma: we want directory splits to happen
proactively, so that readdir() doesn't slow things down, but then we
also *don't* want them to happen, because while they do, inserts flatline.

(2) will likely be killed off completely by BlueStore, because there are
no more directories, hence nothing to split.

For (1) there really isn't a workaround that I'm aware of for FileStore.
And at least preliminary testing shows that BlueStore clusters suffer
from similar, if not the same, performance degradation (although, to be
fair, I haven't yet seen tests under the above parameters with rocksdb
and WAL on NVMe hardware).

For (1) however I understand that there would be a potential solution in
FileStore itself, by throwing away Ceph's own directory indexing and
just relying on flat directory lookups — which should be logarithmic-time
operations in both btrfs and XFS, as both use B-trees for directory
indexing. But I understand that that would be a fairly massive operation
that looks even less attractive to undertake with BlueStore around the
corner.

One suggestion that has been made (credit to Greg) was to do object
packing, i.e. bunch up a lot of discrete data chunks into a single RADOS
object. But in terms of distribution and lookup logic that would have to
be built on top, that seems weird to me (CRUSH on top of CRUSH to find
out which RADOS object a chunk belongs to, or some such?)

So I'm hoping for the likes of Wido and Dan and Mark to have some
alternate suggestions here: what's your take on this? Do you have
suggestions for people with a constant intake of small objects?

Looking forward to hearing your thoughts.

Cheers,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] hung rbd requests for one pool

2017-04-24 Thread Phil Lacroute
One guest VM on my test cluster has hung for more than 24 hours while running a 
fio test on an RBD device, but other VMs accessing other images in the same 
pool are fine.  I was able to reproduce the problem by running “rbd info” on 
the same pool as the stuck VM with some debug tracing on (see log below).  How 
can I narrow this down further or resolve the problem?

Here are a few details about the cluster:

ceph version 10.2.7
Three monitors and six OSD nodes with three OSDs each
Each OSD has one SSD with separate partitions for the journal and data, using 
XFS
Clients are KVM guests using rbd devices with virtio

Cluster is healthy:
ceph7:~$ sudo ceph status
cluster 876a19e2-7f61-4774-a6b3-eaab4004f45f
 health HEALTH_OK
 monmap e1: 3 mons at 
{a=192.168.206.10:6789/0,b=192.168.206.11:6789/0,c=192.168.206.12:6789/0}
election epoch 6, quorum 0,1,2 a,b,c
 osdmap e27: 18 osds: 18 up, 18 in
flags sortbitwise,require_jewel_osds
  pgmap v240894: 576 pgs, 2 pools, 416 GB data, 104 kobjects
1248 GB used, 2606 GB / 3854 GB avail
 576 active+clean
  client io 2548 kB/s rd, 2632 kB/s wr, 493 op/s rd, 1121 op/s wr

Log output from “rbd info” on the client node (not in a VM):
ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
2017-04-24 11:30:57.048750 7f55365c5d40  1 -- :/0 messenger.start
2017-04-24 11:30:57.049223 7f55365c5d40  1 -- :/3282647735 --> 
192.168.206.11:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 
0x55c254e1ccc0 con 0x55c254e17850
2017-04-24 11:30:57.050077 7f55365bd700  1 -- 192.168.206.17:0/3282647735 
learned my addr 192.168.206.17:0/3282647735
2017-04-24 11:30:57.051040 7f551a627700  1 -- 192.168.206.17:0/3282647735 <== 
mon.1 192.168.206.11:6789/0 1  mon_map magic: 0 v1  473+0+0 (2270207254 
0 0) 0x7f550b80 con 0x55c254e17850
2017-04-24 11:30:57.051148 7f551a627700  1 -- 192.168.206.17:0/3282647735 <== 
mon.1 192.168.206.11:6789/0 2  auth_reply(proto 2 0 (0) Success) v1  
33+0+0 (2714966539 0 0) 0x7f551040 con 0x55c254e17850
2017-04-24 11:30:57.051328 7f551a627700  1 -- 192.168.206.17:0/3282647735 --> 
192.168.206.11:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 
0x7f5504001860 con 0x55c254e17850
2017-04-24 11:30:57.052239 7f551a627700  1 -- 192.168.206.17:0/3282647735 <== 
mon.1 192.168.206.11:6789/0 3  auth_reply(proto 2 0 (0) Success) v1  
206+0+0 (3323982069 0 0) 0x7f551040 con 0x55c254e17850
2017-04-24 11:30:57.052399 7f551a627700  1 -- 192.168.206.17:0/3282647735 --> 
192.168.206.11:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 
0x7f5504003370 con 0x55c254e17850
2017-04-24 11:30:57.053313 7f551a627700  1 -- 192.168.206.17:0/3282647735 <== 
mon.1 192.168.206.11:6789/0 4  auth_reply(proto 2 0 (0) Success) v1  
393+0+0 (1107778031 0 0) 0x7f5508c0 con 0x55c254e17850
2017-04-24 11:30:57.053415 7f551a627700  1 -- 192.168.206.17:0/3282647735 --> 
192.168.206.11:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x55c254e1d290 
con 0x55c254e17850
2017-04-24 11:30:57.053477 7f55365c5d40  1 -- 192.168.206.17:0/3282647735 --> 
192.168.206.11:6789/0 -- mon_subscribe({osdmap=0}) v2 -- ?+0 0x55c254e12df0 con 
0x55c254e17850
2017-04-24 11:30:57.053851 7f551a627700  1 -- 192.168.206.17:0/3282647735 <== 
mon.1 192.168.206.11:6789/0 5  mon_map magic: 0 v1  473+0+0 (2270207254 
0 0) 0x7f551360 con 0x55c254e17850
2017-04-24 11:30:57.054058 7f551a627700  1 -- 192.168.206.17:0/3282647735 <== 
mon.1 192.168.206.11:6789/0 6  osd_map(27..27 src has 1..27) v3  
13035+0+0 (2602332718 0 0) 0x7f550cc0 con 0x55c254e17850
2017-04-24 11:30:57.054376 7f55365c5d40  5 librbd::AioImageRequestWQ: 
0x55c254e21c10 : ictx=0x55c254e20760
2017-04-24 11:30:57.054498 7f55365c5d40 20 librbd::ImageState: 0x55c254e19330 
open
2017-04-24 11:30:57.054503 7f55365c5d40 10 librbd::ImageState: 0x55c254e19330 
0x55c254e19330 send_open_unlock
2017-04-24 11:30:57.054512 7f55365c5d40 10 librbd::image::OpenRequest: 
0x55c254e22590 send_v2_detect_header
2017-04-24 11:30:57.054632 7f55365c5d40  1 -- 192.168.206.17:0/3282647735 --> 
192.168.206.13:6802/22690 -- osd_op(client.4375.0:1 1.ba46737 rbd_id.image1 
[stat] snapc 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x55c254e25d00 
con 0x55c254e248d0
2017-04-24 11:30:57.056830 7f5518421700  1 -- 192.168.206.17:0/3282647735 <== 
osd.10 192.168.206.13:6802/22690 1  osd_op_reply(1 rbd_id.image1 [stat] 
v0'0 uv7 ondisk = 0) v7  133+0+16 (2025423138 0 1760854024) 0x7f54fb40 
con 0x55c254e248d0
2017-04-24 11:30:57.056949 7f5512ffd700 10 librbd::image::OpenRequest: 
handle_v2_detect_header: r=0
2017-04-24 11:30:57.056965 7f5512ffd700 10 librbd::image::OpenRequest: 
0x55c254e22590 send_v2_get_id
2017-04-24 11:30:57.057026 7f5512ffd700  1 -- 192.168.206.17:0/3282647735 --> 
192.168.206.13:6802/22690 -- osd_op(client.4375.0:2 1.ba46737 rbd_id.image1 
[call rbd.get_id] snapc 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 
0x7f54f40021f0

Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Jason Dillaman
On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute
 wrote:
> 2017-04-24 11:30:57.058233 7f5512ffd700  1 -- 192.168.206.17:0/3282647735
> --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38
> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con
> 0x7f54f40064e0


You can attempt to run "ceph daemon osd.XYZ ops" against the
potentially stuck OSD to figure out what it's stuck doing.

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Package Repo on Ubuntu Precise(12.04) is broken

2017-04-24 Thread Alfredo Deza
On Mon, Apr 24, 2017 at 10:05 AM, Alfredo Deza  wrote:
> On Mon, Apr 24, 2017 at 8:53 AM, Alfredo Deza  wrote:
>> On Mon, Apr 24, 2017 at 2:41 AM, Xiaoxi Chen  wrote:
>>> Well, I can definitely build my own,
>>>
>>> 1. Precise is NOT EOL for the Hammer release, which was confirmed in a
>>> previous mail thread. So we still need to maintain point-in-time
>>> Hammer packages for end users.
>>
>> Ceph Hammer is EOL
>>
>>>
>>> 2. It is NOT ONLY 0.94.10 that is missing. Instead, because of how we
>>> organize the repo index (it only contains the latest package), all 0.94.x
>>> packages on Precise are now not installable via apt.

Can you try now? 0.94.10 for Precise was just pushed out.

Let me know if you run into any issues.
>>>
>>
>> I think this may be because we didn't build Precise packages for 0.94.10, and
>> Debian repositories do not support multiple versions of a package. So although
>> other versions are there, since the latest one isn't, the repository
>> acts as if there is nothing for Precise.
>>
>> I would suggest an upgrade to a newer Ceph version at this point,
>> although Precise isn't built for any newer Ceph versions, so
>> effectively you are looking at
>> upgrading to a newer OS as well.
>>
>>>
>>> 2017-04-24 14:02 GMT+08:00 xiaoguang fan :
 If you need the 0.94.10 deb packages on Precise (12.04), I think you can
 build them yourself using the script make_deps.sh

 2017-04-24 11:35 GMT+08:00 Xiaoxi Chen :
>
> Hi,
>
>  The 0.94.10 packages were not built for Ubuntu Precise, till now.
> What is worse, the dist description
>
> (http://download.ceph.com/debian-hammer/dists/precise/main/binary-amd64/Packages)
>  doesn't contain any ceph core packages.
>
>  It makes Precise users unable to provision their ceph clusters/clients.
> Could anyone please help to fix it?
>
> I've started to build the Precise packages for that version, hopefully
> we can get something worked out today or tomorrow. You are right that
> Precise wasn't EOL when
> that version of Hammer was released, and this was an omission on our end.
>
>
>
> Xiaoxi
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Phil Lacroute
Jason,

Thanks for the suggestion.  That seems to show it is not the OSD that got stuck:

ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
…
2017-04-24 13:13:49.761076 7f739aefc700  1 -- 192.168.206.17:0/1250293899 --> 
192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38 
rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc 
0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con 
0x7f737c0064e0
…
2017-04-24 13:14:04.756328 7f73a2880700  1 -- 192.168.206.17:0/1250293899 --> 
192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con 
0x7f737c0064e0

ceph0:~$ sudo ceph pg map 1.af6f1e38
osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2]

ceph3:~$ sudo ceph daemon osd.11 ops
{
"ops": [],
"num_ops": 0
}

I repeated this a few times and it’s always the same command and same placement 
group that hangs, but OSD11 has no ops (and neither do OSD16 and OSD2, although 
I think that’s expected).

Is there other tracing I should do on the OSD or something more to look at on 
the client?

Thanks,
Phil

> On Apr 24, 2017, at 12:39 PM, Jason Dillaman  wrote:
> 
> On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute
>  wrote:
>> 2017-04-24 11:30:57.058233 7f5512ffd700  1 -- 192.168.206.17:0/3282647735
>> --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38
>> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
>> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con
>> 0x7f54f40064e0
> 
> 
> You can attempt to run "ceph daemon osd.XYZ ops" against the
> potentially stuck OSD to figure out what it's stuck doing.
> 
> -- 
> Jason



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Peter Maloney
On 04/24/17 22:23, Phil Lacroute wrote:
> Jason,
>
> Thanks for the suggestion.  That seems to show it is not the OSD that
> got stuck:
>
> ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
> …
> 2017-04-24 13:13:49.761076 7f739aefc700  1 --
> 192.168.206.17:0/1250293899 --> 192.168.206.13:6804/22934 --
> osd_op(client.4384.0:3 1.af6f1e38 rbd_header.1058238e1f29 [call
> rbd.get_size,call rbd.get_object_prefix] snapc 0=[]
> ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con
> 0x7f737c0064e0
> …
> 2017-04-24 13:14:04.756328 7f73a2880700  1 --
> 192.168.206.17:0/1250293899 --> 192.168.206.13:6804/22934 -- ping
> magic: 0 v1 -- ?+0 0x7f7374000fc0 con 0x7f737c0064e0
>
> ceph0:~$ sudo ceph pg map 1.af6f1e38
> osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2]
>
> ceph3:~$ sudo ceph daemon osd.11 ops
> {
> "ops": [],
> "num_ops": 0
> }
>
> I repeated this a few times and it’s always the same command and same
> placement group that hangs, but OSD11 has no ops (and neither do OSD16
> and OSD2, although I think that’s expected).
>
> Is there other tracing I should do on the OSD or something more to
> look at on the client?
>
> Thanks,
> Phil
Does it still happen if you disable exclusive-lock, or maybe separately
fast-diff and object-map?

I have a similar problem where VMs with those 3 features hang and need
kill -9, and without them, they never hang.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Jason Dillaman
Just to cover all the bases, is 192.168.206.13:6804 really associated
with a running daemon for OSD 11?

On Mon, Apr 24, 2017 at 4:23 PM, Phil Lacroute
 wrote:
> Jason,
>
> Thanks for the suggestion.  That seems to show it is not the OSD that got
> stuck:
>
> ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
> …
> 2017-04-24 13:13:49.761076 7f739aefc700  1 -- 192.168.206.17:0/1250293899
> --> 192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38
> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con
> 0x7f737c0064e0
> …
> 2017-04-24 13:14:04.756328 7f73a2880700  1 -- 192.168.206.17:0/1250293899
> --> 192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con
> 0x7f737c0064e0
>
> ceph0:~$ sudo ceph pg map 1.af6f1e38
> osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2]
>
> ceph3:~$ sudo ceph daemon osd.11 ops
> {
> "ops": [],
> "num_ops": 0
> }
>
> I repeated this a few times and it’s always the same command and same
> placement group that hangs, but OSD11 has no ops (and neither do OSD16 and
> OSD2, although I think that’s expected).
>
> Is there other tracing I should do on the OSD or something more to look at
> on the client?
>
> Thanks,
> Phil
>
> On Apr 24, 2017, at 12:39 PM, Jason Dillaman  wrote:
>
> On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute
>  wrote:
>
> 2017-04-24 11:30:57.058233 7f5512ffd700  1 -- 192.168.206.17:0/3282647735
> --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38
> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con
> 0x7f54f40064e0
>
>
>
> You can attempt to run "ceph daemon osd.XYZ ops" against the
> potentially stuck OSD to figure out what it's stuck doing.
>
> --
> Jason
>
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Phil Lacroute
Yes it is the correct IP and port:

ceph3:~$ netstat -anp | fgrep 192.168.206.13:6804
tcp0  0 192.168.206.13:6804 0.0.0.0:*   LISTEN  
22934/ceph-osd  

I turned up the logging on the osd and I don’t think it received the request.  
However I also noticed a large number of TCP connections to that specific osd 
from the client (192.168.206.17) in CLOSE_WAIT state (131 to be exact).  I 
think there may be a bug causing the osd not to close file descriptors.  Prior 
to the hang I had been running tests continuously for several days so the osd 
process may have been accumulating open sockets.

I’m still gathering information, but based on that is there anything specific 
that would be helpful to find the problem?

Thanks,
Phil

> On Apr 24, 2017, at 5:01 PM, Jason Dillaman  wrote:
> 
> Just to cover all the bases, is 192.168.206.13:6804 really associated
> with a running daemon for OSD 11?
> 
> On Mon, Apr 24, 2017 at 4:23 PM, Phil Lacroute
>  wrote:
>> Jason,
>> 
>> Thanks for the suggestion.  That seems to show it is not the OSD that got
>> stuck:
>> 
>> ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
>> …
>> 2017-04-24 13:13:49.761076 7f739aefc700  1 -- 192.168.206.17:0/1250293899
>> --> 192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38
>> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
>> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con
>> 0x7f737c0064e0
>> …
>> 2017-04-24 13:14:04.756328 7f73a2880700  1 -- 192.168.206.17:0/1250293899
>> --> 192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con
>> 0x7f737c0064e0
>> 
>> ceph0:~$ sudo ceph pg map 1.af6f1e38
>> osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2]
>> 
>> ceph3:~$ sudo ceph daemon osd.11 ops
>> {
>>"ops": [],
>>"num_ops": 0
>> }
>> 
>> I repeated this a few times and it’s always the same command and same
>> placement group that hangs, but OSD11 has no ops (and neither do OSD16 and
>> OSD2, although I think that’s expected).
>> 
>> Is there other tracing I should do on the OSD or something more to look at
>> on the client?
>> 
>> Thanks,
>> Phil
>> 
>> On Apr 24, 2017, at 12:39 PM, Jason Dillaman  wrote:
>> 
>> On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute
>>  wrote:
>> 
>> 2017-04-24 11:30:57.058233 7f5512ffd700  1 -- 192.168.206.17:0/3282647735
>> --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38
>> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
>> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con
>> 0x7f54f40064e0
>> 
>> 
>> 
>> You can attempt to run "ceph daemon osd.XYZ ops" against the
>> potentially stuck OSD to figure out what it's stuck doing.
>> 
>> --
>> Jason
>> 
>> 
> 
> 
> 
> -- 
> Jason



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Jason Dillaman
I would double-check your file descriptor limits on both sides -- OSDs
and the client. 131 sockets shouldn't make a difference. Is the port open
on any possible firewalls you have running?
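
For example, something along these lines (the PID and address are the ones
from earlier in this thread; adjust as needed):

grep 'open files' /proc/22934/limits    # per-process fd limit of the ceph-osd
ls /proc/22934/fd | wc -l               # fds currently in use by that process
netstat -an | grep 192.168.206.13:6804 | grep CLOSE_WAIT | wc -l   # half-closed client connections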

On Mon, Apr 24, 2017 at 8:14 PM, Phil Lacroute
 wrote:
> Yes it is the correct IP and port:
>
> ceph3:~$ netstat -anp | fgrep 192.168.206.13:6804
> tcp0  0 192.168.206.13:6804 0.0.0.0:*   LISTEN
> 22934/ceph-osd
>
> I turned up the logging on the osd and I don’t think it received the
> request.  However I also noticed a large number of TCP connections to that
> specific osd from the client (192.168.206.17) in CLOSE_WAIT state (131 to be
> exact).  I think there may be a bug causing the osd not to close file
> descriptors.  Prior to the hang I had been running tests continuously for
> several days so the osd process may have been accumulating open sockets.
>
> I’m still gathering information, but based on that is there anything
> specific that would be helpful to find the problem?
>
> Thanks,
> Phil
>
> On Apr 24, 2017, at 5:01 PM, Jason Dillaman  wrote:
>
> Just to cover all the bases, is 192.168.206.13:6804 really associated
> with a running daemon for OSD 11?
>
> On Mon, Apr 24, 2017 at 4:23 PM, Phil Lacroute
>  wrote:
>
> Jason,
>
> Thanks for the suggestion.  That seems to show it is not the OSD that got
> stuck:
>
> ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
> …
> 2017-04-24 13:13:49.761076 7f739aefc700  1 -- 192.168.206.17:0/1250293899
> --> 192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38
> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con
> 0x7f737c0064e0
> …
> 2017-04-24 13:14:04.756328 7f73a2880700  1 -- 192.168.206.17:0/1250293899
> --> 192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con
> 0x7f737c0064e0
>
> ceph0:~$ sudo ceph pg map 1.af6f1e38
> osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2]
>
> ceph3:~$ sudo ceph daemon osd.11 ops
> {
>"ops": [],
>"num_ops": 0
> }
>
> I repeated this a few times and it’s always the same command and same
> placement group that hangs, but OSD11 has no ops (and neither do OSD16 and
> OSD2, although I think that’s expected).
>
> Is there other tracing I should do on the OSD or something more to look at
> on the client?
>
> Thanks,
> Phil
>
> On Apr 24, 2017, at 12:39 PM, Jason Dillaman  wrote:
>
> On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute
>  wrote:
>
> 2017-04-24 11:30:57.058233 7f5512ffd700  1 -- 192.168.206.17:0/3282647735
> --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38
> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con
> 0x7f54f40064e0
>
>
>
> You can attempt to run "ceph daemon osd.XYZ ops" against the
> potentially stuck OSD to figure out what it's stuck doing.
>
> --
> Jason
>
>
>
>
>
> --
> Jason
>
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph built from source, can't start ceph-mon

2017-04-24 Thread Henry Ngo
Anyone?

On Sat, Apr 22, 2017 at 12:33 PM, Henry Ngo  wrote:

> I followed the install doc; however, after deploying the monitor, the doc
> states to start the mon using Upstart. I learned through digging around
> that the Upstart job files are not installed by "make install", so it won't
> work. I tried running "ceph-mon -i [host]" and it gives an error. Any ideas?
>
> http://paste.openstack.org/show/607588/
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] All osd slow response / blocked requests upon single disk failure

2017-04-24 Thread Syahrul Sazli Shaharir
Dear ceph users,

I am running the following setup:-
- 6 x osd servers (centos 7, mostly HP DL180se G6 with SA P410 controllers)
- Each osd server has 1-2 SSD journals, each handling ~5 7.2k SATA RE disks
- ceph-0.94.10

Normal operations work OK; however, when a single disk failed (or on an
abrupt 'ceph osd down'), all OSDs other than the ones on the downed
OSD's host experienced slow responses and blocked requests (some more
than others). For example:-

2017-04-24 15:59:58.734235 7f2a62338700  0 log_channel(cluster) log
[WRN] : slow request 30.571582 seconds old, received at 2017-04-24
15:59:28.162572: osd_op(client.11870166.0:118068448
rbd_data.42d93b436c6125.0577 [sparse-read 8192~4096]
1.a6422b98 ack+read e48964) currently reached_pg
2017-04-24 15:59:58.734241 7f2a62338700  0 log_channel(cluster) log
[WRN] : slow request 30.569605 seconds old, received at 2017-04-24
15:59:28.164550: osd_op(client.11870166.0:118068449
rbd_data.42d93b436c6125.0577 [sparse-read 40960~8192]
1.a6422b98 ack+read e48964) currently reached_pg


In contrast, a normal planned 'ceph osd in' or 'ceph osd out' from a
healthy state works OK and doesn't block requests.

References:-
- ceph osd tree (osd.34 @ osd10 down) : https://pastebin.com/s1AaNJM1
- ceph -s (when healthy): https://pastebin.com/h0NLgbG0
- osd cluster performance during rebuild @ 15:45 - 17:30 :
https://imagebin.ca/v/3KEsK0pGeOR3
- osd cluster i/o wait during rebuild @ 15:45 - 17:30 :
https://imagebin.ca/v/3KErkQ4KC8sv

So far I have tried reducing rebuild priority as follows, but to no avail:-
ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph tell osd.* injectargs '--osd-recovery-max-active 1'
ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
ceph tell osd.* injectargs '--osd-client-op-priority 63'
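
For what it's worth, a quick way to check whether any single OSD stands out
(the OSD id is illustrative; dump_historic_ops runs via the admin socket on
that OSD's host):
ceph osd perf                          # per-OSD commit/apply latency
ceph daemon osd.34 dump_historic_ops   # slowest recent ops on a suspect OSD
ceph health detail                     # which OSDs currently have blocked requests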

Is this a case of some slow OSD dragging down the others, or is my setup /
hardware substandard? Any pointers on what I should look into next
would be greatly appreciated - thanks.

-- 
--sazli
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Package Repo on Ubuntu Precise(12.04) is broken

2017-04-24 Thread Nathan Cutler

Hi Xiaoxi


 Just want to confirm again: according to the definition of
"LTS" in Ceph, Hammer is not supposed to be EOL till Luminous is released,


This is correct.


before that, can we expect that Hammer upgrades and packages for
Precise/other old OSes will still be provided?

  We have all our server-side Ceph clusters on Jewel, but the pain
point is that there are still a few thousand hypervisors on Ubuntu
12.04, thus we have to maintain Hammer for this old stuff.


Luminous release (and, hence, hammer EOL) is very close. Now would be a 
good time to test the upgrade and let us know which hammer fixes you 
need, if any.


Nathan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com