Re: [ceph-users] firefly timing

2014-03-18 Thread Stefan Priebe - Profihost AG
Hi Sage,

I really would like to test the tiering. Is there any detailed
documentation about it and how it works?

Greets,
Stefan

On 18.03.2014 05:45, Sage Weil wrote:
> Hi everyone,
> 
> It's taken longer than expected, but the tests for v0.78 are calming down 
> and it looks like we'll be able to get the release out this week.
> 
> However, we've decided NOT to make this release firefly.  It will be a 
> normal development release.  This will be the first release that includes 
> some key new functionality (erasure coding and cache tiering) and although 
> it is passing our tests we'd like to have some operational experience with 
> it in more users' hands before we commit to supporting it long term.
> 
> The tentative plan is to freeze and then release v0.79 after a normal two 
> week cycle.  This will serve as a 'release candidate' that shaves off a 
> few rough edges from the pending release (including some improvements with 
> the API for setting up erasure coded pools).  It is possible that 0.79 
> will turn into firefly, but more likely that we will opt for another two 
> weeks of hardening and make 0.80 the release we name firefly and maintain 
> for the long term.
> 
> Long story short: 0.78 will be out soon, and you should test it!  It will 
> vary from the final firefly in a few subtle ways, but any feedback on 
> usability and bug reports at this point will be very helpful in shaping 
> things.
> 
> Thanks!
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] firefly timing

2014-03-18 Thread Ирек Фасихов
I'm ready to test the tiering.


2014-03-18 11:07 GMT+04:00 Stefan Priebe - Profihost AG <
s.pri...@profihost.ag>:

> Hi Sage,
>
> i really would like to test the tiering. Is there any detailed
> documentation about it and how it works?
>
> Greets,
> Stefan
>
> On 18.03.2014 05:45, Sage Weil wrote:
> > Hi everyone,
> >
> > It's taken longer than expected, but the tests for v0.78 are calming down
> > and it looks like we'll be able to get the release out this week.
> >
> > However, we've decided NOT to make this release firefly.  It will be a
> > normal development release.  This will be the first release that includes
> > some key new functionality (erasure coding and cache tiering) and
> although
> > it is passing our tests we'd like to have some operational experience
> with
> > it in more users' hands before we commit to supporting it long term.
> >
> > The tentative plan is to freeze and then release v0.79 after a normal two
> > week cycle.  This will serve as a 'release candidate' that shaves off a
> > few rough edges from the pending release (including some improvements
> with
> > the API for setting up erasure coded pools).  It is possible that 0.79
> > will turn into firefly, but more likely that we will opt for another two
> > weeks of hardening and make 0.80 the release we name firefly and maintain
> > for the long term.
> >
> > Long story short: 0.78 will be out soon, and you should test it!  It is
> > will vary from the final firefly in a few subtle ways, but any feedback
> or
> > usability and bug reports at this point will be very helpful in shaping
> > things.
> >
> > Thanks!
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Regards, Фасихов Ирек Нургаязович
Mobile: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] firefly timing

2014-03-18 Thread Alexandre DERUMIER

Hi Stefan, 

http://ceph.com/docs/master/dev/cache-pool/ 




- Original Message -

De: "Stefan Priebe - Profihost AG"  
À: "Sage Weil" , ceph-de...@vger.kernel.org 
Cc: ceph-us...@ceph.com 
Envoyé: Mardi 18 Mars 2014 08:07:19 
Objet: Re: [ceph-users] firefly timing 

Hi Sage, 

i really would like to test the tiering. Is there any detailed 
documentation about it and how it works? 

Greets, 
Stefan 

On 18.03.2014 05:45, Sage Weil wrote: 
> Hi everyone, 
> 
> It's taken longer than expected, but the tests for v0.78 are calming down 
> and it looks like we'll be able to get the release out this week. 
> 
> However, we've decided NOT to make this release firefly. It will be a 
> normal development release. This will be the first release that includes 
> some key new functionality (erasure coding and cache tiering) and although 
> it is passing our tests we'd like to have some operational experience with 
> it in more users' hands before we commit to supporting it long term. 
> 
> The tentative plan is to freeze and then release v0.79 after a normal two 
> week cycle. This will serve as a 'release candidate' that shaves off a 
> few rough edges from the pending release (including some improvements with 
> the API for setting up erasure coded pools). It is possible that 0.79 
> will turn into firefly, but more likely that we will opt for another two 
> weeks of hardening and make 0.80 the release we name firefly and maintain 
> for the long term. 
> 
> Long story short: 0.78 will be out soon, and you should test it! It is 
> will vary from the final firefly in a few subtle ways, but any feedback or 
> usability and bug reports at this point will be very helpful in shaping 
> things. 
> 
> Thanks! 
> sage 
> -- 
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
> the body of a message to majord...@vger.kernel.org 
> More majordomo info at http://vger.kernel.org/majordomo-info.html 
> 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Please Help

2014-03-18 Thread Alfredo Deza
On Tue, Mar 18, 2014 at 1:22 AM, Ashraful Arefeen
 wrote:
> Hi,
> I want to use ceph for testing purpose. While setting the system up I have
> faced some problem with keyrings. Whenever I ran this command from my admin
> node (ceph-deploy gatherkeys node01) I got these warning.
>
> [ceph_deploy.gatherkeys][WARNIN] Unable to find
> /etc/ceph/ceph.client.admin.keyring on ['node01']
> [ceph_deploy.gatherkeys][WARNIN] Unable to find
> /var/lib/ceph/bootstrap-osd/ceph.keyring on ['node01']
> [ceph_deploy.gatherkeys][WARNIN] Unable to find
> /var/lib/ceph/bootstrap-mds/ceph.keyring on ['node01']

How are you getting to this point? With `gatherkeys`? I strongly
suggest using `mon create-initial`, which
will do that for you if your monitors have formed quorum.

This behavior of not being able to gather the keys might be because
your monitors never formed quorum and so
could never generate the keys.

Also, what version of ceph-deploy are you using?

Have you tried this from scratch or have you just tried to re-deploy
the monitors again on the same hosts?

I suggest blowing away your installation and starting from scratch if
possible. Run both purge and purgedata (in that order), then install
again and redo all the steps.
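
A minimal sketch of that sequence with ceph-deploy, assuming an admin node and
the three hosts node01-node03 from your message (hostnames and disk paths below
are placeholders):

ceph-deploy purge node01 node02 node03
ceph-deploy purgedata node01 node02 node03
ceph-deploy forgetkeys                      # drop any stale keyrings on the admin node
ceph-deploy new node01                      # regenerate ceph.conf with node01 as the initial mon
ceph-deploy install node01 node02 node03
ceph-deploy mon create-initial              # creates the mon and gathers the keys once quorum forms
ceph-deploy osd prepare node02:/path/to/disk node03:/path/to/disk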

>
> I have 3 nodes apart from the admin one. Node01 is used for setting up the
> monitor and node02, node03 are used for osds. I am unable to run the prepare
> command for osds also. Whenever I run the prepare command it asked me to run
> the gatherkeys command as the keyrings are missing in the admin node.
>
> How to solve this problem. Until these commands everything goes pretty well.
> Moreover I am using Emperor version of ceph and my operating system is
> ubuntu 12.04. Please help me.
>
> --
> Ashraful Arefeen
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph MDS replaying journal

2014-03-18 Thread John Spray
If you have the logs from the time when "something happened between
the MDS and the client" then please send them along.  The bug where
SessionMap write failures were ignored is just a theory based on the
information available -- errors from before the point where replay
started to fail could give us a clearer idea.

Cheers,
John




On Mon, Mar 17, 2014 at 10:26 PM, Luke Jing Yuan  wrote:
> Hi John,
>
> Thanks for the info and the instructions to solve the problem. However, how 
> could this bug be triggered in the first place? In our search through the logs, we 
> noticed something happened between the MDS and the client before the Silat 
> error messages started to pop up.
>
> Regards,
> Luke
>
>> On Mar 18, 2014, at 5:12 AM, "John Spray"  wrote:
>>
>> Thanks for sending the logs so quickly.
>>
>> 626 2014-03-18 00:58:01.009623 7fba5cbbe700 10 mds.0.journal
>> EMetaBlob.replay sessionmap v8632368 -(1|2) == table 7235981 prealloc
>> [141df86~1] used 141db9e
>> 627 2014-03-18 00:58:01.009627 7fba5cbbe700 20 mds.0.journal  (session
>> prealloc [1373451~3e8])
>> 628 2014-03-18 00:58:01.010696 7fba5cbbe700 -1 mds/journal.cc: In
>> function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)'
>> thread 7fba5cbbe700 time 2014-03-18 00:58:01.009644
>>
>> The first line indicates that the version of SessionMap loaded from
>> disk is 7235981 while the version updated in the journal is 8632368.
>> The difference is much larger than one would expect, as we are only a
>> few events into the journal at the point of the failure.  The
>> assertion is checking that the inode claimed by the journal is in the
>> range allocated to the client session, and it is failing because the
>> stale sessionmap version is in use.
>>
>> In version 0.72.2, there was a bug in the MDS that caused failures to
>> write the SessionMap object to disk to be ignored.  This could result
>> in a situation where there is an inconsistency between the contents of
>> the log and the contents of the SessionMap object.  A check was added
>> to avoid this in the latest code (b0dce8a0)
>>
>> In a future release we will be adding tools for repairing damaged
>> systems in cases like this, but at the moment your options are quite
>> limited.
>> * If the data is replaceable then you might simply use "ceph mds
>> newfs" to start from scratch.
>> * If you can cope with losing some of the most recent modifications
>> but keeping most of the filesystem, you could try the experimental
>> journal reset function:
>> ceph-mds -i mon0 -d  --reset-journal 0
>>   This is destructive: it will discard any metadata updates that have
>> been written to the journal but not to the backing store.  However, it
>> is less destructive than newfs.  It may crash when it completes, look
>> for output like this at the beginning before any stack trace to
>> indicate success:
>>   writing journal head
>>   writing EResetJournal entry
>>   done
>>
>> We are looking forward to making the MDS and associated tools more
>> resilient ahead of making the filesystem a fully supported part of
>> ceph.
>>
>> John
>>
>>> On Mon, Mar 17, 2014 at 5:09 PM, Luke Jing Yuan  wrote:
>>> Hi John,
>>>
>>> Thanks for responding to our issues, attached is the ceph.log file as per 
>>> request. As for the ceph-mds.log, I will have to send it in 3 parts later 
>>> due to our SMTP server's policy.
>>>
>>> Regards,
>>> Luke
>>>
>>> -Original Message-
>>> From: ceph-users-boun...@lists.ceph.com 
>>> [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of John Spray
>>> Sent: Tuesday, 18 March, 2014 12:57 AM
>>> To: Wong Ming Tat
>>> Cc: ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] Ceph MDS replaying journal
>>>
>>> Clarification: in step 1, stop the MDS service on *all* MDS servers (I 
>>> notice there are standby daemons in the "ceph status" output).
>>>
>>> John
>>>
 On Mon, Mar 17, 2014 at 4:45 PM, John Spray  wrote:
 Hello,

 To understand what's gone wrong here, we'll need to increase the
 verbosity of the logging from the MDS service and then trying starting
 it again.

 1. Stop the MDS service (on ubuntu this would be "stop ceph-mds-all")
 2. Move your old log file away so that we will have a fresh one mv
 /var/log/ceph/ceph-mds.mon01.log /var/log/ceph/ceph-mds.mon01.log.old
 3. Start the mds service manually (so that it just tries once instead
 of flapping):
 ceph-mds -i mon01 -f --debug-mds=20 --debug-journaler=10

 The resulting log file may be quite big so you may want to gzip it
 before sending it to the list.

 In addition to the MDS log, please attach your cluster log
 (/var/log/ceph/ceph.log).

 Thanks,
 John

> On Mon, Mar 17, 2014 at 7:02 AM, Wong Ming Tat  wrote:
> Hi,
>
>
>
> I receive the MDS replaying journal error as below.
>
> Hope anyone can give some information to solve this problem.
>
>
>
> # ceph health

Re: [ceph-users] firefly timing

2014-03-18 Thread Mark Nelson

On 03/18/2014 02:07 AM, Stefan Priebe - Profihost AG wrote:

Hi Sage,

i really would like to test the tiering. Is there any detailed
documentation about it and how it works?


Just for a simple test, you can start out with something approximately like:

# Create the pools (might want to do something more sophisticated)
ceph osd pool create cache 1024
ceph osd pool create base 1024
ceph osd pool set cache size 2
ceph osd pool set base size 3

# Add cache pool to base pool
ceph osd tier add base cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay base cache

# Set various tunables
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 8
ceph osd pool set cache hit_set_period 60
ceph osd pool set cache target_max_objects 102400
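
Flushing and eviction are driven by a few more pool values; a sketch of the
ones the cache-pool doc describes (names as of the 0.78-era code, so
double-check them against your build):

# Cap the cache by size as well as by object count
ceph osd pool set cache target_max_bytes 10737418240          # 10 GB, size to suit the cache OSDs
# Start flushing dirty objects at ~40% full, evict clean objects at ~80% full
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8
# Optionally keep very young objects in the cache
ceph osd pool set cache cache_min_flush_age 600                # seconds
ceph osd pool set cache cache_min_evict_age 1800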



Greets,
Stefan

On 18.03.2014 05:45, Sage Weil wrote:

Hi everyone,

It's taken longer than expected, but the tests for v0.78 are calming down
and it looks like we'll be able to get the release out this week.

However, we've decided NOT to make this release firefly.  It will be a
normal development release.  This will be the first release that includes
some key new functionality (erasure coding and cache tiering) and although
it is passing our tests we'd like to have some operational experience with
it in more users' hands before we commit to supporting it long term.

The tentative plan is to freeze and then release v0.79 after a normal two
week cycle.  This will serve as a 'release candidate' that shaves off a
few rough edges from the pending release (including some improvements with
the API for setting up erasure coded pools).  It is possible that 0.79
will turn into firefly, but more likely that we will opt for another two
weeks of hardening and make 0.80 the release we name firefly and maintain
for the long term.

Long story short: 0.78 will be out soon, and you should test it!  It is
will vary from the final firefly in a few subtle ways, but any feedback or
usability and bug reports at this point will be very helpful in shaping
things.

Thanks!
sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"

2014-03-18 Thread Quenten Grasso
Hi All,

I'm trying to troubleshoot a strange issue with my Ceph cluster.

We're running Ceph version 0.72.2.
All nodes are Dell R515s with 6-core AMD CPUs and 32GB RAM, 12 x 3TB NearlineSAS 
drives, and 2 x 100GB Intel DC S3700 SSDs for journals.
All pools have a replica count of 2 or better, e.g. metadata has a replica count of 3.

I have 55 OSDs in the cluster across 5 nodes. When I restart the OSDs on a 
single node (any node), the load average of that node shoots up to 230+ and the 
whole cluster starts blocking IO requests until it settles down and it's fine 
again.

Any ideas on why the load average goes so crazy & starts to block IO?



[osd]
osd data = /var/ceph/osd.$id
osd journal size = 15000
osd mkfs type = xfs
osd mkfs options xfs = "-i size=2048 -f"
osd mount options xfs = 
"rw,noexec,nodev,noatime,nodiratime,barrier=0,inode64,logbufs=8,logbsize=256k"
osd max backfills = 5
osd recovery max active = 3

[osd.0]
host = pbnerbd01
public addr = 10.100.96.10
cluster addr = 10.100.128.10
osd journal = 
/dev/disk/by-id/scsi-36b8ca3a0eaa2660019deaf8d3a40bec4-part1
devs = /dev/sda4


Thanks,
Quenten
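
For comparison, a common way to soften restart impact is to set noout for the
duration of the restart and throttle recovery/backfill; the values below are
illustrative only, not a tested recommendation:

ceph osd set noout                 # prevent OSDs being marked out (and data rebalanced) during the restart
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
# ... restart the OSDs on the node as usual ...
ceph osd unset noout               # once all OSDs are back up and in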

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] integration of keystone with radosgw not working

2014-03-18 Thread Ashish Chandra

Hi everyone,

I am trying to integrate Openstack Keystone with radosgw using the doc :

http://ceph.com/docs/master/radosgw/config/#integrating-with-openstack-keystone

I have made all the necessary changes and was successfully able to use 
swift client to connect and use the Ceph Object Gateway via 
Swift-compatible API.


But an issue arises when I want to use Keystone as my authentication 
mechanism.


I have created keystone service and endpoint.

But while running the command :

openssl x509 -in /etc/keystone/ssl/certs/ca.pem -pubkey | certutil -d /var/lib/ceph/nss 
-A -n ca -t "TCu,Cu,Tuw"

gives me error as:

certutil: function failed: SEC_ERROR_LEGACY_DATABASE: The 
certificate/key database is in an old, unsupported format.


Here is my ceph.conf:

[global]
fsid = 30040254-7177-4a08-8d31-9be2a8b4bac7
mon_initial_members = ceph-node1
mon_host = 10.0.1.11
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

[client.radosgw.gateway]
host = ceph-node1
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_socket_path = /tmp/radosgw.sock
log_file = /var/log/ceph/radosgw.log
rgw keystone url = http://10.0.1.11:35357
rgw keystone admin token = ashish
rgw keystone accepted roles = admin, Member
rgw keystone token cache size = 100
rgw keystone revocation interval = 300
rgw s3 auth use keystone = true
nss db path = /var/lib/ceph/nss

Please let me know what I could be doing wrong.

Thanks and Regards
Ashish Chandra
Openstack Developer, Cloud Engineering
Reliance Jio
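
For what it's worth, SEC_ERROR_LEGACY_DATABASE usually means certutil found an
NSS database in the old dbm format. A hedged sketch (not a verified fix) of
re-initialising the database in the newer sql format before importing the
Keystone certs, using the paths from the message above:

mkdir -p /var/lib/ceph/nss
rm -f /var/lib/ceph/nss/cert8.db /var/lib/ceph/nss/key3.db     # clear old-format (dbm) database files, if any
certutil -N -d sql:/var/lib/ceph/nss                           # initialise a fresh DB; an empty password is fine
openssl x509 -in /etc/keystone/ssl/certs/ca.pem -pubkey | \
  certutil -d sql:/var/lib/ceph/nss -A -n ca -t "TCu,Cu,Tuw"
openssl x509 -in /etc/keystone/ssl/certs/signing_cert.pem -pubkey | \
  certutil -A -d sql:/var/lib/ceph/nss -n signing_cert -t "P,P,P"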
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mon servers

2014-03-18 Thread Jonathan Gowar
The cluster is not functioning without mon servers, and I've not the
technical ability to fix it.

So, in the absence of a fix in its current state, how can I wipe all
mon stuff and start again?  It's a test system, with no data on the
cluster.  I'd just like something working again; what's the best way to
achieve that?

Kind regards,
Jon

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mon servers

2014-03-18 Thread Alfredo Deza
On Tue, Mar 18, 2014 at 8:55 AM, Jonathan Gowar  wrote:
> The cluster is not functioning without mon servers, and I've not the
> technical ability to fix it.
>
> So, in the absense of a fix in it's current state, how can I wipe all
> mon stuff and start again?  It's a test system, with no data on the
> cluster.  I'd just like something working again, what's the best way to
> achieve that?

With ceph-deploy you would do the following (keep in mind this gets
rid of all data as well):

ceph-deploy purge {nodes}
ceph-deploy purgedata {nodes}
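
After the purge the Ceph packages are gone too, so a fresh start then looks
roughly like the quick-start sequence again (a sketch, with {nodes} standing in
for your hostnames as above):

ceph-deploy forgetkeys
ceph-deploy new {initial-mon-node}
ceph-deploy install {nodes}
ceph-deploy mon create-initial
ceph-deploy osd create {node}:{data-disk}[:{journal}]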


>
> Kind regards,
> Jon
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mon servers

2014-03-18 Thread Jonathan Gowar
On Tue, 2014-03-18 at 09:14 -0400, Alfredo Deza wrote:
> With ceph-deploy you would do the following (keep in mind this gets
> rid of all data as well):
> 
> ceph-deploy purge {nodes}
> ceph-deploy purgedata {nodes}

Awesome!  Nice new clean cluster, with all the right bits :)

Thanks for the assist.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] firefly timing

2014-03-18 Thread Sage Weil
On Tue, 18 Mar 2014, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
> 
> i really would like to test the tiering. Is there any detailed
> documentation about it and how it works?

Great!  Here is a quick synopsis on how to set it up:

http://ceph.com/docs/master/dev/cache-pool/

sage



> 
> Greets,
> Stefan
> 
> On 18.03.2014 05:45, Sage Weil wrote:
> > Hi everyone,
> > 
> > It's taken longer than expected, but the tests for v0.78 are calming down 
> > and it looks like we'll be able to get the release out this week.
> > 
> > However, we've decided NOT to make this release firefly.  It will be a 
> > normal development release.  This will be the first release that includes 
> > some key new functionality (erasure coding and cache tiering) and although 
> > it is passing our tests we'd like to have some operational experience with 
> > it in more users' hands before we commit to supporting it long term.
> > 
> > The tentative plan is to freeze and then release v0.79 after a normal two 
> > week cycle.  This will serve as a 'release candidate' that shaves off a 
> > few rough edges from the pending release (including some improvements with 
> > the API for setting up erasure coded pools).  It is possible that 0.79 
> > will turn into firefly, but more likely that we will opt for another two 
> > weeks of hardening and make 0.80 the release we name firefly and maintain 
> > for the long term.
> > 
> > Long story short: 0.78 will be out soon, and you should test it!  It is 
> > will vary from the final firefly in a few subtle ways, but any feedback or 
> > usability and bug reports at this point will be very helpful in shaping 
> > things.
> > 
> > Thanks!
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] firefly timing

2014-03-18 Thread Stefan Priebe - Profihost AG

> On 18.03.2014 17:06, Sage Weil wrote:
> 
>> On Tue, 18 Mar 2014, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>> 
>> i really would like to test the tiering. Is there any detailed
>> documentation about it and how it works?
> 
> Great!  Here is a quick synopiss on how to set it up:
> 
>http://ceph.com/docs/master/dev/cache-pool/

What I'm missing is documentation about the cache settings.
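
In the meantime the values already set on a pool can at least be read back; a
sketch, assuming the cache pool is called "cache" (on 0.78 `ceph osd pool get`
may not expose every cache-related field yet):

ceph osd dump | grep "'cache'"            # the pool line should show the cache-related fields
ceph osd pool get cache hit_set_period    # or query individual values where supported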

> 
> sage
> 
> 
> 
>> 
>> Greets,
>> Stefan
>> 
>> On 18.03.2014 05:45, Sage Weil wrote:
>>> Hi everyone,
>>> 
>>> It's taken longer than expected, but the tests for v0.78 are calming down 
>>> and it looks like we'll be able to get the release out this week.
>>> 
>>> However, we've decided NOT to make this release firefly.  It will be a 
>>> normal development release.  This will be the first release that includes 
>>> some key new functionality (erasure coding and cache tiering) and although 
>>> it is passing our tests we'd like to have some operational experience with 
>>> it in more users' hands before we commit to supporting it long term.
>>> 
>>> The tentative plan is to freeze and then release v0.79 after a normal two 
>>> week cycle.  This will serve as a 'release candidate' that shaves off a 
>>> few rough edges from the pending release (including some improvements with 
>>> the API for setting up erasure coded pools).  It is possible that 0.79 
>>> will turn into firefly, but more likely that we will opt for another two 
>>> weeks of hardening and make 0.80 the release we name firefly and maintain 
>>> for the long term.
>>> 
>>> Long story short: 0.78 will be out soon, and you should test it!  It is 
>>> will vary from the final firefly in a few subtle ways, but any feedback or 
>>> usability and bug reports at this point will be very helpful in shaping 
>>> things.
>>> 
>>> Thanks!
>>> sage
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] firefly timing

2014-03-18 Thread Milosz Tanski
Is this statement in the documentation still valid: "Stale data is
expired from the cache pools based on some as-yet undetermined
policy." As that sounds a bit scary.

- Milosz

On Tue, Mar 18, 2014 at 12:06 PM, Sage Weil  wrote:
> On Tue, 18 Mar 2014, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> i really would like to test the tiering. Is there any detailed
>> documentation about it and how it works?
>
> Great!  Here is a quick synopiss on how to set it up:
>
> http://ceph.com/docs/master/dev/cache-pool/
>
> sage
>
>
>
>>
>> Greets,
>> Stefan
>>
>> On 18.03.2014 05:45, Sage Weil wrote:
>> > Hi everyone,
>> >
>> > It's taken longer than expected, but the tests for v0.78 are calming down
>> > and it looks like we'll be able to get the release out this week.
>> >
>> > However, we've decided NOT to make this release firefly.  It will be a
>> > normal development release.  This will be the first release that includes
>> > some key new functionality (erasure coding and cache tiering) and although
>> > it is passing our tests we'd like to have some operational experience with
>> > it in more users' hands before we commit to supporting it long term.
>> >
>> > The tentative plan is to freeze and then release v0.79 after a normal two
>> > week cycle.  This will serve as a 'release candidate' that shaves off a
>> > few rough edges from the pending release (including some improvements with
>> > the API for setting up erasure coded pools).  It is possible that 0.79
>> > will turn into firefly, but more likely that we will opt for another two
>> > weeks of hardening and make 0.80 the release we name firefly and maintain
>> > for the long term.
>> >
>> > Long story short: 0.78 will be out soon, and you should test it!  It is
>> > will vary from the final firefly in a few subtle ways, but any feedback or
>> > usability and bug reports at this point will be very helpful in shaping
>> > things.
>> >
>> > Thanks!
>> > sage
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> > the body of a message to majord...@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022

p: 646-253-9055
e: mil...@adfin.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] firefly timing

2014-03-18 Thread Sage Weil
On Tue, 18 Mar 2014, Milosz Tanski wrote:
> Is this statement in the documentation still valid: "Stale data is
> expired from the cache pools based on some as-yet undetermined
> policy." As that sounds a bit scary.

I'll update the docs :).  The policy is pretty simple but not described 
anywhere yet.

sage


> 
> - Milosz
> 
> On Tue, Mar 18, 2014 at 12:06 PM, Sage Weil  wrote:
> > On Tue, 18 Mar 2014, Stefan Priebe - Profihost AG wrote:
> >> Hi Sage,
> >>
> >> i really would like to test the tiering. Is there any detailed
> >> documentation about it and how it works?
> >
> > Great!  Here is a quick synopiss on how to set it up:
> >
> > http://ceph.com/docs/master/dev/cache-pool/
> >
> > sage
> >
> >
> >
> >>
> >> Greets,
> >> Stefan
> >>
> >> On 18.03.2014 05:45, Sage Weil wrote:
> >> > Hi everyone,
> >> >
> >> > It's taken longer than expected, but the tests for v0.78 are calming down
> >> > and it looks like we'll be able to get the release out this week.
> >> >
> >> > However, we've decided NOT to make this release firefly.  It will be a
> >> > normal development release.  This will be the first release that includes
> >> > some key new functionality (erasure coding and cache tiering) and 
> >> > although
> >> > it is passing our tests we'd like to have some operational experience 
> >> > with
> >> > it in more users' hands before we commit to supporting it long term.
> >> >
> >> > The tentative plan is to freeze and then release v0.79 after a normal two
> >> > week cycle.  This will serve as a 'release candidate' that shaves off a
> >> > few rough edges from the pending release (including some improvements 
> >> > with
> >> > the API for setting up erasure coded pools).  It is possible that 0.79
> >> > will turn into firefly, but more likely that we will opt for another two
> >> > weeks of hardening and make 0.80 the release we name firefly and maintain
> >> > for the long term.
> >> >
> >> > Long story short: 0.78 will be out soon, and you should test it!  It is
> >> > will vary from the final firefly in a few subtle ways, but any feedback 
> >> > or
> >> > usability and bug reports at this point will be very helpful in shaping
> >> > things.
> >> >
> >> > Thanks!
> >> > sage
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> > the body of a message to majord...@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majord...@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> -- 
> Milosz Tanski
> CTO
> 10 East 53rd Street, 37th floor
> New York, NY 10022
> 
> p: 646-253-9055
> e: mil...@adfin.com
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph MDS replaying journal

2014-03-18 Thread John Spray
Hi Luke,

(copying list back in)

You should stop all MDS services before attempting to use
--reset-journal (but make sure mons and OSDs are running).  The status
of the mds map shouldn't make a difference.

John
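
Putting the pieces from this thread together, the sequence looks roughly like
this (a sketch, assuming Ubuntu upstart and the MDS id "mon01" used earlier):

stop ceph-mds-all                                      # on every MDS host, standbys included
ceph -s                                                # confirm the mons and OSDs are still up
ceph-mds -i mon01 -d --debug-mds=20 --reset-journal 0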

On Tue, Mar 18, 2014 at 5:23 PM, Luke Jing Yuan  wrote:
> Hi John,
>
> I noticed that while we were running with the --reset-journal option, ceph.log 
> kept showing something like the following lines:
>
> 2014-03-19 01:17:09.977892 mon.0 10.4.118.21:6789/0 192 : [INF] mdsmap 
> e29851: 1/1/1 up {0=mon01=up:replay(laggy or crashed)}
>
> And the mdsmap epoch just keeps increasing; is this what we should be 
> expecting? Also, should we consider using the "ceph mds" command to fail the mds 
> before running with --reset-journal?
>
> Apologize for being asking so many times. Thanks in advance.
>
> Regards,
> Luke
>
> -Original Message-
> From: John Spray [mailto:john.sp...@inktank.com]
> Sent: Tuesday, 18 March, 2014 8:12 PM
> To: Luke Jing Yuan
> Cc: Wong Ming Tat; Mohd Bazli Ab Karim
> Subject: Re: [ceph-users] Ceph MDS replaying journal
>
> That command should be almost instant, so it sounds like it has become stuck. 
>  Run with "-d --debug-mds=20" to get more output.  One way it can get stuck 
> is if the "-i" argument doesn't correspond to the host you're running on, in 
> which case it gets stuck trying to find keys.  I put "-i mon0" in the example 
> command because that looked like the host you were running on, but perhaps 
> you're running from somewhere else.
>
> I must emphasize that this is all very unsupported.  If you have data that is 
> critical for your users you should preferably restore it from backups.
>
> John
>
> On Tue, Mar 18, 2014 at 12:04 PM, Luke Jing Yuan  wrote:
>> Hi John,
>>
>> We are using the 2nd option you mentioned, but after more than 10 hours of 
>> running we have no idea whether it's working or when it will complete. Is 
>> there any way for us to further monitor the progress? We dare not use the 
>> newfs option, as there is data that is critical to our user. Kindly advise.
>>
>> Thanks.
>>
>> Regards,
>> Luke
>>
>> -Original Message-
>> From: ceph-users-boun...@lists.ceph.com
>> [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Luke Jing Yuan
>> Sent: Tuesday, 18 March, 2014 2:33 PM
>> To: John Spray
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Ceph MDS replaying journal
>>
>> Hi John,
>>
>> Is there a way for us to verify that step 2 is working properly? We are 
>> seeing the process running for almost 4 hours but there is no indication 
>> when it will end. Thanks.
>>
>> Regards,
>> Luke
>>
>> -Original Message-
>> From: John Spray [mailto:john.sp...@inktank.com]
>> Sent: Tuesday, 18 March, 2014 5:13 AM
>> To: Luke Jing Yuan
>> Cc: Wong Ming Tat; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Ceph MDS replaying journal
>>
>> Thanks for sending the logs so quickly.
>>
>> 626 2014-03-18 00:58:01.009623 7fba5cbbe700 10 mds.0.journal 
>> EMetaBlob.replay sessionmap v8632368 -(1|2) == table 7235981 prealloc
>> [141df86~1] used 141db9e
>> 627 2014-03-18 00:58:01.009627 7fba5cbbe700 20 mds.0.journal  (session
>> prealloc [1373451~3e8])
>> 628 2014-03-18 00:58:01.010696 7fba5cbbe700 -1 mds/journal.cc: In function 
>> 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)'
>> thread 7fba5cbbe700 time 2014-03-18 00:58:01.009644
>>
>> The first line indicates that the version of SessionMap loaded from disk is 
>> 7235981 while the version updated in the journal is 8632368.
>> The difference is much larger than one would expect, as we are only a few 
>> events into the journal at the point of the failure.  The assertion is 
>> checking that the inode claimed by the journal is in the range allocated to 
>> the client session, and it is failing because the stale sessionmap version 
>> is in use.
>>
>> In version 0.72.2, there was a bug in the MDS that caused failures to
>> write the SessionMap object to disk to be ignored.  This could result
>> in a situation where there is an inconsistency between the contents of
>> the log and the contents of the SessionMap object.  A check was added
>> to avoid this in the latest code (b0dce8a0)
>>
>> In a future release we will be adding tools for repairing damaged systems in 
>> cases like this, but at the moment your options are quite limited.
>>  * If the data is replaceable then you might simply use "ceph mds newfs" to 
>> start from scratch.
>>  * If you can cope with losing some of the most recent modifications but 
>> keeping most of the filesystem, you could try the experimental journal reset 
>> function:
>>  ceph-mds -i mon0 -d  --reset-journal 0
>>This is destructive: it will discard any metadata updates that have been 
>> written to the journal but not to the backing store.  However, it is less 
>> destructive than newfs.  It may crash when it completes, look for output 
>> like this at the beginning before any stack trace to indicate success

[ceph-users] Pool Count incrementing on each create even though I removed the pool each time

2014-03-18 Thread Matt . Latter

I am a novice ceph user creating a simple 4 OSD default cluster (initially)
and experimenting with RADOS BENCH to understand basic HDD (OSD)
performance. Before each iteration of rados bench -p data I want the cluster OSDs
in their initial state, i.e. 0 objects. I assumed the easiest way was to remove
and re-create the data pool each time.

While this appears to work, when I run ceph -s it shows me the pool count
is incrementing each time:

matt@redstar9:~$ sudo ceph -s
cluster c677f4c3-46a5-4ae1-b8aa-b070326c3b24
 health HEALTH_WARN clock skew detected on mon.redstar10, mon.redstar11
 monmap e1: 3 mons at
{redstar10=192.168.5.40:6789/0,redstar11=192.168.5.41:6789/0,redstar9=192.168.5.39:6789/0},
 election epoch 6, quorum 0,1,2 redstar10,redstar11,redstar9
 osdmap e52: 4 osds: 4 up, 4 in
  pgmap v5240: 136 pgs, 14 pools, 768 MB data, 194 objects
1697 MB used, 14875 GB / 14876 GB avail
 136 active+clean


even though lspools still only shows me the 3 default pools (metadata, rbd,
data)

Is this a bug, AND/OR, is there a better way to zero my cluster for these
experiments?

Thanks,

Matt Latter

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool Count incrementing on each create even though I removed the pool each time

2014-03-18 Thread John Spray
Hi Matt,

This is expected behaviour: pool IDs are not reused.

Cheers,
John

On Tue, Mar 18, 2014 at 6:53 PM,   wrote:
>
> I am a novice ceph user creating a simple 4 OSD default cluster (initially)
> and experimenting with RADOS BENCH to understand basic HDD (OSD)
> performance. Each interation of rados bench -p data I want the cluster OSDs
> in initial state  i.e. 0 objects . I assumed the easiest way was to remove
> and re-create the data pool each time.
>
> While this appears to work , when I run ceph -s it shows me the pool count
> is incrementing each time:
>
> matt@redstar9:~$ sudo ceph -s
> cluster c677f4c3-46a5-4ae1-b8aa-b070326c3b24
>  health HEALTH_WARN clock skew detected on mon.redstar10, mon.redstar11
>  monmap e1: 3 mons at
> {redstar10=192.168.5.40:6789/0,redstar11=192.168.5.41:6789/0,redstar9=192.168.5.39:6789/0},
>  election epoch 6, quorum 0,1,2 redstar10,redstar11,redstar9
>  osdmap e52: 4 osds: 4 up, 4 in
>   pgmap v5240: 136 pgs, 14 pools, 768 MB data, 194 objects
> 1697 MB used, 14875 GB / 14876 GB avail
>  136 active+clean
>
>
> even though lspools still only shows me the 3 default pools (metadata, rbd,
> data)
>
> Is this a bug, AND/OR, is there a better way to zero my cluster for these
> experiments?
>
> Thanks,
>
> Matt Latter
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool Count incrementing on each create even though I removed the pool each time

2014-03-18 Thread McNamara, Bradley
What you are seeing is expected behavior.  Pool numbers do not get reused; they 
increment up.  Pool names can be reused once they are deleted.  One note, 
though, if you delete and recreate the data pool, and want to use cephfs, 
you'll need to run 'ceph mds newfs   
--yes-i-really-mean-it' before mounting it.

Brad
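
A sketch of that delete/recreate cycle for the benchmarking case; the pool IDs
passed to newfs are hypothetical (check `ceph osd lspools`, the defaults are
usually metadata=1, data=0):

ceph osd pool delete data data --yes-i-really-really-mean-it
ceph osd pool create data 128 128
# only needed if cephfs will use the recreated pool:
ceph mds newfs 1 0 --yes-i-really-mean-it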

-Original Message-
From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of matt.lat...@hgst.com
Sent: Tuesday, March 18, 2014 11:53 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Pool Count incrementing on each create even though I 
removed the pool each time


I am a novice ceph user creating a simple 4 OSD default cluster (initially) and 
experimenting with RADOS BENCH to understand basic HDD (OSD) performance. Each 
interation of rados bench -p data I want the cluster OSDs in initial state  
i.e. 0 objects . I assumed the easiest way was to remove and re-create the data 
pool each time.

While this appears to work , when I run ceph -s it shows me the pool count is 
incrementing each time:

matt@redstar9:~$ sudo ceph -s
cluster c677f4c3-46a5-4ae1-b8aa-b070326c3b24
 health HEALTH_WARN clock skew detected on mon.redstar10, mon.redstar11
 monmap e1: 3 mons at
{redstar10=192.168.5.40:6789/0,redstar11=192.168.5.41:6789/0,redstar9=192.168.5.39:6789/0},
 election epoch 6, quorum 0,1,2 redstar10,redstar11,redstar9
 osdmap e52: 4 osds: 4 up, 4 in
  pgmap v5240: 136 pgs, 14 pools, 768 MB data, 194 objects
1697 MB used, 14875 GB / 14876 GB avail
 136 active+clean


even though lspools still only shows me the 3 default pools (metadata, rbd,
data)

Is this a bug, AND/OR, is there a better way to zero my cluster for these 
experiments?

Thanks,

Matt Latter

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool Count incrementing on each create even though I removed the pool each time

2014-03-18 Thread John Spray
Belay that, I misread your mail and thought you were talking about the
counter used to assign IDs to new pools rather than the pool count
reported from the PG map.

John

On Tue, Mar 18, 2014 at 7:12 PM, John Spray  wrote:
> Hi Matt,
>
> This is expected behaviour: pool IDs are not reused.
>
> Cheers,
> John
>
> On Tue, Mar 18, 2014 at 6:53 PM,   wrote:
>>
>> I am a novice ceph user creating a simple 4 OSD default cluster (initially)
>> and experimenting with RADOS BENCH to understand basic HDD (OSD)
>> performance. Each interation of rados bench -p data I want the cluster OSDs
>> in initial state  i.e. 0 objects . I assumed the easiest way was to remove
>> and re-create the data pool each time.
>>
>> While this appears to work , when I run ceph -s it shows me the pool count
>> is incrementing each time:
>>
>> matt@redstar9:~$ sudo ceph -s
>> cluster c677f4c3-46a5-4ae1-b8aa-b070326c3b24
>>  health HEALTH_WARN clock skew detected on mon.redstar10, mon.redstar11
>>  monmap e1: 3 mons at
>> {redstar10=192.168.5.40:6789/0,redstar11=192.168.5.41:6789/0,redstar9=192.168.5.39:6789/0},
>>  election epoch 6, quorum 0,1,2 redstar10,redstar11,redstar9
>>  osdmap e52: 4 osds: 4 up, 4 in
>>   pgmap v5240: 136 pgs, 14 pools, 768 MB data, 194 objects
>> 1697 MB used, 14875 GB / 14876 GB avail
>>  136 active+clean
>>
>>
>> even though lspools still only shows me the 3 default pools (metadata, rbd,
>> data)
>>
>> Is this a bug, AND/OR, is there a better way to zero my cluster for these
>> experiments?
>>
>> Thanks,
>>
>> Matt Latter
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool Count incrementing on each create even though I removed the pool each time

2014-03-18 Thread Sage Weil
On Tue, 18 Mar 2014, John Spray wrote:
> Hi Matt,
> 
> This is expected behaviour: pool IDs are not reused.

The IDs go up, but I think the 'count' shown there should not.. i.e. 
num_pools != max_pool_id.  So probably a subtle bug, I expect in the 
print_summary or similar method in PGMonitor.cc?

sage

> 
> Cheers,
> John
> 
> On Tue, Mar 18, 2014 at 6:53 PM,   wrote:
> >
> > I am a novice ceph user creating a simple 4 OSD default cluster (initially)
> > and experimenting with RADOS BENCH to understand basic HDD (OSD)
> > performance. Each interation of rados bench -p data I want the cluster OSDs
> > in initial state  i.e. 0 objects . I assumed the easiest way was to remove
> > and re-create the data pool each time.
> >
> > While this appears to work , when I run ceph -s it shows me the pool count
> > is incrementing each time:
> >
> > matt@redstar9:~$ sudo ceph -s
> > cluster c677f4c3-46a5-4ae1-b8aa-b070326c3b24
> >  health HEALTH_WARN clock skew detected on mon.redstar10, mon.redstar11
> >  monmap e1: 3 mons at
> > {redstar10=192.168.5.40:6789/0,redstar11=192.168.5.41:6789/0,redstar9=192.168.5.39:6789/0},
> >  election epoch 6, quorum 0,1,2 redstar10,redstar11,redstar9
> >  osdmap e52: 4 osds: 4 up, 4 in
> >   pgmap v5240: 136 pgs, 14 pools, 768 MB data, 194 objects
> > 1697 MB used, 14875 GB / 14876 GB avail
> >  136 active+clean
> >
> >
> > even though lspools still only shows me the 3 default pools (metadata, rbd,
> > data)
> >
> > Is this a bug, AND/OR, is there a better way to zero my cluster for these
> > experiments?
> >
> > Thanks,
> >
> > Matt Latter
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool Count incrementing on each create even though I removed the pool each time

2014-03-18 Thread Gregory Farnum
On Tue, Mar 18, 2014 at 12:20 PM, Sage Weil  wrote:
> On Tue, 18 Mar 2014, John Spray wrote:
>> Hi Matt,
>>
>> This is expected behaviour: pool IDs are not reused.
>
> The IDs go up, but I think the 'count' shown there should not.. i.e.
> num_pools != max_pool_id.  So probably a subtle bug, I expect in the
> print_summary or similar method in PGMonitor.cc?

I had assumed this was the result of having PGs stick around after the
pool had been deleted, but in testing that's not the case -- the pool
count outlasts the PGs. And I didn't track it down to the actual bug,
but there is definitely code trying to remove pools from the
PGMap::pg_pool_sum map. :/
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] firefly timing

2014-03-18 Thread Sage Weil
On Tue, 18 Mar 2014, Sage Weil wrote:
> On Tue, 18 Mar 2014, Milosz Tanski wrote:
> > Is this statement in the documentation still valid: "Stale data is
> > expired from the cache pools based on some as-yet undetermined
> > policy." As that sounds a bit scary.
> 
> I'll update the docs :).  The policy is pretty simply but not described 
> anywhere yet.

I've updated the doc; please let me know what is/isn't clear so we can 
make sure the final doc is useful.

John, we need to figure out where this is going to fit in the overall 
IA...

sage



> 
> sage
> 
> 
> > 
> > - Milosz
> > 
> > On Tue, Mar 18, 2014 at 12:06 PM, Sage Weil  wrote:
> > > On Tue, 18 Mar 2014, Stefan Priebe - Profihost AG wrote:
> > >> Hi Sage,
> > >>
> > >> i really would like to test the tiering. Is there any detailed
> > >> documentation about it and how it works?
> > >
> > > Great!  Here is a quick synopiss on how to set it up:
> > >
> > > http://ceph.com/docs/master/dev/cache-pool/
> > >
> > > sage
> > >
> > >
> > >
> > >>
> > >> Greets,
> > >> Stefan
> > >>
> > >> On 18.03.2014 05:45, Sage Weil wrote:
> > >> > Hi everyone,
> > >> >
> > >> > It's taken longer than expected, but the tests for v0.78 are calming 
> > >> > down
> > >> > and it looks like we'll be able to get the release out this week.
> > >> >
> > >> > However, we've decided NOT to make this release firefly.  It will be a
> > >> > normal development release.  This will be the first release that 
> > >> > includes
> > >> > some key new functionality (erasure coding and cache tiering) and 
> > >> > although
> > >> > it is passing our tests we'd like to have some operational experience 
> > >> > with
> > >> > it in more users' hands before we commit to supporting it long term.
> > >> >
> > >> > The tentative plan is to freeze and then release v0.79 after a normal 
> > >> > two
> > >> > week cycle.  This will serve as a 'release candidate' that shaves off a
> > >> > few rough edges from the pending release (including some improvements 
> > >> > with
> > >> > the API for setting up erasure coded pools).  It is possible that 0.79
> > >> > will turn into firefly, but more likely that we will opt for another 
> > >> > two
> > >> > weeks of hardening and make 0.80 the release we name firefly and 
> > >> > maintain
> > >> > for the long term.
> > >> >
> > >> > Long story short: 0.78 will be out soon, and you should test it!  It is
> > >> > will vary from the final firefly in a few subtle ways, but any 
> > >> > feedback or
> > >> > usability and bug reports at this point will be very helpful in shaping
> > >> > things.
> > >> >
> > >> > Thanks!
> > >> > sage
> > >> > --
> > >> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> > >> > in
> > >> > the body of a message to majord...@vger.kernel.org
> > >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >> >
> > >> --
> > >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > >> the body of a message to majord...@vger.kernel.org
> > >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >>
> > >>
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > the body of a message to majord...@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> > 
> > -- 
> > Milosz Tanski
> > CTO
> > 10 East 53rd Street, 37th floor
> > New York, NY 10022
> > 
> > p: 646-253-9055
> > e: mil...@adfin.com
> > 
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] firefly timing

2014-03-18 Thread Karan Singh
Hello Everyone

I am looking forward to testing the new features of 0.78; it would be nice if erasure 
coding and tiering implementation notes were available in the Ceph documentation.

The Ceph documentation is in good shape and it's always nice to follow.

A humble request: please add erasure coding and tiering to the documentation if it's not 
already in your pipeline.


Karan Singh
CSC - IT Center for Science Ltd.
P.O. Box 405, FI-02101 Espoo, FINLAND
http://www.csc.fi/ | +358 (0) 503 812758

On 18 Mar 2014, at 09:13, Ирек Фасихов  wrote:

> I'm ready to test the tiering.
> 
> 
> 2014-03-18 11:07 GMT+04:00 Stefan Priebe - Profihost AG 
> :
> Hi Sage,
> 
> i really would like to test the tiering. Is there any detailed
> documentation about it and how it works?
> 
> Greets,
> Stefan
> 
> On 18.03.2014 05:45, Sage Weil wrote:
> > Hi everyone,
> >
> > It's taken longer than expected, but the tests for v0.78 are calming down
> > and it looks like we'll be able to get the release out this week.
> >
> > However, we've decided NOT to make this release firefly.  It will be a
> > normal development release.  This will be the first release that includes
> > some key new functionality (erasure coding and cache tiering) and although
> > it is passing our tests we'd like to have some operational experience with
> > it in more users' hands before we commit to supporting it long term.
> >
> > The tentative plan is to freeze and then release v0.79 after a normal two
> > week cycle.  This will serve as a 'release candidate' that shaves off a
> > few rough edges from the pending release (including some improvements with
> > the API for setting up erasure coded pools).  It is possible that 0.79
> > will turn into firefly, but more likely that we will opt for another two
> > weeks of hardening and make 0.80 the release we name firefly and maintain
> > for the long term.
> >
> > Long story short: 0.78 will be out soon, and you should test it!  It is
> > will vary from the final firefly in a few subtle ways, but any feedback or
> > usability and bug reports at this point will be very helpful in shaping
> > things.
> >
> > Thanks!
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> Regards, Фасихов Ирек Нургаязович
> Mobile: +79229045757
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] firefly timing

2014-03-18 Thread Milosz Tanski
I think it's good now, explicit (and detailed).

On Tue, Mar 18, 2014 at 4:12 PM, Sage Weil  wrote:
> On Tue, 18 Mar 2014, Sage Weil wrote:
>> On Tue, 18 Mar 2014, Milosz Tanski wrote:
>> > Is this statement in the documentation still valid: "Stale data is
>> > expired from the cache pools based on some as-yet undetermined
>> > policy." As that sounds a bit scary.
>>
>> I'll update the docs :).  The policy is pretty simply but not described
>> anywhere yet.
>
> I've updated the doc; please let me know what is/isn't clear so we can
> make sure the final doc is useful.
>
> John, we need to figure out where this is going to fit in the overall
> IA...
>
> sage
>
>
>
>>
>> sage
>>
>>
>> >
>> > - Milosz
>> >
>> > On Tue, Mar 18, 2014 at 12:06 PM, Sage Weil  wrote:
>> > > On Tue, 18 Mar 2014, Stefan Priebe - Profihost AG wrote:
>> > >> Hi Sage,
>> > >>
>> > >> i really would like to test the tiering. Is there any detailed
>> > >> documentation about it and how it works?
>> > >
>> > > Great!  Here is a quick synopiss on how to set it up:
>> > >
>> > > http://ceph.com/docs/master/dev/cache-pool/
>> > >
>> > > sage
>> > >
>> > >
>> > >
>> > >>
>> > >> Greets,
>> > >> Stefan
>> > >>
>> > >> On 18.03.2014 05:45, Sage Weil wrote:
>> > >> > Hi everyone,
>> > >> >
>> > >> > It's taken longer than expected, but the tests for v0.78 are calming 
>> > >> > down
>> > >> > and it looks like we'll be able to get the release out this week.
>> > >> >
>> > >> > However, we've decided NOT to make this release firefly.  It will be a
>> > >> > normal development release.  This will be the first release that 
>> > >> > includes
>> > >> > some key new functionality (erasure coding and cache tiering) and 
>> > >> > although
>> > >> > it is passing our tests we'd like to have some operational experience 
>> > >> > with
>> > >> > it in more users' hands before we commit to supporting it long term.
>> > >> >
>> > >> > The tentative plan is to freeze and then release v0.79 after a normal 
>> > >> > two
>> > >> > week cycle.  This will serve as a 'release candidate' that shaves off 
>> > >> > a
>> > >> > few rough edges from the pending release (including some improvements 
>> > >> > with
>> > >> > the API for setting up erasure coded pools).  It is possible that 0.79
>> > >> > will turn into firefly, but more likely that we will opt for another 
>> > >> > two
>> > >> > weeks of hardening and make 0.80 the release we name firefly and 
>> > >> > maintain
>> > >> > for the long term.
>> > >> >
>> > >> > Long story short: 0.78 will be out soon, and you should test it!  It 
>> > >> > is
>> > >> > will vary from the final firefly in a few subtle ways, but any 
>> > >> > feedback or
>> > >> > usability and bug reports at this point will be very helpful in 
>> > >> > shaping
>> > >> > things.
>> > >> >
>> > >> > Thanks!
>> > >> > sage
>> > >> > --
>> > >> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
>> > >> > in
>> > >> > the body of a message to majord...@vger.kernel.org
>> > >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> > >> >
>> > >> --
>> > >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> > >> the body of a message to majord...@vger.kernel.org
>> > >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> > >>
>> > >>
>> > > --
>> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> > > the body of a message to majord...@vger.kernel.org
>> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>> >
>> >
>> > --
>> > Milosz Tanski
>> > CTO
>> > 10 East 53rd Street, 37th floor
>> > New York, NY 10022
>> >
>> > p: 646-253-9055
>> > e: mil...@adfin.com
>> >
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>



-- 
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022

p: 646-253-9055
e: mil...@adfin.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Replication

2014-03-18 Thread Craig Lewis
For the record, I have one bucket in my slave zone that caught up to the 
master zone.  I stopped adding new data to my first bucket, and 
replication stopped.  I started tickling the bucket by uploading and 
deleting a 0 byte file every 5 minutes.  Now the slave has all of the 
files in that bucket.
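
The tickle itself is nothing fancy; it's roughly the following loop (a rough sketch, assuming s3cmd is configured against the master zone, and with an illustrative bucket/object name):

# rough sketch of the 5-minute tickle (bucket and object names are illustrative)
touch /tmp/tickle
while true; do
    s3cmd put /tmp/tickle s3://bucket1/tickle   # each PUT adds a bucket log entry
    s3cmd del s3://bucket1/tickle               # so does each DELETE
    sleep 300
done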


I didn't need to use --sync-scope=full.

I'm still importing faster than I can replicate, but I know how to deal 
with it now.  The master zone has nearly completed its import.  Once 
that happens, replication should be able to catch up in a couple of 
weeks, and stay caught up.




Thanks for all the help!





*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*



On 2/7/14 11:38 , Craig Lewis wrote:

I have confirmed this in production, with the default max-entries.


I have a bucket that I'm no longer writing to.  Radosgw-agent had 
stopped replicating this bucket.  radosgw-admin bucket stats shows 
that the slave is missing ~600k objects.


I uploaded a 1 byte file to the bucket.  On the next pass, 
radosgw-agent replicated 1000 entries.



I'm uploading and deleting the same file every 5 minutes.  I'm using 
more inter-colo bandwidth now.  This bucket is catching up, slowly.



For now, I'm going to graph the delta of the total number of objects 
in both clusters.  If the slave is higher, it's catching up.  If it's 
lower, it's falling behind.
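
Roughly the following, run against each cluster, gives the totals to graph. This is only a sketch; it assumes this version's radosgw-admin bucket stats output includes a num_objects field per bucket:

# rough sketch: sum object counts across all buckets on one cluster
radosgw-admin bucket stats 2>/dev/null \
  | grep '"num_objects"' \
  | tr -d ' ",' \
  | awk -F: '{ sum += $2 } END { print sum }'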





*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*



On 2/6/14 18:32 , Craig Lewis wrote:

On 2/4/14 17:06 , Craig Lewis wrote:


Now that I've started seeing missing objects, I'm not able to 
download objects that should be on the slave if replication is up to 
date.  Either it's not up to date, or it's skipping objects every pass.




Using my --max-entries fix 
(https://github.com/ceph/radosgw-agent/pull/8), I think I see what's 
happening.



Shut down replication
Upload 6 objects to an empty bucket on the master:
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test1.jpg
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test2.jpg
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test3.jpg
2014-02-07 02:03   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test4.jpg
2014-02-07 02:03   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test5.jpg

None show on the slave, because replication is down.

Start radosgw-agent --max-entries=2 (1 doesn't seem to replicate 
anything)

Check contents of slave after pass #1:
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg


Check contents of slave after pass #10:
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg


Leave replication running
Upload 1 object, test6.jpg, to the master.  Check the master:
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test1.jpg
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test2.jpg
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test3.jpg
2014-02-07 02:03   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test4.jpg
2014-02-07 02:03   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test5.jpg
2014-02-07 02:06   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test6.jpg


Check contents of slave after next pass:
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test1.jpg


Upload another file, test7.jpg, to the master:
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test1.jpg
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test2.jpg
2014-02-07 02:02   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test3.jpg
2014-02-07 02:03   10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1

Re: [ceph-users] why objects are still in .rgw.buckets after deleted

2014-03-18 Thread Craig Lewis
I recall hearing that RGW GC waits 2 hours before garbage collecting 
deleted chunks.


Take a look at https://ceph.com/docs/master/radosgw/config-ref/, the rgw 
gc * settings.  rgw gc obj min wait is 2 hours.
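
For reference, those settings live under the gateway's section in ceph.conf; a hedged example only, where the section name and values are illustrative and 7200 seconds is the documented default for the min wait:

[client.radosgw.gateway]            # illustrative section name
rgw gc obj min wait = 7200          # seconds a deleted object waits before GC (default = 2 hours)
rgw gc processor period = 3600      # how often a GC cycle starts
rgw gc processor max time = 3600    # max runtime of a single GC cycle

You can also peek at the pending queue with radosgw-admin gc list --include-all, or force a pass with radosgw-admin gc process (which still respects the min wait).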






*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*



On 3/16/14 23:20, Li JiaMin wrote:


Hi all,

I have a question about the pool .rgw.buckets. When I upload a file (which
has been striped because it is bigger than 4MB) through the Swift API, it is
stored in .rgw.buckets.

If I upload it again, why are the objects in .rgw.buckets not overwritten?
The file is stored again under different object names, and when I delete the
file, none of its objects in .rgw.buckets are deleted, even though I run
radosgw-admin gc process.

I also want to know more about the pools created for the object gateway: why
are they created and what role does each one play? If anyone knows about
these, please give me some guidance, thanks.

Thanks & Regards

Li JiaMin

System Cloud Platform

3#4F108



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph MDS replaying journal

2014-03-18 Thread Luke Jing Yuan
Hi John and all,

On the matter of the stuck --reset-journal, we enabled debug logging and saw the
following:

2014-03-19 10:02:14.205646 7fd545180780  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 24197
2014-03-19 10:02:14.207653 7fd545180780  1 -- 10.4.118.21:0/0 learned my addr 
10.4.118.21:0/0
2014-03-19 10:02:14.207669 7fd545180780  1 accepter.accepter.bind my_inst.addr 
is 10.4.118.21:6800/24197 need_addr=0
2014-03-19 10:02:14.207715 7fd545180780  0 resetting journal
2014-03-19 10:02:14.207774 7fd545180780  1 accepter.accepter.start
2014-03-19 10:02:14.207800 7fd545180780  1 -- 10.4.118.21:6800/24197 
messenger.start
2014-03-19 10:02:14.208167 7fd545180780  1 -- 10.4.118.21:6800/24197 --> 
10.4.118.22:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x30e6000 con 
0x3102420
2014-03-19 10:02:14.209953 7fd540a06700  1 -- 10.4.118.21:6800/24197 <== mon.1 
10.4.118.22:6789/0 1  mon_map v1  485+0+0 (3841125561 0 0) 0x311c400 
con 0x3102420
2014-03-19 10:02:14.210115 7fd540a06700  1 -- 10.4.118.21:6800/24197 <== mon.1 
10.4.118.22:6789/0 2  auth_reply(proto 2 0 Success) v1  33+0+0 
(3223086224 0 0) 0x311c200 con 0x3102420
2014-03-19 10:02:14.210380 7fd540a06700  1 -- 10.4.118.21:6800/24197 --> 
10.4.118.22:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x30e6900 con 
0x3102420
2014-03-19 10:02:14.211411 7fd540a06700  1 -- 10.4.118.21:6800/24197 <== mon.1 
10.4.118.22:6789/0 3  auth_reply(proto 2 0 Success) v1  206+0+0 
(3976044864 0 0) 0x311c600 con 0x3102420
2014-03-19 10:02:14.211611 7fd540a06700  1 -- 10.4.118.21:6800/24197 --> 
10.4.118.22:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x30e6d80 con 
0x3102420
2014-03-19 10:02:14.212814 7fd540a06700  1 -- 10.4.118.21:6800/24197 <== mon.1 
10.4.118.22:6789/0 4  auth_reply(proto 2 0 Success) v1  580+0+0 
(3211663831 0 0) 0x311c400 con 0x3102420
2014-03-19 10:02:14.212996 7fd540a06700  1 -- 10.4.118.21:6800/24197 --> 
10.4.118.22:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x30f01c0 con 
0x3102420
2014-03-19 10:02:14.213084 7fd540a06700  1 -- 10.4.118.21:6800/24197 --> 
10.4.118.22:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x30e6b40 con 
0x3102420
2014-03-19 10:02:14.213276 7fd545180780  1 -- 10.4.118.21:6800/24197 --> 
10.4.118.22:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x30f08c0 
con 0x3102420
2014-03-19 10:02:14.213760 7fd540a06700  1 -- 10.4.118.21:6800/24197 <== mon.1 
10.4.118.22:6789/0 5  mon_map v1  485+0+0 (3841125561 0 0) 0x311c800 
con 0x3102420
2014-03-19 10:02:14.213856 7fd540a06700  1 -- 10.4.118.21:6800/24197 <== mon.1 
10.4.118.22:6789/0 6  mon_subscribe_ack(300s) v1  20+0+0 (2112033603 0 
0) 0x30f01c0 con 0x3102420
2014-03-19 10:02:14.213930 7fd540a06700  1 -- 10.4.118.21:6800/24197 <== mon.1 
10.4.118.22:6789/0 7  auth_reply(proto 2 0 Success) v1  194+0+0 
(105581 0 0) 0x311c400 con 0x3102420
2014-03-19 10:02:14.215100 7fd540a06700  1 -- 10.4.118.21:6800/24197 <== mon.1 
10.4.118.22:6789/0 8  osd_map(28871..28871 src has 28357..28871) v3  
16928+0+0 (1452464160 0 0) 0x312 con 0x3102420
2014-03-19 10:04:47.211324 7fd53f203700  1 -- 10.4.118.21:6800/24197 --> 
10.4.118.22:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x30f0c40 
con 0x3102420
2014-03-19 10:04:57.211549 7fd53f203700  1 -- 10.4.118.21:6800/24197 --> 
10.4.118.22:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x30f0e00 
con 0x3102420
2014-03-19 10:05:07.211796 7fd53f203700  1 -- 10.4.118.21:6800/24197 --> 
10.4.118.22:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x313c8c0 
con 0x3102420
2014-03-19 10:05:17.212008 7fd53f203700  1 -- 10.4.118.21:6800/24197 --> 
10.4.118.22:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x313c700 
con 0x3102420
2014-03-19 10:05:27.212227 7fd53f203700  1 -- 10.4.118.21:6800/24197 --> 
10.4.118.22:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x313c540 
con 0x3102420


The process appeared to be stuck at the osd_map line for about 2 minutes before 
the mon_subscribe lines came out. Would someone be kind enough to assist? 
Thanks in advance.

Regards,
Luke

-Original Message-
From: John Spray [mailto:john.sp...@inktank.com]
Sent: Wednesday, 19 March, 2014 1:33 AM
To: Luke Jing Yuan; ceph-users@lists.ceph.com
Cc: Wong Ming Tat; Mohd Bazli Ab Karim
Subject: Re: [ceph-users] Ceph MDS replaying journal

Hi Luke,

(copying list back in)

You should stop all MDS services before attempting to use --reset-journal (but 
make sure mons and OSDs are running).  The status of the mds map shouldn't make 
a difference.

John
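
(Put concretely, the ordering above looks roughly like this. A hedged sysvinit-era sketch only: adjust the service commands for your init system, and keep whatever exact --reset-journal invocation you were already using.)

# on every node that runs an MDS (mons and OSDs stay up)
service ceph stop mds

# confirm nothing is still up:replay / active
ceph mds stat

# then, from one node, re-run the reset with the same invocation as before
ceph-mds -i <id> --reset-journal <rank>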

On Tue, Mar 18, 2014 at 5:23 PM, Luke Jing Yuan  wrote:
> Hi John,
>
> I noticed that while we were running with the --reset-journal option, ceph.log 
> kept showing lines like the following:
>
> 2014-03-19 01:17:09.977892 mon.0 10.4.118.21:6789/0 192 : [INF] mdsmap
> e29851: 1/1/1 up {0=mon01=up:replay(laggy or crashed)}
>
> And t