[ceph-users] Question about rgw region sync

2015-05-08 Thread TERRY
I built two Ceph clusters.
For the first cluster, I did the following steps:
1. create pools
sudo ceph osd pool create .us-east.rgw.root 64  64
sudo ceph osd pool create .us-east.rgw.control 64 64
sudo ceph osd pool create .us-east.rgw.gc 64 64
sudo ceph osd pool create .us-east.rgw.buckets 64 64
sudo ceph osd pool create .us-east.rgw.buckets.index 64 64
sudo ceph osd pool create .us-east.rgw.buckets.extra 64 64
sudo ceph osd pool create .us-east.log 64 64
sudo ceph osd pool create .us-east.intent-log 64 64
sudo ceph osd pool create .us-east.usage 64 64
sudo ceph osd pool create .us-east.users 64 64
sudo ceph osd pool create .us-east.users.email 64 64
sudo ceph osd pool create .us-east.users.swift 64 64
sudo ceph osd pool create .us-east.users.uid 64 64
  
 2. create a keyring
sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring
sudo chmod +r /etc/ceph/ceph.client.radosgw.keyring
sudo ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n 
client.radosgw.us-east-1 --gen-key
sudo ceph-authtool -n client.radosgw.us-east-1 --cap osd 'allow rwx' --cap mon 
'allow rwx' /etc/ceph
sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add 
client.radosgw.us-east-1 -i /etc/ceph/ceph.client.radosgw.keyring
 
3. create a region
sudo radosgw-admin region set --infile us.json --name client.radosgw.us-east-1
sudo radosgw-admin region default --rgw-region=us --name 
client.radosgw.us-east-1
sudo radosgw-admin regionmap update --name client.radosgw.us-east-1
   the content of us.json:
cat us.json 
{ "name": "us",
  "api_name": "us",
  "is_master": "true",
  "endpoints": [
"http:\/\/WH-CEPH-TEST01.MATRIX.CTRIPCORP.COM:80\/", 
"http:\/\/WH-CEPH-TEST02.MATRIX.CTRIPCORP.COM:80\/"],
  "master_zone": "us-east",
  "zones": [
{ "name": "us-east",
  "endpoints": [
"http:\/\/WH-CEPH-TEST01.MATRIX.CTRIPCORP.COM:80\/"],
  "log_meta": "true",
  "log_data": "true"},
{ "name": "us-west",
  "endpoints": [
"http:\/\/WH-CEPH-TEST02.MATRIX.CTRIPCORP.COM:80\/"],
  "log_meta": "true",
  "log_data": "true"}],
  "placement_targets": [
   {
 "name": "default-placement",
 "tags": []
   }
  ],
  "default_placement": "default-placement"}
 4. create zones
sudo radosgw-admin zone set --rgw-zone=us-east --infile us-east-secert.json 
--name client.radosgw.us-east-1
sudo radosgw-admin regionmap update --name client.radosgw.us-east-1
cat us-east-secert.json 
{ "domain_root": ".us-east.domain.rgw",
  "control_pool": ".us-east.rgw.control",
  "gc_pool": ".us-east.rgw.gc",
  "log_pool": ".us-east.log",
  "intent_log_pool": ".us-east.intent-log",
  "usage_log_pool": ".us-east.usage",
  "user_keys_pool": ".us-east.users",
  "user_email_pool": ".us-east.users.email",
  "user_swift_pool": ".us-east.users.swift",
  "user_uid_pool": ".us-east.users.uid",
  "system_key": { "access_key": "XNK0ST8WXTMWZGN29NF9", "secret_key": 
"7VJm8uAp71xKQZkjoPZmHu4sACA1SY8jTjay9dP5"},
  "placement_pools": [
{ "key": "default-placement",
  "val": { "index_pool": ".us-east.rgw.buckets.index",
   "data_pool": ".us-east.rgw.buckets"}
}
  ]
}
 
#5 create zone users (system users)
sudo radosgw-admin user create --uid="us-east" --display-name="Region-US 
Zone-East" --name client.radosgw.us-east-1 --access_key="XNK0ST8WXTMWZGN29NF9" 
--secret="7VJm8uAp71xKQZkjoPZmHu4sACA1SY8jTjay9dP5" --system
 sudo radosgw-admin user create --uid="us-west" --display-name="Region-US 
Zone-West" --name client.radosgw.us-east-1 --access_key="AAK0ST8WXTMWZGN29NF9" 
--secret="AAJm8uAp71xKQZkjoPZmHu4sACA1SY8jTjay9dP5" --system
 #6 create zone users (non-system users)
sudo radosgw-admin user create --uid="us-test-east" --display-name="Region-US 
Zone-East-test" --name client.radosgw.us-east-1 
--access_key="DDK0ST8WXTMWZGN29NF9" 
--secret="DDJm8uAp71xKQZkjoPZmHu4sACA1SY8jTjay9dP5" 
 #7 create a subuser
sudo radosgw-admin subuser create --uid="us-test-east"  
--subuser="us-test-east:swift" --access=full --name client.radosgw.us-east-1 
--key-type swift --secret="ffJm8uAp71xKQZkjoPZmHu4sACA1SY8jTjay9dP5"
 sudo /etc/init.d/ceph -a restart
sudo /etc/init.d/httpd re
sudo /etc/init.d/ceph-radosgw restart
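For reference, the gateway instance itself also needs a matching section in
ceph.conf on the us-east host. A minimal sketch, assuming an Apache/FastCGI
front end and the hostnames from the region map above; the socket path, log
file, and root-pool settings are illustrative and must match the actual setup:

[client.radosgw.us-east-1]
host = WH-CEPH-TEST01
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw region = us
rgw zone = us-east
rgw zone root pool = .us-east.rgw.root
rgw dns name = WH-CEPH-TEST01.MATRIX.CTRIPCORP.COM
rgw socket path = /var/run/ceph/ceph.radosgw.us-east-1.fastcgi.sock
log file = /var/log/ceph/client.radosgw.us-east-1.log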
 For the second cluster, I did the following steps:
1. create pools
sudo ceph osd pool create .us-west.rgw.root 64  64
sudo ceph osd pool create .us-west.rgw.control 64 64
sudo ceph osd pool create .us-west.rgw.gc 64 64
sudo ceph osd pool create .us-west.rgw.buckets 64 64
sudo ceph osd pool create .us-west.rgw.buckets.index 64 64
sudo ceph osd pool create .us-west.rgw.buckets.extra 64 64
sudo ceph osd pool create .us-west.log 64 64
sudo ceph osd pool create .us-west.intent-log 64 64
sudo ceph osd pool create .us-west.usage 64 64
sudo ceph osd pool create .us-west.users 64 64
sudo ceph osd pool create .us-west.users.email 64 64
sudo ceph osd pool create .us-west.users.swift 64 64
sudo ceph osd pool create .us-west.users.uid 64 64
 2. create a keyring
sudo ceph-authtool --create-k
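
Once both zones' gateways are running, metadata and data sync between us-east
and us-west is driven by radosgw-agent. A rough sketch of its configuration,
reusing the system-user keys created above (the file name is arbitrary, and the
exact option keys should be checked against radosgw-agent --help for your
version):

cat region-data-sync.conf
src_zone: us-east
source: http://WH-CEPH-TEST01.MATRIX.CTRIPCORP.COM:80
src_access_key: XNK0ST8WXTMWZGN29NF9
src_secret_key: 7VJm8uAp71xKQZkjoPZmHu4sACA1SY8jTjay9dP5
dest_zone: us-west
destination: http://WH-CEPH-TEST02.MATRIX.CTRIPCORP.COM:80
dest_access_key: AAK0ST8WXTMWZGN29NF9
dest_secret_key: AAJm8uAp71xKQZkjoPZmHu4sACA1SY8jTjay9dP5
log_file: /var/log/radosgw/radosgw-sync-us-east-west.log

radosgw-agent -c region-data-sync.conf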

Re: [ceph-users] [cephfs][ceph-fuse] cache size or memory leak?

2015-05-08 Thread Yan, Zheng
On Fri, May 8, 2015 at 11:15 AM, Dexter Xiong  wrote:
> I tried "echo 3 > /proc/sys/vm/drop_caches" and dentry_pinned_count dropped.
>
> Thanks for your help.
>

Could you please try the attached patch?


patch
Description: Binary data
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: rbd unmap command hangs when there is no network connection with mons and osds

2015-05-08 Thread Vandeir Eduardo
This causes an annoying problem with rbd resource agent in pacemaker. In a
situation where pacemaker needs to stop a rbd resource agent on a node
where there is no network connection, the rbd unmap command hangs. This
causes the resource agent stop command to timeout and the node is fenced.

On Thu, May 7, 2015 at 4:37 PM, Ilya Dryomov  wrote:

> On Thu, May 7, 2015 at 10:20 PM, Vandeir Eduardo
>  wrote:
> > Hi,
> >
> > when issuing rbd unmap command when there is no network connection with
> mons
> > and osds, the command hangs. Isn't there a option to force unmap even on
> > this situation?
>
> No, but you can Ctrl-C the unmap command and that should do it.  In the
> dmesg you'll see something like
>
>   rbd: unable to tear down watch request
>
> and you may have to wait for the cluster to timeout the watch.
>
> Thanks,
>
> Ilya
>
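
Until an unmap timeout or --force exists, a minimal workaround sketch for the
resource-agent case, assuming coreutils timeout is available (the device path
and the 30s bound are illustrative), is to deliver the same SIGINT a Ctrl-C
would, after a deadline:

# give the clean unmap 30s, then send SIGINT (equivalent to the Ctrl-C above)
timeout -s INT 30 rbd unmap /dev/rbd0

As noted above, dmesg may then show "rbd: unable to tear down watch request"
and the cluster still has to time out the watch.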
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd unmap command hangs when there is no network connection with mons and osds

2015-05-08 Thread Ilya Dryomov
On Fri, May 8, 2015 at 1:18 PM, Vandeir Eduardo
 wrote:
> This causes an annoying problem with rbd resource agent in pacemaker. In a
> situation where pacemaker needs to stop a rbd resource agent on a node where
> there is no network connection, the rbd unmap command hangs. This causes the
> resource agent stop command to timeout and the node is fenced.
>
> On Thu, May 7, 2015 at 4:37 PM, Ilya Dryomov  wrote:
>>
>> On Thu, May 7, 2015 at 10:20 PM, Vandeir Eduardo
>>  wrote:
>> > Hi,
>> >
>> > when issuing rbd unmap command when there is no network connection with
>> > mons
>> > and osds, the command hangs. Isn't there a option to force unmap even on
>> > this situation?
>>
>> No, but you can Ctrl-C the unmap command and that should do it.  In the
>> dmesg you'll see something like
>>
>>   rbd: unable to tear down watch request
>>
>> and you may have to wait for the cluster to timeout the watch.

We can probably add a --force to rbd unmap.  That would require extending our
sysfs interface but I don't see any obstacles.  Sage?

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd unmap command hangs when there is no network connection with mons and osds

2015-05-08 Thread Ilya Dryomov
On Fri, May 8, 2015 at 3:59 PM, Ilya Dryomov  wrote:
> On Fri, May 8, 2015 at 1:18 PM, Vandeir Eduardo
>  wrote:
>> This causes an annoying problem with rbd resource agent in pacemaker. In a
>> situation where pacemaker needs to stop a rbd resource agent on a node where
>> there is no network connection, the rbd unmap command hangs. This causes the
>> resource agent stop command to timeout and the node is fenced.
>>
>> On Thu, May 7, 2015 at 4:37 PM, Ilya Dryomov  wrote:
>>>
>>> On Thu, May 7, 2015 at 10:20 PM, Vandeir Eduardo
>>>  wrote:
>>> > Hi,
>>> >
>>> > when issuing rbd unmap command when there is no network connection with
>>> > mons
>>> > and osds, the command hangs. Isn't there a option to force unmap even on
>>> > this situation?
>>>
>>> No, but you can Ctrl-C the unmap command and that should do it.  In the
>>> dmesg you'll see something like
>>>
>>>   rbd: unable to tear down watch request
>>>
>>> and you may have to wait for the cluster to timeout the watch.
>
> We can probably add a --force to rbd unmap.  That would require extending our
> sysfs interface but I don't see any obstacles.  Sage?

On a second thought, we can timeout our wait for a reply to a watch
teardown request with a configurable timeout (mount_timeout).  We might
still need --force for more in the future, but for this particular
problem the timeout is a better solution I think.  I'll take care of
it.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd unmap command hangs when there is no network connection with mons and osds

2015-05-08 Thread Vandeir Eduardo
Wouldn't be better a configuration named (map|unmap)_timeout? Cause we are
talking about a map/unmap of a RBD device, not a mount/unmount of a file
system.

On Fri, May 8, 2015 at 10:04 AM, Ilya Dryomov  wrote:

> On Fri, May 8, 2015 at 3:59 PM, Ilya Dryomov  wrote:
> > On Fri, May 8, 2015 at 1:18 PM, Vandeir Eduardo
> >  wrote:
> >> This causes an annoying problem with rbd resource agent in pacemaker.
> In a
> >> situation where pacemaker needs to stop a rbd resource agent on a node
> where
> >> there is no network connection, the rbd unmap command hangs. This
> causes the
> >> resource agent stop command to timeout and the node is fenced.
> >>
> >> On Thu, May 7, 2015 at 4:37 PM, Ilya Dryomov 
> wrote:
> >>>
> >>> On Thu, May 7, 2015 at 10:20 PM, Vandeir Eduardo
> >>>  wrote:
> >>> > Hi,
> >>> >
> >>> > when issuing rbd unmap command when there is no network connection
> with
> >>> > mons
> >>> > and osds, the command hangs. Isn't there a option to force unmap
> even on
> >>> > this situation?
> >>>
> >>> No, but you can Ctrl-C the unmap command and that should do it.  In the
> >>> dmesg you'll see something like
> >>>
> >>>   rbd: unable to tear down watch request
> >>>
> >>> and you may have to wait for the cluster to timeout the watch.
> >
> > We can probably add a --force to rbd unmap.  That would require
> extending our
> > sysfs interface but I don't see any obstacles.  Sage?
>
> On a second thought, we can timeout our wait for a reply to a watch
> teardown request with a configurable timeout (mount_timeout).  We might
> still need --force for more in the future, but for this particular
> problem the timeout is a better solution I think.  I'll take care of
> it.
>
> Thanks,
>
> Ilya
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd unmap command hangs when there is no network connection with mons and osds

2015-05-08 Thread Ilya Dryomov
On Fri, May 8, 2015 at 4:13 PM, Vandeir Eduardo
 wrote:
> Wouldn't be better a configuration named (map|unmap)_timeout? Cause we are
> talking about a map/unmap of a RBD device, not a mount/unmount of a file
> system.

The mount_timeout option is already there and is used in cephfs.  We
could certainly add an unmap-specific option, I'm just not convinced it's
worth a separate option.  Some people still use those two terms
interchangeably, and the meaning of mount_timeout when applied to rbd
map/unmap would be sufficiently clear.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd unmap command hangs when there is no network connection with mons and osds

2015-05-08 Thread Ilya Dryomov
On Fri, May 8, 2015 at 4:25 PM, Ilya Dryomov  wrote:
> On Fri, May 8, 2015 at 4:13 PM, Vandeir Eduardo
>  wrote:
>> Wouldn't be better a configuration named (map|unmap)_timeout? Cause we are
>> talking about a map/unmap of a RBD device, not a mount/unmount of a file
>> system.
>
> The mount_timeout option is already there and is used in cephfs.  We
> could certainly add an unmap-specific option, I'm just not convinced it's
> worth a separate option.  Some people still use those two terms
> interchangeably, and the meaning of mount_timeout when applied to rbd
> map/unmap would be sufficiently clear.

I'm not married to it though, more opinions are welcome!

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] export-diff exported only 4kb instead of 200-600gb

2015-05-08 Thread Jason Dillaman
There is probably something that I am not understanding, but to me it sounds 
like you are saying that there are no/minimal deltas between snapshots 
move2db24-20150428 and 2015-05-05 (both from the export-diff and from your 
clone).  Are you certain that you made 700-800GBs of changes between the two 
snapshots and no trim operations released your changes back?  If you diff from 
move2db24-20150428 to HEAD, do you see all your changes?
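
For what it's worth, a quick way to total the bytes that rbd reports as changed
between two snapshots is to sum the extent lengths from rbd diff (a sketch
using the names from the quoted message below; the awk one-liner is just
illustrative):

rbd diff --cluster cluster1 --pool RBD-01 --image CEPH_006__01__NA__0003__ESX__ALL_EXT \
    --from-snap move2db24-20150428 --snap 2015-05-05 \
    | awk '{ sum += $2 } END { print sum, "bytes changed" }'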

-- 

Jason Dillaman 
Red Hat 
dilla...@redhat.com 
http://www.redhat.com 


- Original Message - 
From: "Ultral"  
To: "ceph-users"  
Sent: Thursday, May 7, 2015 11:45:46 AM 
Subject: [ceph-users] export-diff exported only 4kb instead of 200-600gb 

Hi all, 


Something strange occurred. 
I am running Ceph 0.87 with a 2048 GB format 1 image. I decided to make 
incremental backups between the clusters. 

I made the initial copy: 

time bbcp -x 7M -P 3600 -w 32M -s 6 -Z 5030:5035 -N io "rbd export-diff 
--cluster cluster1 --pool RBD-01 --image 
CEPH_006__01__NA__0003__ESX__ALL_EXT --snap move2db24-20150428 -" 
1.1.1.1:"rbd import-diff - --cluster cluster2 --pool TST-INT-SD-RBD-1DC --image temp" 

Then I decided to transfer the incremental (it should be about 200-600 GB of changes): 

time bbcp -c -x 7M -P 3600 -w 32M -s 6 -Z 5030:5035 -N io "rbd --cluster 
cluster1 --pool RBD-01 --image CEPH_006__01__NA__0003__ESX__ALL_EXT 
--from-snap move2db24-20150428 --snap 2015-05-05 -" 
1.1.1.1:"rbd import-diff - --cluster cluster2 --pool TST-INT-SD-RBD-1DC --image temp " 

It took about 30 minutes (which was too fast, since I have a 7M limit between the 
clusters), so I decided to check how much data had been transferred: 
time rbd export-diff --cluster cluster1 --pool RBD-01 --image 
CEPH_006__01__NA__0003__ESX__ALL_EXT --from-snap move2db24-20150428 --snap 
2015-05-05 -|wc -c 
4753
Exporting image: 100% complete...done. 
I double-checked it: it was really 4753 bytes. So I decided to inspect the 
export-diff file: 

000: 7262 6420 6469  2076 310a 6612   rbd diff v1.f...
010: 006d 6f76 6532 6462 3234 2d32 3031 3530  .move2db24-20150
020: 3432 3874 0a00  3230 3135 2d30 352d  428t2015-05-
030: 3035 7300   0200 0077 0080 5501  05sw..U.
040:   0002    02ef cdab  
050: 0080 3500   0d00     ..5.
060: 2d58 3aff 5002  bc56 2255 08fc 14a9  -X:.PV"U
070: e6c0 e839 351a 942c 01de 4603 0e00   ...95..,..F.
080: 3a00         :...
090:          
0a0:          

..
0001270:          
0001280:          
0001290: 65   e 
It looks like the correct format. 

I made a clone (like a flex clone) from snapshot 2015-05-05 and found that it 
does not contain the changes from snapshot move2db24-20150428. 

Do you have any ideas about what I should check? Why did this happen? 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd does not start when object store is set to "newstore"

2015-05-08 Thread Srikanth Madugundi
I tried adding "enable experimental unrecoverable data corrupting features
= newstore rocksdb" but no luck.

Here is the config I am using.

[global]
.
.
.

osd objectstore = newstore

newstore backend = rocksdb

enable experimental unrecoverable data corrupting features = newstore
rocksdb


Regards

-Srikanth

On Thu, May 7, 2015 at 10:59 PM, Somnath Roy 
wrote:

>  I think you need to add the following..
>
>
>
> enable experimental unrecoverable data corrupting features = newstore
> rocksdb
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Srikanth Madugundi
> *Sent:* Thursday, May 07, 2015 10:56 PM
> *To:* ceph-us...@ceph.com
> *Subject:* [ceph-users] osd does not start when object store is set to
> "newstore"
>
>
>
> Hi,
>
>
>
> I built and installed ceph source from (wip-newstore) branch and could not
> start osd with "newstore" as osd objectstore.
>
>
>
> $ sudo /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c
> /etc/ceph/ceph.conf --cluster ceph -f
>
> 2015-05-08 05:49:16.130073 7f286be01880 -1 unable to create object store
>
> $
>
>
>
>  ceph.config ( I have the following settings in ceph.conf)
>
>
>
> [global]
>
> osd objectstore = newstore
>
> newstore backend = rocksdb
>
>
>
> enable experimental unrecoverable data corrupting features = newstore
>
>
>
> The logs does not show much details.
>
>
>
> $ tail -f /var/log/ceph/ceph-osd.0.log
>
> 2015-05-08 00:01:54.331136 7fb00e07c880  0 ceph version  (), process
> ceph-osd, pid 23514
>
> 2015-05-08 00:01:54.331202 7fb00e07c880 -1 unable to create object store
>
>
>
> Am I missing something?
>
>
>
> Regards
>
> Srikanth
>
> --
>
> PLEASE NOTE: The information contained in this electronic mail message is
> intended only for the use of the designated recipient(s) named above. If
> the reader of this message is not the intended recipient, you are hereby
> notified that you have received this message in error and that any review,
> dissemination, distribution, or copying of this message is strictly
> prohibited. If you have received this communication in error, please notify
> the sender by telephone or e-mail (as shown above) immediately and destroy
> any and all copies of this message in your possession (whether hard copies
> or electronically stored copies).
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd does not start when object store is set to "newstore"

2015-05-08 Thread Somnath Roy
I think you need to build code with rocksdb enabled if you are not already 
doing this.

Go to root folder and try this..

./do_autogen.sh -r

Thanks & Regards
Somnath
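
A fuller build sequence along those lines might look like the following sketch
(only the -r flag comes from this thread; the branch name is the one mentioned
below, and the remaining steps are generic autotools steps):

git checkout wip-newstore
./do_autogen.sh -r        # configure with rocksdb support
make -j"$(nproc)"
sudo make install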

From: Srikanth Madugundi [mailto:srikanth.madugu...@gmail.com]
Sent: Friday, May 08, 2015 10:33 AM
To: Somnath Roy
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osd does not start when object store is set to 
"newstore"

I tried adding "enable experimental unrecoverable data corrupting features = 
newstore rocksdb" but no luck.

Here is the config I am using.

[global]
.
.
.
osd objectstore = newstore
newstore backend = rocksdb
enable experimental unrecoverable data corrupting features = newstore rocksdb

Regards
-Srikanth

On Thu, May 7, 2015 at 10:59 PM, Somnath Roy 
mailto:somnath@sandisk.com>> wrote:
I think you need to add the following..

enable experimental unrecoverable data corrupting features = newstore rocksdb

Thanks & Regards
Somnath


From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com]
 On Behalf Of Srikanth Madugundi
Sent: Thursday, May 07, 2015 10:56 PM
To: ceph-us...@ceph.com
Subject: [ceph-users] osd does not start when object store is set to "newstore"

Hi,

I built and installed ceph source from (wip-newstore) branch and could not 
start osd with "newstore" as osd objectstore.

$ sudo /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c 
/etc/ceph/ceph.conf --cluster ceph -f
2015-05-08 05:49:16.130073 7f286be01880 -1 unable to create object store
$

 ceph.config ( I have the following settings in ceph.conf)

[global]
osd objectstore = newstore
newstore backend = rocksdb

enable experimental unrecoverable data corrupting features = newstore

The logs does not show much details.

$ tail -f /var/log/ceph/ceph-osd.0.log
2015-05-08 00:01:54.331136 7fb00e07c880  0 ceph version  (), process ceph-osd, 
pid 23514
2015-05-08 00:01:54.331202 7fb00e07c880 -1 unable to create object store

Am I missing something?

Regards
Srikanth



PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "too many PGs per OSD" in Hammer

2015-05-08 Thread Chris Armstrong
We actually have 3 OSDs by default, but some users run 5. Typically we're
not looking at more than that. Should we try 64? I suppose I still don't
understand the tradeoffs here: using fewer PGs definitely makes the platform
start faster (and speeds up replication when adding new hosts), but I'm not
sure what having more PGs buys us.

We use the default pools plus radosgw, so we have 12 pools in total.

12 pools, 3 OSDs, with a size set to 3 (so each OSD has all PGs).
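
For a rough sanity check against the warning logic quoted further down
(sum_pg_up / num_in compared with mon_pg_warn_max_per_osd, default 300), and
assuming every pool is replicated with size 3 across the 3 OSDs:

12 pools x 128 PGs x 3 replicas / 3 OSDs = 1536 PGs per OSD  (warns)
12 pools x  64 PGs x 3 replicas / 3 OSDs =  768 PGs per OSD  (still warns)
12 pools x  16 PGs x 3 replicas / 3 OSDs =  192 PGs per OSD  (no warning)

which matches the warning disappearing at 16 PGs per pool.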

On Thu, May 7, 2015 at 11:49 PM, Somnath Roy 
wrote:

>  Sorry, I didn’t read through all..It seems you have 6 OSDs, so, I would
> say 128 PGs per pool is not bad !
>
> But, if you keep on adding pools, you need to lower this number, generally
> ~64 PGs per pool should achieve good parallelism with lower number of
> OSDs..If you grow your cluster , create pools with more PGs..
>
> Again, the warning number is a ballpark number, if you have more powerful
> compute and fast disk , you can safely ignore this warning.
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
> *From:* Somnath Roy
> *Sent:* Thursday, May 07, 2015 11:44 PM
> *To:* 'Chris Armstrong'
> *Cc:* Stuart Longland; ceph-users@lists.ceph.com
> *Subject:* RE: [ceph-users] "too many PGs per OSD" in Hammer
>
>
>
> Nope, 16 seems way too less for performance.
>
> How many OSDs you have ? And how many pools are you planning to create ?
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
> *From:* Chris Armstrong [mailto:carmstr...@engineyard.com
> ]
> *Sent:* Thursday, May 07, 2015 11:34 PM
> *To:* Somnath Roy
> *Cc:* Stuart Longland; ceph-users@lists.ceph.com
>
> *Subject:* Re: [ceph-users] "too many PGs per OSD" in Hammer
>
>
>
> Thanks for the details, Somnath.
>
>
>
> So it definitely sounds like 128 pgs per pool is way too many? I lowered
> ours to 16 on a new deploy and the warning is gone. I'm not sure if this
> number is sufficient, though...
>
>
>
> On Wed, May 6, 2015 at 4:10 PM, Somnath Roy 
> wrote:
>
> Just checking, are you aware of this ?
>
> http://ceph.com/pgcalc/
>
> FYI, the warning is given based on the following logic.
>
> int per = sum_pg_up / num_in;
> if (per > g_conf->mon_pg_warn_max_per_osd) {
> //raise warning..
>}
>
> This is not considering any resources..It is solely depends on number of
> in OSDs and total number of PGs in the cluster. Default
> mon_pg_warn_max_per_osd = 300, so, in your cluster per OSD is serving > 300
> PGs it seems.
> It will be good if you assign PGs in your pool keeping the above
> calculation in mind i.e no more than 300 PGs/ OSD..
> But, if you feel you OSD is in fast disk and box has lot of compute power,
> you may want to try out with more number of PGs/OSD. In this case, raise
> the mon_pg_warn_max_per_osd to something big and warning should go away.
>
> Hope this helps,
>
> Thanks & Regards
> Somnath
>
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Stuart Longland
> Sent: Wednesday, May 06, 2015 3:48 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] "too many PGs per OSD" in Hammer
>
> On 07/05/15 07:53, Chris Armstrong wrote:
> > Thanks for the feedback. That language is confusing to me, then, since
> > the first paragraph seems to suggest using a pg_num of 128 in cases
> > where we have less than 5 OSDs, as we do here.
> >
> > The warning below that is: "As the number of OSDs increases, chosing
> > the right value for pg_num becomes more important because it has a
> > significant influence on the behavior of the cluster as well as the
> > durability of the data when something goes wrong (i.e. the probability
> > that a catastrophic event leads to data loss).", which suggests that
> > this could be an issue with more OSDs, which doesn't apply here.
> >
> > Do we know if this warning is calculated based on the resources of the
> > host? If I try with larger machines, will this warning change?
>
> I'd be interested in an answer here too.  I just did an update from Giant
> to Hammer and struck the same dreaded error message.
>
> When I initially deployed Ceph (with Emperor), I worked out according to
> the formula given on the site:
>
> > # We have: 3 OSD nodes with 2 OSDs each
> > # giving us 6 OSDs total.
> > # There are 3 replicas, so the recommended number of
> > # placement groups is:
> > #  6 * 100 / 3
> > # which gives: 200 placement groups.
> > # Rounding this up to the nearest power of two gives:
> > osd pool default pg num = 256
> > osd pool default pgp num = 256
>
> It seems this was a bad value to use.  I now have a problem of a biggish
> lump of data sitting in a pool with an inappropriate number of placement
> groups.  It seems I needed to divide this number by the number of pools.
>
> For now I've shut it up with the following:
>
> > [mon]
> > mon warn on legacy crush tunables = false
> > # New warning on move to Hammer
> > mon pg warn max per osd = 2048
>
> Question is, how does one go about fixing this? 

Re: [ceph-users] osd does not start when object store is set to "newstore"

2015-05-08 Thread Srikanth Madugundi
I tried with leveldb; the osd does not start either:

osd objectstore = newstore

newstore backend = leveldb

enable experimental unrecoverable data corrupting features = newstore
leveldb


-Srikanth


On Fri, May 8, 2015 at 10:41 AM, Somnath Roy 
wrote:

>  I think you need to build code with rocksdb enabled if you are not
> already doing this.
>
>
>
> Go to root folder and try this..
>
>
>
> ./do_autogen.sh -r
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
> *From:* Srikanth Madugundi [mailto:srikanth.madugu...@gmail.com]
> *Sent:* Friday, May 08, 2015 10:33 AM
> *To:* Somnath Roy
> *Cc:* ceph-us...@ceph.com
> *Subject:* Re: [ceph-users] osd does not start when object store is set
> to "newstore"
>
>
>
> I tried adding "enable experimental unrecoverable data corrupting features
> = newstore rocksdb" but no luck.
>
>
>
> Here is the config I am using.
>
>
>
> [global]
>
> .
>
> .
>
> .
>
> osd objectstore = newstore
>
> newstore backend = rocksdb
>
> enable experimental unrecoverable data corrupting features = newstore
> rocksdb
>
>
>
> Regards
>
> -Srikanth
>
>
>
> On Thu, May 7, 2015 at 10:59 PM, Somnath Roy 
> wrote:
>
> I think you need to add the following..
>
>
>
> enable experimental unrecoverable data corrupting features = newstore
> rocksdb
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Srikanth Madugundi
> *Sent:* Thursday, May 07, 2015 10:56 PM
> *To:* ceph-us...@ceph.com
> *Subject:* [ceph-users] osd does not start when object store is set to
> "newstore"
>
>
>
> Hi,
>
>
>
> I built and installed ceph source from (wip-newstore) branch and could not
> start osd with "newstore" as osd objectstore.
>
>
>
> $ sudo /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c
> /etc/ceph/ceph.conf --cluster ceph -f
>
> 2015-05-08 05:49:16.130073 7f286be01880 -1 unable to create object store
>
> $
>
>
>
>  ceph.config ( I have the following settings in ceph.conf)
>
>
>
> [global]
>
> osd objectstore = newstore
>
> newstore backend = rocksdb
>
>
>
> enable experimental unrecoverable data corrupting features = newstore
>
>
>
> The logs does not show much details.
>
>
>
> $ tail -f /var/log/ceph/ceph-osd.0.log
>
> 2015-05-08 00:01:54.331136 7fb00e07c880  0 ceph version  (), process
> ceph-osd, pid 23514
>
> 2015-05-08 00:01:54.331202 7fb00e07c880 -1 unable to create object store
>
>
>
> Am I missing something?
>
>
>
> Regards
>
> Srikanth
>
>
>  --
>
>
> PLEASE NOTE: The information contained in this electronic mail message is
> intended only for the use of the designated recipient(s) named above. If
> the reader of this message is not the intended recipient, you are hereby
> notified that you have received this message in error and that any review,
> dissemination, distribution, or copying of this message is strictly
> prohibited. If you have received this communication in error, please notify
> the sender by telephone or e-mail (as shown above) immediately and destroy
> any and all copies of this message in your possession (whether hard copies
> or electronically stored copies).
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd does not start when object store is set to "newstore"

2015-05-08 Thread Krishna Mohan
Re-initialize the osd with the following command:
sudo ceph-osd --id 0 --mkkey --mkfs
and then restart the osd again.

I hope it works.
-Krishna

> On May 8, 2015, at 11:56 AM, Srikanth Madugundi 
>  wrote:
> 
> I tried with leveldb, osd does not start either
> 
> osd objectstore = newstore
> 
> newstore backend = leveldb
> 
> enable experimental unrecoverable data corrupting features = newstore leveldb
> 
> 
> 
> -Srikanth
> 
> 
> 
> On Fri, May 8, 2015 at 10:41 AM, Somnath Roy  > wrote:
> I think you need to build code with rocksdb enabled if you are not already 
> doing this.
> 
>  
> 
> Go to root folder and try this..
> 
>  
> 
> ./do_autogen.sh -r
> 
>  
> 
> Thanks & Regards
> 
> Somnath
> 
>  
> 
> From: Srikanth Madugundi [mailto:srikanth.madugu...@gmail.com 
> ] 
> Sent: Friday, May 08, 2015 10:33 AM
> To: Somnath Roy
> Cc: ceph-us...@ceph.com 
> Subject: Re: [ceph-users] osd does not start when object store is set to 
> "newstore"
> 
>  
> 
> I tried adding "enable experimental unrecoverable data corrupting features = 
> newstore rocksdb" but no luck.
> 
>  
> 
> Here is the config I am using.
> 
>  
> 
> [global]
> 
> .
> 
> .
> 
> .
> 
> osd objectstore = newstore
> 
> newstore backend = rocksdb
> 
> enable experimental unrecoverable data corrupting features = newstore rocksdb
> 
>  
> 
> Regards
> 
> -Srikanth
> 
>  
> 
> On Thu, May 7, 2015 at 10:59 PM, Somnath Roy  > wrote:
> 
> I think you need to add the following..
> 
>  
> 
> enable experimental unrecoverable data corrupting features = newstore rocksdb
> 
>  
> 
> Thanks & Regards
> 
> Somnath
> 
>  
> 
>  
> 
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com 
> ] On Behalf Of Srikanth Madugundi
> Sent: Thursday, May 07, 2015 10:56 PM
> To: ceph-us...@ceph.com 
> Subject: [ceph-users] osd does not start when object store is set to 
> "newstore"
> 
>  
> 
> Hi,
> 
>  
> 
> I built and installed ceph source from (wip-newstore) branch and could not 
> start osd with "newstore" as osd objectstore. 
> 
>  
> 
> $ sudo /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c 
> /etc/ceph/ceph.conf --cluster ceph -f 
> 
> 2015-05-08 05:49:16.130073 7f286be01880 -1 unable to create object store
> 
> $
> 
>  
> 
>  ceph.config ( I have the following settings in ceph.conf)
> 
>  
> 
> [global]
> 
> osd objectstore = newstore
> 
> newstore backend = rocksdb
> 
>  
> 
> enable experimental unrecoverable data corrupting features = newstore
> 
>  
> 
> The logs does not show much details.
> 
>  
> 
> $ tail -f /var/log/ceph/ceph-osd.0.log 
> 
> 2015-05-08 00:01:54.331136 7fb00e07c880  0 ceph version  (), process 
> ceph-osd, pid 23514
> 
> 2015-05-08 00:01:54.331202 7fb00e07c880 -1 unable to create object store
> 
>  
> 
> Am I missing something?
> 
>  
> 
> Regards
> 
> Srikanth
> 
>  
> 
> 
> PLEASE NOTE: The information contained in this electronic mail message is 
> intended only for the use of the designated recipient(s) named above. If the 
> reader of this message is not the intended recipient, you are hereby notified 
> that you have received this message in error and that any review, 
> dissemination, distribution, or copying of this message is strictly 
> prohibited. If you have received this communication in error, please notify 
> the sender by telephone or e-mail (as shown above) immediately and destroy 
> any and all copies of this message in your possession (whether hard copies or 
> electronically stored copies).
> 
>  
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Missing /etc/init.d/ceph file

2015-05-08 Thread Srikanth Madugundi
Hi,

I am setting up ceph from git master branch (g...@github.com:ceph/ceph.git)
and followed the steps listed at
http://docs.ceph.com/docs/master/install/build-ceph/

The build was successful on my RHEL6 host and used "make install" to
install the packages as described here.

http://docs.ceph.com/docs/master/install/install-storage-cluster/#installing-a-build

I am trying to start the osd server, but the install did not create the
/etc/init.d/ceph file, so I could not run the "sudo /etc/init.d/ceph start
osd.0" command. Did I miss something?

Regards
-Srikanth
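
As a hedged workaround, assuming the build tree is still on the host: the
sysvinit script that the packages normally ship is generated in the source
tree (from src/init-ceph.in) but is not installed by "make install", so it can
be copied into place by hand:

# run from the top of the ceph source tree
sudo cp src/init-ceph /etc/init.d/ceph
sudo chmod +x /etc/init.d/ceph
sudo /etc/init.d/ceph start osd.0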
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Find out the location of OSD Journal

2015-05-08 Thread Patrik Plank
Hi,



I can't remember on which drive I installed which OSD journal. :-||

Is there any command to show this?




thanks

regards
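
One way to check, assuming the default data paths and a ceph-disk style
deployment where each journal is a symlink to its partition:

ls -l /var/lib/ceph/osd/ceph-*/journal
# or, for a single OSD, resolve the device it points at:
readlink -f /var/lib/ceph/osd/ceph-0/journal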

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "too many PGs per OSD" in Hammer

2015-05-08 Thread Daniel Hoffman
Is there a way to shrink/merge PGs in a pool without removing it?

I have a pool with some data in it, but the PGs were miscalculated, and I am
just wondering about the best way to resolve this.

On Fri, May 8, 2015 at 4:49 PM, Somnath Roy  wrote:

>  Sorry, I didn’t read through all..It seems you have 6 OSDs, so, I would
> say 128 PGs per pool is not bad !
>
> But, if you keep on adding pools, you need to lower this number, generally
> ~64 PGs per pool should achieve good parallelism with lower number of
> OSDs..If you grow your cluster , create pools with more PGs..
>
> Again, the warning number is a ballpark number, if you have more powerful
> compute and fast disk , you can safely ignore this warning.
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
> *From:* Somnath Roy
> *Sent:* Thursday, May 07, 2015 11:44 PM
> *To:* 'Chris Armstrong'
> *Cc:* Stuart Longland; ceph-users@lists.ceph.com
> *Subject:* RE: [ceph-users] "too many PGs per OSD" in Hammer
>
>
>
> Nope, 16 seems way too less for performance.
>
> How many OSDs you have ? And how many pools are you planning to create ?
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
> *From:* Chris Armstrong [mailto:carmstr...@engineyard.com
> ]
> *Sent:* Thursday, May 07, 2015 11:34 PM
> *To:* Somnath Roy
> *Cc:* Stuart Longland; ceph-users@lists.ceph.com
>
> *Subject:* Re: [ceph-users] "too many PGs per OSD" in Hammer
>
>
>
> Thanks for the details, Somnath.
>
>
>
> So it definitely sounds like 128 pgs per pool is way too many? I lowered
> ours to 16 on a new deploy and the warning is gone. I'm not sure if this
> number is sufficient, though...
>
>
>
> On Wed, May 6, 2015 at 4:10 PM, Somnath Roy 
> wrote:
>
> Just checking, are you aware of this ?
>
> http://ceph.com/pgcalc/
>
> FYI, the warning is given based on the following logic.
>
> int per = sum_pg_up / num_in;
> if (per > g_conf->mon_pg_warn_max_per_osd) {
> //raise warning..
>}
>
> This is not considering any resources..It is solely depends on number of
> in OSDs and total number of PGs in the cluster. Default
> mon_pg_warn_max_per_osd = 300, so, in your cluster per OSD is serving > 300
> PGs it seems.
> It will be good if you assign PGs in your pool keeping the above
> calculation in mind i.e no more than 300 PGs/ OSD..
> But, if you feel you OSD is in fast disk and box has lot of compute power,
> you may want to try out with more number of PGs/OSD. In this case, raise
> the mon_pg_warn_max_per_osd to something big and warning should go away.
>
> Hope this helps,
>
> Thanks & Regards
> Somnath
>
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Stuart Longland
> Sent: Wednesday, May 06, 2015 3:48 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] "too many PGs per OSD" in Hammer
>
> On 07/05/15 07:53, Chris Armstrong wrote:
> > Thanks for the feedback. That language is confusing to me, then, since
> > the first paragraph seems to suggest using a pg_num of 128 in cases
> > where we have less than 5 OSDs, as we do here.
> >
> > The warning below that is: "As the number of OSDs increases, chosing
> > the right value for pg_num becomes more important because it has a
> > significant influence on the behavior of the cluster as well as the
> > durability of the data when something goes wrong (i.e. the probability
> > that a catastrophic event leads to data loss).", which suggests that
> > this could be an issue with more OSDs, which doesn't apply here.
> >
> > Do we know if this warning is calculated based on the resources of the
> > host? If I try with larger machines, will this warning change?
>
> I'd be interested in an answer here too.  I just did an update from Giant
> to Hammer and struck the same dreaded error message.
>
> When I initially deployed Ceph (with Emperor), I worked out according to
> the formula given on the site:
>
> > # We have: 3 OSD nodes with 2 OSDs each
> > # giving us 6 OSDs total.
> > # There are 3 replicas, so the recommended number of
> > # placement groups is:
> > #  6 * 100 / 3
> > # which gives: 200 placement groups.
> > # Rounding this up to the nearest power of two gives:
> > osd pool default pg num = 256
> > osd pool default pgp num = 256
>
> It seems this was a bad value to use.  I now have a problem of a biggish
> lump of data sitting in a pool with an inappropriate number of placement
> groups.  It seems I needed to divide this number by the number of pools.
>
> For now I've shut it up with the following:
>
> > [mon]
> > mon warn on legacy crush tunables = false
> > # New warning on move to Hammer
> > mon pg warn max per osd = 2048
>
> Question is, how does one go about fixing this?  I'd rather not blow away
> production pools just at this point although right now we only have one
> major production load, so if we're going to do it at any time, now is the
> time to do it.
>
> Worst bit is this will probably change: so I can see me hitting this
> proble

Re: [ceph-users] "too many PGs per OSD" in Hammer

2015-05-08 Thread Somnath Roy
Sorry, I didn't read through it all. It seems you have 6 OSDs, so I would say
128 PGs per pool is not bad!
But if you keep adding pools, you need to lower this number; generally ~64 PGs
per pool should achieve good parallelism with a lower number of OSDs. If you
grow your cluster, create pools with more PGs.
Again, the warning number is a ballpark figure; if you have more powerful
compute and fast disks, you can safely ignore this warning.

Thanks & Regards
Somnath

From: Somnath Roy
Sent: Thursday, May 07, 2015 11:44 PM
To: 'Chris Armstrong'
Cc: Stuart Longland; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] "too many PGs per OSD" in Hammer

Nope, 16 seems way too less for performance.
How many OSDs you have ? And how many pools are you planning to create ?

Thanks & Regards
Somnath

From: Chris Armstrong [mailto:carmstr...@engineyard.com]
Sent: Thursday, May 07, 2015 11:34 PM
To: Somnath Roy
Cc: Stuart Longland; ceph-users@lists.ceph.com

Subject: Re: [ceph-users] "too many PGs per OSD" in Hammer

Thanks for the details, Somnath.

So it definitely sounds like 128 pgs per pool is way too many? I lowered ours 
to 16 on a new deploy and the warning is gone. I'm not sure if this number is 
sufficient, though...

On Wed, May 6, 2015 at 4:10 PM, Somnath Roy 
mailto:somnath@sandisk.com>> wrote:
Just checking, are you aware of this ?

http://ceph.com/pgcalc/

FYI, the warning is given based on the following logic.

int per = sum_pg_up / num_in;
if (per > g_conf->mon_pg_warn_max_per_osd) {
//raise warning..
   }

This is not considering any resources..It is solely depends on number of in 
OSDs and total number of PGs in the cluster. Default mon_pg_warn_max_per_osd = 
300, so, in your cluster per OSD is serving > 300 PGs it seems.
It will be good if you assign PGs in your pool keeping the above calculation in 
mind i.e no more than 300 PGs/ OSD..
But, if you feel you OSD is in fast disk and box has lot of compute power, you 
may want to try out with more number of PGs/OSD. In this case, raise the 
mon_pg_warn_max_per_osd to something big and warning should go away.

Hope this helps,

Thanks & Regards
Somnath

-Original Message-
From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com]
 On Behalf Of Stuart Longland
Sent: Wednesday, May 06, 2015 3:48 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] "too many PGs per OSD" in Hammer

On 07/05/15 07:53, Chris Armstrong wrote:
> Thanks for the feedback. That language is confusing to me, then, since
> the first paragraph seems to suggest using a pg_num of 128 in cases
> where we have less than 5 OSDs, as we do here.
>
> The warning below that is: "As the number of OSDs increases, chosing
> the right value for pg_num becomes more important because it has a
> significant influence on the behavior of the cluster as well as the
> durability of the data when something goes wrong (i.e. the probability
> that a catastrophic event leads to data loss).", which suggests that
> this could be an issue with more OSDs, which doesn't apply here.
>
> Do we know if this warning is calculated based on the resources of the
> host? If I try with larger machines, will this warning change?

I'd be interested in an answer here too.  I just did an update from Giant to 
Hammer and struck the same dreaded error message.

When I initially deployed Ceph (with Emperor), I worked out according to the 
formula given on the site

Re: [ceph-users] osd does not start when object store is set to "newstore"

2015-05-08 Thread Somnath Roy
Changing this setting alone will not work. Did you run mkfs?
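
For reference, a rough sketch of (re)initializing a test OSD's data directory
before retrying the start command from this thread, assuming osd.0 and the
default data path (adjust --osd-data, and add whatever auth/crush registration
your deployment needs):

sudo ceph-osd -i 0 --mkfs --mkkey --osd-data /var/lib/ceph/osd/ceph-0
sudo /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f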

From: Srikanth Madugundi [mailto:srikanth.madugu...@gmail.com]
Sent: Friday, May 08, 2015 11:56 AM
To: Somnath Roy
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osd does not start when object store is set to 
"newstore"

I tried with leveldb, osd does not start either

osd objectstore = newstore
newstore backend = leveldb
enable experimental unrecoverable data corrupting features = newstore leveldb

-Srikanth


On Fri, May 8, 2015 at 10:41 AM, Somnath Roy 
mailto:somnath@sandisk.com>> wrote:
I think you need to build code with rocksdb enabled if you are not already 
doing this.

Go to root folder and try this..

./do_autogen.sh -r

Thanks & Regards
Somnath

From: Srikanth Madugundi 
[mailto:srikanth.madugu...@gmail.com]
Sent: Friday, May 08, 2015 10:33 AM
To: Somnath Roy
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osd does not start when object store is set to 
"newstore"

I tried adding "enable experimental unrecoverable data corrupting features = 
newstore rocksdb" but no luck.

Here is the config I am using.

[global]
.
.
.
osd objectstore = newstore
newstore backend = rocksdb
enable experimental unrecoverable data corrupting features = newstore rocksdb

Regards
-Srikanth

On Thu, May 7, 2015 at 10:59 PM, Somnath Roy 
mailto:somnath@sandisk.com>> wrote:
I think you need to add the following..

enable experimental unrecoverable data corrupting features = newstore rocksdb

Thanks & Regards
Somnath


From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com]
 On Behalf Of Srikanth Madugundi
Sent: Thursday, May 07, 2015 10:56 PM
To: ceph-us...@ceph.com
Subject: [ceph-users] osd does not start when object store is set to "newstore"

Hi,

I built and installed ceph source from (wip-newstore) branch and could not 
start osd with "newstore" as osd objectstore.

$ sudo /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c 
/etc/ceph/ceph.conf --cluster ceph -f
2015-05-08 05:49:16.130073 7f286be01880 -1 unable to create object store
$

 ceph.config ( I have the following settings in ceph.conf)

[global]
osd objectstore = newstore
newstore backend = rocksdb

enable experimental unrecoverable data corrupting features = newstore

The logs does not show much details.

$ tail -f /var/log/ceph/ceph-osd.0.log
2015-05-08 00:01:54.331136 7fb00e07c880  0 ceph version  (), process ceph-osd, 
pid 23514
2015-05-08 00:01:54.331202 7fb00e07c880 -1 unable to create object store

Am I missing something?

Regards
Srikanth



PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] accepter.accepter.bind unable to bind to IP on any port in range 6800-7300:

2015-05-08 Thread Bruce McFarland
I've run into an issue starting OSDs where I'm running out of ports. I've
increased the port range with "ms bind port max", and on the next attempt to
start the osd it reports no ports in the new range. I am only running 1 osd on
the node and rarely restart it. I've increased the debug level to 20, and the
only additional information in the log file is the PID of the process that
can't get a port. iptables is not loaded. This has only recently started
occurring, on multiple osds, and might possibly be related to my issues with
salt and with debugging the calamari master not recognizing ceph-mon, even
though 'salt \* ceph.get_heartbeats' returns info for all nodes, the monmap,
etc.

2015-05-08 10:52:17.861855 773b7000  0 ceph version 0.86 
(97dcc0539dfa7dac3de74852305d51580b7b1f82), process ceph-osd, pid 4629
2015-05-08 10:52:17.864413 773b7000 -1 accepter.accepter.bind unable to bind to 
192.168.2.102:7370 on any port in range 6800-7370: (126) Cannot assign 
requested address
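
Two quick checks worth running on that node (a diagnostic sketch; "Cannot
assign requested address" usually means the bind address is not configured on
the host, rather than the port range being exhausted):

sudo ss -tlnp | grep ceph-osd     # is anything already holding ports in the OSD range?
ip addr | grep 192.168.2.102      # is the address the OSD tries to bind actually present?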
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "too many PGs per OSD" in Hammer

2015-05-08 Thread Somnath Roy
There are two parameters on a pool, pg_num and pgp_num.
You can't decrease pg_num, but you can decrease pgp_num, which is the total
number of PGs used for placement purposes. If you reduce it, rebalancing will
start, and things should settle down after it is done.

But I am not aware of any other impact of this. Generally, it is recommended
to keep pg_num and pgp_num the same.
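
A short sketch of the knobs discussed in this thread (pool name and numbers
are illustrative; as noted above, pg_num itself cannot be reduced):

ceph osd pool get rbd pg_num
ceph osd pool get rbd pgp_num
ceph osd pool set rbd pgp_num 64                              # placement count can be lowered
ceph tell mon.\* injectargs '--mon_pg_warn_max_per_osd 1000'  # or raise the warning threshold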

Thanks & Regards
Somnath
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Daniel 
Hoffman
Sent: Friday, May 08, 2015 4:49 AM
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] "too many PGs per OSD" in Hammer

Is there a way to shrink/merge PG's on a pool without removing it?
I have a pool with some data in it but the PG's were miscalculated and just 
wondering the best way to resolve it.

On Fri, May 8, 2015 at 4:49 PM, Somnath Roy 
mailto:somnath@sandisk.com>> wrote:
Sorry, I didn’t read through all..It seems you have 6 OSDs, so, I would say 128 
PGs per pool is not bad !
But, if you keep on adding pools, you need to lower this number, generally ~64 
PGs per pool should achieve good parallelism with lower number of OSDs..If you 
grow your cluster , create pools with more PGs..
Again, the warning number is a ballpark number, if you have more powerful 
compute and fast disk , you can safely ignore this warning.

Thanks & Regards
Somnath

From: Somnath Roy
Sent: Thursday, May 07, 2015 11:44 PM
To: 'Chris Armstrong'
Cc: Stuart Longland; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] "too many PGs per OSD" in Hammer

Nope, 16 seems way too less for performance.
How many OSDs you have ? And how many pools are you planning to create ?

Thanks & Regards
Somnath

From: Chris Armstrong [mailto:carmstr...@engineyard.com]
Sent: Thursday, May 07, 2015 11:34 PM
To: Somnath Roy
Cc: Stuart Longland; ceph-users@lists.ceph.com

Subject: Re: [ceph-users] "too many PGs per OSD" in Hammer

Thanks for the details, Somnath.

So it definitely sounds like 128 pgs per pool is way too many? I lowered ours 
to 16 on a new deploy and the warning is gone. I'm not sure if this number is 
sufficient, though...

On Wed, May 6, 2015 at 4:10 PM, Somnath Roy 
mailto:somnath@sandisk.com>> wrote:
Just checking, are you aware of this ?

http://ceph.com/pgcalc/

FYI, the warning is given based on the following logic.

int per = sum_pg_up / num_in;
if (per > g_conf->mon_pg_warn_max_per_osd) {
//raise warning..
   }

This is not considering any resources..It is solely depends on number of in 
OSDs and total number of PGs in the cluster. Default mon_pg_warn_max_per_osd = 
300, so, in your cluster per OSD is serving > 300 PGs it seems.
It will be good if you assign PGs in your pool keeping the above calculation in 
mind i.e no more than 300 PGs/ OSD..
But, if you feel you OSD is in fast disk and box has lot of compute power, you 
may want to try out with more number of PGs/OSD. In this case, raise the 
mon_pg_warn_max_per_osd to something big and warning should go away.

Hope this helps,

Thanks & Regards
Somnath

-Original Message-
From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com]
 On Behalf Of Stuart Longland
Sent: Wednesday, May 06, 2015 3:48 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] "too many PGs per OSD" in Hammer

On 07/05/15 07:53, Chris Armstrong wrote:
> Thanks for the feedback. That language is confusing to me, then, since
> the first paragraph seems to suggest using a pg_num of 128 in cases
> where we have less than 5 OSDs, as we do here.
>
> The warning below that is: "As the number of OSDs increases, chosing
> the right value for pg_num becomes more important because it has a
> significant influence on the behavior of the cluster as well as the
> durability of the data when something goes wrong (i.e. the probability
> that a catastrophic event leads to data loss).", which suggests that
> this could be an issue with more OSDs, which doesn't apply here.
>
> Do we know if this warning is calculated based on the resources of the
> host? If I try with larger machines, will this warning change?

I'd be interested in an answer here too.  I just did an update from Giant to 
Hammer and struck the same dreaded error message.

When I initially deployed Ceph (with Emperor), I worked out according to the 
formula given on the site:

> # We have: 3 OSD nodes with 2 OSDs each
> # giving us 6 OSDs total.
> # There are 3 replicas, so the recommended number of
> # placement groups is:
> #  6 * 100 / 3
> # which gives: 200 placement groups.
> # Rounding this up to the nearest power of two gives:
> osd pool default pg num = 256
> osd pool default pgp num = 256

It seems this was a bad value to use.  I now have a problem of a biggish lump 
of data sitting in a pool with an inappropriate number of placement groups

[ceph-users] RFC: Deprecating ceph-tool commands

2015-05-08 Thread Joao Eduardo Luis
All,

While working on #11545 (mon: have mon-specific commands under 'ceph mon
...') I crashed into a slightly tough brick wall.

The purpose of #11545 is to move certain commands, such as 'ceph scrub',
'ceph compact' and 'ceph sync force' to the 'mon' module of the ceph-tool.

These commands have long stood in this format because 'mon'-module
commands have been traditionally considered as being somehow related
with monmaps and/or the MonmapMonitor.  However, from a user
perspective, if they relate to the monitor itself (and not cluster-wide)
they should reside under the 'mon'-module.

As such, I decided they should be moved to 'ceph mon scrub', 'ceph mon
compact' and 'ceph mon sync force'.
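
Concretely, the proposed mapping would look like this (the old forms would
keep working during the deprecation window):

  ceph scrub       ->  ceph mon scrub
  ceph compact     ->  ceph mon compact
  ceph sync force  ->  ceph mon sync force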

Adding these commands and doing the correct mapping is not hard at all.
 However, backward compatibility must be maintained, and simply dropping
the old style commands doesn't seem reasonable at all.

Keeping the old style commands alongside with the new commands is
trivial enough to not pose a problem, but they must go away at some
point.  After everyone get used to the new commands, and as soon as the
vast majority of deployments support the new commands, the old style
commands will simply be clutter.

And while these commands are not widely used, and while most people
certainly have not ever needed to use them, this sort of thing can at
some point be required for any other (most commonly used) command.

As I have not been able to find any mentions to guidelines to
deprecating commands, I thus propose the following:

A command being DEPRECATED must be:

 - clearly marked as DEPRECATED in usage;
 - kept around for at least 2 major releases;
 - kept compatible for the duration of the deprecation period.

Once two major releases go by, the command will then enter the OBSOLETE
period.  This would be one major release, during which the command would
no longer work although still acknowledged.  A simple message along the
lines of 'This command is now obsolete; please check the docs' would
suffice to inform the user.

The command would no longer exist in the next major release.

This approach gives a lifespan of roughly 3 releases (at current rate,
roughly 1.5 years) before being completely dropped.  This should give
enough time to people to realize what has happened and adjust any
scripts they may have.

E.g., a command being deprecated in Infernalis would be completely
dropped in the L-release, spanning its existence to at least one
long-term stable (i.e., jewel) and being dropped as soon as the first
dev cycle for the L-release begins.

Any thoughts and comments are welcome.

Cheers!

  -Joao

p.s., If you want to take a look at how this would translate in terms of
code on the monitor, please check [1].

[1] - https://github.com/ceph/ceph/pull/4595
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RFC: Deprecating ceph-tool commands

2015-05-08 Thread Gregory Farnum
On Fri, May 8, 2015 at 4:55 PM, Joao Eduardo Luis  wrote:
> All,
>
> While working on #11545 (mon: have mon-specific commands under 'ceph mon
> ...') I crashed into a slightly tough brick wall.
>
> The purpose of #11545 is to move certain commands, such as 'ceph scrub',
> 'ceph compact' and 'ceph sync force' to the 'mon' module of the ceph-tool.
>
> These commands have long stood in this format because 'mon'-module
> commands have been traditionally considered as being somehow related
> with monmaps and/or the MonmapMonitor.  However, from a user
> perspective, if they relate to the monitor itself (and not cluster-wide)
> they should reside under the 'mon'-module.
>
> As such, I decided they should be moved to 'ceph mon scrub', 'ceph mon
> compact' and 'ceph mon sync force'.
>
> Adding these commands and doing the correct mapping is not hard at all.
>  However, backward compatibility must be maintained, and simply dropping
> the old style commands doesn't seem reasonable at all.
>
> Keeping the old style commands alongside with the new commands is
> trivial enough to not pose a problem, but they must go away at some
> point.  After everyone get used to the new commands, and as soon as the
> vast majority of deployments support the new commands, the old style
> commands will simply be clutter.
>
> And while these commands are not widely used, and while most people
> certainly have not ever needed to use them, this sort of thing can at
> some point be required for any other (most commonly used) command.
>
> As I have not been able to find any mentions to guidelines to
> deprecating commands, I thus propose the following:
>
> A command being DEPRECATED must be:
>
>  - clearly marked as DEPRECATED in usage;
>  - kept around for at least 2 major releases;
>  - kept compatible for the duration of the deprecation period.
>
> Once two major releases go by, the command will then enter the OBSOLETE
> period.  This would be one major release, during which the command would
> no longer work although still acknowledged.  A simple message down the
> lines of 'This command is now obsolete; please check the docs' would
> suffice to inform the user.
>
> The command would no longer exist in the next major release.
>
> This approach gives a lifespan of roughly 3 releases (at current rate,
> roughly 1.5 years) before being completely dropped.  This should give
> enough time to people to realize what has happened and adjust any
> scripts they may have.
>
> E.g., a command being deprecated in Infernallis would be completely
> dropped in the L-release, spanning its existence to at least one
> long-term stable (i.e., jewel) and being dropped as soon as the first
> dev cycle for the L-release begins.

Well, this is an interesting dilemma. "As a user", I think I'd want it
to be deprecated and warning me for one release I run — I'll see it
when I upgrade and then know I need to fix it — and then it can be
obsoleted in the next release I run.

It's that "release I run" bit that's tricky though, right? I presume
you made it two big releases because that should capture at least one
release supported by downstream providers? But the release marking it
as obsoleted will often not be captured by downstreams. So it seems
like either we should do two releases in each state, or else we should
as a community do one release in each state and if the downstreams
want to keep command mappings around for longer they can do that. In
general it's not like those are going to be tricky patches to apply...

Also, I think this set of commands is sufficiently special-purpose
that you could probably just make the swap and have the top-level ones
spit out an error referring to the move; we don't necessarily need to
make decisions on this right now.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com