Re: [ceph-users] PG Selection Criteria for Deep-Scrub

2014-06-11 Thread Dan Van Der Ster
Hi Greg,
This tracker issue is relevant: http://tracker.ceph.com/issues/7288
Cheers, Dan

On 11 Jun 2014, at 00:30, Gregory Farnum  wrote:

> Hey Mike, has your manual scheduling resolved this? I think I saw
> another similar-sounding report, so a feature request to improve scrub
> scheduling would be welcome. :)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> 
> 
> On Tue, May 20, 2014 at 5:46 PM, Mike Dawson  wrote:
>> I tend to set it whenever I don't want to be bothered by storage performance
>> woes (nights I value sleep, etc).
>> 
>> This cluster is bounded by relentless small writes (it has a couple dozen
>> rbd volumes backing video surveillance DVRs). Some of the software we run is
>> completely unaffected whereas other software falls apart during periods of
>> deep-scrubs. I theorize it has to do with the individual software's attitude
>> about flushing to disk / buffering.
>> 
>> - Mike
>> 
>> 
>> 
>> On 5/20/2014 8:31 PM, Aaron Ten Clay wrote:
>>> 
>>> For what it's worth, version 0.79 has different headers, and the awk
>>> command needs $19 instead of $20. But here is the output I have on a
>>> small cluster that I recently rebuilt:
>>> 
>>> $ ceph pg dump all | grep active | awk '{ print $19}' | sort -k1 | uniq -c
>>> dumped all in format plain
>>>   1 2014-05-15
>>>   2 2014-05-17
>>>  19 2014-05-18
>>> 193 2014-05-19
>>> 105 2014-05-20
>>> 
>>> I have set noscrub and nodeep-scrub, as well as noout and nodown off and
>>> on while I performed various maintenance, but that hasn't (apparently)
>>> impeded the regular schedule.
>>> 
>>> With what frequency are you setting the nodeep-scrub flag?
>>> 
>>> -Aaron
>>> 
>>> 
>>> On Tue, May 20, 2014 at 5:21 PM, Mike Dawson wrote:
>>> 
>>>Today I noticed that deep-scrub is consistently missing some of my
>>>Placement Groups, leaving me with the following distribution of PGs
>>>and the last day they were successfully deep-scrubbed.
>>> 
>>># ceph pg dump all | grep active | awk '{ print $20}' | sort -k1 |
>>>uniq -c
>>>   5 2013-11-06
>>> 221 2013-11-20
>>>   1 2014-02-17
>>>  25 2014-02-19
>>>  60 2014-02-20
>>>   4 2014-03-06
>>>   3 2014-04-03
>>>   6 2014-04-04
>>>   6 2014-04-05
>>>  13 2014-04-06
>>>   4 2014-04-08
>>>   3 2014-04-10
>>>   2 2014-04-11
>>>  50 2014-04-12
>>>  28 2014-04-13
>>>  14 2014-04-14
>>>   3 2014-04-15
>>>  78 2014-04-16
>>>  44 2014-04-17
>>>   8 2014-04-18
>>>   1 2014-04-20
>>>  16 2014-05-02
>>>  69 2014-05-04
>>> 140 2014-05-05
>>> 569 2014-05-06
>>>9231 2014-05-07
>>> 103 2014-05-08
>>> 514 2014-05-09
>>>1593 2014-05-10
>>> 393 2014-05-16
>>>2563 2014-05-17
>>>1283 2014-05-18
>>>1640 2014-05-19
>>>1979 2014-05-20
>>> 
>>>I have been running the default "osd deep scrub interval" of once
>>>per week, but have disabled deep-scrub on several occasions in an
>>>attempt to avoid the associated degraded cluster performance I have
>>>written about before.
>>> 
>>>To get the PGs longest in need of a deep-scrub started, I set the
>>>nodeep-scrub flag, and wrote a script to manually kick off
>>>deep-scrub according to age. It is processing as expected.
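>>>(For reference, a rough sketch of that kind of script, assuming the
>>>deep-scrub stamp is in column $20 of "ceph pg dump" as above; the
>>>column index differs between versions, as noted elsewhere in this thread:)
>>>
>>>ceph pg dump all 2>/dev/null | grep active | awk '{ print $20, $1 }' | \
>>>  sort | head -n 20 | while read stamp pgid; do
>>>echo "deep-scrubbing $pgid (last deep-scrub $stamp)"
>>>ceph pg deep-scrub "$pgid"
>>>sleep 600   # crude pacing; a real script would wait for the scrub to finish
>>>  done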
>>> 
>>>Do you consider this a feature request or a bug? Perhaps the code
>>>that schedules PGs to deep-scrub could be improved to prioritize PGs
>>>that have needed a deep-scrub the longest.
>>> 
>>>Thanks,
>>>Mike Dawson
>>>_
>>>ceph-users mailing list
>>>ceph-users@lists.ceph.com
>>>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> 
>>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to avoid deep-scrubbing performance hit?

2014-06-11 Thread Dan Van Der Ster
On 10 Jun 2014, at 11:59, Dan Van Der Ster  wrote:

> One idea I had was to check the behaviour under different disk io schedulers, 
> trying to exploit thread io priorities with cfq. So I have a question for the 
> developers about using ionice or ioprio_set to lower the IO priorities of the 
> threads responsible for scrubbing: 
>   - Are there dedicated threads always used for scrubbing only, and never for 
> client IOs? If so, can an admin identify the thread IDs so he can ionice 
> those? 
>   - If OTOH a disk/op thread is switching between scrubbing and client IO 
> responsibilities, could Ceph use ioprio_set to change the io priorities on 
> the fly??

I just submitted a feature request for this:  
http://tracker.ceph.com/issues/8580
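In the meantime, a crude host-level workaround sketch. This throttles all IO of a
ceph-osd process, not just the scrub threads, which is exactly why the per-thread
handling above would be nicer; device and process selection are examples only:

echo cfq > /sys/block/sdb/queue/scheduler     # cfq is required for ionice classes
ionice -c 3 -p <pid-of-one-ceph-osd>          # idle class for that whole OSD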

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] failed when activate the OSD

2014-06-11 Thread jiangdahui
After I created 1 mon and prepared 2 OSDs, I checked and found that the fsid of 
all three is the same. But when I run *ceph-deploy osd activate 
node2:/var/local/osd0 node3:/var/local/osd1*, the error output is as follows:
[node2][WARNIN] ceph-disk: Error: No cluster conf found in /etc/ceph with fsid 
3e68a2b5-cbf3-4149-9462-b89e2a40236e


It is strange that the fsid in the output is different from that of the three 
nodes, and if I modify it on the three nodes, another error happens:
"[node2][WARNIN] 2014-06-11 01:39:17.738451 b63cfb40  0 librados: 
client.bootstrap-osd authentication error (1) Operation not permitted"


What should I do?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is it still unsafe to map a RBD device on an OSD server?

2014-06-11 Thread Mikaël Cluseau

On 06/11/2014 08:20 AM, Sebastien Han wrote:

Thanks for your answers


I have had that for an apt-cache for more than 1 year now, and never had 
an issue. Of course, your question is not about having a krbd device 
backing an OSD of the same cluster ;-)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Monitor down

2014-06-11 Thread yalla.gnan.kumar
I have a four-node Ceph storage cluster. Ceph -s is showing one monitor as 
down. How do I start it, and on which server do I have to start it?


---
root@cephadmin:/home/oss# ceph -w
cluster 9acd33d7-759b-45f4-b48f-a4682fd6c674
 health HEALTH_WARN 1 mons down, quorum 0,1 cephnode1,cephnode2
 monmap e3: 3 mons at 
{cephnode1=10.211.203.237:6789/0,cephnode2=10.211.203.238:6789/0,cephnode3=10.211.203.239:6789/0},
 election epoch 844, quorum 0,1 cephnode1,cephnode2
 mdsmap e225: 1/1/1 up {0=cephnode1=up:active}
 osdmap e297: 3 osds: 3 up, 3 in
  pgmap v214969: 448 pgs, 5 pools, 9495 bytes data, 30 objects
21881 MB used, 51663 MB / 77501 MB avail
 448 active+clean
--

Thanks
Kumar



This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise confidential information. If you have received it in 
error, please notify the sender immediately and delete the original. Any other 
use of the e-mail by you is prohibited. Where allowed by local law, electronic 
communications with Accenture and its affiliates, including e-mail and instant 
messaging (including content), may be scanned by our systems for the purposes 
of information security and assessment of internal compliance with Accenture 
policy.
__

www.accenture.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is CRUSH used on reading ?

2014-06-11 Thread Wido den Hollander

On 06/11/2014 12:51 PM, Florent B wrote:

Hi,

I would like to know if Ceph uses CRUSH algorithm when a read operation
occurs, for example to select the nearest OSD storing the asked object.


CRUSH is used when reading since it's THE algorithm inside Ceph to 
determine data placement.


CRUSH doesn't support reading the nearest object, it will always read 
from the primary OSD for a PG, but you can influence the primary affinity.
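For example (firefly and later; the monitors may need 
"mon osd allow primary affinity = true" before this is accepted):

ceph osd primary-affinity osd.1 0.5   # osd.1 and 0.5 are just example values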




Thank you :)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor down

2014-06-11 Thread Wido den Hollander

On 06/11/2014 01:23 PM, yalla.gnan.ku...@accenture.com wrote:

I have a four node ceph storage cluster. Ceph –s  is showing one monitor
as down . How to start it and in which server do I have to start it ?



It's cephnode3 which is down. Log in and do:

$ start ceph-mon-all
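If you only want to start that single monitor on an Upstart-based system, 
something like this should also work (assuming the mon id matches the hostname):

$ start ceph-mon id=cephnode3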


---

root@cephadmin:/home/oss# ceph -w

 cluster 9acd33d7-759b-45f4-b48f-a4682fd6c674

  health HEALTH_WARN 1 mons down, quorum 0,1 cephnode1,cephnode2

  monmap e3: 3 mons at
{cephnode1=10.211.203.237:6789/0,cephnode2=10.211.203.238:6789/0,cephnode3=10.211.203.239:6789/0},
election epoch 844, quorum 0,1 cephnode1,cephnode2

  mdsmap e225: 1/1/1 up {0=cephnode1=up:active}

  osdmap e297: 3 osds: 3 up, 3 in

   pgmap v214969: 448 pgs, 5 pools, 9495 bytes data, 30 objects

 21881 MB used, 51663 MB / 77501 MB avail

  448 active+clean

--

Thanks

Kumar




This message is for the designated recipient only and may contain
privileged, proprietary, or otherwise confidential information. If you
have received it in error, please notify the sender immediately and
delete the original. Any other use of the e-mail by you is prohibited.
Where allowed by local law, electronic communications with Accenture and
its affiliates, including e-mail and instant messaging (including
content), may be scanned by our systems for the purposes of information
security and assessment of internal compliance with Accenture policy.
__

www.accenture.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unable to remove mds

2014-06-11 Thread yalla.gnan.kumar
Hi All,

I have a four-node Ceph cluster. The metadata service is showing as degraded in 
health. How do I remove the MDS service from Ceph?


=-
root@cephadmin:/home/oss# ceph -s
cluster 9acd33d7-759b-45f4-b48f-a4682fd6c674
 health HEALTH_WARN mds cluster is degraded
 monmap e3: 3 mons at 
{cephnode1=10.211.203.237:6789/0,cephnode2=10.211.203.238:6789/0,cephnode3=10.211.203.239:6789/0},
 election epoch 874, quorum 0,1,2 cephnode1,cephnode2,cephnode3
 mdsmap e227: 1/1/1 up {0=cephnode1=up:replay}
 osdmap e299: 3 osds: 3 up, 3 in
  pgmap v214988: 448 pgs, 5 pools, 9495 bytes data, 30 objects
22693 MB used, 50851 MB / 77501 MB avail
 448 active+clean
--

Thanks
Kumar





This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise confidential information. If you have received it in 
error, please notify the sender immediately and delete the original. Any other 
use of the e-mail by you is prohibited. Where allowed by local law, electronic 
communications with Accenture and its affiliates, including e-mail and instant 
messaging (including content), may be scanned by our systems for the purposes 
of information security and assessment of internal compliance with Accenture 
policy.
__

www.accenture.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can we map OSDs from different hosts (servers) to a Pool in Ceph

2014-06-11 Thread Davide Fanciola
Hi,

we have a similar setup where we have SSD and HDD in the same hosts.
Our very basic crushmap is configured as follows:

# ceph osd tree
# id weight type name up/down reweight
-6 3 root ssd
3 1 osd.3 up 1
4 1 osd.4 up 1
5 1 osd.5 up 1
-5 3 root platters
0 1 osd.0 up 1
1 1 osd.1 up 1
2 1 osd.2 up 1
-1 3 root default
-2 1 host chgva-srv-stor-001
0 1 osd.0 up 1
3 1 osd.3 up 1
-3 1 host chgva-srv-stor-002
1 1 osd.1 up 1
4 1 osd.4 up 1
-4 1 host chgva-srv-stor-003
2 1 osd.2 up 1
5 1 osd.5 up 1


We do not seem to have problems with this setup, but I'm not sure whether it's
good practice to have elements appearing multiple times in different
branches.
On the other hand, I see no way to follow the physical hierarchy of a
datacenter for pools, since a pool can be spread among
servers/racks/rooms...

Can someone confirm this crushmap is any good for our configuration?

Thanks in advance.

BR
Davide



On Mon, Mar 3, 2014 at 12:48 PM, Wido den Hollander  wrote:

> On 03/03/2014 12:45 PM, Vikrant Verma wrote:
>
>> Hi All,
>>
>> Is it possible to map OSDs from different hosts (servers) to a Pool in
>> ceph cluster?
>>
>> In Crush Map we can add a bucket mentioning the host details (hostname
>> and its weight).
>>
>> Is it possible to configure a bucket  which contains OSDs from different
>> hosts?
>>
>>
> I think it's possible.
>
> But you can always try it and afterwards run crushtool with tests:
>
> $ crushtool -i mycrushmap --test --rule 0 --num-rep 3 --show-statistics
>
> That will run some tests on your compiled crushmap
>
>
>> if possible please let me know how to configure it.
>>
>> Regards,
>> Vikrant
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I have PGs that I can't deep-scrub

2014-06-11 Thread Sage Weil
Hi Craig,

It's hard to say what is going wrong with that level of logs.  Can you 
reproduce with debug ms = 1 and debug osd = 20?
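(If a restart is inconvenient, those can usually be bumped at runtime, e.g.:

  ceph tell osd.11 injectargs '--debug-ms 1 --debug-osd 20'
  ceph tell osd.0 injectargs '--debug-ms 1 --debug-osd 20'

using the OSD ids from the PG in question.)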

There were a few things fixed in scrub between emperor and firefly.  Are 
you planning on upgrading soon?

sage


On Tue, 10 Jun 2014, Craig Lewis wrote:

> Every time I deep-scrub one PG, all of the OSDs responsible get kicked
> out of the cluster.  I've deep-scrubbed this PG 4 times now, and it
> fails the same way every time.  OSD logs are linked at the bottom.
> 
> What can I do to get this deep-scrub to complete cleanly?
> 
> This is the first time I've deep-scrubbed these PGs since Sage helped
> me recover from some OSD problems
> (http://t53277.file-systems-ceph-development.file-systemstalk.info/70-osd-are-down-and-not-coming-up-t53277.html)
> 
> I can trigger the issue easily in this cluster, but have not been able
> to re-create in other clusters.
> 
> 
> 
> 
> 
> 
> The PG stats for this PG say that last_deep_scrub and deep_scrub_stamp
> are 48009'1904117 2014-05-21 07:28:01.315996 respectively.  This PG is
> owned by OSDs [11,0]
> 
> This is a secondary cluster, so I stopped all external I/O on it.  I
> set nodeep-scrub, and restarted both OSDs with:
>   debug osd = 5/5
>   debug filestore = 5/5
>   debug journal = 1
>   debug monc = 20/20
> 
> then I ran a deep-scrub on this PG.
> 
> 2014-06-10 10:47:50.881783 mon.0 [INF] pgmap v8832020: 2560 pgs: 2555
> active+clean, 5 active+clean+scrubbing; 27701 GB data, 56218 GB used,
> 77870 GB / 130 TB avail
> 2014-06-10 10:47:54.039829 mon.0 [INF] pgmap v8832021: 2560 pgs: 2554
> active+clean, 5 active+clean+scrubbing, 1 active+clean+scrubbing+deep;
> 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail
> 
> 
> At 10:49:09, I see ceph-osd for both 11 and 0 spike to 100% CPU
> (100.3% +/- 1.0%).  Prior to this, they were both using ~30% CPU.  It
> might've started a few seconds sooner, I'm watching top.
> 
> I forgot to watch IO stat until 10:56.  At this point, both OSDs are
> reading.  iostat reports that they're both doing ~100
> transactions/sec, reading ~1 MiBps, 0 writes.
> 
> 
> At 11:01:26, iostat reports that both osds are no longer consuming any
> disk I/O.  They both go for > 30 seconds with 0 transactions, and 0
> kiB read/write.  There are small bumps of 2 transactions/sec for one
> second, then it's back to 0.
> 
> 
> At 11:02:41, the primary OSD gets kicked out by the monitors:
> 2014-06-10 11:02:41.168443 mon.0 [INF] pgmap v8832125: 2560 pgs: 2555
> active+clean, 4 active+clean+scrubbing, 1 active+clean+scrubbing+deep;
> 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail; 1996 B/s rd, 2
> op/s
> 2014-06-10 11:02:57.801047 mon.0 [INF] osd.11 marked down after no pg
> stats for 903.825187seconds
> 2014-06-10 11:02:57.823115 mon.0 [INF] osdmap e58834: 36 osds: 35 up, 36 in
> 
> Both ceph-osd processes (11 and 0) continue to use 100% CPU (same range).
> 
> 
> At ~11:10, I see that osd.11 has resumed reading from disk at the
> original levels (~100 tps, ~1MiBps read, 0 MiBps write).  Since it's
> down, but doing something, I let it run.
> 
> Both the osd.11 and osd.0 repeat this pattern.  Reading for a while at
> ~1 MiBps, then nothing.  The duty cycle seems about 50%, with a 20
> minute period, but I haven't timed anything.  CPU usage remains at
> 100%, regardless of whether IO is happening or not.
> 
> 
> At 12:24:15, osd.11 rejoins the cluster:
> 2014-06-10 12:24:15.294646 mon.0 [INF] osd.11 10.193.0.7:6804/7100 boot
> 2014-06-10 12:24:15.294725 mon.0 [INF] osdmap e58838: 36 osds: 35 up, 36 in
> 2014-06-10 12:24:15.343869 mon.0 [INF] pgmap v8832827: 2560 pgs: 1
> stale+active+clean+scrubbing+deep, 2266 active+clean, 5
> stale+active+clean, 287 active+degraded, 1 active+clean+scrubbing;
> 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail; 15650 B/s rd,
> 18 op/s; 3617854/61758142 objects degraded (5.858%)
> 
> 
> osd.0's CPU usage drops back to normal when osd.11 rejoins the
> cluster.  The PG stats have not changed.   The last_deep_scrub and
> deep_scrub_stamp are still 48009'1904117 2014-05-21 07:28:01.315996
> respectively.
> 
> 
> This time, osd.0 did not get kicked out by the monitors.  In previous
> attempts, osd.0 was kicked out 5-10 minutes after osd.11.  When that
> happens, osd.0 rejoins the cluster after osd.11.
> 
> 
> I have several more PGs exhibiting the same behavior.  At least 3 that
> I know of, and many more that I haven't attempted to deep-scrub.
> 
> 
> 
> 
> 
> 
> ceph -v: ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
> ceph.conf: https://cd.centraldesktop.com/p/eAAADvxuAHJRUk4
> ceph-osd.11.log (5.7 MiB):
> https://cd.centraldesktop.com/p/eAAADvxyABPwaeM
> ceph-osd.0.log (6.3 MiB):
> https://cd.centraldesktop.com/p/eAAADvx0ADWEGng
> ceph pg 40.11e query: https://cd.centraldesktop.com/p/eAAADvxvAAylTW0
> 
> (the pg query was collected at 13:24, after the above events)
> 
> 
> 
> 
> Things that probably don't matter:
> The OSD partitio

[ceph-users] ceph-deploy - problem creating an osd

2014-06-11 Thread Markus Goldberg

Hi,
ceph-deploy 1.5.3 can cause trouble if a reboot is done between 
preparation and activation of an OSD:


The osd-disk was /dev/sdb at this time, osd itself should go to sdb1, 
formatted to cleared, journal should go to sdb2, formatted to btrfs

I prepared an osd:

root@bd-a:/etc/ceph# ceph-deploy -v --overwrite-conf osd --fs-type btrfs 
prepare bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v 
--overwrite-conf osd --fs-type btrfs prepare bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
bd-1:/dev/sdb1:/dev/sdb2

[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] Deploying osd to bd-1
[bd-1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bd-1][INFO  ] Running command: udevadm trigger --subsystem-match=block 
--action=add
[ceph_deploy.osd][DEBUG ] Preparing host bd-1 disk /dev/sdb1 journal 
/dev/sdb2 activate False
[bd-1][INFO  ] Running command: ceph-disk-prepare --fs-type btrfs 
--cluster ceph -- /dev/sdb1 /dev/sdb2

[bd-1][DEBUG ]
[bd-1][DEBUG ] WARNING! - Btrfs v3.12 IS EXPERIMENTAL
[bd-1][DEBUG ] WARNING! - see http://btrfs.wiki.kernel.org before using
[bd-1][DEBUG ]
[bd-1][DEBUG ] fs created label (null) on /dev/sdb1
[bd-1][DEBUG ]  nodesize 32768 leafsize 32768 sectorsize 4096 size 19.99TiB
[bd-1][DEBUG ] Btrfs v3.12
[bd-1][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if 
journal is not the same device as the osd data
[bd-1][WARNIN] Turning ON incompat feature 'extref': increased hardlink 
limit per file to 65536
[bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but 
we have been unable to inform the kernel of the change, probably because 
it/they are in use.  As a result, the old partition(s) will remain in 
use.  You should reboot now before making further changes.

[bd-1][INFO  ] checking OSD status...
[bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host bd-1 is now ready for osd use.
Unhandled exception in thread started by
sys.excepthook is missing
lost sys.stderr

ceph-deploy told me to do a reboot, so I did.
After the reboot the OSD disk changed from sdb to sda. This is a known 
problem of Linux (Ubuntu).


root@bd-a:/etc/ceph# ceph-deploy -v osd activate bd-1:/dev/sda1:/dev/sda2
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v osd 
activate bd-1:/dev/sda1:/dev/sda2
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks 
bd-1:/dev/sda1:/dev/sda2

[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] activating host bd-1 disk /dev/sda1
[ceph_deploy.osd][DEBUG ] will use init type: upstart
[bd-1][INFO  ] Running command: ceph-disk-activate --mark-init upstart 
--mount /dev/sda1

[bd-1][WARNIN] got monmap epoch 1
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] 2014-06-10 11:45:07.222697 7f5c111af800 -1 journal check: 
ondisk fsid c8ce6ee2-f21b-4ba3-a20e-649224244b9a doesn't match expected 
fcaaf66f-b7b7-4702-83a4-54832b7131fa, invalid (someone else's?) journal

[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] 2014-06-10 11:45:08.125384 7f5c111af800 -1 
filestore(/var/lib/ceph/tmp/mnt.LryOxo) could not find 
23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[bd-1][WARNIN] 2014-06-10 11:45:08.320327 7f5c111af800 -1 created object 
store /var/lib/ceph/tmp/mnt.LryOxo journal 
/var/lib/ceph/tmp/mnt.LryOxo/journal for osd.4 fsid 
08066b4a-3f36-4e3f-bd1e-15c006a09057
[bd-1][WARNIN] 2014-06-10 11:45:08.320367 7f5c111af800 -1 auth: error 
reading file: /var/lib/ceph/tmp/mnt.LryOxo/keyring: can't open 
/var/lib/ceph/tmp/mnt.LryOxo/keyring: (2) No such file or directory
[bd-1][WARNIN] 2014-06-10 11:45:08.320419 7f5c111af800 -1 created new 
key in keyring /var/lib/ceph/tmp/mnt.LryOxo/keyring

[bd-1][WARNIN] added key for osd.4
[bd-1][INFO  ] checking OSD status...
[bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[bd-1][WARNIN] there are 2 OSDs down
[bd-1][WARNIN] there are 2 OSDs out
root@bd-a:/etc/ceph# ceph -s
cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
 health HEALTH_WARN 679 pgs degraded; 992 pgs stuck unclean; 
recovery 19/60 objects degraded (31.667%); clock skew detected on mon.bd-1
 monmap e1: 3 mons at 
{bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.x

[ceph-users] pid_max value?

2014-06-11 Thread Cao, Buddy
Hi, what is the recommended value for /proc/sys/kernel/pid_max? Is 32768 enough 
for a Ceph cluster with 4 nodes (40 1 TB OSDs on each node)? My Ceph node has 
already run into a "create thread fail" problem in the OSD log, whose root cause 
is pid_max.


Wei Cao (Buddy)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pid_max value?

2014-06-11 Thread Maciej Bonin
Hello,

The values we use are as follows:
# sysctl -p
net.ipv4.ip_local_port_range = 1024 65535
net.core.netdev_max_backlog = 3
net.core.somaxconn = 16384
net.ipv4.tcp_max_syn_backlog = 252144
net.ipv4.tcp_max_tw_buckets = 36
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
net.ipv4.route.flush = 1
kernel.pid_max = 4194303

The timeouts don't really make sense without tw reuse/recycling, but we found 
that increasing the max and letting the old ones hang gives better performance.
Somaxconn was the most important value we had to increase: with 3 mons, 3 
storage nodes, 3 VM hypervisors, 16 VMs and 48 OSDs we had started running into 
major problems with servers dying left and right.
Most of those values are lifted from some openstack python script IIRC, please 
let us know if you find a more efficient/stable configuration, however we're 
quite happy with this one.
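To persist settings like these across reboots, the usual approach is a sysctl 
drop-in; roughly (the file name is only an example):

cat > /etc/sysctl.d/90-ceph-tuning.conf <<'EOF'
kernel.pid_max = 4194303
net.core.somaxconn = 16384
EOF
sysctl -p /etc/sysctl.d/90-ceph-tuning.conf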

Regards,
Maciej Bonin
Systems Engineer | M247 Limited
M247.com  Connected with our Customers
Contact us today to discuss your hosting and connectivity requirements
ISO 27001 | ISO 9001 | Deloitte Technology Fast 50 | Deloitte Technology Fast 
500 EMEA | Sunday Times Tech Track 100
M247 Ltd, registered in England & Wales #4968341. 1 Ball Green, Cobra Court, 
Manchester, M32 0QT
 
ISO 27001 Data Protection Classification: A - Public
 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Cao, 
Buddy
Sent: 11 June 2014 15:00
To: ceph-users@lists.ceph.com
Subject: [ceph-users] pid_max value?

Hi, what is the recommended value for /proc/sys/kernel/pid_max? Is 32768 enough 
for Ceph cluster with 4 nodes (40 1T OSDs on each node)? My ceph node already 
run into "create thread fail" problem in osd log which root cause at pid_max.


Wei Cao (Buddy)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy - problem creating an osd

2014-06-11 Thread Alfredo Deza
On Wed, Jun 11, 2014 at 9:29 AM, Markus Goldberg
 wrote:
> Hi,
> ceph-deploy-1.5.3 can make trouble, if a reboot is done between preparation
> and aktivation of an osd:
>
> The osd-disk was /dev/sdb at this time, osd itself should go to sdb1,
> formatted to cleared, journal should go to sdb2, formatted to btrfs
> I prepared an osd:
>
> root@bd-a:/etc/ceph# ceph-deploy -v --overwrite-conf osd --fs-type btrfs
> prepare bd-1:/dev/sdb1:/dev/sdb2
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /root/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v
> --overwrite-conf osd --fs-type btrfs prepare bd-1:/dev/sdb1:/dev/sdb2
> [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
> bd-1:/dev/sdb1:/dev/sdb2
> [bd-1][DEBUG ] connected to host: bd-1
> [bd-1][DEBUG ] detect platform information from remote host
> [bd-1][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
> [ceph_deploy.osd][DEBUG ] Deploying osd to bd-1
> [bd-1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
> [bd-1][INFO  ] Running command: udevadm trigger --subsystem-match=block
> --action=add
> [ceph_deploy.osd][DEBUG ] Preparing host bd-1 disk /dev/sdb1 journal
> /dev/sdb2 activate False
> [bd-1][INFO  ] Running command: ceph-disk-prepare --fs-type btrfs --cluster
> ceph -- /dev/sdb1 /dev/sdb2
> [bd-1][DEBUG ]
> [bd-1][DEBUG ] WARNING! - Btrfs v3.12 IS EXPERIMENTAL
> [bd-1][DEBUG ] WARNING! - see http://btrfs.wiki.kernel.org before using
> [bd-1][DEBUG ]
> [bd-1][DEBUG ] fs created label (null) on /dev/sdb1
> [bd-1][DEBUG ]  nodesize 32768 leafsize 32768 sectorsize 4096 size 19.99TiB
> [bd-1][DEBUG ] Btrfs v3.12
> [bd-1][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is
> not the same device as the osd data
> [bd-1][WARNIN] Turning ON incompat feature 'extref': increased hardlink
> limit per file to 65536
> [bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but we
> have been unable to inform the kernel of the change, probably because
> it/they are in use.  As a result, the old partition(s) will remain in use.
> You should reboot now before making further changes.
> [bd-1][INFO  ] checking OSD status...
> [bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
> [ceph_deploy.osd][DEBUG ] Host bd-1 is now ready for osd use.
> Unhandled exception in thread started by
> sys.excepthook is missing
> lost sys.stderr
>
> ceph-deploy told me to do a reboot, so i did.

This is actually not ceph-deploy asking you for a reboot but the
stderr captured from the
remote node (bd-1 in your case).

ceph-deploy will log output from remote nodes and will preface the
logs with the hostname when
the output happens remotely. stderr will be used as WARNING level and
stdout as DEBUG.

So in your case this line is output from ceph-disk-prepare/btrfs:

> [bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but we
> have been unable to inform the kernel of the change, probably because
> it/they are in use.  As a result, the old partition(s) will remain in use.
> You should reboot now before making further changes.

Have you tried 'create' instead of 'prepare' and 'activate' ?
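Untested sketch, mirroring your prepare invocation:

ceph-deploy -v --overwrite-conf osd --fs-type btrfs create bd-1:/dev/sdb1:/dev/sdb2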

> After the reboot the osd-disk changed from sdb to sda. This is a known
> problem of linux (ubuntu)
>
> root@bd-a:/etc/ceph# ceph-deploy -v osd activate bd-1:/dev/sda1:/dev/sda2
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /root/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v osd
> activate bd-1:/dev/sda1:/dev/sda2
> [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
> bd-1:/dev/sda1:/dev/sda2
> [bd-1][DEBUG ] connected to host: bd-1
> [bd-1][DEBUG ] detect platform information from remote host
> [bd-1][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
> [ceph_deploy.osd][DEBUG ] activating host bd-1 disk /dev/sda1
> [ceph_deploy.osd][DEBUG ] will use init type: upstart
> [bd-1][INFO  ] Running command: ceph-disk-activate --mark-init upstart
> --mount /dev/sda1
> [bd-1][WARNIN] got monmap epoch 1
> [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
> [bd-1][WARNIN] 2014-06-10 11:45:07.222697 7f5c111af800 -1 journal check:
> ondisk fsid c8ce6ee2-f21b-4ba3-a20e-649224244b9a doesn't match expected
> fcaaf66f-b7b7-4702-83a4-54832b7131fa, invalid (someone else's?) journal
> [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
> [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
> [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
> [bd-1][WARNIN] 2014-06-10 11:45:08.125384 7f5c111af800 -1
> filestore(/var/lib/ceph/tmp/mnt.LryOxo) could not find
> 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
> [bd-1][WARNIN] 2014-06-10 11:45:08.320327 7f5c111af800 -1 created object
> store /var/lib/ceph/tmp/mnt.LryOxo journal
> /var/lib/ceph/tmp/mnt.LryOxo/journal for osd.4 fs

Re: [ceph-users] pid_max value?

2014-06-11 Thread Cao, Buddy
Thanks, Bonin. Do you have 48 OSDs in total, or are there 48 OSDs on each storage 
node? Do you think "kernel.pid_max = 4194303" is reasonable, since it is a large 
increase over the default OS setting?


Wei Cao (Buddy)

-Original Message-
From: Maciej Bonin [mailto:maciej.bo...@m247.com] 
Sent: Wednesday, June 11, 2014 10:07 PM
To: Cao, Buddy; ceph-users@lists.ceph.com
Subject: RE: pid_max value?

Hello,

The values we use are as follows:
# sysctl -p
net.ipv4.ip_local_port_range = 1024 65535
net.core.netdev_max_backlog = 3
net.core.somaxconn = 16384
net.ipv4.tcp_max_syn_backlog = 252144
net.ipv4.tcp_max_tw_buckets = 36
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
net.ipv4.route.flush = 1
kernel.pid_max = 4194303

The timeouts don't really make sense without tw reuse/recycling but we found 
increasing the max and letting the old ones hang gives better performance.
Somaxconn was the most important value we had to increase as with 3 mons, 3 
storage nodes, 3 vm hypervisors, 16vms and 48 OSDs we've started running into 
major problems with servers dying left and right.
Most of those values are lifted from some openstack python script IIRC, please 
let us know if you find a more efficient/stable configuration, however we're 
quite happy with this one.

Regards,
Maciej Bonin
Systems Engineer | M247 Limited
M247.com  Connected with our Customers
Contact us today to discuss your hosting and connectivity requirements ISO 
27001 | ISO 9001 | Deloitte Technology Fast 50 | Deloitte Technology Fast 500 
EMEA | Sunday Times Tech Track 100
M247 Ltd, registered in England & Wales #4968341. 1 Ball Green, Cobra Court, 
Manchester, M32 0QT
 
ISO 27001 Data Protection Classification: A - Public
 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Cao, 
Buddy
Sent: 11 June 2014 15:00
To: ceph-users@lists.ceph.com
Subject: [ceph-users] pid_max value?

Hi, what is the recommended value for /proc/sys/kernel/pid_max? Is 32768 enough 
for Ceph cluster with 4 nodes (40 1T OSDs on each node)? My ceph node already 
run into "create thread fail" problem in osd log which root cause at pid_max.


Wei Cao (Buddy)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS crash dump ?

2014-06-11 Thread Gregory Farnum
On Wednesday, June 11, 2014, Florent B  wrote:

> Hi every one,
>
> Sometimes my MDS crashes... sometimes after a few hours, sometimes after
> a few days.
>
> I know I could enable debugging and so on to get more information. But
> if it crashes after a few days, it generates gigabytes of debugging data
> that are not related to the crash.
>
> Is it possible to get just a crash dump when MDS is crashing, to see
> what's wrong ?


You should be getting a backtrace regardless of what debugging levels are
enabled, so I assume you mean having it dump out prior log lines when that
happens. And indeed you can.
Normally you specify something like
debug mds = 10
and that dumps out the log. You can instead specify two values, separated
by a slash, and the daemon will take the time to generate all the log lines
at the second value but only dump to disk at the first value:
debug mds = 0/10
That will put nothing in the log, but will generate debug output at level 10
in an in-memory ring buffer, and dump it on a crash. You can do
this with any debug setting.
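In ceph.conf that would look something like:

[mds]
debug mds = 0/10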
-Greg



>


> Thank you.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] umount gets stuck when umounting a cloned rbd image

2014-06-11 Thread Alphe Salas
Hello, I am writing to you about this issue: I noticed it with Ceph 0.72.2 on 
Ubuntu 13.10, and with 0.80.1 on Ubuntu 14.04.

Here is what I do:
1) I create an rbd image of 4 TB and format it to ext4 or xfs. The image 
has --order 25 and --image-format 2.

2) I create a snapshot of that rbd image.
3) I protect that snapshot.
4) I create a clone image of that initial rbd image using the protected 
snapshot as reference.
5) I insert the line in /etc/ceph/rbdmap, map the new image, and mount 
the new image on my Ceph client server.


Up to here everything is fine, cool and dandy.

6) I umount /dev/rbd1, which is the previously mounted rbd clone image, 
and umount gets stuck.
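(For clarity, roughly the command sequence, with pool/image names made up:)

rbd create --image-format 2 --order 25 --size 4194304 rbd/base   # 4 TB
rbd map rbd/base && mkfs.xfs /dev/rbd0
rbd snap create rbd/base@snap1
rbd snap protect rbd/base@snap1
rbd clone rbd/base@snap1 rbd/clone1
# add the rbd/clone1 line to /etc/ceph/rbdmap, then:
rbd map rbd/clone1
mount /dev/rbd1 /mnt/clone1
umount /mnt/clone1            # <-- hangs here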


On the client server with the stuck umount I have this message in 
/var/log/syslog:


Jun 11 12:26:10 tesla kernel: [63365.178657] libceph: osd8 
20.10.10.105:6803 socket error on read


As the problem seems somehow related to osd.8 on my 20.10.10.105 
Ceph node, I went there to get more information from the logs.


In /var/log/ceph-osd.8.log this message keeps coming in endlessly:

2014-06-11 12:31:51.692031 7fa26085c700  0 -- 20.10.10.105:6805/23321 >> 
20.10.10.12:0/2563935849 pipe(0x9dd6780 sd=231 :6805 s=0 pgs=0 cs=0 l=0 
c=0x7ed6840).accept peer addr is really 20.10.10.12:0/2563935849 (socket 
is 20.10.10.12:33056/0)




Can anyone help me solve this issue ?

--
Alphe Salas
I.T. engineer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to remove mds

2014-06-11 Thread Gregory Farnum
On Wed, Jun 11, 2014 at 4:56 AM,   wrote:
> Hi All,
>
>
>
> I have a four node ceph cluster. The metadata service is showing as degraded
> in health. How to remove the mds service from ceph ?

Unfortunately you can't remove it entirely right now, but if you
create a new filesystem using the "newfs" command, and don't turn on
an MDS daemon after that, it won't report a health error.
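Roughly (the pool ids are examples; check yours with "ceph osd dump | grep ^pool"):

ceph mds newfs 1 0 --yes-i-really-mean-it   # 1 = metadata pool id, 0 = data pool id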
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can we map OSDs from different hosts (servers) to a Pool in Ceph

2014-06-11 Thread Gregory Farnum
On Wed, Jun 11, 2014 at 5:18 AM, Davide Fanciola  wrote:
> Hi,
>
> we have a similar setup where we have SSD and HDD in the same hosts.
> Our very basic crushmap is configured as follows:
>
> # ceph osd tree
> # id weight type name up/down reweight
> -6 3 root ssd
> 3 1 osd.3 up 1
> 4 1 osd.4 up 1
> 5 1 osd.5 up 1
> -5 3 root platters
> 0 1 osd.0 up 1
> 1 1 osd.1 up 1
> 2 1 osd.2 up 1
> -1 3 root default
> -2 1 host chgva-srv-stor-001
> 0 1 osd.0 up 1
> 3 1 osd.3 up 1
> -3 1 host chgva-srv-stor-002
> 1 1 osd.1 up 1
> 4 1 osd.4 up 1
> -4 1 host chgva-srv-stor-003
> 2 1 osd.2 up 1
> 5 1 osd.5 up 1
>
>
> We do not seem to have problems with this setup, but i'm not sure if it's a
> good practice to have elements appearing multiple times in different
> branches.
> On the other hand, I see no way to follow the physical hierarchy of a
> datacenter for pools, since a pool can be spread among
> servers/racks/rooms...
>
> Can someone confirm this crushmap is any good for our configuration?

If you accidentally use the "default" node anywhere, you'll get data
scattered across both classes of device. If you try and use both the
"platters" and "ssd" nodes within a single CRUSH rule, you might end
up with copies of data on the same host (reducing your data
resiliency). Otherwise this is just fine.
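For reference, the usual pattern is one rule per root; a sketch against the
tree above (ruleset numbers are examples):

rule ssd {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type osd
        step emit
}
rule platters {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take platters
        step chooseleaf firstn 0 type osd
        step emit
}

# then point each pool at the matching rule, e.g.:
# ceph osd pool set ssd-pool crush_ruleset 1

Note the caveat above still applies: with OSDs placed directly under those
roots, chooseleaf on type osd can put two replicas on the same physical host.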
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pid_max value?

2014-06-11 Thread Maciej Bonin
We have not experienced any downsides to this approach, performance- or 
stability-wise. If you prefer, you can experiment with the values, but I see no 
real advantage in doing so.

Regards,
Maciej Bonin
Systems Engineer | M247 Limited
M247.com  Connected with our Customers
Contact us today to discuss your hosting and connectivity requirements
ISO 27001 | ISO 9001 | Deloitte Technology Fast 50 | Deloitte Technology Fast 
500 EMEA | Sunday Times Tech Track 100
M247 Ltd, registered in England & Wales #4968341. 1 Ball Green, Cobra Court, 
Manchester, M32 0QT
 
ISO 27001 Data Protection Classification: A - Public
 


-Original Message-
From: Cao, Buddy [mailto:buddy@intel.com] 
Sent: 11 June 2014 17:00
To: Maciej Bonin; ceph-users@lists.ceph.com
Subject: RE: pid_max value?

Thanks Bonin.  Do you have totally 48 OSDs or there are 48 OSDs on each storage 
node?  Do you think "kernel.pid_max = 4194303" is reasonable since it increase 
a lot from the default OS setting.


Wei Cao (Buddy)

-Original Message-
From: Maciej Bonin [mailto:maciej.bo...@m247.com] 
Sent: Wednesday, June 11, 2014 10:07 PM
To: Cao, Buddy; ceph-users@lists.ceph.com
Subject: RE: pid_max value?

Hello,

The values we use are as follows:
# sysctl -p
net.ipv4.ip_local_port_range = 1024 65535
net.core.netdev_max_backlog = 3
net.core.somaxconn = 16384
net.ipv4.tcp_max_syn_backlog = 252144
net.ipv4.tcp_max_tw_buckets = 36
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
net.ipv4.route.flush = 1
kernel.pid_max = 4194303

The timeouts don't really make sense without tw reuse/recycling but we found 
increasing the max and letting the old ones hang gives better performance.
Somaxconn was the most important value we had to increase as with 3 mons, 3 
storage nodes, 3 vm hypervisors, 16vms and 48 OSDs we've started running into 
major problems with servers dying left and right.
Most of those values are lifted from some openstack python script IIRC, please 
let us know if you find a more efficient/stable configuration, however we're 
quite happy with this one.

Regards,
Maciej Bonin
Systems Engineer | M247 Limited
M247.com  Connected with our Customers
Contact us today to discuss your hosting and connectivity requirements ISO 
27001 | ISO 9001 | Deloitte Technology Fast 50 | Deloitte Technology Fast 500 
EMEA | Sunday Times Tech Track 100
M247 Ltd, registered in England & Wales #4968341. 1 Ball Green, Cobra Court, 
Manchester, M32 0QT
 
ISO 27001 Data Protection Classification: A - Public
 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Cao, 
Buddy
Sent: 11 June 2014 15:00
To: ceph-users@lists.ceph.com
Subject: [ceph-users] pid_max value?

Hi, what is the recommended value for /proc/sys/kernel/pid_max? Is 32768 enough 
for Ceph cluster with 4 nodes (40 1T OSDs on each node)? My ceph node already 
run into "create thread fail" problem in osd log which root cause at pid_max.


Wei Cao (Buddy)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Moving Ceph cluster to different network segment

2014-06-11 Thread Fred Yang
We need to move our Ceph cluster to a different network segment for
interconnectivity between the mons and OSDs. Does anybody have a procedure for
how that can be done? Note that the host name references will change: a host
originally referenced as cephnode1 will be referenced as cephnode1-n in the new
segment.

Thanks,
Fred

Sent from my Samsung Galaxy S3
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] tiering : hit_set_count && hit_set_period memory usage ?

2014-06-11 Thread Alexandre DERUMIER
Hi,

I'm reading tiering doc here
http://ceph.com/docs/firefly/dev/cache-pool/

"
The hit_set_count and hit_set_period define how much time each HitSet should 
cover, and how many such HitSets to store. Binning accesses over time allows 
Ceph to independently determine whether an object was accessed at least once 
and whether it was accessed more than once over some time period (“age” vs 
“temperature”). Note that the longer the period and the higher the count the 
more RAM will be consumed by the ceph-osd process. In particular, when the 
agent is active to flush or evict cache objects, all hit_set_count HitSets are 
loaded into RAM"

Roughly how much memory are we talking about here? Is there a formula (number of objects x ...)?

I'm looking for hit_set_period like 12h or 24h
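For reference, these are per-pool settings; a sketch with a made-up pool name and
illustrative values, e.g. 12 bins of 1 hour for roughly 12h of history:

ceph osd pool set hot-pool hit_set_type bloom
ceph osd pool set hot-pool hit_set_count 12
ceph osd pool set hot-pool hit_set_period 3600
ceph osd pool set hot-pool hit_set_fpp 0.05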


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tiering : hit_set_count && hit_set_period memory usage ?

2014-06-11 Thread Gregory Farnum
On Wed, Jun 11, 2014 at 12:44 PM, Alexandre DERUMIER
 wrote:
> Hi,
>
> I'm reading tiering doc here
> http://ceph.com/docs/firefly/dev/cache-pool/
>
> "
> The hit_set_count and hit_set_period define how much time each HitSet should 
> cover, and how many such HitSets to store. Binning accesses over time allows 
> Ceph to independently determine whether an object was accessed at least once 
> and whether it was accessed more than once over some time period (“age” vs 
> “temperature”). Note that the longer the period and the higher the count the 
> more RAM will be consumed by the ceph-osd process. In particular, when the 
> agent is active to flush or evict cache objects, all hit_set_count HitSets 
> are loaded into RAM"
>
> about how much memory do we talk here ? any formula ? (nr object x ? )

We haven't really quantified that yet. In particular, it's going to
depend on how many objects are accessed within a period; the OSD sizes
them based on the previous access count and the false positive
probability that you give it.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] HEALTH_WARN pool has too few pgs

2014-06-11 Thread Eric Eastman

Hi,

I am seeing the following warning on one of my test clusters:

# ceph health detail
HEALTH_WARN pool Ray has too few pgs
pool Ray objects per pg (24) is more than 12 times cluster average (2)

This is a reported issue and is set to "Won't Fix" at:
http://tracker.ceph.com/issues/8103

My test cluster has a mix of test data, and the pool showing the 
warning is used for RBD Images.



# ceph df detail
GLOBAL:
    SIZE  AVAIL RAW USED %RAW USED OBJECTS
    1009G 513G  496G     49.14     33396
POOLS:
    NAME               ID CATEGORY USED   %USED OBJECTS DIRTY READ   WRITE
    data               0  -        0      0     0       0     0      0
    metadata           1  -        0      0     0       0     0      0
    rbd                2  -        0      0     0       0     0      0
    iscsi              3  -        847M   0.08  241     211   11839k 10655k
    cinder             4  -        305M   0.03  53      2     51579  31584
    glance             5  -        65653M 6.35  8222    7     512k   10405
    .users.swift       7  -        0      0     0       0     0      4
    .rgw.root          8  -        1045   0     4       4     23     5
    .rgw.control       9  -        0      0     8       8     0      0
    .rgw               10 -        252    0     2       2     3      11
    .rgw.gc            11 -        0      0     32      32    4958   3328
    .users.uid         12 -        575    0     3       3     70     23
    .users             13 -        9      0     1       1     0      9
    .users.email       14 -        0      0     0       0     0      0
    .rgw.buckets       15 -        0      0     0       0     0      0
    .rgw.buckets.index 16 -        0      0     1       1     1      1
    Ray                17 -        99290M 9.61  24829   24829 0      0



It would be nice if we could turn off this message.

Eric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] For ceph firefly, which version kernel client should be used?

2014-06-11 Thread Liu Baogang
Dear Sir,

In our test, we use Ceph Firefly to build a cluster. On a node with kernel 
3.10.xx, if we use the kernel client to mount CephFS, sometimes not all the files 
are listed when we run the 'ls' command. With ceph-fuse 0.80.x, so far it seems 
to work well.

I guess that kernel 3.10.xx is too old, so the kernel client does not work 
well. If that is right, which kernel version should we use?

Thanks,
Baogang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph pgs stuck inactive since forever

2014-06-11 Thread Akhil.Labudubariki
I installed Ceph, and when I ran ceph health it gave me the following output:

HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive; 384 pgs 
stuck unclean; 2 near full osd(s)

This is the output of a single pg when I use ceph health detail

pg 2.2 is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; 
search ceph.com/docs for 'incomplete')

and a similar line comes up for all the PGs.

This is the output of ceph - s

cluster 89cbb30c-023b-4f8b-ac14-abc78fb6b07a
 health HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive; 384 pgs 
stuck unclean; 2 near full osd(s)
 monmap e1: 1 mons at {a=100.112.12.28:6789/0}, election epoch 2, quorum 0 a
 osdmap e5: 2 osds: 2 up, 2 in
  pgmap v64: 384 pgs, 3 pools, 0 bytes data, 0 objects
111 GB used, 8346 MB / 125 GB avail
 384 incomplete
"Confidentiality Warning: This message and any attachments are intended only 
for the use of the intended recipient(s). 
are confidential and may be privileged. If you are not the intended recipient. 
you are hereby notified that any 
review. re-transmission. conversion to hard copy. copying. circulation or other 
use of this message and any attachments is 
strictly prohibited. If you are not the intended recipient. please notify the 
sender immediately by return email. 
and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure 
no viruses are present in this email. 
The company cannot accept responsibility for any loss or damage arising from 
the use of this email or attachment."
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem installing ceph from package manager / ceph repositories

2014-06-11 Thread Dimitri Maziuk
On 06/09/2014 03:08 PM, Karan Singh wrote:

> 1. When installing Ceph using the package manager and ceph repositories, the
> package manager (i.e. YUM) does not respect the ceph.repo file and takes the
> ceph package directly from EPEL.

Option 1: install yum-plugin-priorities, add priority = X to ceph.repo.
X should be less than EPEL's priority, the default is I believe 99.

Option 2: add exclude = ceph_package(s) to epel.repo.
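A sketch of option 1 (the baseurl and gpgkey are the usual ceph.com paths and are
only illustrative; adjust for your release/distro/arch):

# /etc/yum.repos.d/ceph.repo
[ceph]
name=Ceph packages
baseurl=http://ceph.com/rpm-firefly/el6/x86_64/
enabled=1
gpgcheck=1
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
priority=1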

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_WARN pool has too few pgs

2014-06-11 Thread Jean-Charles LOPEZ
Hi Eric,

increase the number of PGs in your pool with:
Step 1: ceph osd pool set <pool-name> pg_num <value>
Step 2: ceph osd pool set <pool-name> pgp_num <value>

You can check the number of PGs in your pool with: ceph osd dump | grep ^pool
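For example, for the pool above (the pg_num value is only illustrative; pick one
based on your OSD count and replica size):

ceph osd pool set Ray pg_num 256
ceph osd pool set Ray pgp_num 256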

See documentation: http://ceph.com/docs/master/rados/operations/pools/

JC



On Jun 11, 2014, at 12:59, Eric Eastman  wrote:

> Hi,
> 
> I am seeing the following warning on one of my test clusters:
> 
> # ceph health detail
> HEALTH_WARN pool Ray has too few pgs
> pool Ray objects per pg (24) is more than 12 times cluster average (2)
> 
> This is a reported issue and is set to "Won't Fix" at:
> http://tracker.ceph.com/issues/8103
> 
> My test cluster has a mix of test data, and the pool showing the warning is 
> used for RBD Images.
> 
> 
> # ceph df detail
> GLOBAL:
>   SIZE  AVAIL RAW USED %RAW USED OBJECTS
>   1009G 513G  496G 49.14 33396
> POOLS:
>NAME   ID CATEGORY USED   %USED OBJECTS
>  DIRTY READ   WRITE
>data   0  -0  0 0  
> 0 0  0
>metadata   1  -0  0 0  
> 0 0  0
>rbd2  -0  0 0  
> 0 0  0
>iscsi  3  -847M   0.08  241
> 211   11839k 10655k
>cinder 4  -305M   0.03  53 
> 2 51579  31584
>glance 5  -65653M 6.35  8222   
>  7 512k   10405
>.users.swift   7  -0  0 0  
> 0 0  4
>.rgw.root  8  -1045   0 4  
> 4 23 5
>.rgw.control   9  -0  0 8  
> 8 0  0
>.rgw   10 -2520 2  
> 2 3  11
>.rgw.gc11 -0  0 32 
> 324958   3328
>.users.uid 12 -5750 3  
> 3 70 23
>.users 13 -9  0 1  
> 1 0  9
>.users.email   14 -0  0 0  
> 0 0  0
>.rgw.buckets   15 -0  0 0  
> 0 0  0
>.rgw.buckets.index 16 -0  0 1  
> 1 1  1
>Ray17 -99290M 9.61  24829  
>  24829 0  0
> 
> 
> It would be nice if we could turn off this message.
> 
> Eric
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem installing ceph from package manager / ceph repositories

2014-06-11 Thread Karan Singh
Hi Dimitri

It was already resolved; the moderator took a long time to approve my email for 
posting to the mailing list.

Thanks for your solution.

- Karan -

On 12 Jun 2014, at 00:02, Dimitri Maziuk  wrote:

> On 06/09/2014 03:08 PM, Karan Singh wrote:
> 
>>1. When installing Ceph using package manger and ceph repositores , the
>>package manager i.e YUM does not respect the ceph.repo file and takes ceph
>>package directly from EPEL .
> 
> Option 1: install yum-plugin-priorities, add priority = X to ceph.repo.
> X should be less than EPEL's priority, the default is I believe 99.
> 
> Option 2: add exclude = ceph_package(s) to epel.repo.
> 
> -- 
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I have PGs that I can't deep-scrub

2014-06-11 Thread Craig Lewis
New logs, with debug ms = 1, debug osd = 20.


In this timeline, I started the deep-scrub at 11:04:00. Ceph started
deep-scrubbing at 11:04:03.

osd.11 started consuming 100% CPU around 11:07.  Same for osd.0.  CPU
usage is all user; iowait is < 0.10%.  There is more variance in the
CPU usage now, ranging between 98.5% and 101.2%

This time, I didn't see any major IO, read or write.

osd.11 was marked down at 11:22:00:
2014-06-11 11:22:00.820118 mon.0 [INF] osd.11 marked down after no pg
stats for 902.656777seconds

osd.0 was marked down at 11:36:00:
 2014-06-11 11:36:00.890869 mon.0 [INF] osd.0 marked down after no pg
stats for 902.498894seconds




ceph.conf: https://cd.centraldesktop.com/p/eAAADwbcABIDZuE
ceph-osd.0.log.gz (140MiB, 18MiB compressed):
https://cd.centraldesktop.com/p/eAAADwbdAHnmhFQ
ceph-osd.11.log.gz (131MiB, 17MiB compressed):
https://cd.centraldesktop.com/p/eAAADwbeAEUR9AI
ceph pg 40.11e query: https://cd.centraldesktop.com/p/eAAADwbfAEJcwvc





On Wed, Jun 11, 2014 at 5:42 AM, Sage Weil  wrote:
> Hi Craig,
>
> It's hard to say what is going wrong with that level of logs.  Can you
> reproduce with debug ms = 1 and debug osd = 20?
>
> There were a few things fixed in scrub between emperor and firefly.  Are
> you planning on upgrading soon?
>
> sage
>
>
> On Tue, 10 Jun 2014, Craig Lewis wrote:
>
>> Every time I deep-scrub one PG, all of the OSDs responsible get kicked
>> out of the cluster.  I've deep-scrubbed this PG 4 times now, and it
>> fails the same way every time.  OSD logs are linked at the bottom.
>>
>> What can I do to get this deep-scrub to complete cleanly?
>>
>> This is the first time I've deep-scrubbed these PGs since Sage helped
>> me recover from some OSD problems
>> (http://t53277.file-systems-ceph-development.file-systemstalk.info/70-osd-are-down-and-not-coming-up-t53277.html)
>>
>> I can trigger the issue easily in this cluster, but have not been able
>> to re-create in other clusters.
>>
>>
>>
>>
>>
>>
>> The PG stats for this PG say that last_deep_scrub and deep_scrub_stamp
>> are 48009'1904117 2014-05-21 07:28:01.315996 respectively.  This PG is
>> owned by OSDs [11,0]
>>
>> This is a secondary cluster, so I stopped all external I/O on it.  I
>> set nodeep-scrub, and restarted both OSDs with:
>>   debug osd = 5/5
>>   debug filestore = 5/5
>>   debug journal = 1
>>   debug monc = 20/20
>>
>> then I ran a deep-scrub on this PG.
>>
>> 2014-06-10 10:47:50.881783 mon.0 [INF] pgmap v8832020: 2560 pgs: 2555
>> active+clean, 5 active+clean+scrubbing; 27701 GB data, 56218 GB used,
>> 77870 GB / 130 TB avail
>> 2014-06-10 10:47:54.039829 mon.0 [INF] pgmap v8832021: 2560 pgs: 2554
>> active+clean, 5 active+clean+scrubbing, 1 active+clean+scrubbing+deep;
>> 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail
>>
>>
>> At 10:49:09, I see ceph-osd for both 11 and 0 spike to 100% CPU
>> (100.3% +/- 1.0%).  Prior to this, they were both using ~30% CPU.  It
>> might've started a few seconds sooner, I'm watching top.
>>
>> I forgot to watch IO stat until 10:56.  At this point, both OSDs are
>> reading.  iostat reports that they're both doing ~100
>> transactions/sec, reading ~1 MiBps, 0 writes.
>>
>>
>> At 11:01:26, iostat reports that both osds are no longer consuming any
>> disk I/O.  They both go for > 30 seconds with 0 transactions, and 0
>> kiB read/write.  There are small bumps of 2 transactions/sec for one
>> second, then it's back to 0.
>>
>>
>> At 11:02:41, the primary OSD gets kicked out by the monitors:
>> 2014-06-10 11:02:41.168443 mon.0 [INF] pgmap v8832125: 2560 pgs: 2555
>> active+clean, 4 active+clean+scrubbing, 1 active+clean+scrubbing+deep;
>> 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail; 1996 B/s rd, 2
>> op/s
>> 2014-06-10 11:02:57.801047 mon.0 [INF] osd.11 marked down after no pg
>> stats for 903.825187seconds
>> 2014-06-10 11:02:57.823115 mon.0 [INF] osdmap e58834: 36 osds: 35 up, 36 in
>>
>> Both ceph-osd processes (11 and 0) continue to use 100% CPU (same range).
>>
>>
>> At ~11:10, I see that osd.11 has resumed reading from disk at the
>> original levels (~100 tps, ~1MiBps read, 0 MiBps write).  Since it's
>> down, but doing something, I let it run.
>>
>> Both the osd.11 and osd.0 repeat this pattern.  Reading for a while at
>> ~1 MiBps, then nothing.  The duty cycle seems about 50%, with a 20
>> minute period, but I haven't timed anything.  CPU usage remains at
>> 100%, regardless of whether IO is happening or not.
>>
>>
>> At 12:24:15, osd.11 rejoins the cluster:
>> 2014-06-10 12:24:15.294646 mon.0 [INF] osd.11 10.193.0.7:6804/7100 boot
>> 2014-06-10 12:24:15.294725 mon.0 [INF] osdmap e58838: 36 osds: 35 up, 36 in
>> 2014-06-10 12:24:15.343869 mon.0 [INF] pgmap v8832827: 2560 pgs: 1
>> stale+active+clean+scrubbing+deep, 2266 active+clean, 5
>> stale+active+clean, 287 active+degraded, 1 active+clean+scrubbing;
>> 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail; 15650 B/s rd,
>> 18 op/s; 3617854/61758142 objects

Re: [ceph-users] Ceph pgs stuck inactive since forever

2014-06-11 Thread John Wilkins
I'll update the docs to incorporate the term "incomplete." I believe this
is due to an inability to complete backfilling. Your cluster is nearly
full. You indicated that you installed Ceph. Did you store data in the
cluster? Your usage indicates that you have used 111GB of 125GB. So you
only have about 8GB left. Did it ever get to an "active + clean" state?
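
(A few commands that could help confirm what's going on -- a sketch, assuming
default names:

  ceph df
  ceph health detail
  ceph pg dump_stuck inactive

And, per the hint in your health detail output, lowering min_size on the rbd
pool would be:

  ceph osd pool set rbd min_size 1
)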


On Wed, Jun 11, 2014 at 6:08 AM,  wrote:

>  I installed Ceph, and when I ran ceph health it gave me the following
> output
>
>
>
> HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive;
> 384 pgs stuck unclean; 2 near full osd(s)
>
>
>
> This is the output of a single pg when I use ceph health detail
>
>
>
> pg 2.2 is incomplete, acting [0] (reducing pool rbd min_size from 2 may
> help; search ceph.com/docs for 'incomplete')
>
>
>
> and similar line comes up for all the pgs.
>
>
>
> This is the output of ceph - s
>
>
>
> cluster 89cbb30c-023b-4f8b-ac14-abc78fb6b07a
>
>  health HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive; 384
> pgs stuck unclean; 2 near full osd(s)
>
>  monmap e1: 1 mons at {a=100.112.12.28:6789/0}, election epoch 2, quorum 0 a
>
>  osdmap e5: 2 osds: 2 up, 2 in
>
>   pgmap v64: 384 pgs, 3 pools, 0 bytes data, 0 objects
>
> 111 GB used, 8346 MB / 125 GB avail
>
> 384 incomplete
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Swift API Authentication Failure

2014-06-11 Thread Yehuda Sadeh
(resending also to list)
Right. So basically the swift subuser wasn't created correctly. I created
issue #8587. Can you try creating a second subuser and see if it's created
correctly the second time?
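
(Something along these lines should do it -- hive_cache:swift2 is just an
example name, and --gen-secret asks radosgw-admin to generate the secret key;
passing --secret=... as before works too:

  radosgw-admin subuser create --uid=hive_cache --subuser=hive_cache:swift2 --access=full
  radosgw-admin key create --subuser=hive_cache:swift2 --key-type=swift --gen-secret
)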


On Wed, Jun 11, 2014 at 2:03 PM, David Curtiss 
wrote:

> Hmm. Using that method, the subuser object appears to be an empty
> string.
>
> First, note that I skipped the "Create Pools" step:
> http://ceph.com/docs/master/radosgw/config/#create-pools
> because it says "If the user you created has permissions, the gateway will
> create the pools automatically."
>
> And indeed, the .users.swift pool is there:
>
> $ rados lspools
> data
> metadata
> rbd
> .rgw.root
> .rgw.control
> .rgw
> .rgw.gc
> .users.uid
> .users.email
> .users
> .users.swift
>
> But the only entry in that pool is an empty string.
>
> $ rados ls -p .users.swift
> 
>
> And that is indeed a blank line (as opposed to 0 lines), because there is
> 1 object in that pool:
> $ rados df
> pool name       category        KB    objects   clones   degraded   unfound   rd   rd KB   wr   wr KB
> ...
> .users.swift    -               1     1         0        0          0         0    0       1    1
>
> For comparison, the 'df' line for the .users pool lists 2 objects, which
> are as follows:
>
> $ rados ls -p .users
> 4U5H60BMDL7OSI5ZBL8P
> F7HZCI4SL12KVVSJ9UVZ
>
> - David
>
>
> On Tue, Jun 10, 2014 at 11:49 PM, Yehuda Sadeh  wrote:
>
>> Can you verify that the subuser object actually exist? Try doing:
>>
>> $ rados ls -p .users.swift
>>
>> (unless you have non default pools set)
>>
>> Yehuda
>>
>> On Tue, Jun 10, 2014 at 6:44 PM, David Curtiss
>>  wrote:
>> > No good. In fact, for some reason when I tried to load up my cluster VMs
>> > today, I couldn't get them to work (something to do with a pipe
>> fault), so
>> > I recreated my VMs nearly from scratch, to no avail.
>> >
>> > Here are the commands I used to create the user and subuser:
>> > radosgw-admin user create --uid=hive_cache --display-name="Hive Cache"
>> > --email=pds.supp...@ni.com
>> > radosgw-admin subuser create --uid=hive_cache --subuser=hive_cache:swift
>> > --access=full
>> > radosgw-admin key create --subuser=hive_cache:swift --key-type=swift
>> > --secret=QFAMEDSJP5DEKJO0DDXY
>> >
>> > - David
>> >
>> >
>> > On Mon, Jun 9, 2014 at 11:14 PM, Yehuda Sadeh 
>> wrote:
>> >>
>> >> It seems that the subuser object was not created for some reason. Can
>> >> you try recreating it?
>> >>
>> >> Yehuda
>> >>
>> >> On Sun, Jun 8, 2014 at 5:50 PM, David Curtiss
>> >>  wrote:
>> >> > Here's the log: http://pastebin.com/bRt9kw9C
>> >> >
>> >> > Thanks,
>> >> > David
>> >> >
>> >> >
>> >> > On Fri, Jun 6, 2014 at 10:58 PM, Yehuda Sadeh 
>> >> > wrote:
>> >> >>
>> >> >> On Wed, Jun 4, 2014 at 12:00 PM, David Curtiss
>> >> >>  wrote:
>> >> >> > Over the last two days, I set up ceph on a set of ubuntu 12.04 VMs
>> >> >> > (my
>> >> >> > first
>> >> >> > time working with ceph), and it seems to be working fine (I have
>> >> >> > HEALTH_OK,
>> >> >> > and can create a test document via the rados commandline tool),
>> but I
>> >> >> > can't
>> >> >> > authenticate with the swift API.
>> >> >> >
>> >> >> > I followed the quickstart guides to get ceph and radosgw
>> installed.
>> >> >> > (Listed
>> >> >> > here, if you want to check my work: http://pastebin.com/nfPWCn9P
>> )
>> >> >> >
>> >> >> > Visiting the root of the web server shows the
>> ListAllMyBucketsResult
>> >> >> > XML, as
>> >> >> > expected, but trying to authenticate always gives me "403
>> Forbidden"
>> >> >> > errors.
>> >> >> >
>> >> >> > Here's the output of "radosgw-admin user info --uid=hive_cache":
>> >> >> > http://pastebin.com/vwwbyd4c
>> >> >> > And here's my curl invocation: http://pastebin.com/EfQ8nw8a
>> >> >> >
>> >> >> > Any ideas on what might be wrong?
>> >> >> >
>> >> >>
>> >> >> Not sure. Can you try reproducing it with 'debug rgw = 20' and
>> 'debug
>> >> >> ms = 1' on rgw and provide the log?
>> >> >>
>> >> >> Thanks,
>> >> >> Yehuda
>> >> >
>> >> >
>> >
>> >
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Current OS & kernel recommendations

2014-06-11 Thread Blair Bethwaite
This http://ceph.com/docs/master/start/os-recommendations/ appears to be a
bit out of date (it only goes to Ceph 0.72). Presumably Ubuntu Trusty should
now be on that list in some form, e.g., for Firefly?

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] For ceph firefly, which version kernel client should be used?

2014-06-11 Thread Yan, Zheng
On Mon, Jun 9, 2014 at 3:49 PM, Liu Baogang  wrote:
> Dear Sir,
>
> In our test, we use Ceph Firefly to build a cluster. On a node with kernel
> 3.10.xx, if we use the kernel client to mount CephFS, sometimes not all of
> the files are listed when we run 'ls'. Using ceph-fuse 0.80.x, it seems to
> work well so far.
>
> I guess that kernel 3.10.xx is too old, so the kernel client does not work
> well. If that is right, which kernel version should we use?

3.14
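
(For reference, the two clients being compared -- the monitor host, path, and
secret below are placeholders:

  # kernel client
  mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secret=<key>

  # ceph-fuse
  ceph-fuse -m mon1:6789 /mnt/cephfs
)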

>
> Thanks,
> Baogang
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG Selection Criteria for Deep-Scrub

2014-06-11 Thread David Zafman

The code checks the pg with the oldest scrub_stamp/deep_scrub_stamp to see 
whether the osd_scrub_min_interval/osd_deep_scrub_interval time has elapsed.  
So the output you are showing with the very old scrub stamps shouldn’t happen 
under default settings.  As soon as deep-scrub is re-enabled, the 5 PGs with 
that old stamp should be the first to be scrubbed.

A PG needs to have active and clean set to be scrubbed.   If any weren’t 
active+clean, then even a manual scrub would do nothing.

Now that I’m looking at the code I see that your symptom is possible if the 
values of osd_scrub_min_interval or osd_scrub_max_interval are larger than your 
osd_deep_scrub_interval.  Should the osd_scrub_min_interval be greater than 
osd_deep_scrub_interval, there won't be a deep scrub until the 
osd_scrub_min_interval has elapsed.  If an OSD is under load and the 
osd_scrub_max_interval is greater than the osd_deep_scrub_interval, there won't 
be a deep scrub until osd_scrub_max_interval has elapsed.

Please check the 3 interval config values.  Verify that your PGs are 
active+clean just to be sure.
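
(One way to check those on a running OSD is via its admin socket -- a sketch,
assuming the default socket path and osd.0 as an example:

  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep scrub

and for the PG states:

  ceph pg dump_stuck unclean
)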

David


On May 20, 2014, at 5:21 PM, Mike Dawson  wrote:

> Today I noticed that deep-scrub is consistently missing some of my Placement 
> Groups, leaving me with the following distribution of PGs and the last day 
> they were successfully deep-scrubbed.
> 
> # ceph pg dump all | grep active | awk '{ print $20}' | sort -k1 | uniq -c
>  5 2013-11-06
>221 2013-11-20
>  1 2014-02-17
> 25 2014-02-19
> 60 2014-02-20
>  4 2014-03-06
>  3 2014-04-03
>  6 2014-04-04
>  6 2014-04-05
> 13 2014-04-06
>  4 2014-04-08
>  3 2014-04-10
>  2 2014-04-11
> 50 2014-04-12
> 28 2014-04-13
> 14 2014-04-14
>  3 2014-04-15
> 78 2014-04-16
> 44 2014-04-17
>  8 2014-04-18
>  1 2014-04-20
> 16 2014-05-02
> 69 2014-05-04
>140 2014-05-05
>569 2014-05-06
>   9231 2014-05-07
>103 2014-05-08
>514 2014-05-09
>   1593 2014-05-10
>393 2014-05-16
>   2563 2014-05-17
>   1283 2014-05-18
>   1640 2014-05-19
>   1979 2014-05-20
> 
> I have been running the default "osd deep scrub interval" of once per week, 
> but have disabled deep-scrub on several occasions in an attempt to avoid the 
> associated degraded cluster performance I have written about before.
> 
> To get the PGs longest in need of a deep-scrub started, I set the 
> nodeep-scrub flag, and wrote a script to manually kick off deep-scrub 
> according to age. It is processing as expected.
> 
> Do you consider this a feature request or a bug? Perhaps the code that 
> schedules PGs to deep-scrub could be improved to prioritize PGs that have 
> needed a deep-scrub the longest.
> 
> Thanks,
> Mike Dawson
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Swift API Authentication Failure

2014-06-11 Thread David Curtiss
Success! You nailed it. Thanks, Yehuda.

I can successfully use the second subuser.

Given this success, I also tried the following:

$ rados -p .users.swift get '' tmp
$ rados -p .users.swift put hive_cache:swift tmp
$ rados -p .users.swift rm ''
$ rados -p .users.swift ls
hive_cache:swift2
hive_cache:swift

So everything looked good, as far as I can tell, but I still can't
authenticate with the first subuser. (But at least the second one still
works.)

- David


On Wed, Jun 11, 2014 at 5:38 PM, Yehuda Sadeh  wrote:

>  (resending also to list)
> Right. So basically the swift subuser wasn't created correctly. I created
> issue #8587. Can you try creating a second subuser and see if it's created
> correctly the second time?
>
>
> On Wed, Jun 11, 2014 at 2:03 PM, David Curtiss  > wrote:
>
>> Hmm. Using that method, the subuser object appears to be an empty
>> string.
>>
>> First, note that I skipped the "Create Pools" step:
>> http://ceph.com/docs/master/radosgw/config/#create-pools
>> because it says "If the user you created has permissions, the gateway
>> will create the pools automatically."
>>
>> And indeed, the .users.swift pool is there:
>>
>> $ rados lspools
>> data
>> metadata
>> rbd
>> .rgw.root
>> .rgw.control
>> .rgw
>> .rgw.gc
>> .users.uid
>> .users.email
>> .users
>> .users.swift
>>
>> But the only entry in that pool is an empty string.
>>
>> $ rados ls -p .users.swift
>> 
>>
>> And that is indeed a blank line (as opposed to 0 lines), because there is
>> 1 object in that pool:
>> $ rados df
>> pool name       category        KB    objects   clones   degraded   unfound   rd   rd KB   wr   wr KB
>> ...
>> .users.swift    -               1     1         0        0          0         0    0       1    1
>>
>> For comparison, the 'df' line for the .users pool lists 2 objects, which
>> are as follows:
>>
>> $ rados ls -p .users
>> 4U5H60BMDL7OSI5ZBL8P
>> F7HZCI4SL12KVVSJ9UVZ
>>
>> - David
>>
>>
>> On Tue, Jun 10, 2014 at 11:49 PM, Yehuda Sadeh 
>> wrote:
>>
>>> Can you verify that the subuser object actually exist? Try doing:
>>>
>>> $ rados ls -p .users.swift
>>>
>>> (unless you have non default pools set)
>>>
>>> Yehuda
>>>
>>> On Tue, Jun 10, 2014 at 6:44 PM, David Curtiss
>>>  wrote:
>>> > No good. In fact, for some reason when I tried to load up my cluster
>>> VMs
>>> > today, I couldn't get them to work (something to do with a pipe
>>> fault), so
>>> > I recreated my VMs nearly from scratch, to no avail.
>>> >
>>> > Here are the commands I used to create the user and subuser:
>>> > radosgw-admin user create --uid=hive_cache --display-name="Hive Cache"
>>> > --email=pds.supp...@ni.com
>>> > radosgw-admin subuser create --uid=hive_cache
>>> --subuser=hive_cache:swift
>>> > --access=full
>>> > radosgw-admin key create --subuser=hive_cache:swift --key-type=swift
>>> > --secret=QFAMEDSJP5DEKJO0DDXY
>>> >
>>> > - David
>>> >
>>> >
>>> > On Mon, Jun 9, 2014 at 11:14 PM, Yehuda Sadeh 
>>> wrote:
>>> >>
>>> >> It seems that the subuser object was not created for some reason. Can
>>> >> you try recreating it?
>>> >>
>>> >> Yehuda
>>> >>
>>> >> On Sun, Jun 8, 2014 at 5:50 PM, David Curtiss
>>> >>  wrote:
>>> >> > Here's the log: http://pastebin.com/bRt9kw9C
>>> >> >
>>> >> > Thanks,
>>> >> > David
>>> >> >
>>> >> >
>>> >> > On Fri, Jun 6, 2014 at 10:58 PM, Yehuda Sadeh 
>>> >> > wrote:
>>> >> >>
>>> >> >> On Wed, Jun 4, 2014 at 12:00 PM, David Curtiss
>>> >> >>  wrote:
>>> >> >> > Over the last two days, I set up ceph on a set of ubuntu 12.04
>>> VMs
>>> >> >> > (my
>>> >> >> > first
>>> >> >> > time working with ceph), and it seems to be working fine (I have
>>> >> >> > HEALTH_OK,
>>> >> >> > and can create a test document via the rados commandline tool),
>>> but I
>>> >> >> > can't
>>> >> >> > authenticate with the swift API.
>>> >> >> >
>>> >> >> > I followed the quickstart guides to get ceph and radosgw
>>> installed.
>>> >> >> > (Listed
>>> >> >> > here, if you want to check my work: http://pastebin.com/nfPWCn9P
>>> )
>>> >> >> >
>>> >> >> > Visiting the root of the web server shows the
>>> ListAllMyBucketsResult
>>> >> >> > XML, as
>>> >> >> > expected, but trying to authenticate always gives me "403
>>> Forbidden"
>>> >> >> > errors.
>>> >> >> >
>>> >> >> > Here's the output of "radosgw-admin user info --uid=hive_cache":
>>> >> >> > http://pastebin.com/vwwbyd4c
>>> >> >> > And here's my curl invocation: http://pastebin.com/EfQ8nw8a
>>> >> >> >
>>> >> >> > Any ideas on what might be wrong?
>>> >> >> >
>>> >> >>
>>> >> >> Not sure. Can you try reproducing it with 'debug rgw = 20' and
>>> 'debug
>>> >> >> ms = 1' on rgw and provide the log?
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Yehuda
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tiering : hit_set_count && hit_set_period memory usage ?

2014-06-11 Thread Alexandre DERUMIER
>>We haven't really quantified that yet. In particular, it's going to
>>depend on how many objects are accessed within a period; the OSD sizes
>>them based on the previous access count and the false positive
>>probability that you give it

Ok, thanks Greg.



Another question: the doc describes how objects move from the cache tier to the
base tier.
But how does it work from the base tier to the cache tier? (cache-mode writeback)
Does any read on the base tier promote the object into the cache tier?
Or are there also statistics kept on the base tier?

(I ask because I have cold data, but full backup jobs run each week, reading all
of this cold data.)
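
(For context, a sketch of the pool settings in question, using the example
values from the cache-pool doc -- "cachepool" is a placeholder name:

  ceph osd pool set cachepool hit_set_type bloom
  ceph osd pool set cachepool hit_set_count 1
  ceph osd pool set cachepool hit_set_period 3600
)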



----- Original Message -----

From: "Gregory Farnum"
To: "Alexandre DERUMIER"
Cc: "ceph-users"
Sent: Wednesday, June 11, 2014 21:56:29
Subject: Re: [ceph-users] tiering : hit_set_count && hit_set_period memory usage ?

On Wed, Jun 11, 2014 at 12:44 PM, Alexandre DERUMIER 
 wrote: 
> Hi, 
> 
> I'm reading tiering doc here 
> http://ceph.com/docs/firefly/dev/cache-pool/ 
> 
> " 
> The hit_set_count and hit_set_period define how much time each HitSet should 
> cover, and how many such HitSets to store. Binning accesses over time allows 
> Ceph to independently determine whether an object was accessed at least once 
> and whether it was accessed more than once over some time period (“age” vs 
> “temperature”). Note that the longer the period and the higher the count the 
> more RAM will be consumed by the ceph-osd process. In particular, when the 
> agent is active to flush or evict cache objects, all hit_set_count HitSets 
> are loaded into RAM" 
> 
> about how much memory do we talk here ? any formula ? (nr object x ? ) 

We haven't really quantified that yet. In particular, it's going to 
depend on how many objects are accessed within a period; the OSD sizes 
them based on the previous access count and the false positive 
probability that you give it. 
-Greg 
Software Engineer #42 @ http://inktank.com | http://ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tiering : hit_set_count && hit_set_period memory usage ?

2014-06-11 Thread Gregory Farnum
Any user access to an object promotes it into the cache pool.
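
(For context, a minimal writeback cache-tier setup along the lines of the
cache-pool doc cited earlier -- pool names are placeholders:

  ceph osd tier add basepool cachepool
  ceph osd tier cache-mode cachepool writeback
  ceph osd tier set-overlay basepool cachepool
)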

On Wednesday, June 11, 2014, Alexandre DERUMIER  wrote:

> >>We haven't really quantified that yet. In particular, it's going to
> >>depend on how many objects are accessed within a period; the OSD sizes
> >>them based on the previous access count and the false positive
> >>probability that you give it
>
> Ok, thanks Greg.
>
>
>
> Another question: the doc describes how objects move from the cache tier to
> the base tier.
> But how does it work from the base tier to the cache tier? (cache-mode writeback)
> Does any read on the base tier promote the object into the cache tier?
> Or are there also statistics kept on the base tier?
>
> (I ask because I have cold data, but full backup jobs run each week, reading
> all of this cold data.)
>
>
>
> ----- Original Message -----
>
> From: "Gregory Farnum"
> To: "Alexandre DERUMIER"
> Cc: "ceph-users"
> Sent: Wednesday, June 11, 2014 21:56:29
> Subject: Re: [ceph-users] tiering : hit_set_count && hit_set_period memory
> usage ?
>
> On Wed, Jun 11, 2014 at 12:44 PM, Alexandre DERUMIER
> > wrote:
> > Hi,
> >
> > I'm reading tiering doc here
> > http://ceph.com/docs/firefly/dev/cache-pool/
> >
> > "
> > The hit_set_count and hit_set_period define how much time each HitSet
> should cover, and how many such HitSets to store. Binning accesses over
> time allows Ceph to independently determine whether an object was accessed
> at least once and whether it was accessed more than once over some time
> period (“age” vs “temperature”). Note that the longer the period and the
> higher the count the more RAM will be consumed by the ceph-osd process. In
> particular, when the agent is active to flush or evict cache objects, all
> hit_set_count HitSets are loaded into RAM"
> >
> > about how much memory do we talk here ? any formula ? (nr object x ? )
>
> We haven't really quantified that yet. In particular, it's going to
> depend on how many objects are accessed within a period; the OSD sizes
> them based on the previous access count and the false positive
> probability that you give it.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>


-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com