[ceph-users] radosgw hanging - blocking "rgw.bucket_list" ops

2015-08-21 Thread Sam Wouters
Hi, We are running hammer 0.94.2 and have an increasing amount of "heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f38c77e6700' had timed out after 600" messages in our radosgw logs, with radosgw eventually stalling. A restart of the radosgw helps for a few minutes, but after that it hangs ag
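The heartbeat timeout message quoted above is easy to tally from the logs to see which thread pools stall and how often. A minimal sketch follows; the sample lines and their layout are assumptions modeled on the quote, not a parser for the full radosgw log format — point `lines` at a real log when diagnosing.

```python
# Sketch: count "heartbeat_map is_healthy ... had timed out" events per
# thread in a radosgw log. Sample lines below are invented to mirror the
# message quoted in the report.
import re
from collections import Counter

PATTERN = re.compile(
    r"heartbeat_map is_healthy '(?P<thread>[^']+)' had timed out after "
    r"(?P<secs>\d+)")

lines = [
    "2015-08-21 10:00:01 heartbeat_map is_healthy "
    "'RGWProcess::m_tp thread 0x7f38c77e6700' had timed out after 600",
    "2015-08-21 10:00:02 heartbeat_map is_healthy "
    "'RGWProcess::m_tp thread 0x7f38c77e6700' had timed out after 600",
]

# Tally timeouts per thread; a thread that keeps reappearing is stuck.
counts = Counter(
    m.group("thread") for line in lines if (m := PATTERN.search(line)))
for thread, n in counts.most_common():
    print(thread, n)
```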

Re: [ceph-users] radosgw hanging - blocking "rgw.bucket_list" ops

2015-08-21 Thread Sam Wouters
I suspect these to be the cause: rados ls -p .be-east.rgw.buckets | grep sanitybe-east.5436.1__:2bpm.1OR-cqyOLUHek8m2RdPVRZ.pDT__sanity be-east.5436.1__sanity be-east.5436.1__:2vBijaGnVQF4Q0IjZPeyZSKeUmBGn9X__sanity be-east.5436.1__sanity be-east.5436.1__:4JTCVFxB1qoDWPu1nhuMDuZ3QNPaq5n__san

Re: [ceph-users] Testing CephFS

2015-08-21 Thread Gregory Farnum
On Thu, Aug 20, 2015 at 11:07 AM, Simon Hallam wrote: > Hey all, > > > > We are currently testing CephFS on a small (3 node) cluster. > > > > The setup is currently: > > > > Each server has 12 OSDs, 1 Monitor and 1 MDS running on it: > > The servers are running: 0.94.2-0.el7 > > The clients are r

Re: [ceph-users] radosgw hanging - blocking "rgw.bucket_list" ops

2015-08-21 Thread Sam Wouters
tried removing, but no luck: rados -p .be-east.rgw.buckets rm "be-east.5436.1__:2bpm.1OR-cqyOLUHek8m2RdPVRZ.pDT__sanity" error removing .be-east.rgw.buckets>be-east.5436.1__:2bpm.1OR-cqyOLUHek8m2RdPVRZ.pDT__sanity: (2) anyone? On 21-08-15 13:06, Sam Wouters wrote: > I suspect these to be the cau

Re: [ceph-users] Bad performances in recovery

2015-08-21 Thread J-P Methot
Hi, First of all, we are sure that the return to the default configuration fixed it. As soon as we restarted only one of the ceph nodes with the default configuration, it sped up recovery tremendously. We had already restarted before with the old conf and recovery was never that fast. Regarding th

Re: [ceph-users] Bad performances in recovery

2015-08-21 Thread Jan Schermer
Thanks for the config, a few comments inline, not really related to the issue > On 21 Aug 2015, at 15:12, J-P Methot wrote: > > Hi, > > First of all, we are sure that the return to the default configuration > fixed it. As soon as we restarted only one of the ceph nodes with the > default config

Re: [ceph-users] Bad performances in recovery

2015-08-21 Thread Shinobu Kinjo
> filestore_fd_cache_random = true not true Shinobu On Fri, Aug 21, 2015 at 10:20 PM, Jan Schermer wrote: > Thanks for the config, > few comments inline:, not really related to the issue > > > On 21 Aug 2015, at 15:12, J-P Methot wrote: > > > > Hi, > > > > First of all, we are sure that the r

Re: [ceph-users] Rados: Undefined symbol error

2015-08-21 Thread Jason Dillaman
It sounds like you have the rados CLI tool from an earlier Ceph release (< Hammer) installed and it is attempting to use the librados shared library from a newer (>= Hammer) version of Ceph. Jason - Original Message - > From: "Aakanksha Pudipeddi-SSI" > To: ceph-us...@ceph.com > Sent:

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-21 Thread Samuel Just
Odd, did you happen to capture osd logs? -Sam On Thu, Aug 20, 2015 at 8:10 PM, Ilya Dryomov wrote: > On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just wrote: >> What's supposed to happen is that the client transparently directs all >> requests to the cache pool rather than the cold pool when there is

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-21 Thread Ilya Dryomov
On Fri, Aug 21, 2015 at 5:59 PM, Samuel Just wrote: > Odd, did you happen to capture osd logs? No, but the reproducer is trivial to cut & paste. Thanks, Ilya

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-21 Thread Samuel Just
I think I found the bug -- need to whiteout the snapset (or decache it) upon evict. http://tracker.ceph.com/issues/12748 -Sam On Fri, Aug 21, 2015 at 8:04 AM, Ilya Dryomov wrote: > On Fri, Aug 21, 2015 at 5:59 PM, Samuel Just wrote: >> Odd, did you happen to capture osd logs? > > No, but the re

[ceph-users] radosgw only delivers whats cached if latency between keyrequest and actual download is above 90s

2015-08-21 Thread Sean
We heavily use radosgw here for most of our work and we have seen a weird truncation issue with radosgw/s3 requests. We have noticed that if the time between the initial "ticket" to grab the object key and grabbing the data is greater than 90 seconds the object returned is truncated to whateve

[ceph-users] Object Storage and POSIX Mix

2015-08-21 Thread Scottix
I saw this article on Linux Today and immediately thought of Ceph. http://www.enterprisestorageforum.com/storage-management/object-storage-vs.-posix-storage-something-in-the-middle-please-1.html I was thinking would it theoretically be possible with RGW to do a GET and set a BEGIN_SEEK and OFFSET

Re: [ceph-users] Object Storage and POSIX Mix

2015-08-21 Thread Gregory Farnum
On Fri, Aug 21, 2015 at 10:27 PM, Scottix wrote: > I saw this article on Linux Today and immediately thought of Ceph. > > http://www.enterprisestorageforum.com/storage-management/object-storage-vs.-posix-storage-something-in-the-middle-please-1.html > > I was thinking would it theoretically be pos

Re: [ceph-users] Object Storage and POSIX Mix

2015-08-21 Thread Robert LeBlanc
Shouldn't this already be possible with HTTP Range requests? I don't work with RGW or S3 so please ignore me if I'm talking crazy. - Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Fri, Aug 21,
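Range requests do cover this: S3 (and radosgw's S3 API) honor the standard HTTP `Range` header for byte-offset reads, which is essentially the BEGIN_SEEK/OFFSET idea from the original post. A minimal sketch of the mechanics, using a stdlib toy server in place of RGW (the handler and object contents are invented for illustration; against a real gateway you would send the same header to the object URL):

```python
# Sketch: HTTP Range semantics -- ask for bytes 100..109 of a 256-byte
# object and get back exactly that slice with a 206 Partial Content.
import http.server
import threading
import urllib.request

DATA = bytes(range(256))  # stand-in for an RGW object

class RangeHandler(http.server.BaseHTTPRequestHandler):
    """Toy handler that honors simple closed 'bytes=start-end' ranges."""
    def do_GET(self):
        rng = self.headers.get("Range")
        if rng and rng.startswith("bytes="):
            start, end = (int(x) for x in rng[len("bytes="):].split("-"))
            body = DATA[start:end + 1]
            self.send_response(206)
            self.send_header("Content-Range",
                             f"bytes {start}-{end}/{len(DATA)}")
        else:
            body = DATA
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # keep the demo quiet
        pass

srv = http.server.ThreadingHTTPServer(("127.0.0.1", 0), RangeHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{srv.server_address[1]}/obj",
    headers={"Range": "bytes=100-109"})
with urllib.request.urlopen(req) as resp:
    status, data = resp.status, resp.read()
srv.shutdown()

print(status, len(data))  # 206 with a 10-byte body
```

The toy handler only parses closed ranges (`bytes=start-end`); real servers also accept open-ended and suffix forms.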

Re: [ceph-users] radosgw only delivers whats cached if latency between keyrequest and actual download is above 90s

2015-08-21 Thread Ben Hines
I just tried this (with some smaller objects, maybe 4.5 MB, as well as with a 16 GB file) and it worked fine. However, I am using the apache + fastcgi interface to rgw, rather than civetweb. -Ben On Fri, Aug 21, 2015 at 12:19 PM, Sean wrote: > We heavily use radosgw here for most of our work and we
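A client can detect the truncation described in this thread by comparing the bytes actually received against the advertised Content-Length; Python's stdlib even raises on the short read. The sketch below uses a toy server that deliberately short-writes to stand in for the misbehaving gateway — it is not RGW's actual behavior, just an illustration of the check:

```python
# Sketch: detect a truncated download. The toy server promises 1000
# bytes but delivers only 100 and closes; http.client flags the short
# read as IncompleteRead.
import http.client
import http.server
import threading
import urllib.request

class TruncatingHandler(http.server.BaseHTTPRequestHandler):
    """Advertises 1000 bytes but sends only 100, then closes."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "1000")
        self.end_headers()
        self.wfile.write(b"x" * 100)
    def log_message(self, *args):
        pass

srv = http.server.HTTPServer(("127.0.0.1", 0), TruncatingHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{srv.server_address[1]}/obj"
with urllib.request.urlopen(url) as resp:
    expected = int(resp.headers["Content-Length"])
    try:
        body = resp.read()
    except http.client.IncompleteRead as exc:
        body = exc.partial          # whatever arrived before EOF
truncated = len(body) < expected
srv.shutdown()

print(f"expected={expected} got={len(body)} truncated={truncated}")
```

The same comparison works against a real presigned S3/RGW URL: never trust a completed read without checking it against Content-Length (or the object's ETag/size).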

[ceph-users] OSD GHz vs. Cores Question

2015-08-21 Thread Robert LeBlanc
We are looking to purchase our next round of Ceph hardware and based off the work by Nick Fisk [1] our previous thought of cores over clock is being revisited. I have two camps of thoughts and would like to get some feedback, even if it is only theo

Re: [ceph-users] OSD GHz vs. Cores Question

2015-08-21 Thread Mark Nelson
FWIW, we recently were looking at a couple of different options for the machines in our test lab that run the nightly QA suite jobs via teuthology. From a cost/benefit perspective, I think it really comes down to something like a XEON E3-12XXv3 or the new XEON D-1540, each of which have advant

Re: [ceph-users] Question about reliability model result

2015-08-21 Thread dahan
Hi, I have cross-posted this issue here and on GitHub, but no response yet. Any advice? On Mon, Aug 10, 2015 at 10:21 AM, dahan wrote: > > Hi all, I have tried the reliability model: > https://github.com/ceph/ceph-tools/tree/master/models/reliability > > I ran the tool with the default configuration,

[ceph-users] TRIM / DISCARD run at low priority by the OSDs?

2015-08-21 Thread Chad William Seys
Hi All, Is it possible to give TRIM / DISCARD initiated by krbd low priority on the OSDs? I know it is possible to run fstrim at Idle priority on the rbd mount point, e.g. ionice -c Idle fstrim -v $MOUNT . But this Idle priority (it appears) only is within the context of the node executing