Re: [ceph-users] Kernel Bug in 3.13.0-52

2015-05-13 Thread Gregory Farnum
On Wed, May 13, 2015 at 12:08 PM, Daniel Takatori Ohara wrote: > Hi, > > We have a small ceph cluster with 4 OSD's and 1 MDS. > > I run Ubuntu 14.04 with 3.13.0-52-generic in the clients, and CentOS 6.6 > with 2.6.32-504.16.2.el6.x86_64 in Servers. > > The version of Ceph is 0.94.1 > > Sometimes,

Re: [ceph-users] Complete freeze of a cephfs client (unavoidable hard reboot)

2015-05-14 Thread Gregory Farnum
On Thu, May 14, 2015 at 10:15 AM, Francois Lafont wrote: > Hi, > > I had a problem with a cephfs freeze in a client. Impossible to > re-enable the mountpoint. A simple "ls /mnt" command totally > blocked (of course impossible to umount-remount etc.) and I had > to reboot the host. But even a "norm

Re: [ceph-users] RadosGW User Limit?

2015-05-15 Thread Gregory Farnum
On Fri, May 15, 2015 at 12:04 AM, Daniel Schneller wrote: > Hello! > > I am wondering if there is a limit to the number of (Swift) users that > should be observed when using RadosGW. > For example, if I were to offer storage via S3 or Swift APIs with Ceph and > RGW as the backing implementation an

Re: [ceph-users] rados_clone_range

2015-05-22 Thread Gregory Farnum
On Thu, May 21, 2015 at 3:09 AM, Michel Hollands wrote: > Hello, > > Is it possible to use the rados_clone_range() librados API call with an > erasure coded pool ? The documentation doesn’t mention it’s not possible. > However running the clonedata command from the rados utility (which seems to >

Re: [ceph-users] ceph.conf boolean value for mon_cluster_log_to_syslog

2015-05-22 Thread Gregory Farnum
On Thu, May 21, 2015 at 8:24 AM, Kenneth Waegeman wrote: > Hi, > > Some strange issue wrt boolean values in the config: > > this works: > > osd_crush_update_on_start = 0 -> osd not updated > osd_crush_update_on_start = 1 -> osd updated > > In a previous version we could set boolean values in the c
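
A hedged aside on the boolean forms ceph.conf generally accepts (option names are the ones from this thread; exact parsing can differ between releases):

    # ceph.conf: booleans can normally be written as true/false or 1/0
    [mon]
    mon_cluster_log_to_syslog = true
    [osd]
    osd_crush_update_on_start = 0    # numeric form, as tested above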

Re: [ceph-users] HDFS on Ceph (RBD)

2015-05-22 Thread Gregory Farnum
If you guys have stuff running on Hadoop, you might consider testing out CephFS too. Hadoop is a predictable workload that we haven't seen break at all in several years and the bindings handle data locality and such properly. :) -Greg On Thu, May 21, 2015 at 11:24 PM, Wang, Warren wrote: > > On 5

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Gregory Farnum
On Fri, May 22, 2015 at 11:34 AM, Adam Tygart wrote: > On Fri, May 22, 2015 at 11:47 AM, John Spray wrote: >> >> >> On 22/05/2015 15:33, Adam Tygart wrote: >>> >>> Hello all, >>> >>> The ceph-mds servers in our cluster are performing a constant >>> boot->replay->crash in our systems. >>> >>> I ha

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Gregory Farnum
> let them close in the cephfs session. This is trickier than it sounds with the kernel dcache, unfortunately. We improved it a bit just last week but we'll have to try and diagnose what happened in this case more before we can say if it was that issue or something else. -Greg > > --

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Gregory Farnum
On Fri, May 22, 2015 at 12:45 PM, Adam Tygart wrote: > Fair enough. Anyway, is it safe to now increase the 'mds beacon grace' > to try and get the mds server functional again? Yep! Let us know how it goes... > > I realize there is nothing simple about the things that are being > accomplished her
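
For anyone following along, a hedged example of raising the beacon grace at runtime (the value 300 and the MDS daemon name are placeholders; setting it on the monitors as well as the MDS is the usual advice):

    ceph tell mon.\* injectargs '--mds_beacon_grace 300'
    ceph daemon mds.<name> config set mds_beacon_grace 300
    # or persistently in ceph.conf, [global] section:
    #   mds beacon grace = 300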

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-27 Thread Gregory Farnum
On Wed, May 27, 2015 at 6:49 AM, Kenneth Waegeman wrote: > We are also running a full backup sync to cephfs, using multiple distributed > rsync streams (with zkrsync), and also ran in this issue today on Hammer > 0.94.1 . > After setting the beacon higer, and eventually clearing the journal, it >

Re: [ceph-users] Complete freeze of a cephfs client (unavoidable hard reboot)

2015-05-27 Thread Gregory Farnum
Sorry for the delay; I've been traveling. On Sun, May 17, 2015 at 3:49 PM, Francois Lafont wrote: > Hi, > > Sorry for my late answer. > > Gregory Farnum wrote: > >>> 1. Is this kind of freeze normal? Can I avoid these freezes with a >>> more recent versi

Re: [ceph-users] How to backup hundreds or thousands of TB

2015-05-27 Thread Gregory Farnum
On Sun, May 17, 2015 at 5:08 PM, Francois Lafont wrote: > Hi, > > Wido den Hollander wrote: > >> Aren't snapshots something that should protect you against removal? IF >> snapshots work properly in CephFS you could create a snapshot every hour. > > Are you talking about the .snap/ directory in a c

Re: [ceph-users] fix active+clean+inconsistent on cephfs when digest != digest

2015-05-27 Thread Gregory Farnum
Glad you figured it out! In the future you can also do repairs based on the underlying RADOS objects. Generally speaking errors like this mean that the replicas are storing objects that don't match, but if you go to each OSD storing the object and find the raw file you will generally find that two

Re: [ceph-users] Cache Pool Flush/Eviction Limits - Hard or Soft?

2015-05-27 Thread Gregory Farnum
The max target limit is a hard limit: the OSDs won't let more than that amount of data in the cache tier. They will start flushing and evicting based on the percentage ratios you can set (I don't remember the exact parameter names) and you may need to set these more aggressively for your given work
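
The pool-level settings being alluded to are most likely these; a hedged example, with the pool name and values as placeholders:

    ceph osd pool set cachepool target_max_bytes 100000000000   # the hard cap discussed above
    ceph osd pool set cachepool cache_target_dirty_ratio 0.4    # start flushing at 40% of the target
    ceph osd pool set cachepool cache_target_full_ratio 0.8     # start evicting at 80% of the target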

Re: [ceph-users] replication over slow uplink

2015-05-27 Thread Gregory Farnum
On Tue, May 19, 2015 at 7:35 PM, John Peebles wrote: > Hi, > > I'm hoping for advice on whether Ceph could be used in an atypical use case. > Specifically, I have about ~20TB of files that need replicated to 2 > different sites. Each site has its own internal gigabit ethernet network. > However, t

Re: [ceph-users] Hammer cache behavior

2015-05-27 Thread Gregory Farnum
On Mon, May 18, 2015 at 9:34 AM, Brian Rak wrote: > We just enabled a small cache pool on one of our clusters (v 0.94.1) and > have run into some issues: > > 1) Cache population appears to happen via the public network (not the > cluster network). We're seeing basically no traffic on the cluster

Re: [ceph-users] replication over slow uplink

2015-05-27 Thread Gregory Farnum
On Wed, May 27, 2015 at 6:57 PM, Christian Balzer wrote: > On Wed, 27 May 2015 14:06:43 -0700 Gregory Farnum wrote: > >> On Tue, May 19, 2015 at 7:35 PM, John Peebles wrote: >> > Hi, >> > >> > I'm hoping for advice on whether Ceph could be used in an

Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-05-27 Thread Gregory Farnum
ven less desire to do so on this production cluster. ^_- > > Christian > > On Sat, 6 Dec 2014 12:48:25 +0900 Christian Balzer wrote: > >> On Fri, 5 Dec 2014 11:23:19 -0800 Gregory Farnum wrote: >> >> > On Thu, Dec 4, 2014 at 7:03 PM, Christian Balzer wr

Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-05-28 Thread Gregory Farnum
On Thu, May 28, 2015 at 12:22 AM, Christian Balzer wrote: > > Hello Greg, > > On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote: > >> The description of the logging abruptly ending and the journal being >> bad really sounds like part of the disk is going back in t

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-28 Thread Gregory Farnum
On Thu, May 28, 2015 at 1:04 AM, Kenneth Waegeman wrote: > > > On 05/27/2015 10:30 PM, Gregory Farnum wrote: >> >> On Wed, May 27, 2015 at 6:49 AM, Kenneth Waegeman >> wrote: >>> >>> We are also running a full backup sync to cephfs, using multiple &g

Re: [ceph-users] Discuss: New default recovery config settings

2015-05-29 Thread Gregory Farnum
On Fri, May 29, 2015 at 2:47 PM, Samuel Just wrote: > Many people have reported that they need to lower the osd recovery config > options to minimize the impact of recovery on client io. We are talking > about changing the defaults as follows: > > osd_max_backfills to 1 (from 10) > osd_recovery
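
For context, a hedged sketch of how these options are typically turned down on a live cluster (the osd_max_backfills value mirrors the proposal above; the other values are placeholders):

    ceph tell osd.* injectargs '--osd_max_backfills 1'
    ceph tell osd.* injectargs '--osd_recovery_max_active 3'
    ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
    # the same option names also work in ceph.conf under [osd] for a persistent change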

Re: [ceph-users] What do internal_safe_to_start_threads and leveldb_compression do?

2015-06-01 Thread Gregory Farnum
On Mon, Jun 1, 2015 at 6:53 AM, Erik Logtenberg wrote: > Hi, > > I ran a config diff, like this: > > ceph --admin-daemon (...).asok config diff > > There are the obvious things like the fsid and IP-ranges, but two > settings stand out: > > - internal_safe_to_start_threads: true (default: false) H

Re: [ceph-users] Discuss: New default recovery config settings

2015-06-01 Thread Gregory Farnum
On Mon, Jun 1, 2015 at 6:39 PM, Paul Von-Stamwitz wrote: > On Fri, May 29, 2015 at 4:18 PM, Gregory Farnum wrote: >> On Fri, May 29, 2015 at 2:47 PM, Samuel Just wrote: >> > Many people have reported that they need to lower the osd recovery config >> > option

Re: [ceph-users] What do internal_safe_to_start_threads and leveldb_compression do?

2015-06-02 Thread Gregory Farnum
On Tue, Jun 2, 2015 at 6:47 AM, Erik Logtenberg wrote: >>> What does this do? >>> >>> - leveldb_compression: false (default: true) >>> - leveldb_block/cache/write_buffer_size (all bigger than default) >> >> I take it you're running these commands on a monitor (from I think the >> Dumpling timefram

Re: [ceph-users] Read Errors and OSD Flapping

2015-06-02 Thread Gregory Farnum
On Sat, May 30, 2015 at 2:23 PM, Nick Fisk wrote: > > Hi All, > > > > I was noticing poor performance on my cluster and when I went to investigate > I noticed OSD 29 was flapping up and down. On investigation it looks like it > has 2 pending sectors, kernel log is filled with the following > > >

Re: [ceph-users] apply/commit latency

2015-06-03 Thread Gregory Farnum
On Wed, Jun 3, 2015 at 5:19 AM, Xu (Simon) Chen wrote: > Hi folks, > > I've always been confused about the apply/commit latency numbers in "ceph > osd perf" output. I only know for sure that when they get too high, > performance is bad. > > My deployments have seen many different versions of ceph.

Re: [ceph-users] Discuss: New default recovery config settings

2015-06-03 Thread Gregory Farnum
On Wed, Jun 3, 2015 at 3:44 PM, Sage Weil wrote: > On Mon, 1 Jun 2015, Gregory Farnum wrote: >> On Mon, Jun 1, 2015 at 6:39 PM, Paul Von-Stamwitz >> wrote: >> > On Fri, May 29, 2015 at 4:18 PM, Gregory Farnum wrote: >> >> On Fri, May 29, 2015 at 2:47 PM, Sam

Re: [ceph-users] Old vs New pool on same OSDs - Performance Difference

2015-06-04 Thread Gregory Farnum
On Thu, Jun 4, 2015 at 6:31 AM, Nick Fisk wrote: > > Hi All, > > I have 2 pools both on the same set of OSD’s, 1st is the default rbd pool > created at installation 3 months ago, the other has just recently been > created, to verify performance problems. > > As mentioned both pools are on the sa

Re: [ceph-users] Cephfs: one ceph account per directory?

2015-06-04 Thread Gregory Farnum
On Thu, Jun 4, 2015 at 7:25 AM, François Lafont wrote: > Hi, > > A Hammer cluster can provide only one Cephfs and my problem is about > security. > Currently, if I want to share a Cephfs for 2 nodes foo-1 and foo-2 and > another Cephfs for > 2 another nodes bar-1 and bar-2, I just mount a dedicate

Re: [ceph-users] monitor election

2015-06-08 Thread Gregory Farnum
On Thu, Jun 4, 2015 at 1:13 AM, Luis Periquito wrote: > Hi all, > > I've seen several chats on the monitor elections, and how the one with the > lowest IP is always the master. > > Is there any way to change or influence this behaviour? Other than changing > the IP of the monitor themselves? Nope

Re: [ceph-users] Complete freeze of a cephfs client (unavoidable hard reboot)

2015-06-09 Thread Gregory Farnum
On Mon, Jun 8, 2015 at 5:20 PM, Francois Lafont wrote: > Hi, > > On 27/05/2015 22:34, Gregory Farnum wrote: > >> Sorry for the delay; I've been traveling. > > No problem, me too, I'm not really fast to answer. ;) > >>> Ok, I see. According to th

Re: [ceph-users] apply/commit latency

2015-06-09 Thread Gregory Farnum
On Thu, Jun 4, 2015 at 3:57 AM, Межов Игорь Александрович wrote: > Hi! > >> My deployments have seen many different versions of ceph. Pre 0.80.7, I've >> seen those numbers being pretty high. After upgrading to 0.80.7, all of a >> sudden, commit latency of all OSDs drop to 0-1ms, and apply latency

Re: [ceph-users] ceph mount error

2015-06-11 Thread Gregory Farnum
You probably didn't turn on an MDS, as that isn't set up by default anymore. I believe the docs tell you how to do that somewhere else. If that's not it, please provide the output of "ceph -s". -Greg On Sun, Jun 7, 2015 at 8:14 AM, 张忠波 wrote: > Hi , > My ceph health is OK , And now , I want to
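
For readers hitting the same mount error, a hedged outline of bringing up an MDS and a filesystem on a Hammer-era cluster (host, pool names and PG counts are placeholders):

    ceph-deploy mds create node1          # start an MDS daemon (if using ceph-deploy)
    ceph osd pool create cephfs_data 64
    ceph osd pool create cephfs_metadata 64
    ceph fs new cephfs cephfs_metadata cephfs_data
    ceph -s                               # the mdsmap line should now show an active MDS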

Re: [ceph-users] anyone using CephFS for HPC?

2015-06-11 Thread Gregory Farnum
On Thu, Jun 11, 2015 at 10:31 PM, Nigel Williams wrote: > Wondering if anyone has done comparisons between CephFS and other > parallel filesystems like Lustre typically used in HPC deployments > either for scratch storage or persistent storage to support HPC > workflows? Oak Ridge had a paper at

Re: [ceph-users] Erasure Coding + CephFS, objects not being deleted after rm

2015-06-12 Thread Gregory Farnum
On Fri, Jun 12, 2015 at 11:07 AM, John Spray wrote: > > Just had a go at reproducing this, and yeah, the behaviour is weird. Our > automated testing for cephfs doesn't include any cache tiering, so this is a > useful exercise! > > With a writeback overlay cache tier pool on an EC pool, I write a

Re: [ceph-users] Erasure Coding + CephFS, objects not being deleted after rm

2015-06-12 Thread Gregory Farnum
On Fri, Jun 12, 2015 at 11:59 AM, Lincoln Bryant wrote: > Thanks John, Greg. > > If I understand this correctly, then, doing this: > rados -p hotpool cache-flush-evict-all > should start appropriately deleting objects from the cache pool. I just > started one up, and that seems to be work

Re: [ceph-users] Erasure coded pools and bit-rot protection

2015-06-12 Thread Gregory Farnum
On Fri, Jun 12, 2015 at 1:07 AM, Paweł Sadowski wrote: > Hi All, > > I'm testing erasure coded pools. Is there any protection from bit-rot > errors on object read? If I modify one bit in object part (directly on > OSD) I'm getting *broken*object: Sorry, are you saying that you're getting a broken

Re: [ceph-users] Erasure coded pools and bit-rot protection

2015-06-12 Thread Gregory Farnum
Okay, Sam thinks he knows what's going on; here's a ticket: http://tracker.ceph.com/issues/12000 On Fri, Jun 12, 2015 at 12:32 PM, Gregory Farnum wrote: > On Fri, Jun 12, 2015 at 1:07 AM, Paweł Sadowski wrote: >> Hi All, >> >> I'm testing erasure coded poo

Re: [ceph-users] Erasure coded pools and bit-rot protection

2015-06-14 Thread Gregory Farnum
broken object. > I haven't checked this on other versions but is this bug present > only in Hammer or in all versions? > > > On 12.06.2015 at 21:43, Gregory Farnum wrote: > > Okay, Sam thinks he knows what's going on; here's a ticket: > > http://tracker.c

Re: [ceph-users] cephfs unmounts itself from time to time

2015-06-15 Thread Gregory Farnum
On Mon, Jun 15, 2015 at 4:03 AM, Roland Giesler wrote: > I have a small cluster of 4 machines and quite a few drives. After about 2 > - 3 weeks cephfs fails. It's not properly mounted anymore in /mnt/cephfs, > which of course causes the VM's running to fail too. > > In /var/log/syslog I have "/m

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Gregory Farnum
Every time you delete a snapshot it goes in removed_snaps. The set of removed snaps is stored as an interval set, so it uses up two integers in the OSDMap for each range. There are some patterns of usage that work out badly for this, but generally if you're creating snapshots as time goes forward a
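
A hedged illustration of what this looks like in the OSDMap (the snap IDs are invented; each bracketed entry is start~length in hex, i.e. one range costing two integers):

    ceph osd dump | grep removed_snaps
    # pool 3 'rbd' ... removed_snaps [1~4,7~2,b~1]
    #   -> snaps 1-4, 7-8 and 11 were deleted; contiguous deletions collapse
    #      into one range, scattered ones each add another entry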

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread Gregory Farnum
On Mon, Jun 15, 2015 at 11:34 AM, negillen negillen wrote: > Hello everyone, > > something very strange is driving me crazy with CephFS (kernel driver). > I copy a large directory on the CephFS from one node. If I try to perform a > 'time ls -alR' on that directory it gets executed in less than on

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Gregory Farnum
On Tue, Jun 16, 2015 at 3:30 AM, Jan Schermer wrote: > Thanks for the answer. > So it doesn’t hurt performance if it grows to ridiculous size - e.g. no > lookup table overhead, stat()ing additional files etc.? Nope, definitely nothing like that. If it gets sufficiently fragmented it can expand t

Re: [ceph-users] CephFS: delayed objects deletion ?

2015-06-16 Thread Gregory Farnum
On Tue, Jun 16, 2015 at 11:38 AM, Florent B wrote: > I still have this "problem" on Hammer. > > My CephFS directory contains 46MB of data, but the pool (configured with > layout, not default one) is 6.59GB... > > How to debug this ? On Mon, Mar 16, 2015 at 4:14 PM, John Spray wrote: > If you can

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Gregory Farnum
deleting them...not sure. :/ What version of Ceph are you currently running? > > Jan > >> On 16 Jun 2015, at 12:32, Gregory Farnum wrote: >> >> On Tue, Jun 16, 2015 at 3:30 AM, Jan Schermer wrote: >>> Thanks for the answer. >>> So it doesn’t hurt per

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Gregory Farnum
On Tue, Jun 16, 2015 at 12:03 PM, Jan Schermer wrote: > >> On 16 Jun 2015, at 12:59, Gregory Farnum wrote: >> >> On Tue, Jun 16, 2015 at 11:53 AM, Jan Schermer wrote: >>> Well, I see mons dropping out when deleting large amount of snapshots, and >>>

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread Gregory Farnum
e writing small amounts to them in round-robin then that's unfortunately not going to work well. :( -Greg > > Thanks again and regards. > > On Tue, Jun 16, 2015 at 10:59 AM, Gregory Farnum wrote: >> >> On Mon, Jun 15, 2015 at 11:34 AM, negillen negillen >> wrote

Re: [ceph-users] CephFS: delayed objects deletion ?

2015-06-16 Thread Gregory Farnum
On Tue, Jun 16, 2015 at 11:55 AM, Florent B wrote: > > On 06/16/2015 12:47 PM, Gregory Farnum wrote: >> On Tue, Jun 16, 2015 at 11:38 AM, Florent B wrote: >>> I still have this "problem" on Hammer. >>> >>> My CephFS directory contains 46MB of da

Re: [ceph-users] 10d

2015-06-17 Thread Gregory Farnum
On Wed, Jun 17, 2015 at 8:56 AM, Dan van der Ster wrote: > Hi, > > After upgrading to 0.94.2 yesterday on our test cluster, we've had 3 > PGs go inconsistent. > > First, immediately after we updated the OSDs PG 34.10d went inconsistent: > > 2015-06-16 13:42:19.086170 osd.52 137.138.39.211:6806/926

Re: [ceph-users] SSD LifeTime for Monitors

2015-06-17 Thread Gregory Farnum
On Wed, Jun 17, 2015 at 10:18 AM, Stefan Priebe - Profihost AG wrote: > Hi, > > Does anybody know how many data gets written from the monitors? I was using > some cheaper ssds for monitors and was wondering why they had already written > 80 TB after 8 month. 3.8MB/s? That's a little more than I
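
The back-of-the-envelope figure checks out; a hedged version of the arithmetic:

    # 80 TB written over roughly 8 months:
    #   80 * 10^12 B / (8 * 30 * 86400 s) is about 3.86 * 10^6 B/s, i.e. ~3.8 MB/s
    echo 'scale=2; 80 * 10^12 / (8 * 30 * 86400) / 10^6' | bc    # prints 3.85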

Re: [ceph-users] Accessing Ceph from Spark

2015-06-17 Thread Gregory Farnum
On Wed, Jun 17, 2015 at 2:58 PM, Milan Sladky wrote: > Is it possible to access Ceph from Spark as it is mentioned here for > Openstack Swift? > > https://spark.apache.org/docs/latest/storage-openstack-swift.html Depends on what you're trying to do. It's possible that the Swift bindings described

Re: [ceph-users] cephfs unmounts itself from time to time

2015-06-19 Thread Gregory Farnum
On Thu, Jun 18, 2015 at 10:15 PM, Roland Giesler wrote: > On 15 June 2015 at 13:09, Gregory Farnum wrote: >> >> On Mon, Jun 15, 2015 at 4:03 AM, Roland Giesler >> wrote: >> > I have a small cluster of 4 machines and quite a few drives. After >> > about 2

Re: [ceph-users] Unexpected disk write activity with btrfs OSDs

2015-06-23 Thread Gregory Farnum
On Tue, Jun 23, 2015 at 9:50 AM, Erik Logtenberg wrote: > Thanks! > > Just so I understand correctly, the btrfs snapshots are mainly useful if > the journals are on the same disk as the osd, right? Is it indeed safe > to turn them off if the journals are on a separate ssd? That's not quite it...i

Re: [ceph-users] Unexpected disk write activity with btrfs OSDs

2015-06-23 Thread Gregory Farnum
On Tue, Jun 23, 2015 at 12:17 PM, Lionel Bouton wrote: > On 06/23/15 11:43, Gregory Farnum wrote: >> On Tue, Jun 23, 2015 at 9:50 AM, Erik Logtenberg wrote: >>> Thanks! >>> >>> Just so I understand correctly, the btrfs snapshots are mainly useful if >>

Re: [ceph-users] Explanation for "ceph osd set nodown" and "ceph osd cluster_snap"

2015-06-23 Thread Gregory Farnum
On Wed, Jun 17, 2015 at 11:48 PM, Jan Schermer wrote: > 1) Flags available in ceph osd set are > > pause|noup|nodown|noout|noin|nobackfill|norecover|noscrub|nodeep-scrub|notieragent > > I know or can guess most of them (the docs are a “bit” lacking) > > But with "ceph osd set nodown” I have no ide

Re: [ceph-users] Mounting cephfs from cluster ip ok but fails from external ip

2015-06-23 Thread Gregory Farnum
Monitors are bound to a particular IP address. If you tell the client to connect to "52.28.87.xxx:6789" and the monitor responds saying "no, I'm 172.31.15.xxx:6789"...I don't think anything is expected to work. You'll need to pick one network or the other. -Greg On Fri, Jun 19, 2015 at 11:47 PM, C
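
In other words the client must be pointed at (and able to reach) the address the monitor is actually bound to; a hedged example using the internal address from this thread (the elided octets and the secret file are placeholders):

    mount -t ceph 172.31.15.xxx:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret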

Re: [ceph-users] Unexpected period of iowait, no obvious activity?

2015-06-23 Thread Gregory Farnum
On Fri, Jun 19, 2015 at 7:50 PM, Daniel Schneller wrote: > Hi! > > Recently over a few hours our 4 Ceph disk nodes showed unusually high > and somewhat constant iowait times. Cluster runs 0.94.1 on Ubuntu > 14.04.1. > > It started on one node, then - with maybe 15 minutes delay each - on the > nex

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-23 Thread Gregory Farnum
That's much slower than I'd expect, although FUSE can be slower in metadata operations than the kernel client is. If it's convenient, you could gather some logs with "debug client = 20" (on the client, for ceph-fuse) and "debug mds = 20" (on the mds, for both the ceph-fuse and kernel tests) and pos
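
A hedged sketch of turning those log levels on (the MDS daemon name is a placeholder; the admin-socket form avoids a restart):

    # client side (ceph-fuse): in ceph.conf, [client] section:
    #   debug client = 20
    # MDS side, at runtime via the admin socket:
    ceph daemon mds.<name> config set debug_mds 20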

Re: [ceph-users] CephFS posix test performance

2015-06-29 Thread Gregory Farnum
Zheng, I don't have any idea what pieces have changed in that kernel range. Did we have to flip some switches that slowed things down and we expect to flip back, or did something more fundamental happen? Do these results make any sense? I'm a little surprised to find ceph-fuse that much faster than

Re: [ceph-users] Round-trip time for monitors

2015-07-01 Thread Gregory Farnum
On Wed, Jul 1, 2015 at 8:38 AM, - - wrote: > Hi everybody, > > We have 3 monitors in our ceph cluster: 2 in one local site (2 data centers a > few km away from each other), and the 3rd one on a remote site, with a maximum > round-trip time (RTT) of 30ms between the local site and the remote site.

Re: [ceph-users] file/directory invisible through ceph-fuse

2015-07-01 Thread Gregory Farnum
On Wed, Jul 1, 2015 at 9:02 AM, flisky wrote: > Hi list, > > I meet a strange problem: > > sometimes I cannot see the file/directory created by another ceph-fuse > client. It comes into visible after I touch/mkdir the same name. > > Any thoughts? What version are you running? We've seen a few thi

Re: [ceph-users] Node reboot -- OSDs not "logging off" from cluster

2015-07-01 Thread Gregory Farnum
On Tue, Jun 30, 2015 at 10:36 AM, Daniel Schneller wrote: > Hi! > > We are seeing a strange - and problematic - behavior in our 0.94.1 > cluster on Ubuntu 14.04.1. We have 5 nodes, 4 OSDs each. > > When rebooting one of the nodes (e. g. for a kernel upgrade) the OSDs > do not seem to shut down cor

Re: [ceph-users] Removing empty placement groups / empty objects

2015-07-01 Thread Gregory Farnum
On Mon, Jun 29, 2015 at 1:44 PM, Burkhard Linke wrote: > Hi, > > I've noticed that a number of placement groups in our setup contain objects, > but no actual data > (ceph pg dump | grep remapped during a hard disk replace operation): > > 7.616 26360 0 52720 4194304 3003

Re: [ceph-users] file/directory invisible through ceph-fuse

2015-07-01 Thread Gregory Farnum
On Wed, Jul 1, 2015 at 9:21 AM, flisky wrote: > On 2015年07月01日 16:11, Gregory Farnum wrote: >> >> On Wed, Jul 1, 2015 at 9:02 AM, flisky wrote: >>> >>> Hi list, >>> >>> I meet a strange problem: >>> >>> sometimes I cannot see

Re: [ceph-users] metadata server rejoin time

2015-07-07 Thread Gregory Farnum
On Thu, Jul 2, 2015 at 11:38 AM, Matteo Dacrema wrote: > Hi all, > > I'm using CephFS on Hammer and I've 1.5 million files , 2 metadata servers > in active/standby configuration with 8 GB of RAM , 20 clients with 2 GB of > RAM each and 2 OSD nodes with 4 80GB osd and 4GB of RAM. > I've noticed tha

Re: [ceph-users] Ceph FS - MDS problem

2015-07-07 Thread Gregory Farnum
On Fri, Jul 3, 2015 at 10:34 AM, Dan van der Ster wrote: > Hi, > > We're looking at similar issues here and I was composing a mail just > as you sent this. I'm just a user -- hopefully a dev will correct me > where I'm wrong. > > 1. A CephFS cap is a way to delegate permission for a client to do I

Re: [ceph-users] Ceph FS - MDS problem

2015-07-07 Thread Gregory Farnum
On Tue, Jul 7, 2015 at 4:02 PM, Dan van der Ster wrote: > Hi Greg, > > On Tue, Jul 7, 2015 at 4:25 PM, Gregory Farnum wrote: >>> 4. "mds cache size = 500" is going to use a lot of memory! We have >>> an MDS with just 8GB of RAM and it goes OOM after

Re: [ceph-users] CephFS archive use case

2015-07-07 Thread Gregory Farnum
That's not something that CephFS supports yet; raw RADOS doesn't have any kind of immutability support either. :( -Greg On Tue, Jul 7, 2015 at 5:28 PM Peter Tiernan wrote: > Hi, > > i have a use case for CephFS whereby files can be added but not modified > or deleted. Is this possible? Perhaps w

Re: [ceph-users] Ceph performance, empty vs part full

2015-07-08 Thread Gregory Farnum
I think you're probably running into the internal PG/collection splitting here; try searching for those terms and seeing what your OSD folder structures look like. You could test by creating a new pool and seeing if it's faster or slower than the one you've already filled up. -Greg On Wed, Jul 8,
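
A hedged way to run the suggested comparison (pool name, PG count and runtime are arbitrary; filestore_split_multiple and filestore_merge_threshold are the settings the splitting terms usually point to):

    ceph osd pool create splittest 256 256
    rados bench -p splittest 60 write --no-cleanup
    rados bench -p splittest 60 seq
    # then run the same bench against the existing, well-filled pool and compare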

Re: [ceph-users] old PG left behind after remapping

2015-07-08 Thread Gregory Farnum
On Sun, Jul 5, 2015 at 5:37 AM, Michael Metz-Martini | SpeedPartner GmbH wrote: > Hi, > > after larger moves of serveral placement groups we tried to empty 3 of > our 66 osds by slowly setting weight of them to 0 within the crushmap. > > After move completed we're still experiencing a large amount

Re: [ceph-users] Removing empty placement groups / empty objects

2015-07-08 Thread Gregory Farnum
On Wed, Jul 1, 2015 at 5:47 PM, Burkhard Linke wrote: > Hi, > > > On 07/01/2015 06:09 PM, Gregory Farnum wrote: >> >> On Mon, Jun 29, 2015 at 1:44 PM, Burkhard Linke >> wrote: >>> >>> Hi, >>> >>> I've noticed that a number

Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up

2015-07-09 Thread Gregory Farnum
Your first point of troubleshooting is pretty much always to look at "ceph -s" and see what it says. In this case it's probably telling you that some PGs are down, and then you can look at why (but perhaps it's something else). -Greg On Thu, Jul 9, 2015 at 12:22 PM, Mallikarjun Biradar wrote: > Y

Re: [ceph-users] CephFS kernel client reboots on write

2015-07-13 Thread Gregory Farnum
On Mon, Jul 13, 2015 at 9:49 AM, Ilya Dryomov wrote: > On Fri, Jul 10, 2015 at 9:36 PM, Jan Pekař wrote: >> Hi all, >> >> I think I found a bug in cephfs kernel client. >> When I create directory in cephfs and set layout to >> >> ceph.dir.layout="stripe_unit=1073741824 stripe_count=1 >> object_si
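
For reference, a hedged example of how such a directory layout is normally set (the path is a placeholder and the object_size value is an assumption; stripe_unit cannot exceed object_size, so that is raised first):

    setfattr -n ceph.dir.layout.object_size  -v 1073741824 /mnt/cephfs/somedir   # assumed value
    setfattr -n ceph.dir.layout.stripe_unit  -v 1073741824 /mnt/cephfs/somedir
    setfattr -n ceph.dir.layout.stripe_count -v 1          /mnt/cephfs/somedir
    getfattr -n ceph.dir.layout /mnt/cephfs/somedir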

Re: [ceph-users] Firefly 0.80.10 ready to upgrade to?

2015-07-13 Thread Gregory Farnum
On Mon, Jul 13, 2015 at 11:25 AM, Kostis Fardelas wrote: > Hello, > it seems that new packages for firefly have been uploaded to repo. > However, I can't find any details in Ceph Release notes. There is only > one thread in ceph-devel [1], but it is not clear what this new > version is about. Is i

Re: [ceph-users] ceph daemons stucked in FUTEX_WAIT syscall

2015-07-14 Thread Gregory Farnum
On Mon, Jul 13, 2015 at 11:00 PM, Simion Rad wrote: > Hi , > > I'm running a small cephFS ( 21 TB , 16 OSDs having different sizes between > 400G and 3.5 TB ) cluster that is used as a file warehouse (both small and > big files). > Every day there are times when a lot of processes running on the c

Re: [ceph-users] xattrs vs omap

2015-07-14 Thread Gregory Farnum
On Tue, Jul 14, 2015 at 10:53 AM, Jan Schermer wrote: > Thank you for your reply. > Comments inline. > > I’m still hoping to get some more input, but there are many people running > ceph on ext4, and it sounds like it works pretty good out of the box. Maybe > I’m overthinking this, then? I thin

Re: [ceph-users] ceph daemons stucked in FUTEX_WAIT syscall

2015-07-14 Thread Gregory Farnum
't that high and I think the write-back > cache of the RAID controller sould be able to help with the journal ops. > > Simion Rad. > > From: Gregory Farnum [g...@gregs42.com] > Sent: Tuesday, July 14, 2015 12:38 > To: Simion Rad >

Re: [ceph-users] Failures with Ceph without redundancy/replication

2015-07-16 Thread Gregory Farnum
On Thu, Jul 16, 2015 at 11:58 AM, Vedran Furač wrote: > Hello, > > I'm experimenting with ceph for caching, it's configured with size=1 (so > no redundancy/replication) and exported via cephfs to clients, now I'm > wondering what happens is an SSD dies and all of its data is lost? I'm > seeing fil

Re: [ceph-users] backing Hadoop with Ceph ??

2015-07-16 Thread Gregory Farnum
On Wed, Jul 15, 2015 at 10:50 PM, John Spray wrote: > > > On 15/07/15 16:57, Shane Gibson wrote: >> >> >> >> We are in the (very) early stages of considering testing backing Hadoop >> via Ceph - as opposed to HDFS. I've seen a few very vague references to >> doing that, but haven't found any conc

Re: [ceph-users] OSD RAM usage values

2015-07-17 Thread Gregory Farnum
On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman wrote: > Hi all, > > I've read in the documentation that OSDs use around 512MB on a healthy > cluster.(http://ceph.com/docs/master/start/hardware-recommendations/#ram) > Now, our OSD's are all using around 2GB of RAM memory while the cluster is > h

Re: [ceph-users] 10d

2015-07-17 Thread Gregory Farnum
:TPHandle*)+0xc16) [0x975a06] > 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, > ThreadPool::TPHandle*)+0x64) [0x97d794] > 3: (FileStore::_do_op(FileStore::OpSequencer*, > ThreadPool::TPHandle&)+0x2a0) [0x97da50] > 4: (ThreadPool::worker(ThreadPoo

Re: [ceph-users] ceph failure on sf.net?

2015-07-20 Thread Gregory Farnum
"We responded immediately and confirmed the issue was related to filesystem corruption on our storage platform. This incident impacted all block devices on our Ceph cluster." Just guessing from that, I bet they lost power and discovered their local filesystems/disks were misconfigured to not be co

Re: [ceph-users] Ceph Tech Talk next week

2015-07-21 Thread Gregory Farnum
On Tue, Jul 21, 2015 at 6:09 PM, Patrick McGarry wrote: > Hey cephers, > > Just a reminder that the Ceph Tech Talk on CephFS that was scheduled > for last month (and cancelled due to technical difficulties) has been > rescheduled for this month's talk. It will be happening next Thurs at > 17:00 UT

Re: [ceph-users] Ceph 0.94 (and lower) performance on >1 hosts ??

2015-07-22 Thread Gregory Farnum
We might also be able to help you improve or better understand your results if you can tell us exactly what tests you're conducting that are giving you these numbers. -Greg On Wed, Jul 22, 2015 at 4:44 AM, Florent MONTHEL wrote: > Hi Frederic, > > When you have Ceph cluster with 1 node you don’t

Re: [ceph-users] osd_agent_max_ops relating to number of OSDs in the cache pool

2015-07-22 Thread Gregory Farnum
On Sat, Jul 18, 2015 at 10:25 PM, Nick Fisk wrote: > Hi All, > > I’m doing some testing on the new High/Low speed cache tiering flushing and > I’m trying to get my head round the effect that changing these 2 settings > have on the flushing speed. When setting the osd_agent_max_ops to 1, I can

Re: [ceph-users] Clients' connection for concurrent access to ceph

2015-07-23 Thread Gregory Farnum
On Wed, Jul 22, 2015 at 8:39 PM, Shneur Zalman Mattern wrote: > Workaround... We're building now a huge computing cluster 140 computing > DISKLESS nodes and they are pulling to storage a lot of computing data > concurrently > User that put job for the cluster - need also access to the same sto

Re: [ceph-users] ceph-mon cpu usage

2015-07-23 Thread Gregory Farnum
On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito wrote: > The ceph-mon is already taking a lot of memory, and I ran a heap stats > > MALLOC: 32391696 ( 30.9 MiB) Bytes in use by application > MALLOC: + 27597135872 (26318.7 MiB) Bytes in page

Re: [ceph-users] Ceph 0.94 (and lower) performance on >1 hosts ??

2015-07-23 Thread Gregory Farnum
I'm not sure. It looks like Ceph and your disk controllers are doing basically the right thing since you're going from 1GB/s to 420MB/s when moving from dd to Ceph (the full data journaling cuts it in half), but just fyi that dd task is not doing nearly the same thing as Ceph does — you'd need to u
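
A plain dd to a file mostly measures the page cache; a hedged sketch of a more comparable synchronous test (path and sizes are placeholders):

    # force synchronous, direct writes so dd is not just buffering in RAM
    dd if=/dev/zero of=/mnt/test/ddfile bs=4M count=1000 oflag=direct,dsync
    # or at least flush at the end and include that in the timing:
    dd if=/dev/zero of=/mnt/test/ddfile bs=4M count=1000 conv=fdatasync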

Re: [ceph-users] Cephfs and ERESTARTSYS on writes

2015-07-23 Thread Gregory Farnum
On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač wrote: > Hello, > > I'm having an issue with nginx writing to cephfs. Often I'm getting: > > writev() "/home/ceph/temp/44/94/1/119444" failed (4: Interrupted > system call) while reading upstream > > looking with strace, this happens: > > ... > wri

Re: [ceph-users] Weird behaviour of cephfs with samba

2015-07-27 Thread Gregory Farnum
What's the full stack you're using to run this with? If you're using the kernel client, try updating it or switching to the userspace (ceph-fuse, or Samba built-in) client. If using userspace, please make sure you've got the latest one. -Greg On Mon, Jul 27, 2015 at 3:16 PM, Jörg Henne wrote: > H

Re: [ceph-users] State of nfs-ganesha CEPH fsal

2015-07-27 Thread Gregory Farnum
On Mon, Jul 27, 2015 at 4:33 PM, Burkhard Linke wrote: > Hi, > > the nfs-ganesha documentation states: > > "... This FSAL links to a modified version of the CEPH library that has been > extended to expose its distributed cluster and replication facilities to the > pNFS operations in the FSAL. ...

Re: [ceph-users] Weird behaviour of cephfs with samba

2015-07-27 Thread Gregory Farnum
On Mon, Jul 27, 2015 at 5:46 PM, Jörg Henne wrote: > Gregory Farnum writes: >> >> What's the full stack you're using to run this with? If you're using >> the kernel client, try updating it or switching to the userspace >> (ceph-fuse, or Samba built-i

Re: [ceph-users] State of nfs-ganesha CEPH fsal

2015-07-28 Thread Gregory Farnum
On Tue, Jul 28, 2015 at 8:01 AM, Burkhard Linke wrote: > Hi, > > On 07/27/2015 05:42 PM, Gregory Farnum wrote: >> >> On Mon, Jul 27, 2015 at 4:33 PM, Burkhard Linke >> wrote: >>> >>> Hi, >>> >>> the nfs-ganesha documentation states:

Re: [ceph-users] hadoop on ceph

2015-07-28 Thread Gregory Farnum
On Mon, Jul 27, 2015 at 6:34 PM, Patrick McGarry wrote: > Moving this to the ceph-user list where it has a better chance of > being answered. > > > > On Mon, Jul 27, 2015 at 5:35 AM, jingxia@baifendian.com > wrote: >> Dear , >> I have questions to ask. >> The doc says hadoop on ceph but requi

Re: [ceph-users] Weird behaviour of cephfs with samba

2015-07-28 Thread Gregory Farnum
On Mon, Jul 27, 2015 at 6:25 PM, Jörg Henne wrote: > Gregory Farnum writes: >> >> Yeah, I think there were some directory listing bugs in that version >> that Samba is probably running into. They're fixed in a newer kernel >> release (I'm not sure which one

Re: [ceph-users] OSD RAM usage values

2015-07-28 Thread Gregory Farnum
On Tue, Jul 28, 2015 at 11:00 AM, Kenneth Waegeman wrote: > > > On 07/17/2015 02:50 PM, Gregory Farnum wrote: >> >> On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman >> wrote: >>> >>> Hi all, >>> >>> I've read in the doc

Re: [ceph-users] Recovery question

2015-07-29 Thread Gregory Farnum
This sounds odd. Can you create a ticket in the tracker with all the details you can remember or reconstruct? -Greg On Wed, Jul 29, 2015 at 8:34 PM Steve Taylor wrote: > I recently migrated 240 OSDs to new servers this way in a single cluster, > and it worked great. There are two additional item

Re: [ceph-users] Recovery question

2015-07-29 Thread Gregory Farnum
This sounds like you're trying to reconstruct a cluster after destroying the monitors. That is...not going to work well. The monitors define the cluster and you can't move OSDs into different clusters. We have ideas for how to reconstruct monitors and it can be done manually with a lot of hassle, b

Re: [ceph-users] Recovery question

2015-07-29 Thread Gregory Farnum
at happen. I'd have to defer to David (for OSD object extraction options) or Josh/Jason (for rbd export/import) for that, though. ceph-objectstore-tool will I think be part of your solution, but I'm not sure how much of can do on its own. What's your end goal? > > -- > P

Re: [ceph-users] Ceph File System ACL Support

2015-08-18 Thread Gregory Farnum
On Mon, Aug 17, 2015 at 4:12 AM, Yan, Zheng wrote: > On Mon, Aug 17, 2015 at 9:38 AM, Eric Eastman > wrote: >> Hi, >> >> I need to verify in Ceph v9.0.2 if the kernel version of Ceph file >> system supports ACLs and the libcephfs file system interface does not. >> I am trying to have SAMBA, versi
