Re: [ceph-users] Question about "osd objectstore = keyvaluestore-dev" setting

2014-05-22 Thread Gregory Farnum
On Thu, May 22, 2014 at 5:04 AM, Geert Lindemulder wrote: > Hello All > > Trying to implement the osd leveldb backend at an existing ceph test > cluster. > The test cluster was updated from 0.72.1 to 0.80.1. The update was ok. > After the update, the "osd objectstore = keyvaluestore-dev" setting w

Re: [ceph-users] Expanding pg's of an erasure coded pool

2014-05-22 Thread Gregory Farnum
On Thu, May 22, 2014 at 4:09 AM, Kenneth Waegeman wrote: > > - Message from Gregory Farnum - >Date: Wed, 21 May 2014 15:46:17 -0700 > > From: Gregory Farnum > Subject: Re: [ceph-users] Expanding pg's of an erasure coded pool > To: Kenneth Waege

Re: [ceph-users] ceph.conf public network

2014-05-27 Thread Gregory Farnum
On Tue, May 27, 2014 at 9:55 AM, Ignazio Cassano wrote: > Hi all, > I read a lot of email messages and I am confused because in some, the public > network in /etc/ceph/ceph.conf is reported like : > public_network = a.b.c.d/netmask > in others like : > > public network = a.b.c.d/netmask These are equ
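For reference, a minimal ceph.conf fragment illustrating the point (the subnet is a placeholder; the config parser treats spaces and underscores in option names interchangeably):
  [global]
  # either spelling works; the parser normalizes spaces to underscores
  public network = 192.168.0.0/24
  # equivalent: public_network = 192.168.0.0/24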

Re: [ceph-users] Expanding pg's of an erasure coded pool

2014-05-27 Thread Gregory Farnum
On Sun, May 25, 2014 at 6:24 PM, Guang Yang wrote: > On May 21, 2014, at 1:33 AM, Gregory Farnum wrote: > >> This failure means the messenger subsystem is trying to create a >> thread and is getting an error code back — probably due to a process >> or system thread li

Re: [ceph-users] Is there a way to repair placement groups?

2014-05-27 Thread Gregory Farnum
Note that while the "repair" command *will* return your cluster to consistency, it is not guaranteed to restore the data you want to see there — in general, it will simply put the primary OSD's view of the world on the replicas. If you have a massive inconsistency like that, you probably want to fi
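As a hedged sketch of the usual flow (the PG id is a placeholder), keeping in mind that repair overwrites the replicas with the primary's copy, so verify the primary's data first:
  # list PGs flagged inconsistent
  ceph health detail | grep inconsistent
  # trigger a repair of one of them
  ceph pg repair 2.1f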

Re: [ceph-users] Is there a way to repair placement groups?

2014-05-27 Thread Gregory Farnum
r more disks? Or is the most common cause of > inconsistency most likely to not effect the primary? > > -Michael > > > On 27/05/2014 23:55, Gregory Farnum wrote: >> >> Note that while the "repair" command *will* return your cluster to >> consistency, it i

Re: [ceph-users] why use hadoop with ceph ?

2014-05-30 Thread Gregory Farnum
On Friday, May 30, 2014, Ignazio Cassano wrote: > Hi all, > I am testing ceph because I found it is very interesting as far as remote > block > device is concerned. > But my company is very interested in big data. > So I read something about hadoop and ceph integration. > Anyone can suggest me so

Re: [ceph-users] Replication

2014-05-30 Thread Gregory Farnum
Depending on what level of verification you need, you can just do a "ceph pg dump" and look to see which OSDs host every PG. If you want to demonstrate replication to a skeptical audience, sure, turn off the machines and show that data remains accessible. -Greg On Friday, May 30, 2014, wrote: >
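A hedged illustration of the verification Greg describes (pool and object names are placeholders):
  # the "up" and "acting" columns show which OSDs hold each PG
  ceph pg dump | less
  # or map a single object to its OSDs
  ceph osd map rbd myobject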

Re: [ceph-users] RGW: Multi Part upload and resulting objects

2014-06-04 Thread Gregory Farnum
On Wed, Jun 4, 2014 at 7:58 AM, Sylvain Munaut wrote: > Hi, > > > During a multi part upload you can't upload parts smaller than 5M, and > radosgw also slices objects into chunks of 4M. Having those two be > different is a bit unfortunate because if you slice your files in the > minimum chunk size

Re: [ceph-users] PGs inconsistency, deep-scrub / repair won't fix (v0.80.1)

2014-06-05 Thread Gregory Farnum
On Thu, Jun 5, 2014 at 4:38 AM, Dennis Kramer wrote: > Hi all, > > A couple of weeks ago i've upgraded from emperor to firefly. > I'm using Cloudstack /w CEPH as the storage backend for VMs and templates. Which versions exactly were you and are you running? > > Since the upgrade, ceph is in a HE

Re: [ceph-users] RGW: Multi Part upload and resulting objects

2014-06-05 Thread Gregory Farnum
I don't believe that should cause any issues; the chunk sizes are in the metadata. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Thu, Jun 5, 2014 at 12:23 AM, Sylvain Munaut wrote: > Hello, > >> Huh. We took the 5MB limit from S3, but it definitely is unfortunate >> in co

Re: [ceph-users] Minimal io block in rbd

2014-06-05 Thread Gregory Farnum
There's some prefetching and stuff, but the rbd library and RADOS storage are capable of issuing reads and writes in any size (well, down to the minimal size of the underlying physical disk). There are some scenarios where you will see it writing a lot more if you use layering -- promotion of data

Re: [ceph-users] cephfs snapshots : mkdir: cannot create directory `.snap/test': Operation not permitted

2014-06-06 Thread Gregory Farnum
Snapshots are disabled by default; there's a command you can run to enable them if you want, but the reason they're disabled is because they're significantly more likely to break your filesystem than anything else is! ceph mds set allow_new_snaps true -Greg Software Engineer #42 @ http://inktank.co
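The command in question, as a sketch (newer releases may also demand a --yes-i-really-mean-it confirmation flag):
  # enable snapshot creation cluster-wide (off by default)
  ceph mds set allow_new_snaps true
  # afterwards, inside a mounted CephFS directory:
  mkdir .snap/test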

Re: [ceph-users] fail to add osd to cluster

2014-06-06 Thread Gregory Farnum
I haven't used ceph-deploy to do this much, but I think you need to "prepare" before you "activate" and it looks like you haven't done so. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Jun 6, 2014 at 3:54 PM, Jonathan Gowar wrote: > Assitance really appreciated. Thi
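A hedged example of the intended sequence (host and device names are placeholders):
  ceph-deploy osd prepare node1:/dev/sdb
  ceph-deploy osd activate node1:/dev/sdb1
  # or combine both steps:
  ceph-deploy osd create node1:/dev/sdb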

Re: [ceph-users] failed assertion on AuthMonitor

2014-06-09 Thread Gregory Farnum
Barring a newly-introduced bug (doubtful), that assert basically means that your computer lied to the ceph monitor about the durability or ordering of data going to disk, and the store is now inconsistent. If you don't have data you care about on the cluster, by far your best option is: 1) Figure o

Re: [ceph-users] How to avoid deep-scrubbing performance hit?

2014-06-09 Thread Gregory Farnum
On Mon, Jun 9, 2014 at 3:22 PM, Craig Lewis wrote: > I've correlated a large deep scrubbing operation to cluster stability > problems. > > My primary cluster does a small amount of deep scrubs all the time, spread > out over the whole week. It has no stability problems. > > My secondary cluster d

Re: [ceph-users] How to avoid deep-scrubbing performance hit?

2014-06-09 Thread Gregory Farnum
On Mon, Jun 9, 2014 at 6:42 PM, Mike Dawson wrote: > Craig, > > I've struggled with the same issue for quite a while. If your i/o is similar > to mine, I believe you are on the right track. For the past month or so, I > have been running this cronjob: > > * * * * * for strPg in `ceph pg dump
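As a rough sketch of the idea — not Mike's exact cron job, which is truncated above — you can either pace deep scrubs manually or gate them with the cluster-wide flag:
  # kick off deep scrubs for a handful of PGs at a time
  for pg in $(ceph pg dump 2>/dev/null | awk '/active/ {print $1}' | head -5); do
      ceph pg deep-scrub "$pg"
  done
  # or disable deep scrubbing during busy hours and re-enable it later
  ceph osd set nodeep-scrub
  ceph osd unset nodeep-scrub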

Re: [ceph-users] PG Selection Criteria for Deep-Scrub

2014-06-10 Thread Gregory Farnum
Hey Mike, has your manual scheduling resolved this? I think I saw another similar-sounding report, so a feature request to improve scrub scheduling would be welcome. :) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, May 20, 2014 at 5:46 PM, Mike Dawson wrote: > I tend

Re: [ceph-users] How to implement a rados plugin to encode/decode data while r/w

2014-06-10 Thread Gregory Farnum
On Tue, May 27, 2014 at 7:44 PM, Plato wrote: > For a certain security issue, I need to make sure the data finally saved to > disk is encrypted. > So, I'm trying to write a rados class, which would be triggered during the reading > and writing process. > That is, before data is written, encrypting method of

Re: [ceph-users] failed assertion on AuthMonitor

2014-06-10 Thread Gregory Farnum
h250' but it seems monitor failed while reading 'auth1'. Is > this normal? > As a side note, I did not use cephx in this cluster. > > Thanks, > > > 2014-06-09 22:11 GMT+04:30 Gregory Farnum : >> >> Barring a newly-introduced bug (doubtful),

Re: [ceph-users] MDS crash dump ?

2014-06-11 Thread Gregory Farnum
On Wednesday, June 11, 2014, Florent B wrote: > Hi every one, > > Sometimes my MDS crashes... sometimes after a few hours, sometimes after > a few days. > > I know I could enable debugging and so on to get more information. But > if it crashes after a few days, it generates gigabytes of debugging

Re: [ceph-users] Unable to remove mds

2014-06-11 Thread Gregory Farnum
On Wed, Jun 11, 2014 at 4:56 AM, wrote: > Hi All, > > > > I have a four node ceph cluster. The metadata service is showing as degraded > in health. How to remove the mds service from ceph ? Unfortunately you can't remove it entirely right now, but if you create a new filesystem using the "newfs"

Re: [ceph-users] Can we map OSDs from different hosts (servers) to a Pool in Ceph

2014-06-11 Thread Gregory Farnum
On Wed, Jun 11, 2014 at 5:18 AM, Davide Fanciola wrote: > Hi, > > we have a similar setup where we have SSD and HDD in the same hosts. > Our very basic crushmap is configured as follows: > > # ceph osd tree > # id weight type name up/down reweight > -6 3 root ssd > 3 1 osd.3 up 1 > 4 1 osd.4 up 1
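For completeness, a hedged sketch of tying a pool to such an "ssd" root via a CRUSH ruleset (names and the ruleset id are placeholders; pre-Luminous releases use the crush_ruleset pool option):
  ceph osd pool create ssd-pool 128 128
  ceph osd pool set ssd-pool crush_ruleset 1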

Re: [ceph-users] tiering : hit_set_count && hit_set_period memory usage ?

2014-06-11 Thread Gregory Farnum
On Wed, Jun 11, 2014 at 12:44 PM, Alexandre DERUMIER wrote: > Hi, > > I'm reading tiering doc here > http://ceph.com/docs/firefly/dev/cache-pool/ > > " > The hit_set_count and hit_set_period define how much time each HitSet should > cover, and how many such HitSets to store. Binning accesses over

Re: [ceph-users] tiering : hit_set_count && hit_set_period memory usage ?

2014-06-11 Thread Gregory Farnum
er to cache tier ? (cache-mode writeback) > Does any read on base tier promote the object in the cache tier ? > Or they are also statistics on the base tier ? > > (I tell the question, because I have cold datas, but I have full backups > jobs running each week, reading all theses col

Re: [ceph-users] Can we map OSDs from different hosts (servers) to a Pool in Ceph

2014-06-12 Thread Gregory Farnum
On Thu, Jun 12, 2014 at 2:21 AM, VELARTIS Philipp Dürhammer wrote: > Hi, > > Will ceph support mixing different disk pools (example spinners and ssds) in > the future a little bit better (more safe)? There are no immediate plans to do so, but this is an extension to the CRUSH language that we're

[ceph-users] error (24) Too many open files

2014-06-12 Thread Gregory Farnum
You probably just want to increase the ulimit settings. You can change the OSD setting, but that only covers file descriptors against the backing store, not sockets for network communication -- the latter is more often the one that runs out. -Greg On Thursday, June 12, 2014, Christian Kauhaus > wr
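A hedged example of raising the limit (values are illustrative; how you persist it depends on your init system):
  # for the shell or init script that starts the daemons
  ulimit -n 131072
  # or persistently via /etc/security/limits.conf:
  #   root  soft  nofile  131072
  #   root  hard  nofile  131072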

Re: [ceph-users] [ceph] OSD priority / client localization

2014-06-12 Thread Gregory Farnum
You can set up pools which have all their primaries in one data center, and point the clients at those pools. But writes will still have to traverse the network link because Ceph does synchronous replication for strong consistency. If you want them to both write to the same pool, but use local OSD

Re: [ceph-users] spiky io wait within VMs running on rbd

2014-06-12 Thread Gregory Farnum
To be clear, that's the solution to one of the causes of this issue. The log message is very general, and just means that a disk access thread has been gone for a long time (15 seconds, in this case) without checking in (so usually, it's been inside of a read/write syscall for >=15 seconds). Other

Re: [ceph-users] Fixing inconsistent placement groups

2014-06-12 Thread Gregory Farnum
The OSD should have logged the identities of the inconsistent objects to the central log on the monitors, as well as to its own local log file. You'll need to identify for yourself which version is correct, which will probably involve going and looking at them inside each OSD's data store. If the p
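A hedged illustration of what "looking at them inside each OSD's data store" can mean on a filestore OSD (the OSD id, PG id, and object name are placeholders):
  # on each OSD host holding PG 2.1f
  find /var/lib/ceph/osd/ceph-3/current/2.1f_head/ -name '*objectname*' -ls
  md5sum /var/lib/ceph/osd/ceph-3/current/2.1f_head/<matching file>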

Re: [ceph-users] Run ceph from source code

2014-06-13 Thread Gregory Farnum
I don't know anybody who makes much use of "make install", so it's probably not putting the init system scripts into place. So make sure they aren't there, copy them from the source tree, and try again? Patches to fix are welcome! :) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.co

Re: [ceph-users] OSD turned itself off

2014-06-13 Thread Gregory Farnum
The OSD did a read off of the local filesystem and it got back the EIO error code. That means the store got corrupted or something, so it killed itself to avoid spreading bad data to the rest of the cluster. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Jun 13, 2014 a

Re: [ceph-users] OSD turned itself off

2014-06-13 Thread Gregory Farnum
ah. > > What's best practice when the store is corrupted like this? Remove the OSD from the cluster, and either reformat the disk or replace as you judge appropriate. -Greg > > Cheers, > Josef > > Gregory Farnum wrote 2014-06-14 02:21: > >> The OSD did a r

Re: [ceph-users] Fixing inconsistent placement groups

2014-06-16 Thread Gregory Farnum
| http://ceph.com > > It is still unclear, where these inconsistencies (i.e. missing objects > / empty directories) result from, see also: > http://tracker.ceph.com/issues/8532. > > On Fri, Jun 13, 2014 at 4:58 AM, Gregory Farnum wrote: >> The OSD should have logged the i

Re: [ceph-users] Fixing inconsistent placement groups

2014-06-16 Thread Gregory Farnum
On Mon, Jun 16, 2014 at 11:11 AM, Aaron Ten Clay wrote: > I would also like to see Ceph get smarter about inconsistent PGs. If we > can't automate the repair, at least the "ceph pg repair" command should > figure out which copy is correct and use that, instead of overwriting all > OSDs with whatev

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Gregory Farnum
Try running "ceph health detail" on each of the monitors. Your disk space thresholds probably aren't configured correctly or something. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic wrote: > Hi, > > thanks for that, but is not

Re: [ceph-users] Data versus used space inconsistency

2014-06-17 Thread Gregory Farnum
You probably have sparse objects from RBD. The PG statistics are built off of file size, but the total data used spaces are looking at df output. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Jun 16, 2014 at 7:34 PM, Christian Balzer wrote: > > Hello, > > this is is

Re: [ceph-users] Question about RADOS object consistency

2014-06-17 Thread Gregory Farnum
On Tue, Jun 17, 2014 at 3:22 AM, Ke-fei Lin wrote: > Hi list, > > How does RADOS check an object and its replica are consistent? Is there > a checksum in object's metadata or some other mechanisms? Does the > mechanism depend on > OSD's underlying file system? It does not check consistency on rea

Re: [ceph-users] Adding private network AFTER cluster creation ?

2014-06-17 Thread Gregory Farnum
On Tue, Jun 17, 2014 at 5:00 AM, Florent B wrote: > Hi all, > > I would like to know if I can add a private network to my running Ceph > cluster ? > > And how to proceed ? I add the config to ceph.conf, then restart osd's ? > So, some OSD will have both networks and others not. Yeah. As long as t
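A minimal ceph.conf sketch of the end state (subnets are placeholders); restart the OSDs one at a time after adding it:
  [global]
  public network  = 10.0.0.0/24
  cluster network = 10.0.1.0/24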

Re: [ceph-users] cephx authentication issue

2014-06-17 Thread Gregory Farnum
It's unlikely to be the issue, but you might check the times on your OSDs. cephx is clock-sensitive if you're off by more than an hour or two. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Jun 17, 2014 at 8:30 AM, Fred Yang wrote: > What's strange is OSD rebalance

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-18 Thread Gregory Farnum
have the following in > each ceph.conf file, under the general section: > > mon data avail warn = 15 > mon data avail crit = 5 > > I found these settings on the ceph mailing list... > > Thanks a lot, > Andrija > > > On 17 June 2014 19:22, Gregory Farnum wrote: >>

Re: [ceph-users] Adding private network AFTER cluster creation ?

2014-06-18 Thread Gregory Farnum
route to the monitors. > > Does monitors need restart ? Not from Ceph's perspective! -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com > > On 06/17/2014 07:29 PM, Gregory Farnum wrote: >> On Tue, Jun 17, 2014 at 5:00 AM, Florent B wrote: >>> Hi al

Re: [ceph-users] Question about RADOS object consistency

2014-06-18 Thread Gregory Farnum
On Tue, Jun 17, 2014 at 9:46 PM, Ke-fei Lin wrote: > 2014-06-18 1:28 GMT+08:00 Gregory Farnum : >> On Tue, Jun 17, 2014 at 3:22 AM, Ke-fei Lin wrote: >>> Hi list, >>> >>> How does RADOS check an object and its replica are consistent? Is there >>>

Re: [ceph-users] Cache tier pool in CephFS

2014-06-18 Thread Gregory Farnum
On Wed, Jun 18, 2014 at 12:54 AM, Sherry Shahbazi wrote: > Hi everyone, > > If I have a pool called cold-storage (1) and a pool called hot-storage (2) > that hot-storage is a cache tier for the cold-storage. > > I normally do the followings in order to map a directory in my client to a > pool. > >

Re: [ceph-users] Adding private network AFTER cluster creation ?

2014-06-18 Thread Gregory Farnum
Yeah, the OSDs connect to the monitors over the OSD's public address. Software Engineer #42 @ http://inktank.com | http://ceph.com On Wed, Jun 18, 2014 at 11:37 AM, Florent B wrote: > On 06/18/2014 04:34 PM, Gregory Farnum wrote: >> On Tue, Jun 17, 2014 at 4:08 PM, Florent B wr

Re: [ceph-users] Question about RADOS object consistency

2014-06-18 Thread Gregory Farnum
On Wed, Jun 18, 2014 at 12:07 PM, Ke-fei Lin wrote: > 2014-06-18 22:44 GMT+08:00 Gregory Farnum : >> On Tue, Jun 17, 2014 at 9:46 PM, Ke-fei Lin wrote: >>> 2014-06-18 1:28 GMT+08:00 Gregory Farnum : >>>> On Tue, Jun 17, 2014 at 3:22 AM, Ke-fei Lin wrote: >&

Re: [ceph-users] Level DB with RADOS

2014-06-18 Thread Gregory Farnum
On Wed, Jun 18, 2014 at 9:14 PM, Shesha Sreenivasamurthy wrote: > I am doing some research work at UCSC and wanted use LevelDB to store OMAP > key/value pairs. What is the best way to start playing with it. I am a > newbie to RADOS/CEPH code. Can any one point me in the right direction ? I'm not

Re: [ceph-users] understanding rados df statistics

2014-06-19 Thread Gregory Farnum
The total used/available/capacity is calculated by running the syscall which "df" uses across all OSDs and summing the results. The "total data" is calculated by summing the sizes of the objects stored. It depends on how you've configured your system, but I'm guessing the markup is due to the (con

Re: [ceph-users] Cache tier pool in CephFS

2014-06-19 Thread Gregory Farnum
my PGs are clean+active! By the way, I disabled CephX. > > Thanks in advance, > Sherry > > > > > On Thursday, June 19, 2014 3:16 AM, Gregory Farnum > wrote: > > > On Wed, Jun 18, 2014 at 12:54 AM, Sherry Shahbazi > wrote: > > Hi everyone, > > >

Re: [ceph-users] switch pool from replicated to erasure coded

2014-06-19 Thread Gregory Farnum
On Thursday, June 19, 2014, Pavel V. Kaygorodov wrote: > Hi! > > May be I have missed something in docs, but is there a way to switch a > pool from replicated to erasure coded? No. > Or I have to create a new pool an somehow manually transfer data from old > pool to new one? Yes. Please kee

Re: [ceph-users] understanding rados df statistics

2014-06-19 Thread Gregory Farnum
gement that we should be? > > > > > > George > > > > *From:* Gregory Farnum [mailto:g...@inktank.com > ] > *Sent:* 19 June 2014 13:53 > *To:* Ryall, George (STFC,RAL,SC) > *Cc:* ceph-users@lists.ceph.com > > *Subject:* Re: [ceph-users] understanding ra

Re: [ceph-users] Taking down one OSD node (10 OSDs) for maintenance - best practice?

2014-06-19 Thread Gregory Farnum
No, you definitely don't need to shut down the whole cluster. Just do a polite shutdown of the daemons, optionally with the noout flag that Wido mentioned. Software Engineer #42 @ http://inktank.com | http://ceph.com On Thu, Jun 19, 2014 at 1:55 PM, Alphe Salas Michels wrote: > Hello, the best p
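A hedged sketch of that procedure (how you stop the daemons depends on your distribution's init system):
  ceph osd set noout
  # stop the ceph-osd daemons on the node, do the maintenance, start them again
  ceph osd unset noout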

Re: [ceph-users] Level DB with RADOS

2014-06-23 Thread Gregory Farnum
> RADOS code in which OMAP uses LevelDB. I am a newbie hence the question. > > > On Wed, Jun 18, 2014 at 7:28 PM, Gregory Farnum wrote: >> >> On Wed, Jun 18, 2014 at 9:14 PM, Shesha Sreenivasamurthy >> wrote: >> > I am doing some research work at UCSC and w

Re: [ceph-users] Multiple hierarchies and custom placement

2014-06-23 Thread Gregory Farnum
On Fri, Jun 20, 2014 at 4:23 PM, Shayan Saeed wrote: > Is it allowed for crush maps to have multiple hierarchies for different > pools. So for example, I want one pool to treat my cluster as flat with > every host being equal but the other pool to have a more hierarchical idea > as hosts->racks->r

Re: [ceph-users] Deep scrub versus osd scrub load threshold

2014-06-23 Thread Gregory Farnum
Looks like it's a doc error (at least on master), but it might have changed over time. If you're running Dumpling we should change the docs. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Sun, Jun 22, 2014 at 10:18 PM, Christian Balzer wrote: > > Hello, > > This weekend I

Re: [ceph-users] trying to interpret lines in osd.log

2014-06-23 Thread Gregory Farnum
On Mon, Jun 23, 2014 at 4:26 AM, Christian Kauhaus wrote: > I see several instances of the following log messages in the OSD logs each > day: > > 2014-06-21 02:05:27.740697 7fbc58b78700 0 -- 172.22.8.12:6810/31918 >> > 172.22.8.12:6800/28827 pipe(0x7fbe400029f0 sd=764 :6810 s=0 pgs=0 cs=0 l=0 >

Re: [ceph-users] Behaviour of ceph pg repair on different replication levels

2014-06-23 Thread Gregory Farnum
On Mon, Jun 23, 2014 at 4:54 AM, Christian Eichelmann wrote: > Hi ceph users, > > since our cluster had a few inconsistent pgs in the last time, i was > wondering what ceph pg repair does, depending on the replication level. > So I just wanted to check if my assumptions are correct: > > Replicatio

Re: [ceph-users] Multiple hierarchies and custom placement

2014-06-24 Thread Gregory Farnum
; > On Mon, Jun 23, 2014 at 2:14 PM, Gregory Farnum > wrote: > >> On Fri, Jun 20, 2014 at 4:23 PM, Shayan Saeed > > wrote: >> > Is it allowed for crush maps to have multiple hierarchies for different >> > pools. So for example, I want one pool to treat my cluste

Re: [ceph-users] Continuing placement group problems

2014-06-25 Thread Gregory Farnum
You probably want to look at the central log (on your monitors) and see exactly what scrub errors it's reporting. There might also be useful info if you dump the pg info on the inconsistent PGs. But if you're getting this frequently, you're either hitting some unknown issues with the OSDs around so

Re: [ceph-users] Problem with RadosGW and special characters

2014-06-25 Thread Gregory Farnum
Unfortunately Yehuda's out for a while as he could best handle this, but it sounds familiar so I think you probably want to search the list archives and the bug tracker (http://tracker.ceph.com/projects/rgw). What version precisely are you on? -Greg Software Engineer #42 @ http://inktank.com | http

Re: [ceph-users] Behaviour of ceph pg repair on different replication levels

2014-06-25 Thread Gregory Farnum
On Wed, Jun 25, 2014 at 12:22 AM, Christian Kauhaus wrote: > Am 23.06.2014 20:24, schrieb Gregory Farnum: >> Well, actually it always takes the primary copy, unless the primary >> has some way of locally telling that its version is corrupt. (This >> might happen if the pri

Re: [ceph-users] Why is librbd1 / librados2 from Firefly 20% slower than the one from dumpling?

2014-06-25 Thread Gregory Farnum
Sorry we let this drop; we've all been busy traveling and things. There have been a lot of changes to librados between Dumpling and Firefly, but we have no idea what would have made it slower. Can you provide more details about how you were running these tests? -Greg Software Engineer #42 @ http:/

Re: [ceph-users] Difference between "ceph osd reweight" and "ceph osd crush reweight"

2014-06-26 Thread Gregory Farnum
On Thu, Jun 26, 2014 at 7:03 AM, Micha Krause wrote: > Hi, > > could someone explain to me what the difference is between > > ceph osd reweight > > and > > ceph osd crush reweight "ceph osd crush reweight" sets the CRUSH weight of the OSD. This weight is an arbitrary value (generally the size of
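For illustration (IDs and weights are placeholders):
  # CRUSH weight: the permanent, capacity-style weight (often disk size in TB)
  ceph osd crush reweight osd.3 1.82
  # osd reweight: a temporary 0..1 override, e.g. to drain an overfull OSD
  ceph osd reweight 3 0.8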

Re: [ceph-users] Continuing placement group problems

2014-06-26 Thread Gregory Farnum
On Thu, Jun 26, 2014 at 12:52 PM, Kevin Horan wrote: > I am also getting inconsistent object errors on a regular basis, about 1-2 > every week or so for about 300GB of data. All OSDs are using XFS > filesystems. Some OSDs are individual 3TB internal hard drives and some are > external FC attached

Re: [ceph-users] Difference between "ceph osd reweight" and "ceph osd crush reweight"

2014-06-27 Thread Gregory Farnum
Yep, definitely use "osd crush reweigh" for your permanent data placement. Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Jun 27, 2014 at 12:13 AM, Micha Krause wrote: > Hi, > > >> "ceph osd crush reweight" sets the CRUSH weight of the OSD. This >> weight is an arbitrary va

Re: [ceph-users] OSD Data not evenly distributed

2014-06-28 Thread Gregory Farnum
Did you also increase the "pgp_num"? On Saturday, June 28, 2014, Jianing Yang wrote: > Actually, I did increase the PG number to 32768 (120 osds) and I also use > "tunable optimal". But the data is still not distributed evenly. > > > On Sun, Jun 29, 2014 at 3:42 AM, Konrad Gutkowski > wrote: >> Hi, >
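To make the point concrete (pool name and counts are placeholders), pgp_num must follow pg_num before CRUSH actually spreads the split PGs:
  ceph osd pool set mypool pg_num 32768
  ceph osd pool set mypool pgp_num 32768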

Re: [ceph-users] OSD backfill full tunings

2014-06-30 Thread Gregory Farnum
It looks like that value isn't live-updateable, so you'd need to restart after changing the daemon's config. Sorry! Made a ticket: http://tracker.ceph.com/issues/8695 -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Jun 30, 2014 at 12:41 AM, Kostis Fardelas wrote: > Hi,

Re: [ceph-users] CephFS : directory sharding ?

2014-06-30 Thread Gregory Farnum
Directory sharding is even less stable than the rest of the MDS, but if you need it I have some hope that things will work. You just need to set the "mds bal frag" option to "true". You can configure the limits as well; see the options following: https://github.com/ceph/ceph/blob/master/src/commo
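As a sketch, the option goes in the MDS section of ceph.conf:
  [mds]
  mds bal frag = true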

Re: [ceph-users] Some OSD and MDS crash

2014-06-30 Thread Gregory Farnum
What's the backtrace from the crashing OSDs? Keep in mind that as a dev release, it's generally best not to upgrade to unnamed versions like 0.82 (but it's probably too late to go back now). -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Jun 30, 2014 at 8:06 AM, Pierr

Re: [ceph-users] CephFS : directory sharding ?

2014-06-30 Thread Gregory Farnum
it's running to force fragments into a specific MDS. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Jun 30, 2014 at 8:51 AM, Florent B wrote: > Ok thank you. So it is not possible to set a specific directory assigned > to a MDS ? > > On 06/30/201

Re: [ceph-users] OSD backfill full tunings

2014-06-30 Thread Gregory Farnum
ble (0.80.1). It may be that > during recovery OSDs are currently backfilling other pgs, so stats are > not updated (because pg were not tried to backfill after setting change). > > On 2014.06.30 18:31, Gregory Farnum wrote: >> It looks like that value isn't live-updateable, so y

Re: [ceph-users] iscsi and cache pool

2014-07-01 Thread Gregory Farnum
It looks like you're using a kernel RBD mount in the second case? I imagine your kernel doesn't support caching pools and you'd need to upgrade for it to work. -Greg On Tuesday, July 1, 2014, Никитенко Виталий wrote: > Good day! > I have a server with Ubuntu 14.04 and installed ceph firefly. Config

Re: [ceph-users] HEALTH_WARN active+degraded on fresh install CENTOS 6.5

2014-07-01 Thread Gregory Farnum
What's the output of "ceph osd map"? Your CRUSH map probably isn't trying to segregate properly, with 2 hosts and 4 OSDs each. Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Jul 1, 2014 at 11:22 AM, Brian Lovett wrote: > I'm pulling my hair out with ceph. I am testing thin

Re: [ceph-users] HEALTH_WARN active+degraded on fresh install CENTOS 6.5

2014-07-01 Thread Gregory Farnum
SDs report each other down much more quickly (~30s) than the monitor timeout (~15 minutes). They'd get marked down eventually. On Tue, Jul 1, 2014 at 11:43 AM, Brian Lovett wrote: > Gregory Farnum writes: > >> >> What's the output of "ceph osd map"? >> >

Re: [ceph-users] HEALTH_WARN active+degraded on fresh install CENTOS 6.5

2014-07-01 Thread Gregory Farnum
On Tue, Jul 1, 2014 at 11:45 AM, Gregory Farnum wrote: > On Tue, Jul 1, 2014 at 11:33 AM, Brian Lovett > wrote: >> Brian Lovett writes: >> >> >> I restarted all of the osd's and noticed that ceph shows 2 osd's up even if >> the servers are complet

Re: [ceph-users] HEALTH_WARN active+degraded on fresh install CENTOS 6.5

2014-07-01 Thread Gregory Farnum
On Tue, Jul 1, 2014 at 11:57 AM, Brian Lovett wrote: > Gregory Farnum writes: > >> ...and one more time, because apparently my brain's out to lunch today: >> >> ceph osd tree >> >> *sigh* >> > > haha, we all have those days. > > [root

Re: [ceph-users] HEALTH_WARN active+degraded on fresh install CENTOS 6.5

2014-07-01 Thread Gregory Farnum
On Tue, Jul 1, 2014 at 1:26 PM, Brian Lovett wrote: > "profile": "bobtail", Okay. That's unusual. What's the oldest client you need to support, and what Ceph version are you using? You probably want to set the crush tunables to "optimal"; the "bobtail" ones are going to have all kinds of is
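The relevant commands, as a hedged sketch:
  # switch to the current recommended tunables (this triggers data movement)
  ceph osd crush tunables optimal
  # or, if old kernel clients must be supported, silence the warning in ceph.conf:
  #   mon warn on legacy crush tunables = false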

Re: [ceph-users] Why is librbd1 / librados2 from Firefly 20% slower than the one from dumpling?

2014-07-01 Thread Gregory Farnum
On Thu, Jun 26, 2014 at 11:49 PM, Stefan Priebe - Profihost AG wrote: > Hi Greg, > > Am 26.06.2014 02:17, schrieb Gregory Farnum: >> Sorry we let this drop; we've all been busy traveling and things. >> >> There have been a lot of changes to librados between Dumplin

Re: [ceph-users] iscsi and cache pool

2014-07-01 Thread Gregory Farnum
com | http://ceph.com On Tue, Jul 1, 2014 at 5:44 PM, Никитенко Виталий wrote: > Hi! > > Is there some option in the kernel which must be enabled, or should I just upgrade > to the latest version of the kernel? I use 3.13.0-24 > > Thanks > > 01.07.2014, 20:17, "Gregory Farnum"

Re: [ceph-users] Why is librbd1 / librados2 from Firefly 20% slower than the one from dumpling?

2014-07-02 Thread Gregory Farnum
;t any counters. As this mail was some days unseen - i > thought nobody has an idea or could help. > > Stefan > >> On Wed, Jul 2, 2014 at 9:01 PM, Stefan Priebe - Profihost AG >> wrote: >>> Am 02.07.2014 00:51, schrieb Gregory Farnum: >>>> On Thu

Re: [ceph-users] Issues upgrading from 0.72.x (emperor) to 0.81.x (firefly)

2014-07-02 Thread Gregory Farnum
On Wed, Jul 2, 2014 at 6:18 AM, Sylvain Munaut wrote: > Hi, > > > I'm having a couple of issues during this update. On the test cluster > it went fine, but when running it on production I have a few issues. > (I guess there is some subtle difference I missed, I updated the test > one back when emp

Re: [ceph-users] Why is librbd1 / librados2 from Firefly 20% slower than the one from dumpling?

2014-07-02 Thread Gregory Farnum
On Wed, Jul 2, 2014 at 12:00 PM, Stefan Priebe wrote: > > Am 02.07.2014 16:00, schrieb Gregory Farnum: > >> Yeah, it's fighting for attention with a lot of other urgent stuff. :( >> >> Anyway, even if you can't look up any details or reproduce at this >

Re: [ceph-users] Why is librbd1 / librados2 from Firefly 20% slower than the one from dumpling?

2014-07-02 Thread Gregory Farnum
On Wed, Jul 2, 2014 at 12:44 PM, Stefan Priebe wrote: > Hi Greg, > > Am 02.07.2014 21:36, schrieb Gregory Farnum: >> >> On Wed, Jul 2, 2014 at 12:00 PM, Stefan Priebe >> wrote: >>> >>> >>> Am 02.07.2014 16:00, schrieb Gregory Farnum: >>

Re: [ceph-users] RGW performance test , put 30 thousands objects to one bucket, average latency 3 seconds

2014-07-03 Thread Gregory Farnum
It looks like you're just putting in data faster than your cluster can handle (in terms of IOPS). The first big hole (queue_op_wq->reached_pg) is it sitting in a queue and waiting for processing. The second parallel blocks are 1) write_thread_in_journal_buffer->journaled_completion_queued, and that

Re: [ceph-users] Pools do not respond

2014-07-03 Thread Gregory Farnum
The PG in question isn't being properly mapped to any OSDs. There's a good chance that those trees (with 3 OSDs in 2 hosts) aren't going to map well anyway, but the immediate problem should resolve itself if you change the "choose" to "chooseleaf" in your rules. -Greg Software Engineer #42 @ http:/
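A hedged before/after of the rule step Greg refers to:
  # before: picks host buckets but never descends to an OSD
  #   step choose firstn 0 type host
  # after: descends to one OSD (leaf) under each chosen host
  #   step chooseleaf firstn 0 type host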

Re: [ceph-users] Bypass Cache-Tiering for special reads (Backups)

2014-07-03 Thread Gregory Farnum
On Wed, Jul 2, 2014 at 3:06 PM, Marc wrote: > Hi, > > I was wondering, having a cache pool in front of an RBD pool is all fine > and dandy, but imagine you want to pull backups of all your VMs (or one > of them, or multiple...). Going to the cache for all those reads isn't > only pointless, it'll

Re: [ceph-users] why lock th whole osd handle thread

2014-07-03 Thread Gregory Farnum
On Thu, Jul 3, 2014 at 8:24 AM, baijia...@126.com wrote: > when I see the function "OSD::OpWQ::_process ". I find pg lock locks the > whole function. so when I use multi-thread write the same object , so are > they must > serialize from osd handle thread to journal write thread ? It's serialized

Re: [ceph-users] Pools do not respond

2014-07-03 Thread Gregory Farnum
On Thu, Jul 3, 2014 at 11:17 AM, Iban Cabrillo wrote: > Hi Gregory, > Thanks a lot, I begin to understand how ceph works. > I added a couple of osd servers, and balanced the disks between them. > > > [ceph@cephadm ceph-cloud]$ sudo ceph osd tree > # id    weight  type name    up/down    reweight

Re: [ceph-users] Error initializing cluster client: Error

2014-07-07 Thread Gregory Farnum
Do you have a ceph.conf file that the "ceph" tool can access in a known location? Try specifying it manually with the "-c ceph.conf" argument. You can also add "--debug-ms 1, --debug-monc 10" and see if it outputs more useful error logs. -Greg Software Engineer #42 @ http://inktank.com | http://cep
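Concretely, something like (the config path is a placeholder):
  ceph -c /etc/ceph/ceph.conf --debug-ms 1 --debug-monc 10 status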

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread Gregory Farnum
What was the exact sequence of events — were you rebalancing when you did the upgrade? Did the marked out OSDs get upgraded? Did you restart all the monitors prior to changing the tunables? (Are you *sure*?) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Sat, Jul 5, 2014 at

Re: [ceph-users] emperor -> firefly : Significant increase in RAM usage

2014-07-07 Thread Gregory Farnum
We don't test explicitly for this, but I'm surprised to hear about a jump of that magnitude. Do you have any more detailed profiling? Can you generate some? (With the tcmalloc heap dumps.) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Jul 7, 2014 at 3:03 AM, Sylvain Mu

Re: [ceph-users] clear active+degraded pgs

2014-07-07 Thread Gregory Farnum
CRUSH is a probabilistic algorithm. By having all those non-existent OSDs in the map, you made it so that 10/12 attempts at mapping would fail and need to be retried. CRUSH handles a lot of retries, but not enough for that to work out well. -Greg Software Engineer #42 @ http://inktank.com | http://
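A hedged sketch of cleaning those nonexistent OSDs out of the map (osd.7 is a placeholder):
  ceph osd crush remove osd.7
  ceph auth del osd.7
  ceph osd rm 7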

Re: [ceph-users] Temporary degradation when adding OSD's

2014-07-07 Thread Gregory Farnum
On Mon, Jul 7, 2014 at 7:03 AM, Erik Logtenberg wrote: > Hi, > > If you add an OSD to an existing cluster, ceph will move some existing > data around so the new OSD gets its respective share of usage right away. > > Now I noticed that during this moving around, ceph reports the relevant > PG's as

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread Gregory Farnum
Okay. Based on your description I think the reason for the tunables crashes is that either the "out" OSDs, or possibly one of the monitors, never got restarted. You should be able to update the tunables now, if you want to. (Or there's also a config option that will disable the warning; check the r

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread Gregory Farnum
On Mon, Jul 7, 2014 at 4:21 PM, James Harper wrote: >> >> Okay. Based on your description I think the reason for the tunables >> crashes is that either the "out" OSDs, or possibly one of the >> monitors, never got restarted. You should be able to update the >> tunables now, if you want to. (Or the

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread Gregory Farnum
You can look at which OSDs the PGs map to. If the PGs have insufficient replica counts they'll report as degraded in "ceph -s" or "ceph -w". Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Jul 7, 2014 at 4:30 PM, James Harper wrote: >> >> It sounds like maybe you've got a ba

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread Gregory Farnum
On Mon, Jul 7, 2014 at 4:39 PM, James Harper wrote: >> >> You can look at which OSDs the PGs map to. If the PGs have >> insufficient replica counts they'll report as degraded in "ceph -s" or >> "ceph -w". > > I meant in a general sense. If I have a pg that I suspect might be > insufficiently redu

Re: [ceph-users] scrub error on firefly

2014-07-07 Thread Gregory Farnum
It's not very intuitive or easy to look at right now (there are plans from the recent developer summit to improve things), but the central log should have output about exactly what objects are busted. You'll then want to compare the copies manually to determine which ones are good or bad, get the g

Re: [ceph-users] Throttle pool pg_num/pgp_num increase impact

2014-07-08 Thread Gregory Farnum
The impact won't be 300 times bigger, but it will be bigger. There are two things impacting your cluster here 1) the initial "split" of the affected PGs into multiple child PGs. You can mitigate this by stepping through pg_num at small multiples. 2) the movement of data to its new location (when yo
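A hedged illustration of stepping through it (pool name and sizes are placeholders):
  # raise pg_num by a small multiple and wait for PG creation/peering to settle
  ceph osd pool set volumes pg_num 2048
  # then let CRUSH start placing the new PGs
  ceph osd pool set volumes pgp_num 2048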

Re: [ceph-users] Throttle pool pg_num/pgp_num increase impact

2014-07-08 Thread Gregory Farnum
On Tue, Jul 8, 2014 at 10:14 AM, Dan Van Der Ster wrote: > Hi Greg, > We're also due for a similar splitting exercise in the not too distant > future, and will also need to minimize the impact on latency. > > In addition to increasing pg_num in small steps and using a minimal > max_backfills/recov
