On Thu, May 22, 2014 at 5:04 AM, Geert Lindemulder wrote:
> Hello All
>
> Trying to implement the osd leveldb backend at an existing ceph test
> cluster.
> The test cluster was updated from 0.72.1 to 0.80.1. The update was ok.
> After the update, the "osd objectstore = keyvaluestore-dev" setting w
On Thu, May 22, 2014 at 4:09 AM, Kenneth Waegeman
wrote:
>
> - Message from Gregory Farnum -
>Date: Wed, 21 May 2014 15:46:17 -0700
>
> From: Gregory Farnum
> Subject: Re: [ceph-users] Expanding pg's of an erasure coded pool
> To: Kenneth Waege
On Tue, May 27, 2014 at 9:55 AM, Ignazio Cassano
wrote:
> Hi all,
> I have read a lot of email messages and I am confused because in some the
> public network in /etc/ceph/ceph.conf is written like:
> public_network = a.b.c.d/netmask
> in others like :
>
> public network = a.b.c.d/netmask
These are equivalent.
On Sun, May 25, 2014 at 6:24 PM, Guang Yang wrote:
> On May 21, 2014, at 1:33 AM, Gregory Farnum wrote:
>
>> This failure means the messenger subsystem is trying to create a
>> thread and is getting an error code back — probably due to a process
>> or system thread li
Note that while the "repair" command *will* return your cluster to
consistency, it is not guaranteed to restore the data you want to see
there — in general, it will simply put the primary OSD's view of the
world on the replicas. If you have a massive inconsistency like that,
you probably want to fi
r more disks? Or is the most common cause of
> inconsistency most likely to not affect the primary?
>
> -Michael
>
>
> On 27/05/2014 23:55, Gregory Farnum wrote:
>>
>> Note that while the "repair" command *will* return your cluster to
>> consistency, it i
On Friday, May 30, 2014, Ignazio Cassano wrote:
> Hi all,
> I am testing ceph because I found it is very interesting as far as remote
> block
> device is concerned.
> But my company is very interested in big data.
> So I read something about hadoop and ceph integration.
> Anyone can suggest me so
Depending on what level of verification you need, you can just do a "ceph
pg dump" and look to see which OSDs host every PG. If you want to
demonstrate replication to a skeptical audience, sure, turn off the
machines and show that data remains accessible.
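(For reference, a minimal sketch of that check; the pool and object names here are just examples:)
  ceph pg dump pgs_brief        # one line per PG: pgid, state, up set, acting set
  ceph osd map rbd some-object  # which PG and OSDs a single object maps to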
-Greg
On Friday, May 30, 2014, wrote:
>
On Wed, Jun 4, 2014 at 7:58 AM, Sylvain Munaut
wrote:
> Hi,
>
>
> During a multipart upload you can't upload parts smaller than 5M, and
> radosgw also slices objects into 4M chunks. Having those two be
> different is a bit unfortunate, because if you slice your files at the
> minimum chunk size
On Thu, Jun 5, 2014 at 4:38 AM, Dennis Kramer wrote:
> Hi all,
>
> A couple of weeks ago I upgraded from Emperor to Firefly.
> I'm using CloudStack with Ceph as the storage backend for VMs and templates.
Which versions exactly were you and are you running?
>
> Since the upgrade, ceph is in a HE
I don't believe that should cause any issues; the chunk sizes are in
the metadata.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Thu, Jun 5, 2014 at 12:23 AM, Sylvain Munaut
wrote:
> Hello,
>
>> Huh. We took the 5MB limit from S3, but it definitely is unfortunate
>> in co
There's some prefetching and stuff, but the rbd library and RADOS storage
are capable of issuing reads and writes in any size (well, down to the
minimal size of the underlying physical disk).
There are some scenarios where you will see it writing a lot more if you
use layering -- promotion of data
Snapshots are disabled by default; there's a command you can run to
enable them if you want, but the reason they're disabled is because
they're significantly more likely to break your filesystem than
anything else is!
ceph mds set allow_new_snaps true
-Greg
Software Engineer #42 @ http://inktank.co
I haven't used ceph-deploy to do this much, but I think you need to
"prepare" before you "activate" and it looks like you haven't done so.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Fri, Jun 6, 2014 at 3:54 PM, Jonathan Gowar wrote:
> Assitance really appreciated. Thi
Barring a newly-introduced bug (doubtful), that assert basically means
that your computer lied to the ceph monitor about the durability or
ordering of data going to disk, and the store is now inconsistent. If
you don't have data you care about on the cluster, by far your best
option is:
1) Figure o
On Mon, Jun 9, 2014 at 3:22 PM, Craig Lewis wrote:
> I've correlated a large deep scrubbing operation to cluster stability
> problems.
>
> My primary cluster does a small amount of deep scrubs all the time, spread
> out over the whole week. It has no stability problems.
>
> My secondary cluster d
On Mon, Jun 9, 2014 at 6:42 PM, Mike Dawson wrote:
> Craig,
>
> I've struggled with the same issue for quite a while. If your i/o is similar
> to mine, I believe you are on the right track. For the past month or so, I
> have been running this cronjob:
>
> * * * * * for strPg in `ceph pg dump
Hey Mike, has your manual scheduling resolved this? I think I saw
another similar-sounding report, so a feature request to improve scrub
scheduling would be welcome. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, May 20, 2014 at 5:46 PM, Mike Dawson wrote:
> I tend
On Tue, May 27, 2014 at 7:44 PM, Plato wrote:
> For a certain security requirement, I need to make sure the data finally saved to
> disk is encrypted.
> So I'm trying to write a RADOS class which would be hooked into the read
> and write process.
> That is, before data is written, the encryption method of
h250' but it seems the monitor failed while reading 'auth1'. Is
> this normal?
> As a side note, I did not use cephx in this cluster.
>
> Thanks,
>
>
> 2014-06-09 22:11 GMT+04:30 Gregory Farnum :
>>
>> Barring a newly-introduced bug (doubtful),
On Wednesday, June 11, 2014, Florent B wrote:
> Hi every one,
>
> Sometimes my MDS crashes... sometimes after a few hours, sometimes after
> a few days.
>
> I know I could enable debugging and so on to get more information. But
> if it crashes after a few days, it generates gigabytes of debugging
On Wed, Jun 11, 2014 at 4:56 AM, wrote:
> Hi All,
>
>
>
> I have a four-node Ceph cluster. The metadata service is showing as degraded
> in health. How do I remove the MDS service from Ceph?
Unfortunately you can't remove it entirely right now, but if you
create a new filesystem using the "newfs"
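(At the time that looked roughly like the following; the pool IDs are placeholders, and the command throws away the existing filesystem metadata:)
  ceph mds newfs <metadata-pool-id> <data-pool-id> --yes-i-really-mean-it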
On Wed, Jun 11, 2014 at 5:18 AM, Davide Fanciola wrote:
> Hi,
>
> we have a similar setup where we have SSD and HDD in the same hosts.
> Our very basic crushmap is configured as follows:
>
> # ceph osd tree
> # id weight type name up/down reweight
> -6 3 root ssd
> 3 1 osd.3 up 1
> 4 1 osd.4 up 1
On Wed, Jun 11, 2014 at 12:44 PM, Alexandre DERUMIER
wrote:
> Hi,
>
> I'm reading tiering doc here
> http://ceph.com/docs/firefly/dev/cache-pool/
>
> "
> The hit_set_count and hit_set_period define how much time each HitSet should
> cover, and how many such HitSets to store. Binning accesses over
er to cache tier? (cache-mode writeback)
> Does any read on the base tier promote the object into the cache tier?
> Or are there also statistics on the base tier?
>
> (I ask because I have cold data, but I have full backup
> jobs running each week, reading all these col
On Thu, Jun 12, 2014 at 2:21 AM, VELARTIS Philipp Dürhammer
wrote:
> Hi,
>
> Will Ceph support mixing different disk pools (for example spinners and SSDs)
> a little better (more safely) in the future?
There are no immediate plans to do so, but this is an extension to the
CRUSH language that we're
You probably just want to increase the ulimit settings. You can change the
OSD setting, but that only covers file descriptors against the backing
store, not sockets for network communication -- the latter is more often
the one that runs out.
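(A minimal sketch; the value is only an example and how you apply it depends on your init system:)
  # in ceph.conf, picked up as the daemon's file descriptor limit
  [global]
      max open files = 131072
  # or raise it for the shell/service that launches the daemons
  ulimit -n 131072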
-Greg
On Thursday, June 12, 2014, Christian Kauhaus wr
You can set up pools which have all their primaries in one data
center, and point the clients at those pools. But writes will still
have to traverse the network link because Ceph does synchronous
replication for strong consistency.
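(A sketch of what such a pool's rule could look like in the decompiled CRUSH map -- the dc1/dc2 bucket names are hypothetical; the first OSD chosen becomes the primary:)
  rule primary_in_dc1 {
      ruleset 5
      type replicated
      min_size 1
      max_size 10
      step take dc1
      step chooseleaf firstn 1 type host
      step emit
      step take dc2
      step chooseleaf firstn -1 type host
      step emit
  }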
If you want them to both write to the same pool, but use local OSD
To be clear, that's the solution to one of the causes of this issue.
The log message is very general, and just means that a disk access
thread has been gone for a long time (15 seconds, in this case)
without checking in (so usually, it's been inside of a read/write
syscall for >=15 seconds).
Other
The OSD should have logged the identities of the inconsistent objects
to the central log on the monitors, as well as to its own local log
file. You'll need to identify for yourself which version is correct,
which will probably involve going and looking at them inside each
OSD's data store. If the p
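(One way to do that comparison on filestore OSDs; the PG id, object name, and OSD id here are hypothetical:)
  find /var/lib/ceph/osd/ceph-3/current/2.1f_head/ -name '*myobject*' -exec md5sum {} \;
  # repeat on each replica's host and compare the checksums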
I don't know anybody who makes much use of "make install", so it's
probably not putting the init system scripts into place. So make sure
they aren't there, copy them from the source tree, and try again?
Patches to fix are welcome! :)
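(A sketch of the manual workaround for a sysvinit system, assuming a source checkout in ./ceph; paths are examples:)
  cp ceph/src/init-ceph /etc/init.d/ceph
  chmod +x /etc/init.d/ceph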
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.co
The OSD did a read off of the local filesystem and it got back the EIO
error code. That means the store got corrupted or something, so it
killed itself to avoid spreading bad data to the rest of the cluster.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Fri, Jun 13, 2014 a
ah.
>
> What's best practice when the store is corrupted like this?
Remove the OSD from the cluster, and either reformat the disk or
replace as you judge appropriate.
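(The usual removal sequence, with a hypothetical OSD id:)
  ceph osd out osd.7           # let data drain off it
  # stop the daemon once rebalancing finishes, then:
  ceph osd crush remove osd.7
  ceph auth del osd.7
  ceph osd rm 7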
-Greg
>
> Cheers,
> Josef
>
> Gregory Farnum wrote 2014-06-14 02:21:
>
>> The OSD did a r
| http://ceph.com
>
> It is still unclear where these inconsistencies (i.e. missing objects
> / empty directories) result from, see also:
> http://tracker.ceph.com/issues/8532.
>
> On Fri, Jun 13, 2014 at 4:58 AM, Gregory Farnum wrote:
>> The OSD should have logged the i
On Mon, Jun 16, 2014 at 11:11 AM, Aaron Ten Clay wrote:
> I would also like to see Ceph get smarter about inconsistent PGs. If we
> can't automate the repair, at least the "ceph pg repair" command should
> figure out which copy is correct and use that, instead of overwriting all
> OSDs with whatev
Try running "ceph health detail" on each of the monitors. Your disk space
thresholds probably aren't configured correctly or something.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic
wrote:
> Hi,
>
> thanks for that, but is not
You probably have sparse objects from RBD. The PG statistics are built
off of file sizes, but the total used space comes from df
output.
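(You can see the effect on a filestore OSD by comparing apparent size with allocated blocks; the path is the default data dir for a hypothetical osd.0:)
  du -sh --apparent-size /var/lib/ceph/osd/ceph-0/current
  du -sh /var/lib/ceph/osd/ceph-0/current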
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Jun 16, 2014 at 7:34 PM, Christian Balzer wrote:
>
> Hello,
>
> this is is
On Tue, Jun 17, 2014 at 3:22 AM, Ke-fei Lin wrote:
> Hi list,
>
> How does RADOS check an object and its replica are consistent? Is there
> a checksum in object's metadata or some other mechanisms? Does the
> mechanism depend on
> OSD's underlying file system?
It does not check consistency on rea
On Tue, Jun 17, 2014 at 5:00 AM, Florent B wrote:
> Hi all,
>
> I would like to know if I can add a private network to my running Ceph
> cluster?
>
> And how do I proceed? Do I add the config to ceph.conf, then restart the OSDs?
> So some OSDs will have both networks and others not.
Yeah. As long as t
It's unlikely to be the issue, but you might check the times on your OSDs.
cephx is clock-sensitive if you're off by more than an hour or two.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Jun 17, 2014 at 8:30 AM, Fred Yang wrote:
> What's strange is OSD rebalance
have the following in
> each ceph.conf file, under the general section:
>
> mon data avail warn = 15
> mon data avail crit = 5
>
> I found these settings on the ceph mailing list...
>
> Thanks a lot,
> Andrija
>
>
> On 17 June 2014 19:22, Gregory Farnum wrote:
>>
route to the monitors.
>
> Do the monitors need a restart?
Not from Ceph's perspective!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> On 06/17/2014 07:29 PM, Gregory Farnum wrote:
>> On Tue, Jun 17, 2014 at 5:00 AM, Florent B wrote:
>>> Hi al
On Tue, Jun 17, 2014 at 9:46 PM, Ke-fei Lin wrote:
> 2014-06-18 1:28 GMT+08:00 Gregory Farnum :
>> On Tue, Jun 17, 2014 at 3:22 AM, Ke-fei Lin wrote:
>>> Hi list,
>>>
>>> How does RADOS check an object and its replica are consistent? Is there
>>>
On Wed, Jun 18, 2014 at 12:54 AM, Sherry Shahbazi wrote:
> Hi everyone,
>
> If I have a pool called cold-storage (1) and a pool called hot-storage (2),
> where hot-storage is a cache tier for the cold-storage pool.
>
> I normally do the following in order to map a directory on my client to a
> pool.
>
>
Yeah, the OSDs connect to the monitors over the OSD's public address.
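(For reference, the two networks as they would appear in ceph.conf; the subnets are hypothetical:)
  [global]
      public network  = 192.168.1.0/24
      cluster network = 10.10.0.0/24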
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Jun 18, 2014 at 11:37 AM, Florent B wrote:
> On 06/18/2014 04:34 PM, Gregory Farnum wrote:
>> On Tue, Jun 17, 2014 at 4:08 PM, Florent B wr
On Wed, Jun 18, 2014 at 12:07 PM, Ke-fei Lin wrote:
> 2014-06-18 22:44 GMT+08:00 Gregory Farnum :
>> On Tue, Jun 17, 2014 at 9:46 PM, Ke-fei Lin wrote:
>>> 2014-06-18 1:28 GMT+08:00 Gregory Farnum :
>>>> On Tue, Jun 17, 2014 at 3:22 AM, Ke-fei Lin wrote:
>&
On Wed, Jun 18, 2014 at 9:14 PM, Shesha Sreenivasamurthy
wrote:
> I am doing some research work at UCSC and wanted to use LevelDB to store OMAP
> key/value pairs. What is the best way to start playing with it? I am a
> newbie to the RADOS/Ceph code. Can anyone point me in the right direction?
I'm not
The total used/available/capacity is calculated by running the syscall
which "df" uses across all OSDs and summing the results. The "total data"
is calculated by summing the sizes of the objects stored.
It depends on how you've configured your system, but I'm guessing the
markup is due to the (con
my PGs are clean+active! By the way, I disabled CephX.
>
> Thanks in advance,
> Sherry
>
>
>
>
> On Thursday, June 19, 2014 3:16 AM, Gregory Farnum wrote:
>
>
> On Wed, Jun 18, 2014 at 12:54 AM, Sherry Shahbazi wrote:
> > Hi everyone,
> >
>
On Thursday, June 19, 2014, Pavel V. Kaygorodov wrote:
> Hi!
>
> Maybe I have missed something in the docs, but is there a way to switch a
> pool from replicated to erasure coded?
No.
> Or do I have to create a new pool and somehow manually transfer data from the old
> pool to the new one?
Yes. Please kee
gement that we should be?
>
>
>
>
>
> George
>
>
>
> *From:* Gregory Farnum [mailto:g...@inktank.com]
> *Sent:* 19 June 2014 13:53
> *To:* Ryall, George (STFC,RAL,SC)
> *Cc:* ceph-users@lists.ceph.com
>
> *Subject:* Re: [ceph-users] understanding ra
No, you definitely don't need to shut down the whole cluster. Just do
a polite shutdown of the daemons, optionally with the noout flag that
Wido mentioned.
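(For reference:)
  ceph osd set noout
  # stop the daemons, do the maintenance, start them again, then:
  ceph osd unset noout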
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Thu, Jun 19, 2014 at 1:55 PM, Alphe Salas Michels wrote:
> Hello, the best p
> RADOS code in which OMAP uses LevelDB. I am a newbie hence the question.
>
>
> On Wed, Jun 18, 2014 at 7:28 PM, Gregory Farnum wrote:
>>
>> On Wed, Jun 18, 2014 at 9:14 PM, Shesha Sreenivasamurthy
>> wrote:
>> > I am doing some research work at UCSC and w
On Fri, Jun 20, 2014 at 4:23 PM, Shayan Saeed wrote:
> Is it allowed for CRUSH maps to have multiple hierarchies for different
> pools? So for example, I want one pool to treat my cluster as flat with
> every host being equal but the other pool to have a more hierarchical idea
> as hosts->racks->r
Looks like it's a doc error (at least on master), but it might have
changed over time. If you're running Dumpling we should change the
docs.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Sun, Jun 22, 2014 at 10:18 PM, Christian Balzer wrote:
>
> Hello,
>
> This weekend I
On Mon, Jun 23, 2014 at 4:26 AM, Christian Kauhaus wrote:
> I see several instances of the following log messages in the OSD logs each
> day:
>
> 2014-06-21 02:05:27.740697 7fbc58b78700 0 -- 172.22.8.12:6810/31918 >>
> 172.22.8.12:6800/28827 pipe(0x7fbe400029f0 sd=764 :6810 s=0 pgs=0 cs=0 l=0
>
On Mon, Jun 23, 2014 at 4:54 AM, Christian Eichelmann
wrote:
> Hi ceph users,
>
> since our cluster has had a few inconsistent PGs recently, I was
> wondering what ceph pg repair does, depending on the replication level.
> So I just wanted to check if my assumptions are correct:
>
> Replicatio
;
> On Mon, Jun 23, 2014 at 2:14 PM, Gregory Farnum wrote:
>
>> On Fri, Jun 20, 2014 at 4:23 PM, Shayan Saeed wrote:
>> > Is it allowed for crush maps to have multiple hierarchies for different
>> > pools. So for example, I want one pool to treat my cluste
You probably want to look at the central log (on your monitors) and
see exactly what scrub errors it's reporting. There might also be
useful info if you dump the pg info on the inconsistent PGs. But if
you're getting this frequently, you're either hitting some unknown
issues with the OSDs around so
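(A quick sketch, with a hypothetical pgid:)
  ceph health detail   # lists which PGs are inconsistent and on which OSDs
  ceph pg 2.1f query   # per-PG state, including scrub information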
Unfortunately Yehuda's out for a while as he could best handle this,
but it sounds familiar so I think you probably want to search the list
archives and the bug tracker (http://tracker.ceph.com/projects/rgw).
What version precisely are you on?
-Greg
Software Engineer #42 @ http://inktank.com | http
On Wed, Jun 25, 2014 at 12:22 AM, Christian Kauhaus wrote:
> On 23.06.2014 20:24, Gregory Farnum wrote:
>> Well, actually it always takes the primary copy, unless the primary
>> has some way of locally telling that its version is corrupt. (This
>> might happen if the pri
Sorry we let this drop; we've all been busy traveling and things.
There have been a lot of changes to librados between Dumpling and
Firefly, but we have no idea what would have made it slower. Can you
provide more details about how you were running these tests?
-Greg
Software Engineer #42 @ http:/
On Thu, Jun 26, 2014 at 7:03 AM, Micha Krause wrote:
> Hi,
>
> could someone explain to me what the difference is between
>
> ceph osd reweight
>
> and
>
> ceph osd crush reweight
"ceph osd crush reweight" sets the CRUSH weight of the OSD. This
weight is an arbitrary value (generally the size of
On Thu, Jun 26, 2014 at 12:52 PM, Kevin Horan
wrote:
> I am also getting inconsistent object errors on a regular basis, about 1-2
> every week or so for about 300GB of data. All OSDs are using XFS
> filesystems. Some OSDs are individual 3TB internal hard drives and some are
> external FC attached
Yep, definitely use "osd crush reweight" for your permanent data placement.
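(For reference -- the id and weight values are just examples:)
  ceph osd crush reweight osd.12 2.73   # persistent CRUSH weight, typically the disk size in TB
  ceph osd reweight 12 0.85             # temporary override between 0 and 1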
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Fri, Jun 27, 2014 at 12:13 AM, Micha Krause wrote:
> Hi,
>
>
>> "ceph osd crush reweight" sets the CRUSH weight of the OSD. This
>> weight is an arbitrary va
Did you also increase the "pgp_num"?
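(pg_num creates the new PGs, but data only rebalances once pgp_num is raised to match; the pool name and count are examples:)
  ceph osd pool set <pool> pg_num 32768
  ceph osd pool set <pool> pgp_num 32768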
On Saturday, June 28, 2014, Jianing Yang wrote:
> Actually, I did increase the PG number to 32768 (120 OSDs) and I also use
> "tunables optimal". But the data still does not distribute evenly.
>
>
> On Sun, Jun 29, 2014 at 3:42 AM, Konrad Gutkowski wrote:
>
>> Hi,
>
It looks like that value isn't live-updateable, so you'd need to
restart after changing the daemon's config. Sorry!
Made a ticket: http://tracker.ceph.com/issues/8695
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Jun 30, 2014 at 12:41 AM, Kostis Fardelas wrote:
> Hi,
Directory sharding is even less stable than the rest of the MDS, but
if you need it I have some hope that things will work. You just need
to set the "mds bal frag" option to "true". You can configure the
limits as well; see the options following:
https://github.com/ceph/ceph/blob/master/src/commo
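(In ceph.conf, for the MDS daemons:)
  [mds]
      mds bal frag = true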
What's the backtrace from the crashing OSDs?
Keep in mind that as a dev release, it's generally best not to upgrade
to unnamed versions like 0.82 (but it's probably too late to go back
now).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Jun 30, 2014 at 8:06 AM, Pierr
it's running to force
fragments into a specific MDS.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Jun 30, 2014 at 8:51 AM, Florent B wrote:
> Ok thank you. So it is not possible to set a specific directory assigned
> to a MDS ?
>
> On 06/30/201
ble (0.80.1). It may be that
> during recovery the OSDs are currently backfilling other PGs, so stats are
> not updated (because those PGs have not been retried for backfill since the setting change).
>
> On 2014.06.30 18:31, Gregory Farnum wrote:
>> It looks like that value isn't live-updateable, so y
It looks like you're using a kernel RBD mount in the second case? I imagine
your kernel doesn't support caching pools and you'd need to upgrade for it
to work.
-Greg
On Tuesday, July 1, 2014, Никитенко Виталий wrote:
> Good day!
> I have a server with Ubuntu 14.04 and installed Ceph Firefly. Config
What's the output of "ceph osd map"?
Your CRUSH map probably isn't trying to segregate properly, with 2
hosts and 4 OSDs each.
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Jul 1, 2014 at 11:22 AM, Brian Lovett
wrote:
> I'm pulling my hair out with ceph. I am testing thin
SDs report each other down much more quickly (~30s) than the
monitor timeout (~15 minutes). They'd get marked down eventually.
On Tue, Jul 1, 2014 at 11:43 AM, Brian Lovett
wrote:
> Gregory Farnum writes:
>
>>
>> What's the output of "ceph osd map"?
>>
>
On Tue, Jul 1, 2014 at 11:45 AM, Gregory Farnum wrote:
> On Tue, Jul 1, 2014 at 11:33 AM, Brian Lovett
> wrote:
>> Brian Lovett writes:
>>
>>
>> I restarted all of the osd's and noticed that ceph shows 2 osd's up even if
>> the servers are complet
On Tue, Jul 1, 2014 at 11:57 AM, Brian Lovett
wrote:
> Gregory Farnum writes:
>
>> ...and one more time, because apparently my brain's out to lunch today:
>>
>> ceph osd tree
>>
>> *sigh*
>>
>
> haha, we all have those days.
>
> [root
On Tue, Jul 1, 2014 at 1:26 PM, Brian Lovett
wrote:
> "profile": "bobtail",
Okay. That's unusual. What's the oldest client you need to support,
and what Ceph version are you using? You probably want to set the
crush tunables to "optimal"; the "bobtail" ones are going to have all
kinds of is
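(For reference -- note that changing tunables can trigger data movement and needs reasonably recent clients/kernels:)
  ceph osd crush tunables optimal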
On Thu, Jun 26, 2014 at 11:49 PM, Stefan Priebe - Profihost AG
wrote:
> Hi Greg,
>
> On 26.06.2014 02:17, Gregory Farnum wrote:
>> Sorry we let this drop; we've all been busy traveling and things.
>>
>> There have been a lot of changes to librados between Dumplin
com | http://ceph.com
On Tue, Jul 1, 2014 at 5:44 PM, Никитенко Виталий wrote:
> Hi!
>
> Is there some option in the kernel which must be enabled, or do I just upgrade
> to the latest version of the kernel? I use 3.13.0-24
>
> Thanks
>
> 01.07.2014, 20:17, "Gregory Farnum&quo
;t any counters. As this mail went unseen for some days, I
> thought nobody had an idea or could help.
>
> Stefan
>
>> On Wed, Jul 2, 2014 at 9:01 PM, Stefan Priebe - Profihost AG
>> wrote:
>>> On 02.07.2014 00:51, Gregory Farnum wrote:
>>>> On Thu
On Wed, Jul 2, 2014 at 6:18 AM, Sylvain Munaut
wrote:
> Hi,
>
>
> I'm having a couple of issues during this update. On the test cluster
> it went fine, but when running it on production I have a few issues.
> (I guess there is some subtle difference I missed, I updated the test
> one back when emp
On Wed, Jul 2, 2014 at 12:00 PM, Stefan Priebe wrote:
>
> On 02.07.2014 16:00, Gregory Farnum wrote:
>
>> Yeah, it's fighting for attention with a lot of other urgent stuff. :(
>>
>> Anyway, even if you can't look up any details or reproduce at this
>
On Wed, Jul 2, 2014 at 12:44 PM, Stefan Priebe wrote:
> Hi Greg,
>
> On 02.07.2014 21:36, Gregory Farnum wrote:
>>
>> On Wed, Jul 2, 2014 at 12:00 PM, Stefan Priebe
>> wrote:
>>>
>>>
>>> On 02.07.2014 16:00, Gregory Farnum wrote:
>>
It looks like you're just putting in data faster than your cluster can
handle (in terms of IOPS).
The first big hole (queue_op_wq->reached_pg) is it sitting in a queue
and waiting for processing. The second parallel blocks are
1) write_thread_in_journal_buffer->journaled_completion_queued, and
that
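(Those timestamps come from the OSD op tracker; you can pull recent slow ops over the admin socket, e.g. for a hypothetical osd.0:)
  ceph daemon osd.0 dump_historic_ops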
The PG in question isn't being properly mapped to any OSDs. There's a
good chance that those trees (with 3 OSDs in 2 hosts) aren't going to
map well anyway, but the immediate problem should resolve itself if
you change the "choose" to "chooseleaf" in your rules.
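(i.e. in the decompiled CRUSH map, something along these lines:)
  # before: selects host buckets, which can leave the PG without OSDs
  step choose firstn 0 type host
  # after: selects hosts and then descends to an OSD under each
  step chooseleaf firstn 0 type host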
-Greg
Software Engineer #42 @ http:/
On Wed, Jul 2, 2014 at 3:06 PM, Marc wrote:
> Hi,
>
> I was wondering, having a cache pool in front of an RBD pool is all fine
> and dandy, but imagine you want to pull backups of all your VMs (or one
> of them, or multiple...). Going to the cache for all those reads isn't
> only pointless, it'll
On Thu, Jul 3, 2014 at 8:24 AM, baijia...@126.com wrote:
> When I look at the function "OSD::OpWQ::_process", I find the PG lock locks the
> whole function. So when I use multiple threads to write the same object, must
> they
> serialize from the OSD handling thread to the journal write thread?
It's serialized
On Thu, Jul 3, 2014 at 11:17 AM, Iban Cabrillo wrote:
> Hi Gregory,
> Thanks a lot, I am beginning to understand how Ceph works.
> I added a couple of OSD servers and balanced the disks between them.
>
>
> [ceph@cephadm ceph-cloud]$ sudo ceph osd tree
> # id  weight  type name  up/down  reweight
Do you have a ceph.conf file that the "ceph" tool can access in a
known location? Try specifying it manually with the "-c ceph.conf"
argument. You can also add "--debug-ms 1, --debug-monc 10" and see if
it outputs more useful error logs.
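(e.g., with a hypothetical config path:)
  ceph -c /etc/ceph/ceph.conf --debug-ms 1 --debug-monc 10 status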
-Greg
Software Engineer #42 @ http://inktank.com | http://cep
What was the exact sequence of events — were you rebalancing when you
did the upgrade? Did the marked out OSDs get upgraded?
Did you restart all the monitors prior to changing the tunables? (Are
you *sure*?)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Sat, Jul 5, 2014 at
We don't test explicitly for this, but I'm surprised to hear about a
jump of that magnitude. Do you have any more detailed profiling? Can
you generate some? (With the tcmalloc heap dumps.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Jul 7, 2014 at 3:03 AM, Sylvain Mu
CRUSH is a probabilistic algorithm. By having all those non-existent
OSDs in the map, you made it so that 10/12 attempts at mapping would
fail and need to be retried. CRUSH handles a lot of retries, but not
enough for that to work out well.
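(Removing the dead entries from the map avoids those wasted retries; the id is an example:)
  ceph osd crush remove osd.10
  ceph osd rm 10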
-Greg
Software Engineer #42 @ http://inktank.com | http://
On Mon, Jul 7, 2014 at 7:03 AM, Erik Logtenberg wrote:
> Hi,
>
> If you add an OSD to an existing cluster, ceph will move some existing
> data around so the new OSD gets its respective share of usage right away.
>
> Now I noticed that during this moving around, ceph reports the relevant
> PG's as
Okay. Based on your description I think the reason for the tunables
crashes is that either the "out" OSDs, or possibly one of the
monitors, never got restarted. You should be able to update the
tunables now, if you want to. (Or there's also a config option that
will disable the warning; check the r
On Mon, Jul 7, 2014 at 4:21 PM, James Harper wrote:
>>
>> Okay. Based on your description I think the reason for the tunables
>> crashes is that either the "out" OSDs, or possibly one of the
>> monitors, never got restarted. You should be able to update the
>> tunables now, if you want to. (Or the
You can look at which OSDs the PGs map to. If the PGs have
insufficient replica counts they'll report as degraded in "ceph -s" or
"ceph -w".
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Jul 7, 2014 at 4:30 PM, James Harper wrote:
>>
>> It sounds like maybe you've got a ba
On Mon, Jul 7, 2014 at 4:39 PM, James Harper wrote:
>>
>> You can look at which OSDs the PGs map to. If the PGs have
>> insufficient replica counts they'll report as degraded in "ceph -s" or
>> "ceph -w".
>
> I meant in a general sense. If I have a pg that I suspect might be
> insufficiently redu
It's not very intuitive or easy to look at right now (there are plans
from the recent developer summit to improve things), but the central
log should have output about exactly what objects are busted. You'll
then want to compare the copies manually to determine which ones are
good or bad, get the g
The impact won't be 300 times bigger, but it will be bigger. There are two
things impacting your cluster here:
1) the initial "split" of the affected PGs into multiple child PGs. You can
mitigate this by stepping through pg_num at small multiples.
2) the movement of data to its new location (when yo
On Tue, Jul 8, 2014 at 10:14 AM, Dan Van Der Ster
wrote:
> Hi Greg,
> We're also due for a similar splitting exercise in the not too distant
> future, and will also need to minimize the impact on latency.
>
> In addition to increasing pg_num in small steps and using a minimal
> max_backfills/recov