On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer wrote:
>
> Hello,
>
> One of my clusters has become busy enough (I'm looking at you, evil Windows
> VMs that I shall banish elsewhere soon) to experience client-noticeable
> performance impacts during deep scrub.
> Before this I instructed all OSDs
On Tue, Nov 11, 2014 at 5:06 AM, Jasper Siero
wrote:
> No problem thanks for helping.
> I don't want to disable the deep scrubbing process itself because it's very
> useful, but one placement group (3.30) is continuously deep scrubbing and it
> should finish after some time, but it won't.
Hmm, how
log [INF]
> : 1.a8 scrub ok
> 2014-11-12 16:25:53.012220 7f5026f31700 0 log_channel(default) log [INF]
> : 1.a9 scrub ok
> 2014-11-12 16:25:54.009265 7f5026f31700 0 log_channel(default) log [INF]
> : 1.cb scrub ok
> 2014-11-12 16:25:56.516569 7f5026f31700 0 log_channel(default
My recollection is that the RADOS tool is issuing a special eviction
command on every object in the cache tier using primitives we don't use
elsewhere. Their existence is currently vestigial from our initial tiering
work (rather than the present caching), but I have some hope we'll extend
them agai
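For reference, the rados invocation in question is presumably along these lines (the pool name is a placeholder); it walks the cache pool and issues per-object flush/evict operations:

    # flush dirty objects and evict everything from the cache tier
    rados -p <cache-pool> cache-flush-evict-all
    # per-object variants also exist
    rados -p <cache-pool> cache-flush <object>
    rados -p <cache-pool> cache-evict <object>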
On Tue, Nov 11, 2014 at 2:32 PM, Christian Balzer wrote:
> On Tue, 11 Nov 2014 10:21:49 -0800 Gregory Farnum wrote:
>
>> On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer wrote:
>> >
>> > Hello,
>> >
>> > One of my clusters has become busy enough
On Tue, Nov 11, 2014 at 6:28 PM, Scott Laird wrote:
> I'm having a problem with my cluster. It's running 0.87 right now, but I
> saw the same behavior with 0.80.5 and 0.80.7.
>
> The problem is that my logs are filling up with "replacing existing (lossy)
> channel" log lines (see below), to the p
What does "ceph -s" output when things are working?
Does the ceph.conf on your admin node
contain the address of each monitor? (Paste in the relevant lines.) It will
need to, or the ceph tool won't be able to find the monitors even though the
system is working.
-Greg
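For reference, a minimal sketch of the ceph.conf lines being asked about (hostnames and addresses are placeholders):

    [global]
        mon initial members = mon1, mon2, mon3
        mon host = 192.168.0.1:6789, 192.168.0.2:6789, 192.168.0.3:6789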
On Thu, Nov 13, 2014 at 9:11 AM
On Thu, Nov 13, 2014 at 2:58 PM, Anthony Alba wrote:
> Hi list,
>
>
> When there are multiple rules in a ruleset, is it the case that "first
> one wins"?
>
> When a rule fails, does it fall through to the next rule?
> Are min_size, max_size the only determinants?
>
> Are there any examples?
On Thu, Nov 13, 2014 at 3:11 PM, Anthony Alba wrote:
> Thanks! What happens when the lone rule fails? Is there a fallback
> rule that will place the blob in a random PG? Say I misconfigure, and
> my choose/chooseleaf don't add up to pool min size.
There's no built-in fallback rule or anything li
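To illustrate the min_size/max_size selection being asked about: a sketch, in decompiled-crushmap syntax with made-up names, of a ruleset holding two rules. A pool pointed at ruleset 1 uses whichever rule satisfies min_size <= pool size <= max_size; as noted above, if none matches there is no fallback.

    rule small_pools {
            ruleset 1
            type replicated
            min_size 1
            max_size 3
            step take default
            step chooseleaf firstn 0 type host
            step emit
    }
    rule big_pools {
            ruleset 1
            type replicated
            min_size 4
            max_size 10
            step take default
            step chooseleaf firstn 0 type rack
            step emit
    }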
You didn't remove them from the auth monitor's keyring. If you're
removing OSDs you need to follow the steps in the documentation.
-Greg
On Fri, Nov 14, 2014 at 4:42 PM, JIten Shah wrote:
> Hi Guys,
>
> I had to rekick some of the hosts where OSD’s were running and after
> re-kick, when I try to
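The documented removal sequence being referred to is roughly the following (the osd id is a placeholder); the 'ceph auth del' step is the one that clears the stale key from the monitors:

    ceph osd out 12
    # stop the ceph-osd daemon on its host, then:
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12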
.
>
> —Jiten
>
> On Nov 14, 2014, at 4:44 PM, Gregory Farnum wrote:
>
>> You didn't remove them from the auth monitor's keyring. If you're
>> removing OSDs you need to follow the steps in the documentation.
>> -Greg
>>
>> On Fri, Nov 14, 2
On Tue, Nov 18, 2014 at 1:26 PM, hp cre wrote:
> Hello everyone,
>
> I'm new to ceph but been working with proprietary clustered filesystem for
> quite some time.
>
> I almost understand how ceph works, but have a couple of questions which
> have been asked before here, but I didn't understand t
On Tue, Nov 11, 2014 at 11:43 PM, Gauvain Pocentek
wrote:
> Hi all,
>
> I'm facing a problem on a ceph deployment. rados mkpool always fails:
>
> # rados -n client.admin mkpool test
> error creating pool test: (2) No such file or directory
>
> rados lspool and rmpool commands work just fine, and t
't need to create vm
> instances on filesystems, am I correct?
Right; these systems are doing the cache coherency (by duplicating all
the memory, including that of ext4/whatever) so that they work.
-Greg
>
> On 18 Nov 2014 23:33, "Gregory Farnum" wrote:
>>
>> On Tu
On Thu, Nov 13, 2014 at 9:34 AM, Lincoln Bryant wrote:
> Hi all,
>
> Just providing an update to this -- I started the mds daemon on a new server
> and rebooted a box with a hung CephFS mount (from the first crash) and the
> problem seems to have gone away.
>
> I'm still not sure why the mds was
rst, but it's logging tons of the same errors while
> trying to talk to 10.2.0.34.
>
> On Wed Nov 12 2014 at 10:47:30 AM Gregory Farnum wrote:
>>
>> On Tue, Nov 11, 2014 at 6:28 PM, Scott Laird wrote:
>> > I'm having a problem with my cluster. It's runnin
On Sun, Nov 16, 2014 at 4:17 PM, Anthony Alba wrote:
> The step emit documentation states
>
> "Outputs the current value and empties the stack. Typically used at
> the end of a rule, but may also be used to pick from different trees
> in the same rule."
>
> What use case is there for more than one
Hmm, last time we saw this it meant that the MDS log had gotten
corrupted somehow and was a little short (in that case due to the OSDs
filling up). What do you mean by "rebuilt the OSDs"?
-Greg
On Mon, Nov 17, 2014 at 12:52 PM, JIten Shah wrote:
> After i rebuilt the OSD’s, the MDS went into the
I believe the reason we don't allow you to do this right now is that
there was not a good way of coordinating the transition (so that
everybody starts routing traffic through the cache pool at the same
time), which could lead to data inconsistencies. Looks like the OSDs
handle this appropriately no
On Tue, Nov 18, 2014 at 3:38 PM, Robert LeBlanc wrote:
> I was going to submit this as a bug, but thought I would put it here for
> discussion first. I have a feeling that it could be behavior by design.
>
> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>
> I'm using a cache pool an
On Wed, Nov 12, 2014 at 1:41 PM, houmles wrote:
> Hi,
>
> I have 2 hosts with 8 2TB drive in each.
> I want to have 2 replicas between both hosts and then 2 replicas between osds
> on each host. That way even when I lost one host I still have 2 replicas.
>
> Currently I have this ruleset:
>
> rul
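The poster's ruleset is cut off above; for context, a rule aiming at "pick 2 hosts, then 2 OSDs on each" is commonly written along these lines (a sketch with illustrative names, not the poster's actual rule):

    rule two_per_host {
            ruleset 0
            type replicated
            min_size 4
            max_size 4
            step take default
            step choose firstn 2 type host
            step chooseleaf firstn 2 type osd
            step emit
    }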
I think these numbers are about what is expected. You could try a couple
things to improve it, but neither of them is common:
1) increase the number of PGs (and pgp_num) a lot more. If you decide to
experiment with this, watch your CPU and memory numbers carefully.
2) try to correct for the inequ
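A minimal sketch of option 1, assuming a pool named 'rbd' and an illustrative target of 1024 PGs (pgp_num has to follow pg_num before data actually moves):

    ceph osd pool set rbd pg_num 1024
    ceph osd pool set rbd pgp_num 1024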
You don't really need to do much. There are some "ceph mds" commands
that let you clean things up in the MDSMap if you like, but moving an
MDS essentially boils down to:
1) make sure your new node has a cephx key (probably for a new MDS
entity named after the new host, but not strictly necessary
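A sketch of that first step, assuming a new MDS entity named after the new host; the exact caps should match whatever your existing MDS key already has (check with 'ceph auth get'):

    # inspect the caps on the current MDS key
    ceph auth get mds.oldhost
    # create a key for the new entity with matching caps, e.g.:
    ceph auth get-or-create mds.newhost mon 'allow profile mds' osd 'allow rwx' mds 'allow'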
On Fri, Nov 21, 2014 at 2:35 AM, Paweł Sadowski wrote:
> Hi,
>
> During deep-scrub Ceph discovered some inconsistency between OSDs on my
> cluster (size 3, min size 2). I have found the broken object and calculated
> md5sum of it on each OSD (osd.195 is acting_primary):
> osd.195 - md5sum_
> osd.
On Fri, Nov 21, 2014 at 4:56 AM, Jon Kåre Hellan
wrote:
> We are testing a Giant cluster - on virtual machines for now. We have seen
> the same
> problem two nights in a row: One of the OSDs gets stuck in uninterruptible
> sleep.
> The only way to get rid of it is apparently to reboot - kill -9, -
Can you post the OSD log somewhere? It should have a few more details
about what's going on here. (This backtrace looks like it's crashing
in a call to pthreads, which is a little unusual.)
-Greg
On Sat, Nov 22, 2014 at 1:01 PM, Jeffrey Ollie wrote:
> -- One of my OSDs lost network connectivity fo
On Sat, Nov 22, 2014 at 11:39 AM, Jeffrey Ollie wrote:
> On Sat, Nov 22, 2014 at 1:22 PM, Gregory Farnum wrote:
>> Can you post the OSD log somewhere? It should have a few more details
>> about what's going on here. (This backtrace looks like it's crashing
>> i
On Thu, Nov 20, 2014 at 6:32 PM, Shawn Edwards wrote:
> This page is marked for removal:
>
> http://ceph.com/docs/firefly/dev/differences-from-posix/
I'm not quite sure what that TODO means there, but
> Is the bug in the above webpage still in the code? If not, in which version
> was it fixed?
On Fri, Nov 21, 2014 at 3:21 PM, JIten Shah wrote:
> I am trying to setup 3 MDS servers (one on each MON) but after I am done
> setting up the first one, it give me below error when I try to start it on
> the other ones. I understand that only 1 MDS is functional at a time, but I
> thought you can
On Sun, Nov 23, 2014 at 10:36 PM, JIten Shah wrote:
> Hi Greg,
>
> I haven't set up anything in ceph.conf as mds.cephmon002, nor in any ceph
> folders. I have always tried to set it up as mds.lab-cephmon002, so I am
> wondering where is it getting that value from?
No idea, sorry. Probably some odd
On Fri, Nov 21, 2014 at 12:34 AM, JuanFra Rodriguez Cardoso
wrote:
> Hi all:
>
> As it was asked weeks ago.. what is the way the ceph community uses to
> stay tuned on new features and bug fixes?
I asked Sage about this today and he said he'd set one up. Seems like
a good idea; just not something
On Thu, Nov 20, 2014 at 9:08 AM, Dan van der Ster
wrote:
> Hi all,
> What is compatibility/incompatibility of dumpling clients to talk to firefly
> and giant clusters?
We sadly don't have a good matrix about this yet, but in general you
should assume that anything which changed the way the data i
Ilya, do you have a ticket reference for the bug?
Andrei, we run NFS tests on CephFS in our nightlies and it does pretty well,
so in the general case we expect it to work. Obviously not at the moment
with whatever bug Ilya is looking at, though. ;)
-Greg
On Sat, Nov 29, 2014 at 4:51 AM Ilya Dryomov
That's not actually so unusual:
http://techreport.com/review/26058/the-ssd-endurance-experiment-data-retention-after-600tb
The manufacturers are pretty conservative with their ratings and
warranties. ;)
-Greg
On Thu, Nov 27, 2014 at 2:41 AM Andrei Mikhailovsky
wrote:
> Mark, if it is not too much
On Tue, Nov 25, 2014 at 1:00 AM, Dan Van Der Ster
wrote:
> Hi Greg,
>
>
>> On 24 Nov 2014, at 22:01, Gregory Farnum wrote:
>>
>> On Thu, Nov 20, 2014 at 9:08 AM, Dan van der Ster
>> wrote:
>>> Hi all,
>>> What is compatibility/incompatibility o
On Sun, Nov 30, 2014 at 1:15 PM, Andrei Mikhailovsky wrote:
> Greg, thanks for your comment. Could you please share what OS, kernel and
> any NFS/CephFS settings you've used to achieve that "pretty well" stability?
> Also, what kind of tests have you ran to check that?
We're just doing it on our te
On Mon, Dec 1, 2014 at 8:06 AM, John Spray wrote:
> I meant to chime in earlier here but then the weekend happened, comments
> inline
>
> On Sun, Nov 30, 2014 at 7:20 PM, Wido den Hollander wrote:
>> Why would you want all CephFS metadata in memory? With any filesystem
>> that will be a problem.
We aren't currently doing any of the ongoing testing which that page covers
on CentOS 7. I think that's because it's going to flow through the same Red
Hat mechanisms as the RHEL7 builds, but I'm not on that team so I can't say
for sure.
-Greg
On Tue, Dec 2, 2014 at 9:39 AM Frank Even
wrote:
> He
On Tue, Dec 2, 2014 at 10:55 AM, Ken Dreyer wrote:
> On 12/02/2014 10:59 AM, Gregory Farnum wrote:
>> We aren't currently doing any of the ongoing testing which that page
>> covers on CentOS 7. I think that's because it's going to flow through
>> the same Re
It means that the connection from the client to the osd went away. This
could happen just because the client shut down, but if so it quit before it
had gotten commits from all its disk writes, which seems bad. It could also
mean there was a networking problem of some kind.
-Greg
On Thu, Dec 4, 2014
On Fri, Dec 5, 2014 at 9:36 AM, Sage Weil wrote:
> A while back we merged Haomai's experimental OSD backend KeyValueStore.
> We named the config option 'keyvaluestore_dev', hoping to make it clear to
> users that it was still under development, not fully tested, and not yet
> ready for production.
On Thu, Dec 4, 2014 at 7:03 PM, Christian Balzer wrote:
>
> Hello,
>
> This morning I decided to reboot a storage node (Debian Jessie, thus 3.16
> kernel and Ceph 0.80.7, HDD OSDs with SSD journals) after applying some
> changes.
>
> It came back up one OSD short, the last log lines before the reb
On Mon, 8 Dec 2014 19:51:00 -0800 Gregory Farnum wrote:
>
>> On Mon, Dec 8, 2014 at 6:39 PM, Christian Balzer wrote:
>> >
>> > Hello,
>> >
>> > Debian Jessie cluster, thus kernel 3.16, ceph 0.80.7.
>> > 3 storage nodes with 8 OSDs (journals on 4 S
On Mon, Dec 8, 2014 at 8:51 PM, Christian Balzer wrote:
> On Mon, 8 Dec 2014 20:36:17 -0800 Gregory Farnum wrote:
>
>> They never fixed themselves?
> As I wrote, it took a restart of OSD 8 to resolve this on the next day.
>
>> Did the reported times ever increase?
>
On Mon, Dec 8, 2014 at 6:39 PM, Christian Balzer wrote:
>
> Hello,
>
> Debian Jessie cluster, thus kernel 3.16, ceph 0.80.7.
> 3 storage nodes with 8 OSDs (journals on 4 SSDs) each, 3 mons.
> 2 compute nodes, everything connected via Infiniband.
>
> This is pre-production, currently there are only
It looks like your OSDs all have weight zero for some reason. I'd fix that.
:)
-Greg
On Tue, Dec 9, 2014 at 6:24 AM Giuseppe Civitella <
giuseppe.civite...@gmail.com> wrote:
> Hi,
>
> thanks for the quick answer.
> I did try the force_create_pg on a pg but is stuck on "creating":
> root@ceph-mon1:
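A quick sketch of checking and fixing the weights (osd ids and the weight value are placeholders; a common convention is a weight roughly equal to the disk's capacity in TB):

    ceph osd tree                       # the WEIGHT column should be non-zero
    ceph osd crush reweight osd.0 1.0
    ceph osd crush reweight osd.1 1.0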
MDSes.
-Greg
On Mon, Dec 8, 2014 at 10:48 AM, JIten Shah wrote:
> Do I need to update the ceph.conf to support multiple MDS servers?
>
> —Jiten
>
> On Nov 24, 2014, at 6:56 AM, Gregory Farnum wrote:
>
>> On Sun, Nov 23, 2014 at 10:36 PM, JIten Shah wrote:
>>>
On Tue, Dec 9, 2014 at 10:24 AM, Abhishek L
wrote:
> Hi
>
> I was going through various conf options to customize a ceph cluster and
> came across `osd pool default flags` in pool-pg config ref[1]. Though
> the value specifies an integer, though I couldn't find a mention of
> possible values this
On Tue, Dec 9, 2014 at 3:11 PM, Christopher Armstrong
wrote:
> Hi folks,
>
> I think we have a bit of confusion around how initial members is used. I
> understand that we can specify a single monitor (or a subset of monitors) so
> that the cluster can form a quorum when it first comes up. This is
dr = 192.168.2.202:6789
>
>
>
> [client.radosgw.gateway]
> host = deis-store-gateway
> keyring = /etc/ceph/ceph.client.radosgw.keyring
> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> log file = /dev/stdout
>
>
> On Wed, Dec 10, 2014 at 11:40 AM,
Was there any activity against your cluster when you reduced the size
from 3 -> 2? I think maybe it was just taking time to percolate
through the system if nothing else was going on. When you reduced them
to size 1 then data needed to be deleted so everything woke up and
started processing.
-Greg
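For reference, the resize being discussed is just a pool setting, something like (the pool name is a placeholder):

    ceph osd pool set <pool> size 2
    ceph -w    # watch PG states change as the new size propagates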
On Thu, Dec 11, 2014 at 2:57 AM, Luis Periquito wrote:
> Hi,
>
> I've stopped OSD.16, removed the PG from the local filesystem and started
> the OSD again. After ceph rebuilt the PG in the removed OSD I ran a
> deep-scrub and the PG is still inconsistent.
What led you to remove it from osd 16? Is
On Thu, Dec 11, 2014 at 2:21 AM, Joao Eduardo Luis wrote:
> On 12/11/2014 04:28 AM, Christopher Armstrong wrote:
>>
>> If someone could point me to where this fix should go in the code, I'd
>> actually love to dive in - I've been wanting to contribute back to Ceph,
>> and this bug has hit us perso
On Fri, Dec 12, 2014 at 11:06 AM, Patrick Darley
wrote:
> Hi there,
>
> I am using a custom Linux OS, with ceph v0.89.
>
>
> I have been following the monitor bootstrap instructions [1].
>
> I have a problem in that the OS is firmly on the systemd bandwagon
> and lacks support to run the provided
What version of Ceph are you running? Is this a replicated or
erasure-coded pool?
On Fri, Dec 12, 2014 at 1:11 AM, Luis Periquito wrote:
> Hi Greg,
>
> thanks for your help. It's always highly appreciated. :)
>
> On Thu, Dec 11, 2014 at 6:41 PM, Gregory Farnum wrote:
>&g
Cache tiering is a stable, functioning system. Those particular commands
are for testing and development purposes, not something you should run
(although they ought to be safe).
-Greg
On Wed, Dec 17, 2014 at 1:44 AM Yujian Peng
wrote:
> Hi,
> Since firefly, ceph can support cache tiering.
> Cache
On Wed, Dec 17, 2014 at 2:31 PM, McNamara, Bradley
wrote:
> I have a somewhat interesting scenario. I have an RBD of 17TB formatted
> using XFS. I would like it accessible from two different hosts, one
> mapped/mounted read-only, and one mapped/mounted as read-write. Both are
> shared using Sam
On Thu, Dec 18, 2014 at 4:04 AM, Daniele Venzano wrote:
> Hello,
>
> I have been trying to upload multi-gigabyte files to CEPH via the object
> gateway, using both the swift and s3 APIs.
>
> With file up to about 2GB everything works as expected.
>
> With files bigger than that I get back a "400 B
On Wed, Dec 17, 2014 at 8:52 PM, Lindsay Mathieson
wrote:
> I've been experimenting with CephFS for running KVM images (Proxmox).
>
> cephfs fuse version - 0.87
>
> cephfs kernel module - kernel version 3.10
>
>
> Part of my testing involves running a Windows 7 VM up and running
> CrystalDiskMark
What kind of uploads are you performing? How are you testing?
Have you looked at the admin sockets on any daemons yet? Examining the OSDs
to see if they're behaving differently on the different requests is one
angle of attack. The other is to look into whether the RGW daemons are hitting
throttler limits
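A sketch of the admin-socket poking suggested above (the socket paths shown are defaults and will vary with your daemon names and configuration):

    # on an OSD host
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops
    # on the radosgw host, look at the throttle-* counters
    ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok perf dump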
On Thu, Dec 18, 2014 at 8:44 PM, Sean Sullivan wrote:
> Thanks for the reply, Gregory,
>
> Sorry if this is in the wrong direction or something. Maybe I do not
> understand
>
> To test uploads I either use bash time and either python-swiftclient or boto
> key.set_contents_from_filename to the radosg
On Sun, Dec 21, 2014 at 8:20 PM, Jimmy Chu wrote:
> Hi,
>
> This is a followup question to my previous question. When the last monitor
> in a ceph monitor set is down, what is the proper way to boot up the ceph
> monitor set again?
>
> On one hand, we could try not to make this happen, but on the
On Sun, Dec 21, 2014 at 11:54 PM, Christopher Kunz
wrote:
> Hi all,
>
> I'm trying to get a working PoC installation of Ceph done on an armhf
> platform. I'm failing to find working Ceph packages (so does
> ceph-deploy, too) for Ubuntu Trusty LTS. The ceph.com repos don't have
> anything besides c
On Mon, Dec 22, 2014 at 8:20 AM, Wido den Hollander wrote:
> Hi,
>
> While investigating slow requests on a Firefly (0.80.7) I looked at the
> historic ops from the admin socket.
>
> On an OSD which just spat out some slow requests I noticed:
>
> "received_at": "2014-12-22 17:08:41.496
On Mon, Dec 22, 2014 at 10:30 AM, Wido den Hollander wrote:
> For example, two ops:
>
> #1:
>
> { "description": "osd_sub_op(client.2433432.0:61603164 20.424
> 19038c24\/rbd_data.d7c912ae8944a.08b6\/head\/\/20 [] v
> 63283'8301089 snapset=0=[]:[] snapc=0=[])",
> "received_at
I think it's just for service isolation that people recommend splitting
them. The only technical issue I can think of is that you don't want to put
kernel clients on the same OS as an OSD (due to deadlock scenarios under
memory pressure and writeback).
-Greg
On Sat, Dec 27, 2014 at 12:11 PM Christo
You can store radosgw data in a regular EC pool without any caching in
front. I suspect this will work better for you, as part of the slowness is
probably the OSDs trying to look up all the objects in the ec pool before
deleting them. You should be able to check if that's the case by looking at
the
The meant-for-human-consumption free space estimates and things won't be
accurate if you weight evenly instead of by size, but otherwise things
should work just fine -- you'll simply get full OSD warnings when you have
1TB/OSD.
-Greg
On Thu, Jan 1, 2015 at 3:10 PM Lindsay Mathieson <
lindsay.mathie
I'm on my phone at the moment, but I think if you run "ceph osd crush rule"
it will prompt you with the relevant options?
On Tue, Dec 30, 2014 at 6:00 PM Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:
> Is there a command to do this without decompiling/editing/compiling the
> crush
> set?
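A sketch of the CLI route, assuming a simple replicated rule with host as the failure domain (rule and pool names are placeholders):

    ceph osd crush rule create-simple by-host default host
    ceph osd crush rule dump
    # then point a pool at the new rule by its id
    ceph osd pool set <pool> crush_ruleset <rule-id>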
On Saturday, January 3, 2015, Max Power <
mailli...@ferienwohnung-altenbeken.de> wrote:
> Ceph is cool software, but from time to time I am getting gray hairs
> with it, and I hope that's because of a misunderstanding. This time I
> want to balance the load between three OSDs evenly (same usage
You might try temporarily increasing the backfill allowance params so that
the stuff can move around more quickly. Given the cluster is idle it's
definitely hitting those limits. ;)
-Greg
On Saturday, January 3, 2015, Lindsay Mathieson
wrote:
> I just added 4 OSD's to my 2 OSD "cluster" (2 Nodes
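A sketch of bumping those limits at runtime with injectargs (the values are illustrative and can be turned back down once the data movement finishes):

    ceph tell osd.* injectargs '--osd-max-backfills 10 --osd-recovery-max-active 15'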
On Mon, Jan 5, 2015 at 12:11 PM, Robert LeBlanc wrote:
> If Ceph snapshots work like VM snapshots (and I don't have any reason to
> believe otherwise), the snapshot will never grow larger than the size of the
> base image. If the same blocks are rewritten, then they are just rewritten
> in the sna
On Sat, Jan 3, 2015 at 8:53 PM, Christian Balzer wrote:
> On Sat, 3 Jan 2015 16:21:29 +1000 Lindsay Mathieson wrote:
>
>> I just added 4 OSD's to my 2 OSD "cluster" (2 Nodes, now have 3 OSD's per
>> node).
>>
>> Given its the weekend and not in use, I've set them all to weight 1, but
>> looks like
On Tue, Dec 30, 2014 at 11:38 AM, Erik Logtenberg wrote:
>>
>> Hi Erik,
>>
>> I have tiering working on a couple test clusters. It seems to be
>> working with Ceph v0.90 when I set:
>>
>> ceph osd pool set POOL hit_set_type bloom
>> ceph osd pool set POOL hit_set_count 1
>> ceph osd pool set PO
On Thu, Dec 18, 2014 at 1:21 PM, Robert LeBlanc wrote:
> Before we base thousands of VM image clones off of one or more snapshots, I
> want to test what happens when the snapshot becomes corrupted. I don't
> believe the snapshot will become corrupted through client access to the
> snapshot, but so
On Sun, Jan 4, 2015 at 8:10 AM, Lionel Bouton wrote:
> On 01/04/15 16:25, Jiri Kanicky wrote:
>> Hi.
>>
>> I have been experiencing same issues on both nodes over the past 2
>> days (never both nodes at the same time). It seems the issue occurs
>> after some time when copying a large number of f
I'm afraid I don't know what would happen if you change those options.
Hopefully we've set it up so things continue to work, but we definitely
don't test it.
-Greg
On Tue, Jan 6, 2015 at 8:22 AM Lionel Bouton
wrote:
> On 01/06/15 02:36, Gregory Farnum wrote:
> > [.
On Wed, Jan 7, 2015 at 9:55 PM, Christian Balzer wrote:
> On Wed, 7 Jan 2015 17:07:46 -0800 Craig Lewis wrote:
>
>> On Mon, Dec 29, 2014 at 4:49 PM, Alexandre Oliva wrote:
>>
>> > However, I suspect that temporarily setting min size to a lower number
>> > could be enough for the PGs to recover.
100GB objects (or ~40 on a hard drive!) are way too large for you to
get an effective random distribution.
-Greg
On Thu, Jan 8, 2015 at 5:25 PM, Mark Nelson wrote:
> On 01/08/2015 03:35 PM, Michael J Brewer wrote:
>>
>> Hi all,
>>
>> I'm working on filling a cluster to near capacity for testing p
On Fri, Jan 9, 2015 at 1:24 AM, Christian Eichelmann
wrote:
> Hi all,
>
> as mentioned last year, our ceph cluster is still broken and unusable.
> We are still investigating what has happened and I am taking more deep
> looks into the output of ceph pg query.
>
> The problem is that I can find so
On Thu, Jan 8, 2015 at 5:46 AM, Zeeshan Ali Shah wrote:
> I just finished configuring Ceph up to 100 TB with OpenStack... Since we
> are also using Lustre on our HPC machines, I'm just wondering what the
> bottleneck is for Ceph going to peta scale like Lustre.
>
> Any idea? Or has someone tried it
I
On Fri, Jan 9, 2015 at 2:00 AM, Nico Schottelius
wrote:
> Lionel, Christian,
>
> we do have the exactly same trouble as Christian,
> namely
>
> Christian Eichelmann [Fri, Jan 09, 2015 at 10:43:20AM +0100]:
>> We still don't know what caused this specific error...
>
> and
>
>> ...there is currently
What versions of all the Ceph pieces are you using? (Kernel
client/ceph-fuse, MDS, etc)
Can you provide more details on exactly what the program is doing on
which nodes?
-Greg
On Fri, Jan 9, 2015 at 5:15 PM, Lorieri wrote:
> the first 3 stat commands show blocks and size changing, but not the times
> On Fri, Jan 9, 2015 at 7:15 PM, Gregory Farnum wrote:
>>
>> On Thu, Jan 8, 2015 at 5:46 AM, Zeeshan Ali Shah
>> wrote:
>> > I just finished configuring ceph up to 100 TB with openstack ... Since
>> > we
>> > are also using Lustre in our HPC machines ,
> https://github.com/ActiveState/tail
> FAILED -> /usr/bin/tail of a Google docker image running debian wheezy
> PASSED -> /usr/bin/tail of a ubuntu 14.04 docker image
> PASSED -> /usr/bin/tail of the coreos release 494.5.0
>
>
> Tests in machine #1 (same machine that
"perf reset" on the admin socket. I'm not sure what version it went in
to; you can check the release logs if it doesn't work on whatever you
have installed. :)
-Greg
On Mon, Jan 12, 2015 at 2:26 PM, Shain Miley wrote:
> Is there a way to 'reset' the osd perf counters?
>
> The numbers for osd 73
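A sketch of that call, assuming osd.73 and a version new enough to have it (the argument is a perf counter collection name, or 'all'):

    ceph --admin-daemon /var/run/ceph/ceph-osd.73.asok perf reset all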
Awesome, thanks for the bug report and the fix, guys. :)
-Greg
On Mon, Jan 12, 2015 at 11:18 PM, 严正 wrote:
> I tracked down the bug. Please try the attached patch
>
> Regards
> Yan, Zheng
>
>
>
>
>> 在 2015年1月13日,07:40,Gregory Farnum 写道:
>>
>> Zheng, t
On Mon, Jan 12, 2015 at 8:25 AM, Dan Van Der Ster
wrote:
>
> On 12 Jan 2015, at 17:08, Sage Weil wrote:
>
> On Mon, 12 Jan 2015, Dan Van Der Ster wrote:
>
> Moving forward, I think it would be good for Ceph to a least document
> this behaviour, but better would be to also detect when
> zone_recla
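For reference, the kernel tunable being referred to is presumably vm.zone_reclaim_mode, which can be checked and disabled like this:

    cat /proc/sys/vm/zone_reclaim_mode      # non-zero means zone reclaim is on
    sysctl -w vm.zone_reclaim_mode=0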
On Tue, Jan 13, 2015 at 1:03 PM, Roland Giesler wrote:
> I have a 4 node ceph cluster, but the disks are not equally distributed
> across all machines (they are substantially different from each other)
>
> One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3) and
> two machines
On Fri, Jan 16, 2015 at 2:52 AM, Roland Giesler wrote:
> On 14 January 2015 at 21:46, Gregory Farnum wrote:
>>
>> On Tue, Jan 13, 2015 at 1:03 PM, Roland Giesler
>> wrote:
>> > I have a 4 node ceph cluster, but the disks are not equally distributed
>
On Sun, Jan 18, 2015 at 6:40 PM, ZHOU Yuan wrote:
> Hi list,
>
> I'm trying to understand the RGW cache consistency model. My Ceph
> cluster has multiple RGW instances with HAProxy as the load balancer.
> HAProxy would choose one RGW instance to serve the request(with
> round-robin).
> The questio
n
>
>
> On Mon, Jan 19, 2015 at 10:58 PM, Gregory Farnum wrote:
> > On Sun, Jan 18, 2015 at 6:40 PM, ZHOU Yuan wrote:
> >> Hi list,
> >>
> >> I'm trying to understand the RGW cache consistency model. My Ceph
> >> cluster has multiple RGW inst
On Tue, Jan 20, 2015 at 2:40 AM, Christian Eichelmann
wrote:
> Hi all,
>
> I want to understand what Ceph does if several OSDs are down. First of all,
> some words about our setup:
>
> We have 5 Monitors and 12 OSD Server, each has 60x2TB Disks. These Servers
> are spread across 4 racks in our datace
On Tue, Jan 20, 2015 at 1:32 AM, Christopher Armstrong
wrote:
> Hi folks,
>
> We have many users who run Deis on AWS, and our default configuration places
> hosts in an autoscaling group. Ceph runs on all hosts in the cluster
> (monitors and OSDs), and users have reported losing quorum after havin
Joao has done it in the past so it's definitely possible, but I
confess I don't know what if anything he had to hack up to make it
work or what's changed since then. ARMv6 is definitely not something
we worry about when adding dependencies. :/
-Greg
On Thu, Jan 15, 2015 at 12:17 AM, Prof. Dr. Chri
On Tue, Jan 20, 2015 at 5:48 AM, Mohamed Pakkeer wrote:
>
> Hi all,
>
> We are trying to create 2 PB scale Ceph storage cluster for file system
> access using erasure coded profiles in giant release. Can we create Erasure
> coded pool (k+m = 10 +3) for data and replicated (4 replicas) pool for
> m
> release of CephFS happen with erasure coded pool ? We are ready to test
>> > peta-byte scale CephFS cluster with erasure coded pool.
>> >
>> >
>> > -Mohammed Pakkeer
>> >
>> > On Wed, Jan 21, 2015 at 9:11 AM, Gregory Farnum
>> &g
On Mon, Jan 19, 2015 at 8:40 AM, J David wrote:
> A couple of weeks ago, we had some involuntary maintenance come up
> that required us to briefly turn off one node of a three-node ceph
> cluster.
>
> To our surprise, this resulted in failure to write on the VM's on that
> ceph cluster, even thoug
On Mon, Jan 19, 2015 at 2:48 PM, Brian Rak wrote:
> Awhile ago, I ran into this issue: http://tracker.ceph.com/issues/10411
>
> I did manage to solve that by deleting the PGs, however ever since that
> issue my mon databases have been growing indefinitely. At the moment, I'm
> up to 3404 sst file
creating CephFS? Also we would like to know, when will the production
> release of CephFS happen with erasure coded pool ? We are ready to test
> peta-byte scale CephFS cluster with erasure coded pool.
>
>
> -Mohammed Pakkeer
>
> On Wed, Jan 21, 2015 at 9:11 AM, Gregory Farnum