Thanks! I solved it using the ceph-osd command.
So... there is no script to install Upstart, is there?
Jae
On Fri, Nov 18, 2016 at 3:26 PM 钟佳佳 wrote:
> if you built from the git repo tag v10.2.3,
> refer to the links below from ceph.com:
> http://docs.ceph.com/docs/emperor/install/build-packages/
>
> h
Hi,
about 2 weeks ago I upgraded a rather small cluster from Ceph 0.94.2 to 0.94.9. The upgrade went fine and the cluster is
running stably. But I just noticed that one monitor is already eating 20 GB of memory, growing slowly over time. The
other 2 mons look fine. The disk space used by the probl
Hi,
We have support for an offline bucket resharding admin command:
https://github.com/ceph/ceph/pull/11230.
It will be available in Jewel 10.2.5.
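For anyone searching the archives later, the offline reshard is driven through radosgw-admin; here is a minimal sketch, assuming the flag names from that PR are unchanged in the released build and using a made-up bucket name and shard count:
# check bucket usage first to pick a sensible shard count
radosgw-admin bucket stats --bucket=mybucket
# reshard the bucket index offline into 64 shards
radosgw-admin bucket reshard --bucket=mybucket --num-shards=64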
Orit
On Thu, Nov 17, 2016 at 9:11 PM, Yoann Moulin wrote:
> Hello,
>
> is it possible to shard the index of existing buckets?
>
> I have more than
Hi All,
I want to submit a PR to include a fix for this tracker bug, as I have just
realised I've been experiencing it.
http://tracker.ceph.com/issues/9860
I understand that I would also need to update the debian/ceph-osd.* files to get
the file copied; however, I'm not quite sure where this
new file (/
Hi Sam,
Updated with some more info.
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Samuel Just
> Sent: 17 November 2016 19:02
> To: Nick Fisk
> Cc: Ceph Users
> Subject: Re: [ceph-users] After OSD Flap - FAILED assert(oi.version ==
>
Hi Nick,
Here are some logs. The system is in the IST TZ and I have filtered the logs to
get only the last 2 hours, during which we can observe the issue.
In that particular case, the issue is illustrated with the following OSDs:
Primary:
ID:607
PID:2962227
HOST:10.137.81.18
Secondary1
ID:528
PID:3721728
HO
Hi,
Following up on the suggestion to use any of the following options to
mitigate the time spent blocked on requests:
- client_mount_timeout
- rados_mon_op_timeout
- rados_osd_op_timeout
Is there really no other way around this?
If two OSDs go down that between them have both copi
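For reference, these are client-side options and would normally go in the [client] section of ceph.conf; a minimal sketch, with the timeout values (in seconds) picked arbitrarily for illustration:
[client]
# give up on hung mount / mon / osd requests instead of blocking forever
client_mount_timeout = 30
rados_mon_op_timeout = 30
rados_osd_op_timeout = 30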
On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw wrote:
> Hi,
>
> Follow up from the suggestion to use any of the following options:
>
> - client_mount_timeout
> - rados_mon_op_timeout
> - rados_osd_op_timeout
>
> To mitigate the waiting time being blocked on requests. Is there
> really no other way
On 18 November 2016 at 13:14, John Spray wrote:
> On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw wrote:
>> Hi,
>>
>> Follow up from the suggestion to use any of the following options:
>>
>> - client_mount_timeout
>> - rados_mon_op_timeout
>> - rados_osd_op_timeout
>>
>> To mitigate the waiting tim
On Fri, Nov 18, 2016 at 1:04 PM, Iain Buclaw wrote:
> On 18 November 2016 at 13:14, John Spray wrote:
>> On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw wrote:
>>> Hi,
>>>
>>> Follow up from the suggestion to use any of the following options:
>>>
>>> - client_mount_timeout
>>> - rados_mon_op_timeo
Hi list, I wonder if there is anyone who has experience with Intel
P3700 SSD drives as journals and can share it?
I was thinking of using the P3700 400GB SSD as a journal in my Ceph
deployment. It is benchmarked on Sébastien Han's SSD page as well.
However, a vendor I spoke to didn't q
We've had this for a while. We just monitor memory usage and restart the mon
services when 1 or more reach 80%.
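Something along these lines, in case it helps; a crude watchdog sketch (the 80% threshold is arbitrary and the restart command is an assumption, Upstart-style for Hammer, with the systemd form in the comment):
#!/bin/sh
# restart the local mon if its RSS exceeds ~80% of total RAM
pid=$(pgrep -o ceph-mon)
rss_kb=$(awk '/VmRSS/ {print $2}' /proc/$pid/status)
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
if [ $((rss_kb * 100 / total_kb)) -ge 80 ]; then
    restart ceph-mon id=$(hostname -s)   # or: systemctl restart ceph-mon@$(hostname -s)
fi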
Sent from my iPhone
> On Nov 18, 2016, at 3:35 AM, Corin Langosch
> wrote:
>
> Hi,
>
> about 2 weeks ago I upgraded a rather small cluster from ceph 0.94.2 to
> 0.94.9. The upgrade
Thanks Yehuda and Brian. I'm not sure if you have ever seen this error
with radosgw (latest Hammer on CentOS 7), or can advise whether this is a
critical error? Appreciate any hints here. thx will
2016-11-12 13:49:08.905114 7fbba7fff700 20 RGWUserStatsCache: sync
user=myuserid1
2016-11-12 13:49:08.90
Hi Corin. We run the latest Hammer on CentOS 7.2, with 3 mons, and have not
seen this problem. Are there any other possible
differences between the healthy nodes and the one that has excessive
memory consumption? thx will
On Fri, Nov 18, 2016 at 6:35 PM, Corin Langosch
wrote:
> Hi,
>
I'm using the 400GB models as a journal for 12 drives. I know this is probably
pushing it a little bit, but it seems to work fine. I'm
guessing the reason may be related to the TBW figure being higher on the more
expensive models; maybe they don't want to have to
replace worn NVMe's under warranty?
Yes Nick, you're right, I can now see on page 16 here
www.intel.com/content/www/xa/en/solid-state-drives/ssd-dc-p3700-spec.html
that there is a difference in the durability.
However, I think 7.3 PBW isn't much worse than the Intel S3610, which is much
slower. thx will
400GB: 7.3 PBW
800GB: 14.6 PBW (10 drive
> I was wondering how exactly you accomplish that?
> Can you do this with a "ceph-deploy create" with "noin" or "noup" flags
> set, or does one need to follow the manual steps of adding an osd?
You can do it either way (manual or with ceph-deploy). Here are the
steps using ceph-deploy:
1. Add "os
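Roughly, the flagged approach looks like the sketch below; the host/device names and the host:data:journal ceph-deploy syntax are placeholders for illustration only:
ceph osd set noin                           # new OSDs join the map but stay "out"
ceph-deploy osd create node1:sdb:/dev/sdc1  # prepare and activate the new OSD
ceph osd tree                               # verify the new OSD appears as expected
ceph osd unset noin                         # let it take data when you are ready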
Hey Cephers,
Due to Dreamhost shutting down the old DreamCompute cluster in their
US-East 1 region, we are in the process of beginning the migration of
Ceph infrastructure. We will need to move download.ceph.com,
tracker.ceph.com, and docs.ceph.com to their US-East 2 region.
The current plan is
- On 3 Nov 16, at 5:18, Thomas wrote:
> Hi guys,
Hi Thomas,
This is a question I also asked myself ...
Maybe something like:
radosgw-admin zonegroup get
radosgw-admin zone get
And for each user :
radosgw-admin metadata get user:uid
Anyone?
Stephane.
> I'm not sure this was ask
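The per-user step Stephane describes can be scripted; a rough sketch, assuming jq is available to parse the JSON list of user ids:
for uid in $(radosgw-admin metadata list user | jq -r '.[]'); do
    radosgw-admin metadata get "user:$uid"
done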
- On 4 Nov 16, at 21:17, Andrey Ptashnik wrote:
> Hello Ceph team!
> I’m trying to create different pools in Ceph in order to have different tiers
> (some are fast, small and expensive and others are plain big and cheap), so
> certain users will be tied to one pool or another.
> - I crea
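Not sure if this is what Andrey ended up doing, but one common way to tie a plain RADOS/RBD client to a pool is via cephx caps; a sketch with made-up pool and client names (for radosgw users specifically, placement targets in the zone config would be the mechanism instead):
ceph osd pool create fast-ssd 128 128
ceph osd pool create big-sata 128 128
# each client key only gets access to its own pool
ceph auth get-or-create client.fastuser mon 'allow r' osd 'allow rwx pool=fast-ssd'
ceph auth get-or-create client.biguser mon 'allow r' osd 'allow rwx pool=big-sata'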
I often read that small IO writes and RBD work better with a bigger
filestore_max_sync_interval than the default value.
The default value is 5 sec and I saw many posts saying they are using 30 sec.
Also the slow request symptom is often linked to this parameter.
My journals are 10GB (collocated with O
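For context, the knob lives in the [osd] section of ceph.conf; a sketch with the commonly quoted value, to be treated as an example to test rather than a recommendation:
[osd]
filestore_min_sync_interval = 0.01
filestore_max_sync_interval = 30     # default is 5 seconds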
I've used the 400GB unit extensively for almost 18 months, one per six drives.
They've performed flawlessly.
In practice, journals will typically be quite small relative to the total
capacity of the SSD. As such, there will be plenty of room for wear leveling.
If there was some concern, one
We use the 800GB version as journal devices with up to a 1:18 ratio and have
had good experiences, with no bottleneck on the journal side. These also feature
good endurance characteristics. I would think that higher capacities are hard to
justify as journals
-----Original Message-----
From: ceph-use
Hi,
On 15/11/16 11:55, Craig Chi wrote:
> You can try to manually fix this by adding the
> /lib/systemd/system/ceph-mon.target file, which contains:
> and then execute the following command to tell systemd to start this
> target on bootup
> systemctl enable ceph-mon.target
This worked a treat
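For anyone hitting the same thing, a minimal ceph-mon.target along those lines; the contents below are reconstructed from memory, so compare against the unit shipped by your packages:
[Unit]
Description=ceph target allowing to start/stop all ceph-mon@.service instances at once
PartOf=ceph.target

[Install]
WantedBy=multi-user.target ceph.target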
Hi Nick and other Cephers,
Thanks for your reply.
> 2) Config Errors
> This can be an easy one to say you are safe from. But I would say most
> outages and data loss incidents I have seen on the mailing lists have
> been due to poor hardware choice or configuring options such as size=2,
> min_size=1
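As an aside, checking and raising those settings is quick to do per pool; a sketch using the default rbd pool purely as an example:
ceph osd pool get rbd size
ceph osd pool get rbd min_size
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2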
Never *ever* use nobarrier with ceph under *any* circumstances. I
cannot stress this enough.
-Sam
On Fri, Nov 18, 2016 at 10:39 AM, Craig Chi wrote:
> Hi Nick and other Cephers,
>
> Thanks for your reply.
>
>> 2) Config Errors
>> This can be an easy one to say you are safe from. But I would say
Hi,
MSI has an erasure coded ceph pool accessible by the radosgw interface.
We recently upgraded to Jewel from Hammer. Several days ago, we
experienced issues with a couple of the rados gateway servers and
inadvertently deployed older Hammer versions of the radosgw instances.
This configuration
On Fri, Nov 18, 2016 at 1:14 PM, Jeffrey McDonald wrote:
> Hi,
>
> MSI has an erasure coded ceph pool accessible by the radosgw interface.
> We recently upgraded to Jewel from Hammer. Several days ago, we
> experienced issues with a couple of the rados gateway servers and
> inadvertently deploye
+ceph-devel
On Fri, Nov 18, 2016 at 8:45 PM, Nick Fisk wrote:
> Hi All,
>
> I want to submit a PR to include fix in this tracker bug, as I have just
> realised I've been experiencing it.
>
> http://tracker.ceph.com/issues/9860
>
> I understand that I would also need to update the debian/ceph-osd
On 11/18/16 18:00, Thomas Danan wrote:
>
> I often read that small IO writes and RBD work better with
> a bigger filestore_max_sync_interval than the default value.
>
> The default value is 5 sec and I saw many posts saying they are using 30 sec.
>
> Also the slow request symptom is often linked to this
I have a Ceph cluster with 5 nodes. For some reason the sync went down and now I
don't know what I can do to restore it.
# ceph -s
    cluster 338bc0a5-c2f7-4c0a-9b35-25c7afee50c6
     health HEALTH_WARN
            1 pgs down
            6 pgs incomplete
            6 pgs stuck inactive
            6 p
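The usual next step for down/incomplete PGs is to find out which ones they are and query them; a sketch, where the pg id is just a placeholder:
ceph health detail
ceph pg dump_stuck inactive
ceph pg 1.23 query        # replace 1.23 with one of the reported pg ids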
On Sat, Nov 19, 2016 at 6:59 AM, Brad Hubbard wrote:
> +ceph-devel
>
> On Fri, Nov 18, 2016 at 8:45 PM, Nick Fisk wrote:
>> Hi All,
>>
>> I want to submit a PR to include fix in this tracker bug, as I have just
>> realised I've been experiencing it.
>>
>> http://tracker.ceph.com/issues/9860
>>
>
This is like your mother telling you not to cross the road when you were 4
years old, but not telling you it was because you could be flattened
by a car :)
Can you expand on your answer? If you are in a DC with A/B power,
redundant UPS, dual feeds from the electric company, onsite generators,
dual PSU
Yes, because these things happen:
http://www.theregister.co.uk/2016/11/15/memset_power_cut_service_interruption/
We had customers who had kit in this DC.
To use your analogy, it's like crossing the road at traffic lights but
not checking that cars have stopped. You might be OK 99% of the time, but
Many reasons:
1) You will eventually get a DC-wide power event anyway, at which point
probably most of the OSDs will have hopelessly corrupted internal XFS
structures (yes, I have seen this happen to a poor soul with a DC with
redundant power).
2) Even in the case of a single rack/node power failur
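A quick way to audit for this; a sketch assuming the usual /var/lib/ceph/osd mount points and XFS data partitions:
# any output here means an OSD filesystem is mounted with nobarrier
mount | grep /var/lib/ceph/osd | grep nobarrier
# a sane fstab entry keeps barriers enabled (the default), e.g.:
# /dev/sdb1  /var/lib/ceph/osd/ceph-0  xfs  rw,noatime,inode64  0 0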
Hi, thanks.
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 device5
devic
Hello Bruno,
I am not understanding your outputs.
The first 'ceph -s' says one mon is down, but your 'ceph health detail'
does not report it further.
On your crush map I count 7 OSDs (0,1,2,3,4,6,7) but 'ceph -s' says only 6 are
active.
Can you send the output of 'ceph osd tree' and 'ceph osd df'?
Just to update, this is still an issue as of the latest Git commit (
64bcf92e87f9fbb3045de49b7deb53aca1989123).
On Fri, Nov 11, 2016 at 1:31 PM, bobobo1...@gmail.com
wrote:
> Here's another: http://termbin.com/smnm
>
> On Fri, Nov 11, 2016 at 1:28 PM, Sage Weil wrote:
> > On Fri, 11 Nov 2016, b
Thanks Nick / Samuel,
It's definitely worthwhile to explain exactly why this is such a bad
idea. I think that will do more to prevent people from ever doing it than
just telling them not to do it.
On Sat, Nov 19, 2016 at 12:30 AM, Samuel Just wrote:
> Many reasons:
>
> 1) You will eventually get