On Mon, 22 Nov 2021 at 06:52, GHui wrote:
>
> I have done "systemctl restart ceph.target", but the osd service is not started.
> It's strange that osd.2 is up, but I can't find the osd service running,
> or the osd container running.
> [root@GHui cephconfig]# ceph osd df
> ID CLASS WEIGHT REWEIGHT ...
On Mon, 22 Nov 2021 at 09:03, Janne Johansson wrote:
>
> On Mon, 22 Nov 2021 at 06:52, GHui wrote:
> >
> > I have done "systemctl restart ceph.target", but the osd service is not
> > started.
> > It's strange that osd.2 is up, but I can't find the osd service running,
> > or the osd container
Hi David,
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon
I think this is the reason. Although the page describes an erasure-coded
pool, I think it also applies to replicated pools. You may check that page and
try the steps described there.
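For reference, the fix on that page comes down to raising the retry budget (step set_choose_tries) in the affected CRUSH rule. A minimal sketch of the edit-and-inject workflow, with placeholder file and rule names:

# export and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# edit crushmap.txt and raise the retry budget inside the relevant rule, e.g.
#   rule myrule {
#       ...
#       step set_choose_tries 100
#       ...
#   }

# recompile and inject the modified map
crushtool -c crushmap.txt -o crushmap.new.bin
ceph osd setcrushmap -i crushmap.new.bin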
Many of us deploy Ceph as a solution for storage high availability.
Over time, I've encountered a couple of moments when Ceph refused to
deliver I/O to VMs even when a tiny part of the PGs were stuck in
non-active states due to challenges on the OSDs.
So I found myself in very unpleasant situations.
On Sat, 20 Nov 2021 at 02:26, Yan, Zheng wrote:
> we have an FS containing more than 40 billion small files.
>
That is an impressive statistic! Are you able to share the output of ceph
-s / ceph df / etc. to get an idea of your cluster deployment?
Thanks.
>
> Many of us deploy Ceph as a solution for storage high availability.
>
> Over time, I've encountered a couple of moments when Ceph refused to
> deliver I/O to VMs even when a tiny part of the PGs were stuck in
> non-active states due to challenges on the OSDs.
I do not know what you mean by this; you can tune this with your min size and replication.
> I do not know what you mean by this; you can tune this with your min size
> and replication. It is hard to believe that hard drives fail in exactly the
> same PG. I wonder if this is not more related to your 'non-default' config?
In my setup size=2 and min_size=1. I had cases where 1 PG was stuck in
On Mon, 22 Nov 2021 at 11:40, Marius Leustean wrote:
> > I do not know what you mean by this; you can tune this with your min size
> > and replication. It is hard to believe that hard drives fail in exactly the
> > same PG. I wonder if this is not more related to your 'non-default' config?
>
> In my setup size=2 and min_size=1.
> In my setup size=2 and min_size=1
just don't.
> Real case: host goes down, individual OSDs from other hosts started
> consuming >100GB RAM during backfill and got OOM-killed
Configuring your cluster in a better way can help.
There will never be a single system so redundant that it has 100% uptime.
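If backfill itself is what pushes OSDs into OOM territory, throttling it is one mitigation. A minimal sketch, assuming a reasonably recent release with centralized config (the values are illustrative, not recommendations):

# limit concurrent backfill/recovery work per OSD
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1

# check the memory budget BlueStore OSDs try to stay within
ceph config get osd osd_memory_target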
On Monday, November 22, 2021 at 12:39 Marius Leustean
wrote:
> In my setup size=2 and min_size=1.
I'm sorry, but that's the root cause of the problems you're seeing. You really
want size=3, min_size=2 for your production cluster unless you have some
specific uncommon use case and you really know what you are doing.
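For anyone following along, moving a replicated pool to those values is a per-pool setting. A minimal sketch with a placeholder pool name (raising size triggers data movement and uses more capacity):

# inspect the current replication settings
ceph osd pool get mypool size
ceph osd pool get mypool min_size

# switch to the commonly recommended values
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2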
Can anyone help me with these questions?
On Sun, Nov 21, 2021 at 11:23 AM mahnoosh shahidi
wrote:
> Hi,
>
> Running a cluster on Octopus 15.2.12. We have a big bucket with about 800M
> objects, and resharding this bucket causes many slow ops on our bucket index
> OSDs. I want to know what happens if
Yeah, the starting patch works.
Now the stopping side is still missing. Do you have some patches to
https://tracker.ceph.com/issues/53327 already, which I could test?
Thanks,
Manuel
On Fri, 19 Nov 2021 11:56:41 +0100
Manuel Lausch wrote:
> Nice. Just now I am building a 16.2.6 release with this patch
Hello,
We are looking to replace the 36 aging 4TB HDDs in our 6 OSD machines
with 36x 4TB SATA SSDs.
There's obviously a big range of prices for large SSDs, so I would
appreciate any recommendations of manufacturers/models to consider/avoid.
I expect the balance to be between price/performance
As the price for SSDs is the same regardless of the interface, I would not
invest so much money in a still slow and outdated platform.
Just buy some new chassis as well and go NVMe. It adds only a little cost
but will increase performance drastically.
--
Martin Verges
Managing director
Yes it is on:
# ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.001867",
    "last_optimize_started": "Mon Nov 22 13:10:24 2021",
    "mode": "upmap",
    "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect"
}
On 22/11/2021 12:59, Martin Verges wrote:
> As the price for SSDs is the same regardless of the interface, I would
> not invest so much money in a still slow and outdated platform.
> Just buy some new chassis as well and go NVMe. It adds only a little
> cost but will increase performance drastically.
I just had a look at the balancer docs and they say "No adjustments will be
made to the PG distribution if the cluster is degraded (e.g., because an
OSD has failed and the system has not yet healed itself)." That implies
that the balancer won't run until the disruption caused by the removed OSD
has been resolved.
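A quick way to check that in practice is to confirm the cluster has healed and then ask the balancer to score the current distribution. A minimal sketch:

# the balancer only acts on a healthy, non-degraded cluster
ceph -s
ceph health detail

# score the current PG distribution (lower is better)
ceph balancer status
ceph balancer eval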
Hi,
We were in the same position as you, and replaced our 24 4TB hard disks
with Samsung PM883.
They seem to work quite nicely, and their wearout (after one year) is
still at 1% for our use.
MJ
On 22-11-2021 at 13:57, Luke Hall wrote:
Hello,
We are looking to replace the 36 aging 4TB HDDs
On 22/11/2021 15:18, mj wrote:
> Hi,
> We were in the same position as you, and replaced our 24 4TB hard disks
> with Samsung PM883.
> They seem to work quite nicely, and their wearout (after one year) is
> still at 1% for our use.
Thanks, that's really useful to know.
> On 22-11-2021 at 13:57, Luke Hall wrote:
On 22.11.21 at 16:25, Luke Hall wrote:
> On 22/11/2021 15:18, mj wrote:
>> Hi,
>>
>> We were in the same position as you, and replaced our 24 4TB hard disks with
>> Samsung PM883.
>>
>> They seem to work quite nicely, and their wearout (after one year) is still
>> at 1% for our use.
>
> Thanks,
I believe this is a fairly straightforward question, but is it true that any PG not in "active+..." (peering, down, etc.) blocks writes to the entire pool? I'm also wondering if there are methods for improving peering times for placement groups.
Thanks,
Eric
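Strictly speaking, writes only block for objects that map to the non-active PGs; the rest of the pool keeps serving I/O, although clients that touch an affected object will hang. A minimal sketch for finding and inspecting such PGs (the PG id is a placeholder):

# list PGs that are stuck in non-active states and see why
ceph health detail
ceph pg dump_stuck inactive

# drill into one PG to see what peering is waiting for
ceph pg 2.1f query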
Thanks Patrick and Dan,
I conclude that it is caused by a large number of inotify watches that keep
the inode from being evicted. I used this script [1] and found that the number
of watches matched the num_caps. And if I kill the process (VS Code server in
our case) holding the inotify instance, the inode is finally evicted.
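For reference, this is not the script referenced as [1], just a minimal sketch of the same idea: every inotify instance shows up as an anon_inode:inotify fd under /proc/<pid>/fd, and its watches can be counted from the matching fdinfo file.

# count inotify watches per process (run as root)
for fd in $(find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null); do
    pid=$(echo "$fd" | cut -d/ -f3)
    fdnum=$(echo "$fd" | cut -d/ -f5)
    watches=$(grep -c '^inotify' "/proc/$pid/fdinfo/$fdnum")
    echo "$watches watches, pid=$pid, $(tr '\0' ' ' < "/proc/$pid/cmdline" | cut -c1-60)"
done | sort -rn | head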
On Mon, Nov 22, 2021 at 4:52 PM Nico Schottelius
wrote:
>
>
> Peter Lieven writes:
> > Whatever SSD you choose, look if they support power-loss-protection and
> > make sure you disable the write cache.
>
> I have read this statement multiple times now, but I am still puzzled by
> the disabling of the write cache.
This depends on how the write cache is implemented and where the cache is.
If it's on a caching controller that has a BBU, then it depends on what happens
when an fsync is issued.
If it forces the write to go down to the underlying devices, then it could be a
bad thing.
With many caching controllers, as the controller is the end device that you
connect to at the OS level, you can get substantial performance increases by
having the cache enabled.
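For plain SATA/SAS drives behind an HBA (no RAID controller cache in the path), checking and disabling the volatile on-disk write cache typically looks like this. A sketch with placeholder device names; note that hdparm changes do not survive a reboot unless made persistent (e.g. via a udev rule):

# SATA: show and disable the drive's volatile write cache
hdparm -W /dev/sdx
hdparm -W 0 /dev/sdx

# SAS: same via the WCE bit in the caching mode page
sdparm --get WCE /dev/sdx
sdparm --set WCE=0 --save /dev/sdx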
> This depends on how the write cache is implemented and where the cache is.
Exactly!
> With many caching controllers, as the controller is the end device that you
> connect to at the OS level, you can get substantial performance increases
> by having the cache enabled.
A couple of years
Oh, I misread your initial email and thought you were on hard drives.
These do seem slow for SSDs.
You could try tracking down where the time is spent; perhaps run
strace and see which calls are taking a while, and go through the op
tracker on the MDS and see if it has anything that's obviously taking a long time.
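For the op tracker part, the MDS admin socket exposes in-flight and recently completed operations. A minimal sketch; the daemon name is a placeholder and the command runs on the host where that MDS is active:

# operations currently in flight on the MDS, with their age and current state
ceph daemon mds.myfs-a dump_ops_in_flight

# recently completed slow/long operations
ceph daemon mds.myfs-a dump_historic_ops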
Yes, we were a little bit concerned about the write endurance of those
drives. There are SSDs with much higher DWPD endurance, but we expected
that we would not need the higher endurance, so we decided not to pay
the extra price.
Turns out to have been a good guess (an educated guess, but still).
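For anyone wanting to check this on their own drives, the wear/endurance usage is visible via smartctl. A sketch with placeholder devices; the attribute name varies by vendor (e.g. Wear_Leveling_Count on Samsung SATA, "Percentage Used" on NVMe):

# SATA SSD: vendor wear attributes
smartctl -A /dev/sdx | grep -i -e wear -e percent

# NVMe SSD: standardized health log
smartctl -a /dev/nvme0 | grep -i 'percentage used'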
I have direct experience with SATA SSDs used for RBD with an active public
cloud (QEMU/KVM) workload. Drives rated ~ 1 DWPD after 3+ years of service
consistently reported <10% of lifetime used.
SMART lifetime counters are often (always?) based on rated PE cycles, which I
would expect to be mo
Hi all,
In the process of using RGW, I still cannot authenticate users through
IAM. In the near future, will RGW support IAM for managing user permissions
and authentication?
Looking forward to your reply 😁
Hi Nio,
Can you provide more details around what you are trying to do?
RGW supports attaching IAM policies to users that aid in managing their
permissions.
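As an illustration, attaching an inline policy through the IAM-compatible API can look like this. A sketch with placeholder endpoint, user, and policy names; the calling user needs the user-policy cap, and awscli is just one possible client:

# the caller needs the user-policy capability
radosgw-admin caps add --uid=testuser --caps="user-policy=*"

# attach an inline IAM policy to the RGW user
aws --endpoint-url http://rgw.example.com:8000 iam put-user-policy \
    --user-name testuser --policy-name AllowS3ReadOnly \
    --policy-document file://policy.json

# list the inline policies attached to the user
aws --endpoint-url http://rgw.example.com:8000 iam list-user-policies \
    --user-name testuser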
Thanks,
Pritha
On Tue, Nov 23, 2021 at 11:43 AM nio wrote:
> Hi all,
> In the process of using RGW, I still cannot authenticate users
On Mon, 22 Nov 2021 at 16:36, Stephen Smith6 wrote:
>
> I believe this is a fairly straightforward question, but is it true that any
> PG not in "active+..." (peering, down, etc.) blocks writes to the entire pool?
I'm not sure if this is strictly true, but in the example of say a VM
having a 40
On Tue, 23 Nov 2021 at 05:56, GHui wrote:
>
> I use "systemctl start/stop ceph.target" to start and stop the Ceph cluster.
> Maybe this is the problem. Because when I restart the computer, the OSDs are all up.
> Is there any way to safely restart the Ceph cluster?
That is how you stop and start all Ceph services.
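If the goal is to restart only one daemon rather than everything behind ceph.target, the per-daemon units (or the orchestrator, on cephadm/containerized clusters) can be used instead. A sketch with osd.2 as an example:

# package-based deployments: per-daemon systemd units
systemctl status ceph-osd@2
systemctl restart ceph-osd@2

# cephadm deployments: let the orchestrator restart the container
ceph orch ps | grep osd
ceph orch daemon restart osd.2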
Yes, it recovered when I put the OSD back in. The issue is that it fails to
sort itself out when I remove that OSD, even though I have loads of space
and 8 other OSDs in 4 different zones to choose from. The weights are very
different (some 3.2, others 0.36) and that post I found suggested that this m