Is there now a stable version of Ceph in Hammer and/or Infernalis with
which we can safely use a cache tier in write-back mode?
A few months ago I saw a post saying that we had to wait for a future release
to use it safely.
Hello.
I've been testing an Intel 3500 as a journal store for a few HDD-based OSDs. I
stumbled on issues with multiple partitions (>4) and udev (sda5, sda6, etc.
sometimes do not appear after partition creation). And I'm thinking that
partitioning is not that useful for OSD management, because Linux does not
allow
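For what it's worth, when a freshly created journal partition does not show up, forcing a partition-table re-read and waiting for udev usually makes the node appear (device name is hypothetical):
partprobe /dev/sda    # ask the kernel to re-read the partition table
udevadm settle        # wait for udev to finish creating the /dev nodes
ls /dev/sda*          # the missing sda5/sda6 nodes should now be visible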
Hi,
I'm facing this problem. The cluster is in Hammer 0.94.5.
When I do a ceph health detail, I can see:
pg 8.c1 is stuck unclean for 21691.555742, current state
active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last
acting [140]
pg 8.c1 is stuck undersized for 21327.027365, cu
The OSD 140 is 73.61% used and its backfill_full_ratio is 0.85, too.
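For reference, a possible way to check the utilisation and, as a temporary workaround, raise the backfill threshold on that OSD (0.90 is only an illustrative value, not a recommendation):
ceph osd df | grep -w 140                                      # per-OSD utilisation, available since Hammer
ceph tell osd.140 injectargs '--osd_backfill_full_ratio 0.90'  # runtime change only, not persistent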
-- Forwarded message --
From: Vincent Godin
Date: 2016-07-25 17:35 GMT+02:00
Subject: 1 active+undersized+degraded+remapped+wait_backfill+backfill_toofull ???
To: ceph-users@lists.ceph.com
Hi,
I'm facing
I restarted osd.80 and so far: no backfill_toofull anymore.
2016-07-25 17:46 GMT+02:00 M Ranga Swami Reddy :
> can you restart osd.80 and check to see if the recovery proceeds?
>
> Thanks
> Swami
>
> On Mon, Jul 25, 2016 at 9:05 PM, Vincent Godin
> wrote:
> > Hi,
>
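For completeness, restarting a single OSD daemon looks roughly like this, depending on how the cluster was deployed (sysvinit on Hammer-era installs, systemd otherwise):
service ceph restart osd.80      # sysvinit-managed OSD
systemctl restart ceph-osd@80    # systemd-managed OSD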
If you have at least 2 hosts per room, you can use k=3 and m=3 and
place 2 shards per room (one on each host). You only need 3 shards to
read the data: you can lose a room and one host in one of the two other
rooms and still get your data. It covers double faults, which is
better.
It will take more s
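As a minimal sketch of that layout (profile name, root and bucket types are assumptions based on the description above), the erasure-code profile plus a CRUSH rule placing two chunks per room could look like:
ceph osd erasure-code-profile set ec33 k=3 m=3
and, in the decompiled crushmap, a rule along these lines:
rule ec_two_per_room {
        ruleset 1
        type erasure
        min_size 6
        max_size 6
        step take default
        step choose indep 3 type room        # pick the 3 rooms
        step chooseleaf indep 2 type host    # 2 chunks on distinct hosts in each room
        step emit
}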
In addition to the points that you made:
I noticed on RAID0 disks that read IO errors are not always trapped by
Ceph, leading to unexpected behaviour of the impacted OSD daemon.
On both RAID0 and non-RAID disks, an IO error is logged in /var/log/messages:
Oct 2 15:20:37 os-ceph05 kernel: sd 0:
We have some scrub errors on our cluster. A ceph pg repair x.xxx is
taken into account only after hours. It seems to be linked to deep-scrubs
which are running at the same time. It looks like it has to wait for
a slot before launching the repair. I then have two questions:
is it possible to launch a
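As a hedged illustration of the slot mechanism: a repair is scheduled like a scrub, so it competes for the osd_max_scrubs slots. Something along these lines can free a slot or raise the limit temporarily (the value 2 is just an example):
ceph osd set noscrub
ceph osd set nodeep-scrub                          # stop new scheduled scrubs so the repair can start
ceph tell osd.* injectargs '--osd_max_scrubs 2'    # or raise the per-OSD scrub slot count at runtime
# once the repair has completed:
ceph osd unset noscrub
ceph osd unset nodeep-scrub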
Yesterday we just encountered this bug. One OSD was looping on
"2018-01-03 16:20:59.148121 7f011a6a1700 0 log_channel(cluster) log
[WRN] : slow request 30.254269 seconds old, received at 2018-01-03
16:20:28.883837: osd_op(client.48285929.0:14601958 35.8abfc02e
.dir.0a3e5369-ff79-4f7d-b0b6-79c5a75b
Yesterday we had an outage on our ceph cluster. One OSD was looping on << [call
rgw.bucket_complete_op] snapc 0=[]
ack+ondisk+write+known_if_redirected e359833) currently waiting for
degraded object >> for hours blocking all the requests to this OSD and
then ...
We had to delete the degraded object
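For anyone hitting something similar, a possible way to see which operations are stuck on a given OSD (the OSD id is a placeholder) is the admin socket:
ceph daemon osd.<id> dump_ops_in_flight    # run on the OSD's host; stuck ops show "waiting for degraded object"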
How can we know the usage of an indexless bucket? We need this
information for our billing process.
As no response was given, I will explain what I found: maybe it
could help other people.
The .dirXXX object is an index marker with a 0 data size. The metadata
associated with this object (located in the LevelDB of the OSDs
currently holding this marker) is the index of the bucket
corresponding to
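To illustrate (the index pool name depends on the zone configuration and the marker is a placeholder), the marker can be read from the bucket metadata and its omap entries inspected directly:
radosgw-admin metadata get bucket:<bucket_name>           # shows the bucket marker / instance id
rados -p .rgw.buckets.index listomapkeys .dir.<marker>    # the omap keys are the index entries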
Hello Alex,
We have a similar design: two datacenters at short distance (sharing
the same level 2 network) and one datacenter at long range (more than
100 km) for our Ceph cluster. Let's call these sites A1, A2 and B.
We set 2 Mons on A1, 2 Mons on A2 and 1 Mon on B. A1 and A2 share the
same level
Hi,
In fact, when you increase your pg number, the new PGs will have to peer
first, and during this time a lot of PGs will be unreachable. The best way to
increase the number of PGs of a cluster (you'll need to adjust the number of
PGPs too) is:
- Don't forget to apply Goncalo's advice to keep yo
When you increase your pg number, the new PGs will have to peer first, and
during this time they will be unreachable. So you need to put the cluster in
maintenance mode for this operation.
The way to increase the pg_num and pgp_num of a running cluster is:
- First, it's very important to
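A minimal sketch of such a maintenance window (pool name and target value are placeholders; the original checklist is truncated above):
ceph osd set norebalance
ceph osd set nobackfill                    # hold back data movement while the new PGs are created
ceph osd pool set <pool> pg_num <target>
# wait until all new PGs are created and peered (watch ceph -s)
ceph osd pool set <pool> pgp_num <target>
ceph osd unset nobackfill
ceph osd unset norebalance                 # now let the data move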
We have an OpenStack which uses Ceph for Cinder and Glance. Ceph is in
the Hammer release and we need to upgrade to Jewel. My question is:
are Hammer clients compatible with Jewel servers? (upgrade of the Mons,
then the Ceph servers, first)
As the upgrade of the Ceph client needs a reboot of all the insta
After a test on a non-production environment, we decided to upgrade our
running cluster to Jewel 10.2.3. Our cluster has 3 monitors and 8 nodes of
20 disks. The cluster is in Hammer 0.94.5 with tunables set to "bobtail".
As the cluster is in production and it wasn't possible to upgrade the Ceph
client
Hello,
We now have a full cluster (Mons, OSDs & clients) in Jewel 10.2.2 (the initial
release was Hammer 0.94.5) but we still have some big problems on our production
environment:
- some Ceph filesystems are not mounted at startup and we have to mount
them with the "/bin/sh -c 'flock /var/lock/ceph-disk
Hello,
Our cluster failed again this morning. It took almost the whole day to
stabilize. Here are some problems we encountered in the OSD logs:
*Some OSDs refused to start:*
-1> 2016-11-23 15:50:49.507588 7f5f5b7a5800 -1 osd.27 196774 load_pgs: have
pgid 9.268 at epoch 196874, but missing m
Hello,
I didn't look at your video but I can already give you some leads:
1 - there is a bug in 10.2.2 which makes the client cache not work. The
client cache behaves as if it never received a flush, so it will stay in
writethrough mode. This bug is fixed in 10.2.3.
2 - 2 SSDs in JBOD and 12 x 4TB
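For context, the cache behaviour described in point 1 is driven by these client-side options (shown with their usual defaults; a sketch, not a recommendation):
[client]
rbd cache = true
rbd cache writethrough until flush = true    # stays in writethrough until the guest sends its first flush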
Hello Cephers,
If I had to go to production today, which release should I choose:
Luminous or Mimic?
Ceph cluster in Jewel 10.2.11
Mons & Hosts are on CentOS 7.5.1804 kernel 3.10.0-862.6.3.el7.x86_64
Every day, we can see in ceph.log on the monitor a lot of log entries like these:
2018-10-02 16:07:08.882374 osd.478 192.168.1.232:6838/7689 386 :
cluster [WRN] map e612590 wrongly marked me down
2018-10-02 16
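"wrongly marked me down" usually points at missed heartbeats rather than a dead daemon; as an illustrative starting point (the OSD id is a placeholder):
ceph daemon osd.<id> config get osd_heartbeat_grace    # run on the OSD's host, via the admin socket
ceph health detail | grep -i down                      # shows which OSDs are being reported down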
Does a man page exist for ceph-objectstore-tool? If yes, where can I find it?
Thanks
complete manual
On Mon, 15 Oct 2018 at 14:26, Matthew Vernon wrote:
>
> Hi,
>
> On 15/10/18 11:44, Vincent Godin wrote:
> > Does a man page exist for ceph-objectstore-tool? If yes, where can I find it?
>
> No, but there is some --help output:
>
> root@sto-1-1:~
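For reference, the built-in help and a harmless read-only invocation look roughly like this (the data path is the usual default and the OSD must be stopped first):
ceph-objectstore-tool --help
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list-pgs    # list the PGs held by a stopped OSD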
Hi,
As I understand it, you'll have one RAID1 of two SSDs for 12 HDDs. A
WAL is used for all writes on your host. If you have good SSDs, they
can handle 450-550 MBps. Your 12 SATA HDDs can handle 12 x 100 MBps,
that is to say 1200 MBps. So your RAID1 will be the bottleneck with
this design. A goo
Two months ago, we had a simple crushmap:
- one root
- one region
- two datacenters
- one room per datacenter
- two pools per room (one SATA and one SSD)
- hosts in SATA pool only
- osds in host
So we created a Ceph pool at the SATA level on each site.
After some disk problems which impacted alm
We are using a production cluster which started on Firefly, then moved to
Giant, Hammer and finally Jewel. So our images have different features
corresponding to the value of "rbd_default_features" of the version when
they were created.
We actually have three sets of features activated:
image with
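As an illustration (pool and image names are placeholders), the feature set of an existing image can be inspected, and individual features toggled, with:
rbd info <pool>/<image> | grep features       # shows which features the image was created with
rbd feature disable <pool>/<image> fast-diff  # features can be disabled one by one if a client is too old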
ht be hitting this issue [1] where mkfs is issuing lots of
> discard operations. If you get a chance, can you retest w/ the "-E
> nodiscard" option?
>
> Thanks
>
> [1] http://tracker.ceph.com/issues/16689
>
> On Fri, Jan 13, 2017 at 12:57 PM, Vincent Godin
> wr
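The retest Jason suggests would presumably look like this (device path is hypothetical):
mkfs.ext4 -E nodiscard /dev/vdb    # skip the discard pass that floods the RBD image with discard requests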
with
mkfs.ext4, there is always one process out of the 16 (we have 16 volumes)
which hangs!
2017-01-16 17:45 GMT+01:00 Jason Dillaman :
> Are you using krbd directly within the VM or librbd via
> virtio-blk/scsi? Ticket #9071 is against krbd.
>
> On Mon, Jan 16, 2017 at 11:34 AM, Vincent Godin
O requests between librbd and the OSDs. I
> would also check your librbd logs to see if you are seeing an error
> like "heartbeat_map is_healthy 'tp_librbd thread tp_librbd' had timed
> out after 60" being logged periodically, which would indicate a thread
> deadlock w
I created 2 users, jack & bob, inside a tenant_A.
jack created a bucket named BUCKET_A and wants to give read access to the
user bob.
With s3cmd, I can grant a user without a tenant easily: s3cmd setacl
--acl-grant=read:user s3://BUCKET_A
but with an explicit tenant, I tried:
--acl-grant=read:bob
--
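For what it's worth, RGW identifies a tenanted user as tenant$user, so the grant would presumably need that form (the quoting protects the $ from the shell); this is an assumption, not something confirmed in the thread:
s3cmd setacl --acl-grant=read:'tenant_A$bob' s3://BUCKET_A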
>On 02/17/2017 06:25 PM, Vincent Godin wrote:
>> I created 2 users, jack & bob, inside a tenant_A.
>> jack created a bucket named BUCKET_A and wants to give read access to the
>> user bob
>>
>> with s3cmd, I can grant a user without a tenant easily: s3cmd setacl
&
First of all, don't do a Ceph upgrade while your cluster is in a warning or
error state. An upgrade must be done from a clean cluster.
Don't stay with a replica count of 2. The majority of problems come from that
point: just look at the advice given by experienced users of the list. You
should set a re
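Presumably the truncated advice is to move to three replicas; on an existing pool that would be something like the following (pool name is a placeholder, and the change triggers data movement):
ceph osd pool set <pool> size 3
ceph osd pool set <pool> min_size 2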
When you replace a failed OSD, it has to recover all of its PGs and so it
is pretty busy. Is it possible to tell the OSD not to become primary for
any of its already synchronized PGs until every PG (of the OSD) has
recovered? It should accelerate the rebuild process because the OSD won't
have to se
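Something close to this exists as primary affinity: setting it to 0 keeps CRUSH from choosing that OSD as primary while it rebuilds (older releases need mon_osd_allow_primary_affinity enabled on the Mons); a sketch:
ceph osd primary-affinity osd.<id> 0    # avoid being primary while recovering
# once the OSD has caught up:
ceph osd primary-affinity osd.<id> 1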
When we use a replicated pool of size 3, for example, each piece of data, a
block of 4 MB, is written to one PG which is distributed over 3 hosts (by
default). The OSD holding the primary copy sends the block to the OSDs holding
the second and third copies.
With erasure code, let's take a RAID5-like schema such as k=2 and m=
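Assuming the truncated profile is k=2, m=1 (a RAID5-like layout), the arithmetic would be: a 4 MB object is split into two 2 MB data chunks plus one 2 MB coding chunk, so 6 MB hit the disks instead of the 12 MB a size-3 replicated pool would write, at the cost of tolerating only one lost chunk.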
Hi,
If you're using ceph-deploy, just run the command:
ceph-deploy osd prepare --overwrite-conf {your_host}:/dev/sdaa:/dev/sdaf2
We had a similar problem a few months ago when migrating from Hammer to
Jewel. We encountered some old bugs (which were declared closed in
Hammer!!!). We had some OSDs refusing to start because of a missing PG
map like yours, and some others which were completely busy and started
declaring valid OSDs lost =>
Hi,
I need to import a new crushmap in production (the old one is the default
one) to define two datacenters and to isolate SSDs from SATA disks. What is
the best way to do this without starting a hurricane on the platform?
Until now, I was just using hosts (SATA OSDs) in one datacenter with the
de
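For reference, the usual edit cycle, with a flag set to keep the initial data movement under control, is roughly (file names are arbitrary):
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt    # decompile, then edit crush.txt
crushtool -c crush.txt -o crush.new    # recompile
ceph osd set norebalance               # limit the immediate shock
ceph osd setcrushmap -i crush.new
# check ceph -s, then let the data move:
ceph osd unset norebalance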
an existing pool, but
> it is in the man page. You learn something new everyday.
>
>
> [1] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg26017.html
> -
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
>
>
I have a Ceph cluster which was designed for FileStore. Each host
has 5 write-intensive SSDs of 400 GB and 20 HDDs of 6 TB, so each HDD
has a 5 GB WAL on SSD.
If I want to put BlueStore on this cluster, I can only allocate ~75 GB
of WAL and DB on SSD for each HDD, which is far below the 4% limit o
The documentation says to size the DB to 4% of the disk capacity, i.e. 240 GB
for a 6 TB disk. Please give more explanation when your answer disagrees
with the documentation!
On Mon, 25 Nov 2019 at 11:00, Konstantin Shalygin wrote:
>
> I have a Ceph cluster which was designed for FileStore. Each host
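As a hedged illustration of deploying such an OSD (device and LV names are made up; the DB ends up as big as the LV you carve out, here ~75 GB instead of the 4% ≈ 240 GB the documentation suggests):
ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db/db-sdb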
We encounter a strange behavior on our Mimic 13.2.6 cluster. At any
time, and without any load, some OSDs become unreachable from only
some hosts. It lasts 10 minutes and then the problem vanishes.
It's not always the same OSDs and the same hosts. There is no network
failure on any of the hosts (because onl
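As a quick, illustrative check from one of the complaining hosts (the OSD id is a placeholder): locate the OSD, then test the reported port directly:
ceph osd find <id>            # shows the host and ip:port the OSD is bound to
nc -zv <osd_ip> <osd_port>    # verify the port answers from the host that lost contact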