Greetings -
I have created several test buckets in radosgw, to test different
expiration durations:
$ s3cmd mb s3://test2d
I set a lifecycle for each of these buckets:
$ s3cmd setlifecycle lifecycle2d.xml s3://test2d --signature-v2
The files look like this:
http://s3.amazonaws.com/doc/
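For reference, lifecycle2d.xml is standard S3 lifecycle XML along these lines (the other files differ only in the Days value):
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ID>expire-after-2-days</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>2</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>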
In my cluster, rados bench shows about 1GB/s bandwidth. I've done some
tuning:
[osd]
osd op threads = 8
osd disk threads = 4
osd recovery max active = 7
I was hoping to get much better bandwidth. My network can handle it, and
my disks are pretty fast as well. Are there any major tunables I can change
to improve that?
Does "rados bench" show a near maximum of what a cluster can do? Or is it
possible that I can tune it to get more bandwidth?
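For reference, the sort of invocation I'm using (pool name and thread count are placeholders):
$ rados -p benchpool bench 60 write -t 16 --no-cleanup
$ rados -p benchpool bench 60 seq -t 16
$ rados -p benchpool cleanup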
On Fri, Nov 10, 2017 at 3:43 AM, John Spray wrote:
> On Fri, Nov 10, 2017 at 4:29 AM, Robert Stanford
> wrote:
> >
> > In my cluster
> Denes
>
> On 11/10/2017 05:10 PM, Robert Stanford wrote:
>
>
> The bandwidth of the network is much higher than that. The bandwidth I
> mentioned came from "rados bench" output, under the "Bandwidth (MB/sec)"
> row. I see from comparing mine to
But sorry, this was about "rados bench" which is run inside the Ceph
cluster. So there's no network between the "client" and my cluster.
On Fri, Nov 10, 2017 at 10:35 AM, Robert Stanford
wrote:
>
> Thank you for that excellent observation. Are there any rumor
ver as well.
>
> On Fri, Nov 10, 2017 at 6:29 AM, Robert Stanford
> wrote:
>
>>
>> In my cluster, rados bench shows about 1GB/s bandwidth. I've done some
>> tuning:
>>
>> [osd]
>> osd op threads = 8
>> osd disk threads = 4
>> osd rec
Once 'osd max write size' (90MB by default I believe) is exceeded, does
Ceph reject the object (which is coming in through RGW), or does it break
it up into smaller objects (of max 'osd max write size' size)? If it
breaks them up, does it read the fragments in parallel when they're
requested by
sstriper, but thanks for
bringing it to my attention. We are using the S3 interface of RGW
exclusively (nothing custom in there).
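For my own notes: from what I've read, RGW stripes large S3 objects into many
smaller RADOS objects before they ever reach the OSDs, so 'osd max write size'
shouldn't normally come into play. The knobs I believe are involved (values
shown are only illustrative, and the instance name is a placeholder):
[client.rgw.gateway1]
rgw max chunk size = 4194304
rgw obj stripe size = 4194304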
On Thu, Nov 16, 2017 at 9:41 AM, Wido den Hollander wrote:
>
> > Op 16 november 2017 om 16:32 schreef Robert Stanford <
> rstanford8...@gmail.com>:
>
I did some benchmarking with cosbench and found that successful uploads (as
shown in the output report) were not 100% unless I used the "hashCheck=True"
flag in the cosbench configuration file. Under high load, the success rate
was significantly lower (say, 80%).
Has anyone dealt with object
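For reference, this is roughly where I set the flag, from memory (exact
attribute placement is described in the COSBench user guide; the container and
object ranges and sizes are just examples):
<work name="write-test" workers="16" runtime="300">
  <operation type="write" ratio="100"
    config="containers=u(1,4);objects=u(1,1000);sizes=c(4)MB;hashCheck=true" />
</work>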
I have an indexless pool that I plan to use with RGW, to store up to
billions of objects. I have the impression that because it's indexless,
the performance won't taper off as the buckets and pools grow.
Would there be any point in, or performance gain from, dividing our users
into more than one pool, or
I am installing with ceph-deploy using the instructions at
http://docs.ceph.com/docs/master/install/get-packages/
ceph-deploy runs fine for the first node until it dies because it cannot find
the radosgw package. I have verified on that node (apt-cache search radosgw,
apt-get install radosgw) that this
I would like to use the new object lifecycle feature of kraken /
luminous. I have jewel, with buckets that have lots and lots of objects.
It won't be practical to move them, then move them back after upgrading.
In order to use the object lifecycle feature of radosgw in
kraken/luminous, do I nee
Hello Ceph users. Is object lifecycle (currently expiration) for rgw
implementable on a per-object basis, or is the smallest scope the bucket?
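For context, the finest granularity I know of in standard S3 lifecycle XML is
a per-rule prefix rather than a single object, e.g.:
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ID>expire-logs</ID>
    <Prefix>logs/</Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>7</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
Whether rgw honors the prefix the same way is part of what I'm asking.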
Do the recommendations apply to both data and journal SSDs equally?
On Tue, Jul 10, 2018 at 12:59 PM, Satish Patel wrote:
> On Tue, Jul 10, 2018 at 11:51 AM, Simon Ironside
> wrote:
> > Hi,
> >
> > On 10/07/18 16:25, Satish Patel wrote:
> >>
> >> Folks,
> >>
> >> I am in the middle of ordering hardware
I installed my OSDs using ceph-disk. The journals are SSDs and are 1TB.
I notice that Ceph has only dedicated 5GB each to the four OSDs that use
the journal.
1) Is this normal?
2) Would performance increase if I made the partitions bigger?
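For reference, I believe the 5GB comes from the default of "osd journal size"
(5120 MB). An illustrative ceph.conf entry, which as I understand it only
affects journals created after the change:
[osd]
osd journal size = 20480   # in MB; example value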
Thank you
In a recent thread the Samsung SM863a was recommended as a journal SSD.
Are there any recommendations for data SSDs, for people who want to use
just SSDs in a new Ceph cluster?
Thank you
Wido den Hollander wrote:
>
>
> On 07/11/2018 10:10 AM, Robert Stanford wrote:
> >
> > In a recent thread the Samsung SM863a was recommended as a journal
> > SSD. Are there any recommendations for data SSDs, for people who want
> > to use just SSDs in a new Ceph cluster?
>
> Seemingly similar disks might be
> just completely bad, for
> example, the Samsung PM961 is just unusable for Ceph in our experience.
>
> Paul
>
> 2018-07-11 10:14 GMT+02:00 Wido den Hollander :
>
>>
>>
>> On 07/11/2018 10:10 AM, Robert Stanford wrote:
>> >
>
Any opinions on the Dell DC S3520 (for journals)? That's what I have
(stock), and I wonder if I should replace them.
On Wed, Jul 11, 2018 at 8:34 AM, Simon Ironside
wrote:
>
> On 11/07/18 14:26, Simon Ironside wrote:
>
> The 2TB Samsung 850 EVO for example is only rated for 300TBW (terabytes
>>
I saw this in the Luminous release notes:
"Each OSD now adjusts its default configuration based on whether the
backing device is an HDD or SSD. Manual tuning generally not required"
Which tuning in particular? The ones in my configuration are
osd_op_threads, osd_disk_threads, and osd_recovery_max_active.
This is what leads me to believe it's other settings being referred to as
well:
https://ceph.com/community/new-luminous-rados-improvements/
*"There are dozens of documents floating around with long lists of Ceph
configurables that have been tuned for optimal performance on specific
hardware or fo
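One way I know of to see which values an OSD has actually applied, via the
admin socket (it doesn't say which settings the release notes mean, but it
shows the live values against the compiled-in defaults):
$ ceph daemon osd.0 config diff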
I'm using filestore now, with 4 data devices per journal device.
I'm confused by this: "BlueStore manages either one, two, or (in certain
cases) three storage devices."
(
http://docs.ceph.com/docs/luminous/rados/configuration/bluestore-config-ref/
)
When I convert my journals to bluestore, will the journal SSDs hold the DB,
the WAL, or both?
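My reading of the one/two/three device cases, as a ceph-volume sketch (device
names are placeholders):
$ ceph-volume lvm create --bluestore --data /dev/sdb
# one device: data, DB and WAL all together
$ ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdf1
# two devices: DB on the SSD, WAL living with the DB
$ ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdf1 --block.wal /dev/sdg1
# three devices: DB and WAL each on their own device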
I am upgrading my clusters to Luminous. We are already using rados
gateway, and index max shards has been set for the rgw data pools. Now we
want to use Luminous dynamic index resharding. How do we make this
transition?
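For reference, the settings I believe are involved on the rgw side (instance
name is a placeholder; the threshold shown is my understanding of the default):
[client.rgw.gateway1]
rgw dynamic resharding = true
rgw max objs per shard = 100000
and I assume in-progress resharding can be watched with:
$ radosgw-admin reshard list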
Regards
Golden advice. Thank you Greg
On Mon, Jul 16, 2018 at 1:45 PM, Gregory Farnum wrote:
> On Fri, Jul 13, 2018 at 2:50 AM Robert Stanford
> wrote:
>
>>
>> This is what leads me to believe it's other settings being referred to
>> as well:
>> https:/
Looking here:
https://ceph.com/geen-categorie/the-ceph-and-tcmalloc-performance-story/
I see that it was a good idea to change to JEMalloc. Is this still the
case, with up to date Linux and current Ceph?
I have ceph clusters in a zone configured as active/passive, or
primary/backup. If the network link between the two clusters is slower
than the speed of data coming in to the active cluster, what will
eventually happen? Will data pile up on the active cluster until memory runs
out?
It seems that the Ceph community no longer recommends changing to
jemalloc. However this also recommends to do what's in this email's
subject:
https://ceph.com/geen-categorie/the-ceph-and-tcmalloc-performance-story/
Is it still recommended to increase the tcmalloc thread cache bytes, or is
that
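For reference, the setting in question is, as I understand it, an environment
variable the daemons read at startup, e.g. in /etc/sysconfig/ceph (or
/etc/default/ceph on Debian-based systems):
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128 MB; example value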
I am following the steps here:
http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/
The final step is:
ceph-volume lvm create --bluestore --data $DEVICE --osd-id $ID
I notice this command doesn't specify a device to use as the journal. Is
it implied that BlueStore will use
> Ceph will use the one provided disk for all
> data and RocksDB/WAL.
> Before you create that OSD you probably should check out the help page for
> that command, maybe there are more options you should be aware of, e.g. a
> separate WAL on NVMe.
>
> Regards,
> Eugen
>
>
>
I already have a set of default.rgw.* pools. They are in use. I want to
convert to multisite. The tutorials show to create new pools
(zone.rgw.*). Do I have to destroy my old pools and lose all data, in
order to convert to multisite?
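The kind of steps the tutorials show look like this (realm, zonegroup, zone
names and endpoints are placeholders):
$ radosgw-admin realm create --rgw-realm=myrealm --default
$ radosgw-admin zonegroup create --rgw-zonegroup=us --endpoints=http://rgw1:80 --master --default
$ radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east --endpoints=http://rgw1:80 --master --default
$ radosgw-admin period update --commit
What I can't tell is whether an existing default zone whose pools already hold
data can be adopted into this layout without recreating the pools.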
I have a Luminous Ceph cluster that uses just rgw. We want to turn it
into a multi-site installation. Are there instructions online for this?
I've been unable to find them.
-R
I have a Jewel Ceph cluster with RGW index sharding enabled. I've
configured the index to have 128 shards. I am upgrading to Luminous. What
will happen if I enable dynamic bucket index resharding in ceph.conf? Will
it maintain my 128 shards (the buckets are currently empty), and will it
split
According to the instructions to upgrade a journal to BlueStore (
http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/),
the OSD that uses the journal is destroyed and recreated.
I am using SSD journals, and want to use them with BlueStore. Reusing the
SSD requires zapping the
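For clarity, the zap step I mean is roughly this, which as I understand it
wipes the whole device:
$ ceph-volume lvm zap /dev/sdb --destroy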
tion at a time and replace the
> OSD. Or am I misunderstanding the question?
>
> Regards,
> Eugen
>
>
> Zitat von Bastiaan Visser :
>
>
> As long as your fault domain is host (or even rack) you're good, just take
>> out the entire host and recreate all osd's
I was surprised to see an email on this list a couple of days ago, which
said that write performance would actually fall with BlueStore. I thought
the reason BlueStore existed was to increase performance. Nevertheless, it
seems like filestore is going away and everyone should upgrade.
My quest
Just FYI. I asked about cluster names a month or two back and was told
that support for them is being phased out. I've had all sorts of problems
using custom cluster names, and stopped using them myself.
On Fri, Aug 10, 2018 at 2:06 AM, Glen Baars
wrote:
> I have now gotten this working.
[root@monitor07]# ceph version
ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous
(stable)
[root@monitor07]# ceph mon feature ls
no valid command found; 10 closest matches:
mon compact
mon scrub
mon metadata {}
mon sync force {--yes-i-really-mean-it} {--i-know-what-i-am-doing}
I am keeping the wal and db for a ceph cluster on an SSD. I am using the
bluestore_block_db_size / bluestore_block_wal_size parameters
in ceph.conf to specify how big they should be. Should these values be the
same, or should one be much larger than the other?
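For reference, what I have in mind looks like this (both values are in bytes
and purely illustrative):
[osd]
bluestore block db size = 32212254720   # 30 GB
bluestore block wal size = 1073741824   # 1 GB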
R
Wido den Hollander wrote:
>
>
> On 08/15/2018 04:17 AM, Robert Stanford wrote:
> > I am keeping the wal and db for a ceph cluster on an SSD. I am using
> > the bluestore_block_db_size / bluestore_block_wal_size
> > parameters in ceph.conf to specify how big they should be.
05:57 PM, Robert Stanford wrote:
> >
> > Thank you Wido. I don't want to make any assumptions so let me verify,
> > that's 10GB of DB per 1TB storage on that OSD alone, right? So if I
> > have 4 OSDs sharing the same SSD journal, each 1TB, there are 4 10 GB DB
>
I am turning off resharding for Luminous with rgw dynamic resharding =
false on the rgw server. When I show the configuration on that server
(with ceph daemon), I see that it is false, like I expect. When I show the
configuration on the monitor servers, that setting shows up as "true". Do
I nee
I am following the steps to replace my filestore journal with a bluestore journal (
http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/). It
is broken at ceph-volume lvm create. Here is my error:
--> Zapping successful for: /dev/sdc
Preparing sdc
Running command: /bin/ceph-authtool --
On Fri, Aug 17, 2018 at 5:55 AM Alfredo Deza wrote:
> On Thu, Aug 16, 2018 at 9:00 PM, Robert Stanford
> wrote:
> >
> > I am following the steps to replace my filestore journal with a bluestore
> journal
> > (http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migrati
.db command.
>
> Likely the command you want will end up being this after you create a
> partition on the SSD for the db/wal.
> `ceph-volume lvm create --osd-id 0 --bluestore --data /dev/sdc --block.db
> /dev/sdb1`
>
> On Fri, Aug 17, 2018 at 10:24 AM Robert Stanford
> wrote:
This is helpful, thanks. Since the example is only for block.db, does
that imply that the wal should (can efficiently) live on the same disk as
data?
R
On Fri, Aug 17, 2018 at 10:50 AM Alfredo Deza wrote:
> On Fri, Aug 17, 2018 at 11:47 AM, Robert Stanford
> wrote:
> >
> >
I have created new OSDs for Ceph Luminous. In my Ceph.conf I have
specified that the db size be 10GB, and the wal size be 1GB. However when
I type "ceph daemon osd.0 perf dump" I get: "bluestore_allocated": 5963776.
I think this means that the bluestore db is using the default, and not the
value o
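A check I plan to try, assuming the bluefs counters report the actual DB and
WAL partition sizes:
$ ceph daemon osd.0 perf dump | grep -E 'db_total_bytes|db_used_bytes|wal_total_bytes'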
t functionality rolled out, but not
> ready unless you are using master (please don't use master :))
>
>
> >
> > On Wed, Aug 22, 2018 at 1:34 PM Robert Stanford >
> > wrote:
> >>
> >>
> >> I have created new OSDs for Ceph Luminous. In my Ceph.co
e initial creation, the space will not be used for the
> DB. You cannot resize it.
>
> On Wed, Aug 22, 2018 at 3:39 PM Robert Stanford
> wrote:
>
>>
>> In my case I am using the same values for lvcreate and in the ceph.conf
>> (bluestore* settings). Since my
I installed a new Ceph cluster with Luminous, after a long time working
with Jewel. I created my RGW pools the same as always (pool create
default.rgw.buckets.data etc.), but they don't show up in ceph df with
Luminous. Has the command changed?
Thanks
R
I just installed a new luminous cluster. When I run this command:
ceph mgr module enable dashboard
I get this response:
all mgr daemons do not support module 'dashboard'
All daemons are Luminous (I confirmed this by running ceph version).
Why would this error appear?
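For reference, the checks I know of:
$ ceph versions
$ ceph mgr module ls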
Thank you
R
Casey - this was exactly it. My ceph-mgr had issues. I didn't know this
was necessary for ceph df to work. Thank you
R
On Fri, Aug 24, 2018 at 8:56 AM Casey Bodley wrote:
>
>
> On 08/23/2018 01:22 PM, Robert Stanford wrote:
> >
> > I installed a new Ceph clus
I am following the procedure here:
http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/
When I get to the part to run "ceph osd safe-to-destroy $ID" in a while
loop, I get an EINVAL error. I get this error when I run "ceph osd
safe-to-destroy 0" on the command line by itself, to
I installed a new Luminous cluster. Everything is fine so far. Then I
tried to start RGW and got this error:
2018-08-31 15:15:41.998048 7fc350271e80 0 rgw_init_ioctx ERROR:
librados::Rados::pool_create returned (34) Numerical result out of range
(this can be due to a pool or placement group mi
that did show
>> up made it go past the mon_max_pg_per_osd ?
>>
>>
>> Den fre 31 aug. 2018 kl 17:20 skrev Robert Stanford <
>> rstanford8...@gmail.com>:
>>
>>>
>>> I installed a new Luminous cluster. Everything is fine so far.
Awhile back the favorite SSD for Ceph was the Samsung SM863a. Are there
any larger SSDs that are known to work well with Ceph? I'd like around 1TB
if possible. Is there any better alternative to the SM863a?
Regards
R
Our OSDs are BlueStore and are on regular hard drives. Each OSD has a
partition on an SSD for its DB. Wal is on the regular hard drives. Should
I move the wal to share the SSD with the DB?
Regards
R
Hello,
Does object expiration work on indexless (blind) buckets?
Thank you
When I am running at full load my radosgw process uses 100% of one CPU
core (and has many threads). I have many idle cores. Is it common for
people to run several radosgw processes on their gateways, to take
advantage of all their cores?
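What I have in mind is something like this in ceph.conf, one section per extra
instance on the same host (instance names and ports are made up; each instance
would need its own key and systemd unit, as far as I know):
[client.rgw.gateway1-a]
rgw frontends = civetweb port=7480
[client.rgw.gateway1-b]
rgw frontends = civetweb port=7481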
This is a known issue as far as I can tell; I've read about it several
times. Ceph performs great (using radosgw), but as the OSDs fill up,
performance falls sharply. I am down to half of the empty-cluster performance with
about 50% disk usage.
My questions are: does adding more OSDs / disks to the cluster
I read a couple of versions ago that ceph-deploy was not recommended for
production clusters. Why was that? Is this still the case? We have a lot
of problems automating deployment without ceph-deploy.
I used this command to purge my rgw data:
rados purge default.rgw.buckets.data --yes-i-really-really-mean-it
Now, when I list the buckets with s3cmd, I still see the buckets (s3cmd ls
shows a listing of them.) When I try to delete one (s3cmd rb) I get this:
ERROR: S3 error: 404 (NoSuchKey)
Ok. How do I fix what's been broken? How do I "rebuild my index"?
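Is something along these lines the right direction? (bucket name is a
placeholder; I haven't run it yet):
$ radosgw-admin bucket check --bucket=mybucket --check-objects --fix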
Thanks
On Wed, Apr 11, 2018 at 1:49 AM, Robin H. Johnson
wrote:
> On Tue, Apr 10, 2018 at 10:06:57PM -0500, Robert Stanford wrote:
> > I used this command to purge my rgw data:
> >
> >
I have 65TB stored on 24 OSDs on 3 hosts (8 OSDs per host). SSD journals
and spinning disks. Our performance before was acceptable for our purposes
- 300+MB/s simultaneous transmit and receive. Now that we're up to about
50% of our total storage capacity (65/120TB, say), the write performance i
I deleted my default.rgw.buckets.data and default.rgw.buckets.index pools
in an attempt to clean them out. I brought this up on the list and
received replies telling me essentially, "You shouldn't do that." There
was however no helpful advice on recovering.
When I run 'radosgw-admin bucket lis
Iperf gives about 7Gb/s between a radosgw host and one of my OSD hosts (8
disks, 8 OSD daemons, one of 3 OSD hosts). When I benchmark radosgw with
cosbench I see high TCP retransmission rates (from sar -n ETCP 1). I don't
see this with iperf. Why would Ceph, but not iperf, cause high TCP
retran
I should have been more clear. The TCP retransmissions are on the OSD
host.
On Sun, Apr 15, 2018 at 1:48 PM, Paweł Sadowski wrote:
> On 04/15/2018 08:18 PM, Robert Stanford wrote:
>
>>
>> Iperf gives about 7Gb/s between a radosgw host and one of my OSD hosts
>> (8 d
bucket info for bucket="bucket5",
On Mon, Apr 16, 2018 at 8:30 AM, Casey Bodley wrote:
>
>
> On 04/14/2018 12:54 PM, Robert Stanford wrote:
>
>
> I deleted my default.rgw.buckets.data and default.rgw.buckets.index pools
> in an attempt to clean them out. I brou
The rule of thumb is not to have tens of millions of objects in a radosgw
bucket, because reads will be slow. If using bucket index sharding (with
128 or 256 shards), does this eliminate this concern? Has anyone tried
tens of millions (20-40M) of objects with sharded indexes?
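For context, this is how we would set the shard count, which as I understand
it only applies to buckets created after the setting is in place (instance
name is a placeholder):
[client.rgw.gateway1]
rgw override bucket index max shards = 128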
Thank you
If I use another cluster name (other than the default "ceph"), I've
learned that I have to create symlinks in /var/lib/ceph/osd/ with
[cluster-name]-[osd-num] that symlink to ceph-[osd-num]. The ceph-disk
command doesn't seem to take a --cluster argument like other commands.
Is this a known iss
testing for
> all the little things like this. :/
> -Greg
>
> On Fri, Apr 20, 2018 at 10:18 AM Robert Stanford
> wrote:
>
>>
>> If I use another cluster name (other than the default "ceph"), I've
>> learned that I have to create symlinks in /var/
nked to our actual config file for multiple ceph
> tools. It was never a widely adopted feature, and the nature of open source
> had a lot of people contributing tools that had never used or thought about
> clusters with different names.
>
> On Fri, Apr 20, 2018 at 4:56 PM Robert Stanford
In examples I see that each host has a section in ceph.conf, on every host
(host-a has a section in its conf on host-a, but there's also a host-a
section in the ceph.conf on host-b, etc.). Is this really necessary? I've
been using just generic osd and monitor sections, and that has worked out
fine.
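Roughly what I have, for reference (fsid, hostnames and addresses are
placeholders):
[global]
fsid = <cluster fsid>
mon initial members = mon-a, mon-b, mon-c
mon host = 10.0.0.1, 10.0.0.2, 10.0.0.3
[osd]
# settings shared by every OSD; no per-host [osd.N] sections
[mon]
# settings shared by every monitor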
Listing will always take forever when using a high shard number, AFAIK.
That's the tradeoff for sharding. Are those 2B objects in one bucket?
How's your read and write performance compared to a bucket with a lower
number (thousands) of objects, with that shard number?
On Tue, May 1, 2018 at 7:59
g files and you want to clean them up. If I had someone
> with a bucket with 2B objects, I would force them to use an index-less
> bucket.
>
> That's me, though. I'm sure there are ways to manage a bucket in other
> ways, but it sounds awful.
>
> On Tue, May 1, 2018
I have Ceph set up and running RGW. I want to use multisite (
http://docs.ceph.com/docs/jewel/radosgw/multisite/), but I don't want to
delete my pools or lose any of my data. Is this possible, or do pools have
to be recreated when changing a cluster zone / zonegroup to multisite?
After I started using multipart uploads to RGW, Ceph automatically created
a non-ec pool. It looks like it stores object pieces there until all the
pieces of a multipart upload arrive, then moves the completed piece to the
normal rgw data pool. Is this correct?
An email from this list stated that the wal would be created in the same
place as the db, if the db were specified when running ceph-volume lvm
create, and the db were specified on that command line. I followed those
instructions and like the other person writing to this list today, I was
surpris
show you the disk labels.
> ceph-bluestore-tool show-label --dev /dev/sda1
> On Sun, Oct 21, 2018 at 1:29 AM Robert Stanford
> wrote:
> >
> >
> > An email from this list stated that the wal would be created in the
> same place as the db, if the db were specified when r
On Sun, Oct 21, 2018 at 11:13 AM Serkan Çoban wrote:
> wal and db device will be same if you use just db path during osd
> creation. i do not know how to verify this with ceph commands.
> On Sun, Oct 21, 2018 at 4:17 PM Robert Stanford
> wrote:
> >
> >
> > Thanks Se
>
> > To: "ceph-users"
> > Sent: Saturday, 20 October, 2018 20:05:44
> > Subject: Re: [ceph-users] Drive for Wal and Db
> >
> > On 20/10/18 18:57, Robert Stanford wrote:
> >
> >
> >
> >
> > Our OSDs are BlueStore and are on r
var/lib/ceph/osd/ceph-{osd-num}/ and look at
> where the symlinks for block and block.wal point to.
>
> On Mon, Oct 22, 2018 at 12:29 PM Robert Stanford
> wrote:
>
>>
>> That's what they say, however I did exactly this and my cluster
>> utilization is higher tha
on a separate NVMe.
>>
>> On Mon, Oct 22, 2018 at 3:21 PM Robert Stanford
>> wrote:
>>
>>>
>>> David - is it ensured that wal and db both live where the symlink
>>> block.db points? I assumed that was a symlink for the db, but necessarily
>>>
Let me add, I have no block.wal file (which the docs suggest should be
there).
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/
On Mon, Oct 22, 2018 at 3:13 PM Robert Stanford
wrote:
>
> We're out of sync, I think. You have your DB on your data d
9",
> "bluefs_wal_partition_path": "/dev/dm-41",
> "bluestore_bdev_partition_path": "/dev/dm-29",
>
> [2] $ ceph osd metadata 5 | grep path
> "bluefs_db_partition_path": "/dev/dm-5",
> "bluestore
Someone deleted our rgw data pool to clean up. They recreated it
afterward. This is fine in one respect, we don't need the data. But
listing with radosgw-admin still shows all the buckets. How can we clean
things up and get rgw to understand what actually exists, and what doesn't?
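What I'm considering, per the radosgw-admin man page (bucket name is a
placeholder, and I have not verified this is safe now that the data pool has
been recreated):
$ radosgw-admin bucket rm --bucket=stalebucket --purge-objects
$ radosgw-admin metadata rm bucket:stalebucket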
on like this
>
> Hab
> - Mehmet
>
> On 21 October 2018 19:39:58 CEST, Robert Stanford <
> rstanford8...@gmail.com> wrote:
>>
>>
>> I did exactly this when creating my osds, and found that my total
>> utilization is about the same as the sum of the util
In the old days when I first installed Ceph with RGW the performance would
be very slow after storing 500+ million objects in my buckets. With
Luminous and index sharding, is this still a problem, or is this an old
problem that has been solved?
Regards
R
I have Mimic Ceph clusters that are hundreds of miles apart. I want to use
them in a multisite configuration. Will the latency between them cause any
problems?
Regards
R
Is it possible to add and remove monitors in Mimic, using the new
centralized configuration method?
Regards
R