ed.
So, I think I understand why it may not have worked before, but the
goalposts seem to have changed to a new problem.
Would appreciate any ideas...
Graham
On 06/08/2017 11:54 AM, Graham Allan wrote:
Sorry I didn't get to reply until now. The thing is I believe I *do*
have a lifecycle config
[
    {
        "bucket": ":testgta:default.6790451.1",
        "status": "UNINITIAL"
    }
]
then:
# radosgw-admin lc process
and all the (very old) objects disappeared from the test bucket.
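For reference, the lifecycle rule itself had been applied per-bucket through the S3 API, roughly like this (rule contents are illustrative and the s3cmd syntax is from memory):

% cat > lifecycle.xml <<'EOF'
<LifecycleConfiguration>
  <Rule>
    <ID>expire-old-objects</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <Expiration><Days>1</Days></Expiration>
  </Rule>
</LifecycleConfiguration>
EOF
% s3cmd setlifecycle lifecycle.xml s3://testgta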
Thanks!
Graham
On 06/28/2017 09:47 AM, Daniel Gryniewicz wrote:
This is almost certainly because it's
000
Jul 11 17:15:18 hostname sh[40830]: command: Running command:
/usr/sbin/ceph-disk --verbose activate-lockbox /dev/sdy3
Jul 11 17:15:20 hostname sh[40830]: main_trigger:
Jul 11 17:15:20 hostname sh[40830]: main_trigger: get_dm_uuid: get_dm_uuid
/dev/sdy3 uuid path is /sys/dev/block/65:131/dm/u
"*"} which radosgw seems to accept without
discarding, however user gta2 still has no access.
So I tried setting the principal to {"AWS": "arn:aws:iam:::user/gta2"}
This just resulted in a crash dump from radosgw... in summary
rgw_iam_policy.cc: In function 'boost
uses the same crash
- { "AWS": "arn:aws:iam::lemming:gta4"} causes principal discarded
of course... that is the user I am trying to grant access to. Could the problem be the blank tenant for the bucket owner?
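For reference, the variant I'd expect to work for a tenanted user would be something like this (names as in my tests above; I'm assuming the "tenant:user/name" form is what rgw wants):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam::lemming:user/gta4"]},
    "Action": ["s3:ListBucket", "s3:GetObject"],
    "Resource": ["arn:aws:s3:::gta", "arn:aws:s3:::gta/*"]
  }]
}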
Thanks,
Graham
On 07/12/2017 02:53 PM, Adam C. Emerson
"Resource": ["arn:aws:s3:::gta/*"]
}
]
}
but...
% s3cmd setpolicy s3policy s3://gta
ERROR: S3 error: 400 (InvalidArgument)
I have "debug rgw = 20" but nothing revealing in the logs.
Do you see anything obviously wrong in my policy file?
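(One sanity check that may be worth noting: the file should at least parse as plain JSON, e.g. with
% python -m json.tool s3policy
so if that passes, I assume the 400 is about something rgw itself dislikes in the content rather than a syntax slip.)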
Thanks,
Graham
On
fetch an object or read bucket contents.
Admittedly I have no experience with AWS bucket policies so I could be
doing something dumb...
Thanks,
Graham
On 07/17/2017 06:33 PM, Graham Allan wrote:
Thanks for the update. I saw there was a set of new 12.1.1 packages
today so I updated to th
On 07/21/2017 02:23 AM, Pritha Srivastava wrote:
- Original Message -
From: "Pritha Srivastava"
- Original Message -
From: "Graham Allan"
I'm a bit surprised that allowing "s3:GetObject" doesn't seem to permit
reading the same obje
decommissioning one machine at a time to
reinstall with CentOS 7 and Bluestore. I too don't see any reason the
mixed Jewel/Luminous cluster wouldn't work, but I still felt less comfortable extending the upgrade duration.
Graham
On 12/06/2017 03:20 AM, Wido den Hollander wrote:
On 5 December 2017 at 18:39, Richard Hesketh wrote:
On 05/12/17 17:10, Graham Allan wrote:
On 12/05/2017 07:20 AM, Wido den Hollander wrote:
I haven't tried this before and I expect it to work, but I wanted to
check before proceeding.
# ceph osd crush remove osd.2
device 'osd.2' does not appear in the crush map
so I wonder where it's getting this warning from, and if it's erroneous,
how can I clear it?
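(Other places that might show a leftover entry, for reference:)
# ceph osd tree
# ceph osd find 2
# ceph osd crush dump | grep -A3 '"name": "osd.2"'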
Graham
efore
restarting, if it's useful to open an issue.
osgw" from /etc/logrotate.d/ceph) from being SIGHUPed,
and to rotate the logs manually from time to time and completely restarting the radosgw
processes one after the other on my radosgw cluster.
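Concretely the change was along these lines (quoting the stock logrotate file from memory, so the exact postrotate line may differ):

    postrotate
        killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd radosgw || true
    endscript

i.e. I simply dropped "radosgw" from that killall list so the gateways no longer receive SIGHUP.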
Regards,
Martin
On 08.12.17, 18:58, "ceph-users on behalf of Graham Allan" wrote:
pathological buckets (multi-million objects in a single shard).
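(For concreteness, the sort of manual reshard I assume applies here - bucket name and shard count purely illustrative:)
# radosgw-admin bucket limit check
# radosgw-admin reshard add --bucket=BIGBUCKET --num-shards=128
# radosgw-admin reshard list
# radosgw-admin reshard process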
Thanks for any pointers,
Graham
On 12/14/2017 04:00 AM, Martin Emrich wrote:
Hi!
On 13.12.17 at 20:50, Graham Allan wrote:
After our Jewel to Luminous 12.2.2 upgrade, I ran into some of the
same issues reported earlier on the list under "rgw resharding
operation seemingly won't end".
Yes, that were/ar
Does that make sense?
Thanks,
Graham
of luck? Or is object
lifecycle functionality available as soon as radosgw is upgraded?
Thank you
com/2016-March/008317.html
However unlike that thread, I'm not finding any other files with
duplicate names in the hierarchy.
I'm not sure there's much else I can do besides record the names of any
unfound objects before resorting to "mark_unfound_lost delete" - any
suggestions for further research?
Thanks,
Graham
gered
the problem being found.
Graham
On 02/12/2018 06:26 PM, Graham Allan wrote:
Hi,
For the past few weeks I've been seeing a large number of pgs on our
main erasure coded pool being flagged inconsistent, followed by them
becoming active+recovery_wait+inconsistent with unfound objects
Hi Greg,
On 02/14/2018 11:49 AM, Gregory Farnum wrote:
On Tue, Feb 13, 2018 at 8:41 AM Graham Allan <g...@umn.edu> wrote:
I'm replying to myself here, but it's probably worth mentioning that
after this started, I did bring back the failed host, though
Actually now I notice that a pg reported as
active+recovery_wait+inconsistent by "ceph health detail" is shown as
active+recovering+inconsistent by "ceph pg list". That makes more sense
to me - "recovery_wait" implied to me that it was waiting for recovery
to start, whi
On 02/15/2018 05:33 PM, Gregory Farnum wrote:
On Thu, Feb 15, 2018 at 3:10 PM Graham Allan <g...@umn.edu> wrote:
A lot more in xattrs which I won't paste, though the keys are:
> root@cephmon1:~# ssh ceph03 find
/var/lib/ceph/osd/ceph-295/current/70
On 02/16/2018 12:31 PM, Graham Allan wrote:
If I set debug rgw=1 and debug ms=1 before running the "object stat"
command, it seems to stall in a loop of trying to communicate with osds for
pool 96, which is .rgw.control
10.32.16.93:0/2689814946 --> 10.31.0.68:6818/8969 --
osd_o
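(In case the method matters: the debug settings can be passed as per-invocation overrides, e.g. - bucket and object names here are placeholders:)
# radosgw-admin object stat --bucket=<bucket> --object=<object> --debug-rgw=1 --debug-ms=1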
"domain_root": ".rgw",
"control_pool": ".rgw.control",
"gc_pool": ".rgw.gc",
"lc_pool": ".log:lc",
"log_pool": ".log",
"intent_log_pool": ".intent-log",
"usage_log_pool": ".usage",
"reshard_pool": ".log:reshard",
"user_keys_pool": ".users",
"user_email_pool": ".users.email",
"user_swift_pool": ".users.swift",
"user_uid_pool": ".users.uid",
"system_key": {
"access_key": "",
"secret_key": ""
},
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": ".rgw.buckets.index",
"data_pool": ".rgw.buckets",
"data_extra_pool": ".rgw.buckets.extra",
"index_type": 0,
"compression": ""
}
},
{
"key": "ec42-placement",
"val": {
"index_pool": ".rgw.buckets.index",
"data_pool": ".rgw.buckets.ec42",
"data_extra_pool": ".rgw.buckets.extra",
"index_type": 0,
"compression": ""
}
}
],
"metadata_heap": ".rgw.meta",
"tier_config": [],
"realm_id": "dbfd45d9-e250-41b0-be3e-ab9430215d5b"
}
Graham
Does anyone know if this issue was corrected in Hammer 0.94.8?
http://tracker.ceph.com/issues/15002
It's marked as resolved but I don't see it listed in the release notes.
G.
problem?
Thanks
Andrei
he rgw is functional and
user clients can connect.
Hope that helps
andrei
- Original Message -
From: "Graham Allan"
To: "ceph-users"
Sent: Thursday, 6 October, 2016 20:04:38
Subject: Re: [ceph-users] unable to start radosgw after upgrade from 10.2.2 to
10.2.3
# radosgw-admin zone get --rgw-zone=default
{
"id": "default",
"name": "default",
"domain_root": ".rgw",
"control_pool": ".rgw.control",
"gc_pool": ".rgw.gc",
"log_pool": ".log",
"intent_log_pool": ".intent-log",
"usage_log_pool": ".usage",
"user_keys_pool": ".users",
"user_email_pool": ".users.email",
"user_swift_pool": ".users.swift",
"user_uid_pool": ".users.uid",
"system_key": {
"access_key": "",
"secret_key": ""
},
"placement_pools": [],
"metadata_heap": ".rgw.meta",
"realm_id": ""
}
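(I assume that if I can reconstruct the correct pool names, the way to re-inject the config would be something like the following - though I'd like confirmation:)
# radosgw-admin zone get --rgw-zone=default > zone.json
# (edit zone.json to restore the pool names)
# radosgw-admin zone set --rgw-zone=default --infile zone.json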
may have encountered http://tracker.ceph.com/issues/17371
If a hammer radosgw-admin runs on the jewel radosgw it corrupts the
configuration.
We are working on a fix for that.
Orit
On Fri, Oct 7, 2016 at 9:37 PM, Graham Allan wrote:
Dear Orit,
On 10/07/2016 04:21 AM, Orit Wasserman wrote:
Hi,
On Wed, Oct 5, 2016
of the rados gateway servers and
inadvertently deployed older Hammer versions of the radosgw instances.
This configuration was running for a couple of days. We removed the Hammer
versions and re-deployed the Jewel versions of the radosgw. S3cmd
querying of some of the buckets is now re
On 11/21/2016 04:44 PM, Yehuda Sadeh-Weinraub wrote:
On Mon, Nov 21, 2016 at 2:42 PM, Graham Allan wrote:
Following up to this (same problem, looking at it with Jeff)...
There was definite confusion with the zone/zonegroup/realm/period changes
during the hammer->jewel upgrade. It's
sam.2~10nUehK5BnyXdhhiOqTL2JdpLfDCd0k.11_76
lemming
edit "num_shards": 32
# radosgw-admin metadata put bucket.instance:tcga:default.712449.19 < tcga.json
and this bucket is now visible again! Thanks so much!
I wonder how this happened. It looks like this affects ~25/680 buckets.
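(For the record, the full round trip per affected bucket is roughly: dump the instance metadata, fix "num_shards", and put it back:)
# radosgw-admin metadata get bucket.instance:tcga:default.712449.19 > tcga.json
# (edit tcga.json)
# radosgw-admin metadata put bucket.instance:tcga:default.712449.19 < tcga.json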
Grah
mber of 15 minutes,
isn't it?
Interesting - I'm seeing the same thing; I restarted all OSDs on one node and
saw pretty much the same ~15 minute period of normal cpu use before it
soared up again (to a load average of 3000 on this particular node).
On Thu, Dec 8, 2016 at 5:19 AM, Francois Lafont <
francois.lafont.1...@gmail.com> wrote:
> On 12/08/2016 11:24 AM, Ruben Kerkhof wrote:
>
> > I've been running this on one of my servers now for half an hour, and
> > it fixes the issue.
>
> It's the same for me. ;)
>
> ~$ ceph -v
> ceph version 10.
process if
you use the AES-XTS cipher, I suspect there might be a severe
performance impact without.
Also as Ceph+network itself brings a fair amount of overhead, I
wouldn’t suspect that dmcrypt would introduce any noticeable
overhead of its own.
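(If a baseline for the crypto side is useful, independent of ceph entirely - with AES-NI the aes-xts lines are typically in the GB/s range:)
$ cryptsetup benchmark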
--
Graham
se it was generated a long time ago under hammer?).
Thanks for any feedback,
Graham
ing crush rule, to populate the
"take" command, so this shouldn't be significant.
At least I hope not. The relationship between the crush rule and the ec
profile, when changing the crush rule on an existing ec pool, is still
not very clear to me...
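(For anyone else digging into the same thing, the relevant views are just these - pool/rule/profile names are placeholders:)
# ceph osd pool get <poolname> crush_rule
# ceph osd pool get <poolname> erasure_code_profile
# ceph osd crush rule dump <rulename>
# ceph osd erasure-code-profile get <profilename>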
Graham
unlikely (radosgw
disabled).
So if 98 were marked lost would it roll back to the prior interval? I am
not certain how to interpret this information!
Running luminous 12.2.7 if it makes a difference.
Thanks as always for pointers,
Graham
On 07/18/2018 02:35 PM, Graham Allan wrote:
Like
On 09/14/2018 02:38 PM, Gregory Farnum wrote:
On Thu, Sep 13, 2018 at 3:05 PM, Graham Allan wrote:
However I do see transfer errors fetching some files out of radosgw - the
transfer just hangs then aborts. I'd guess this probably due to one pg stuck
down, due to a lost (failed HDD) o
On 09/17/2018 04:33 PM, Gregory Farnum wrote:
On Mon, Sep 17, 2018 at 8:21 AM Graham Allan <g...@umn.edu> wrote:
Looking back through history it seems that I *did* override the
min_size
for this pool, however I didn't reduce it - it used to have min_size 2!
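(For reference, the knob in question - assuming the pool here is the ec42 data pool from the zone config - checking and changing it is just:)
# ceph osd pool get .rgw.buckets.ec42 min_size
# ceph osd pool set .rgw.buckets.ec42 min_size 4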
I did naively try some "radosgw-admin bucket check [--fix]" commands
with no change.
Graham
--op repair"...? There seems
little to lose by trying but there isn't a lot of documentation on the
operations available in ceph-objectstore-tool.
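(Roughly what I have in mind, with the osd id and data path just as examples from this thread, and the osd stopped first - I'm not even sure the repair op is pg-granular:)
# systemctl stop ceph-osd@98
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-98 --op list-pgs | grep 70.b1c
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-98 --op repair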
I also know of the option "osd_find_best_info_ignore_history_les" but
little of what it actually does, other than being
om/gtallan/e72b4461fb315983ae9a62cbbcd851d4/raw/0d30ceb315dd5567cb05fd0dc3e2e2c4975d8c01/pg70.b1c-query.txt
(Out of curiosity, is there any way to relate the first and last numbers
in an interval to an actual timestamp?)
Thanks,
Graham
On 10/03/2018 12:18 PM, Graham Allan wrote:
Following
Oops, by "periods" I do of course mean "intervals"...!
On 10/8/2018 4:57 PM, Graham Allan wrote:
I'm still trying to find a way to reactivate this one pg which is
incomplete. There are a lot of periods in its history based on a
combination of a peering storm a
On 10/9/2018 12:19 PM, Gregory Farnum wrote:
On Wed, Oct 3, 2018 at 10:18 AM Graham Allan <g...@umn.edu> wrote:
However I have one pg which is stuck in state remapped+incomplete
because it has only 4 out of 6 osds running, and I have been unable to
bring the mi
On 10/09/2018 01:14 PM, Graham Allan wrote:
On 10/9/2018 12:19 PM, Gregory Farnum wrote:
I think unfortunately the easiest thing for you to fix this will be to
set the min_size back to 4 until the PG is recovered (or at least has
5 shards done). This will be fixed in a later version of
a s3, but one I have seen implicated in similar
crash logs for other OSDs and the etag again does not match; the other I
have not seen in crash logs and does generate a matching etag.
Opened a tracker issue for this: http://tracker.ceph.com/issues/36411
Graham
On 10/09/2018 06:55 PM, Graham Allan wrote:
.
Graham
On 10/15/2018 01:44 PM, Gregory Farnum wrote:
On Thu, Oct 11, 2018 at 3:22 PM Graham Allan
As the osd crash implies, setting "nobackfill" appears to let all the
osds keep running and the pg stays active and can apparently serve data.
If I track down the ob
a whenever they are not in this state. It is a 4+2 EC pool, so I
would think it possible to reconstruct any missing EC chunks.
It's an extensive problem; while I have been focusing on examining a
couple of specific pgs, the pool in general is showing 2410 pgs
inconsistent (out of 4096)
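(For reference, enumerating the damage is at least straightforward - assuming the pool is the ec42 data pool, and using one affected pg as an example:)
# rados list-inconsistent-pg .rgw.buckets.ec42
# rados list-inconsistent-obj 70.b1c --format=json-pretty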
kets were created around
10/2016 - I suspect the placement policies were incorrect for a short
time due to confusion over the hammer->jewel upgrade (the
realm/period/zonegroup/zone conversion didn't really go smoothly!)
On 02/16/2018 11:39 PM, Robin H. Johnson wrote:
On Fri, Feb 16
wIXBgAAAGJseW5jaAkAAABCZW4gTHluY2gDA2IBAQYAAABibHluY2gPAQYAAABibHluY2gEAzcCAgQABgAAAGJseW5jagIEDwkAAABCZW4gTHluY2gAAA=="
},
{
"key": "user.rgw.idtag",
"val&
info is either failing or incorrect!
Did you run a local build w/ the linked patch? I think that would have
more effect than
I did just build a local copy of 12.2.2 with the patch - and it does
seem to fix it.
Thanks!
Graham
1f727d8217e3ed74a1a3355f364f3
Author: David Zafman
Date: Mon Oct 9 08:19:21 2017 -0700
osd, mon: Add new pg states recovery_unfound and backfill_unfound
Signed-off-by: David Zafman
On 2/16/18 1:40 PM, Gregory Farnum wrote:
On Fri, Feb 16, 2018 at 12:17 PM Graham Allan wrote:
On 02
s all users and buckets in the region/zonegroup?
Graham
On 03/17/2017 11:47 AM, Casey Bodley wrote:
On 03/16/2017 03:47 PM, Graham Allan wrote:
This might be a dumb question, but I'm not at all sure what the
"global quotas" in the radosgw region map actually do.
It is like a default quota which is applied to all users or buckets,
wi
scrub" also don't
clear the inconsistency.
Thanks for any suggestions,
G.
adosgw-admin region-map set < regionmap.json
but this has no effect on jewel. There doesn't seem to be any analogous
function in the "period"-related commands which I think would be the
right place to look for jewel.
Am I missing something, or should I open a bug?
Graham
O
.conf.
Thanks,
Casey
On 03/27/2017 03:13 PM, Graham Allan wrote:
I'm following up to myself here, but I'd love to hear if anyone knows
how the global quotas can be set in jewel's radosgw. I haven't found
anything which has an effect - the documentation says to use:
radosgw-admi
cal zone
2017-05-19 15:28:46.453431 7feddc240c80 2 all 8 watchers are set, enabling
cache
2017-05-19 15:28:46.614991 7feddc240c80 2 removed watcher, disabling cache
"radosgw-admin lc list" seems to return "empty" output:
# radosgw-admin lc list
[]
http://tracker.ceph.com/issues/20177
As it's my first one, hope it's ok as it is...
Thanks & regards
Anton
, you probably need to add a
lifecycle configuration using the S3 API. It's not automatic and has to
be added per-bucket.
Here's some sample code for doing so: http://tracker.ceph.com/issues/19587
-Ben
On Tue, Jun 6, 2017 at 9:07 AM, Graham Allan <g...@umn.edu> wrote:
do some path
parsing to produce nice output? If so, it may be playing a role in the
delay as well.
Eric
On 9/26/18 5:27 PM, Graham Allan wrote:
I have one user bucket which, inexplicably (to me), takes an
eternity to list, though only at the top level. There are two
subfolders, each
icies
are expected to work.
Currently using Luminous 12.2.8, if it matters.
Graham
t the 12.2.9 nodes kept to wait for a future release - or wait on all?
Thanks, Graham
sting data volume as a
mirror; wait for it to sync, then break the mirror and remove the
original disk.
I think I need a second set of eyes to understand some unexpected data
movement when adding new OSDs to a cluster (Luminous 12.2.11).
Our cluster ran low on space sooner than expected; so as a stopgap I
recommissioned a couple of older storage nodes while we get new hardware
purchases under wa
"fix themselves" without
apparent cause!
Graham
On 4/29/19 12:12 PM, Graham Allan wrote:
I think I need a second set of eyes to understand some unexpected data
movement when adding new OSDs to a cluster (Luminous 12.2.11).
Our cluster ran low on space sooner than expected; so as a st