Re: [ceph-users] dropping filestore+btrfs testing for luminous

2017-06-30 Thread Sean Purdy
On Fri, 30 Jun 2017, Lenz Grimmer said:
> > 1/ Stop testing filestore+btrfs for luminous onward.  We've recommended 
> > against btrfs for a long time and are moving toward bluestore anyway.
> 
> Searching the documentation for "btrfs" does not really give a user any
> clue that the use of Btrfs is discouraged.
> 
> Where exactly has this been recommended?

As a new user, I certainly picked up on btrfs being discouraged, or not as 
stable as XFS.

e.g.
http://docs.ceph.com/docs/master/rados/configuration/filesystem-recommendations/?highlight=btrfs

"We currently recommend XFS for production deployments.

We used to recommend btrfs for testing, development, and any non-critical 
deployments ..."


http://docs.ceph.com/docs/master/start/hardware-recommendations/?highlight=btrfs

"btrfs is not quite stable enough for production"
 

> If you want to get rid of filestore on Btrfs, start a proper deprecation
> process and inform users that support for it it's going to be removed in
> the near future. The documentation must be updated accordingly and it
> must be clearly emphasized in the release notes.

But this sounds sane.


Sean Purdy
CV-Library Ltd


[ceph-users] radosgw hung when OS disks went readonly, different node radosgw restart fixed it

2017-07-31 Thread Sean Purdy

Hi,


Just had an incident in a 3-node test cluster running 12.1.1 on debian stretch.

Each node had its own mon, mgr, radosgw, and OSDs.  Just object store.

I had s3cmd looping and uploading files via S3.
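
For reference, the upload loop was nothing clever - roughly the sketch below, with 
made-up file and bucket names:

#!/bin/bash
# Keep uploading small test files via s3cmd until interrupted.
i=0
while true; do
    f=/tmp/upload-test.$i
    dd if=/dev/urandom of=$f bs=16k count=1 2>/dev/null     # small random file
    s3cmd -c s3cfg-ceph put $f s3://test/upload-test.$i     # upload via radosgw
    rm -f $f
    i=$((i+1))
done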

On one of the machines, the RAID controller barfed and dropped the OS disks.  
Or the disks failed.  TBC.  Anyway, / and /var went readonly.

The monitor on that machine found it couldn't write its logs and died.  But the 
OSDs stayed up - those disks didn't go readonly.


health: HEALTH_WARN
1/3 mons down, quorum store01,store03
osd: 18 osds: 18 up, 18 in
rgw: 3 daemons active


The S3 process started timing out on connections to radosgw, even when talking 
to one of the other two radosgw instances.  (I'm round-robining the DNS records 
at the moment.)

I stopped the OSDs on that box.  No change.  I stopped radosgw on that box.  
Still no change.  The S3 upload process was still hanging/timing out.  A manual 
telnet to port 80 on the good nodes still hung.

"radosgw-admin bucket list" showed buckets &c

Then I restarted radosgw on one of the other two nodes.  After about a minute, 
the looping S3 upload process started working again.


So my questions:  Why did I have to manually restart radosgw on one of the 
other nodes?  Why didn't it either keep working, or e.g. start working when 
radosgw was stopped on the bad node?

Also where are the radosgw server/access logs?
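
For what it's worth, I think the civetweb frontend can be told to write its own 
access log, something like the below in ceph.conf - option names from memory and 
unverified, with the main rgw log going to the usual /var/log/ceph/ client log:

[client.rgw.store01]
    rgw frontends = civetweb port=80 access_log_file=/var/log/ceph/civetweb.access.log error_log_file=/var/log/ceph/civetweb.error.log
    # turn up rgw logging in /var/log/ceph/ceph-client.rgw.store01.log
    debug rgw = 5/5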


I know it's probably an unusual edge case or something, but we're aiming for HA 
and redundancy.


Thanks!

Sean Purdy


[ceph-users] New OSD missing from part of osd crush tree

2017-08-10 Thread Sean Purdy
Luminous 12.1.1 rc


Our OSD osd.8 failed.  So we removed that.


We added a new disk and did:

$ ceph-deploy osd create  --dmcrypt --bluestore store02:/dev/sdd

That worked, created osd.18, OSD has data.

However, mgr output at http://localhost:7000/servers showed
osd.18 under a blank hostname and not e.g. on the node we attached it to.
But it is working.  "ceph osd tree" looks OK


The problem I see is:
When I do "ceph osd crush tree" I see the items list under the name:default~hdd 
tree:

device_class:hdd
name:store02~hdd
type:host

but my new drive is missing under this name - there are 5 OSDs, not 6.


*However*, if I look further down under the name:default tree

device_class:""
name:store02
type:host

I see all devices I am expecting, including osd.18


Is this something to worry about?  Or is there something that needs fixing?  Health 
is warning for scrubbing reasons.
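
If it does need fixing, my untested guess is that re-applying the device class 
would rebuild the shadow tree entry, something like:

# guesswork - not yet run against this cluster
sudo ceph osd crush rm-device-class osd.18
sudo ceph osd crush set-device-class hdd osd.18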


Output of related commands below.


Thanks for any help,

Sean Purdy


$ sudo ceph osd tree
ID CLASS WEIGHT   TYPE NAMEUP/DOWN REWEIGHT PRI-AFF
-1   32.73651 root default
-3   10.91217 host store01
 0   hdd  1.81870 osd.0 up  1.0 1.0
 5   hdd  1.81870 osd.5 up  1.0 1.0
 6   hdd  1.81870 osd.6 up  1.0 1.0
 9   hdd  1.81870 osd.9 up  1.0 1.0
12   hdd  1.81870 osd.12up  1.0 1.0
15   hdd  1.81870 osd.15up  1.0 1.0
-5   10.91217 host store02
 1   hdd  1.81870 osd.1 up  1.0 1.0
 7   hdd  1.81870 osd.7 up  1.0 1.0
10   hdd  1.81870 osd.10up  1.0 1.0
13   hdd  1.81870 osd.13up  1.0 1.0
16   hdd  1.81870 osd.16up  1.0 1.0
18   hdd  1.81870 osd.18up  1.0 1.0
-7   10.91217 host store03
 2   hdd  1.81870 osd.2 up  1.0 1.0
 3   hdd  1.81870 osd.3 up  1.0 1.0
 4   hdd  1.81870 osd.4 up  1.0 1.0
11   hdd  1.81870 osd.11up  1.0 1.0
14   hdd  1.81870 osd.14up  1.0 1.0
17   hdd  1.81870 osd.17up  1.0 1.0


$ sudo ceph osd crush tree
[
{
"id": -8,
"device_class": "hdd",
"name": "default~hdd",
"type": "root",
"type_id": 10,
"items": [
{
"id": -2,
"device_class": "hdd",
"name": "store01~hdd",
"type": "host",
"type_id": 1,
"items": [
{
"id": 0,
"device_class": "hdd",
"name": "osd.0",
"type": "osd",
"type_id": 0,
"crush_weight": 1.818695,
"depth": 2
},
{
"id": 5,
"device_class": "hdd",
"name": "osd.5",
"type": "osd",
"type_id": 0,
"crush_weight": 1.818695,
"depth": 2
},
{
"id": 6,
"device_class": "hdd",
"name": "osd.6",
"type": "osd",
"type_id": 0,
"crush_weight": 1.818695,
"depth": 2
},
{
"id": 9,
"device_class": "hdd",
"name": "osd.9",
"type": "osd",
"type_id": 0,
"crush_weight": 1.818695,
"depth": 2
},
{
"id": 12,
"device_class": "hdd",
"name": "osd.12",
"type": "osd",
"type_id": 0,
"crush_weight": 1.818695,
"depth": 2
},

[ceph-users] cluster unavailable for 20 mins when downed server was reintroduced

2017-08-15 Thread Sean Purdy
Luminous 12.1.1 rc1

Hi,


I have a three node cluster with 6 OSD and 1 mon per node.

I had to turn off one node for rack reasons.  While the node was down, the 
cluster was still running and accepting files via radosgw.  However, when I 
turned the machine back on, radosgw uploads stopped working and things like 
"ceph status" started timing out.  It took 20 minutes for "ceph status" to be 
OK.

In the recent past I've rebooted one or other node and the cluster kept 
working, and when the machine came back, the OSDs and monitor rejoined the 
cluster and things went on as usual.

The machine was off for 21 hours or so.

Any idea what might be happening, and how to mitigate the effects of this next 
time a machine has to be down for any length of time?


"ceph status" said:

2017-08-15 11:28:29.835943 7fdf2d74b700  0 monclient(hunting): authenticate 
timed out after 300
2017-08-15 11:28:29.835993 7fdf2d74b700  0 librados: client.admin authentication 
error (110) Connection timed out


monitor log said things like this before everything came together:

2017-08-15 11:23:07.180123 7f11c0fcc700  0 -- 172.16.0.43:0/2471 >> 
172.16.0.45:6812/1904 conn(0x556eeaf4d000 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply 
connect got BADAUTHORIZER

but "ceph --admin-daemon /var/run/ceph/ceph-mon.xxx.asok quorum_status" did 
work.  This monitor node was detected but not yet in quorum.


OSDs had 15 minutes of

ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-9: (2) No such 
file or directory

before becoming available.


Advice welcome.

Thanks,

Sean Purdy


Re: [ceph-users] cluster unavailable for 20 mins when downed server was reintroduced

2017-08-16 Thread Sean Purdy
On Tue, 15 Aug 2017, Gregory Farnum said:
> On Tue, Aug 15, 2017 at 4:23 AM Sean Purdy  wrote:
> > I have a three node cluster with 6 OSD and 1 mon per node.
> >
> > I had to turn off one node for rack reasons.  While the node was down, the
> > cluster was still running and accepting files via radosgw.  However, when I
> > turned the machine back on, radosgw uploads stopped working and things like
> > "ceph status" starting timed out.  It took 20 minutes for "ceph status" to
> > be OK.

> > 2017-08-15 11:28:29.835943 7fdf2d74b700  0 monclient(hunting): authenticate
> > timed out after 300
> > 2017-08-15 11:28:29.835993 7fdf2d74b700  0 librados: client.admin
> > authentication error (110) Connection timed out
> >
> 
> That just means the client couldn't connect to an in-quorum monitor. It
> should have tried them all in sequence though — did you check if you had
> *any* functioning quorum?

There was a functioning quorum - I checked with "ceph --admin-daemon 
/var/run/ceph/ceph-mon.xxx.asok quorum_status".  Well - I interpreted the 
output as functioning.  There was a nominated leader.
 

> > 2017-08-15 11:23:07.180123 7f11c0fcc700  0 -- 172.16.0.43:0/2471 >>
> > 172.16.0.45:6812/1904 conn(0x556eeaf4d000 :-1
> > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
> > l=0).handle_connect_reply connect got BADAUTHORIZER
> >
> 
> This one's odd. We did get one report of seeing something like that, but I
> tend to think it's a clock sync issue.

I saw some messages about clock sync, but ntpq -p looked OK on each server.  
Will investigate further.

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+172.16.0.16     129.250.35.250   3 u  847 1024  377    0.289    1.103   0.376
+172.16.0.18     80.82.244.120    3 u   93 1024  377    0.397   -0.653   1.040
*172.16.0.19     158.43.128.33    2 u  279 1024  377    0.244    0.262   0.158
 

> > ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-9: (2) No
> > such file or directory
> >
> And that would appear to be something happening underneath Ceph, wherein
> your data wasn't actually all the way mounted or something?

It's the machine mounting the disks at boot time - udev or ceph-osd.target 
keeps retrying until eventually the disk/OSD is mounted.  Or eventually it 
gives up.  Do the OSDs need a monitor quorum at startup?  It kept restarting 
OSDs for 20 mins.

Timing went like this:

11:22 node boot
11:22 ceph-mon starts, recovers logs, compaction, first BADAUTHORIZER message
11:22 starting disk activation for 18 partitions (3 per bluestore)
11:23 mgr on other node can't find secret_id
11:43 bluefs mount succeeded on OSDs, ceph-osds go live
11:45 last BADAUTHORIZER message in monitor log
11:45 this host calls and wins a monitor election, mon_down health check clears
11:45 mgr happy
 
 
> Anyway, it should have survived that transition without any noticeable
> impact (unless you are running so close to capacity that merely getting the
> downed node up-to-date overwhelmed your disks/cpu). But without some basic
> information about what the cluster as a whole was doing I couldn't
> speculate.

This is a brand new 3 node cluster.  Dell R720 running Debian 9 with 2x SSD for 
OS and ceph-mon, 6x 2Tb SATA for ceph-osd using bluestore, per node.  Running 
radosgw as object store layer.  Only activity is a single-threaded test job 
uploading millions of small files over S3.  There are about 5.5million test 
objects so far (additionally 3x replication).  This job was fine when the 
machine was down, stalled when machine booted.

Looking at activity graphs at the time, there didn't seem to be a network 
bottleneck or CPU issue or disk throughput bottleneck.  But I'll look a bit 
closer.

ceph-mon is on an ext4 filesystem though.   Perhaps I should move this to xfs?  
Bluestore is xfs+bluestore.

I presume it's a monitor issue somehow.


> -Greg

Thanks for your input.

Sean


Re: [ceph-users] cluster unavailable for 20 mins when downed server was reintroduced

2017-08-21 Thread Sean Purdy
Hi,

On Thu, 17 Aug 2017, Gregory Farnum said:
> On Wed, Aug 16, 2017 at 4:04 AM Sean Purdy  wrote:
> 
> > On Tue, 15 Aug 2017, Gregory Farnum said:
> > > On Tue, Aug 15, 2017 at 4:23 AM Sean Purdy 
> > wrote:
> > > > I have a three node cluster with 6 OSD and 1 mon per node.
> > > >
> > > > I had to turn off one node for rack reasons.  While the node was down, 
> > > > the
> > > > cluster was still running and accepting files via radosgw.  However, 
> > > > when I
> > > > turned the machine back on, radosgw uploads stopped working and things 
> > > > like
> > > > "ceph status" starting timed out.  It took 20 minutes for "ceph status" 
> > > > to
> > > > be OK.

> Did you try running "ceph -s" from more than one location? If you had a
> functioning quorum that should have worked. And any live clients should
> have been able to keep working.

I tried from more than one location, yes.
 

> > Timing went like this:
> >
> > 11:22 node boot
> > 11:22 ceph-mon starts, recovers logs, compaction, first BADAUTHORIZER
> > message
> > 11:22 starting disk activation for 18 partitions (3 per bluestore)
> > 11:23 mgr on other node can't find secret_id
> > 11:43 bluefs mount succeeded on OSDs, ceph-osds go live
> > 11:45 last BADAUTHORIZER message in monitor log
> > 11:45 this host calls and wins a monitor election, mon_down health check
> > clears
> > 11:45 mgr happy
> >
> 
> The timing there on the mounting (how does it take 20 minutes?!?!?) and
> everything working again certainly is suspicious. It's not the direct cause
> of the issue, but there may be something else going on which is causing
> both of them.
> 
> All in all; I'm confused.


I tried again today, having a node down for an hour.  This might be a different 
set of questions.


This time, after the store came up, OSDs caught up quickly.

But the monitor process on the rebooted node took 25 minutes to come back into 
quorum.  Is this normal?


2017-08-21 16:10:45.243323 7f3fb62b2700  0 
mon.store03@2(synchronizing).data_health(0) update_stats avail 94% total 211 
GB, used 914 MB, avail 200 GB
...
2017-08-21 16:38:45.251345 7f3fb62b2700  0 mon.store03@2(peon).data_health(298) 
update_stats avail 94% total 211 GB, used 1229 MB, avail 199 GB

What is the monitor process doing this time?  It didn't seem to be maxing out 
network, CPU or disk.


During this time, e.g. "ceph mon stat" on any node took 6 to 15s to return.  
Which I presume is a function of "mon client hunt interval".  But still seems 
long.
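
The option I have in mind, if I've got the name right, is this - the value shown is 
the default as I understand it (3 seconds), so a few failed hunts would add up:

# ceph.conf, [global] or client side
mon client hunt interval = 3.0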

However, radosgw file transactions seemed to work fine during the entire 
process.  So it's probably working as designed.


Mon 21 Aug 16:30:06 BST 2017

e5: 3 mons at 
{store01=172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:6789/0},
 election epoch 294, leader 0 store01, quorum 0,1 store01,store02

real    0m8.456s
user    0m0.304s
sys     0m0.024s


Thanks for feedback, I'm still new to this.

Sean Purdy


Re: [ceph-users] cluster unavailable for 20 mins when downed server was reintroduced

2017-08-23 Thread Sean Purdy
On Tue, 15 Aug 2017, Sean Purdy said:
> Luminous 12.1.1 rc1
> 
> Hi,
> 
> 
> I have a three node cluster with 6 OSD and 1 mon per node.
> 
> I had to turn off one node for rack reasons.  While the node was down, the 
> cluster was still running and accepting files via radosgw.  However, when I 
> turned the machine back on, radosgw uploads stopped working and things like 
> "ceph status" starting timed out.  It took 20 minutes for "ceph status" to be 
> OK.  

Well I've figured out why "ceph status" was hanging (and possibly radosgw).  It 
seems that the ceph utility looks at ceph.conf to find a monitor to connect to (or 
at least that's what strace implied), but our ceph.conf only had one monitor 
out of three actually listed in the file.  And that was the node I turned off.  
Updating mon_initial_members and mon_host with the other two monitors worked.

TBF, 
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/administration_guide/managing_cluster_size
 does mention you should add your second and third monitors here.  But I hadn't 
read that, and elsewhere I read that on boot the monitors will discover other 
monitors, so I thought you didn't need to list them all.  e.g. 
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address
 (which also says clients use ceph.conf to find monitors - I missed that part).

Anyway, I'll do a few more tests with a better ceph.conf
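
"Better" here just means listing all three monitors, i.e. something like:

[global]
    mon initial members = store01, store02, store03
    mon host = 172.16.0.43, 172.16.0.44, 172.16.0.45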


Sean Purdy


[ceph-users] OSD doesn't always start at boot

2017-08-23 Thread Sean Purdy
Hi,

Luminous 12.1.1

I've had a couple of servers where at cold boot time, one or two of the OSDs 
haven't mounted/been detected.  Or been partially detected.  These are luminous 
Bluestore OSDs.  Often a warm boot fixes it, but I'd rather not have to reboot 
the node again.

Sometimes /var/lib/ceph/osd/ceph-NN is empty - i.e. not mounted.  And sometimes 
/var/lib/ceph/osd/ceph-NN is mounted, but the /var/lib/ceph/osd/ceph-NN/block 
symlink is pointing to a /dev/mapper UUID path that doesn't exist.  Those 
partitions have to be mounted before "systemctl start ceph-osd@NN.service" will 
work.

What happens at disk detect and mount time?  Is there a timeout somewhere I can 
extend?

How can I tell udev to have another go at mounting the disks?
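
The best I've come up with so far is to poke udev and ceph-disk by hand - untested, 
so possibly not the sanctioned way:

# ask udev to replay the add events for block devices
sudo udevadm trigger --subsystem-match=block --action=add
# or just ask ceph-disk to activate anything it can find
sudo ceph-disk activate-all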

If it's in the docs and I've missed it, apologies.


Thanks in advance,

Sean Purdy


Re: [ceph-users] OSD doesn't always start at boot

2017-08-23 Thread Sean Purdy
On Wed, 23 Aug 2017, David Turner said:
> This isn't a solution to fix them not starting at boot time, but a fix to
> not having to reboot the node again.  `ceph-disk activate-all` should go
> through and start up the rest of your osds without another reboot.

Thanks, will try next time.

Sean


Re: [ceph-users] Very slow start of osds after reboot

2017-08-31 Thread Sean Purdy
Datapoint: I have the same issue on 12.1.1, three nodes, 6 disks per node.

On Thu, 31 Aug 2017, Piotr Dzionek said:
> For a last 3 weeks I have been running latest LTS Luminous Ceph release on
> CentOS7. It started with 4th RC and now I have Stable Release.
> Cluster runs fine, however I noticed that if I do a reboot of one the nodes,
> it takes a really long time for cluster to be in ok status.
> Osds are starting up, but not as soon as the server is up. They are up one
> by one during a period of 5 minutes. I checked the logs and all osds have
> following errors.


Re: [ceph-users] s3cmd not working with luminous radosgw

2017-09-19 Thread Sean Purdy
On Tue, 19 Sep 2017, Yoann Moulin said:
> Hello,
> 
> Does anyone have tested s3cmd or other tools to manage ACL on luminous 
> radosGW ?

Don't know about ACL, but s3cmd for other things works for me.  Version 1.6.1


My config file includes (but is not limited to):

access_key = yourkey
secret_key = yoursecret
host_bucket = %(bucket)s.host.yourdomain
host_base = host.yourdomain

$ s3cmd -c s3cfg-ceph ls s3://test/148671665
2017-08-02 21:39 18218   s3://test/1486716654.15214271.docx.gpg.97
2017-08-02 22:10 18218   s3://test/1486716654.15214271.docx.gpg.98
2017-08-02 22:48 18218   s3://test/1486716654.15214271.docx.gpg.99

I have not tried rclone or ACL futzing.


Sean Purdy
 
> I have opened an issue on s3cmd too
> 
> https://github.com/s3tools/s3cmd/issues/919
> 
> Thanks for your help
> 
> Yoann
> 
> > I have a fresh luminous cluster in test and I made a copy of a bucket (4TB 
> > 1.5M files) with rclone, I'm able to list/copy files with rclone but
> > s3cmd does not work at all, it is just able to give the bucket list but I 
> > can't list files neither update ACL.
> > 
> > does anyone already test this ?
> > 
> > root@iccluster012:~# rclone --version
> > rclone v1.37
> > 
> > root@iccluster012:~# s3cmd --version
> > s3cmd version 2.0.0
> > 
> > 
> > ### rclone ls files ###
> > 
> > root@iccluster012:~# rclone ls testadmin:image-net/LICENSE
> >  1589 LICENSE
> > root@iccluster012:~#
> > 
> > nginx (as revers proxy) log :
> > 
> >> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE 
> >> HTTP/1.1" 200 0 "-" "rclone/v1.37"
> >> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "GET 
> >> /image-net?delimiter=%2F&max-keys=1024&prefix= HTTP/1.1" 200 779 "-" 
> >> "rclone/v1.37"
> > 
> > rgw logs :
> > 
> >> 2017-09-15 10:30:02.620266 7ff1f58f7700  1 == starting new request 
> >> req=0x7ff1f58f11f0 =
> >> 2017-09-15 10:30:02.622245 7ff1f58f7700  1 == req done 
> >> req=0x7ff1f58f11f0 op status=0 http_status=200 ==
> >> 2017-09-15 10:30:02.622324 7ff1f58f7700  1 civetweb: 0x56061584b000: 
> >> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE 
> >> HTTP/1.0" 1 0 - rclone/v1.37
> >> 2017-09-15 10:30:02.623361 7ff1f50f6700  1 == starting new request 
> >> req=0x7ff1f50f01f0 =
> >> 2017-09-15 10:30:02.689632 7ff1f50f6700  1 == req done 
> >> req=0x7ff1f50f01f0 op status=0 http_status=200 ==
> >> 2017-09-15 10:30:02.689719 7ff1f50f6700  1 civetweb: 0x56061585: 
> >> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "GET 
> >> /image-net?delimiter=%2F&max-keys=1024&prefix= HTTP/1.0" 1 0 - rclone/v1.37
> > 
> > 
> > 
> > ### s3cmds ls files ###
> > 
> > root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls 
> > s3://image-net/LICENSE
> > root@iccluster012:~#
> > 
> > nginx (as revers proxy) log :
> > 
> >> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET 
> >> http://test.iccluster.epfl.ch/image-net/?location HTTP/1.1" 200 127 "-" "-"
> >> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET 
> >> http://image-net.test.iccluster.epfl.ch/?delimiter=%2F&prefix=LICENSE 
> >> HTTP/1.1" 200 318 "-" "-"
> > 
> > rgw logs :
> > 
> >> 2017-09-15 10:30:04.295355 7ff1f48f5700  1 == starting new request 
> >> req=0x7ff1f48ef1f0 =
> >> 2017-09-15 10:30:04.295913 7ff1f48f5700  1 == req done 
> >> req=0x7ff1f48ef1f0 op status=0 http_status=200 ==
> >> 2017-09-15 10:30:04.295977 7ff1f48f5700  1 civetweb: 0x560615855000: 
> >> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET /image-net/?location 
> >> HTTP/1.0" 1 0 - -
> >> 2017-09-15 10:30:04.299303 7ff1f40f4700  1 == starting new request 
> >> req=0x7ff1f40ee1f0 =
> >> 2017-09-15 10:30:04.300993 7ff1f40f4700  1 == req done 
> >> req=0x7ff1f40ee1f0 op status=0 http_status=200 ==
> >> 2017-09-15 10:30:04.301070 7ff1f40f4700  1 civetweb: 0x56061585a000: 
> >> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET 
> >> /?delimiter=%2F&prefix=LICENSE HTTP/1.0" 1 0 - 
> > 
> > 
> > 
> > ### s3cmd : list bucket ###
> > 
> > root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls s3://
> > 2017-08-28 12:27  s3://image-net
> > roo

[ceph-users] monitor takes long time to join quorum: STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER

2017-09-20 Thread Sean Purdy

Hi,


Luminous 12.2.0

Three node cluster, 18 OSD, debian stretch.


One node is down for maintenance for several hours.  When bringing it back up, 
OSDs rejoin after 5 minutes, but health is still warning.  monitor has not 
joined quorum after 40 minutes and logs show BADAUTHORIZER message every time 
the monitor tries to connect to the leader.

2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 
172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply 
connect got BADAUTHORIZER

Then after ~45 minutes monitor *does* join quorum.

I'm presuming this isn't normal behaviour?  Or if it is, let me know and I 
won't worry.

All three nodes are using ntp and look OK timewise.


ceph-mon log:

(.43 is leader, .45 is rebooted node, .44 is other live node in quorum)

Boot:

2017-09-20 09:45:21.874152 7f49efeb8f80  0 ceph version 12.2.0 
(32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), 
pid 2243

2017-09-20 09:46:01.824708 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
172.16.0.44:6789/0 conn(0x56007244d000 :6789 
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
accept connect_seq 3 vs existing csq=0 existing_state=STATE_CONNECTING
2017-09-20 09:46:01.824723 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
172.16.0.44:6789/0 conn(0x56007244d000 :6789 
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
accept we reset (peer sent cseq 3, 0x5600722c.cseq = 0), sending 
RESETSESSION
2017-09-20 09:46:01.825247 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
172.16.0.44:6789/0 conn(0x56007244d000 :6789 
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING
2017-09-20 09:46:01.828053 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
172.16.0.44:6789/0 conn(0x5600722c :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21872 cs=1 l=0).process 
missed message?  skipped from seq 0 to 552717734

2017-09-20 09:46:05.580342 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
172.16.0.43:6789/0 conn(0x5600720fe800 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=49261 cs=1 l=0).process 
missed message?  skipped from seq 0 to 1151972199
2017-09-20 09:46:05.581097 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 
172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply 
connect got BADAUTHORIZER
2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 
172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply 
connect got BADAUTHORIZER
...
[message repeats for 45 minutes]
...
2017-09-20 10:23:38.818767 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 
172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply 
connect
 got BADAUTHORIZER


At this point, "ceph mon stat" says .45/store03 not in quorum:

e5: 3 mons at 
{store01=172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:6789/0},
 election epoch 376, leader 0 store01, quorum 0,1 store01,store02


Then suddenly a valid connection is made and sync happens:

2017-09-20 10:23:43.041009 7f49e5b2f700  1 mon.store03@2(synchronizing).mds e1 
Unable to load 'last_metadata'
2017-09-20 10:23:43.041967 7f49e5b2f700  1 mon.store03@2(synchronizing).osd 
e2381 e2381: 18 total, 13 up, 14 in
...
2017-09-20 10:23:43.045961 7f49e5b2f700  1 mon.store03@2(synchronizing).osd 
e2393 e2393: 18 total, 15 up, 15 in
...
2017-09-20 10:23:43.049255 7f49e5b2f700  1 mon.store03@2(synchronizing).osd 
e2406 e2406: 18 total, 18 up, 18 in
...
2017-09-20 10:23:43.054828 7f49e5b2f700  0 log_channel(cluster) log [INF] : 
mon.store03 calling new monitor election
2017-09-20 10:23:43.054901 7f49e5b2f700  1 mon.store03@2(electing).elector(372) 
init, last seen epoch 372


Now "ceph mon stat" says:

e5: 3 mons at 
{store01=172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:6789/0},
 election epoch 378, leader 0 store01, quorum 0,1,2 store01,store02,store03

and everything's happy.


What should I look for/fix?  It's a fairly vanilla system.


Thanks in advance,

Sean Purdy


Re: [ceph-users] Fwd: FileStore vs BlueStore

2017-09-20 Thread Sean Purdy
On Wed, 20 Sep 2017, Burkhard Linke said:
> The main reason for having a journal with filestore is having a block device
> that supports synchronous writes. Writing to a filesystem in a synchronous
> way (e.g. including all metadata writes) results in a huge performance
> penalty.
> 
> With bluestore the data is also stored on a block devices, and thus also
> allows to perform synchronous writes directly (given the backing storage is
> handling sync writes correctly and in a consistent way, e.g. no drive
> caches, bbu for raid controllers/hbas). And similar to the filestore journal

Our Bluestore disks are hosted on RAID controllers.  Should I set cache policy 
as WriteThrough for these disks then?
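
For our PERC/MegaRAID cards I assume the incantation would be something like the 
following - quoting from memory, not verified:

# MegaCLI: set all logical drives to write-through
megacli -LDSetProp WT -LAll -aAll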


Sean Purdy

> the bluestore wal/rocksdb partitions can be used to allow both faster
> devices (ssd/nvme) and faster sync writes (compared to spinners).
> 
> Regards,
> Burkhard


Re: [ceph-users] Fwd: FileStore vs BlueStore

2017-09-20 Thread Sean Purdy
On Wed, 20 Sep 2017, Burkhard Linke said:
> Hi,
> 
> 
> On 09/20/2017 12:24 PM, Sean Purdy wrote:
> >On Wed, 20 Sep 2017, Burkhard Linke said:
> >>The main reason for having a journal with filestore is having a block device
> >>that supports synchronous writes. Writing to a filesystem in a synchronous
> >>way (e.g. including all metadata writes) results in a huge performance
> >>penalty.
> >>
> >>With bluestore the data is also stored on a block devices, and thus also
> >>allows to perform synchronous writes directly (given the backing storage is
> >>handling sync writes correctly and in a consistent way, e.g. no drive
> >>caches, bbu for raid controllers/hbas). And similar to the filestore journal
> >Our Bluestore disks are hosted on RAID controllers.  Should I set cache 
> >policy as WriteThrough for these disks then?
> 
> It depends on the setup and availability of a BBU. If you have a BBU and
> cache on the controller, using write back should be ok if you monitor the
> BBU state. To be on the safe side is using write through and live with the
> performance impact.

We do have BBU and cache and we do monitor state.  Thanks!

Sean


Re: [ceph-users] monitor takes long time to join quorum: STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER

2017-09-21 Thread Sean Purdy
On Wed, 20 Sep 2017, Gregory Farnum said:
> That definitely sounds like a time sync issue. Are you *sure* they matched
> each other?

NTP looked OK at the time.  But see below.


> Is it reproducible on restart?

Today I did a straight reboot - and it was fine, no issues.


The issue occurs after the machine has been off for a number of hours, or has been 
worked on in the BIOS for a number of hours and then booted, perhaps after sitting 
for a while at the disk decrypt key prompt.

So I'd suspect hardware clock drift at those times.  (Using Dell R720xd 
machines)


Logs show a time change a few seconds after boot.  After boot it's running NTP 
and within that 45 minute period the NTP state looks the same as the other 
nodes in the (small) cluster.

How much drift is allowed between monitors?
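
The settings I believe control this are below - values are the defaults as far as 
I know, i.e. only 50ms of skew before the monitors start complaining:

# ceph.conf, [mon] section - defaults, I think
mon clock drift allowed = 0.050
mon clock drift warn backoff = 5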


Logs say:

Sep 20 09:45:21 store03 ntp[2329]: Starting NTP server: ntpd.
Sep 20 09:45:21 store03 ntpd[2462]: proto: precision = 0.075 usec (-24)
...
Sep 20 09:46:44 store03 systemd[1]: Time has been changed
Sep 20 09:46:44 store03 ntpd[2462]: receive: Unexpected origin timestamp 
0xdd6ca972.c694801d does not match aorg 00. from 
server@172.16.0.16 xmt 0xdd6ca974.0c5c18f

So system time was changed about 6 seconds after disks were unlocked/boot 
proceeded.  But there was still 45 minutes of monitor messages after that.  
Surely the time should have converged sooner than 45 minutes?



NTP from today, post-problem.  But ntpq at the time of the problem looked just 
as OK:

store01:~$ ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 47 ms

store02$ ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 63 ms

store03:~$ sudo ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 63 ms

store03:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+172.16.0.16     85.91.1.164      3 u  561 1024  377    0.287    0.554   0.914
+172.16.0.18     94.125.129.7     3 u  411 1024  377    0.388   -0.331   0.139
*172.16.0.19     158.43.128.33    2 u  289 1024  377    0.282   -0.005   0.103


Sean

 
> On Wed, Sep 20, 2017 at 2:50 AM Sean Purdy  wrote:
> 
> >
> > Hi,
> >
> >
> > Luminous 12.2.0
> >
> > Three node cluster, 18 OSD, debian stretch.
> >
> >
> > One node is down for maintenance for several hours.  When bringing it back
> > up, OSDs rejoin after 5 minutes, but health is still warning.  monitor has
> > not joined quorum after 40 minutes and logs show BADAUTHORIZER message
> > every time the monitor tries to connect to the leader.
> >
> > 2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1
> > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
> > l=0).handle_connect_reply connect got BADAUTHORIZER
> >
> > Then after ~45 minutes monitor *does* join quorum.
> >
> > I'm presuming this isn't normal behaviour?  Or if it is, let me know and I
> > won't worry.
> >
> > All three nodes are using ntp and look OK timewise.
> >
> >
> > ceph-mon log:
> >
> > (.43 is leader, .45 is rebooted node, .44 is other live node in quorum)
> >
> > Boot:
> >
> > 2017-09-20 09:45:21.874152 7f49efeb8f80  0 ceph version 12.2.0
> > (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process
> > (unknown), pid 2243
> >
> > 2017-09-20 09:46:01.824708 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x56007244d000 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> > accept connect_seq 3 vs existing csq=0 existing_state=STATE_CONNECTING
> > 2017-09-20 09:46:01.824723 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x56007244d000 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> > accept we reset (peer sent cseq 3, 0x5600722c.cseq = 0), sending
> > RESETSESSION
> > 2017-09-20 09:46:01.825247 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x56007244d000 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> > accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING
> > 2017-09-20 09:46:01.828053 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x5600722c :-1
> > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21872 cs=1 l=0).process
> > missed message?  skipped from seq 0 to 552717734
> >
> > 2017-09-20 09:46:05.580342 7f49e1b27700  0 -- 172.16.0.45:6789

Re: [ceph-users] monitor takes long time to join quorum: STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER

2017-09-21 Thread Sean Purdy
On Thu, 21 Sep 2017, Marc Roos said:
>  
> 
> In my case it was syncing, and was syncing slowly (hour or so?). You 
> should see this in the log file. I wanted to report this, because my 
> store.db is only 200MB, and I guess you want your monitors up and 
> running quickly.

Well I wondered about that, but if it can't talk to the monitor quorum leader, 
it's not going to start copying data.

And no new files had been added to this test cluster.

 
> I also noticed that when the 3rd monitor left the quorum, ceph -s 
> command was slow timing out. Probably trying to connect to the 3rd 
> monitor, but why? When this monitor is not in quorum.

There's a setting for client timeouts.  I forget where.
 

Sean
 
 
 
 
 
> -Original Message-
> From: Sean Purdy [mailto:s.pu...@cv-library.co.uk] 
> Sent: donderdag 21 september 2017 12:02
> To: Gregory Farnum
> Cc: ceph-users
> Subject: Re: [ceph-users] monitor takes long time to join quorum: 
> STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER
> 
> On Wed, 20 Sep 2017, Gregory Farnum said:
> > That definitely sounds like a time sync issue. Are you *sure* they 
> > matched each other?
> 
> NTP looked OK at the time.  But see below.
> 
> 
> > Is it reproducible on restart?
> 
> Today I did a straight reboot - and it was fine, no issues.
> 
> 
> The issue occurs after the machine is off for a number of hours, or has 
> been worked on in the BIOS for a number of hours and then booted.  And 
> then perhaps waited at the disk decrypt key prompt.
> 
> So I'd suspect hardware clock drift at those times.  (Using Dell R720xd 
> machines)
> 
> 
> Logs show a time change a few seconds after boot.  After boot it's 
> running NTP and within that 45 minute period the NTP state looks the 
> same as the other nodes in the (small) cluster.
> 
> How much drift is allowed between monitors?
> 
> 
> Logs say:
> 
> Sep 20 09:45:21 store03 ntp[2329]: Starting NTP server: ntpd.
> Sep 20 09:45:21 store03 ntpd[2462]: proto: precision = 0.075 usec (-24) 
> ...
> Sep 20 09:46:44 store03 systemd[1]: Time has been changed
> Sep 20 09:46:44 store03 ntpd[2462]: receive: Unexpected origin timestamp 
> 0xdd6ca972.c694801d does not match aorg 00. from 
> server@172.16.0.16 xmt 0xdd6ca974.0c5c18f
> 
> So system time was changed about 6 seconds after disks were 
> unlocked/boot proceeded.  But there was still 45 minutes of monitor 
> messages after that.  Surely the time should have converged sooner than 
> 45 minutes?
> 
> 
> 
> NTP from today, post-problem.  But ntpq at the time of the problem 
> looked just as OK:
> 
> store01:~$ ntpstat
> synchronised to NTP server (172.16.0.19) at stratum 3
>time correct to within 47 ms
> 
> store02$ ntpstat
> synchronised to NTP server (172.16.0.19) at stratum 3
>time correct to within 63 ms
> 
> store03:~$ sudo ntpstat
> synchronised to NTP server (172.16.0.19) at stratum 3
>time correct to within 63 ms
> 
> store03:~$ ntpq -p
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> +172.16.0.16     85.91.1.164      3 u  561 1024  377    0.287    0.554   0.914
> +172.16.0.18     94.125.129.7     3 u  411 1024  377    0.388   -0.331   0.139
> *172.16.0.19     158.43.128.33    2 u  289 1024  377    0.282   -0.005   0.103
> 
> 
> Sean
> 
>  
> > On Wed, Sep 20, 2017 at 2:50 AM Sean Purdy  
> wrote:
> > 
> > >
> > > Hi,
> > >
> > >
> > > Luminous 12.2.0
> > >
> > > Three node cluster, 18 OSD, debian stretch.
> > >
> > >
> > > One node is down for maintenance for several hours.  When bringing 
> > > it back up, OSDs rejoin after 5 minutes, but health is still 
> > > warning.  monitor has not joined quorum after 40 minutes and logs 
> > > show BADAUTHORIZER message every time the monitor tries to connect 
> to the leader.
> > >
> > > 2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
> > > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
> > > l=0).handle_connect_reply connect got BADAUTHORIZER
> > >
> > > Then after ~45 minutes monitor *does* join quorum.
> > >
> > > I'm presuming this isn't normal behaviour?  Or if it is, let me know 
> 
> > > and I won't worry.
> > >
> > > All three nodes are using ntp and look OK timewise.
> > >
> > >
> &

Re: [ceph-users] ceph/systemd startup bug (was Re: Some OSDs are down after Server reboot)

2017-09-28 Thread Sean Purdy
On Thu, 28 Sep 2017, Matthew Vernon said:
> Hi,
> 
> TL;DR - the timeout setting in ceph-disk@.service is (far) too small - it
> needs increasing and/or removing entirely. Should I copy this to ceph-devel?

Just a note.  Looks like debian stretch luminous packages have a 10_000 second 
timeout:

from /lib/systemd/system/ceph-disk@.service

Environment=CEPH_DISK_TIMEOUT=10000
ExecStart=/bin/sh -c 'timeout $CEPH_DISK_TIMEOUT flock 
/var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose --log-stdout 
trigger --sync %f'
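
So stretch shouldn't hit the 120s limit.  For units that already use the 
CEPH_DISK_TIMEOUT variable, tweaking it further would presumably just be a systemd 
drop-in like this (untested, path and value are my guesses):

# /etc/systemd/system/ceph-disk@.service.d/timeout.conf
[Service]
Environment=CEPH_DISK_TIMEOUT=20000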
 

Sean

> On 15/09/17 16:48, Matthew Vernon wrote:
> >On 14/09/17 16:26, Götz Reinicke wrote:
> >>After that, 10 OSDs did not came up as the others. The disk did not get
> >>mounted and the OSD processes did nothing … even after a couple of
> >>minutes no more disks/OSDs showed up.
> >
> >I'm still digging, but AFAICT it's a race condition in startup - in our
> >case, we're only seeing it if some of the filesystems aren't clean. This
> >may be related to the thread "Very slow start of osds after reboot" from
> >August, but I don't think any conclusion was reached there.
> 
> This annoyed me enough that I went off to find the problem :-)
> 
> On systemd-enabled machines[0] ceph disks are activated by systemd's
> ceph-disk@.service, which calls:
> 
> /bin/sh -c 'timeout 120 flock /var/lock/ceph-disk-$(basename %f)
> /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'
> 
> ceph-disk trigger --sync calls ceph-disk activate which (among other things)
> mounts the osd fs (first in a temporary location, then in /var/lib/ceph/osd/
> once it's extracted the osd number from the fs). If the fs is unclean, XFS
> auto-recovers before mounting (which takes time - range 2-25s for our 6TB
> disks) Importantly, there is a single global lock file[1] so only one
> ceph-disk activate can be doing this at once.
> 
> So, each fs is auto-recovering one at at time (rather than in parallel), and
> once the elapsed time gets past 120s, timeout kills the flock, systemd kills
> the cgroup, and no more OSDs start up - we typically find a few fs mounted
> in /var/lib/ceph/tmp/mnt.. systemd keeps trying to start the remaining
> osds (via ceph-osd@.service), but their fs isn't in the correct place, so
> this never works.
> 
> The fix/workaround is to adjust the timeout value (edit the service file
> directly, or for style points write an override in /etc/systemd/system
> remembering you need a blank ExecStart line before your revised one).
> 
> Experimenting, one of our storage nodes with 60 6TB disks took 17m35s to
> start all its osds when started up with all fss dirty. So the current 120s
> is far too small (it's just about OK when all the osd fss are clean).
> 
> I think, though, that having the timeout at all is a bug - if something
> needs to time out under some circumstances, should it be at a lower layer,
> perhaps?
> 
> A couple of final points/asides, if I may:
> 
> ceph-disk trigger uses subprocess.communicate (via the command() function),
> which means it swallows the log output from ceph-disk activate (and only
> outputs it after that process finishes) - as well as producing confusing
> timestamps, this means that when systemd kills the cgroup, all the output
> from the ceph-disk activate command vanishes into the void. That made
> debugging needlessly hard. Better to let called processes like that output
> immediately?
> 
> Does each fs need mounting twice? could the osd be encoded in the partition
> label or similar instead?
> 
> Is a single global activation lock necessary? It slows startup down quite a
> bit; I see no reason why (at least in the one-osd-per-disk case) you
> couldn't be activating all the osds at once...
> 
> Regards,
> 
> Matthew
> 
> [0] I note, for instance, that /etc/init/ceph-disk.conf doesn't have the
> timeout, so presumably upstart systems aren't affected
> [1] /var/lib/ceph/tmp/ceph-disk.activate.lock at least on Ubuntu
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
> a charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.


Re: [ceph-users] New OSD missing from part of osd crush tree

2017-09-29 Thread Sean Purdy
On Thu, 10 Aug 2017, John Spray said:
> On Thu, Aug 10, 2017 at 4:31 PM, Sean Purdy  wrote:
> > Luminous 12.1.1 rc

And 12.2.1 stable

> > We added a new disk and did:

> > That worked, created osd.18, OSD has data.
> >
> > However, mgr output at http://localhost:7000/servers showed
> > osd.18 under a blank hostname and not e.g. on the node we attached it to.
> 
> Don't worry about this part.  It's a mgr bug that it sometimes fails
> to pick up the hostname for a service
> (http://tracker.ceph.com/issues/20887)
> 
> John

Thanks.  This still happens in 12.2.1 (I notice the bug isn't closed).  mgrs 
have been restarted. It is consistently the same OSD that mgr can't find a 
hostname for.  I'd have thought if it were a race condition, then different 
OSDs would show up detached.

Oh well, no biggie right now.


Sean


[ceph-users] radosgw notify on creation/deletion of file in bucket

2017-10-03 Thread Sean Purdy
Hi,


Is there any way that radosgw can ping something when a file is removed or 
added to a bucket?

Or use its sync facility to sync files to AWS/Google buckets?

Just thinking about backups.  What do people use for backups?  Been looking at 
rclone.
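
What I had in mind with rclone is roughly this - config keys from memory, remote 
and bucket names made up:

# ~/.rclone.conf: one remote for our radosgw, one for the offsite copy
[ceph]
type = s3
access_key_id = yourkey
secret_access_key = yoursecret
endpoint = http://host.yourdomain

[offsite]
type = s3
access_key_id = awskey
secret_access_key = awssecret
region = eu-west-1

and then per bucket something like:

rclone sync ceph:test offsite:test-backup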


Thanks,

Sean


Re: [ceph-users] Slow requests

2017-10-19 Thread Sean Purdy
Are you using radosgw?  I found this page useful when I had a similar issue:

http://www.osris.org/performance/rgw.html


Sean

On Wed, 18 Oct 2017, Ольга Ухина said:
> Hi!
> 
> I have a problem with ceph luminous 12.2.1. It was upgraded from kraken,
> but I'm not sure if it was a problem in kraken.
> I have slow requests on different OSDs on random time (for example at
> night, but I don't see any problems at the time of problem with disks, CPU,
> there is possibility of network problem at night). During daytime I have
> not this problem.
> Almost all requests are nearly 30 seconds, so I receive warnings like this:
> 
> 2017-10-18 01:20:26.147758 mon.st3 mon.0 10.192.1.78:6789/0 22686 : cluster
> [WRN] Health check failed: 1 slow requests are blocked > 32 sec
> (REQUEST_SLOW)
> 2017-10-18 01:20:28.025315 mon.st3 mon.0 10.192.1.78:6789/0 22687 : cluster
> [WRN] overall HEALTH_WARN 1 slow requests are blocked > 32 sec
> 2017-10-18 01:20:32.166758 mon.st3 mon.0 10.192.1.78:6789/0 22688 : cluster
> [WRN] Health check update: 38 slow requests are blocked > 32 sec
> (REQUEST_SLOW)
> 2017-10-18 01:20:38.187326 mon.st3 mon.0 10.192.1.78:6789/0 22689 : cluster
> [WRN] Health check update: 49 slow requests are blocked > 32 sec
> (REQUEST_SLOW)
> 2017-10-18 01:20:38.727421 osd.23 osd.23 10.192.1.158:6840/3659 1758 :
> cluster [WRN] 27 slow requests, 5 included below; oldest blocked for >
> 30.839843 secs
> 2017-10-18 01:20:38.727425 osd.23 osd.23 10.192.1.158:6840/3659 1759 :
> cluster [WRN] slow request 30.814060 seconds old, received at 2017-10-18
> 01:20:07.913300: osd_op(client.12464272.1:56610561 31.410dd55
> 5 31:aaabb082:::rbd_data.7b3e22ae8944a.00012e2c:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write 2977792~4096]
> snapc 0=[] ondisk+write e10926) currently sub_op_commit_rec from 39
> 2017-10-18 01:20:38.727431 osd.23 osd.23 10.192.1.158:6840/3659 1760 :
> cluster [WRN] slow request 30.086589 seconds old, received at 2017-10-18
> 01:20:08.640771: osd_repop(client.12464806.1:17326170 34.242
> e10926/10860 34:426def95:::rbd_data.acdc9238e1f29.1231:head v
> 10926'4976910) currently write_thread_in_journal_buffer
> 2017-10-18 01:20:38.727433 osd.23 osd.23 10.192.1.158:6840/3659 1761 :
> cluster [WRN] slow request 30.812569 seconds old, received at 2017-10-18
> 01:20:07.914791: osd_repop(client.12464272.1:56610570 31.1eb
> e10926/10848 31:d797c167:::rbd_data.7b3e22ae8944a.00013828:head v
> 10926'135331) currently write_thread_in_journal_buffer
> 2017-10-18 01:20:38.727436 osd.23 osd.23 10.192.1.158:6840/3659 1762 :
> cluster [WRN] slow request 30.807328 seconds old, received at 2017-10-18
> 01:20:07.920032: osd_op(client.12464272.1:56610586 31.3f2f2e2
> 6 31:6474f4fc:::rbd_data.7b3e22ae8944a.00013673:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write 12288~4096]
> snapc 0=[] ondisk+write e10926) currently sub_op_commit_rec from 30
> 2017-10-18 01:20:38.727438 osd.23 osd.23 10.192.1.158:6840/3659 1763 :
> cluster [WRN] slow request 30.807253 seconds old, received at 2017-10-18
> 01:20:07.920107: osd_op(client.12464272.1:56610588 31.2d23291
> 8 31:1894c4b4:::rbd_data.7b3e22ae8944a.00013a5b:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write 700416~4096]
> snapc 0=[] ondisk+write e10926) currently sub_op_commit_rec from 28
> 2017-10-18 01:20:38.006142 osd.39 osd.39 10.192.1.159:6808/3323 1501 :
> cluster [WRN] 2 slow requests, 2 included below; oldest blocked for >
> 30.092091 secs
> 2017-10-18 01:20:38.006153 osd.39 osd.39 10.192.1.159:6808/3323 1502 :
> cluster [WRN] slow request 30.092091 seconds old, received at 2017-10-18
> 01:20:07.913962: osd_op(client.12464272.1:56610570 31.e683e9e
> b 31:d797c167:::rbd_data.7b3e22ae8944a.00013828:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write 143360~4096]
> snapc 0=[] ondisk+write e10926) currently op_applied
> 2017-10-18 01:20:38.006159 osd.39 osd.39 10.192.1.159:6808/3323 1503 :
> cluster [WRN] slow request 30.086123 seconds old, received at 2017-10-18
> 01:20:07.919930: osd_op(client.12464272.1:56610587 31.e683e9eb
> 31:d797c167:::rbd_data.7b3e22ae8944a.00013828:head [set-alloc-hint
> object_size 4194304 write_size 4194304,write 3256320~4096] snapc 0=[]
> ondisk+write e10926) currently op_applied
> 2017-10-18 01:20:38.374091 osd.38 osd.38 10.192.1.159:6857/236992 1387 :
> cluster [WRN] 2 slow requests, 2 included below; oldest blocked for >
> 30.449318 secs
> 2017-10-18 01:20:38.374107 osd.38 osd.38 10.192.1.159:6857/236992 1388 :
> cluster [WRN] slow request 30.449318 seconds old, received at 2017-10-18
> 01:20:07.924670: osd_op(client.12464272.1:56610603 31.fe179bed
> 31:b7d9e87f:::rbd_data.7b3e22ae8944a.00013a60:head [set-alloc-hint
> object_size 4194304 write_size 4194304,write 143360~4096] snapc 0=[]
> ondisk+write e10926) currently op_applied
> 
> 
> How can I determine the reason of problem? Should I only adjust
> osd_op_com

[ceph-users] collectd doesn't push all stats

2017-10-20 Thread Sean Purdy
Hi,


The default collectd ceph plugin seems to parse the output of "ceph daemon 
<name> perf dump" and generate graphite output.  However, I see more 
fields in the dump than in collectd/graphite.

Specifically I see get stats for rgw (ceph_rate-Client_rgw_nodename_get) but 
not put stats (e.g. ceph_rate-Client_rgw_nodename_put)

e.g. (abbreviated) dump says:
{
"client.rgw.store01": {
"req": 164927606,
"failed_req": 43482,
"get": 162727054,
"put": 917996,
}
}
but put stats don't show up.

Anybody know how to tweak the plugin to select the stats you want to see?  e.g. 
monitor paxos stuff doesn't show up either.  Perhaps there's a deliberate 
limitation somewhere, but it seems odd to show "get" and not "put" request 
rates.

(collectd 5.7.1 on debian stretch, ceph luminous 12.2.1)
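
For reference, my plugin config is essentially the stock one, along these lines, so 
nothing at my end is obviously filtering stats out:

LoadPlugin ceph
<Plugin ceph>
  <Daemon "client.rgw.store01">
    SocketPath "/var/run/ceph/ceph-client.rgw.store01.asok"
  </Daemon>
  <Daemon "mon.store01">
    SocketPath "/var/run/ceph/ceph-mon.store01.asok"
  </Daemon>
</Plugin>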


Thanks,

Sean


[ceph-users] S3 object notifications

2017-11-28 Thread Sean Purdy
Hi,


http://docs.ceph.com/docs/master/radosgw/s3/ says that S3 object notifications 
are not supported.  I'd like something like object notifications so that we can 
backup new objects in realtime, instead of trawling the whole object list for 
what's changed.

Is there anything similar I can use?  I've found Spreadshirt's haproxy fork 
which traps requests and updates redis - 
https://github.com/spreadshirt/s3gw-haproxy  Anybody used that?


Thanks,

Sean Purdy


Re: [ceph-users] S3 object notifications

2017-11-28 Thread Sean Purdy
On Tue, 28 Nov 2017, Yehuda Sadeh-Weinraub said:
> rgw has a sync modules framework that allows you to write your own
> sync plugins. The system identifies objects changes and triggers

I am not a C++ developer though.

http://ceph.com/rgw/new-luminous-rgw-metadata-search/ says

"Stay tuned in future releases for sync plugins that replicate data to (or even 
from) cloud storage services like S3!"

But then it looks like you wrote that blog post!  I guess I'll stay tuned


Sean


> callbacks that can then act on those changes. For example, the
> metadata search feature that was added recently is using this to send
> objects metadata into elasticsearch for indexing.
> 
> Yehuda
> 
> On Tue, Nov 28, 2017 at 2:22 PM, Sean Purdy  wrote:
> > Hi,
> >
> >
> > http://docs.ceph.com/docs/master/radosgw/s3/ says that S3 object 
> > notifications are not supported.  I'd like something like object 
> > notifications so that we can backup new objects in realtime, instead of 
> > trawling the whole object list for what's changed.
> >
> > Is there anything similar I can use?  I've found Spreadshirt's haproxy fork 
> > which traps requests and updates redis - 
> > https://github.com/spreadshirt/s3gw-haproxy  Anybody used that?
> >
> >
> > Thanks,
> >
> > Sean Purdy


[ceph-users] Efficient deletion of large radosgw buckets

2018-02-15 Thread Sean Purdy

Hi,

I have a few radosgw buckets with millions or tens of millions of objects.  I 
would like to delete these entire buckets.

Is there a way to do this without ceph rebalancing as it goes along?

Is there anything better than just doing:

radosgw-admin bucket rm --bucket=test --purge-objects --bypass-gc


Thanks,

Sean Purdy


Re: [ceph-users] Efficient deletion of large radosgw buckets

2018-02-16 Thread Sean Purdy
Thanks David.


> purging the objects and bypassing the GC is definitely the way to go

Cool.

> What rebalancing do you expect to see during this operation that you're 
> trying to avoid

I think I just have a poor understanding or wasn't thinking very hard :)  I 
suppose the question really was "are there any performance implications in 
deleting large buckets that I should be aware of?".  So, no really.  Just will 
take a while.
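
For reference, a shell version of the multithreaded delete you describe (xargs 
rather than python) would presumably look something like this - untested sketch, 
endpoint and bucket as per my test setup, and keys with odd characters would need 
more care:

# list every key, then delete 8 at a time in parallel
aws --endpoint-url http://test/ --profile test \
    s3api list-objects --bucket test --query 'Contents[].Key' --output text \
  | tr '\t' '\n' \
  | xargs -P 8 -I{} aws --endpoint-url http://test/ --profile test s3 rm s3://test/{}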

The actual cluster is small and balanced with free space.  Buckets are not 
customer-facing.


Thanks for the advice,

Sean


On Thu, 15 Feb 2018, David Turner said:
> Which is more important to you?  Deleting the bucket fast or having the
> used space become available?  If deleting the bucket fast is the priority,
> then you can swamp the GC by multithreading object deletion from the bucket
> with python or something.  If having everything deleted and cleaned up from
> the cluster is the priority (which is most likely the case), then what you
> have there is the best option.  If you want to do it in the background away
> from what the client can see, then you can change the ownership of the
> bucket so they no longer see it and then take care of the bucket removal in
> the background, but purging the objects and bypassing the GC is definitely
> the way to go. ... It's just really slow.
> 
> I just noticed that your question is about ceph rebalancing.  What
> rebalancing do you expect to see during this operation that you're trying
> to avoid?  I'm unaware of any such rebalancing (unless it might be the new
> automatic OSD rebalancing mechanism in Luminous to keep OSDs even... but
> deleting data shouldn't really trigger that if the cluster is indeed
> balanced).
> 
> On Thu, Feb 15, 2018 at 9:13 AM Sean Purdy  wrote:
> 
> >
> > Hi,
> >
> > I have a few radosgw buckets with millions or tens of millions of
> > objects.  I would like to delete these entire buckets.
> >
> > Is there a way to do this without ceph rebalancing as it goes along?
> >
> > Is there anything better than just doing:
> >
> > radosgw-admin bucket rm --bucket=test --purge-objects --bypass-gc
> >
> >
> > Thanks,
> >
> > Sean Purdy
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous 12.2.6 release date?

2018-07-10 Thread Sean Purdy
While we're at it, is there a release date for 12.2.6?  It fixes a 
reshard/versioning bug for us.

Sean


Re: [ceph-users] Luminous 12.2.6 release date?

2018-07-10 Thread Sean Purdy
Hi Sean,

On Tue, 10 Jul 2018, Sean Redmond said:
> Can you please link me to the tracker 12.2.6 fixes? I have disabled
> resharding in 12.2.5 due to it running endlessly.

http://tracker.ceph.com/issues/22721


Sean
 
> Thanks
> 
> On Tue, Jul 10, 2018 at 9:07 AM, Sean Purdy 
> wrote:
> 
> > While we're at it, is there a release date for 12.2.6?  It fixes a
> > reshard/versioning bug for us.
> >
> > Sean
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Recovering from broken sharding: fill_status OVER 100%

2018-08-07 Thread Sean Purdy
Hi,


On my test servers, I created a bucket using 12.2.5, turned on versioning, 
uploaded 100,000 objects, and the bucket broke, as expected.  Autosharding said 
it was running but didn't complete.

Then I upgraded that cluster to 12.2.7.  Resharding seems to have finished, but 
now that cluster says it has *300,000* objects, instead of 100,000.  But an S3 
list shows 100,000 objects.

How do I fix this?  We have a production cluster that has a similar bucket.

I have tried both "bucket check" and "bucket check --check-objects" and they 
just return []


$ /usr/local/bin/aws --endpoint-url http://test/ --profile test s3 ls 
s3://test2/ | wc -l
13

$ sudo radosgw-admin bucket limit check
[
{
"user_id": "test",
"buckets": [
...
{
"bucket": "test2",
"tenant": "",
"num_objects": 300360,
"num_shards": 2,
"objects_per_shard": 150180,
"fill_status": "OVER 100.00%"
}
]
}
]

$ sudo radosgw-admin reshard status --bucket test2
[
{
"reshard_status": 0,
"new_bucket_instance_id": "",
"num_shards": -1
},
{
"reshard_status": 0,
"new_bucket_instance_id": "",
"num_shards": -1
}
]


Thanks,

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] bucket limit check is 3x actual objects after autoreshard/upgrade

2018-08-22 Thread Sean Purdy
Hi,


I was testing versioning and autosharding in luminous 12.2.5, then upgrading to 
12.2.7.  I wanted to know if the upgraded autosharded bucket is still usable.  
Looks like it is, but a bucket limit check seems to show too many objects.


On my test servers, I created a bucket using 12.2.5, turned on versioning and 
autosharding, uploaded 100,000 objects, and bucket uploads hung, as is known.  
Autosharding said it was running but didn't complete.

Then I upgraded that cluster to 12.2.7.  Resharding seems to have finished, 
(two shards), but "bucket limit check" says there are 300,000 objects, 150k per 
shard, and gives a "fill_status OVER 100%" message.

But an "s3 ls" shows 100k objects in the bucket. And a "rados ls" shows 200k 
objects, two per file, one has file data and one is empty.

e.g. for file TEST.89488
$ rados ls -p default.rgw.buckets.index | grep TEST.89488\$
a7fb3a0d-e0a4-401c-b7cb-dbc535f3c1af.114156.2_TEST.89488 (empty)
a7fb3a0d-e0a4-401c-b7cb-dbc535f3c1af.114156.2__:ZuP3m9XRFcarZYrLGTVd8rcOksWkGBr_TEST.89488
 (has data)

Both "bucket check" and "bucket check --check-objects" just return []


How should I go about fixing this?  The bucket *seems* functional, and I don't 
*think* there are extra objects, but the index check thinks there are.  How do I 
find out what the index actually says?  Or whether there really are extra files 
that need removing.
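
For anyone comparing numbers like this: the S3-side count above comes from a 
plain object listing, but it can be broken down into current versions, 
noncurrent versions and delete markers with something like this (a rough boto3 
sketch; the profile, endpoint and bucket names are the test placeholders used 
above):

import boto3

s3 = boto3.Session(profile_name='test').client(
    's3', endpoint_url='http://test-cluster')

current = noncurrent = markers = 0
paginator = s3.get_paginator('list_object_versions')
for page in paginator.paginate(Bucket='test2'):
    for v in page.get('Versions', []):
        if v['IsLatest']:
            current += 1
        else:
            noncurrent += 1
    markers += len(page.get('DeleteMarkers', []))

# compare these against num_objects from "radosgw-admin bucket limit check"
print('current=%d noncurrent=%d delete_markers=%d'
      % (current, noncurrent, markers))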


Thanks for any ideas or pointers.


Sean

$ /usr/local/bin/aws --endpoint-url http://test-cluster/ --profile test s3 ls 
s3://test2/ | wc -l
13

$ sudo rados ls -p default.rgw.buckets.index | grep -c TEST
200133

$ sudo radosgw-admin bucket limit check
[
{
"user_id": "test",
"buckets": [
...
{
"bucket": "test2",
"tenant": "",
"num_objects": 300360,
"num_shards": 2,
"objects_per_shard": 150180,
"fill_status": "OVER 100.00%"
}
]
}
]

$ sudo radosgw-admin reshard status --bucket test2
[
{
"reshard_status": 0,
"new_bucket_instance_id": "",
"num_shards": -1
},
{
"reshard_status": 0,
"new_bucket_instance_id": "",
"num_shards": -1
}
]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading ceph with HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent

2018-09-05 Thread Sean Purdy
On Wed,  5 Sep 2018, John Spray said:
> On Wed, Sep 5, 2018 at 8:38 AM Marc Roos  wrote:
> >
> >
> > The adviced solution is to upgrade ceph only in HEALTH_OK state. And I
> > also read somewhere that is bad to have your cluster for a long time in
> > an HEALTH_ERR state.
> >
> > But why is this bad?

See https://ceph.com/community/new-luminous-pg-overdose-protection
under "Problems with past intervals"

"if the cluster becomes unhealthy, and especially if it remains unhealthy for 
an extended period of time, a combination of effects can cause problems."

"If a cluster is unhealthy for an extended period of time (e.g., days or even 
weeks), the past interval set can become large enough to require a significant 
amount of memory."


Sean
 
> Aside from the obvious (errors are bad things!), many people have
> external monitoring systems that will alert them on the transitions
> between OK/WARN/ERR.  If the system is stuck in ERR for a long time,
> they are unlikely to notice new errors or warnings.  These systems can
> accumulate faults without the operator noticing.
> 
> > Why is this bad during upgrading?
> 
> It depends what's gone wrong.  For example:
>  - If your cluster is degraded (fewer than desired number of replicas
> of data) then taking more services offline (even briefly) to do an
> upgrade will create greater risk to the data by reducing the number of
> copies available.
> - If your system is in an error state because something has gone bad
> on disk, then recovering it with the same software that wrote the data
> is a more tested code path than running some newer code against a
> system left in a strange state by an older version.
> 
> There will always be exceptions to this (e.g. where the upgrade is the
> fix for whatever caused the error), but the general purpose advice is
> to get a system nice and clean before starting the upgrade.
> 
> John
> 
> > Can I quantify how bad it is? (like with large log/journal file?)
> >
> >
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fixing a 12.2.5 reshard

2018-09-06 Thread Sean Purdy
Hi,


We were on 12.2.5 when a bucket with versioning and 100k objects got stuck as 
autoreshard kicked in.  We could download but not upload files.  After upgrading 
to 12.2.7 and then running bucket check, bucket limit check now shows twice as 
many objects.  How do I fix this?


Sequence:

12.2.5 autoshard happened, "radosgw-admin reshard list" showed a reshard 
happening but no action.
12.2.7 upgrade went fine, didn't fix anything straightaway. "radosgw-admin 
reshard list" same.  Still no file uploads.  bucket limit check showed 100k 
files in the bucket as expected, and no shards.
Ran "radosgw-admin bucket check --fix"

Now "reshard list" shows no reshards in progress, but bucket limit check shows 
200k files in two shards, 100k per shard.  It should be half this.


The output of "bucket check --fix" has 
existing_header: "num_objects": 203344 for "rgw.main"
calculated_header: "num_objects": 101621

Shouldn't it install the calculated_header?



Before:

$ sudo radosgw-admin reshard list
[
  {

"tenant": "",
"bucket_name": "static",
"bucket_id": "a5501bce-1360-43e3-af08-8f3d1e102a79.3475308.1",
"new_instance_id": "static:a5501bce-1360-43e3-af08-8f3d1e102a79.3620665.1",
"old_num_shards": 1,
"new_num_shards": 2
  }
]

$ sudo radosgw-admin bucket limit check
{
"user_id": "static",
"buckets": [
{
"bucket": "static",
"tenant": "",
"num_objects": 101621,
"num_shards": 0,
"objects_per_shard": 101621,
"fill_status": "OK"
}
]
}

Output from bucket check --fix

{
"existing_header": {
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 101621
},
"rgw.main": {
"size": 37615290807,
"size_actual": 38017675264,
"size_utilized": 0,
"size_kb": 36733683,
"size_kb_actual": 37126636,
"size_kb_utilized": 0,
"num_objects": 203344
}
}
},
"calculated_header": {
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 101621
},
"rgw.main": {
"size": 18796589005,
"size_actual": 18997686272,
"size_utilized": 18796589005,
"size_kb": 18356044,
"size_kb_actual": 18552428,
"size_kb_utilized": 18356044,
"num_objects": 101621
}
}
}
}

After:

{
"user_id": "static",
"buckets": [
{
"bucket": "static",
"tenant": "",
"num_objects": 203242,
"num_shards": 2,
"objects_per_shard": 101621,
"fill_status": "OK"
}
]
}


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Release for production

2018-09-07 Thread Sean Purdy
On Fri,  7 Sep 2018, Paul Emmerich said:
> Mimic

Unless you run debian, in which case Luminous.

Sean
 
> 2018-09-07 12:24 GMT+02:00 Vincent Godin :
> > Hello Cephers,
> > if i had to go for production today, which release should i choose :
> > Luminous or Mimic ?
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> Paul Emmerich
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Mimic packages not available for Ubuntu Trusty

2018-09-19 Thread Sean Purdy
I doubt it - Mimic needs gcc v7 I believe, and Trusty's a bit old for that.  
Even the Xenial releases aren't straightforward and rely on some backported 
packages.


Sean, missing Mimic on debian stretch

On Wed, 19 Sep 2018, Jakub Jaszewski said:
> Hi Cephers,
> 
> Any plans for Ceph Mimic packages for Ubuntu Trusty? I found only
> ceph-deploy.
> https://download.ceph.com/debian-mimic/dists/trusty/main/binary-amd64/
> 
> Thanks
> Jakub

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can't remove DeleteMarkers in rgw bucket

2018-09-20 Thread Sean Purdy
Hi,


We have a bucket that we are trying to empty.  Versioning and lifecycle were 
enabled.  We deleted all the objects in the bucket, but this left a whole 
bunch of delete markers.

aws s3api delete-object --bucket B --key K --version-id V is not deleting the 
delete markers.

Any ideas?  We want to delete the bucket so we can reuse the bucket name.  
Alternatively, is there a way to delete a bucket that still contains delete 
markers?


$ aws --profile=owner s3api list-object-versions --bucket bucket --prefix 
0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7

{
  "DeleteMarkers": [
{
  "Owner": {
"DisplayName": "bucket owner",
"ID": "owner"
  },
  "IsLatest": true,
  "VersionId": "ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd",
  "Key": "0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7",
  "LastModified": "2018-09-17T16:19:58.187Z"
}
  ]
}

$ aws --profile=owner s3api delete-object --bucket bucket --key 
0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7 --version-id 
ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd

returns 0 but the delete marker remains.


This bucket was created in 12.2.2; the current version of ceph is 12.2.7, upgraded via 12.2.5


Thanks,

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Manually deleting an RGW bucket

2018-09-28 Thread Sean Purdy


Hi,


How do I delete an RGW/S3 bucket and its contents if the usual S3 API commands 
don't work?

The bucket has S3 delete markers that S3 API commands are not able to remove, 
and I'd like to reuse the bucket name.  It was set up for versioning and 
lifecycles under ceph 12.2.5, which broke the bucket when a reshard happened.  
12.2.7 allowed me to remove the regular files but not the delete markers.

There must be a way of removing index files and so forth through rados commands.


Thanks,

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Manually deleting an RGW bucket

2018-10-01 Thread Sean Purdy
On Sat, 29 Sep 2018, Konstantin Shalygin said:
> > How do I delete an RGW/S3 bucket and its contents if the usual S3 API 
> > commands don't work?
> > 
> > The bucket has S3 delete markers that S3 API commands are not able to 
> > remove, and I'd like to reuse the bucket name.  It was set up for 
> > versioning and lifecycles under ceph 12.2.5 which broke the bucket when a 
> > reshard happened.  12.2.7 allowed me to remove the regular files but not 
> > the delete markers.
> > 
> > There must be a way of removing index files and so forth through rados 
> > commands.
> 
> 
> What error actually is?
> 
> For delete bucket you should delete all bucket objects ("s3cmd rm -rf
> s3://bucket/") and multipart uploads.


No errors, but I can't remove delete markers from the versioned bucket.


Here's the bucket:

$ aws --profile=mybucket --endpoint-url http://myserver/ s3 ls s3://mybucket/

(no objects returned)

Try removing the bucket:

$ aws --profile=mybucket --endpoint-url http://myserver/ s3 rb s3://mybucket/
remove_bucket failed: s3://mybucket/ An error occurred (BucketNotEmpty) when 
calling the DeleteBucket operation: Unknown

So the bucket is not empty.

List object versions:

$ aws --profile=mybucket --endpoint-url http://myserver/ s3api 
list-object-versions --bucket mybucket --prefix someprefix/0/0

Shows lots of delete markers from the versioned bucket:

{
"Owner": {
"DisplayName": "mybucket bucket owner", 
"ID": "mybucket"
}, 
"IsLatest": true, 
"VersionId": "ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd", 
"Key": "someprefix/0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7", 
}
 
Let's try removing that delete marker object:

$ aws --profile=mybucket --endpoint-url http://myserver/ s3api delete-object 
--bucket mybucket --key someprefix/0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7 
--version-id ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd

Returns 0, has it worked?

$ aws --profile=mybucket --endpoint-url http://myserver/ s3api 
list-object-versions --bucket mybucket --prefix 
someprefix/0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7

No:

"DeleteMarkers": [
{
"Owner": {
"DisplayName": "static bucket owner", 
"ID": "static"
}, 
"IsLatest": true, 
"VersionId": "ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd", 
"Key": "candidate-photo/0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7", 
"LastModified": "2018-09-17T16:19:58.187Z"
}
]


So how do I get rid of the delete markers to empty the bucket?  This is my 
problem.
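
For the archives, the client-side loop I'm attempting is roughly the following 
(a rough boto3 sketch with the same placeholder names as above).  The delete 
calls return success but, as shown above, the markers stay put:

import boto3

s3 = boto3.Session(profile_name='mybucket').client(
    's3', endpoint_url='http://myserver')

paginator = s3.get_paginator('list_object_versions')
for page in paginator.paginate(Bucket='mybucket'):
    for marker in page.get('DeleteMarkers', []):
        # deleting the specific version id of a delete marker should remove
        # the marker itself
        s3.delete_object(Bucket='mybucket',
                         Key=marker['Key'],
                         VersionId=marker['VersionId'])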

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw lifecycle not removing delete markers

2018-10-15 Thread Sean Purdy
Hi,


Versions 12.2.7 and 12.2.8.  I've set up a bucket with versioning enabled and 
uploaded a lifecycle configuration.  I upload some files and delete them, 
inserting delete markers.  The configured lifecycle DOES remove the deleted 
binaries (noncurrent versions).  The lifecycle DOES NOT remove the delete 
markers, even with ExpiredObjectDeleteMarker set.

Is this a known issue?  I have an empty bucket full of delete markers.

Does this lifecycle do what I expect?  Remove the non-current version after a 
day, and remove orphaned delete markers:

{
"Rules": [
{
"Status": "Enabled", 
"Prefix": "", 
"NoncurrentVersionExpiration": {
"NoncurrentDays": 1
}, 
"Expiration": {
"ExpiredObjectDeleteMarker": true
}, 
"ID": "Test expiry"
}
]
}
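
In case it's useful for reproducing: I apply the rule and then read back what 
the gateway actually stored with something like this (a rough boto3 sketch; 
profile, endpoint and bucket names are placeholders):

import boto3, json

s3 = boto3.Session(profile_name='test').client(
    's3', endpoint_url='http://test-cluster')
BUCKET = 'testbucket'

rules = {'Rules': [{
    'ID': 'Test expiry',
    'Status': 'Enabled',
    'Prefix': '',
    'NoncurrentVersionExpiration': {'NoncurrentDays': 1},
    'Expiration': {'ExpiredObjectDeleteMarker': True},
}]}

s3.put_bucket_lifecycle_configuration(Bucket=BUCKET,
                                      LifecycleConfiguration=rules)

# read back to confirm what radosgw stored
stored = s3.get_bucket_lifecycle_configuration(Bucket=BUCKET)
print(json.dumps(stored.get('Rules', []), indent=2))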


I can't be the only one who wants to use this feature.

Thanks,

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No more Luminous packages for Debian Jessie ??

2018-03-07 Thread Sean Purdy
On Wed,  7 Mar 2018, Wei Jin said:
> Same issue here.
> Will Ceph community support Debian Jessie in the future?

Seems odd to stop it right in the middle of minor point releases.  Maybe it was 
an oversight?  Jessie's still supported in Debian as oldstable and not even in 
LTS yet.


Sean

 
> On Mon, Mar 5, 2018 at 6:33 PM, Florent B  wrote:
> > Jessie is no more supported ??
> > https://download.ceph.com/debian-luminous/dists/jessie/main/binary-amd64/Packages
> > only contains ceph-deploy package !
> >
> >
> > On 28/02/2018 10:24, Florent B wrote:
> >> Hi,
> >>
> >> Since yesterday, the "ceph-luminous" repository does not contain any
> >> package for Debian Jessie.
> >>
> >> Is it expected ?
> >>
> >> Thank you.
> >>
> >> Florent
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [rgw] civetweb behind haproxy doesn't work with absolute URI

2018-03-29 Thread Sean Purdy
We had something similar recently.  We had to disable "rgw dns name" in the end.


Sean

On Thu, 29 Mar 2018, Rudenko Aleksandr said:
> 
> Hi friends.
> 
> 
> I'm sorry, maybe it isn't bug, but i don't know how to solve this problem.
> 
> I know that absolute URIs are supported in civetweb and it works fine for me 
> without haproxy in the middle.
> 
> But if client send absolute URIs through reverse proxy(haproxy) to civetweb, 
> civetweb breaks connection without responce.
> 
> i set:
> 
> debug rgw = 20
> debug civetweb = 10
> 
> 
> but no any messgaes in civetweb logs(access, error) and in rgw logs.
> in tcpdump i only see as rgw closes connection after request with absolute 
> URI. Relative URIs in requests work fine with haproxy.
> 
> Client:
> Docker registry v2.6.2, s3 driver based on aws-sdk-go/1.2.4 (go1.7.6; linux; 
> amd64) uses absolute URI in requests.
> 
> s3 driver options of docker registry:
> 
>   s3:
> region: us-east-1
> bucket: docker
> accesskey: 'access_key'
> secretkey: 'secret_key'
> regionendpoint: http://storage.my-domain.ru
> secure: false
> v4auth: true
> 
> 
> ceph.conf for rgw instance:
> 
> [client]
> rgw dns name = storage.my-domain.ru
> rgw enable apis = s3, admin
> rgw dynamic resharding = false
> rgw enable usage log = true
> rgw num rados handles = 8
> rgw thread pool size = 256
> 
> [client.rgw.a]
> host = aj15
> keyring = /var/lib/ceph/radosgw/rgw.a.keyring
> rgw enable static website = true
> rgw frontends = civetweb 
> authentication_domain=storage.my-domain.ru 
> num_threads=128 port=0.0.0.0:7480 
> access_log_file=/var/log/ceph/civetweb.rgw.access.log 
> error_log_file=/var/log/ceph/civetweb.rgw.error.log
> debug rgw = 20
> debug civetweb = 10
> 
> 
> very simple haproxy.cfg:
> 
> global
> chroot /var/empty
> # /log is chroot path
> log /haproxy-log local2
> 
> pidfile /var/run/haproxy.pid
> 
> user haproxy
> group haproxy
> daemon
> 
> ssl-default-bind-options no-sslv3
> ssl-default-bind-ciphers 
> ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
> ssl-dh-param-file /etc/pki/tls/dhparams.pem
> 
> defaults
> mode http
> log global
> 
> frontend s3
> 
> bind *:80
> bind *:443 ssl crt /etc/pki/tls/certs/s3.pem crt 
> /etc/pki/tls/certs/s3-buckets.pem
> 
> use_backend rgw
> 
> backend rgw
> 
> balance roundrobin
> 
> server a aj15:7480 check fall 1
> server a aj16:7480 check fall 1
> 
> 
> http haeder from tcpdump before and after haproxy:
> 
> GET http://storage.my-domain.ru/docker?max-keys=1&prefix= HTTP/1.1
> Host: storage.my-domain.ru
> User-Agent: aws-sdk-go/1.2.4 (go1.7.6; linux; amd64)
> Authorization: AWS4-HMAC-SHA256 
> Credential=user:u...@cloud.croc.ru/20180328/us-east-1/s3/aws4_request,
>  SignedHeaders=host;x-amz-content-sha256;x-amz-date, 
> Signature=10043867bbb2833d50f9fe16a6991436a5c328adc5042556ce1ddf1101ee2cb9
> X-Amz-Content-Sha256: 
> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> X-Amz-Date: 20180328T111255Z
> Accept-Encoding: gzip
> 
> i don't understand how use haproxy and absolute URIs in requests(
> 

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] London Ceph day yesterday

2018-04-20 Thread Sean Purdy
Just a quick note to say thanks for organising the London Ceph/OpenStack day.  
I got a lot out of it, and it was nice to see the community out in force.

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is there a faster way of copy files to and from a rgw bucket?

2018-04-23 Thread Sean Purdy
On Sat, 21 Apr 2018, Marc Roos said:
> 
> I wondered if there are faster ways to copy files to and from a bucket, 
> like eg not having to use the radosgw? Is nfs-ganesha doing this faster 
> than s3cmd?

I find the go-based S3 clients e.g. rclone, minio mc, are a bit faster than the 
python-based ones, s3cmd, aws.


Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW bucket lifecycle policy vs versioning

2018-04-26 Thread Sean Purdy
Hi,

Both versioned buckets and lifecycle policies are implemented in ceph, and look 
useful.

But are lifecycle policies implemented for versioned buckets?  i.e. can I set a 
policy that will properly expunge all "deleted" objects after a certain time?  
i.e. objects where the delete marker is the latest version.  This is available 
in AWS for example.


Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] The mystery of sync modules

2018-04-27 Thread Sean Purdy
Hi,


Mimic has a new feature, a cloud sync module for radosgw to sync objects to 
some other S3-compatible destination.

This would be a lovely thing to have here, and ties in nicely with object 
versioning and DR.  But I am put off by confusion and complexity with the whole 
multisite/realm/zone group/zone thing, and the docs aren't very forgiving, 
including a recommendation to delete all your data!

Is there a straightforward way to set up the additional zone for a sync module 
with a preexisting bucket?  Whether it's the elasticsearch metadata search or 
the cloud replication, setting up sync modules on your *current* buckets must 
be a FAQ or at least frequently desired option.

Do I need a top-level realm?  I'm not actually using multisite for two 
clusters, I just want to use sync modules.  If I do, how do I transition my 
current default realm and RGW buckets?

Any blog posts to recommend?

It's not a huge cluster, but it does include production data.


Thanks,

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mgr dashboard differs from ceph status

2018-05-04 Thread Sean Purdy
I get this too, since I last rebooted a server (one of three).

ceph -s says:

  cluster:
id: a8c34694-a172-4418-a7dd-dd8a642eb545
health: HEALTH_OK

  services:
mon: 3 daemons, quorum box1,box2,box3
mgr: box3(active), standbys: box1, box2
osd: N osds: N up, N in
rgw: 3 daemons active

mgr dashboard says:

Overall status: HEALTH_WARN

MON_DOWN: 1/3 mons down, quorum box1,box3

I wasn't going to worry too much.  I'll check logs and restart an mgr then.

Sean

On Fri,  4 May 2018, John Spray said:
> On Fri, May 4, 2018 at 7:21 AM, Tracy Reed  wrote:
> > My ceph status says:
> >
> >   cluster:
> > id: b2b00aae-f00d-41b4-a29b-58859aa41375
> > health: HEALTH_OK
> >
> >   services:
> > mon: 3 daemons, quorum ceph01,ceph03,ceph07
> > mgr: ceph01(active), standbys: ceph-ceph07, ceph03
> > osd: 78 osds: 78 up, 78 in
> >
> >   data:
> > pools:   4 pools, 3240 pgs
> > objects: 4384k objects, 17533 GB
> > usage:   53141 GB used, 27311 GB / 80452 GB avail
> > pgs: 3240 active+clean
> >
> >   io:
> > client:   4108 kB/s rd, 10071 kB/s wr, 27 op/s rd, 331 op/s wr
> >
> > but my mgr dashboard web interface says:
> >
> >
> > Health
> > Overall status: HEALTH_WARN
> >
> > PG_AVAILABILITY: Reduced data availability: 2563 pgs inactive
> >
> >
> > Anyone know why the discrepency? Hopefully the dashboard is very
> > mistaken! Everything seems to be operating normally. If I had 2/3 of my
> > pgs inactive I'm sure all of my rbd backing my VMs would be blocked etc.
> 
> A situation like this probably indicates that something is going wrong
> with the mon->mgr synchronisation of health state (it's all calculated
> in one place and the mon updates the mgr every few seconds).
> 
> 1. Look for errors in your monitor logs
> 2. You'll probably find that everything gets back in sync if you
> restart a mgr daemon
> 
> John
> 
> > I'm running ceph-12.2.4-0.el7.x86_64 on CentOS 7. Almost all filestore
> > except for one OSD which recently had to be replaced which I made
> > bluestore. I plan to slowly migrate everything over to bluestore over
> > the course of the next month.
> >
> > Thanks!
> >
> > --
> > Tracy Reed
> > http://tracyreed.org
> > Digital signature attached for your safety.
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to configure s3 bucket acl so that one user's bucket is visible to another.

2018-05-09 Thread Sean Purdy
The other way to do it is with policies.

e.g. a bucket owned by user1, but read access granted to user2:

{ 
  "Version":"2012-10-17",
  "Statement":[
{
  "Sid":"user2 policy",
  "Effect":"Allow",
  "Principal": {"AWS": ["arn:aws:iam:::user/user2"]},
  "Action":["s3:GetObject","s3:ListBucket"],
  "Resource":[
"arn:aws:s3:::example1/*",
"arn:aws:s3:::example1"
  ]
}
  ]
}

And set the policy with:
$ s3cmd setpolicy policy.json s3://example1/
or similar.
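
The same thing with boto3, if that's easier (a rough sketch; the "user1" 
profile name and endpoint are placeholders for the bucket owner's credentials):

import boto3, json

s3 = boto3.Session(profile_name='user1').client(
    's3', endpoint_url='http://myserver')

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "user2 policy",
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam:::user/user2"]},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::example1/*", "arn:aws:s3:::example1"],
    }],
}

# the policy document is passed as a JSON string
s3.put_bucket_policy(Bucket='example1', Policy=json.dumps(policy))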

user2 won't see the bucket in their list of buckets, but will be able to read 
and list the bucket in this case.

More at 
https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html


Sean


On Tue,  8 May 2018, David Turner said:
> Sorry I've been on vacation, but I'm back now.  The command I use to create
> subusers for a rgw user is...
> 
> radosgw-admin user create --gen-access-key --gen-secret --uid=user_a
> --display_name="User A"
> radosgw-admin subuser create --gen-access-key --gen-secret
> --access={read,write,readwrite,full} --key-type=s3 --uid=user_a
> --subuser=subuser_1
> 
> Now all buckets created by user_a (or a subuser with --access=full) can now
> be accessed by user_a and all user_a:subusers.  What you missed was
> changing the default subuser type from swift to s3.  --access=full is
> needed for any user needed to be able to create and delete buckets, the
> others are fairly self explanatory for what they can do inside of existing
> buckets.
> 
> There are 2 approaches to use with subusers depending on your use case.
> The first use case is what I use for buckets.  We create 1 user per bucket
> and create subusers when necessary.  Most of our buckets are used by a
> single service and that's all the service uses... so they get the keys for
> their bucket and that's it.  Subusers are create just for the single bucket
> that the original user is in charge of.
> 
> The second use case is where you want a lot of buckets accessed by a single
> set of keys, but you want multiple people to all be able to access the
> buckets.  In this case I would create a single user and use that user to
> create all of the buckets and then create the subusers for everyone to be
> able to access the various buckets.  Note that with this method you get no
> more granularity to settings other than subuser_2 only has read access to
> every bucket.  You can't pick and choose which buckets a subuser has write
> access to, it's all or none.  That's why I use the first approach and call
> it "juggling" keys because if someone wants access to multiple buckets,
> they have keys for each individual bucket as a subuser.
> 
> On Sat, May 5, 2018 at 6:28 AM Marc Roos  wrote:
> 
> >
> > This 'juggle keys' is a bit cryptic to me. If I create a subuser it
> > becomes a swift user not? So how can that have access to the s3 or be
> > used in a s3 client. I have to put in the client the access and secret
> > key, in the subuser I only have a secret key.
> >
> > Is this multi tentant basically only limiting this buckets namespace to
> > the tenants users and nothing else?
> >
> >
> >
> >
> >
> > -Original Message-
> > From: David Turner [mailto:drakonst...@gmail.com]
> > Sent: zondag 29 april 2018 14:52
> > To: Yehuda Sadeh-Weinraub
> > Cc: ceph-users@lists.ceph.com; Безруков Илья Алексеевич
> > Subject: Re: [ceph-users] How to configure s3 bucket acl so that one
> > user's bucket is visible to another.
> >
> > You can create subuser keys to allow other users to have access to a
> > bucket. You have to juggle keys, but it works pretty well.
> >
> >
> > On Sun, Apr 29, 2018, 4:00 AM Yehuda Sadeh-Weinraub 
> > wrote:
> >
> >
> > You can't. A user can only list the buckets that it owns, it cannot
> > list other users' buckets.
> >
> > Yehuda
> >
> > On Sat, Apr 28, 2018 at 11:10 AM, Безруков Илья Алексеевич
> >  wrote:
> > > Hello,
> > >
> > > How to configure s3 bucket acl so that one user's bucket is
> > visible to
> > > another.
> > >
> > >
> > > I can create a bucket, objects in it and give another user
> > access
> > to it.
> > > But another user does not see this bucket in the list of
> > available buckets.
> > >
> > >
> > > ## User1
> > >
> > > ```
> > > s3cmd -c s3cfg_user1 ls s3://
> > >
> > > 2018-04-28 07:50  s3://example1
> > >
> > > #set ACL
> > > s3cmd -c s3cfg_user1 setacl --acl-grant=all:user2 s3://example1
> > > s3://example1/: ACL updated
> > >
> > > # Check
> > > s3cmd -c s3cfg_user1 info s3://example1
> > > s3://example1/ (bucket):
> > >Location:  us-east-1
> > >Payer: BucketOwner
> > >Expiration Rule: none
> > >Policy:none
> > >CORS:  none
> > >ACL:   User1: FULL_CONTROL
> > 

Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-13 Thread Sean Purdy
On Wed, 13 Jun 2018, Fabian Grünbichler said:
> I hope we find some way to support Mimic+ for Stretch without requiring
> a backport of gcc-7+, although it unfortunately seems unlikely at this
> point.

Me too.  I picked ceph luminous on debian stretch because I thought it would be 
maintained going forward, and we're a debian shop.  I appreciate that Mimic is a 
non-LTS release; I hope the issues of debian support are resolved by the time of 
the next LTS.

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] luminous radosgw hung at logrotate time

2018-06-23 Thread Sean Purdy
Hi,


All our radosgw hung at logrotate time.  Logs show:

  ERROR: keystone revocation processing returned error r=-22

(we're not running keystone)

Killing radosgw manually and running manually fixed this - but systemctl 
commands did not.

We're running luminous 12.2.1 on debian stretch.  Is 
http://tracker.ceph.com/issues/22365 a fix for this?  (12.2.3)

In addition, systemctl start/stop/restart radosgw isn't working and I seem to 
have to run the radosgw command and options manually.


Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pre-sharding s3 buckets

2018-06-29 Thread Sean Purdy
On Wed, 27 Jun 2018, Matthew Vernon said:
> Hi,
> 
> On 27/06/18 11:18, Thomas Bennett wrote:
> 
> > We have a particular use case that we know that we're going to be
> > writing lots of objects (up to 3 million) into a bucket. To take
> > advantage of sharding, I'm wanting to shard buckets, without the
> > performance hit of resharding.
> 
> I assume you're running Jewel (Luminous has dynamic resharding); you can
> set rgw_override_bucket_index_max_shards = X in your ceph.conf, which
> will cause all new buckets to have X shards for the indexes.
> 
> HTH,
> 
> Matthew

But watch out if you are running Luminous - manual and automatic
resharding breaks if you have versioning or lifecycles on your bucket.
Fix in next stable release 12.2.6 apparently.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023968.html
http://tracker.ceph.com/issues/23886


Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-mon and existing zookeeper servers

2017-05-23 Thread Sean Purdy
Hi,


This is my first ceph installation.  It seems to tick our boxes.  Will be
using it as an object store with radosgw.

I notice that ceph-mon uses zookeeper behind the scenes.  Is there a way to
point ceph-mon at an existing zookeeper cluster, using a zookeeper chroot?

Alternatively, might ceph-mon coexist peacefully with a different zookeeper
already on the same machine?


Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Object store backups

2017-05-23 Thread Sean Purdy
Hi,

Another newbie question.  Do people using radosgw mirror their buckets
to AWS S3 or compatible services as a backup?  We're setting up a
small cluster and are thinking of ways to mitigate total disaster.
What do people recommend?


Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.11 Luminous released

2019-02-01 Thread Sean Purdy
On Fri, 1 Feb 2019 08:47:47 +0100
Wido den Hollander  wrote:

> 
> 
> On 2/1/19 8:44 AM, Abhishek wrote:
> > We are glad to announce the eleventh bug fix release of the Luminous
> > v12.2.x long term stable release series. We recommend that all users

> > * There have been fixes to RGW dynamic and manual resharding, which no
> >   longer leaves behind stale bucket instances to be removed manually.
> >   For finding and cleaning up older instances from a reshard a
> >   radosgw-admin command `reshard stale-instances list` and `reshard
> >   stale-instances rm` should do the necessary cleanup.
> > 
> 
> Great news! I hope this works! This has been biting a lot of people in
> the last year. I have helped a lot of people to manually clean this up,
> but it's great that this is now available as a regular command.
> 
> Wido

I hope so too, especially when bucket lifecycles and versioning is enabled.

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v14.2.0 Nautilus released

2019-03-19 Thread Sean Purdy
Hi,


Will debian packages be released?  I don't see them in the nautilus repo.  I 
thought that Nautilus was going to be debian-friendly, unlike Mimic.


Sean

On Tue, 19 Mar 2019 14:58:41 +0100
Abhishek Lekshmanan  wrote:

> 
> We're glad to announce the first release of Nautilus v14.2.0 stable
> series. There have been a lot of changes across components from the
> previous Ceph releases, and we advise everyone to go through the release
> and upgrade notes carefully.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Lifecycle and dynamic resharding

2019-08-02 Thread Sean Purdy
Hi,

A while back I reported a bug in luminous where lifecycle on a versioned bucket 
wasn't removing delete markers.

I'm interested in this phrase in the pull request:

"you can't expect lifecycle to work with dynamic resharding enabled."

Why not?


https://github.com/ceph/ceph/pull/29122
https://tracker.ceph.com/issues/36512

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com