[ceph-users] Large omap objects in radosgw .usage pool: is there a way to reshard the rgw usage log?

2019-10-09 Thread Florian Haas
Hi,

I am currently dealing with a cluster that's been in use for 5 years and
during that time, has never had its radosgw usage log trimmed. Now that
the cluster has been upgraded to Nautilus (and has completed a full
deep-scrub), it is in a permanent state of HEALTH_WARN because of one
large omap object:

$ ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool '.usage'


As far as I can tell, there are two thresholds that can trigger that
warning:

* The default omap object size warning threshold,
osd_deep_scrub_large_omap_object_value_sum_threshold, is 1G.

* The default omap object key count warning threshold,
osd_deep_scrub_large_omap_object_key_threshold, is 200,000.
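
For reference, on Nautilus both can be inspected and, if need be, raised
through the centralized config store. A minimal sketch (the value is just
an example, a 10x bump of the key threshold):

$ ceph config get osd osd_deep_scrub_large_omap_object_key_threshold
$ ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 2000000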


In this case, this was the original situation:

osd.6 [WRN] : Large omap object found. Object:
15:169282cd:::usage.20:head Key count: 5834118 Size (bytes): 917351868

So that's 5.8M keys (way above threshold) and 875 MiB total object size
(below threshold, but not by much).


The usage log in this case was no longer needed that far back, so I
trimmed it to keep only the entries from this year (radosgw-admin usage
trim --end-date 2018-12-31), a process that took upward of an hour.

After the trim (and a deep-scrub of the PG in question¹), my situation
looks like this:

osd.6 [WRN] Large omap object found. Object: 15:169282cd:::usage.20:head
Key count: 1185694 Size (bytes): 187061564

So both the key count and the total object size have diminished by about
80%, which is about what you expect when you trim 5 years of usage log
down to 1 year of usage log. However, my key count is still almost 6
times the threshold.
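
(For completeness: the key count of that object can also be checked
directly, without waiting for a deep-scrub, with something like

rados -p .usage listomapkeys usage.20 | wc -l

using the object name from the warning above.)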


I am aware that I can silence the warning by increasing
osd_deep_scrub_large_omap_object_key_threshold by a factor of 10, but
that's not my question. My question is what I can do to prevent the
usage log from creating such large omap objects in the first place.

Now, there's something else that you should know about this radosgw,
which is that it is configured with the defaults for usage log sharding:

rgw_usage_max_shards = 32
rgw_usage_max_user_shards = 1

... and this cluster's radosgw is pretty much being used by a single
application user. So the fact that it's happy to shard the usage log 32
ways is irrelevant as long as it puts the usage log for one user all
into one shard.
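
If I do go down that route, I assume it would be something like the
following in ceph.conf (the section name here is just a placeholder for
whatever the rgw instance is actually called), followed by a radosgw
restart:

[client.rgw.gateway1]
rgw_usage_max_user_shards = 16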


So, I am assuming that if I bump rgw_usage_max_user_shards up to, say,
16 or 32, all *new* usage log entries will be sharded. But I am not
aware of any way to reshard the *existing* usage log. Is there such a
thing?

Otherwise, it seems like the only options in this situation would be to
clear the usage log altogether and tweak the sharding knobs, which
should at least keep the problem from reappearing, or else to bump
osd_deep_scrub_large_omap_object_key_threshold and just live with the
large object.


Also, is anyone aware of any adverse side effects of increasing these
thresholds, and/or changing the usage log sharding settings, that I
should keep in mind here?

Thanks in advance for your thoughts.

Cheers,
Florian


¹For anyone reading this in the archives because they've run into the
same problem, and wondering how you find out which PGs in a pool have
too-large objects, here's a jq one-liner:

ceph --format=json pg ls-by-pool <pool> \
  | jq '.pg_stats[]|select(.stat_sum.num_large_omap_objects>0)'
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large omap objects in radosgw .usage pool: is there a way to reshard the rgw usage log?

2019-10-09 Thread Florian Haas
On 09/10/2019 09:07, Florian Haas wrote:
> Also, is anyone aware of any adverse side effects of increasing these
> thresholds, and/or changing the usage log sharding settings, that I
> should keep in mind here?

Sorry, I should have checked the latest in the list archives; Paul
Emmerich has just recently commented here on the threshold setting:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-October/037087.html

So that one looks OK to bump, but the question about resharding the
usage log still stands. (The untrimmed usage log, in my case, would have
blasted the old 2M keys threshold, too.)

Cheers,
Florian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CephFS no permissions for subdir

2019-10-09 Thread Lars Täuber
Hi!

Is it possible, and if so how, to remove all permissions to a subdir for a user?

I'd tried to make this:
ceph auth caps client.XYZ mon 'allow r' mds 'allow r, allow rws path=/XYZ, 
allow path=/ABC' osd 'allow rw pool=cephfs_data'

but got:
Error EINVAL: mds capability parse failed, stopped at ', allow path=/ABC' of 
'allow r, allow rws path=/XYZ, allow path=/ABC'

Thanks
Lars
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mgr Module "zabbix" cannot send Data

2019-10-09 Thread Ingo Schmidt
Thx for the hint. 
I fiddled around with the configuration and found this:

> root@vm-2:~# ceph zabbix  send
> Failed to send data to Zabbix

while

> root@vm-2:~# zabbix_sender -vv -z 192.168.15.253 -p 10051 -s vm-2 -k 
> ceph.num_osd -o 32
> zabbix_sender [1724513]: DEBUG: answer 
> [{"response":"success","info":"processed: 1; failed: 0; total: 1; seconds 
> spent: 0.41"}]
> info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.41"
> sent: 1; skipped: 0; total: 1

works just fine. I figured out that it could be a hostname mismatch between
what "ceph zabbix send" transmits and the hostname that is configured on the
zabbix server. And well... it's almost embarrassing that I missed this for
about 3 months now, but:
The hostname the ceph zabbix module was submitting was in capital letters,
while the hostname configured in zabbix was lowercase, even though the
hostname for that machine is in fact lowercase.
I don't know why the ceph zabbix module makes it uppercase.

I configured the host on zabbix with capital letters and now it works...


kind regards
Ingo Schmidt 

IT-Department
Island municipality Langeoog 
with in-house operations
Tourismus Service and Schiffahrt
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mgr Module "zabbix" cannot send Data

2019-10-09 Thread i . schmidt
Sorry, somehow my reply created a new thread. This message originally belongs 
here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/Z3DQN4RVZDP7ZEQTKXFQB6DTQZMJ5ONV/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Is it possible to have a 2nd cephfs_data volume? [Openstack]

2019-10-09 Thread Jeremi Avenant
Good morning

Q: Is it possible to have a 2nd cephfs_data volume and exposing it to the
same openstack environment?

Reason being:

Our current profile is configured with an erasure code of k=3,m=1 (rack
level), but we're looking to buy another +- 6PB of storage w/ controllers
and were thinking of moving to an erasure profile of k=2,m=1, since we're
not so big on data redundancy but more on disk space + performance.
From what I understand you can't change erasure profiles, therefore we
would essentially need to build a new ceph cluster, but we're trying to
understand if we can attach it to the existing openstack platform, then
gradually move all the data over from the old cluster into the new
cluster, destroy the old cluster and integrate it with the new one.

If anyone has any recommendations for getting more space + performance at
the cost of data redundancy, while still tolerating the loss of at least
1 rack, please let me know as well.

Regards
-- 




*Jeremi-Ernst Avenant, Mr.*Cloud Infrastructure Specialist
Inter-University Institute for Data Intensive Astronomy
5th Floor, Department of Physics and Astronomy,
University of Cape Town

Tel: 021 959 4137
Web: www.idia.ac.za 
E-mail (IDIA): jer...@idia.ac.za 
Rondebosch, Cape Town, 7600
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS no permissions for subdir

2019-10-09 Thread Eugen Block

Hi,


I'd tried to make this:
ceph auth caps client.XYZ mon 'allow r' mds 'allow r, allow rws  
path=/XYZ, allow path=/ABC' osd 'allow rw pool=cephfs_data'


do you want to remove all permissions from path "/ABC"? If so you  
should simply remove that from the command:


ceph auth caps client.XYZ mon 'allow r' mds 'allow r, allow rwx  
path=/XYZ' osd 'allow rw pool=cephfs_data'


I'm not aware of a 'rws' capability; using that also leads to an
error for me. You should either set 'rw' or 'rwx' (or maybe it was just a
typo?).


Regards,
Eugen


Zitat von Lars Täuber :


Hi!

Is it possible and if yes how to remove any permission to a subdir  
for a user.


I'd tried to make this:
ceph auth caps client.XYZ mon 'allow r' mds 'allow r, allow rws  
path=/XYZ, allow path=/ABC' osd 'allow rw pool=cephfs_data'


but got:
Error EINVAL: mds capability parse failed, stopped at ', allow  
path=/ABC' of 'allow r, allow rws path=/XYZ, allow path=/ABC'


Thanks
Lars
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS no permissions for subdir

2019-10-09 Thread Lars Täuber
Hi Eugen,

Wed, 09 Oct 2019 08:44:28 +
Eugen Block  ==> ceph-users@ceph.io :
> Hi,
> 
> > I'd tried to make this:
> > ceph auth caps client.XYZ mon 'allow r' mds 'allow r, allow rws  
> > path=/XYZ, allow path=/ABC' osd 'allow rw pool=cephfs_data'  
> 
> do you want to remove all permissions from path "/ABC"? If so you  
> should simply remove that from the command:
> 
> > ceph auth caps client.XYZ mon 'allow r' mds 'allow r, allow rwx  
> > path=/XYZ' osd 'allow rw pool=cephfs_data'  

then the client has read permissions to path=/ABC.

> 
> I'm not aware of a 'rws' capability, using that also leads to an  
> error. You should either set 'rw' or 'rwx' (or maybe it was just a  
> typo?).

It is for being able to make snapshots in cephfs (not rgw or rbd). I get no 
error in nautilus.

Thanks,
Lars
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS no permissions for subdir

2019-10-09 Thread Eugen Block

then the client has read permissions to path=/ABC.


This is because of "mds 'allow r, allow rws ...". Remove the 'allow r'  
caps from mds section and then the client only gets read permissions  
to the specified paths.
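
I.e. something along these lines (untested sketch, based on your original
command):

ceph auth caps client.XYZ mon 'allow r' mds 'allow rws path=/XYZ' osd 'allow rw pool=cephfs_data'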


It is for being able to make snapshots in cephfs (not rgw or rbd). I  
get no error in nautilus.


You're right, I was in the wrong lab cluster.

Zitat von Lars Täuber :


Hi Eugen,

Wed, 09 Oct 2019 08:44:28 +
Eugen Block  ==> ceph-users@ceph.io :

Hi,

> I'd tried to make this:
> ceph auth caps client.XYZ mon 'allow r' mds 'allow r, allow rws
> path=/XYZ, allow path=/ABC' osd 'allow rw pool=cephfs_data'

do you want to remove all permissions from path "/ABC"? If so you
should simply remove that from the command:

> ceph auth caps client.XYZ mon 'allow r' mds 'allow r, allow rwx
> path=/XYZ' osd 'allow rw pool=cephfs_data'


then the client has read permissions to path=/ABC.



I'm not aware of a 'rws' capability, using that also leads to an
error. You should either set 'rw' or 'rwx' (or maybe it was just a
typo?).


It is for being able to make snapshots in cephfs (not rgw or rbd). I  
get no error in nautilus.


Thanks,
Lars
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it possible to have a 2nd cephfs_data volume? [Openstack]

2019-10-09 Thread Paul Emmerich
On Wed, Oct 9, 2019 at 10:45 AM Jeremi Avenant  wrote:

> Good morning
>
> Q: Is it possible to have a 2nd cephfs_data volume and exposing it to the
> same openstack environment?
>

yes, see documentation for cephfs layouts:
https://docs.ceph.com/docs/master/cephfs/file-layouts/


>
> Reason being:
>
> Our current profile is configured with erasure code value of k=3,m=1 (rack
> level) but we looking to buy another +- 6PB of storage w/ controllers and
> was thinking of moving to an erasure profile of k=2,m=1 since we're not so
> big on data redundancy but more on disk space + performance.
>

Your new configuration uses *more* space than the old one: raw-space
overhead for an EC profile is (k+m)/k, so k=3,m=1 is ~1.33x while k=2,m=1
is 1.5x. Also, m=1 is a bad idea.


> For what I understand you can't change erasure profiles, therefor we need
> to essentially build a new ceph cluster but we're trying to understand if
> we can attach it to the existing openstack platform, then gradually move
> all the data over from the old cluster into the new cluster, destroy the
> old cluster and integrated it with the new one.
>

No, you can just create a new directory in cephfs backed by the new pool;
see the layout documentation linked above.
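
Roughly along these lines (sketch only; pool, fs and directory names are
just examples, and an EC data pool needs overwrites enabled):

$ ceph osd pool set cephfs_data_ec2 allow_ec_overwrites true
$ ceph fs add_data_pool cephfs cephfs_data_ec2
$ setfattr -n ceph.dir.layout.pool -v cephfs_data_ec2 /mnt/cephfs/newdir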


>
> If anyone has any recommendations to get more space out + performance at
> the cost of data redundancy with at least 1 rack please let me know as
> well.
>

Depends on how many racks you have. Common erasure coding setups are 4+2
for low redundancy and something like 8+3 for higher redundancy.
I'd never run a production setup with x+1 (but I guess it does depend on
how much you care about availability vs. durability)


Paul


>
> Regards
> --
>
>
>
>
> *Jeremi-Ernst Avenant, Mr.*Cloud Infrastructure Specialist
> Inter-University Institute for Data Intensive Astronomy
> 5th Floor, Department of Physics and Astronomy,
> University of Cape Town
>
> Tel: 021 959 4137
> Web: www.idia.ac.za 
> E-mail (IDIA): jer...@idia.ac.za 
> Rondebosch, Cape Town, 7600
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mgr Module "zabbix" cannot send Data

2019-10-09 Thread Wido den Hollander



On 10/7/19 9:15 AM, i.schm...@langeoog.de wrote:
> Hi Folks
> 
> We are using Ceph as our storage backend on our 6 Node Proxmox VM Cluster. To 
> Monitor our systems we use Zabbix and i would like to get some Ceph Data into 
> our Zabbix to get some alarms when something goes wrong.
> 
> Ceph mgr has a module, "zabbix" that uses "zabbix-sender" to actively send 
> data, but i cannot get the module working. It always responds with "failed to 
> send data"
> 
> The network side seems to be fine:
> 
> root@vm-2:~# traceroute 192.168.15.253
> traceroute to 192.168.15.253 (192.168.15.253), 30 hops max, 60 byte packets
>  1  192.168.15.253 (192.168.15.253)  0.411 ms  0.402 ms  0.393 ms
> root@vm-2:~# nmap -p 10051 192.168.15.253
> Starting Nmap 7.70 ( https://nmap.org ) at 2019-09-18 08:40 CEST
> Nmap scan report for 192.168.15.253
> Host is up (0.00026s latency).
> 
> PORT  STATE SERVICE
> 10051/tcp open  zabbix-trapper
> MAC Address: BA:F5:30:EF:40:EF (Unknown)
> 
> Nmap done: 1 IP address (1 host up) scanned in 0.61 seconds
> root@vm-2:~# ceph zabbix config-show
> {"zabbix_port": 10051, "zabbix_host": "192.168.15.253", "identifier": "VM-2", 
> "zabbix_sender": "/usr/bin/zabbix_sender", "interval": 60}
> root@vm-2:~#
> 

I recommend NOT setting the identifier to 'VM-2'.

I wrote the Zabbix module, so a bit of background:

If you don't set the identifier, the fsid (uuid) of the cluster will be used.

You have now used 'VM-2', but it's not guaranteed that the ceph-mgr will run
on that host. If the Mgr fails over to a different host, it will not be
'vm-2' sending the data.

Usually I just leave the identifier empty and use the fsid of the
cluster as the hostname in Zabbix.
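
The fsid is what 'ceph fsid' prints, e.g.:

$ ceph fsid

and that output is then the host name you create in Zabbix.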

> But if i try "ceph zabbix send" i get "failed to send data to zabbix" and 
> this show up in the systems journal:
> Sep 18 08:41:13 vm-2 ceph-mgr[54445]: 2019-09-18 08:41:13.272 7fe360fe4700 -1 
> mgr.server reply reply (1) Operation not permitted
> 
> The log of ceph-mgr on that machine states:
> 2019-09-18 08:42:18.188 7fe359fd6700  0 mgr[zabbix] Exception when sending: 
> /usr/bin/zabbix_sender exited non-zero: zabbix_sender [3253392]: DEBUG: 
> answer [{"response":"success","info":"processed: 0; failed: 44; total: 44; 
> seconds spent: 0.000179"}]
> 2019-09-18 08:43:18.217 7fe359fd6700  0 mgr[zabbix] Exception when sending: 
> /usr/bin/zabbix_sender exited non-zero: zabbix_sender [3253629]: DEBUG: 
> answer [{"response":"success","info":"processed: 0; failed: 44; total: 44; 
> seconds spent: 0.000321"}]
> 
> I'm guessing, this could have something to do with user rights. But i have no 
> idea where to start to track this down.
> 
> Maybe someone here has a hint?
> If more information is needed, i will gladly provide it.
> 
> greetings
> Ingo
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Sick Nautilus cluster, OOM killing OSDs, lots of osdmaps

2019-10-09 Thread Aaron Johnson
Hi all

I have a smallish test cluster (14 servers, 84 OSDs) running 14.2.4.  Monthly 
OS patching and reboots that go along with it have resulted in the cluster 
getting very unwell.

Many of the servers in the cluster are OOM-killing the ceph-osd processes when 
they try to start (6 OSDs per server, running on Filestore). Strace shows the 
ceph-osd processes are spending hours reading through the 220k osdmap files 
after being started.

This behavior started after we recently made it about 72% full to see how 
things behaved.  We also upgraded it to Nautilus 14.2.2 at about the same time.

I’ve tried starting just one OSD per server at a time in hopes of avoiding the 
OOM killer.  Also tried setting noin, rebooting the whole cluster, waiting a 
day, then marking each of the OSDs in manually.  The end result is the same 
either way.  About 60% of PGs are still down, 30% are peering, and the rest are 
in worse shape.

Anyone out there have suggestions about how I should go about getting this 
cluster healthy again?  Any ideas appreciated.

Thanks!

- Aaron
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mgr Module "zabbix" cannot send Data

2019-10-09 Thread i . schmidt
Thank you very much! This helps a lot!

I'm wondering if it is a good idea at all to tie ceph data input to a specific 
host of that cluster in zabbix. I could try and set up a new host in zabbix 
called "Ceph", representing the cluster as a whole, just for the monitoring of 
ceph statuses, since ceph as a distributed system is independent of a single 
host like (in my case) vm-2.

I will try that...

greetings from the north sea
Ingo

IT-Department
Island municipality Langeoog
with in-house operations
Tourismus Service and Schiffahrt
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mgr Module "zabbix" cannot send Data

2019-10-09 Thread Wido den Hollander



On 10/9/19 5:20 PM, i.schm...@langeoog.de wrote:
> Thank you very much! This helps a lot!
> 
> I'm thinking if it is a good idea at all, to tie ceph data input to a 
> specific host of that cluster in zabbix. I could try and set up a new host in 
> zabbix called "Ceph", representing the cluster as a whole, just for the 
> monitoring of ceph statuses, since ceph as a distributed system is 
> independent of a single host like (in my case) vm-2.
> 

The Zabbix module will run on the Active Manager (Mgr) which is not tied
to a specific host at all.

Ceph will choose one of the Mgr daemons to be the active one and that
will start the Zabbix module.

But yes, just create a host called 'ceph' with the fsid as hostname.
That is enough.

Wido

> I will try that...
> 
> greetings from the north sea
> Ingo
> 
> IT-Department
> Island municipality Langeoog
> with in-house operations
> Tourismus Service and Schiffahrt
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Sick Nautilus cluster, OOM killing OSDs, lots of osdmaps

2019-10-09 Thread Sage Weil
[adding dev]

On Wed, 9 Oct 2019, Aaron Johnson wrote:
> Hi all
> 
> I have a smallish test cluster (14 servers, 84 OSDs) running 14.2.4.  
> Monthly OS patching and reboots that go along with it have resulted in 
> the cluster getting very unwell.
> 
> Many of the servers in the cluster are OOM-killing the ceph-osd 
> processes when they try to start.  (6 OSDs per server running on 
> filestore.). Strace shows the ceph-osd processes are spending hours 
> reading through the 220k osdmap files after being started.

Is the process size growing during this time?  There should be a cap to 
the size of the OSDMap cache; perhaps there is a regression there.

One common thing to do here is 'ceph osd set noup' and restart the OSD, 
and then monitor the OSD's progress catching up on maps with 'ceph daemon 
osd.NN status' (compare the epoch to what you get from 'ceph osd dump | 
head').  This will take a while if you are really 220k maps (!!!) behind,
but the memory usage during that period should be relatively constant.
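
In concrete terms, something like this (the osd id is just an example):

$ ceph osd set noup
$ ceph daemon osd.12 status     # oldest_map / newest_map for this OSD
$ ceph osd dump | head -1       # current cluster epoch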


> This behavior started after we recently made it about 72% full to see 
> how things behaved.  We also upgraded it to Nautilus 14.2.2 at about the 
> same time.
> 
> I’ve tried starting just one OSD per server at a time in hopes of 
> avoiding the OOM killer.  Also tried setting noin, rebooting the whole 
> cluster, waiting a day, then marking each of the OSDs in manually.  The 
> end result is the same either way.  About 60% of PGs are still down, 30% 
> are peering, and the rest are in worse shape.

Usually in instances like this in the past, getting all OSDs to catch up 
on maps and then unsetting 'noup' will let them all come up and peer at 
the same time.  But usually what has happened is many of the OSDs are not 
caught up and it's not immediately obvious, so PGs don't peer.  So setting 
noup and waiting for all osds to be caught up (as per 'ceph daemon osd.NNN 
status') first generally helps.

But none of that explains why you're seeing OOM, so I'm curious what you 
see with memory usage while OSDs are catching up...

Thanks!
sage

 > 
> Anyone out there have suggestions about how I should go about getting 
> this cluster healthy again?  Any ideas appreciated.
> 
> Thanks!
> 
> - Aaron
> ___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Can't Modify Zone

2019-10-09 Thread Mac Wynkoop
When trying to modify a zone in one of my clusters to promote it to the
master zone, I get this error:

~ $ radosgw-admin zone modify --rgw-zone atl --master
failed to update zonegroup: 2019-10-09 15:41:53.409 7f9ecae26840  0 ERROR:
found existing zone name atl (94d26f94-d64c-40d1-9a33-56afa948d86a) in
zonegroup seast
(17) File exists
 ~ $

Anyone have any ideas what's going on here?

Thanks all,
Mac
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 14.2.4 Deduplication

2019-10-09 Thread Gregory Farnum
So since nobody who's actually working on it has chimed in:
While there is some deduplication functionality built into the system,
AFAIK it's not something considered for users at this point. It's
under ongoing development, doesn't have performance data, and isn't
plumbed through into a lot of the system. Last I heard about it was a
discussion about how to handle the reference counting used to delete
data when it's no longer needed. (See the thread "refcounting chunks
vs snapshots" and https://github.com/ceph/ceph/pull/29283)
-Greg


On Wed, Oct 2, 2019 at 4:48 PM The Zombie Hunter
 wrote:
>
> From my initial testing it looks like 14.2.4 fully supports the deduplication 
> mentioned here:
>
> https://docs.ceph.com/docs/master/dev/deduplication/
>
> However, I'm not sure where the struct object_manifest script goes in 
> relation to foo and foo-chunk, and I'm not aware of what the offsets/caspool 
> should be.
>
> If this still isn't fully implemented how does the dedup tool work? If I 
> remove a file but it exists elsewhere on the volume, will it be purged or 
> would the tool need to run again to clear the data?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 14.2.4 Deduplication

2019-10-09 Thread Alex Gorbachev
On Wed, Oct 9, 2019 at 12:56 PM Gregory Farnum  wrote:

> So since nobody who's actually working on it has chimed in:
> While there is some deduplication functionality built into the system,
> AFAIK it's not something considered for users at this point. It's
> under ongoing development, doesn't have performance data, and isn't
> plumbed through into a lot of the system. Last I heard about it was a
> discussion about how to handle the reference counting used to delete
> data when it's no longer needed. (See the thread "refcounting chunks
> vs snapshots" and https://github.com/ceph/ceph/pull/29283)
> -Greg
>


I was holding off, as this is really not a part of Ceph, but we have done
some preliminary testing of VDO as a client on top of RBD, and it looked
fine.  We also run borg on top of RBD, which is also stable and able to
handle large volumes of data.  Hope this helps.
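
If it helps, the rough shape of such a stack, as a sketch only (image,
device and size names are made up, not our exact setup):

$ rbd map rbd/dedup-test                                  # maps to e.g. /dev/rbd0
$ vdo create --name=vdo0 --device=/dev/rbd0 --vdoLogicalSize=10T
$ mkfs.xfs /dev/mapper/vdo0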

--
Alex Gorbachev
Intelligent Systems Services Inc.


>
>
> On Wed, Oct 2, 2019 at 4:48 PM The Zombie Hunter
>  wrote:
> >
> > From my initial testing it looks like 14.2.4 fully supports the
> deduplication mentioned here:
> >
> > https://docs.ceph.com/docs/master/dev/deduplication/
> >
> > However, I'm not sure where the struct object_manifest script goes in
> relation to foo and foo-chunk, and I'm not aware of what the
> offsets/caspool should be.
> >
> > If this still isn't fully implemented how does the dedup tool work? If I
> remove a file but it exists elsewhere on the volume, will it be purged or
> would the tool need to run again to clear the data?
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unexpected increase in the memory usage of OSDs

2019-10-09 Thread Anthony D'Atri
>>> Do you have statistics on the size of the OSDMaps or count of them
>>> which were being maintained by the OSDs?
>> No, I don't think so. How can I find this information?
> 
> Hmm I don't know if we directly expose the size of maps. There are
> perfcounters which expose the range of maps being kept around but I
> don't know their names off-hand.


FWIW I’ve been told that the size of an OSDMap is roughly equivalent to `ceph pg 
dump |wc`, which if true would seem to mean that they’re trivially small for 
most purposes.  Reality of course may be much different and/or nuanced.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Fwd: HeartbeatMap FAILED assert(0 == "hit suicide timeout")

2019-10-09 Thread 潘东元
hi all,
my osd hit suicide timeout.
some log:
2019-10-10 03:53:13.017760 7f1ab886e700  0 -- 192.168.1.5:6810/1028846
>> 192.168.1.25:6802/24020795 pipe(0x257eb80 sd=69 :47977 s=2
pgs=287284 cs=41 l=0 c=0x21431760).fault, initiating reconnect
2019-10-10 03:53:13.017799 7f1ab967c700  0 -- 192.168.1.5:6810/1028846
>> 192.168.1.25:6802/24020795 pipe(0x257eb80 sd=69 :47977 s=1
pgs=287284 cs=42 l=0 c=0x21431760).fault
2019-10-10 03:53:15.890773 7f1acdec3700  0 -- 192.168.1.5:6810/1028846
>> 192.168.1.19:6804/53020865 pipe(0x37537580 sd=59 :60121 s=2
pgs=423672 cs=85 l=0 c=0x21447900).fault, initiating reconnect
2019-10-10 03:53:15.890890 7f1aba288700  0 -- 192.168.1.5:6810/1028846
>> 192.168.1.19:6804/53020865 pipe(0x37537580 sd=59 :60121 s=1
pgs=423672 cs=86 l=0 c=0x21447900).fault
2019-10-10 03:53:16.209368 7f1addc3e700  1 heartbeat_map is_healthy
'OSD::op_tp thread 0x7f1ac29a3700' had timed out after 15
2019-10-10 03:53:16.209382 7f1addc3e700  1 heartbeat_map is_healthy
'OSD::op_tp thread 0x7f1ac29a3700' had suicide timed out after 150
2019-10-10 03:53:16.210765 7f1addc3e700 -1 common/HeartbeatMap.cc: In
function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*,
const char*, time_t)' thread 7f1addc3e700 time 2019-10-10
03:53:16.209415
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char
const*, long)+0x12b) [0xaf2b6b]
 2: (ceph::HeartbeatMap::is_healthy()+0xa7) [0xaf3497]
 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xaf3988]
 4: (CephContextServiceThread::entry()+0x13f) [0xb0353f]
 5: (()+0x79d1) [0x7f1ae0b3c9d1]
 6: (clone()+0x6d) [0x7f1adfaccb5d]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

can you give some advice on troubleshooting?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fwd: HeartbeatMap FAILED assert(0 == "hit suicide timeout")

2019-10-09 Thread huang jun
If you got a coredump file, then you should check why the thread took
so long to finish its job.
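
E.g. (binary and core file paths are just examples):

$ gdb /usr/bin/ceph-osd /path/to/core.ceph-osd.12345
(gdb) thread apply all bt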

潘东元 wrote on Thu, Oct 10, 2019 at 10:51 AM:
>
> hi all,
> my osd hit suicide timeout.
> some log:
> 2019-10-10 03:53:13.017760 7f1ab886e700  0 -- 192.168.1.5:6810/1028846
> >> 192.168.1.25:6802/24020795 pipe(0x257eb80 sd=69 :47977 s=2
> pgs=287284 cs=41 l=0 c=0x21431760).fault, initiating reconnect
> 2019-10-10 03:53:13.017799 7f1ab967c700  0 -- 192.168.1.5:6810/1028846
> >> 192.168.1.25:6802/24020795 pipe(0x257eb80 sd=69 :47977 s=1
> pgs=287284 cs=42 l=0 c=0x21431760).fault
> 2019-10-10 03:53:15.890773 7f1acdec3700  0 -- 192.168.1.5:6810/1028846
> >> 192.168.1.19:6804/53020865 pipe(0x37537580 sd=59 :60121 s=2
> pgs=423672 cs=85 l=0 c=0x21447900).fault, initiating reconnect
> 2019-10-10 03:53:15.890890 7f1aba288700  0 -- 192.168.1.5:6810/1028846
> >> 192.168.1.19:6804/53020865 pipe(0x37537580 sd=59 :60121 s=1
> pgs=423672 cs=86 l=0 c=0x21447900).fault
> 2019-10-10 03:53:16.209368 7f1addc3e700  1 heartbeat_map is_healthy
> 'OSD::op_tp thread 0x7f1ac29a3700' had timed out after 15
> 2019-10-10 03:53:16.209382 7f1addc3e700  1 heartbeat_map is_healthy
> 'OSD::op_tp thread 0x7f1ac29a3700' had suicide timed out after 150
> 2019-10-10 03:53:16.210765 7f1addc3e700 -1 common/HeartbeatMap.cc: In
> function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*,
> const char*, time_t)' thread 7f1addc3e700 time 2019-10-10
> 03:53:16.209415
> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>
>  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>  1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char
> const*, long)+0x12b) [0xaf2b6b]
>  2: (ceph::HeartbeatMap::is_healthy()+0xa7) [0xaf3497]
>  3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xaf3988]
>  4: (CephContextServiceThread::entry()+0x13f) [0xb0353f]
>  5: (()+0x79d1) [0x7f1ae0b3c9d1]
>  6: (clone()+0x6d) [0x7f1adfaccb5d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
>
> can you give some advice on troubleshooting?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io