Re: [ceph-users] limited disk slots - should I ran OS on SD card ?

2018-08-14 Thread Burkhard Linke

Hi,


AFAIK, SD cards (and SATA DOMs) do not have any kind of wear-leveling 
support. Even if the poor write endurance of these storage devices 
were enough to operate a server for several years on average, you 
will always have some hot spots with higher-than-usual write activity. 
This is the case for filesystem journals (xfs, ext4, almost all modern 
filesystems). Been there, done that: we had two storage systems fail due 
to SD wear.



The only sane setup for SD cards and DOMs is a flash-aware filesystem 
such as f2fs. Unfortunately, most Linux distributions do not support these 
in their standard installers.
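
If you do put the OS on such a device anyway, a minimal sketch of an f2fs
setup (the device name is a placeholder; in practice this would be done from
the installer or a rescue shell):

$ mkfs.f2fs -l cephos /dev/sdX      # label is arbitrary
$ mount -t f2fs /dev/sdX /mnt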



Short answer: no, do not use SD cards.


Regards,

Burkhard


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client hangs

2018-08-14 Thread Yan, Zheng
On Mon, Aug 13, 2018 at 9:55 PM Zhenshi Zhou  wrote:
>
> Hi Burkhard,
> I'm sure the user has permission to read and write. Besides, we're not using 
> EC data pools.
> Now the situation is that any operation on a specific file will
> hang.
> Operations on any other files won't hang.
>

Can the ceph-fuse client read the specific file?
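
For example, something along these lines could be used to test that (mount
point, monitor address and client name are placeholders):

$ mkdir -p /mnt/cephfs-fuse
$ ceph-fuse -n client.admin -m MON_HOST:6789 /mnt/cephfs-fuse
$ dd if=/mnt/cephfs-fuse/path/to/stuck-file of=/dev/null bs=4M count=1   # does this hang as well?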

> Burkhard Linke  
> 于2018年8月13日周一 下午9:42写道:
>>
>> Hi,
>>
>>
>> On 08/13/2018 03:22 PM, Zhenshi Zhou wrote:
>> > Hi,
>> > Finally, I got a running server with files /sys/kernel/debug/ceph/xxx/
>> >
>> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat mdsc
>> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat monc
>> > have monmap 2 want 3+
>> > have osdmap 4545 want 4546
>> > have fsmap.user 0
>> > have mdsmap 335 want 336+
>> > fs_cluster_id -1
>> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat osdc
>> > REQUESTS 6 homeless 0
>> > 82580   osd10   1.7f9ddac7  [10,13]/10  [10,13]/10
>> > 1053a04.0x4000241   write
>> > 81019   osd11   1.184ed679  [11,7]/11   [11,7]/11
>> >   105397b.0x4000241   write
>> > 81012   osd12   1.cd98ed57  [12,9]/12   [12,9]/12
>> >   1053971.0x4000241   write,startsync
>> > 82589   osd12   1.7cd5405a  [12,8]/12   [12,8]/12
>> >   1053a13.0x4000241   write,startsync
>> > 80972   osd13   1.91886156  [13,4]/13   [13,4]/13
>> >   1053939.0x4000241   write
>> > 81035   osd13   1.ac5ccb56  [13,4]/13   [13,4]/13
>> >   1053997.0x4000241   write
>> >
>> > The cluster claims nothing, and shows HEALTH_OK still.
>> > What I did is just vim a file storing on cephfs, and then it hung there.
>> > And I got a process with 'D' stat.
>> > By the way, the whole mount directory is still in use and with no error.
>>
>> So there are no pending mds requests, mon seems to be ok, too.
>>
>> But the osd requests seems to be stuck. Are you sure the ceph user used
>> for the mount point is allowed to write to the cephfs data pools? Are
>> you using additional EC data pools?
>>
>> Regards,
>> Burkhard
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client hangs

2018-08-14 Thread Zhenshi Zhou
kernel client

Yan, Zheng  于2018年8月14日周二 下午3:13写道:

> On Mon, Aug 13, 2018 at 9:55 PM Zhenshi Zhou  wrote:
> >
> > Hi Burkhard,
> > I'm sure the user has permission to read and write. Besides, we're not
> using EC data pools.
> > Now the situation is that any operation on a specific file will
> > hang.
> > Operations on any other files won't hang.
> >
>
> Can the ceph-fuse client read the specific file?
>
> > Burkhard Linke 
> 于2018年8月13日周一 下午9:42写道:
> >>
> >> Hi,
> >>
> >>
> >> On 08/13/2018 03:22 PM, Zhenshi Zhou wrote:
> >> > Hi,
> >> > Finally, I got a running server with files /sys/kernel/debug/ceph/xxx/
> >> >
> >> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]#
> cat mdsc
> >> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]#
> cat monc
> >> > have monmap 2 want 3+
> >> > have osdmap 4545 want 4546
> >> > have fsmap.user 0
> >> > have mdsmap 335 want 336+
> >> > fs_cluster_id -1
> >> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]#
> cat osdc
> >> > REQUESTS 6 homeless 0
> >> > 82580   osd10   1.7f9ddac7  [10,13]/10  [10,13]/10
> >> > 1053a04.0x4000241   write
> >> > 81019   osd11   1.184ed679  [11,7]/11   [11,7]/11
> >> >   105397b.0x4000241   write
> >> > 81012   osd12   1.cd98ed57  [12,9]/12   [12,9]/12
> >> >   1053971.0x4000241   write,startsync
> >> > 82589   osd12   1.7cd5405a  [12,8]/12   [12,8]/12
> >> >   1053a13.0x4000241   write,startsync
> >> > 80972   osd13   1.91886156  [13,4]/13   [13,4]/13
> >> >   1053939.0x4000241   write
> >> > 81035   osd13   1.ac5ccb56  [13,4]/13   [13,4]/13
> >> >   1053997.0x4000241   write
> >> >
> >> > The cluster claims nothing, and shows HEALTH_OK still.
> >> > What I did is just vim a file storing on cephfs, and then it hung
> there.
> >> > And I got a process with 'D' stat.
> >> > By the way, the whole mount directory is still in use and with no
> error.
> >>
> >> So there are no pending mds requests, mon seems to be ok, too.
> >>
> >> But the osd requests seems to be stuck. Are you sure the ceph user used
> >> for the mount point is allowed to write to the cephfs data pools? Are
> >> you using additional EC data pools?
> >>
> >> Regards,
> >> Burkhard
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD journal feature

2018-08-14 Thread Glen Baars
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current images to use a different object pool. Do you 
think that maybe another feature is incompatible with this feature? Below is a 
log of the issue.


:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Sat May  5 11:39:07 2018

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a

:~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling 
--journal-pool RBD_SSD

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd journal '37c8974b0dc51':
header_oid: journal.37c8974b0dc51
object_oid_prefix: journal_data.1.37c8974b0dc51.
order: 24 (16384 kB objects)
splay_width: 4
*** 

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, journaling
flags:
create_timestamp: Sat May  5 11:39:07 2018
journal: 37c8974b0dc51
mirroring state: disabled

Kind regards,
Glen Baars
From: Jason Dillaman 
Sent: Tuesday, 14 August 2018 12:04 AM
To: Glen Baars 
Cc: dillaman ; ceph-users 
Subject: Re: [ceph-users] RBD journal feature

On Sun, Aug 12, 2018 at 12:13 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

Interesting, I used ‘rados ls’ to view the SSDPOOL and can’t see any objects. 
Is this the correct way to view the journal objects?

You won't see any journal objects in the SSDPOOL until you issue a write:

$ rbd create --size 1G --image-feature exclusive-lock rbd_hdd/test
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M 
rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC   OPS   OPS/SEC   BYTES/SEC
1   320332.01  1359896.98
2   736360.83  1477975.96
3  1040351.17  1438393.57
4  1392350.94  1437437.51
5  1744350.24  1434576.94
6  2080349.82  1432866.06
7  2416341.73  1399731.23
8  2784348.37  1426930.69
9  3152347.40  1422966.67
   10  3520356.04  1458356.70
   11  3920361.34  1480050.97
elapsed:11  ops: 4096  ops/sec:   353.61  bytes/sec: 1448392.06
$ rbd feature enable rbd_hdd/test journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd_hdd --image test
rbd journal '10746b8b4567':
header_oid: journal.10746b8b4567
object_oid_prefix: journal_data.2.10746b8b4567.
order: 24 (16 MiB objects)
splay_width: 4
object_pool: rbd_ssd
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M 
rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC   OPS   OPS/SEC   BYTES/SEC
1   240248.54  1018005.17
2   512263.47  1079154.06
3   768258.74  1059792.10
4  1040258.50  1058812.60
5  1312258.06  1057001.34
6  1536258.21  1057633.14
7  1792253.81  1039604.73
8  2032253.66  1038971.01
9  2256241.41  988800.93
   10  2480237.87  974335.65
   11  2752239.41  980624.20
   12  2992239.61  981440.94
   13  3200233.13  954887.84
   14  3440237.36  972237.80
   15  3680239.47  980853.37
   16  3920238.75  977920.70
elapsed:16  ops: 4096  ops/sec:   245.04  bytes/sec: 1003692.81
$ rados -p rbd_ssd ls | grep journal_data.2.10746b8b4567.
journal_data.2.10746b8b4567.3
journal_data.2.10746b8b4567.0
journal_data.2.10746b8b4567.2
journal_data.2.10746b8b4567.1

rbd feature enable SLOWPOOL/RBDImage journaling --journal-pool SSDPOOL
The symptoms that we are experiencing are a huge decrease in write speed (QD1 
128K writes drop from 160MB/s down to 14MB/s). We see no improvement when moving 
the journal to SSDPOOL (but we don’t think it is really moving).

If you are trying to optimize for 128KiB writes, you might need to tweak the 
"rbd_journal_max_payload_bytes" setting, since it currently defaults to 
splitting journal write events into a maximum 16KiB payload [1] in order to 
optimize the worst-case memory usage of the rbd-mirror daemon.
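
A minimal sketch of overriding that on a client for a quick test, assuming the
client reads /etc/ceph/ceph.conf (the value is only illustrative, and the
benchmark re-uses the test image from the example above):

$ cat >> /etc/ceph/ceph.conf <<'EOF'
[client]
rbd journal max payload bytes = 131072
EOF
$ rbd bench --io-type=write --io-pattern=rand --io-size=128K --io-total=16M rbd_hdd/test --rbd-cache=false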

[ceph-users] Ceph upgrade Jewel to Luminous

2018-08-14 Thread Jaime Ibar

Hi all,

we're running Ceph Jewel 10.2.10 in our cluster and we plan to upgrade 
to the latest Luminous release (12.2.7). Jewel 10.2.11 was released one month 
ago and our plan was to upgrade to that release first and then to Luminous, 
but as someone reported OSD crashes after upgrading to Jewel 10.2.11, we 
wonder if it would be possible to skip this Jewel release and upgrade 
directly to Luminous 12.2.7.
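
For what it's worth, the usual rolling-upgrade order is roughly the following
regardless of whether 10.2.11 is skipped (check the Luminous release notes for
the exact prerequisites before relying on this):

$ ceph osd set noout                      # avoid rebalancing during restarts
# upgrade and restart the mons first, then the OSDs host by host, then MDS/RGW
$ ceph osd require-osd-release luminous   # only once every OSD runs 12.2.x
$ ceph osd unset noout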

Thanks

Jaime

--

Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
Tel: +353-1-896-3725

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[ceph-users] mimic/bluestore cluster can't allocate space for bluefs

2018-08-14 Thread Jakub Stańczak
Hello All!

I am using a Mimic full-bluestore cluster with a pure RGW workload. We use the
AWS i3 instance family for OSD machines - each instance has 1 NVMe disk which
is split into 4 partitions, and each of those partitions serves as the
bluestore block device of one OSD. We use a single device per OSD - so
everything is managed by bluestore internally.

The problem is that under write-heavy conditions the DB usage grows fast
and at some point bluefs stops getting more space, which results in OSD
death. There is no recovery from this error - when bluefs runs out of space
for RocksDB, the OSD dies and cannot be restarted.

On this particular OSD there is plenty of free space, but we can see that
it cannot allocate more, logging the odd message '_balance_bluefs_freespace
no allocate on 0x8000'.

I've also done some bluefs tuning, because previously I had similar problems
where bluestore could not keep up with providing enough
storage for bluefs.

bluefs settings:
bluestore_bluefs_balance_interval = 0.333
bluestore_bluefs_gift_ratio = 0.05
bluestore_bluefs_min_free = 3221225472
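
For reference, a sketch of how those values can be applied at runtime and how
bluefs usage can be watched on one OSD (osd.6 is taken from the log below; the
daemon command must run on that OSD's host, and some of these options may only
take full effect after an OSD restart):

$ ceph tell osd.6 injectargs '--bluestore_bluefs_balance_interval=0.333 --bluestore_bluefs_gift_ratio=0.05 --bluestore_bluefs_min_free=3221225472'
$ ceph daemon osd.6 perf dump bluefs | grep -E 'total_bytes|used_bytes'   # space bluefs owns vs. uses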

snippet from osd logs:

2018-08-13 18:15:10.960 7f6a54073700  0
bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no
allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:11.330 7f6a54073700  0
bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no
allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:11.752 7f6a54073700  0
bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no
allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:11.785 7f6a5b882700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb
/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14590:
304401 keys, 68804532 bytes
2018-08-13 18:15:11.785 7f6a5b882700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1534184111786253, "cf_name": "default", "job": 41,
"event": "table_file_creation", "file_number": 14590, "file_size":
68804532, "table_properties": {"data_size
": 67112437, "index_size": 92, "filter_size": 913252,
"raw_key_size": 13383306, "raw_average_key_size": 43,
"raw_value_size": 58673606, "raw_average_value_size": 192,
"num_data_blocks": 17090, "num_entries": 304401, "filter_policy_na
me": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2018-08-13 18:15:12.245 7f6a54073700  0
bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no
allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:12.664 7f6a54073700  0
bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no
allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:12.743 7f6a5b882700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb
/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14591:
313351 keys, 68830515 bytes
2018-08-13 18:15:12.743 7f6a5b882700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1534184112744129, "cf_name": "default", "job": 41,
"event": "table_file_creation", "file_number": 14591, "file_size":
68830515, "table_properties": {"data_size
": 67109446, "index_size": 785852, "filter_size": 934166,
"raw_key_size": 13762246, "raw_average_key_size": 43,
"raw_value_size": 58469928, "raw_average_value_size": 186,
"num_data_blocks": 17124, "num_entries": 313351, "filter_policy_na
me": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2018-08-13 18:15:13.025 7f6a54073700  0
bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no
allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:13.405 7f6a5b882700  1 bluefs _allocate failed to
allocate 0x420 on bdev 1, free 0x350; fallback to bdev 2
2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _allocate failed to
allocate 0x420 on bdev 2, dne
2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _flush_range allocated:
0x0 offset: 0x0 length: 0x419db1f
2018-08-13 18:15:13.405 7f6a54073700  0
bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no
allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:13.409 7f6a5b882700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/Blue
FS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*,
uint64_t, uint64_t)' thread 7f6a5b882700 time 2018-08-13
18:15:13.406645
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/BlueFS.cc:
1663: FAILED assert(0 == "bluefs
enospc")

 ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
 1:

Re: [ceph-users] Replicating between two datacenters without decompiling CRUSH map

2018-08-14 Thread Paul Emmerich
IIRC this will create a rule that tries to select n independent data
centers. Check the actual generated rule to validate this.

I think the only way to express "3 copies across two data centers" is by
explicitly
using the two data centers in the rule as in:

(pseudo code)
take dc1
chooseleaf 1 type host
emit
take dc2
chooseleaf 2 type host
emit

which will always place 1 copy in dc1 and 2 in dc2. A rule like

take default
choose 2 type datacenter
chooseleaf 2 type host
emit

will select a total of 4 hosts in two different data centers (2 hosts per
dc)
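
As a concrete sketch of the first variant, assuming a Luminous-style map where
the two datacenter buckets are named dc1 and dc2 and rule id 10 is free:

$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt
$ cat >> crushmap.txt <<'EOF'
rule three_copies_two_dcs {
        id 10
        type replicated
        min_size 3
        max_size 3
        step take dc1
        step chooseleaf firstn 1 type host
        step emit
        step take dc2
        step chooseleaf firstn 2 type host
        step emit
}
EOF
$ crushtool -c crushmap.txt -o crushmap.new
$ crushtool -i crushmap.new --test --rule 10 --num-rep 3 --show-mappings   # sanity-check before applying
$ ceph osd setcrushmap -i crushmap.new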

But the real problem here is that 2 data centers in one Ceph cluster is just
a poor fit for Ceph in most scenarios. 3 would be fine. Two independent
clusters and async rbd-mirror or rgw synchronization would also be fine.

But one cluster in two data centers and replicating via CRUSH just isn't
how it works.
Maybe you are looking for something like "3 independent racks" and you
happen
to have two racks in each dc? Really depends on your setup and requirements.


Paul

2018-08-13 14:09 GMT+02:00 Torsten Casselt :

> Hi,
>
> I created a rule with this command:
>
> ceph osd crush rule create-replicated rrd default datacenter
>
> Since chooseleaf type is 1, I expected it to distribute the copies
> evenly on two datacenters with six hosts each. For example, six copies
> would mean a copy on each host.
>
> When I test the resulting CRUSH map with the crush tool I get bad
> mappings. PGs stay in active+clean+remapped and
> active+undersized+remapped. I thought it might help if I increase the
> choose tries, but it stays the same.
>
> What is the best method to distribute at least three copies over two
> datacenters? Since the docs state that it is rarely needed to decompile
> the CRUSH map, I thought it must be possible with a rule create command
> like above. I don’t think it is that rare to have two sites…
>
> Thanks!
> Torsten
>
> --
> Torsten Casselt, IT-Sicherheit, Leibniz Universität IT Services
> Tel: +49-(0)511-762-799095  Schlosswender Str. 5
> Fax: +49-(0)511-762-3003D-30159 Hannover
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stale PG data loss

2018-08-14 Thread Surya Bala
We have configured CephFS.

Yes, I can restore the data, but I need to know which files are corrupted,
so that I can delete those files and copy them again.
In the normal state I can get the inode id of a file and map the inode id to
its object ids, which gives me the file-to-object mapping; using the ceph osd map
command I can then find the object-to-PG and OSD mapping.
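
A sketch of that mapping for a single file (pool name and path are placeholders):

$ ino=$(printf '%x' $(stat -c %i /mnt/cephfs/some/file))   # inode number in hex
$ rados -p cephfs_data ls | grep "^${ino}\."               # data objects belonging to that inode
$ ceph osd map cephfs_data "${ino}.00000000"               # PG and OSDs of the first object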

But once objects are lost, how can I find out which objects were lost?
Is there any way to list the objects of a PG?

Regards
Surya Balan

On Mon, Aug 13, 2018 at 6:12 PM, Janne Johansson 
wrote:

> "Don't run with replication 1 ever".
>
> Even if this is a test, it tests something for which a resilient cluster
> is specifically designed to avoid.
> As for enumerating what data is missing, it would depend on if the pool(s)
> had cephfs, rbd images or rgw data in them.
>
> When this kind of data loss happens to you, you restore from your backups.
>
>
>
>
> Den mån 13 aug. 2018 kl 14:26 skrev Surya Bala :
>
>> Any suggestion on this please
>>
>> Regards
>> Surya Balan
>>
>> On Fri, Aug 10, 2018 at 11:28 AM, Surya Bala 
>> wrote:
>>
>>> Hi folks,
>>>
>>>  I was trying to test the below case
>>>
>>> Having pool with replication count as 1 and if one osd goes down, then
>>> the PGs mapped to that OSD become stale.
>>>
>>> If the hardware failure happen then the data in that OSD lost. So some
>>> parts of some files are lost . How can i find what are the files which got
>>> currupted.
>>>
>>> Regards
>>> Surya Balan
>>>
>>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> May the most significant bit of your life be positive.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Inconsistent PG could not be repaired

2018-08-14 Thread Arvydas Opulskis
Thanks for the suggestion about restarting OSDs, but this doesn't work either.

Anyway, I managed to fix the second unrepairable PG by getting the object from
the OSD and saving it again via rados, but still no luck with the first one.
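
Roughly, that kind of rewrite looks like this (paths are illustrative; the good
copy can be fetched via rados if the object is still readable, or taken
directly from one of the OSDs):

$ rados -p .rgw.buckets get default.122888368.52__shadow_.3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7 /tmp/obj.bak
$ rados -p .rgw.buckets put default.122888368.52__shadow_.3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7 /tmp/obj.bak
$ ceph pg repair 26.821   # then let the PG repair/deep-scrub again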
I think I found the main reason why this doesn't work: it seems the object is not
overwritten, even though the rados command returns no errors. I tried to delete
the object, but it still stays in the pool untouched. Here is an example of what I
see:

# rados -p .rgw.buckets ls | grep -i
"sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d"
default.142609570.87_20180203.020047/repositories/docker-local/yyy/company.yyy.api.assets/1.2.4/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d

# rados -p .rgw.buckets get
default.142609570.87_20180203.020047/repositories/docker-local/yyy/company.yyy.api.assets/1.2.4/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d
testfile
error getting
.rgw.buckets/default.142609570.87_20180203.020047/repositories/docker-local/yyy/company.yyy.api.assets/1.2.4/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d:
(2) No such file or directory

# rados -p .rgw.buckets rm
default.142609570.87_20180203.020047/repositories/docker-local/yyy/company.yyy.api.assets/1.2.4/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d

# rados -p .rgw.buckets ls | grep -i
"sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d"
default.142609570.87_20180203.020047/repositories/docker-local/yyy/company.yyy.api.assets/1.2.4/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d

I've never seen this in our Ceph clusters before. Should I report a bug
about it? If any of you guys need more diagnostic info - let me know.

Thanks,
Arvydas

On Tue, Aug 7, 2018 at 5:49 PM, Brent Kennedy  wrote:

> Last time I had an inconsistent PG that could not be repaired using the
> repair command, I looked at which OSDs hosted the PG, then restarted them
> one by one(usually stopping, waiting a few seconds, then starting them back
> up ).  You could also stop them, flush the journal, then start them back
> up.
>
>
>
> If that didn’t work, it meant there was data loss and I had to use the
> ceph-objectstore-tool repair tool to export the objects from a location
> that had the latest data and import into the one that had no data.  The
> ceph-objectstore-tool is not a simple thing though and should not be used
> lightly.  When I say data loss, I mean that ceph thinks the last place
> written has the data, that place being the OSD that doesn’t actually have
> the data(meaning it failed to write there).
>
>
>
> If you want to go that route, let me know, I wrote a how to on it.  Should
> be the last resort though.  I also don’t know your setup, so I would hate
> to recommend something so drastic.
>
>
>
> -Brent
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Arvydas Opulskis
> *Sent:* Monday, August 6, 2018 4:12 AM
> *To:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Inconsistent PG could not be repaired
>
>
>
> Hi again,
>
>
>
> after two weeks I've got another inconsistent PG in same cluster. OSD's
> are different from first PG, object can not be GET as well:
>
>
> # rados list-inconsistent-obj 26.821 --format=json-pretty
>
> {
>
> "epoch": 178472,
>
> "inconsistents": [
>
> {
>
> "object": {
>
> "name": "default.122888368.52__shadow_
> .3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7",
>
> "nspace": "",
>
> "locator": "",
>
> "snap": "head",
>
> "version": 118920
>
> },
>
> "errors": [],
>
> "union_shard_errors": [
>
> "data_digest_mismatch_oi"
>
> ],
>
> "selected_object_info": "26:8411bae4:::default.
> 122888368.52__shadow_.3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7:head(126495'118920
> client.142609570.0:41412640 dirty|data_digest|omap_digest s 4194304 uv
> 118920 dd cd142aaa od  alloc_hint [0 0])",
>
> "shards": [
>
> {
>
> "osd": 20,
>
> "errors": [
>
> "data_digest_mismatch_oi"
>
> ],
>
> "size": 4194304,
>
> "omap_digest": "0x",
>
> "data_digest": "0x6b102e59"
>
> },
>
> {
>
> "osd": 44,
>
> "errors": [
>
> "data_digest_mismatch_oi"
>
> ],
>
> "size": 4194304,
>
> "omap_digest": "0x",
>
> "data_digest": "0x6b102e59"
>
> }
>
> ]
>
> }
>
> ]
>
> }
>
> # rados -p .rgw.buckets get default.122888368.52__shadow_.
> 3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7 test_2pg.file
>
> error getting .rgw.

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Jason Dillaman
On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
wrote:

> Hello Jason,
>
>
>
> I can confirm that your tests work on our cluster with a newly created
> image.
>
>
>
> We still can’t get the current images to use a different object pool. Do
> you think that maybe another feature is incompatible with this feature?
> Below is a log of the issue.
>

I wouldn't think so. I used master branch for my testing but I'll try
12.2.7 just in case it's an issue that's only in the luminous release.


> :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
>
> size 51200 MB in 12800 objects
>
> order 22 (4096 kB objects)
>
> block_name_prefix: rbd_data.37c8974b0dc51
>
> format: 2
>
> features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten
>
> flags:
>
> create_timestamp: Sat May  5 11:39:07 2018
>
>
>
> :~# rbd journal info --pool RBD_HDD --image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd: journaling is not enabled for image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
>
>
> :~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
> journaling --journal-pool RBD_SSD
>
>
>
> :~# rbd journal info --pool RBD_HDD --image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd journal '37c8974b0dc51':
>
> header_oid: journal.37c8974b0dc51
>
> object_oid_prefix: journal_data.1.37c8974b0dc51.
>
> order: 24 (16384 kB objects)
>
> splay_width: 4
>
> *** 
>
>
>
> :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
>
> size 51200 MB in 12800 objects
>
> order 22 (4096 kB objects)
>
> block_name_prefix: rbd_data.37c8974b0dc51
>
> format: 2
>
> features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten, journaling
>
> flags:
>
> create_timestamp: Sat May  5 11:39:07 2018
>
> journal: 37c8974b0dc51
>
> mirroring state: disabled
>
>
>
> Kind regards,
>
> *Glen Baars*
>
> *From:* Jason Dillaman 
> *Sent:* Tuesday, 14 August 2018 12:04 AM
> *To:* Glen Baars 
> *Cc:* dillaman ; ceph-users <
> ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] RBD journal feature
>
>
>
> On Sun, Aug 12, 2018 at 12:13 AM Glen Baars 
> wrote:
>
> Hello Jason,
>
>
>
> Interesting, I used ‘rados ls’ to view the SSDPOOL and can’t see any
> objects. Is this the correct way to view the journal objects?
>
>
>
> You won't see any journal objects in the SSDPOOL until you issue a write:
>
>
>
> $ rbd create --size 1G --image-feature exclusive-lock rbd_hdd/test
>
> $ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M
> rbd_hdd/test --rbd-cache=false
>
> bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
>
>   SEC   OPS   OPS/SEC   BYTES/SEC
>
> 1   320332.01  1359896.98
>
> 2   736360.83  1477975.96
>
> 3  1040351.17  1438393.57
>
> 4  1392350.94  1437437.51
>
> 5  1744350.24  1434576.94
>
> 6  2080349.82  1432866.06
>
> 7  2416341.73  1399731.23
>
> 8  2784348.37  1426930.69
>
> 9  3152347.40  1422966.67
>
>10  3520356.04  1458356.70
>
>11  3920361.34  1480050.97
>
> elapsed:11  ops: 4096  ops/sec:   353.61  bytes/sec: 1448392.06
>
> $ rbd feature enable rbd_hdd/test journaling --journal-pool rbd_ssd
>
> $ rbd journal info --pool rbd_hdd --image test
>
> rbd journal '10746b8b4567':
>
> header_oid: journal.10746b8b4567
>
> object_oid_prefix: journal_data.2.10746b8b4567.
>
> order: 24 (16 MiB objects)
>
> splay_width: 4
>
> object_pool: rbd_ssd
>
> $ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M
> rbd_hdd/test --rbd-cache=false
>
> bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
>
>   SEC   OPS   OPS/SEC   BYTES/SEC
>
> 1   240248.54  1018005.17
>
> 2   512263.47  1079154.06
>
> 3   768258.74  1059792.10
>
> 4  1040258.50  1058812.60
>
> 5  1312258.06  1057001.34
>
> 6  1536258.21  1057633.14
>
> 7  1792253.81  1039604.73
>
> 8  2032253.66  1038971.01
>
> 9  2256241.41  988800.93
>
>10  2480237.87  974335.65
>
>11  2752239.41  980624.20
>
>12  2992239.61  981440.94
>
>13  3200233.13  954887.84
>
>14  3440237.36  972237.80
>
>15  3680239.47  980853.37
>
>16  3920238.75  977920.70
>
> elapsed:16  ops: 4096  ops/sec:   245.04  bytes/sec: 1003692.81
>
> $ rados -p rbd_ssd ls | grep journal_data.2.10746b8b4567.
>
> journal_data.2.10746b8b4567.3
>
> journal_data.2.10746b8b4567.0
>
> journal_data.2.10746b8b4567.2
>
> journal

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Glen Baars
Hello Jason,

I will also complete testing of a few combinations tomorrow to try and isolate 
the issue now that we can get it to work with a new image.

The cluster started out at 12.2.3 bluestore so there shouldn’t be any old 
issues from previous versions.
Kind regards,
Glen Baars

From: Jason Dillaman 
Sent: Tuesday, 14 August 2018 7:43 PM
To: Glen Baars 
Cc: dillaman ; ceph-users 
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current images to use a different object pool. Do you 
think that maybe another feature is incompatible with this feature? Below is a 
log of the issue.

I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 
just in case it's an issue that's only in the luminous release.

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Sat May  5 11:39:07 2018

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a

:~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling 
--journal-pool RBD_SSD

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd journal '37c8974b0dc51':
header_oid: journal.37c8974b0dc51
object_oid_prefix: journal_data.1.37c8974b0dc51.
order: 24 (16384 kB objects)
splay_width: 4
*** 

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, journaling
flags:
create_timestamp: Sat May  5 11:39:07 2018
journal: 37c8974b0dc51
mirroring state: disabled

Kind regards,
Glen Baars
From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 12:04 AM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Sun, Aug 12, 2018 at 12:13 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

Interesting, I used ‘rados ls’ to view the SSDPOOL and can’t see any objects. 
Is this the correct way to view the journal objects?

You won't see any journal objects in the SSDPOOL until you issue a write:

$ rbd create --size 1G --image-feature exclusive-lock rbd_hdd/test
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M 
rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC   OPS   OPS/SEC   BYTES/SEC
1   320332.01  1359896.98
2   736360.83  1477975.96
3  1040351.17  1438393.57
4  1392350.94  1437437.51
5  1744350.24  1434576.94
6  2080349.82  1432866.06
7  2416341.73  1399731.23
8  2784348.37  1426930.69
9  3152347.40  1422966.67
   10  3520356.04  1458356.70
   11  3920361.34  1480050.97
elapsed:11  ops: 4096  ops/sec:   353.61  bytes/sec: 1448392.06
$ rbd feature enable rbd_hdd/test journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd_hdd --image test
rbd journal '10746b8b4567':
header_oid: journal.10746b8b4567
object_oid_prefix: journal_data.2.10746b8b4567.
order: 24 (16 MiB objects)
splay_width: 4
object_pool: rbd_ssd
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M 
rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC   OPS   OPS/SEC   BYTES/SEC
1   240248.54  1018005.17
2   512263.47  1079154.06
3   768258.74  1059792.10
4  1040258.50  1058812.60
5  1312258.06  1057001.34
6  1536258.21  1057633.14
7  1792253.81  1039604.73
8  2032253.66  1038971.01
9  2256241.41  988800.93
   10  2480237.87  974335.65
   11  2752239.41  980624.20
   12  2992239.61  981440.94
   13  3200233.13  954887.84
   14  3440237.36  972237.80
   15  3680239.47  980853.37
   16  3920238.75  977920.70
elapsed:16  ops: 4096  ops/sec:   245.0

Re: [ceph-users] bad crc/signature errors

2018-08-14 Thread Ilya Dryomov
On Mon, Aug 13, 2018 at 5:57 PM Nikola Ciprich
 wrote:
>
> Hi Ilya,
>
> hmm, OK, I'm not  sure now whether this is the bug which I'm
> experiencing.. I've had read_partial_message  / bad crc/signature
> problem occurance on the second cluster in short period even though
> we're on the same ceph version (12.2.5) for quite long time (almost since
> its release), so it's starting to pain me.. I suppose this must
> have been caused by some kernel update, (we're currently sticking
> to 4.14.x and lately been upgrading to 4.14.50)

These "bad crc/signature" errors are usually a sign of faulty hardware.

What was the last "good" kernel and the first "bad" kernel?

You said "on the second cluster".  How is it different from the first?
Are you using the kernel client with both?  Is there Xen involved?

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bad crc/signature errors

2018-08-14 Thread Nikola Ciprich
> > Hi Ilya,
> >
> > hmm, OK, I'm not  sure now whether this is the bug which I'm
> > experiencing.. I've had read_partial_message  / bad crc/signature
> > problem occurance on the second cluster in short period even though
> > we're on the same ceph version (12.2.5) for quite long time (almost since
> > its release), so it's starting to pain me.. I suppose this must
> > have been caused by some kernel update, (we're currently sticking
> > to 4.14.x and lately been upgrading to 4.14.50)
> 
> These "bad crc/signature" are usually the sign of faulty hardware.
> 
> What was the last "good" kernel and the first "bad" kernel?
> 
> You said "on the second cluster".  How is it different from the first?
> Are you using the kernel client with both?  Is there Xen involved?

It's complicated: both of those clusters are fairly new, running kernel 4.14.50
and Ceph 12.2.5. Xen is not involved, but KVM is.

I think those were already installed with this kernel. 

I was thinking about that, and the main difference compared to the other (and
older) clusters is that krbd is used much more: before, we were using krbd only
for PostgreSQL, and qemu-kvm accessed RBD volumes using librbd. On the new
clusters where the problems occurred, all volumes are accessed using krbd,
since it performs much better. So we'll revert to librbd and I'll try to find a
way to reproduce. If I find one, we can talk about a bisect, but it's possible
the problem has been here for a long time and just didn't occur because we
didn't use krbd heavily.
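
To double-check which images go through the kernel client on a given hypervisor
and whether it has logged the error again, something like:

$ rbd showmapped                          # images currently mapped via krbd
$ dmesg -T | grep -iE 'libceph|bad crc'   # recent kernel client messages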

But I think we can rule out a hardware problem here.


> 
> Thanks,
> 
> Ilya
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD journal feature

2018-08-14 Thread Jason Dillaman
I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling
journaling on a different pool:

$ rbd info rbd/foo
rbd image 'foo':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.101e6b8b4567
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Tue Aug 14 08:51:19 2018
$ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd --image foo
rbd journal '101e6b8b4567':
header_oid: journal.101e6b8b4567
object_oid_prefix: journal_data.1.101e6b8b4567.
order: 24 (16384 kB objects)
splay_width: 4
object_pool: rbd_ssd

Can you please run "rbd image-meta list " to see if you are
overwriting any configuration settings? Do you have any client
configuration overrides in your "/etc/ceph/ceph.conf"?

On Tue, Aug 14, 2018 at 8:25 AM Glen Baars 
wrote:

> Hello Jason,
>
>
>
> I will also complete testing of a few combinations tomorrow to try and
> isolate the issue now that we can get it to work with a new image.
>
>
>
> The cluster started out at 12.2.3 bluestore so there shouldn’t be any old
> issues from previous versions.
>
> Kind regards,
>
> *Glen Baars*
>
>
>
> *From:* Jason Dillaman 
> *Sent:* Tuesday, 14 August 2018 7:43 PM
> *To:* Glen Baars 
> *Cc:* dillaman ; ceph-users <
> ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] RBD journal feature
>
>
>
> On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
> wrote:
>
> Hello Jason,
>
>
>
> I can confirm that your tests work on our cluster with a newly created
> image.
>
>
>
> We still can’t get the current images to use a different object pool. Do
> you think that maybe another feature is incompatible with this feature?
> Below is a log of the issue.
>
>
>
> I wouldn't think so. I used master branch for my testing but I'll try
> 12.2.7 just in case it's an issue that's only in the luminous release.
>
>
>
> :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
>
> size 51200 MB in 12800 objects
>
> order 22 (4096 kB objects)
>
> block_name_prefix: rbd_data.37c8974b0dc51
>
> format: 2
>
> features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten
>
> flags:
>
> create_timestamp: Sat May  5 11:39:07 2018
>
>
>
> :~# rbd journal info --pool RBD_HDD --image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd: journaling is not enabled for image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
>
>
> :~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
> journaling --journal-pool RBD_SSD
>
>
>
> :~# rbd journal info --pool RBD_HDD --image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd journal '37c8974b0dc51':
>
> header_oid: journal.37c8974b0dc51
>
> object_oid_prefix: journal_data.1.37c8974b0dc51.
>
> order: 24 (16384 kB objects)
>
> splay_width: 4
>
> *** 
>
>
>
> :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
>
> size 51200 MB in 12800 objects
>
> order 22 (4096 kB objects)
>
> block_name_prefix: rbd_data.37c8974b0dc51
>
> format: 2
>
> features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten, journaling
>
> flags:
>
> create_timestamp: Sat May  5 11:39:07 2018
>
> journal: 37c8974b0dc51
>
> mirroring state: disabled
>
>
>
> Kind regards,
>
> *Glen Baars*
>
> *From:* Jason Dillaman 
> *Sent:* Tuesday, 14 August 2018 12:04 AM
> *To:* Glen Baars 
> *Cc:* dillaman ; ceph-users <
> ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] RBD journal feature
>
>
>
> On Sun, Aug 12, 2018 at 12:13 AM Glen Baars 
> wrote:
>
> Hello Jason,
>
>
>
> Interesting, I used ‘rados ls’ to view the SSDPOOL and can’t see any
> objects. Is this the correct way to view the journal objects?
>
>
>
> You won't see any journal objects in the SSDPOOL until you issue a write:
>
>
>
> $ rbd create --size 1G --image-feature exclusive-lock rbd_hdd/test
>
> $ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M
> rbd_hdd/test --rbd-cache=false
>
> bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
>
>   SEC   OPS   OPS/SEC   BYTES/SEC
>
> 1   320332.01  1359896.98
>
> 2   736360.83  1477975.96
>
> 3  1040351.17  1438393.57
>
> 4  1392350.94  1437437.51
>
> 5  1744350.24  1434576.94
>
> 6  2080349.82  1432866.06
>
> 7  2416341.73  1399731.23
>
> 8  2784348.37  1426930.69
>
> 9  3152347.40  1422966.67
>
>10  3520356.04  1458356.70
>
>11  3920361.34  1480050.97
>
> elapsed:11  ops: 4096  ops/sec:   353.61  bytes/sec: 1448392.06
>
> $ rbd feature enable rbd_hdd/test journaling --journal-pool rbd_ssd
>
> $ rbd journal info --pool rbd_hdd --image test
>

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Glen Baars
Hello Jason,

I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. It 
doesn’t seem to make a difference.

Also, here is the output:

rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
There are 0 metadata on this image.
Kind regards,
Glen Baars

From: Jason Dillaman 
Sent: Tuesday, 14 August 2018 9:00 PM
To: Glen Baars 
Cc: dillaman ; ceph-users 
Subject: Re: [ceph-users] RBD journal feature

I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling 
journaling on a different pool:

$ rbd info rbd/foo
rbd image 'foo':
   size 1024 MB in 256 objects
   order 22 (4096 kB objects)
   block_name_prefix: rbd_data.101e6b8b4567
   format: 2
   features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten
   flags:
   create_timestamp: Tue Aug 14 08:51:19 2018
$ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd --image foo
rbd journal '101e6b8b4567':
   header_oid: journal.101e6b8b4567
   object_oid_prefix: journal_data.1.101e6b8b4567.
   order: 24 (16384 kB objects)
   splay_width: 4
   object_pool: rbd_ssd

Can you please run "rbd image-meta list " to see if you are 
overwriting any configuration settings? Do you have any client configuration 
overrides in your "/etc/ceph/ceph.conf"?

On Tue, Aug 14, 2018 at 8:25 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I will also complete testing of a few combinations tomorrow to try and isolate 
the issue now that we can get it to work with a new image.

The cluster started out at 12.2.3 bluestore so there shouldn’t be any old 
issues from previous versions.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 7:43 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current images to use a different object pool. Do you 
think that maybe another feature is incompatible with this feature? Below is a 
log of the issue.

I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 
just in case it's an issue that's only in the luminous release.

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Sat May  5 11:39:07 2018

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a

:~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling 
--journal-pool RBD_SSD

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd journal '37c8974b0dc51':
header_oid: journal.37c8974b0dc51
object_oid_prefix: journal_data.1.37c8974b0dc51.
order: 24 (16384 kB objects)
splay_width: 4
*** 

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, journaling
flags:
create_timestamp: Sat May  5 11:39:07 2018
journal: 37c8974b0dc51
mirroring state: disabled

Kind regards,
Glen Baars
From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 12:04 AM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Sun, Aug 12, 2018 at 12:13 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

Interesting, I used ‘rados ls’ to view the SSDPOOL and can’t see any objects. 
Is this the correct way to view the journal objects?

You won't see any journal objects in the SSDPOOL until you issue a write:

$ rbd create --size 1G --image-feature exclusive-lock rbd_hdd/test
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M 
rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC   OPS   OPS/SEC   BYTES/SEC
1   320332.01  1359896.98
2   736360.83

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Jason Dillaman
On Tue, Aug 14, 2018 at 9:19 AM Glen Baars 
wrote:

> Hello Jason,
>
>
>
> I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf.
> it doesn’t seem to make a difference.
>

It should be SSDPOOL, but regardless, I am at a loss as to why it's not
working for you. You can try appending "--debug-rbd=20" to the end of the
"rbd feature enable" command and provide the generated logs in a pastebin
link.


> Also, here is the output:
>
>
>
> rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> There are 0 metadata on this image.
>
> Kind regards,
>
> *Glen Baars*
>
>
>
> *From:* Jason Dillaman 
> *Sent:* Tuesday, 14 August 2018 9:00 PM
> *To:* Glen Baars 
> *Cc:* dillaman ; ceph-users <
> ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] RBD journal feature
>
>
>
> I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling
> journaling on a different pool:
>
>
>
> $ rbd info rbd/foo
>
> rbd image 'foo':
>
>size 1024 MB in 256 objects
>
>order 22 (4096 kB objects)
>
>block_name_prefix: rbd_data.101e6b8b4567
>
>format: 2
>
>features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten
>
>flags:
>
>create_timestamp: Tue Aug 14 08:51:19 2018
>
> $ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd
>
> $ rbd journal info --pool rbd --image foo
>
> rbd journal '101e6b8b4567':
>
>header_oid: journal.101e6b8b4567
>
>object_oid_prefix: journal_data.1.101e6b8b4567.
>
>order: 24 (16384 kB objects)
>
>splay_width: 4
>
>object_pool: rbd_ssd
>
>
>
> Can you please run "rbd image-meta list " to see if you are
> overwriting any configuration settings? Do you have any client
> configuration overrides in your "/etc/ceph/ceph.conf"?
>
>
>
> On Tue, Aug 14, 2018 at 8:25 AM Glen Baars 
> wrote:
>
> Hello Jason,
>
>
>
> I will also complete testing of a few combinations tomorrow to try and
> isolate the issue now that we can get it to work with a new image.
>
>
>
> The cluster started out at 12.2.3 bluestore so there shouldn’t be any old
> issues from previous versions.
>
> Kind regards,
>
> *Glen Baars*
>
>
>
> *From:* Jason Dillaman 
> *Sent:* Tuesday, 14 August 2018 7:43 PM
> *To:* Glen Baars 
> *Cc:* dillaman ; ceph-users <
> ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] RBD journal feature
>
>
>
> On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
> wrote:
>
> Hello Jason,
>
>
>
> I can confirm that your tests work on our cluster with a newly created
> image.
>
>
>
> We still can’t get the current images to use a different object pool. Do
> you think that maybe another feature is incompatible with this feature?
> Below is a log of the issue.
>
>
>
> I wouldn't think so. I used master branch for my testing but I'll try
> 12.2.7 just in case it's an issue that's only in the luminous release.
>
>
>
> :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
>
> size 51200 MB in 12800 objects
>
> order 22 (4096 kB objects)
>
> block_name_prefix: rbd_data.37c8974b0dc51
>
> format: 2
>
> features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten
>
> flags:
>
> create_timestamp: Sat May  5 11:39:07 2018
>
>
>
> :~# rbd journal info --pool RBD_HDD --image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd: journaling is not enabled for image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
>
>
> :~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
> journaling --journal-pool RBD_SSD
>
>
>
> :~# rbd journal info --pool RBD_HDD --image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd journal '37c8974b0dc51':
>
> header_oid: journal.37c8974b0dc51
>
> object_oid_prefix: journal_data.1.37c8974b0dc51.
>
> order: 24 (16384 kB objects)
>
> splay_width: 4
>
> *** 
>
>
>
> :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
>
> size 51200 MB in 12800 objects
>
> order 22 (4096 kB objects)
>
> block_name_prefix: rbd_data.37c8974b0dc51
>
> format: 2
>
> features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten, journaling
>
> flags:
>
> create_timestamp: Sat May  5 11:39:07 2018
>
> journal: 37c8974b0dc51
>
> mirroring state: disabled
>
>
>
> Kind regards,
>
> *Glen Baars*
>
> *From:* Jason Dillaman 
> *Sent:* Tuesday, 14 August 2018 12:04 AM
> *To:* Glen Baars 
> *Cc:* dillaman ; ceph-users <
> ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] RBD journal feature
>
>
>
> On Sun, Aug 12, 2018 at 12:13 AM Glen Baars 
> wrote:
>
> Hello Jason,
>
>
>
> Interesting, I used ‘rados ls’ to view the SSDPOOL and can’t see any
> objects. Is this the correct 

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Glen Baars
Hello Jason,

I have now narrowed it down.

If the image has an exclusive lock – the journal doesn’t go on the correct pool.
Kind regards,
Glen Baars

From: Jason Dillaman 
Sent: Tuesday, 14 August 2018 9:29 PM
To: Glen Baars 
Cc: ceph-users 
Subject: Re: [ceph-users] RBD journal feature


On Tue, Aug 14, 2018 at 9:19 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. it 
doesn’t seem to make a difference.

It should be SSDPOOL, but regardless, I am at a loss as to why it's not working 
for you. You can try appending "--debug-rbd=20" to the end of the "rbd feature 
enable" command and provide the generated logs in a pastebin link.

Also, here is the output:

rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
There are 0 metadata on this image.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 9:00 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling 
journaling on a different pool:

$ rbd info rbd/foo
rbd image 'foo':
   size 1024 MB in 256 objects
   order 22 (4096 kB objects)
   block_name_prefix: rbd_data.101e6b8b4567
   format: 2
   features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten
   flags:
   create_timestamp: Tue Aug 14 08:51:19 2018
$ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd --image foo
rbd journal '101e6b8b4567':
   header_oid: journal.101e6b8b4567
   object_oid_prefix: journal_data.1.101e6b8b4567.
   order: 24 (16384 kB objects)
   splay_width: 4
   object_pool: rbd_ssd

Can you please run "rbd image-meta list " to see if you are 
overwriting any configuration settings? Do you have any client configuration 
overrides in your "/etc/ceph/ceph.conf"?

On Tue, Aug 14, 2018 at 8:25 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I will also complete testing of a few combinations tomorrow to try and isolate 
the issue now that we can get it to work with a new image.

The cluster started out at 12.2.3 bluestore so there shouldn’t be any old 
issues from previous versions.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 7:43 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current images to use a different object pool. Do you 
think that maybe another feature is incompatible with this feature? Below is a 
log of the issue.

I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 
just in case it's an issue that's only in the luminous release.

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Sat May  5 11:39:07 2018

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a

:~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling 
--journal-pool RBD_SSD

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd journal '37c8974b0dc51':
header_oid: journal.37c8974b0dc51
object_oid_prefix: journal_data.1.37c8974b0dc51.
order: 24 (16384 kB objects)
splay_width: 4
*** 

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, journaling
flags:
create_timestamp: Sat May  5 11:39:07 2018
journal: 37c8974b0dc51
mirroring state: disabled

Kind regards,
Glen Baars
From: Jason Dillaman <jdill...@redhat.com>
Sent: Tuesday, 14 August 2018 12:04 AM
To: Glen Baars <g...@onsitecomputers.com.au>
Cc: dillaman <dilla...@redhat.com>; ceph-users <ceph-users@lists.ceph.com>

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Jason Dillaman
On Tue, Aug 14, 2018 at 9:31 AM Glen Baars 
wrote:

> Hello Jason,
>
>
>
> I have now narrowed it down.
>
>
>
> If the image has an exclusive lock – the journal doesn’t go on the correct
> pool.
>

OK, that makes sense. If you have an active client on the image holding the
lock, the request to enable journaling is sent over to that client but it's
missing all the journal options. I'll open a tracker ticket to fix the
issue.

Thanks.
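
Until that fix lands, a minimal shell sketch of a possible workaround (hedged: it reuses the pool and image names from this thread and assumes the client holding the lock can be stopped or quiesced for a moment):

$ rbd lock ls RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a     # an exclusive lock shows up as an "auto ..." lock
$ rbd status RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a      # lists the active watchers
# once no client holds the lock, the journal options should be honoured:
$ rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling --journal-pool RBD_SSD
$ rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a   # check that object_pool now reads RBD_SSD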


> Kind regards,
>
> *Glen Baars*
>
>
>
> *From:* Jason Dillaman 
> *Sent:* Tuesday, 14 August 2018 9:29 PM
> *To:* Glen Baars 
> *Cc:* ceph-users 
> *Subject:* Re: [ceph-users] RBD journal feature
>
>
>
>
>
> On Tue, Aug 14, 2018 at 9:19 AM Glen Baars 
> wrote:
>
> Hello Jason,
>
>
>
> I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf.
> it doesn’t seem to make a difference.
>
>
>
> It should be SSDPOOL, but regardless, I am at a loss as to why it's not
> working for you. You can try appending "--debug-rbd=20" to the end of the
> "rbd feature enable" command and provide the generated logs in a pastebin
> link.
>
>
>
> Also, here is the output:
>
>
>
> rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> There are 0 metadata on this image.
>
> Kind regards,
>
> *Glen Baars*
>
>
>
> *From:* Jason Dillaman 
> *Sent:* Tuesday, 14 August 2018 9:00 PM
> *To:* Glen Baars 
> *Cc:* dillaman ; ceph-users <
> ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] RBD journal feature
>
>
>
> I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling
> journaling on a different pool:
>
>
>
> $ rbd info rbd/foo
>
> rbd image 'foo':
>
>size 1024 MB in 256 objects
>
>order 22 (4096 kB objects)
>
>block_name_prefix: rbd_data.101e6b8b4567
>
>format: 2
>
>features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten
>
>flags:
>
>create_timestamp: Tue Aug 14 08:51:19 2018
>
> $ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd
>
> $ rbd journal info --pool rbd --image foo
>
> rbd journal '101e6b8b4567':
>
>header_oid: journal.101e6b8b4567
>
>object_oid_prefix: journal_data.1.101e6b8b4567.
>
>order: 24 (16384 kB objects)
>
>splay_width: 4
>
>object_pool: rbd_ssd
>
>
>
> Can you please run "rbd image-meta list " to see if you are
> overwriting any configuration settings? Do you have any client
> configuration overrides in your "/etc/ceph/ceph.conf"?
>
>
>
> On Tue, Aug 14, 2018 at 8:25 AM Glen Baars 
> wrote:
>
> Hello Jason,
>
>
>
> I will also complete testing of a few combinations tomorrow to try and
> isolate the issue now that we can get it to work with a new image.
>
>
>
> The cluster started out at 12.2.3 bluestore so there shouldn’t be any old
> issues from previous versions.
>
> Kind regards,
>
> *Glen Baars*
>
>
>
> *From:* Jason Dillaman 
> *Sent:* Tuesday, 14 August 2018 7:43 PM
> *To:* Glen Baars 
> *Cc:* dillaman ; ceph-users <
> ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] RBD journal feature
>
>
>
> On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
> wrote:
>
> Hello Jason,
>
>
>
> I can confirm that your tests work on our cluster with a newly created
> image.
>
>
>
> We still can’t get the current images to use a different object pool. Do
> you think that maybe another feature is incompatible with this feature?
> Below is a log of the issue.
>
>
>
> I wouldn't think so. I used master branch for my testing but I'll try
> 12.2.7 just in case it's an issue that's only in the luminous release.
>
>
>
> :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
>
> size 51200 MB in 12800 objects
>
> order 22 (4096 kB objects)
>
> block_name_prefix: rbd_data.37c8974b0dc51
>
> format: 2
>
> features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten
>
> flags:
>
> create_timestamp: Sat May  5 11:39:07 2018
>
>
>
> :~# rbd journal info --pool RBD_HDD --image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd: journaling is not enabled for image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
>
>
> :~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
> journaling --journal-pool RBD_SSD
>
>
>
> :~# rbd journal info --pool RBD_HDD --image
> 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd journal '37c8974b0dc51':
>
> header_oid: journal.37c8974b0dc51
>
> object_oid_prefix: journal_data.1.37c8974b0dc51.
>
> order: 24 (16384 kB objects)
>
> splay_width: 4
>
> *** 
>
>
>
> :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
>
> size 51200 MB in 12800 objects
>
> order 22 (4096 kB objects)
>
> block_name_prefix: rbd_data.37c8974b0dc51
>
> format: 2
>
>

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Glen Baars
Hello Jason,

Thanks for your help. Here is the output you asked for also.

https://pastebin.com/dKH6mpwk
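
For reference, a hedged sketch of the kind of invocation used to capture such a log (the exact flags and the redirect target are assumptions):

$ rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling \
      --journal-pool RBD_SSD --debug-rbd=20 2> rbd-enable-journaling.log
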
Kind regards,
Glen Baars

From: Jason Dillaman 
Sent: Tuesday, 14 August 2018 9:33 PM
To: Glen Baars 
Cc: ceph-users 
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 9:31 AM Glen Baars <g...@onsitecomputers.com.au> wrote:
Hello Jason,

I have now narrowed it down.

If the image has an exclusive lock – the journal doesn’t go on the correct pool.

OK, that makes sense. If you have an active client on the image holding the 
lock, the request to enable journaling is sent over to that client but it's 
missing all the journal options. I'll open a tracker ticket to fix the issue.

Thanks.

Kind regards,
Glen Baars

From: Jason Dillaman <jdill...@redhat.com>
Sent: Tuesday, 14 August 2018 9:29 PM
To: Glen Baars <g...@onsitecomputers.com.au>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] RBD journal feature


On Tue, Aug 14, 2018 at 9:19 AM Glen Baars <g...@onsitecomputers.com.au> wrote:
Hello Jason,

I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. it 
doesn’t seem to make a difference.

It should be SSDPOOL, but regardless, I am at a loss as to why it's not working 
for you. You can try appending "--debug-rbd=20" to the end of the "rbd feature 
enable" command and provide the generated logs in a pastebin link.

Also, here is the output:

rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
There are 0 metadata on this image.
Kind regards,
Glen Baars

From: Jason Dillaman <jdill...@redhat.com>
Sent: Tuesday, 14 August 2018 9:00 PM
To: Glen Baars <g...@onsitecomputers.com.au>
Cc: dillaman <dilla...@redhat.com>; ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] RBD journal feature

I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling 
journaling on a different pool:

$ rbd info rbd/foo
rbd image 'foo':
   size 1024 MB in 256 objects
   order 22 (4096 kB objects)
   block_name_prefix: rbd_data.101e6b8b4567
   format: 2
   features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten
   flags:
   create_timestamp: Tue Aug 14 08:51:19 2018
$ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd --image foo
rbd journal '101e6b8b4567':
   header_oid: journal.101e6b8b4567
   object_oid_prefix: journal_data.1.101e6b8b4567.
   order: 24 (16384 kB objects)
   splay_width: 4
   object_pool: rbd_ssd

Can you please run "rbd image-meta list " to see if you are 
overwriting any configuration settings? Do you have any client configuration 
overrides in your "/etc/ceph/ceph.conf"?

On Tue, Aug 14, 2018 at 8:25 AM Glen Baars <g...@onsitecomputers.com.au> wrote:
Hello Jason,

I will also complete testing of a few combinations tomorrow to try and isolate 
the issue now that we can get it to work with a new image.

The cluster started out at 12.2.3 bluestore so there shouldn’t be any old 
issues from previous versions.
Kind regards,
Glen Baars

From: Jason Dillaman <jdill...@redhat.com>
Sent: Tuesday, 14 August 2018 7:43 PM
To: Glen Baars <g...@onsitecomputers.com.au>
Cc: dillaman <dilla...@redhat.com>; ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 4:08 AM Glen Baars <g...@onsitecomputers.com.au> wrote:
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current images to use a different object pool. Do you 
think that maybe another feature is incompatible with this feature? Below is a 
log of the issue.

I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 
just in case it's an issue that's only in the luminous release.

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Sat May  5 11:39:07 2018

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a

:~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling 
--journal-pool RBD_SSD

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd journal '37c8974b0dc51':
header_oid: journal.37c8974b0dc51
object_oid_prefix: journal_data.1.37c8974b0dc51.
order: 24 (16384 kB objects)
splay_width: 4
*** 

:~#

Re: [ceph-users] Slow rbd reads (fast writes) with luminous + bluestore

2018-08-14 Thread Emmanuel Lacour
Le 13/08/2018 à 16:58, Jason Dillaman a écrit :
>
> See [1] for ways to tweak the bluestore cache sizes. I believe that by
> default, bluestore will not cache any data but instead will only
> attempt to cache its key/value store and metadata.

I suppose so too, because the default behaviour is to cache as much key/value
data as possible (up to 512M) and the HDD cache is 1G by default.

I tried increasing the HDD cache to 4G and it seems to be used; the 4 OSD
processes use 20GB now.
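
For anyone following along, a minimal ceph.conf sketch of that change (Luminous option names; the values are simply the ones discussed here):

[osd]
bluestore_cache_size_hdd = 4294967296    # 4 GB BlueStore cache per HDD-backed OSD (default 1 GB)
#bluestore_cache_kv_max = 536870912      # cap on the RocksDB key/value share (default 512 MB)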

> In general, however, I would think that attempting to have bluestore
> cache data is just an attempt to optimize to the test instead of
> actual workloads. Personally, I think it would be more worthwhile to
> just run 'fio --ioengine=rbd' directly against a pre-initialized image
> after you have dropped the cache on the OSD nodes.

So with BlueStore, I assume we need to rely more on the client page cache (at
least when using a VM), whereas with the old FileStore both the OSD and client
caches were used.

As for benchmarking, I ran a real benchmark here with the expected application
workload of this new cluster, and it's OK for us :)
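
For completeness, a hedged sketch of the kind of 'fio --ioengine=rbd' run suggested above (pool, image and client names are placeholders):

# on each OSD node first: sync; echo 3 > /proc/sys/vm/drop_caches
$ fio --name=rbd-randread --ioengine=rbd --clientname=admin \
      --pool=rbd --rbdname=testimage \
      --rw=randread --bs=4k --iodepth=32 --direct=1 \
      --runtime=60 --time_based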


Thanks for your help Jason.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rhel/centos7 spectre meltdown experience

2018-08-14 Thread Marc Roos



Did anyone notice any performance loss on OSD, MON or RGW nodes because of 
the Spectre/Meltdown updates? What is the general practice concerning these 
updates?



Sort of a follow-up on this discussion.
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg43136.html
https://access.redhat.com/articles/3311301
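
For what it's worth, the Red Hat article linked above documents how to check and, for benchmarking, toggle the mitigations; a short sketch (the debugfs tunables are specific to Red Hat's patched kernels):

$ cat /sys/devices/system/cpu/vulnerabilities/*    # overall mitigation status
$ cat /sys/kernel/debug/x86/pti_enabled            # Meltdown / page table isolation
$ cat /sys/kernel/debug/x86/ibrs_enabled           # Spectre v2
$ cat /sys/kernel/debug/x86/ibpb_enabled
# temporarily disable for a benchmark run (re-enable afterwards):
# echo 0 > /sys/kernel/debug/x86/pti_enabled
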
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mimic/bluestore cluster can't allocate space for bluefs

2018-08-14 Thread Igor Fedotov

Hi Jakub,

for the crashing OSD could you please set

debug_bluestore=10

bluestore_bluefs_balance_failure_dump_interval=1


and collect more logs.

This will hopefully provide more insight on why additional space isn't 
allocated for bluefs.
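
In case it helps, one way to apply those settings (a sketch only, assuming the OSD from the log below, osd.6, and a daemon restart afterwards):

# in /etc/ceph/ceph.conf on the OSD host:
[osd.6]
debug_bluestore = 10
bluestore_bluefs_balance_failure_dump_interval = 1

# or, while an OSD is still running, via the admin socket:
$ ceph daemon osd.6 config set debug_bluestore 10
$ ceph daemon osd.6 config set bluestore_bluefs_balance_failure_dump_interval 1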


Thanks,

Igor


On 8/14/2018 12:41 PM, Jakub Stańczak wrote:

Hello All!

I am running a Mimic, all-BlueStore cluster with a pure RGW workload. We use 
the AWS i3 instance family for OSD machines - each instance has 1 NVMe 
disk which is split into 4 partitions, and each of those partitions is 
devoted to a BlueStore block device. We use a single device per OSD, so 
everything is managed by BlueStore internally.


The problem is that under write-heavy conditions the DB is growing fast and 
at some point BlueFS stops getting more space, which results in OSD death. 
There is no recovery from this error - when BlueFS runs out of space for 
RocksDB, the OSD dies and cannot be restarted.


With this particular OSD there is plenty of free space, but we can see that 
it cannot allocate more space and keeps logging the odd message 
'_balance_bluefs_freespace no allocate on 0x8000'.


I've also done some BlueFS tuning, because I had similar problems previously 
where BlueStore could not keep up with providing enough storage for BlueFS.


bluefs settings:
bluestore_bluefs_balance_interval = 0.333
bluestore_bluefs_gift_ratio = 0.05
bluestore_bluefs_min_free = 3221225472


snippet from osd logs:
2018-08-13 18:15:10.960 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:11.330 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:11.752 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:11.785 7f6a5b882700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14590: 304401 keys, 68804532 bytes
2018-08-13 18:15:11.785 7f6a5b882700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184111786253, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14590, "file_size": 68804532, "table_properties": {"data_size": 67112437, "index_size": 92, "filter_size": 913252, "raw_key_size": 13383306, "raw_average_key_size": 43, "raw_value_size": 58673606, "raw_average_value_size": 192, "num_data_blocks": 17090, "num_entries": 304401, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2018-08-13 18:15:12.245 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:12.664 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:12.743 7f6a5b882700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14591: 313351 keys, 68830515 bytes
2018-08-13 18:15:12.743 7f6a5b882700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184112744129, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14591, "file_size": 68830515, "table_properties": {"data_size": 67109446, "index_size": 785852, "filter_size": 934166, "raw_key_size": 13762246, "raw_average_key_size": 43, "raw_value_size": 58469928, "raw_average_value_size": 186, "num_data_blocks": 17124, "num_entries": 313351, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2018-08-13 18:15:13.025 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:13.405 7f6a5b882700 1 bluefs _allocate failed to allocate 0x420 on bdev 1, free 0x350; fallback to bdev 2
2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _allocate failed to allocate 0x420 on bdev 2, dne
2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x419db1f
2018-08-13 18:15:13.405 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:13.409 7f6a5b882700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7f6a5b882700

Re: [ceph-users] Ceph upgrade Jewel to Luminous

2018-08-14 Thread Thomas White
Hi Jaime,

 

Upgrading directly should not be a problem. It is usually recommended to go to 
the latest minor release before upgrading major versions, but my own migration 
from 10.2.10 to 12.2.5 went seamlessly and I can’t see any technical 
limitation which would hinder or prevent this process.
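
For reference, a hedged outline of the usual Jewel-to-Luminous sequence (the Luminous release notes remain the authoritative source):

$ ceph osd set noout                      # avoid rebalancing while daemons restart
# upgrade packages, then restart the monitors one at a time:
$ systemctl restart ceph-mon.target
# deploy and start ceph-mgr daemons (new requirement in Luminous)
# restart the OSDs one host at a time, waiting for HEALTH_OK in between:
$ systemctl restart ceph-osd.target
# finally MDS and RGW daemons, then:
$ ceph versions                           # confirm everything reports 12.2.x
$ ceph osd require-osd-release luminous   # only once ALL OSDs run Luminous
$ ceph osd unset noout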

 

Kind Regards,

 

Tom

 

From: ceph-users  On Behalf Of Jaime Ibar
Sent: 14 August 2018 10:00
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Ceph upgrade Jewel to Luminous

 

Hi all,

we're running Ceph Jewel 10.2.10 in our cluster and we plan to upgrade to the
latest Luminous release (12.2.7). Jewel 10.2.11 was released one month ago and
our plan was to upgrade to that release first and then upgrade to Luminous,
but as someone reported OSD crashes after upgrading to Jewel 10.2.11, we
wonder if it would be possible to skip this Jewel release and upgrade directly
to Luminous 12.2.7.

Thanks

Jaime


Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie | ja...@tchpc.tcd.ie
Tel: +353-1-896-3725

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Inconsistent PG could not be repaired

2018-08-14 Thread Thomas White
Hi Arvydas,

 

The error seems to suggest this is not an issue with your object data, but 
with the expected object digest. I can't get at the notes from my (very hacky) 
diagnosis process for this, but our eventual fix was to locate the affected 
bucket or files and then rename an object within it, forcing a recalculation 
of the digest. Depending on the size of the pool, perhaps you could rename a 
few files at random to trigger that recalculation and see if it remedies the 
problem?
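
To illustrate the rename trick (a hedged sketch only: the bucket and key names are made up, and it assumes an S3 client already configured against the RGW endpoint):

$ s3cmd mv s3://mybucket/path/to/object s3://mybucket/path/to/object.tmp
$ s3cmd mv s3://mybucket/path/to/object.tmp s3://mybucket/path/to/object

The idea being that the copy-and-delete behind the move rewrites the affected RADOS objects and their digests.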

 

Kind Regards,

 

Tom

 

From: ceph-users  On Behalf Of Arvydas 
Opulskis
Sent: 14 August 2018 12:33
To: Brent Kennedy 
Cc: Ceph Users 
Subject: Re: [ceph-users] Inconsistent PG could not be repaired

 

Thanks for the suggestion about restarting the OSDs, but that doesn't work either.

 

Anyway, I managed to fix the second unrepairable PG by fetching the object from 
the OSD and saving it again via rados, but still no luck with the first one.

I think I found the main reason why this doesn't work. It seems the object is 
not overwritten, even though the rados command returns no errors. I tried to 
delete the object, but it still stays in the pool untouched. Here is an example 
of what I see:

 

# rados -p .rgw.buckets ls | grep -i 
"sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d"
default.142609570.87_20180203.020047/repositories/docker-local/yyy/company.yyy.api.assets/1.2.4/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d

# rados -p .rgw.buckets get 
default.142609570.87_20180203.020047/repositories/docker-local/yyy/company.yyy.api.assets/1.2.4/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d
 testfile
error getting 
.rgw.buckets/default.142609570.87_20180203.020047/repositories/docker-local/yyy/company.yyy.api.assets/1.2.4/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d:
 (2) No such file or directory

# rados -p .rgw.buckets rm 
default.142609570.87_20180203.020047/repositories/docker-local/yyy/company.yyy.api.assets/1.2.4/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d

# rados -p .rgw.buckets ls | grep -i 
"sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d"
default.142609570.87_20180203.020047/repositories/docker-local/yyy/company.yyy.api.assets/1.2.4/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d

 

I've never seen this in our Ceph clusters before. Should I report a bug about 
it? If any of you guys need more diagnostic info - let me know.

 

Thanks,

Arvydas

 

On Tue, Aug 7, 2018 at 5:49 PM, Brent Kennedy <bkenn...@cfl.rr.com> wrote:

Last time I had an inconsistent PG that could not be repaired using the repair 
command, I looked at which OSDs hosted the PG, then restarted them one by 
one (usually stopping, waiting a few seconds, then starting them back up). You 
could also stop them, flush the journal, then start them back up.
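
Roughly along these lines (a sketch, assuming a replicated pool and that "ceph health detail" already names the PG):

$ ceph health detail | grep inconsistent    # note the PG id, e.g. 26.821
$ ceph pg map 26.821                        # shows the acting OSD set
$ systemctl restart ceph-osd@20             # restart the listed OSDs one by one
$ ceph pg repair 26.821                     # then ask the PG to repair itself
$ rados list-inconsistent-obj 26.821 --format=json-pretty   # re-check afterwards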

 

If that didn’t work, it meant there was data loss and I had to use 
ceph-objectstore-tool to export the objects from a location that had the 
latest data and import them into the one that had no data. The 
ceph-objectstore-tool is not a simple thing though and should not be used 
lightly. When I say data loss, I mean that Ceph thinks the last place written 
has the data, that place being the OSD that doesn’t actually have the 
data (meaning it failed to write there).

 

If you want to go that route, let me know - I wrote a how-to on it. It should be 
the last resort though. I also don’t know your setup, so I would hate to 
recommend something so drastic.

 

-Brent

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Arvydas Opulskis
Sent: Monday, August 6, 2018 4:12 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Inconsistent PG could not be repaired

 

Hi again,

 

after two weeks I've got another inconsistent PG in the same cluster. The OSDs 
are different from the first PG's, and the object cannot be fetched with GET either:


# rados list-inconsistent-obj 26.821 --format=json-pretty
{
    "epoch": 178472,
    "inconsistents": [
        {
            "object": {
                "name": "default.122888368.52__shadow_.3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 118920
            },
            "errors": [],
            "union_shard_errors": [
                "data_digest_mismatch_oi"
            ],
            "selected_object_info": "26:8411bae4:::default.122888368.52__shadow_.3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7:head(126495'118920 client.142609570.0:41412640 dirty|data_digest|omap_digest s 4194304 uv 118920 dd cd142aaa od  alloc_hint [0 0])",
            "shards": [
                {
                    "osd": 20,
                    "errors": [
                        "data_digest_mismatc

Re: [ceph-users] limited disk slots - should I ran OS on SD card ?

2018-08-14 Thread Paul Emmerich
I've seen the OS running on SATA DOMs and cheap USB sticks.
It works well for some time, and then it just falls apart.

Paul

2018-08-14 9:12 GMT+02:00 Burkhard Linke
:
> Hi,
>
>
> AFAIk SD cards (and SATA DOMs) do not have any kind of wear-leveling
> support. Even if the crappy write endurance of these storage systems would
> be enough to operate a server for several years on average, you will always
> have some hot spots with higher than usual write activity. This is the case
> for filesystem journals (xfs, ext4, almost all modern filesystems). Been
> there, done that, had two storage systems failing due to SD wear
>
>
> The only sane setup for SD cards amd DOMs are flash aware filesystems like
> f2fs. Unfortunately most linux distributions do not support these in their
> standard installers.
>
>
> Short answer: no, do not use SD cards.
>
>
> Regards,
>
> Burkhard
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] BlueStore wal vs. db size

2018-08-14 Thread Robert Stanford
I am keeping the wal and db for a ceph cluster on an SSD.  I am using the
masif_bluestore_block_db_size / masif_bluestore_block_wal_size parameters
in ceph.conf to specify how big they should be.  Should these values be the
same, or should one be much larger than the other?

 R
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] BlueStore wal vs. db size

2018-08-14 Thread Wido den Hollander


On 08/15/2018 04:17 AM, Robert Stanford wrote:
> I am keeping the wal and db for a ceph cluster on an SSD.  I am using
> the masif_bluestore_block_db_size / masif_bluestore_block_wal_size
> parameters in ceph.conf to specify how big they should be.  Should these
> values be the same, or should one be much larger than the other?
> 

This has been answered multiple times on this mailing list in the last few
months; a bit of searching would have helped.

Nevertheless, 1GB for the WAL is sufficient and then allocate about 10GB
of DB per TB of storage. That should be enough in most use cases.

Now, if you can spare more DB space, do so!
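
As a concrete illustration of that sizing for, say, a 4 TB OSD, using the
standard option names (note these only take effect when the OSD is created,
e.g. via ceph-disk/ceph-volume):

[osd]
bluestore_block_wal_size = 1073741824    # 1 GB WAL
bluestore_block_db_size  = 42949672960   # 40 GB DB, i.e. ~10 GB per TB of data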

Wido

>  R
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com