Running into weird issues here as well in a test environment. I don't have a 
solution either, but perhaps we can find some things in common.

Setup in a nutshell:
- Ceph cluster: Ubuntu 14.04, Kernel 3.16.7, Ceph 0.87-1 (OSDs with separate 
public/cluster network in 10 Gbps)
- iSCSI Proxy node (targetcli/LIO): Ubuntu 14.04, Kernel 3.16.7, Ceph 0.87-1 
(10 Gbps)
- Client node: Ubuntu 12.04, Kernel 3.11 (10 Gbps)

Relevant cluster config: Writeback cache tiering with NVMe PCIe cards (2 
replicas) in front of an erasure-coded pool (k=3,m=2) backed by spindles.
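
For reference, the raw-capacity cost of the two tiers follows from the pool parameters above. This is a minimal sketch of the arithmetic, assuming the usual definitions: a replicated pool stores `size` full copies, and an erasure-coded pool stores k data chunks plus m coding chunks per object. The function names are illustrative only.

```python
def replicated_raw_factor(size: int) -> float:
    """Raw bytes stored per logical byte in a replicated pool."""
    return float(size)


def erasure_raw_factor(k: int, m: int) -> float:
    """Raw bytes stored per logical byte in a k+m erasure-coded pool."""
    return (k + m) / k


cache_factor = replicated_raw_factor(2)      # cache tier: 2 replicas
base_factor = erasure_raw_factor(k=3, m=2)   # base tier: k=3, m=2

print(f"cache tier raw factor: {cache_factor:.2f}x")
print(f"base tier raw factor:  {base_factor:.2f}x (tolerates 2 lost chunks)")
```

So the erasure-coded base tier costs roughly 1.67x raw space per logical byte, against 2x for the replicated cache tier.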

I'm following the instructions here: 
http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices
No issues with creating and mapping a 100GB RBD image and then creating the 
target.

I'm interested in finding out the overhead/performance impact of re-exporting 
through iSCSI so the idea is to run benchmarks.
Here's a fio test I'm trying to run on the client node on the mounted iscsi 
device:
fio --name=writefile --size=100G --filesize=100G --filename=/dev/sdu --bs=1M \
    --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers \
    --end_fsync=1 --iodepth=200 --ioengine=libaio
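
To put a number on the iSCSI overhead, the same fio job could be run once against the RBD device directly and once through the iSCSI export, then compared. Below is a minimal sketch, assuming fio is run with --output-format=json, which reports per-job write bandwidth under jobs[].write.bw in KiB/s. The helper names and the sample figures are made up for illustration and are not measurements from this cluster.

```python
def write_bandwidth_kib(fio_result: dict) -> float:
    """Sum the write bandwidth (KiB/s) of all jobs in a fio JSON result."""
    return sum(job["write"]["bw"] for job in fio_result["jobs"])


def overhead_percent(direct_kib: float, iscsi_kib: float) -> float:
    """Bandwidth lost through iSCSI, as a percentage of the direct run."""
    return (direct_kib - iscsi_kib) / direct_kib * 100.0


# Hypothetical results, trimmed to the fields used here (NOT real data).
direct_run = {"jobs": [{"jobname": "writefile", "write": {"bw": 800000}}]}
iscsi_run = {"jobs": [{"jobname": "writefile", "write": {"bw": 600000}}]}

loss = overhead_percent(write_bandwidth_kib(direct_run),
                        write_bandwidth_kib(iscsi_run))
print(f"iSCSI overhead: {loss:.1f}% of direct write bandwidth")
```

In practice the two dictionaries would come from json.load() on files written with fio's --output option rather than being typed in by hand.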

The benchmark eventually hangs for several long seconds towards the end of the 
test before completing.
On the proxy node, the kernel complains about an iSCSI portal login timeout: 
http://pastebin.com/Q49UnTPr and I also see irqbalance errors in syslog: 
http://pastebin.com/AiRTWDwR

Doing the same test on the machines directly (raw, rbd, on the osd filesystem) 
doesn't yield any issues.

I've tried a couple things to see if I could get things to work...
- Set irqbalance --hintpolicy=ignore (http://sourceforge.net/p/e1000/bugs/394/ 
& https://bugs.launchpad.net/ubuntu/+source/irqbalance/+bug/1321425)
- Changed size on cache pool to 1 (for the sake of testing, improved 
performance but still hangs)
- Set crush tunables to legacy (and back to optimal)
- Various package and kernel versions and putting the proxy node on Ubuntu 
precise
- Formatting and mounting the iscsi block device and running the test on the 
formatted filesystem

I don't think it's related, but I don't remember running into issues before I 
swapped out the SSDs for the NVMe cards in the cache pool.
I don't have time *right now*, but I definitely want to test whether I can 
reproduce the issue on the SSDs.

Let me know if this gives you any ideas, I'm all ears.
--
David Moreau Simard

> On Oct 28, 2014, at 4:07 PM, Christopher Spearman <neromaver...@gmail.com> 
> wrote:
> 
> Sage:
> 
> That'd be my assumption, performance looked pretty fantastic over loop until 
> it started being used heavily
> 
> Mike:
> 
> The configs you asked for are at the end of this message. I've removed or 
> changed some info (iqn/wwn/portal) for security purposes. The raw & loop 
> target configs are all in one since I'm running both types of configs 
> currently. I also included the running config (ls /) of targetcli for anyone 
> interested in what it looks like from the console.
> 
> The tool I used was dd. I ran through various options using dd but didn't 
> really see much difference. The one on top is my go-to command for my first 
> test:
> 
> time dd if=/dev/zero of=test bs=32M count=32 oflag=direct,sync
> time dd if=/dev/zero of=test bs=32M count=128 oflag=direct,sync
> time dd if=/dev/zero of=test bs=8M count=512 oflag=direct,sync
> time dd if=/dev/zero of=test bs=4M count=1024 oflag=direct,sync
> 
> 
> ---ls / from current targetcli (no mounted ext4 -> image file config)---
> 
> /iscsi> ls /
> o- / ..................................................................... [...]
>   o- backstores .......................................................... [...]
>   | o- block .............................................. [Storage Objects: 2]
>   | | o- ceph_lun0 .................. [/dev/loop0 (2.0TiB) write-thru activated]
>   | | o- ceph_noloop00 ... [/dev/rbd/vmiscsi/noloop00 (1.0TiB) write-thru activated]
>   | o- fileio ............................................. [Storage Objects: 0]
>   | o- pscsi .............................................. [Storage Objects: 0]
>   | o- ramdisk ............................................ [Storage Objects: 0]
>   o- iscsi ........................................................ [Targets: 2]
>   | o- iqn.gateway2_01 ............................................... [TPGs: 1]
>   | | o- tpg1 ........................................... [no-gen-acls, no-auth]
>   | |   o- acls ...................................................... [ACLs: 2]
>   | |   | o- iqn.esxhost01 .................................... [Mapped LUNs: 1]
>   | |   | | o- mapped_lun0 ..................... [lun0 block/ceph_noloop00 (rw)]
>   | |   | o- iqn.esxhost02 .................................... [Mapped LUNs: 1]
>   | |   |   o- mapped_lun0 ..................... [lun0 block/ceph_noloop00 (rw)]
>   | |   o- luns ...................................................... [LUNs: 1]
>   | |   | o- lun0 ............ [block/ceph_noloop00 (/dev/rbd/vmiscsi/noloop00)]
>   | |   o- portals ................................................ [Portals: 1]
>   | |     o- xxx.xxx.xxx.xxx:3260 ......................................... [OK]
>   | o- iqn.gateway2_02 ............................................... [TPGs: 1]
>   |   o- tpg1 ........................................... [no-gen-acls, no-auth]
>   |     o- acls ...................................................... [ACLs: 2]
>   |     | o- iqn.esxhost01 .................................... [Mapped LUNs: 1]
>   |     | | o- mapped_lun0 ......................... [lun0 block/ceph_lun0 (rw)]
>   |     | o- iqn.esxhost02 .................................... [Mapped LUNs: 1]
>   |     | | o- mapped_lun0 ......................... [lun0 block/ceph_lun0 (rw)]
>   |     o- luns ...................................................... [LUNs: 1]
>   |     | o- lun0 ............................... [block/ceph_lun0 (/dev/loop0)]
>   |     o- portals ................................................ [Portals: 1]
>   |       o- xxx.xxx.xxx.xxx:3260 ......................................... [OK]
>   o- loopback ..................................................... [Targets: 0]
> 
> ---saveconfig.json for mounted ext4 config---
> 
> {
>   "fabric_modules": [], 
>   "storage_objects": [
>     {
>       "attributes": {
>         "block_size": 512, 
>         "emulate_dpo": 0, 
>         "emulate_fua_read": 0, 
>         "emulate_fua_write": 1, 
>         "emulate_model_alias": 1, 
>         "emulate_rest_reord": 0, 
>         "emulate_tas": 1, 
>         "emulate_tpu": 0, 
>         "emulate_tpws": 0, 
>         "emulate_ua_intlck_ctrl": 0, 
>         "emulate_write_cache": 1, 
>         "enforce_pr_isids": 1, 
>         "fabric_max_sectors": 8192, 
>         "is_nonrot": 0, 
>         "max_unmap_block_desc_count": 1, 
>         "max_unmap_lba_count": 8192, 
>         "max_write_same_len": 4096, 
>         "optimal_sectors": 8192, 
>         "queue_depth": 128, 
>         "unmap_granularity": 1, 
>         "unmap_granularity_alignment": 0
>       }, 
>       "dev": "/mnt/ceph_perf_test/mounted_rbd_img_test.img", 
>       "name": "mounted_rbd_img_test", 
>       "plugin": "fileio", 
>       "size": 8589934592, 
>       "write_back": true, 
>       "wwn": "xxxx-xxxx-xxxx-xxxx"
>     }
>   ], 
>   "targets": [
>     {
>       "fabric": "iscsi", 
>       "tpgs": [
>         {
>           "attributes": {
>             "authentication": 0, 
>             "cache_dynamic_acls": 0, 
>             "default_cmdsn_depth": 16, 
>             "demo_mode_write_protect": 1, 
>             "generate_node_acls": 0, 
>             "login_timeout": 15, 
>             "netif_timeout": 2, 
>             "prod_mode_write_protect": 0
>           }, 
>           "enable": true, 
>           "luns": [
>             {
>               "index": 0, 
>               "storage_object": "/backstores/fileio/mounted_rbd_img_test"
>             } 
>           ], 
>           "node_acls": [
>             {
>               "attributes": {
>                 "dataout_timeout": 3, 
>                 "dataout_timeout_retries": 5, 
>                 "default_erl": 0, 
>                 "nopin_response_timeout": 30, 
>                 "nopin_timeout": 15, 
>                 "random_datain_pdu_offsets": 0, 
>                 "random_datain_seq_offsets": 0, 
>                 "random_r2t_offsets": 0
>               }, 
>               "mapped_luns": [
>                 {
>                   "index": 0, 
>                   "tpg_lun": 0, 
>                   "write_protect": false
>                 } 
>               ], 
>               "node_wwn": "iqn.centoshost01"
>             } 
>           ], 
>           "parameters": {
>             "AuthMethod": "CHAP,None", 
>             "DataDigest": "CRC32C,None", 
>             "DataPDUInOrder": "Yes", 
>             "DataSequenceInOrder": "Yes", 
>             "DefaultTime2Retain": "20", 
>             "DefaultTime2Wait": "2", 
>             "ErrorRecoveryLevel": "0", 
>             "FirstBurstLength": "65536", 
>             "HeaderDigest": "CRC32C,None", 
>             "IFMarkInt": "2048~65535", 
>             "IFMarker": "No", 
>             "ImmediateData": "Yes", 
>             "InitialR2T": "Yes", 
>             "MaxBurstLength": "262144", 
>             "MaxConnections": "1", 
>             "MaxOutstandingR2T": "1", 
>             "MaxRecvDataSegmentLength": "8192", 
>             "MaxXmitDataSegmentLength": "262144", 
>             "OFMarkInt": "2048~65535", 
>             "OFMarker": "No", 
>             "TargetAlias": "LIO Target"
>           }, 
>           "portals": [
>             {
>               "ip_address": "xxx.xxx.xxx.xxx", 
>               "iser": false, 
>               "port": 3260
>             }
>           ], 
>           "tag": 1
>         }
>       ], 
>       "wwn": "iqn.gateway1_01"
>     }
>   ]
> }
> 
> ---saveconfig.json for raw & loop targets---
> 
> {
>   "fabric_modules": [], 
>   "storage_objects": [
>     {
>       "attributes": {
>         "block_size": 512, 
>         "emulate_dpo": 0, 
>         "emulate_fua_read": 0, 
>         "emulate_fua_write": 1, 
>         "emulate_model_alias": 1, 
>         "emulate_rest_reord": 0, 
>         "emulate_tas": 1, 
>         "emulate_tpu": 0, 
>         "emulate_tpws": 0, 
>         "emulate_ua_intlck_ctrl": 0, 
>         "emulate_write_cache": 0, 
>         "enforce_pr_isids": 1, 
>         "fabric_max_sectors": 8192, 
>         "is_nonrot": 0, 
>         "max_unmap_block_desc_count": 0, 
>         "max_unmap_lba_count": 0, 
>         "max_write_same_len": 65535, 
>         "optimal_sectors": 8192, 
>         "queue_depth": 128, 
>         "unmap_granularity": 0, 
>         "unmap_granularity_alignment": 0
>       }, 
>       "dev": "/dev/rbd/vmiscsi/noloop00", 
>       "name": "ceph_noloop00", 
>       "plugin": "block", 
>       "readonly": false, 
>       "write_back": false, 
>       "wwn": "xxxx-xxxx-xxxx"
>     }, 
>     {
>       "attributes": {
>         "block_size": 512, 
>         "emulate_dpo": 0, 
>         "emulate_fua_read": 0, 
>         "emulate_fua_write": 1, 
>         "emulate_model_alias": 1, 
>         "emulate_rest_reord": 0, 
>         "emulate_tas": 1, 
>         "emulate_tpu": 0, 
>         "emulate_tpws": 0, 
>         "emulate_ua_intlck_ctrl": 0, 
>         "emulate_write_cache": 0, 
>         "enforce_pr_isids": 1, 
>         "fabric_max_sectors": 8192, 
>         "is_nonrot": 0, 
>         "max_unmap_block_desc_count": 0, 
>         "max_unmap_lba_count": 0, 
>         "max_write_same_len": 65535, 
>         "optimal_sectors": 8192, 
>         "queue_depth": 128, 
>         "unmap_granularity": 0, 
>         "unmap_granularity_alignment": 0
>       }, 
>       "dev": "/dev/loop0", 
>       "name": "ceph_lun0", 
>       "plugin": "block", 
>       "readonly": false, 
>       "write_back": false, 
>       "wwn": "yyyy-yyyy-yyyy"
>     }
>   ], 
>   "targets": [
>     {
>       "fabric": "iscsi", 
>       "tpgs": [
>         {
>           "attributes": {
>             "authentication": 0, 
>             "cache_dynamic_acls": 0, 
>             "default_cmdsn_depth": 16, 
>             "demo_mode_write_protect": 1, 
>             "generate_node_acls": 0, 
>             "login_timeout": 15, 
>             "netif_timeout": 2, 
>             "prod_mode_write_protect": 0
>           }, 
>           "enable": true, 
>           "luns": [
>             {
>               "index": 0, 
>               "storage_object": "/backstores/block/ceph_noloop00"
>             }
>           ], 
>           "node_acls": [
>             {
>               "attributes": {
>                 "dataout_timeout": 3, 
>                 "dataout_timeout_retries": 5, 
>                 "default_erl": 0, 
>                 "nopin_response_timeout": 30, 
>                 "nopin_timeout": 15, 
>                 "random_datain_pdu_offsets": 0, 
>                 "random_datain_seq_offsets": 0, 
>                 "random_r2t_offsets": 0
>               }, 
>               "mapped_luns": [
>                 {
>                   "index": 0, 
>                   "tpg_lun": 0, 
>                   "write_protect": false
>                 }
>               ], 
>               "node_wwn": "iqn.esxhost01"
>             }, 
>             {
>               "attributes": {
>                 "dataout_timeout": 3, 
>                 "dataout_timeout_retries": 5, 
>                 "default_erl": 0, 
>                 "nopin_response_timeout": 30, 
>                 "nopin_timeout": 15, 
>                 "random_datain_pdu_offsets": 0, 
>                 "random_datain_seq_offsets": 0, 
>                 "random_r2t_offsets": 0
>               }, 
>               "mapped_luns": [
>                 {
>                   "index": 0, 
>                   "tpg_lun": 0, 
>                   "write_protect": false
>                 }
>               ], 
>               "node_wwn": "iqn.esxhost02"
>             }
>           ], 
>           "parameters": {
>             "AuthMethod": "CHAP,None", 
>             "DataDigest": "CRC32C,None", 
>             "DataPDUInOrder": "Yes", 
>             "DataSequenceInOrder": "Yes", 
>             "DefaultTime2Retain": "20", 
>             "DefaultTime2Wait": "2", 
>             "ErrorRecoveryLevel": "0", 
>             "FirstBurstLength": "65536", 
>             "HeaderDigest": "CRC32C,None", 
>             "IFMarkInt": "2048~65535", 
>             "IFMarker": "No", 
>             "ImmediateData": "Yes", 
>             "InitialR2T": "Yes", 
>             "MaxBurstLength": "262144", 
>             "MaxConnections": "1", 
>             "MaxOutstandingR2T": "1", 
>             "MaxRecvDataSegmentLength": "8192", 
>             "MaxXmitDataSegmentLength": "262144", 
>             "OFMarkInt": "2048~65535", 
>             "OFMarker": "No", 
>             "TargetAlias": "LIO Target"
>           }, 
>           "portals": [
>             {
>               "ip_address": "xxx.xxx.xxx.xxx", 
>               "iser": false, 
>               "port": 3260
>             }
>           ], 
>           "tag": 1
>         }
>       ], 
>       "wwn": "iqn.gateway2_01"
>     }, 
>     {
>       "fabric": "iscsi", 
>       "tpgs": [
>         {
>           "attributes": {
>             "authentication": 0, 
>             "cache_dynamic_acls": 0, 
>             "default_cmdsn_depth": 16, 
>             "demo_mode_write_protect": 1, 
>             "generate_node_acls": 0, 
>             "login_timeout": 15, 
>             "netif_timeout": 2, 
>             "prod_mode_write_protect": 0
>           }, 
>           "enable": true, 
>           "luns": [
>             {
>               "index": 0, 
>               "storage_object": "/backstores/block/ceph_lun0"
>             }
>           ], 
>           "node_acls": [
>             {
>               "attributes": {
>                 "dataout_timeout": 3, 
>                 "dataout_timeout_retries": 5, 
>                 "default_erl": 0, 
>                 "nopin_response_timeout": 30, 
>                 "nopin_timeout": 15, 
>                 "random_datain_pdu_offsets": 0, 
>                 "random_datain_seq_offsets": 0, 
>                 "random_r2t_offsets": 0
>               }, 
>               "mapped_luns": [ 
>                 {
>                   "index": 0, 
>                   "tpg_lun": 0, 
>                   "write_protect": false
>                 }
>               ], 
>               "node_wwn": "iqn.esxhost01"
>             }, 
>             {
>               "attributes": {
>                 "dataout_timeout": 3, 
>                 "dataout_timeout_retries": 5, 
>                 "default_erl": 0, 
>                 "nopin_response_timeout": 30, 
>                 "nopin_timeout": 15, 
>                 "random_datain_pdu_offsets": 0, 
>                 "random_datain_seq_offsets": 0, 
>                 "random_r2t_offsets": 0
>               }, 
>               "mapped_luns": [
>                 {
>                   "index": 0, 
>                   "tpg_lun": 0, 
>                   "write_protect": false
>                 }
>               ], 
>               "node_wwn": "iqn.esxhost02"
>             }
>           ], 
>           "parameters": {
>             "AuthMethod": "CHAP,None", 
>             "DataDigest": "CRC32C,None", 
>             "DataPDUInOrder": "Yes", 
>             "DataSequenceInOrder": "Yes", 
>             "DefaultTime2Retain": "20", 
>             "DefaultTime2Wait": "2", 
>             "ErrorRecoveryLevel": "0", 
>             "FirstBurstLength": "65536", 
>             "HeaderDigest": "CRC32C,None", 
>             "IFMarkInt": "2048~65535", 
>             "IFMarker": "No", 
>             "ImmediateData": "Yes", 
>             "InitialR2T": "Yes", 
>             "MaxBurstLength": "262144", 
>             "MaxConnections": "1", 
>             "MaxOutstandingR2T": "1", 
>             "MaxRecvDataSegmentLength": "8192", 
>             "MaxXmitDataSegmentLength": "262144", 
>             "OFMarkInt": "2048~65535", 
>             "OFMarker": "No", 
>             "TargetAlias": "LIO Target"
>           }, 
>           "portals": [
>             {
>               "ip_address": "xxx.xxx.xxx.xxx", 
>               "iser": false, 
>               "port": 3260
>             }
>           ], 
>           "tag": 1
>         }
>       ], 
>       "wwn": "iqn.gateway2_02"
>     }
>   ]
> }
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

