Looking at the code, is it possible that it is not the guest causing trouble here, but the multiwrite_merge code?
From what I see, the only limit it applies when merging requests is the
number of IOVs. Any thoughts? Mine are:

a) Introducing bs->bl.max_request_size and setting merge = 0 if the
   result would be too big. Default the max request size to 32768
   sectors (see below).

b) Hardcoding the limit in multiwrite_merge for now, limiting the
   merged size to 16MB (32768 sectors). This is the limit we already
   use in bdrv_co_discard and bdrv_co_write_zeroes if we don't know
   better.

A rough sketch of (b) follows below the quoted thread.

Peter

On 02.09.2014 17:28, ronnie sahlberg wrote:
> That is one big request. I assume the device reports "no limit" in
> the VPD page, so we cannot state it is the guest/application going
> beyond the allowed limit?
>
> I am not entirely sure what meaning the target assigns to Protocol
> Error here. It could be that ~100M is way higher than MaxBurstLength?
> What is the MaxBurstLength that was reported by the server during
> login negotiation?
> If so, we should make libiscsi check the MaxBurstLength and fail the
> request early. We would still fail the I/O, so it will not really
> solve much, but at least we would not send the request to the server.
>
> Best would probably be to take the smaller of a non-zero
> Block-Limits.max_transfer_length and iscsi-MaxBurstLength/block-size
> and pass this back to the guest in the emulated Block-Limits VPD.
> At least then you have tried to tell the guest "never do SCSI I/O
> bigger than this".
>
> I.e. even if the target reports BlockLimits.MaxTransferLength == 0 ==
> no limit to QEMU, QEMU should probably take the iSCSI transport limit
> into account and pass it on to the guest by scaling the emulated
> BlockLimits page to the maximum that MaxBurstLength allows.
>
> Then if BTRFS or SG_IO in the guest ignores the BlockLimits, it is
> clearly a guest problem.
>
> (A different interpretation of Protocol Error could be a mismatch
> between the iSCSI expected data transfer length and the SCSI transfer
> length, but that should result in residuals, not a protocol error.)
>
> Hypothetically there could be targets that support really huge
> MaxBurstLengths > 32MB. For those you probably want to switch to
> WRITE16 when the SCSI transfer length goes > 0xffff:
>
> - if (iscsilun->use_16_for_rw) {
> + if (iscsilun->use_16_for_rw || num_sectors > 0xffff) {
>
> regards
> ronnie sahlberg
>
> On Mon, Sep 1, 2014 at 8:21 AM, Peter Lieven <p...@kamp.de> wrote:
>> On 17.06.2014 13:46, Paolo Bonzini wrote:
>>> On 17/06/2014 13:37, Peter Lieven wrote:
>>>> On 17.06.2014 13:15, Paolo Bonzini wrote:
>>>>> On 17/06/2014 08:14, Peter Lieven wrote:
>>>>>> BTW, while debugging a case with a bigger storage supplier I found
>>>>>> that open-iscsi seems to do exactly this non-deterministic
>>>>>> behaviour. I have a 3TB LUN. If I access sectors < 2TB it uses
>>>>>> READ10/WRITE10, and if I go beyond 2TB it changes to
>>>>>> READ16/WRITE16.
>>>>>
>>>>> Isn't that exactly what your latest patch does for >64K sector
>>>>> writes? :)
>>>>
>>>> Not exactly, we choose the default by checking the LUN size: 10-byte
>>>> CDBs for < 2TB and 16-byte CDBs otherwise.
>>>
>>> Yeah, I meant introducing the non-determinism.
>>>
>>>> My latest patch makes an exception if a request is bigger than 64K
>>>> sectors and switches to 16-byte requests. These would otherwise end
>>>> in an I/O error.
>>>
>>> It could also be split at the block layer, like we do for unmap. I
>>> think there's also a maximum transfer size somewhere in the VPD; we
>>> could do READ16/WRITE16 if it is >64K sectors.
>>
>> It seems that there might be a real-world example where Linux issues
>> >32MB write requests. Maybe someone familiar with btrfs can advise.
>> I see iSCSI Protocol Errors in my logs:
>>
>> Sep 1 10:10:14 libiscsi:0 PDU header: 01 a1 00 00 00 01 00 00 00 00 00 00
>> 00 00 00 00 00 00 00 07 06 8f 30 00 00 00 00 06 00 00 00 0a 2a 00 01 09 9e
>> 50 00 47 98 00 00 00 00 00 00 00 [XXX]
>> Sep 1 10:10:14 qemu-2.0.0: iSCSI: Failed to write10 data to iSCSI lun.
>> Request was rejected with reason: 0x04 (Protocol Error)
>>
>> Looking at the headers, the xferlen in the iSCSI PDU is 110047232
>> bytes, which is 214936 sectors. 214936 % 65536 = 18328, which is
>> exactly the number of blocks in the SCSI WRITE10 CDB.
>>
>> Can someone advise whether this is something that btrfs can cause, or
>> whether I have to blame the customer for issuing very big write
>> requests with Direct I/O?
>>
>> The user sees something like this in the log:
>>
>> [34640.489284] BTRFS: bdev /dev/vda2 errs: wr 8232, rd 0, flush 0, corrupt 0, gen 0
>> [34640.490379] end_request: I/O error, dev vda, sector 17446880
>> [34640.491251] end_request: I/O error, dev vda, sector 5150144
>> [34640.491290] end_request: I/O error, dev vda, sector 17472080
>> [34640.492201] end_request: I/O error, dev vda, sector 17523488
>> [34640.492201] end_request: I/O error, dev vda, sector 17536592
>> [34640.492201] end_request: I/O error, dev vda, sector 17599088
>> [34640.492201] end_request: I/O error, dev vda, sector 17601104
>> [34640.685611] end_request: I/O error, dev vda, sector 15495456
>> [34640.685650] end_request: I/O error, dev vda, sector 7138216
>>
>> Thanks,
>> Peter
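
For reference, decoding the PDU header quoted above (this is just
reading off the fields of a standard iSCSI BHS): bytes 20-23 carry the
Expected Data Transfer Length, 06 8f 30 00 = 0x068f3000 = 110047232
bytes = 214936 sectors of 512 bytes. The CDB starts at byte 32:

  2a 00 01 09 9e 50 00 47 98 00

i.e. WRITE10 with LBA 0x01099e50 and a 16-bit transfer length of
0x4798 = 18328 blocks. 214936 mod 65536 = 18328, so the 16-bit SCSI
transfer length wrapped around while the 32-bit iSCSI expected data
transfer length did not.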
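
And here is the rough sketch of (b) promised above. Untested; it
assumes the merge flag logic in multiwrite_merge() in block.c and
reuses its reqs/outidx/i/merge variables:

  #define MAX_MULTIWRITE_SECTORS 32768  /* 16 MB at 512-byte sectors */

  /* inside the merge loop of multiwrite_merge(), next to the existing
   * IOV_MAX check: refuse to merge if the combined request would get
   * too big.  Requests are sorted by sector, so the merged request
   * spans from reqs[outidx].sector to the larger of the two ends. */
  int64_t merged_end = MAX(reqs[outidx].sector + reqs[outidx].nb_sectors,
                           reqs[i].sector + reqs[i].nb_sectors);
  if (merged_end - reqs[outidx].sector > MAX_MULTIWRITE_SECTORS) {
      merge = 0;
  }

Variant (a) would do the same check against bs->bl.max_request_size
instead of the hardcoded constant, skipping it when the field is 0.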
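
Regarding passing the transport limit back to the guest: something
like this in block/iscsi.c is what I would try. Again an untested
sketch; the iscsi_get_max_burst_length() helper is made up, I would
have to check what libiscsi actually exposes for the negotiated value:

  /* after the Block Limits VPD has been read into iscsilun->bl */
  uint32_t max_xfer = iscsilun->bl.max_xfer_len;    /* 0 == no limit */
  uint32_t burst_sectors =
      iscsi_get_max_burst_length(iscsilun->iscsi) / iscsilun->block_size;

  /* clamp so that the emulated Block Limits VPD never advertises
   * more than the negotiated MaxBurstLength allows */
  if (max_xfer == 0 || (burst_sectors && burst_sectors < max_xfer)) {
      max_xfer = burst_sectors;
  }
  iscsilun->bl.max_xfer_len = max_xfer;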