Thanks for the reply. Please find my responses inline.
-Kaustubh

-----Original Message-----
From: Rick Jones [mailto:rick.jon...@hpe.com]
Sent: Tuesday, March 29, 2016 1:43 PM
To: openstack@lists.openstack.org
Subject: Re: [Openstack] Compute downloading corrupted image from Glance

On 03/29/2016 10:17 AM, Kaustubh Kelkar wrote:
> Every time I tried to download the image on the compute, I get a new
> hash value (albeit, a wrong one).

On the compute node, what is the type of NIC and its driver and such?

[Kaustubh] It is an Intel X710 NIC with the i40e driver. The NIC is part of the integrated card on a Dell R730.

lspci -v | grep -A 1 Ethernet

[Kaustubh] (Output redacted to show only the relevant interface)
01:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X710 Adapter (rev 01)
        Subsystem: Dell Device 0000

ethtool -i <interfacename>

[Kaustubh]
root@dchi:/home/kkelkar# ethtool -i em2
driver: i40e
version: 1.4.25
firmware-version: 4.41 0x80001863 16.5.20
bus-info: 0000:01:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

And are any of the stateless offloads enabled?

ethtool -k <interfacename>

[Kaustubh]
root@dchi:/home/kkelkar# ethtool -k em2
Features for em2:
rx-checksumming: off
tx-checksumming: off
        tx-checksum-ipv4: off
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: off
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off
scatter-gather: off
        tx-scatter-gather: off
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
        tx-tcp-segmentation: off
        tx-tcp-ecn-segmentation: off
        tx-tcp6-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: off
highdma: on
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Those would include checksum offload, and things built on top of it like TSO, GSO, LRO and/or GRO. If you find that checksum offload is enabled, and you disable it, does the corrupt image download problem go away? If so, you have a problem with your NIC and/or its driver getting the offloads wrong and/or corrupting the traffic in a place outside the protection of the offloaded checksumming.

One of the central assumptions with the likes of checksum offload in a NIC is that anything "above" the checksum offload in the NIC has some sort of data protection: at least parity, if not ECC. This includes components in the NIC itself, the I/O bus, etc.
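
For anyone repeating this check across several nodes, the comparison of `ethtool -k` output can be scripted. The following is a minimal Python sketch, not something from the original thread: the function names, the set of watched features, and the shortened SAMPLE text (copied from the em2 output above) are all illustrative assumptions. It only parses text, so the output of `ethtool -k <interfacename>` has to be captured separately.

```python
# Features corresponding to checksum offload and the offloads built on it
# (TSO, GSO, GRO, LRO). Illustrative selection, not an exhaustive list.
STATELESS_OFFLOADS = {
    "rx-checksumming",
    "tx-checksumming",
    "scatter-gather",
    "tcp-segmentation-offload",
    "generic-segmentation-offload",
    "generic-receive-offload",
    "large-receive-offload",
}


def parse_ethtool_features(text):
    """Parse `ethtool -k` output into {feature: (enabled, fixed)}."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue  # no "name: value" pair on this line
        tokens = value.split()
        if not tokens or tokens[0] not in ("on", "off"):
            continue  # e.g. the "Features for em2:" header
        features[name.strip()] = (tokens[0] == "on", "[fixed]" in tokens)
    return features


def enabled_offloads(text, watched=STATELESS_OFFLOADS):
    """Return the watched features that are still reported as 'on'."""
    features = parse_ethtool_features(text)
    return sorted(n for n in watched if features.get(n, (False, False))[0])


# Shortened copy of the em2 output quoted in this message.
SAMPLE = """\
Features for em2:
rx-checksumming: off
tx-checksumming: off
        tx-checksum-ipv4: off
scatter-gather: off
tcp-segmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-gre-segmentation: on
"""

print(enabled_offloads(SAMPLE))  # prints []
```

An empty list means none of the watched offloads are active on that interface. Features outside the watched set, such as tx-gre-segmentation here, are deliberately ignored by this sketch.
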

If disabling checksum offload on the compute node doesn't resolve the matter, you might consider the same on the controller.

[Kaustubh] I ended up disabling checksumming, TSO, GSO and GRO on both the controller and the compute node, so the ethtool output looks as above. Now the problem can only be reproduced intermittently: at times, the compute node still gets a corrupted image.

rick jones
(disabling checksum offload will likely also disable the offloads which depend upon it.)

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack