On Sat, Dec 06, 2014 at 11:51:56AM +0200, Saku Ytti wrote:
> a) one particular optic had slow I2C, and the vendor polled it more
> aggressively than it could respond. The vendor's polling code didn't handle
> errors reading from I2C, but instead crashed the whole linecard control-plane.
> The vendor claimed it's not a bug, because it didn't happen on their optic. I
> tried to explain to them that they cannot guarantee that I2C reads won't fail
> on their own optics, and that it's a serious problem, but I was unable to
> convince them to fix it.
> Now I am in possession of a good bunch of SFPs I can stick into your routers
> in colo, have them crash, and you won't have any clue why they crashed.
>
> b) a particular vendor had a bug in their SFP microcontroller where, after
> 2**31 hundredths of a second had passed, it started to write its uptime to
> the location where DDM temperature measurements are read. This was obvious
> from graphs, because it went linearly from -127 ... 127, then jumped back to
> -127.
> These optics caused no problems when seated on Vendor1; when seated on
> Vendor2 they caused link flapping, even two boxes away! (A-B-C, A having the
> problematic optic, B-C might flap.) Coincidentally, Vendor2 is the same as in
> case a); they didn't consider this a bug in their code.
> This was particularly funny: if you rebooted 100 boxes in a maintenance
> window, the bug would trigger at the same moment after 2**31 hundredths of a
> second, causing a potentially major outage.
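For scale, 2**31 hundredths of a second is about 248.55 days of uptime, which is why boxes rebooted in the same window would all hit the bug at the same moment. The sketch below is a hypothetical Python illustration, not the vendor's firmware: it computes that interval and shows how an ever-increasing value, if written into a DDM temperature field decoded the SFF-8472 way (big-endian signed 16-bit, 1/256 degC per unit), reads back as a linear ramp up to about +128 degC followed by a jump to -128 degC, roughly the -127 ... 127 sawtooth seen in the graphs. The report doesn't say which bits of the uptime landed in the field; the helper `counter_as_temperature` is made up for illustration and simply assumes the low 16 bits.

```python
import struct

TICKS_PER_SECOND = 100  # uptime counted in 1/100 s, per the report
WRAP_TICKS = 2 ** 31    # threshold at which the bug reportedly started


def seconds_until_wrap(ticks: int = WRAP_TICKS, hz: int = TICKS_PER_SECOND) -> float:
    """Seconds of uptime before a counter ticking at `hz` reaches `ticks`."""
    return ticks / hz


def decode_ddm_temperature(msb: int, lsb: int) -> float:
    """Decode a 2-byte DDM temperature: big-endian signed 16-bit, 1/256 degC units."""
    (raw,) = struct.unpack(">h", bytes([msb, lsb]))
    return raw / 256.0


def counter_as_temperature(value: int) -> float:
    """Hypothetical: the temperature a poller would report if the low 16 bits
    of an increasing counter were written into the temperature bytes."""
    low16 = value & 0xFFFF
    return decode_ddm_temperature(low16 >> 8, low16 & 0xFF)


secs = seconds_until_wrap()
print(f"{secs:.2f} s = {secs / 86400:.2f} days")  # 21474836.48 s = 248.55 days

# The decoded "temperature" climbs linearly, then wraps to the bottom of the range.
for v in (0x0000, 0x4000, 0x7FFF, 0x8000, 0xC000):
    print(hex(v), counter_as_temperature(v))
```

With 0x7FFF in the field the decoder reports 127.99609375 degC; one step later, at 0x8000, it reports -128.0 degC, so any poller that acts on temperature alarms sees a sudden swing across the whole range.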
Who is Vendor2?