I haven't looked at the surrounding code. Are we processing all the COPY data as one long stream, or each field individually? If each pass covers much more than 128 bits, and we're happy to detect NUL errors only at the end (after wasting some work), then you could hoist that has_zero check out of the loop entirely. That removes a branch, though it's probably a correctly predicted one anyway.
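Do something like zero_accumulator = min(zero_accumulator, next_chunk) (a byte-wise unsigned minimum) in the loop, and then check the accumulator for zeros only once at the very end. A plain zero_accumulator & next_chunk would never miss a NUL, but it can report false positives, because two non-zero bytes in the same lane (e.g. 0x0F and 0xF0) AND to zero.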
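A minimal portable C sketch of the deferred check, assuming the data sits in one contiguous buffer scanned in 16-byte (128-bit) chunks. `has_nul_deferred` and `CHUNK_BYTES` are hypothetical names, not anything from the actual COPY code; the inner per-lane loop stands in for what would be a single vector min instruction (e.g. SSE2 `_mm_min_epu8` or NEON `vminq_u8`).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk width: 16 bytes = 128 bits, i.e. one SSE2/NEON register. */
#define CHUNK_BYTES 16

/*
 * Returns true if buf[0..len) contains a NUL byte. The main loop does no
 * zero test and never exits early; zeros are checked once, after the loop.
 */
static bool
has_nul_deferred(const uint8_t *buf, size_t len)
{
    uint8_t acc[CHUNK_BYTES];
    size_t  i = 0;

    /* Start every lane at 0xFF so the minimum is driven only by real data. */
    memset(acc, 0xFF, sizeof(acc));

    /* Lane-wise unsigned minimum: a lane reaches 0 iff it ever saw a NUL. */
    for (; i + CHUNK_BYTES <= len; i += CHUNK_BYTES)
    {
        for (size_t lane = 0; lane < CHUNK_BYTES; lane++)
        {
            uint8_t b = buf[i + lane];

            acc[lane] = (b < acc[lane]) ? b : acc[lane];
        }
    }

    /* The single deferred zero check over the accumulator. */
    for (size_t lane = 0; lane < CHUNK_BYTES; lane++)
    {
        if (acc[lane] == 0)
            return true;
    }

    /* Tail shorter than one chunk: plain scalar scan. */
    for (; i < len; i++)
    {
        if (buf[i] == 0)
            return true;
    }
    return false;
}
```

The trade-off is the one described above: a NUL near the start of a long buffer is still only reported after the whole buffer has been scanned. Using a minimum rather than AND keeps the final check exact, so a buffer holding 0x0F in one chunk and 0xF0 in the same lane of the next is correctly reported as NUL-free.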