I agree with independent validation.
On Tue, Feb 25, 2020 at 2:55 PM David Li <[email protected]> wrote: > > Hey Bryan, > > Thanks for looking into this issue. I would vote that we should > validate each batch independently, so we can catch issues related to > the structure of the data and not just the content. C++ doesn't do any > detection of empty batches per se, but on both ends it reads all the > data into a table, which would eliminate any empty batches. > > It also wouldn't be reasonable to stop sending batches that are empty, > because Flight lets you attach metadata to batches, and so an empty > batch might still have metadata that the client or server wants. > > Best, > David > > On 2/24/20, Bryan Cutler <[email protected]> wrote: > > While looking into Null type testing for ARROW-7899, a couple small issues > > came up regarding Flight integration testing with empty batches (row count > > == 0) that could be worked out with a quick discussion. It seems there is a > > small difference between the C++ and Java Flight servers when there are > > empty record batches at the end of a stream, more details in PR > > https://github.com/apache/arrow/pull/6476. > > > > The Java server sends all record batches, even the empty ones, and the test > > client verifies each of these batches matches the batches read from a JSON > > file. The C++ servers seems to recognize if the end of the stream is only > > empty batches (please correct me if I'm wrong) and will not serve them. > > This seems reasonable, as there is no more actual data left in the stream. > > The C++ test client reads all batches into a table, does the same for the > > JSON file, and compares final Tables. I also noticed that empty batches in > > the middle of the stream will be served. My questions are: > > > > 1) What is the expected behavior of a Flight server for empty record > > batches, can they be ignored and not sent to the Client? > > > > 2) Is it good enough to test against a final concatenation of all batches > > in the stream or should each batch be verified individually to ensure the > > server is sending out correctly batched data? > > > > Thanks, > > Bryan > >
