vbarua commented on PR #13888: URL: https://github.com/apache/datafusion/pull/13888#issuecomment-2560209539
There are already some tests for TPCH functionality in https://github.com/apache/datafusion/blob/main/datafusion/substrait/tests/cases/consumer_integration.rs, but IMO those are weaker than what you're proposing because they only verify that we can read the TPCH plans that have been generated in https://github.com/substrait-io/consumer-testing/tree/7c1f5f1876f00c2685f722b592dbd00030662d5d/substrait_consumer/testdata/integration/tpch The full roundtrip you're proposing is more comprehensive and lets us find bugs in both consumer and producer. > In most cases it looks like we need to support more of the LogicalPlan nodes when serializing to substrait. It looks like your approach has already found some missing features 🐛 🔨 > After this the recreated plan should be the same In many cases plans roundtrip the same, but not necessarily in every case. DataFusion might consume a plan and then emit a plan that is semantically equivalent but has different nodes. For example, all emit kind remaps might be converted into projections. It is a desirable property that plans roundtrip the same, and probably something we can work towards in most cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org