zhuqi-lucas commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2987414899
Thank you @alamb! I am excited to share an update: I resolved the page index conflicts by adding a new API in arrow-rs that writes bytes to the buf. This keeps the buf-written metrics consistent, and since the page index also relies on buf-written, it is safe now. I have enabled the page index for the example, and the test results look good!

Until that change is merged, I am using this arrow-rs branch: https://github.com/apache/arrow-rs/pull/7714

The example prints the following logs:

```text
Writing values: [ByteArray { data: "foo" }, ByteArray { data: "bar" }, ByteArray { data: "foo" }]
Writing custom index at offset: 68, length: 7
Finished writing file to /var/folders/q7/zjtv8rvx2hz0_t_rjjq8p9k00000gp/T/.tmp9zCIJt/a.parquet
Writing values: [ByteArray { data: "baz" }, ByteArray { data: "qux" }]
Writing custom index at offset: 68, length: 7
Finished writing file to /var/folders/q7/zjtv8rvx2hz0_t_rjjq8p9k00000gp/T/.tmp9zCIJt/b.parquet
Writing values: [ByteArray { data: "foo" }, ByteArray { data: "quux" }, ByteArray { data: "quux" }]
Writing custom index at offset: 70, length: 8
Finished writing file to /var/folders/q7/zjtv8rvx2hz0_t_rjjq8p9k00000gp/T/.tmp9zCIJt/c.parquet
Reading index from /var/folders/q7/zjtv8rvx2hz0_t_rjjq8p9k00000gp/T/.tmp9zCIJt/a.parquet (size: 363)
Reading index at offset: 68, length: 7
Read distinct index for a.parquet: "a.parquet"
Reading index from /var/folders/q7/zjtv8rvx2hz0_t_rjjq8p9k00000gp/T/.tmp9zCIJt/b.parquet (size: 363)
Reading index at offset: 68, length: 7
Read distinct index for b.parquet: "b.parquet"
Reading index from /var/folders/q7/zjtv8rvx2hz0_t_rjjq8p9k00000gp/T/.tmp9zCIJt/c.parquet (size: 368)
Reading index at offset: 70, length: 8
Read distinct index for c.parquet: "c.parquet"
Filtering for category: foo
Pruned files: ["c.parquet", "a.parquet"]
+----------+
| category |
+----------+
| foo      |
| foo      |
| foo      |
+----------+
```
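For readers following the logs, here is a minimal sketch of the pruning idea they show: each file carries a small index of its distinct `category` values, and only files whose index contains the filter value are scanned. This is not the code from the PR. The function names are hypothetical, and the newline-separated encoding is an assumption; it is merely consistent with the logged index lengths (7, 7, and 8 bytes).

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: encode the distinct values of a column as
/// newline-separated UTF-8 bytes (assumed format, not taken from the PR).
fn serialize_distinct(values: &[&str]) -> Vec<u8> {
    // BTreeSet removes duplicates and gives a deterministic order.
    let distinct: BTreeSet<&str> = values.iter().copied().collect();
    distinct.into_iter().collect::<Vec<_>>().join("\n").into_bytes()
}

/// Hypothetical helper: decode the bytes produced by `serialize_distinct`.
fn deserialize_distinct(bytes: &[u8]) -> BTreeSet<String> {
    String::from_utf8_lossy(bytes)
        .lines()
        .map(str::to_owned)
        .collect()
}

/// Keep only the files whose distinct-value index contains `wanted`.
/// Files without the value cannot match the filter and are skipped.
fn files_to_scan<'a>(
    indexes: &[(&'a str, BTreeSet<String>)],
    wanted: &str,
) -> Vec<&'a str> {
    indexes
        .iter()
        .filter(|(_, distinct)| distinct.contains(wanted))
        .map(|(name, _)| *name)
        .collect()
}
```

With the three value lists from the logs, `files_to_scan` for `foo` keeps `a.parquet` and `c.parquet` and skips `b.parquet`, which matches the "Pruned files" line above.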