tustvold commented on issue #4658: URL: https://github.com/apache/datafusion/issues/4658#issuecomment-2679171994
> I assume by row format you mean [arrow-row](https://arrow.apache.org/rust/arrow_row/index.html), however it's not clear to me if there's a standard way to serialize these to a file. I could create something simple which just writes the streams of rows as binary (probably a length then the bytes, repeated?).

I suspect this is what I was going for, although it was 2 years ago so can't confess to really remembering and a lot has likely changed since then.

> It looks like an alternative to using the row format in datafusion might be to support delta-encoded dictionaries in arrow-rs. https://github.com/apache/arrow-rs/issues/6783

This is likely an approach, although it would involve re-encoding the dictionaries as there is no mechanism to remap an existing key.

IMO I'd be tempted to start a discussion on the mailing list about why this constraint exists, it seems somewhat nonsensical to me given that the footer identifies where the dictionary blocks are, and therefore it is trivial to determine which dictionary to use. If anything the support for delta dictionaries is more surprising...
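
For reference, the "length then the bytes, repeated" idea from the quoted comment could look roughly like the sketch below. This is an illustration only, not an established format or anything arrow-rs or DataFusion provides: the function names `write_rows` and `read_rows`, the little-endian `u32` length prefix, and the use of plain `Vec<u8>` values in place of real arrow-row rows are all assumptions made for the example.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: write each row as a little-endian u32 length
/// followed by the raw row bytes.
fn write_rows<W: Write>(out: &mut W, rows: &[Vec<u8>]) -> io::Result<()> {
    for row in rows {
        // The u32 prefix is an assumption of this sketch; reject rows it cannot describe.
        if row.len() > u32::MAX as usize {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "row too large"));
        }
        out.write_all(&(row.len() as u32).to_le_bytes())?;
        out.write_all(row)?;
    }
    Ok(())
}

/// Hypothetical helper: read length-prefixed rows until the input ends.
fn read_rows<R: Read>(input: &mut R) -> io::Result<Vec<Vec<u8>>> {
    let mut rows = Vec::new();
    loop {
        let mut len_buf = [0u8; 4];
        // Zero bytes here means a clean end of input; anything after that
        // must be a complete length prefix and a complete row.
        match input.read(&mut len_buf[..1])? {
            0 => return Ok(rows),
            _ => input.read_exact(&mut len_buf[1..])?,
        }
        let len = u32::from_le_bytes(len_buf) as usize;
        let mut row = vec![0u8; len];
        input.read_exact(&mut row)?;
        rows.push(row);
    }
}
```

A real spill file would probably also need to record the schema and sort options used to produce the rows, since the row bytes are not self-describing; that part is left out here.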