Hi Jean-Luc,

I was able to run your code successfully on my machine, but it used considerably more memory (~30GB) and took longer to execute (~30s) than expected. Could you please file a JIRA ticket [1] so that someone can look into it? The docs have a bug report guide [2] which might be helpful. The discrepancy in behavior between letting Arrow handle the join and letting DuckDB handle it isn't ideal and can be investigated in the ticket.
Thanks,
Bryce

[1] https://issues.apache.org/jira/projects/ARROW/issues
[2] https://arrow.apache.org/docs/developers/bug_reports.html#bug-reports