kosiew opened a new issue, #20007: URL: https://github.com/apache/datafusion/issues/20007
### Problem

ClickBench setup knowledge is currently scattered across multiple locations:

1. `HITS_VIEW_DDL` constant in `benchmarks/src/clickbench.rs` with inline comments
2. View creation SQL in `datafusion/sqllogictest/test_files/clickbench.slt`
3. Brief mention in `benchmarks/README.md` (without critical setup details)

This makes it difficult for users to understand:

- Why the EventDate column needs special handling
- When and why to use the `binary_as_string` option
- How to set up ClickBench correctly for DataFusion

### Background

Related to #19881. The fix introduces a view that transforms EventDate from UInt16 (days since epoch) to a proper DATE type. However, the knowledge needed to run ClickBench effectively is duplicated across files.

"I worry that we are spreading the knowledge needed to run DataFusion on ClickBench effectively all over the place. For example, this view definition is now copied twice." - [PR comment](https://github.com/apache/datafusion/pull/19881#discussion_r2725546501)

### Proposed Solution

Add comprehensive documentation to the existing ClickBench section in `benchmarks/README.md` that serves as the single source of truth. This documentation should cover:

1. **EventDate UInt16 → DATE transformation** - Why it's needed and how it works
2. **binary_as_string option** - When and why it's required
3. **Complete setup example** - Copy-pasteable SQL showing the full setup
4. **Clarifications** - Differences between full dataset and test subsets
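
For illustration only, the arithmetic behind the EventDate transformation mentioned under Background can be sketched in a few lines of Python. This is not the view definition from the PR. It assumes "epoch" means the Unix epoch (1970-01-01), and the helper name `event_date_from_days` and the sample values are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "days since epoch" refers to the Unix epoch.
EPOCH = date(1970, 1, 1)


def event_date_from_days(days: int) -> date:
    """Convert a UInt16 day count since the epoch into a calendar date.

    Hypothetical helper for illustration; it mirrors the idea of the
    UInt16 -> DATE transformation, not DataFusion's implementation.
    """
    # A UInt16 column can only hold values in the range 0..=65535.
    if not 0 <= days <= 0xFFFF:
        raise ValueError(f"not a UInt16 value: {days}")
    return EPOCH + timedelta(days=days)


if __name__ == "__main__":
    # Arbitrary sample day counts, chosen only to show the mapping.
    for raw in (0, 365, 15901):
        print(raw, "->", event_date_from_days(raw).isoformat())
```

Without such a conversion the column would be read as a plain integer, so date comparisons in the benchmark queries would presumably not behave as intended; that is the motivation the issue asks the README to spell out.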
