alamb commented on PR #16430: URL: https://github.com/apache/datafusion/pull/16430#issuecomment-2985661485
I tried making a reproducer but I could not reproduce the wrong results or panic reported in @andygrove 's comment https://github.com/apache/datafusion/issues/16308#issuecomment-2949516445:

Here is what I tried:

Data: [tenk.csv](https://github.com/user-attachments/files/20804065/tenk.csv)

Repro

```sql
create external table tenk1 (
  unique1 int,
  unique2 int,
  two int,
  four int,
  ten int,
  twenty int,
  hundred int,
  thousand int,
  twothousand int,
  fivethous int,
  tenthous int,
  odd int,
  even int,
  stringu1 string,
  stringu2 string,
  string4 string
)
stored as CSV
location 'tenk.csv'
OPTIONS('has_header' 'false','format.delimiter' 9);

SELECT * from tenk1 limit 10;

SELECT COUNT(*) OVER () FROM tenk1 WHERE unique2 < 10
```

But that seems to work just fine:

```shell
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Downloads$ datafusion-cli -f repro.sql
DataFusion CLI v48.0.0
0 row(s) fetched.
Elapsed 0.001 seconds.

+---------+---------+-----+------+-----+--------+---------+----------+-------------+-----------+----------+-----+------+----------+----------+---------+
| unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4 |
+---------+---------+-----+------+-----+--------+---------+----------+-------------+-----------+----------+-----+------+----------+----------+---------+
| 8800    | 0       | 0   | 0    | 0   | 0      | 0       | 800      | 800         | 3800      | 8800     | 0   | 1    | MAAAAA   | AAAAAA   | AAAAxx  |
| 1891    | 1       | 1   | 3    | 1   | 11     | 91      | 891      | 1891        | 1891      | 1891     | 182 | 183  | TUAAAA   | BAAAAA   | HHHHxx  |
| 3420    | 2       | 0   | 0    | 0   | 0      | 20      | 420      | 1420        | 3420      | 3420     | 40  | 41   | OBAAAA   | CAAAAA   | OOOOxx  |
| 9850    | 3       | 0   | 2    | 0   | 10     | 50      | 850      | 1850        | 4850      | 9850     | 100 | 101  | WOAAAA   | DAAAAA   | VVVVxx  |
| 7164    | 4       | 0   | 0    | 4   | 4      | 64      | 164      | 1164        | 2164      | 7164     | 128 | 129  | OPAAAA   | EAAAAA   | AAAAxx  |
| 8009    | 5       | 1   | 1    | 9   | 9      | 9       | 9        | 9           | 3009      | 8009     | 18  | 19   | BWAAAA   | FAAAAA   | HHHHxx  |
| 5057    | 6       | 1   | 1    | 7   | 17     | 57      | 57       | 1057        | 57        | 5057     | 114 | 115  | NMAAAA   | GAAAAA   | OOOOxx  |
| 6701    | 7       | 1   | 1    | 1   | 1      | 1       | 701      | 701         | 1701      | 6701     | 2   | 3    | TXAAAA   | HAAAAA   | VVVVxx  |
| 4321    | 8       | 1   | 1    | 1   | 1      | 21      | 321      | 321         | 4321      | 4321     | 42  | 43   | FKAAAA   | IAAAAA   | AAAAxx  |
| 3043    | 9       | 1   | 3    | 3   | 3      | 43      | 43       | 1043        | 3043      | 3043     | 86  | 87   | BNAAAA   | JAAAAA   | HHHHxx  |
+---------+---------+-----+------+-----+--------+---------+----------+-------------+-----------+----------+-----+------+----------+----------+---------+
10 row(s) fetched.
Elapsed 0.007 seconds.

+-------------------------------------------------------------------+
| count(*) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
+-------------------------------------------------------------------+
| 10                                                                |
| 10                                                                |
| 10                                                                |
| 10                                                                |
| 10                                                                |
| 10                                                                |
| 10                                                                |
| 10                                                                |
| 10                                                                |
| 10                                                                |
+-------------------------------------------------------------------+
10 row(s) fetched.
Elapsed 0.004 seconds.
```

Notes for myself of where this came from:

https://github.com/apache/spark/blob/a38d1cef73eda8ab765dc168284b9c113c237a8e/sql/core/src/test/resources/sql-tests/inputs/postgreSQL/window_part1.sql#L50

```sql
SELECT COUNT(*) OVER () FROM tenk1 WHERE unique2 < 10
```

I did some digging and found the table definition is https://github.com/apache/spark/blob/a38d1cef73eda8ab765dc168284b9c113c237a8e/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L536-L562

```scala
session
  .read
  .format("csv")
  .options(Map("delimiter" -> "\t", "header" -> "false"))
  .schema(
    """
      |unique1 int,
      |unique2 int,
      |two int,
      |four int,
      |ten int,
      |twenty int,
      |hundred int,
      |thousand int,
      |twothousand int,
      |fivethous int,
      |tenthous int,
      |odd int,
      |even int,
      |stringu1 string,
      |stringu2 string,
      |string4 string
    """.stripMargin)
  .load(testFile("test-data/postgresql/onek.data"))
```

The data is here: https://github.com/apache/spark/blob/a38d1cef73eda8ab765dc168284b9c113c237a8e/sql/core/src/test/resources/test-data/postgresql/tenk.data

--
This is an automated message from the Apache Git
Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org