Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-21 Thread via GitHub
Spaarsh closed pull request #14129: Making the data_imdb and clickbench_1 functions atomic. URL: https://github.com/apache/datafusion/pull/14129 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-21 Thread via GitHub
Spaarsh commented on PR #14129: URL: https://github.com/apache/datafusion/pull/14129#issuecomment-2605437829 The suggested changes have been implemented and have been committed via another PR #14225. Hence I am closing this PR. -- This is an automated message from the Apache Git Service.

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-18 Thread via GitHub
Spaarsh commented on PR #14129: URL: https://github.com/apache/datafusion/pull/14129#issuecomment-2599713336 > > But if we go for the static size, the solution is pretty straightforward. Should I go forward with it then? > > That is my suggestion. I agree it won't cover all the cases

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-18 Thread via GitHub
alamb commented on PR #14129: URL: https://github.com/apache/datafusion/pull/14129#issuecomment-2599712773 BTW thank you very much for working on this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-18 Thread via GitHub
alamb commented on PR #14129: URL: https://github.com/apache/datafusion/pull/14129#issuecomment-2599712739 > But if we go for the static size, the solution is pretty straightforward. Should I go forward with it then? That is my suggestion. I agree it won't cover all the cases but I th

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-18 Thread via GitHub
Spaarsh commented on PR #14129: URL: https://github.com/apache/datafusion/pull/14129#issuecomment-2599702209 The special handling for 404 was for the case if we went for the dynamic size thing😅. My only concern is that we've already faced one issue due the website owners changing the url. I

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-18 Thread via GitHub
alamb commented on PR #14129: URL: https://github.com/apache/datafusion/pull/14129#issuecomment-2599696694 > @alamb your suggestions do make sense. Perhaps all that is needed is a size checking function. My addition would be to dynamically check for the expected size of the file to be trans

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-18 Thread via GitHub
Spaarsh commented on PR #14129: URL: https://github.com/apache/datafusion/pull/14129#issuecomment-2599629191 @alamb your suggestions do make sense. Perhaps all that is needed is a size checking function. My addition would be to dynamically check for the expected size of the file to be trans

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-17 Thread via GitHub
alamb commented on code in PR #14129: URL: https://github.com/apache/datafusion/pull/14129#discussion_r1920568890 ## benchmarks/bench.sh: ## @@ -401,9 +405,14 @@ data_clickbench_1() { else URL="https://datasets.clickhouse.com/hits_compatible/hits.parquet";

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-14 Thread via GitHub
2010YOUY01 commented on PR #14129: URL: https://github.com/apache/datafusion/pull/14129#issuecomment-2591584028 > Even I was thinking the same! But I wanted to ensure first that the same directory isn't being used to store other files. I think the convention followed now is that the

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-14 Thread via GitHub
Spaarsh commented on PR #14129: URL: https://github.com/apache/datafusion/pull/14129#issuecomment-2591568792 Even I was thinking the same! But I wanted to ensure first that the same directory isn't being used to store other files. Since I would be using the ```rm -rf``` command here, that