Thanks, Sungwoo Park, for sharing the details. I'm forwarding this to hdfs-dev@. I haven't had a chance to review the details in the ticket yet, but if you can reproduce the issue, I recommend creating an HDFS ticket and marking it as a blocker, with the target versions set to the upcoming releases (3.4.2 & 3.5.0, if I am not mistaken).
-Ayush

On Mon, 3 Feb 2025 at 08:20, Sungwoo Park <c...@pl.postech.ac.kr> wrote:
>
> I reported a bug in ZStandardCodec in the Hadoop library. If you run Hive
> with ZStandard compression for Tez intermediate data, you might be
> affected by this bug.
>
> https://issues.apache.org/jira/browse/HDFS-14099
>
> The problem occurs when the input file is large (e.g., 25MB) and does not
> compress well. (The zstd native library is fine, as zstd successfully
> compresses and restores the same input data.) I think it rarely occurs in
> practice, but we ran into this problem when testing with 10TB TPC-DS data.
> A sample query for reproducing the problem is (with
> hive.execution.mode=llap):
>
> select /*+ semi(store_returns, sr_ticket_number, store_sales, 34171240) */
>   ss_quantity
> from store_sales, store_returns
> where ss_ticket_number = sr_ticket_number and
>   sr_returned_date_sk between 2451789 and 2451818
> limit 100;
>
> --- Sungwoo
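
For anyone who wants to check the same failure mode outside Hive, one option is to round-trip a large block of incompressible data through Hadoop's ZStandardCodec directly. The sketch below is only illustrative, not taken from the ticket: the class name ZstdRoundTrip, the 25 MB random buffer, and the buffer sizes are assumptions, and it requires a Hadoop build with the native zstd bindings on java.library.path.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Random;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionInputStream;
    import org.apache.hadoop.io.compress.CompressionOutputStream;
    import org.apache.hadoop.io.compress.ZStandardCodec;

    public class ZstdRoundTrip {
      public static void main(String[] args) throws IOException {
        // ~25 MB of random bytes: random data is essentially incompressible,
        // matching the "large input that does not compress well" condition.
        byte[] input = new byte[25 * 1024 * 1024];
        new Random(42).nextBytes(input);

        ZStandardCodec codec = new ZStandardCodec();
        codec.setConf(new Configuration());

        // Compress through the Hadoop codec (delegates to the native zstd library).
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (CompressionOutputStream out = codec.createOutputStream(compressed)) {
          out.write(input);
        }

        // Decompress and verify the round trip restores the original bytes.
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (CompressionInputStream in = codec.createInputStream(
            new ByteArrayInputStream(compressed.toByteArray()))) {
          byte[] buf = new byte[64 * 1024];
          int n;
          while ((n = in.read(buf)) != -1) {
            restored.write(buf, 0, n);
          }
        }

        System.out.println("round trip ok: "
            + Arrays.equals(input, restored.toByteArray()));
      }
    }

If the bug is what the ticket describes, a poorly compressible input like this should fail the round trip through the Hadoop codec, while the same bytes compress and restore correctly with the zstd command-line tool.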