Thanks Sungwoo Park for sharing the details. I'm forwarding this to
hdfs-dev@. I haven't had the chance to review the details in the
ticket yet, but if you can reproduce the issue, I recommend creating
an HDFS ticket and marking it as a blocker with the target versions
set to the upcoming releases (3.4.2 & 3.5.0, if I am not mistaken).

-Ayush


On Mon, 3 Feb 2025 at 08:20, Sungwoo Park <c...@pl.postech.ac.kr> wrote:
>
> I reported a bug in ZStandardCodec in the Hadoop library. If you run Hive with
> ZStandard compression for Tez intermediate data, you might be affected by this
> bug.
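>
> (For context, we enabled intermediate-data compression roughly as below.
> This is a sketch of our setup, not an exact copy of it; the property names
> follow Tez's runtime configuration, so adjust to your environment:)
>
>   set tez.runtime.compress=true;
>   set tez.runtime.compress.codec=org.apache.hadoop.io.compress.ZStandardCodec;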
>
> https://issues.apache.org/jira/browse/HDFS-14099
>
> The problem occurs when the input file is large (e.g., 25MB) and does not
> compress well. (The zstd native library is fine: standalone zstd successfully
> compresses and decompresses the same input data.) I think it rarely occurs in
> practice, but we ran into it when testing with 10TB TPC-DS data. A sample
> query for reproducing the problem is (with hive.execution.mode=llap):
>
> select /*+ semi(store_returns, sr_ticket_number, store_sales, 34171240) */
>    ss_quantity
> from store_sales, store_returns
> where ss_ticket_number = sr_ticket_number
>   and sr_returned_date_sk between 2451789 and 2451818
> limit 100;
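>
> (For anyone who wants to check the codec without an LLAP cluster: below is a
> minimal round-trip sketch against the Hadoop compression API. The 25MB random
> buffer is illustrative, mimicking a large input that does not compress well;
> this is my own test harness, not code from the ticket, and it needs the
> native zstd library loaded:)
>
>   import java.io.ByteArrayInputStream;
>   import java.io.ByteArrayOutputStream;
>   import java.util.Arrays;
>   import java.util.Random;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.io.compress.CompressionInputStream;
>   import org.apache.hadoop.io.compress.CompressionOutputStream;
>   import org.apache.hadoop.io.compress.ZStandardCodec;
>
>   public class ZstdRoundTrip {
>     public static void main(String[] args) throws Exception {
>       // ~25MB of random bytes: large and poorly compressible,
>       // like the failing intermediate data.
>       byte[] input = new byte[25 * 1024 * 1024];
>       new Random(42).nextBytes(input);
>
>       ZStandardCodec codec = new ZStandardCodec();
>       codec.setConf(new Configuration());
>
>       // Compress through the Hadoop codec.
>       ByteArrayOutputStream compressed = new ByteArrayOutputStream();
>       try (CompressionOutputStream out = codec.createOutputStream(compressed)) {
>         out.write(input);
>       }
>
>       // Decompress and verify the round trip byte-for-byte.
>       ByteArrayOutputStream restored = new ByteArrayOutputStream();
>       try (CompressionInputStream in = codec.createInputStream(
>           new ByteArrayInputStream(compressed.toByteArray()))) {
>         byte[] buf = new byte[64 * 1024];
>         for (int n; (n = in.read(buf)) > 0; ) {
>           restored.write(buf, 0, n);
>         }
>       }
>       System.out.println("round trip ok: "
>           + Arrays.equals(input, restored.toByteArray()));
>     }
>   }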
>
>
> --- Sungwoo
