Re: [I] [SUPPORT] Hudi 1.0.2 SQL Global_Bloom index not returning newly inserted data [hudi]

2025-08-06 Thread via GitHub
rangareddy commented on issue #13680: URL: https://github.com/apache/hudi/issues/13680#issuecomment-3162485322 Hi @mansipp Thanks for raising this issue; I've successfully reproduced it internally using the Hudi Spark Bundle version 1.0.2. -- This is an automated message from the Apache Git Service.

Re: [I] [SUPPORT] Hudi 1.0.2 SQL Global_Bloom index not returning newly inserted data [hudi]

2025-08-06 Thread via GitHub
linliu-code commented on issue #13680: URL: https://github.com/apache/hudi/issues/13680#issuecomment-3160388655 @mansipp , I followed your script but used vanilla Hudi 1.0.2, which shows the newly added data correctly. `export SPARK_VERSION=3.5 # or 3.4, 3.3 spark-shell --pack

Re: [I] [SUPPORT] Hudi 1.0.2 SQL Global_Bloom index not returning newly inserted data [hudi]

2025-08-05 Thread via GitHub
mansipp commented on issue #13680: URL: https://github.com/apache/hudi/issues/13680#issuecomment-3156297407 cc: @yihua

[I] [SUPPORT] Hudi 1.0.2 SQL Global_Bloom index not returning newly inserted data [hudi]

2025-08-05 Thread via GitHub
mansipp opened a new issue, #13680: URL: https://github.com/apache/hudi/issues/13680 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr

Re: [I] [SUPPORT] Hudi write taking more time for one partition in AWS glue occasionally [hudi]

2025-07-30 Thread via GitHub
eb280 commented on issue #12685: URL: https://github.com/apache/hudi/issues/12685#issuecomment-3138414589 It's data skew; try the bucket index.
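The bucket-index suggestion above can be sketched as a set of Spark write options. This is a minimal, hedged sketch: the option keys come from the Hudi configuration reference, while the hash field and bucket count are illustrative placeholders, not values from the thread.

```python
# Hedged sketch: Spark write options that switch a Hudi table to the bucket
# index, as suggested for data-skew cases. Option keys are from the Hudi
# configuration docs; the field name and bucket count are illustrative only.
bucket_index_opts = {
    "hoodie.index.type": "BUCKET",                   # bucket index instead of (global) bloom
    "hoodie.bucket.index.hash.field": "record_key",  # hypothetical record-key field
    "hoodie.bucket.index.num.buckets": "8",          # illustrative bucket count
}

# These options would be passed to a Spark writer, e.g.:
#   df.write.format("hudi").options(**bucket_index_opts)...
print(sorted(bucket_index_opts))
```

Because the bucket index routes records by a hash of the key rather than per-file bloom lookups, hot partitions no longer funnel through a skewed index probe.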

Re: [I] [SUPPORT] Hudi COW not honouring file size config [hudi]

2025-07-30 Thread via GitHub
eb280 commented on issue #13303: URL: https://github.com/apache/hudi/issues/13303#issuecomment-3138410091 Yes, this seems like a bug. Why can't the file size be modified once a Hudi table has been created? At a minimum it should be reconfigurable via the compaction process.

Re: [I] [SUPPORT] [hudi]

2025-07-30 Thread via GitHub
eb280 commented on issue #13513: URL: https://github.com/apache/hudi/issues/13513#issuecomment-3138406558 How frequently are compactions scheduled, and how are CDC reads orchestrated in your pipeline?

Re: [I] [SUPPORT] Hudi 1.0.1 compaction issue resolved , but facing new issue in hive on tez that so many duplicates are created for each row [hudi]

2025-07-15 Thread via GitHub
manaskiran commented on issue #13302: URL: https://github.com/apache/hudi/issues/13302#issuecomment-3077141155 Hi @danny0405 @ad1happy2go, even when we tried switching the execution engine to MR, it was still giving us the higher count

Re: [I] [SUPPORT] Hudi 1.0.1 compaction issue resolved , but facing new issue in hive on tez that so many duplicates are created for each row [hudi]

2025-07-08 Thread via GitHub
manaskiran commented on issue #13302: URL: https://github.com/apache/hudi/issues/13302#issuecomment-3051327446 Hi @danny0405, up to Hudi 0.14.1 with Tez 0.10.4 we were getting the correct count. The problem was an issue with compaction, which is why we updated to Hudi 1.0.1

Re: [I] [SUPPORT] [hudi]

2025-07-03 Thread via GitHub
ad1happy2go commented on issue #13513: URL: https://github.com/apache/hudi/issues/13513#issuecomment-3031993282 @manhhuuha Yes, that is correct; unfortunately Hudi doesn't have anything like that. For such use cases, you need to increase the number of retained commits to a high value.

Re: [I] [SUPPORT] [hudi]

2025-07-02 Thread via GitHub
manhhuuha commented on issue #13513: URL: https://github.com/apache/hudi/issues/13513#issuecomment-3027935693 > you can set up the clean retain commits to make the data files TTL longer. Yes, but that means if I don't continuously listen to CDC, I won't be able to know how the data ha

Re: [I] [SUPPORT] [hudi]

2025-07-02 Thread via GitHub
danny0405 commented on issue #13513: URL: https://github.com/apache/hudi/issues/13513#issuecomment-3027743418 You can increase the cleaner's retained commits to make the data files' TTL longer.
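The retention advice above can be sketched as cleaner/archival settings. A hedged sketch: the keys are from the Hudi cleaning and archival configuration docs, and the numbers are illustrative, not recommendations from the thread; the archival bounds must sit above the cleaner retention or the writer rejects the config.

```python
# Hedged sketch: keep older file slices alive longer by retaining more
# commits in the cleaner. Keys are from the Hudi cleaning/archival config
# docs; the numbers are illustrative.
cleaner_opts = {
    "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
    "hoodie.cleaner.commits.retained": "48",  # keep ~48 commits of file history
    "hoodie.keep.min.commits": "49",          # archival must retain more than the cleaner
    "hoodie.keep.max.commits": "60",
}

# Sanity check: archival lower bound exceeds cleaner retention.
retained = int(cleaner_opts["hoodie.cleaner.commits.retained"])
assert int(cleaner_opts["hoodie.keep.min.commits"]) > retained
print(retained)
```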

[I] [SUPPORT] [hudi]

2025-07-02 Thread via GitHub
manhhuuha opened a new issue, #13513: URL: https://github.com/apache/hudi/issues/13513 I have an issue when reading CDC from a Hudi MOR (Merge-On-Read) table. Even though I’ve configured the following options: ``` 'hoodie.table.cdc.enabled': 'true', 'hoodie.table.cdc.supplemen
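The CDC setup described above (truncated in the archive) involves two sides: write-time options that make the table log change data, and read-time options for an incremental query in CDC format. A hedged sketch, with option keys taken from the Hudi CDC documentation and the begin instant as a placeholder:

```python
# Hedged sketch of a Hudi MOR CDC round trip. Keys follow the Hudi CDC
# docs; the begin instant is a placeholder, not a value from the issue.
cdc_write_opts = {
    "hoodie.table.cdc.enabled": "true",
    # controls how much before/after image is persisted alongside changes
    "hoodie.table.cdc.supplemental.logging.mode": "data_before_after",
}

cdc_read_opts = {
    "hoodie.datasource.query.type": "incremental",
    "hoodie.datasource.query.incremental.format": "cdc",
    "hoodie.datasource.read.begin.instanttime": "0",  # placeholder instant
}
print(sorted(cdc_read_opts))
```

Note the interaction raised later in this thread: CDC reads can only reach back as far as the cleaner retains file slices, so the retention window bounds how stale a CDC consumer may fall behind.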

Re: [I] [SUPPORT] Hudi 0.12.1: Hive query error after writing and syncing via Flink DataStream API [hudi]

2025-07-01 Thread via GitHub
danny0405 commented on issue #13500: URL: https://github.com/apache/hudi/issues/13500#issuecomment-3023713954 yes, the table needs a rewrite.

Re: [I] [SUPPORT] Hudi 0.12.1: Hive query error after writing and syncing via Flink DataStream API [hudi]

2025-07-01 Thread via GitHub
zhang-yue1 commented on issue #13500: URL: https://github.com/apache/hudi/issues/13500#issuecomment-3023215279 > Similar issue here: [#6621](https://github.com/apache/hudi/issues/6621); it should already be resolved. Thank you. I upgraded from Hudi 0.12.1 to 0.12.3, but it didn’t resolve my original problem and instead i

[I] [SUPPORT] [hudi]

2025-06-29 Thread via GitHub
ummadinagarjuna opened a new issue, #13502: URL: https://github.com/apache/hudi/issues/13502

Re: [I] [SUPPORT] Hudi 0.12.1: Hive query error after writing and syncing via Flink DataStream API [hudi]

2025-06-29 Thread via GitHub
danny0405 commented on issue #13500: URL: https://github.com/apache/hudi/issues/13500#issuecomment-3017606689 Similar issue here: https://github.com/apache/hudi/issues/6621; it should already be resolved.

[I] [SUPPORT] Hudi 0.12.1: Hive query error after writing and syncing via Flink DataStream API [hudi]

2025-06-29 Thread via GitHub
zhang-yue1 opened a new issue, #13500: URL: https://github.com/apache/hudi/issues/13500 Describe the problem you faced I’m using Hudi 0.12.1 together with Flink (DataStream API) and Hive 3.x. I write data from my Flink job into a Hudi table with Hive-sync enabled. The Flink job compl

Re: [I] [SUPPORT]HUDI ParquetDecodingException caused by gzip stream CRC failure [hudi]

2025-06-16 Thread via GitHub
ligou525 commented on issue #13359: URL: https://github.com/apache/hudi/issues/13359#issuecomment-2978823265 @rangareddy The program uses Debezium to read the CDC data and then writes to Hudi via Java. The write logic is simple: commit the insert/upsert/delete records separately. The task

Re: [I] [SUPPORT] Hudi 1.0.1 compaction issue resolved , but facing new issue in hive on tez that so many duplicates are created for each row [hudi]

2025-06-13 Thread via GitHub
danny0405 commented on issue #13302: URL: https://github.com/apache/hudi/issues/13302#issuecomment-2971999114 Hmm, there seem to be some issues with Hudi Hive on Tez. @ad1happy2go, can you help reproduce?

Re: [I] [SUPPORT] Hudi Sync tool is dependant on Hadoop 2.10.2 and Hadoop AWS 2.10.2. Need upgrade to newer versions like 3.3.4 [hudi]

2025-06-11 Thread via GitHub
alberttwong commented on issue #11850: URL: https://github.com/apache/hudi/issues/11850#issuecomment-2963306055 I believe you have to build a special jar or contact AWS support. Cc @ad1happy2go

Re: [I] [SUPPORT] Hudi Sync tool is dependant on Hadoop 2.10.2 and Hadoop AWS 2.10.2. Need upgrade to newer versions like 3.3.4 [hudi]

2025-06-10 Thread via GitHub
sayedabdallah commented on issue #11850: URL: https://github.com/apache/hudi/issues/11850#issuecomment-2961470661 Hi @alberttwong, I am getting a similar exception on EMR Serverless 7.5, which uses Apache Spark 3.5.2, which in turn uses Hadoop 3.3.4: https://github.com/apache/spark/blob/v

Re: [I] [SUPPORT] Hudi 1.0.1 compaction issue resolved , but facing new issue in hive on tez that so many duplicates are created for each row [hudi]

2025-06-10 Thread via GitHub
manaskiran commented on issue #13302: URL: https://github.com/apache/hudi/issues/13302#issuecomment-2958720368 Yes, previously we used Spark as the execution engine. Recently we updated our Hadoop, Hive, and Spark versions to the latest ones; in the newer versions Spark has been deprecated as the executi

Re: [I] [SUPPORT] Hudi 1.0.1 compaction issue resolved , but facing new issue in hive on tez that so many duplicates are created for each row [hudi]

2025-06-10 Thread via GitHub
danny0405 commented on issue #13302: URL: https://github.com/apache/hudi/issues/13302#issuecomment-2958659946 @manaskiran do you also run Hive on Tez?

Re: [I] [SUPPORT] Hudi 1.0.1 compaction issue resolved , but facing new issue in hive on tez that so many duplicates are created for each row [hudi]

2025-06-10 Thread via GitHub
manaskiran commented on issue #13302: URL: https://github.com/apache/hudi/issues/13302#issuecomment-2957947315 We were also facing the same issue; please try to find the root cause and fix it.

Re: [I] [SUPPORT] Hudi 1.0.1 compaction issue resolved , but facing new issue in hive on tez that so many duplicates are created for each row [hudi]

2025-06-03 Thread via GitHub
gowriGH commented on issue #13302: URL: https://github.com/apache/hudi/issues/13302#issuecomment-2934352983 Hey guys, any update on this?

Re: [I] [SUPPORT]HUDI ParquetDecodingException caused by gzip stream CRC failure [hudi]

2025-05-30 Thread via GitHub
rangareddy commented on issue #13359: URL: https://github.com/apache/hudi/issues/13359#issuecomment-2922301656 Hi @ligou525 I don't believe gzip compression is causing this issue; it's more likely that data corruption has occurred. Could you please share sample reproducible code to

Re: [I] [SUPPORT]HUDI ParquetDecodingException caused by gzip stream CRC failure [hudi]

2025-05-29 Thread via GitHub
ligou525 commented on issue #13359: URL: https://github.com/apache/hudi/issues/13359#issuecomment-2919336917 @cshuo Thanks for your response! As you can see in my first post, pyarrow cannot read the parquet file either. I guess the column md5_text may conflict with the gzip compression

Re: [I] [SUPPORT]HUDI ParquetDecodingException caused by gzip stream CRC failure [hudi]

2025-05-26 Thread via GitHub
cshuo commented on issue #13359: URL: https://github.com/apache/hudi/issues/13359#issuecomment-2911337035 @ligou525 Is the parquet file corrupted? Can other tools read this file correctly, e.g., parquet-tools?

[I] [SUPPORT]HUDI ParquetDecodingException caused by gzip stream CRC failure [hudi]

2025-05-26 Thread via GitHub
ligou525 opened a new issue, #13359: URL: https://github.com/apache/hudi/issues/13359

Re: [I] [SUPPORT] Hudi COW not honouring file size config [hudi]

2025-05-20 Thread via GitHub
logesr commented on issue #13303: URL: https://github.com/apache/hudi/issues/13303#issuecomment-2893746612 We initially had the config at 128 MB and then reduced it to 64 MB. Also, we could see the newly updated files are large compared to our config, going up to 150 MB. Our table has 2

Re: [I] [SUPPORT] Hudi 1.0.1 compaction issue resolved , but facing new issue in hive on tez that so many duplicates are created for each row [hudi]

2025-05-16 Thread via GitHub
ad1happy2go commented on issue #13302: URL: https://github.com/apache/hudi/issues/13302#issuecomment-2887156520 We will try to reproduce this.

Re: [I] [SUPPORT] Hudi COW not honouring file size config [hudi]

2025-05-16 Thread via GitHub
ad1happy2go commented on issue #13303: URL: https://github.com/apache/hudi/issues/13303#issuecomment-2887124653 @logesr This config doesn't reduce the file group size once created. Did you always try to create 64 MB files from the first run? Also, how many columns are there in your tabl
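The file-sizing knobs being discussed can be sketched as follows. A hedged sketch: the keys are from the Hudi parquet configuration docs, the values are illustrative, and, as noted above, they shape file groups as they are written rather than shrinking existing ones.

```python
# Hedged sketch: COW file-sizing options. These apply when file groups are
# (re)written, not retroactively. Keys are from the Hudi parquet config
# docs; the sizes are illustrative.
MB = 1024 * 1024
sizing_opts = {
    "hoodie.parquet.max.file.size": str(64 * MB),     # target max parquet file size
    "hoodie.parquet.small.file.limit": str(50 * MB),  # files below this attract new inserts
}

# The small-file limit should sit below the max file size so that
# bin-packing of inserts has headroom.
assert int(sizing_opts["hoodie.parquet.small.file.limit"]) < int(sizing_opts["hoodie.parquet.max.file.size"])
print(sorted(sizing_opts))
```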

Re: [I] [SUPPORT] HUDI MULTI TABLE DELTASTREMER INGESTION FAILED WITH NO WABLE CONFIG FOUND [hudi]

2025-05-16 Thread via GitHub
Machos65 closed issue #13245: [SUPPORT] HUDI MULTI TABLE DELTASTREMER INGESTION FAILED WITH NO WABLE CONFIG FOUND URL: https://github.com/apache/hudi/issues/13245

Re: [I] [SUPPORT] HUDI MULTI TABLE DELTASTREMER INGESTION FAILED WITH NO WABLE CONFIG FOUND [hudi]

2025-05-11 Thread via GitHub
rangareddy commented on issue #13245: URL: https://github.com/apache/hudi/issues/13245#issuecomment-2870858440 Hi @Machos65 Is there any update on this issue?

Re: [I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-05-08 Thread via GitHub
rkwagner commented on issue #13233: URL: https://github.com/apache/hudi/issues/13233#issuecomment-2864220578 Here is what appears to happen: https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java#L766 Then it infers

Re: [I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-05-07 Thread via GitHub
rkwagner commented on issue #13233: URL: https://github.com/apache/hudi/issues/13233#issuecomment-2859909913 I actually do use `logicalType: timestamp-millis` in both actual files. When I threw a debug point in, I saw that Hudi was able to see these files and their contents, but overwrite

Re: [I] [SUPPORT] HUDI MULTI TABLE DELTASTREMER INGESTION FAILED WITH NO WABLE CONFIG FOUND [hudi]

2025-05-04 Thread via GitHub
rangareddy commented on issue #13245: URL: https://github.com/apache/hudi/issues/13245#issuecomment-2849583621 Hi @Machos65, It looks like the multi-table ingestion configuration is wrong. Could you please recheck the configuration files? ```properties hoodie.streamer.ingestion.

[I] [SUPPORT] HUDI MULTI TABLE DELTASTREMER INGESTION FAILED WITH NO WABLE CONFIG FOUND [hudi]

2025-05-01 Thread via GitHub
Machos65 opened a new issue, #13245: URL: https://github.com/apache/hudi/issues/13245

Re: [I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-05-01 Thread via GitHub
codope commented on issue #13233: URL: https://github.com/apache/hudi/issues/13233#issuecomment-2845145442 As a workaround, maybe you could try using the same logical type in the source and target schema files.

Re: [I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-05-01 Thread via GitHub
codope commented on issue #13233: URL: https://github.com/apache/hudi/issues/13233#issuecomment-2845085241 I can repro using the following script: ``` wget https://repo1.maven.org/maven2/org/apache/hudi/hudi-utilities-bundle_2.12/0.15.0/hudi-utilities-bundle_2.12-0.15.0.jar mkdi

Re: [I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-04-30 Thread via GitHub
codope commented on issue #13233: URL: https://github.com/apache/hudi/issues/13233#issuecomment-2843898376 @rkwagner do you have `hoodie.schema.on.read.enable` enabled? From my understanding, the internal schema converter code gets exercised when this config is turned on. But, maybe this ch

Re: [I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-04-29 Thread via GitHub
rkwagner commented on issue #13233: URL: https://github.com/apache/hudi/issues/13233#issuecomment-2839738516 As far as "how did this happen", it looks like the part of the `StreamSync` code which used to only come into play on Hudi 14 in the case where a user provided target schema wasn't p

Re: [I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-04-29 Thread via GitHub
rkwagner commented on issue #13233: URL: https://github.com/apache/hudi/issues/13233#issuecomment-2839505456 I've also pasted this into Hudi Slack, but this is the line of code where the loss of differentiation between millis and micros occurs: https://github.com/apache/hudi/blob/master/

Re: [I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-04-29 Thread via GitHub
rkwagner commented on issue #13233: URL: https://github.com/apache/hudi/issues/13233#issuecomment-2839461561 Worth noting this means the bug is current.

Re: [I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-04-29 Thread via GitHub
rkwagner commented on issue #13233: URL: https://github.com/apache/hudi/issues/13233#issuecomment-2839460829 I think I've identified the source. There is a class in Hudi called Type.java. Its Timestamp type has no concept of millis or micros, so all timestamps are always in micros if th
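To see why losing the millis-vs-micros distinction matters, consider a small illustration (this is standalone Python, not Hudi code): a raw epoch value written as milliseconds but later interpreted as microseconds lands roughly a thousand times earlier in time.

```python
from datetime import datetime, timezone

# Illustration: why dropping the millis-vs-micros tag corrupts timestamps.
# The same raw integer, read with the wrong unit, shifts the instant by ~1000x.
epoch_millis = 1_714_000_000_123  # a 2024-era instant, in milliseconds

as_millis = datetime.fromtimestamp(epoch_millis / 1_000, tz=timezone.utc)      # correct unit
as_micros = datetime.fromtimestamp(epoch_millis / 1_000_000, tz=timezone.utc)  # wrong unit

print(as_millis.year)  # 2024
print(as_micros.year)  # 1970 -- same raw value, read as microseconds
```

The reverse direction (micros read as millis) instead lands centuries in the future, which is why a schema round trip that silently promotes timestamp-millis to micros needs a matching rescale of the stored values.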

Re: [I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-04-28 Thread via GitHub
rangareddy commented on issue #13233: URL: https://github.com/apache/hudi/issues/13233#issuecomment-2837678606 Hi @rkwagner Could you please share sample reproducible code so I can investigate further?

Re: [I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-04-28 Thread via GitHub
danny0405 commented on issue #13233: URL: https://github.com/apache/hudi/issues/13233#issuecomment-2837186311 Can you share with us the data type in your SQL statement and also the field type in the `.hoodie/hoodie.properties` create-table properties?

[I] [SUPPORT] Hudi 0.15.0 Timestamp Write Precision Incorrect [hudi]

2025-04-28 Thread via GitHub
rkwagner opened a new issue, #13233: URL: https://github.com/apache/hudi/issues/13233

Re: [I] [SUPPORT] Hudi Compaction Error Despite Using COW Table Type [hudi]

2025-04-07 Thread via GitHub
gowriGH commented on issue #13072: URL: https://github.com/apache/hudi/issues/13072#issuecomment-2782704004 Hi @danny0405. It's working fine for Hudi 1.0.1; only Hudi versions below 1.0 have this issue. Thanks for the replies. I am closing this issue as the 1.0 version is working for

Re: [I] [SUPPORT] Hudi Compaction Error Despite Using COW Table Type [hudi]

2025-04-07 Thread via GitHub
gowriGH closed issue #13072: [SUPPORT] Hudi Compaction Error Despite Using COW Table Type URL: https://github.com/apache/hudi/issues/13072

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-04-05 Thread via GitHub
maheshguptags commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2731942489 Another user has the same question and is also trying to do the same. Attaching the Slack link for reference: https://apache-hudi.slack.com/archives/C4D716NPQ/p174132107

Re: [I] [SUPPORT] Hudi Compaction Error Despite Using COW Table Type [hudi]

2025-04-05 Thread via GitHub
gowriGH commented on issue #13072: URL: https://github.com/apache/hudi/issues/13072#issuecomment-2774698286 Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics; I checked all

[I] [SUPPORT] Hudi Compaction Error Despite Using COW Table Type [hudi]

2025-04-04 Thread via GitHub
gowriGH opened a new issue, #13072: URL: https://github.com/apache/hudi/issues/13072 We are trying to create a Hudi table in HDFS using spark-submit, but we encounter the following error during compaction: [2025-03-11T01:01:27.119+] {ssh.py:478} INFO - Caused by: org.apache.h

Re: [I] [SUPPORT] Hudi Compaction Error Despite Using COW Table Type [hudi]

2025-04-04 Thread via GitHub
danny0405 commented on issue #13072: URL: https://github.com/apache/hudi/issues/13072#issuecomment-2777977743 Did you package the Hudi bundle jar with Hadoop 2 as the dependency?

Re: [I] [SUPPORT] Hudi Compaction Error Despite Using COW Table Type [hudi]

2025-04-03 Thread via GitHub
gowriGH commented on issue #13072: URL: https://github.com/apache/hudi/issues/13072#issuecomment-2777670345 I have the same jar **hadoop-hdfs-client-3.4.1.jar** in hadoop/hdfs, spark/jars, and hive/lib, and am still facing the same issue. There is no such method as "org.apache.hadoop.hdfs.cl

Re: [I] [SUPPORT] Hudi Compaction Error Despite Using COW Table Type [hudi]

2025-04-03 Thread via GitHub
danny0405 commented on issue #13072: URL: https://github.com/apache/hudi/issues/13072#issuecomment-2774893742 hadoop-hdfs-client jar

Re: [I] [SUPPORT] Hudi Compaction Error Despite Using COW Table Type [hudi]

2025-04-02 Thread via GitHub
danny0405 commented on issue #13072: URL: https://github.com/apache/hudi/issues/13072#issuecomment-2774563203 > Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics; Should be

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-28 Thread via GitHub
maheshguptags commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2733316995 @danny0405 there are no other failure logs in the JM. Please find attached the DAG screenshot. https://github.com/user-attachments/assets/52575874-4b20-40d5-b232-e5541661a68

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-26 Thread via GitHub
danny0405 commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2739325285 You still need to specify the index type for the target delete table; currently the index type is kind of a write config. Here is the logic for the index_bootstrap op: ```jav

Re: [I] [SUPPORT] Hudi Quickstart on EMR 7.6 with Hudi 1.0.1 not working [hudi]

2025-03-23 Thread via GitHub
ad1happy2go closed issue #12974: [SUPPORT] Hudi Quickstart on EMR 7.6 with Hudi 1.0.1 not working URL: https://github.com/apache/hudi/issues/12974

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-21 Thread via GitHub
danny0405 commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2744811799 Closing it now; feel free to file new issues if there are other problems.

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-21 Thread via GitHub
danny0405 closed issue #12988: [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate URL: https://github.com/apache/hudi/issues/12988

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-21 Thread via GitHub
maheshguptags commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2743216536 Hi @danny0405 I tried your suggestion and implemented the index like the ingestion job, and it worked. So the summary is: we will need to enable concurrency to delete the dat

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-19 Thread via GitHub
maheshguptags commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2739051363 I haven't enabled any config; I have just added the below DDL with the required params. ``` CREATE TABLE IF NOT EXISTS hudi_temp(x STRING,_date STRING,_count BIGINT,type STRING,upd

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-19 Thread via GitHub
maheshguptags commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2739007933 @danny0405 can you please share the config for disabling it?

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-19 Thread via GitHub
danny0405 commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2739035892 It is false by default; just check how you enabled it and you should be OK.

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-19 Thread via GitHub
danny0405 commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2738701409 Then can you try to remove the bootstrap for the delete SQL?

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-19 Thread via GitHub
maheshguptags commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2735954996 Yes, it is really small for the ingestion job, and it also starts with 000-008 as we have 8 buckets. Sample records: https://github.com/user-attachments/assets/897a4347-32f3-4

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-19 Thread via GitHub
maheshguptags commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2735873169 @danny0405 can you please share the config? One more question about the ingestion job: Do we need to add the below config to the ingestion table as well? (I’m referring to

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-19 Thread via GitHub
danny0405 commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2735918136 > we are already using bucket index for ingestion jo Can you check the Flink checkpoint size (the bucket index has very little size there) and the data file naming convention (to see if

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-19 Thread via GitHub
danny0405 commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2735721874 For the delete statement, there is no need to enable index_bootstrap, which is designed only for regular ingestion; and for regular ingestion, using the bucket index could get rid of the
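The advice above can be sketched as a set of Flink table options for the DELETE job: use the bucket index (which needs no state bootstrap) and leave the index bootstrap disabled. A hedged sketch; the option keys follow the Hudi Flink options, while the table type is illustrative.

```python
# Hedged sketch: Flink connector options for a Hudi DELETE job that skips
# the full-table index_bootstrap scan. Keys follow the Hudi Flink options;
# the table type is illustrative.
flink_delete_opts = {
    "connector": "hudi",
    "table.type": "MERGE_ON_READ",
    "index.type": "BUCKET",              # bucket index needs no bootstrap scan
    "index.bootstrap.enabled": "false",  # avoid scanning all partitions on start
}

# These options would go into the WITH (...) clause of the Flink DDL, e.g.:
#   CREATE TABLE hudi_temp (...) WITH ('connector' = 'hudi', ...)
print(sorted(flink_delete_opts))
```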

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-19 Thread via GitHub
maheshguptags commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2735674684 @danny0405 Here's another example where the job in index_bootstrap scans the entire table. The number of bytes sent by index_bootstrap is exactly the same as the data processed

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-18 Thread via GitHub
danny0405 commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2735034204 I see the index_bootstrap op is busy in your case; we can eliminate it for the `DELETE` use case.

[I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-18 Thread via GitHub
maheshguptags opened a new issue, #12988: URL: https://github.com/apache/hudi/issues/12988 I am experiencing an issue when trying to delete records from a Hudi table where data is ingested using Flink streaming, and deletion is attempted using a Hudi batch processing job. Despite specif

Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-18 Thread via GitHub
danny0405 commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2732019149 Are there any other failures in the JM log? Can you also show me the Flink UI operator DAG?

Re: [I] [SUPPORT] Hudi Quickstart on EMR 7.6 with Hudi 1.0.1 not working [hudi]

2025-03-14 Thread via GitHub
alberttwong commented on issue #12974: URL: https://github.com/apache/hudi/issues/12974#issuecomment-2725068493 I can try it, but we may need to update the quickstart.

Re: [I] [SUPPORT] Hudi Quickstart on EMR 6.15 with Hudi 1.0.1 not working [hudi]

2025-03-14 Thread via GitHub
alberttwong commented on issue #12963: URL: https://github.com/apache/hudi/issues/12963#issuecomment-2725066847 I can try it, but we may need to update the quickstart.

[I] [SUPPORT] [hudi]

2025-03-13 Thread via GitHub
alberttwong opened a new issue, #12974: URL: https://github.com/apache/hudi/issues/12974 **Describe the problem you faced** Following the quickstart at https://hudi.apache.org/docs/quick-start-guide/ on EMR 7.6. **To Reproduce** Steps to reproduce the behavior: `

Re: [I] [SUPPORT] Hudi Quickstart on EMR 6.15 with Hudi 1.0.1 not working [hudi]

2025-03-13 Thread via GitHub
alberttwong commented on issue #12963: URL: https://github.com/apache/hudi/issues/12963#issuecomment-2722276524 This also happens with EMR 7.6: https://github.com/apache/hudi/issues/12974

Re: [I] [SUPPORT] Hudi Quickstart on EMR 6.15 with Hudi 1.0.1 not working [hudi]

2025-03-12 Thread via GitHub
yihua commented on issue #12963: URL: https://github.com/apache/hudi/issues/12963#issuecomment-2719491561 Likely, this is related to the EMR environment. Will check again.

Re: [I] [SUPPORT] Hudi Quickstart on EMR 6.15 with Hudi 1.0.1 not working [hudi]

2025-03-12 Thread via GitHub
yihua commented on issue #12963: URL: https://github.com/apache/hudi/issues/12963#issuecomment-2719491732 cc @CTTY

Re: [I] [SUPPORT] Hudi Quickstart on EMR 6.15 with Hudi 1.0.1 not working [hudi]

2025-03-12 Thread via GitHub
alberttwong commented on issue #12963: URL: https://github.com/apache/hudi/issues/12963#issuecomment-2719180052 ``` [ec2-user@ip-10-0-10-186 ~]$ ssh had...@ip-10-0-111-168.us-west-2.compute.internal The authenticity of host 'ip-10-0-111-168.us-west-2.compute.internal (10.0.111.168)' c

Re: [I] [SUPPORT] Hudi Quickstart on EMR 6.15 with Hudi 1.0.1 not working [hudi]

2025-03-12 Thread via GitHub
yihua commented on issue #12963: URL: https://github.com/apache/hudi/issues/12963#issuecomment-2718968905 I verified that with OSS Spark 3.4.1, the script works to create a fresh new Hudi table successfully.

Re: [I] [SUPPORT] Hudi Quickstart on EMR 6.15 with Hudi 1.0.1 not working [hudi]

2025-03-12 Thread via GitHub
yihua commented on issue #12963: URL: https://github.com/apache/hudi/issues/12963#issuecomment-2718925295 @alberttwong Have you checked if `/tmp/trips_table` does not exist before running the script?
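The suggested check can be sketched as a quick shell step before re-running the quickstart. The path `/tmp/trips_table` comes from the thread; the rest is a minimal sketch, and `rm -rf` on the wrong path is destructive, so double-check the variable first.

```shell
# Minimal sketch: remove any stale quickstart table directory so the
# script starts from a clean state. Adjust TABLE_PATH for your setup.
TABLE_PATH=/tmp/trips_table
rm -rf "$TABLE_PATH"
# Confirm the path no longer exists before launching spark-shell.
[ ! -e "$TABLE_PATH" ] && echo "table path is clean"
```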

[I] [SUPPORT] Hudi Quickstart on EMR 6.15 with Hudi 1.0.1 not working [hudi]

2025-03-12 Thread via GitHub
alberttwong opened a new issue, #12963: URL: https://github.com/apache/hudi/issues/12963 **Describe the problem you faced** Following the quickstart at https://hudi.apache.org/docs/quick-start-guide/ on EMR 6.15. **To Reproduce** Steps to reproduce the behavior:

Re: [I] [SUPPORT] Hudi Spark WEB UI [SQL / DataFrame] Details for Query, It does not display detailed indicator information [hudi]

2025-03-11 Thread via GitHub
pravin1406 commented on issue #9944: URL: https://github.com/apache/hudi/issues/9944#issuecomment-2710615498 Hi @watermelon12138 @danny0405 Was there any fix planned for this and has it landed ? Our SQL workflows are very complex and generally there is where we come to figure exact pain p

[I] [SUPPORT] [hudi]

2025-03-07 Thread via GitHub
jerryleooo opened a new issue, #12940: URL: https://github.com/apache/hudi/issues/12940 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? -> Seems the link is broken - Join the mailing list to engage in conversations and g

Re: [I] [SUPPORT] Hudi CloudWatchReporter error after 0.15.0 upgrade [hudi]

2025-03-03 Thread via GitHub
AlejandroIzuel commented on issue #12182: URL: https://github.com/apache/hudi/issues/12182#issuecomment-2693839637 I use HUDI 0.15.0 & Spark 3.5.2 & EMR 7.5

Re: [I] [SUPPORT] Hudi CloudWatchReporter error after 0.15.0 upgrade [hudi]

2025-03-03 Thread via GitHub
kirillklimenko commented on issue #12182: URL: https://github.com/apache/hudi/issues/12182#issuecomment-2693797923 > This helped me. If you are using a spark-submit on the EMR Cluster, just add this > > `--jars /usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/hudi/hudi-aws-bundle.jar`

Re: [I] [SUPPORT] Hudi CloudWatchReporter error after 0.15.0 upgrade [hudi]

2025-03-03 Thread via GitHub
AlejandroIzuel commented on issue #12182: URL: https://github.com/apache/hudi/issues/12182#issuecomment-2693776588 This helped me. If you are using a spark-submit on the EMR Cluster, just add this `--jars /usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/hudi/hudi-aws-bundle.jar` I
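In context, the workaround above amounts to a `spark-submit` invocation like the sketch below. The two bundle jar paths are the ones quoted in the thread; the application class and jar names are hypothetical placeholders.

```shell
# Hedged sketch of the thread's workaround: pass both EMR-provided Hudi
# bundles explicitly so the AWS/CloudWatch classes are on the classpath.
# com.example.MyHudiJob and my-hudi-job.jar are placeholders.
spark-submit \
  --jars /usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/hudi/hudi-aws-bundle.jar \
  --class com.example.MyHudiJob \
  my-hudi-job.jar
```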

Re: [I] [SUPPORT] [hudi]

2025-02-27 Thread via GitHub
ad1happy2go closed issue #12853: [SUPPORT] URL: https://github.com/apache/hudi/issues/12853

Re: [I] [SUPPORT] [hudi]

2025-02-25 Thread via GitHub
danny0405 commented on issue #12853: URL: https://github.com/apache/hudi/issues/12853#issuecomment-2683578713 @BreidyAriasD Can you add more context here?

Re: [I] [SUPPORT] Hudi CLI conf is hard coded to /opt/hudi/packaging/hudi-cli-bundle/conf/hudi-defaults.conf [hudi]

2025-02-24 Thread via GitHub
yihua commented on issue #11909: URL: https://github.com/apache/hudi/issues/11909#issuecomment-2679818734 This is fixed by #12876.

Re: [I] [SUPPORT] Hudi CLI conf is hard coded to /opt/hudi/packaging/hudi-cli-bundle/conf/hudi-defaults.conf [hudi]

2025-02-24 Thread via GitHub
yihua closed issue #11909: [SUPPORT] Hudi CLI conf is hard coded to /opt/hudi/packaging/hudi-cli-bundle/conf/hudi-defaults.conf URL: https://github.com/apache/hudi/issues/11909

[I] [SUPPORT] [hudi]

2025-02-19 Thread via GitHub
BreidyAriasD opened a new issue, #12853: URL: https://github.com/apache/hudi/issues/12853 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-s

Re: [I] [SUPPORT] Hudi java client throws Error waiting for async clean service to finish [hudi]

2025-02-13 Thread via GitHub
ad1happy2go closed issue #6565: [SUPPORT] Hudi java client throws Error waiting for async clean service to finish URL: https://github.com/apache/hudi/issues/6565

Re: [I] [SUPPORT] Hudi delete not working via spark apis [hudi]

2025-02-13 Thread via GitHub
rangareddy commented on issue #6341: URL: https://github.com/apache/hudi/issues/6341#issuecomment-2658322480 Hi @rjmblc Please let me know if this issue has been resolved. If it is resolved, feel free to close it.

Re: [I] [SUPPORT] Hudi Upsert operation taking too long under writing data for base table and metadata table [hudi]

2025-02-13 Thread via GitHub
danny0405 commented on issue #12828: URL: https://github.com/apache/hudi/issues/12828#issuecomment-2658164037 @ad1happy2go Can you help here?

[I] [SUPPORT] Hudi Upsert operation taking too long under writing data for base table and metadata table [hudi]

2025-02-11 Thread via GitHub
dataproblems opened a new issue, #12828: URL: https://github.com/apache/hudi/issues/12828 **Describe the problem you faced** I am trying to perform an upsert to hudi with a dataframe of 200 M records and I noticed that it is taking an hour to complete this process. My hudi table has
