maabkhan commented on issue #11971:
URL: https://github.com/apache/hudi/issues/11971#issuecomment-2362866809

   @danny0405 In this scenario the reader is loading only the inflight commit. The same job with the same data source works fine when no write is happening on the source table, but when the table being read is also being updated concurrently, the reader just loads the inflight commit and moves on to other tasks.
   Nowhere in the logs of the failing job does it load the completed clean commit. For the job that runs fine (no concurrent write happening on the source table), the logs contain something like this:
   2024-09-17T20:00:07.425377129Z 24/09/17 20:00:07 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20240917195827185__clean__COMPLETED__20240917195837000]}
   This line is absent from the logs of the failing job.

   Also, on completion the failing job's target contains null data with the correct schema, whereas when I re-run the same job while no concurrent write is happening on the source table, the correct data is loaded into the target.
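
   As a quick sanity check of the difference between the two runs, the instant strings in the `Loaded instants upto` lines can be parsed directly (this is a plain-Python diagnostic sketch based on the instant format visible in the logs above, not the Hudi API):
   ```python
   # Rough diagnostic sketch: parse "Loaded instants upto" instant strings
   # and flag when the latest loaded instant is still INFLIGHT, which is
   # what the failing job's logs show.

   def parse_instant(instant: str) -> dict:
       # Instants look like "20240918193949292__commit__INFLIGHT__20240918194221000",
       # with an "==>" prefix on in-flight ones.
       ts, action, state, completion = instant.lstrip("=>").split("__")
       return {"timestamp": ts, "action": action, "state": state,
               "completion_time": completion}

   def latest_is_inflight(instants: list) -> bool:
       return parse_instant(instants[-1])["state"] == "INFLIGHT"

   # Values taken from the logs of the two runs:
   good_run = ["20240917195827185__clean__COMPLETED__20240917195837000"]
   bad_run = ["==>20240918193949292__commit__INFLIGHT__20240918194221000"]
   print(latest_is_inflight(good_run))  # False
   print(latest_is_inflight(bad_run))   # True
   ```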
   
   Full logs of the failing job related to the referred table:
   ```
   2024-09-18T19:44:03.501572973Z Start date: '2024-09-17 18:30:00'
   2024-09-18T19:44:03.501578493Z  End date: '2024-09-18 18:30:00'
   2024-09-18T19:44:03.502547582Z 2024-09-18 19:44:03,502 
KF_APP.local.luna_etl.utils.sql_utils(120) INFO: Full load SQL for table with 
no created col: 
   2024-09-18T19:44:03.502561841Z             SELECT *
   2024-09-18T19:44:03.502566391Z             FROM 
lake_tmevents_hourly.account_balance_events;
   2024-09-18T19:44:03.502570121Z             
   2024-09-18T19:44:03.521776686Z 24/09/18 19:44:03 INFO SharedState: Setting 
hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
   2024-09-18T19:44:03.526379251Z 24/09/18 19:44:03 INFO SharedState: Warehouse 
path is 'file:/opt/spark/work-dir/spark-warehouse'.
   2024-09-18T19:44:06.056057981Z 24/09/18 19:44:06 INFO HiveConf: Found 
configuration file null
   2024-09-18T19:44:06.071544306Z 24/09/18 19:44:06 INFO HiveUtils: 
Initializing HiveMetastoreConnection version 2.3.9 using Spark classes.
   2024-09-18T19:44:06.676999008Z 24/09/18 19:44:06 INFO HiveClientImpl: 
Warehouse location for Hive client (version 2.3.9) is 
file:/opt/spark/work-dir/spark-warehouse
   2024-09-18T19:44:08.381899474Z 24/09/18 19:44:08 INFO AWSGlueClientFactory: 
Using region from ec2 metadata : ap-south-1
   2024-09-18T19:44:09.749804085Z 24/09/18 19:44:09 INFO AWSGlueClientFactory: 
Using region from ec2 metadata : ap-south-1
   2024-09-18T19:44:13.168847379Z 24/09/18 19:44:13 WARN MetricsConfig: Cannot 
locate configuration: tried 
hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
   2024-09-18T19:44:13.226672142Z 24/09/18 19:44:13 INFO MetricsSystemImpl: 
Scheduled Metric snapshot period at 10 second(s).
   2024-09-18T19:44:13.228102228Z 24/09/18 19:44:13 INFO MetricsSystemImpl: 
s3a-file-system metrics system started
   2024-09-18T19:44:14.047268479Z 24/09/18 19:44:14 WARN 
DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir 
of hudi-defaults.conf
   2024-09-18T19:44:14.074185695Z 24/09/18 19:44:14 WARN 
DFSPropertiesConfiguration: Properties file 
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
   2024-09-18T19:44:14.081421948Z 24/09/18 19:44:14 INFO DataSourceUtils: 
Getting table path..
   2024-09-18T19:44:14.082728611Z 24/09/18 19:44:14 INFO TablePathUtils: 
Getting table path from path : 
s3a://trusted-luna-prod/tmevents_hourly/topics/account_balance_events
   2024-09-18T19:44:14.167799106Z 24/09/18 19:44:14 INFO DefaultSource: 
Obtained hudi table path: 
s3a://trusted-luna-prod/tmevents_hourly/topics/account_balance_events
   2024-09-18T19:44:14.211559950Z 24/09/18 19:44:14 INFO HoodieTableMetaClient: 
Loading HoodieTableMetaClient from 
s3a://trusted-luna-prod/tmevents_hourly/topics/account_balance_events
   2024-09-18T19:44:14.255635781Z 24/09/18 19:44:14 INFO HoodieTableConfig: 
Loading table properties from 
s3a://trusted-luna-prod/tmevents_hourly/topics/account_balance_events/.hoodie/hoodie.properties
   2024-09-18T19:44:14.335476879Z 24/09/18 19:44:14 INFO HoodieTableMetaClient: 
Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) 
from s3a://trusted-luna-prod/tmevents_hourly/topics/account_balance_events
   2024-09-18T19:44:14.350218569Z 24/09/18 19:44:14 INFO DefaultSource: Is 
bootstrapped table => false, tableType is: COPY_ON_WRITE, queryType is: snapshot
   2024-09-18T19:44:14.464630233Z 24/09/18 19:44:14 INFO HoodieActiveTimeline: 
Loaded instants upto : 
Option{val=[==>20240918193949292__commit__INFLIGHT__20240918194221000]}
   2024-09-18T19:44:14.791957221Z 24/09/18 19:44:14 INFO TableSchemaResolver: 
Reading schema from 
s3a://trusted-luna-prod/tmevents_hourly/topics/account_balance_events/account_address=REVOLVE_LOAN_EMI_PRINCIPAL_BILLED/e5a4ae86-169a-4094-81b3-712545a959b4-0_220-55-5550_20240918183912828.parquet
   2024-09-18T19:44:14.998331976Z 24/09/18 19:44:14 INFO S3AInputStream: 
Switching to Random IO seek policy
   2024-09-18T19:44:15.507286054Z 24/09/18 19:44:15 INFO HoodieTableConfig: 
Loading table properties from 
s3a://trusted-luna-prod/tmevents_hourly/topics/account_balance_events/.hoodie/hoodie.properties
   2024-09-18T19:44:15.596507505Z 24/09/18 19:44:15 INFO HoodieActiveTimeline: 
Loaded instants upto : 
Option{val=[==>20240918193949292__commit__INFLIGHT__20240918194221000]}
   2024-09-18T19:44:15.596940823Z 24/09/18 19:44:15 INFO 
BaseHoodieTableFileIndex: Refresh table account_balance_events, spent: 121 ms
   2
   ```

