nsivabalan opened a new issue #4814:
URL: https://github.com/apache/hudi/issues/4814


   **Describe the problem you faced**
   
   My job fails with an NPE during archival of the data table when the metadata table is enabled.
   
   Hi all, I am facing an issue regarding the table type after upgrading to version 0.10.1.
   I have specified the COW table type, but the logs also show a MERGE_ON_READ table type. Is that the metadata table? (The logs below suggest so: the MERGE_ON_READ entries come from loading `.hoodie/metadata`, with the HFILE base format.)
   The jobs are intermittently failing with a NullPointerException.
   Can someone please help me with this error? The stack trace is attached, and the write options used are below:
   
   ```scala
   HoodieWriteConfig.TBL_NAME.key -> options.tableName,
   DataSourceWriteOptions.TABLE_TYPE.key -> DataSourceWriteOptions.TABLE_TYPE.defaultValue(),
   DataSourceWriteOptions.RECORDKEY_FIELD.key -> options.primaryKey,
   DataSourceWriteOptions.PRECOMBINE_FIELD.key -> RECKO_UPDATED_TIME,
   DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS.key -> classOf[MultiPartKeysValueExtractor].getName,
   DataSourceWriteOptions.HIVE_STYLE_PARTITIONING.key -> "true",
   DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> options.partitionCols.mkString(",")
   ```
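   
   For reference, a minimal self-contained sketch of the same write, with the table type pinned explicitly to COPY_ON_WRITE instead of relying on `TABLE_TYPE.defaultValue()`; the table name, key fields, and partition columns are placeholders standing in for `options.*` and `RECKO_UPDATED_TIME` from my job:
   
   ```scala
   import org.apache.hudi.DataSourceWriteOptions
   import org.apache.hudi.config.HoodieWriteConfig
   import org.apache.hudi.hive.MultiPartKeysValueExtractor
   
   // Pin the table type explicitly so the intent (COPY_ON_WRITE) is visible
   // in the config itself rather than implied by the default value.
   val hudiOptions: Map[String, String] = Map(
     HoodieWriteConfig.TBL_NAME.key -> "validated_data",                  // placeholder table name
     DataSourceWriteOptions.TABLE_TYPE.key -> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
     DataSourceWriteOptions.RECORDKEY_FIELD.key -> "record_id",           // placeholder record key
     DataSourceWriteOptions.PRECOMBINE_FIELD.key -> "updated_time",       // placeholder precombine field
     DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS.key -> classOf[MultiPartKeysValueExtractor].getName,
     DataSourceWriteOptions.HIVE_STYLE_PARTITIONING.key -> "true",
     DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "org_id,tenant_id" // placeholder partition columns
   )
   
   // df is the DataFrame being written; basePath is the S3 table base path.
   df.write.format("hudi").options(hudiOptions).mode("append").save(basePath)
   ```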
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   The failure is intermittent, so I do not have deterministic steps; the rough sequence is:
   
   1. Write to a COW table on S3 with the options above, with the metadata table enabled.
   2. Use optimistic concurrency control with the ZooKeeper-based lock provider (as in the logs below).
   3. Let commits accumulate until archival is triggered in the post-commit step.
   4. Occasionally, archival of the data table fails with the NullPointerException below.
   
   **Expected behavior**
   
   The commit and the post-commit archival should complete without a NullPointerException, whether or not the metadata table is enabled.
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version :
   
   * Hive version : 3.1.2 (3.1.2-amzn-5 on EMR, per the logs below)
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no (EMR on YARN, submitted via Oozie)
   
   
   **Additional context**
   
   The job is a Spark write submitted through Oozie on EMR, using optimistic concurrency control with `ZookeeperBasedLockProvider` (the lock release messages appear in the stacktrace below). The MERGE_ON_READ / HFILE entries in the logs come from loading the internal metadata table at `.hoodie/metadata`, not from the data table itself.
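   
   As a diagnostic (not a fix), a minimal sketch of checking whether the metadata table is involved, by disabling it for one writer run; this assumes `HoodieMetadataConfig.ENABLE` ("hoodie.metadata.enable") is the relevant key in this version, and reuses `hudiOptions`, `df`, and `basePath` from the sketch above:
   
   ```scala
   import org.apache.hudi.common.config.HoodieMetadataConfig
   
   // Diagnostic only: turn the metadata table off for this write to see
   // whether the archival NPE still reproduces without it.
   val diagnosticOptions = hudiOptions + (HoodieMetadataConfig.ENABLE.key -> "false")
   
   df.write.format("hudi").options(diagnosticOptions).mode("append").save(basePath)
   ```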
   
   **Stacktrace**
   
   ```
   2022-02-13 13:44:57,475 [main] INFO  
org.apache.hudi.common.table.HoodieTableMetaClient  - Loading 
HoodieTableMetaClient from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
   2022-02-13 13:44:57,487 [main] INFO  
org.apache.hudi.common.table.HoodieTableConfig  - Loading table properties from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties
   2022-02-13 13:44:57,487 [main] INFO  
com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem  - Opening 
's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties'
 for reading
   2022-02-13 13:44:57,505 [main] INFO  
org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading Table of 
type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
   2022-02-13 13:44:57,505 [main] INFO  
org.apache.hudi.common.table.HoodieTableMetaClient  - Loading Active commit 
timeline for 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
   2022-02-13 13:44:57,562 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline  - Loaded instants 
upto : Option{val=[==>20220213134444484__commit__REQUESTED]}
   2022-02-13 13:44:57,562 [main] INFO  
org.apache.hudi.common.table.HoodieTableMetaClient  - Loading 
HoodieTableMetaClient from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
   2022-02-13 13:44:57,574 [main] INFO  
org.apache.hudi.common.table.HoodieTableConfig  - Loading table properties from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties
   2022-02-13 13:44:57,574 [main] INFO  
com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem  - Opening 
's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties'
 for reading
   2022-02-13 13:44:57,592 [main] INFO  
org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading Table of 
type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
   2022-02-13 13:44:57,592 [main] INFO  
org.apache.hudi.common.table.HoodieTableMetaClient  - Loading 
HoodieTableMetaClient from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
   2022-02-13 13:44:57,615 [main] INFO  
org.apache.hudi.common.table.HoodieTableConfig  - Loading table properties from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties
   2022-02-13 13:44:57,615 [main] INFO  
com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem  - Opening 
's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties'
 for reading
   2022-02-13 13:44:57,650 [main] INFO  
org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading Table of 
type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
   2022-02-13 13:44:57,650 [main] INFO  
org.apache.hudi.common.table.view.FileSystemViewManager  - Creating View 
Manager with storage type :REMOTE_FIRST
   2022-02-13 13:44:57,650 [main] INFO  
org.apache.hudi.common.table.view.FileSystemViewManager  - Creating remote 
first table view
   2022-02-13 13:44:57,657 [dispatcher-BlockManagerMaster] INFO  
org.apache.spark.storage.BlockManagerInfo  - Removed broadcast_55_piece0 on 
ip-10-9-157-234.ap-south-1.compute.internal:33735 in memory (size: 351.9 KiB, 
free: 693.4 MiB)
   2022-02-13 13:44:57,658 [dispatcher-BlockManagerMaster] INFO  
org.apache.spark.storage.BlockManagerInfo  - Removed broadcast_55_piece0 on 
ip-10-9-130-208.ap-south-1.compute.internal:39427 in memory (size: 351.9 KiB, 
free: 2004.5 MiB)
   2022-02-13 13:44:57,658 [dispatcher-BlockManagerMaster] INFO  
org.apache.spark.storage.BlockManagerInfo  - Removed broadcast_55_piece0 on 
ip-10-9-130-208.ap-south-1.compute.internal:37557 in memory (size: 351.9 KiB, 
free: 2004.5 MiB)
   2022-02-13 13:44:57,714 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline  - Loaded instants 
upto : Option{val=[==>20220213134444484__commit__REQUESTED]}
   2022-02-13 13:44:57,715 [main] INFO  
org.apache.hudi.common.table.HoodieTableMetaClient  - Loading 
HoodieTableMetaClient from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
   2022-02-13 13:44:57,728 [main] INFO  
org.apache.hudi.common.table.HoodieTableConfig  - Loading table properties from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties
   2022-02-13 13:44:57,728 [main] INFO  
com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem  - Opening 
's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties'
 for reading
   2022-02-13 13:44:57,745 [main] INFO  
org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading Table of 
type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
   2022-02-13 13:44:57,745 [main] INFO  
org.apache.hudi.common.table.HoodieTableMetaClient  - Loading 
HoodieTableMetaClient from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
   2022-02-13 13:44:57,766 [main] INFO  
org.apache.hudi.common.table.HoodieTableConfig  - Loading table properties from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties
   2022-02-13 13:44:57,766 [main] INFO  
com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem  - Opening 
's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties'
 for reading
   2022-02-13 13:44:57,785 [main] INFO  
org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading Table of 
type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from 
s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
   2022-02-13 13:44:57,841 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline  - Loaded instants 
upto : Option{val=[20220213134350974001__commit__COMPLETED]}
   2022-02-13 13:44:57,841 [main] INFO  
org.apache.hudi.table.HoodieTimelineArchiveLog  - Limiting archiving of 
instants to latest compaction on metadata table at 20220213134350974001
   2022-02-13 13:44:57,842 [main] INFO  
org.apache.hudi.client.heartbeat.HoodieHeartbeatClient  - Stopping heartbeat 
for instant 20220213134342038
   2022-02-13 13:44:57,842 [main] INFO  
org.apache.hudi.client.heartbeat.HoodieHeartbeatClient  - Stopped heartbeat for 
instant 20220213134342038
   2022-02-13 13:44:57,907 [main] INFO  
org.apache.hudi.client.heartbeat.HeartbeatUtils  - Deleted the heartbeat for 
instant 20220213134342038
   2022-02-13 13:44:57,907 [main] INFO  
org.apache.hudi.client.heartbeat.HoodieHeartbeatClient  - Deleted heartbeat 
file for instant 20220213134342038
   2022-02-13 13:44:57,907 [main] INFO  
org.apache.hudi.client.transaction.TransactionManager  - Transaction ending 
with transaction owner Option{val=[==>20220213134342038__commit__INFLIGHT]}
   2022-02-13 13:44:57,907 [main] INFO  
org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider  - RELEASING 
lock atZkBasePath = /hudi, lock key = 691508e5-9af8-4254-8211-3effed593c20
   2022-02-13 13:44:57,908 [main] INFO  
org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider  - RELEASED 
lock atZkBasePath = /hudi, lock key = 691508e5-9af8-4254-8211-3effed593c20
   2022-02-13 13:44:57,908 [main] INFO  
org.apache.hudi.client.transaction.TransactionManager  - Transaction ended with 
transaction owner Option{val=[==>20220213134342038__commit__INFLIGHT]}
   2022-02-13 13:44:57,910 [main] INFO  
com.recko.spark.jobs.v1.ingestion.DataValidationJob  - Current State :: 
================ FAILED
   2022-02-13 13:44:57,912 [main] INFO  
com.recko.spark.jobs.v1.ingestion.listeners.TaskStatusLogger  - TimeStamp : 
2022-02-13 13:44:57 =========== TaskId : 420a79fd-b810-47f7-ad28-dfcd7ea03c10 
===== State : FAILED
   2022-02-13 13:44:57,936 [main] INFO  com.recko.spark.utils.db.RDSUtil$  - 
Executing procedure :: INSERT INTO scepter.task_metrics(id, task_id, task_type, 
stage_num, batch_id, policy_id, file_size, start_time, end_time, time_taken, 
estimated_eta, other_attributes, org_id, tenant_id) VALUES (  
'4cf1da7d-2cf9-4d72-8be2-a679797c5a88', '420a79fd-b810-47f7-ad28-dfcd7ea03c10', 
'DATA_VALIDATION', '1', '1b5fbfd4-e793-4b7a-b896-f6b73e033000', 
'691508e5-9af8-4254-8211-3effed593c20', 1, '2022-02-13 13:43:24.041', 
'2022-02-13 13:44:57.913', 93872, 129663, '', 'ego', 'ego');
   2022-02-13 13:44:57,938 [main] INFO  com.recko.spark.utils.db.RDSUtil$  - 
Execution complete, TimeTaken :: 25
   2022-02-13 13:44:57,938 [main] INFO  
org.apache.kafka.clients.producer.ProducerConfig  - ProducerConfig values: 
        acks = 1
        batch.size = 16384
        bootstrap.servers = [ego-kafka.reckostaging.com:9094]
        buffer.memory = 33554432
        client.dns.lookup = use_all_dns_ips
        client.id = producer-5
        compression.type = none
        connections.max.idle.ms = 540000
        delivery.timeout.ms = 120000
        enable.idempotence = false
        interceptor.classes = []
        internal.auto.downgrade.txn.commit = false
        key.serializer = class 
org.apache.kafka.common.serialization.StringSerializer
        linger.ms = 0
        max.block.ms = 60000
        max.in.flight.requests.per.connection = 5
        max.request.size = 1048576
        metadata.max.age.ms = 300000
        metadata.max.idle.ms = 300000
        metric.reporters = []
        metrics.num.samples = 2
        metrics.recording.level = INFO
        metrics.sample.window.ms = 30000
        partitioner.class = class 
org.apache.kafka.clients.producer.internals.DefaultPartitioner
        receive.buffer.bytes = 32768
        reconnect.backoff.max.ms = 1000
        reconnect.backoff.ms = 50
        request.timeout.ms = 30000
        retries = 2147483647
        retry.backoff.ms = 100
        sasl.client.callback.handler.class = null
        sasl.jaas.config = [hidden]
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.min.time.before.relogin = 60000
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        sasl.kerberos.ticket.renew.window.factor = 0.8
        sasl.login.callback.handler.class = null
        sasl.login.class = null
        sasl.login.refresh.buffer.seconds = 300
        sasl.login.refresh.min.period.seconds = 60
        sasl.login.refresh.window.factor = 0.8
        sasl.login.refresh.window.jitter = 0.05
        sasl.mechanism = PLAIN
        security.protocol = SASL_PLAINTEXT
        security.providers = null
        send.buffer.bytes = 131072
        ssl.cipher.suites = null
        ssl.enabled.protocols = [TLSv1.2]
        ssl.endpoint.identification.algorithm = https
        ssl.engine.factory.class = null
        ssl.key.password = null
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.location = null
        ssl.keystore.password = null
        ssl.keystore.type = JKS
        ssl.protocol = TLSv1.2
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.location = null
        ssl.truststore.password = null
        ssl.truststore.type = JKS
        transaction.timeout.ms = 60000
        transactional.id = null
        value.serializer = class 
org.apache.kafka.common.serialization.StringSerializer
   
   2022-02-13 13:44:57,941 [main] INFO  
org.apache.kafka.common.security.authenticator.AbstractLogin  - Successfully 
logged in.
   2022-02-13 13:44:57,942 [main] INFO  
org.apache.kafka.common.utils.AppInfoParser  - Kafka version: 2.6.0
   2022-02-13 13:44:57,942 [main] INFO  
org.apache.kafka.common.utils.AppInfoParser  - Kafka commitId: 62abe01bee039651
   2022-02-13 13:44:57,942 [main] INFO  
org.apache.kafka.common.utils.AppInfoParser  - Kafka startTimeMs: 1644759897942
   2022-02-13 13:44:57,953 [kafka-producer-network-thread | producer-5] INFO  
org.apache.kafka.clients.Metadata  - [Producer clientId=producer-5] Cluster ID: 
8XeF52xMSRmjjqZb4A0Ceg
   2022-02-13 13:44:57,963 [main] INFO  
com.recko.spark.utils.tasktracker.KafkaTaskTrackerService  - recordMeta offset 
= 6114
   2022-02-13 13:44:57,963 [main] INFO  
org.apache.kafka.clients.producer.KafkaProducer  - [Producer 
clientId=producer-5] Closing the Kafka producer with timeoutMillis = 
9223372036854775807 ms.
   2022-02-13 13:44:57,985 [main] INFO  com.recko.spark.utils.S3Util$  - 
cleaning up : 
s3://recko-ego/datalake_streamline_concurrency_1/journal_duplicates/420a79fd-b810-47f7-ad28-dfcd7ea03c10/
   2022-02-13 13:44:58,047 [main] INFO  
com.recko.spark.services.ingester.RollbackService$  - Query :: ALTER TABLE 
ego_ext.raw_journal DROP IF EXISTS PARTITION (org_id = 'ego', tenant_id = 
'ego', data_source = 'EXPECTED', batch_id = 
'1b5fbfd4-e793-4b7a-b896-f6b73e033000', task_id = 
'420a79fd-b810-47f7-ad28-dfcd7ea03c10' );
   2022-02-13 13:44:58,102 [main] INFO  org.apache.spark.sql.hive.HiveUtils  - 
Initializing HiveMetastoreConnection version 3.1.2-amzn-5 using Spark classes.
   2022-02-13 13:44:58,202 [main] INFO  org.apache.hadoop.hive.conf.HiveConf  - 
Found configuration file 
file:/mnt/yarn/usercache/hadoop/appcache/application_1642451881812_7589/container_1642451881812_7589_01_000001/hive-site.xml
   2022-02-13 13:44:58,397 [main] INFO  SessionState  - Hive Session ID = 
24800b67-fd35-4f59-b8c4-43956e2fdce1
   2022-02-13 13:44:58,495 [main] INFO  
org.apache.hadoop.hive.ql.session.SessionState  - Created HDFS directory: 
/tmp/hive/hadoop/24800b67-fd35-4f59-b8c4-43956e2fdce1
   2022-02-13 13:44:58,502 [main] INFO  
org.apache.hadoop.hive.ql.session.SessionState  - Created local directory: 
/tmp/yarn/24800b67-fd35-4f59-b8c4-43956e2fdce1
   2022-02-13 13:44:58,503 [main] INFO  
org.apache.hadoop.hive.ql.session.SessionState  - Created HDFS directory: 
/tmp/hive/hadoop/24800b67-fd35-4f59-b8c4-43956e2fdce1/_tmp_space.db
   2022-02-13 13:44:58,517 [main] INFO  
org.apache.spark.sql.hive.client.HiveClientImpl  - Warehouse location for Hive 
client (version 3.1.2) is hdfs:///user/spark/warehouse
   2022-02-13 13:44:58,663 [dispatcher-CoarseGrainedScheduler] INFO  
org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint  - 
Registered executor NettyRpcEndpointRef(spark-client://Executor) 
(10.9.157.234:45834) with ID 3,  ResourceProfileId 0
   2022-02-13 13:44:58,664 [spark-listener-group-executorManagement] INFO  
org.apache.spark.scheduler.dynalloc.ExecutorMonitor  - New executor 3 has 
registered (new total is 4)
   2022-02-13 13:44:58,767 [dispatcher-BlockManagerMaster] INFO  
org.apache.spark.storage.BlockManagerMasterEndpoint  - Registering block 
manager ip-10-9-157-234.ap-south-1.compute.internal:38329 with 2004.6 MiB RAM, 
BlockManagerId(3, ip-10-9-157-234.ap-south-1.compute.internal, 38329, None)
   2022-02-13 13:45:00,299 [block-manager-storage-async-thread-pool-165] INFO  
org.apache.spark.storage.BlockManager  - Removing RDD 94
   2022-02-13 13:45:00,315 [block-manager-storage-async-thread-pool-172] INFO  
org.apache.spark.storage.BlockManager  - Removing RDD 114
   2022-02-13 13:45:00,319 [block-manager-storage-async-thread-pool-5] INFO  
org.apache.spark.storage.BlockManager  - Removing RDD 137
   2022-02-13 13:45:00,322 [block-manager-storage-async-thread-pool-17] INFO  
org.apache.spark.storage.BlockManager  - Removing RDD 122
   2022-02-13 13:45:02,507 [main] INFO  
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAccessController
  - Created SQLStdHiveAccessController for session context : 
HiveAuthzSessionContext [sessionString=24800b67-fd35-4f59-b8c4-43956e2fdce1, 
clientType=HIVECLI]
   2022-02-13 13:45:02,509 [main] WARN  
org.apache.hadoop.hive.ql.session.SessionState  - METASTORE_FILTER_HOOK will be 
ignored, since hive.security.authorization.manager is set to instance of 
HiveAuthorizerFactory.
   2022-02-13 13:45:03,707 [main] INFO  
org.sparkproject.jetty.server.AbstractConnector  - Stopped 
Spark@60f662bd{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
   2022-02-13 13:45:03,710 [main] INFO  org.apache.spark.ui.SparkUI  - Stopped 
Spark web UI at http://ip-10-9-157-234.ap-south-1.compute.internal:4040
   2022-02-13 13:45:03,714 [YARN application state monitor] INFO  
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend  - Interrupting 
monitor thread
   2022-02-13 13:45:03,716 [main] INFO  
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend  - Shutting down 
all executors
   2022-02-13 13:45:03,716 [dispatcher-CoarseGrainedScheduler] INFO  
org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint  - 
Asking each executor to shut down
   2022-02-13 13:45:03,721 [main] INFO  
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend  - YARN client 
scheduler backend Stopped
   2022-02-13 13:45:03,750 [dispatcher-event-loop-2] INFO  
org.apache.spark.MapOutputTrackerMasterEndpoint  - 
MapOutputTrackerMasterEndpoint stopped!
   2022-02-13 13:45:03,764 [main] INFO  
org.apache.spark.storage.memory.MemoryStore  - MemoryStore cleared
   2022-02-13 13:45:03,764 [main] INFO  org.apache.spark.storage.BlockManager  
- BlockManager stopped
   2022-02-13 13:45:03,772 [main] INFO  
org.apache.spark.storage.BlockManagerMaster  - BlockManagerMaster stopped
   2022-02-13 13:45:03,778 [dispatcher-event-loop-3] INFO  
org.apache.spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint
  - OutputCommitCoordinator stopped!
   2022-02-13 13:45:03,784 [main] INFO  org.apache.spark.SparkContext  - 
Successfully stopped SparkContext
   
   <<< Invocation of Spark command completed <<<
   
   Hadoop Job IDs executed by Spark: job_1642451881812_7593
   
   java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:412)
        at 
org.apache.oozie.action.hadoop.LauncherAM.access$400(LauncherAM.java:54)
        at org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:225)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:219)
        at org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:155)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:143)
   Caused by: java.lang.NullPointerException
        at 
org.apache.hudi.table.HoodieTimelineArchiveLog.lambda$getInstantsToArchive$9(HoodieTimelineArchiveLog.java:230)
        at 
java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
        at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
        at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at 
java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
        at 
java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
        at 
java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at 
java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
        at 
java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
        at 
org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:124)
        at 
org.apache.hudi.client.AbstractHoodieWriteClient.archive(AbstractHoodieWriteClient.java:766)
        at 
org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:459)
        at 
org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:198)
        at 
org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:635)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:286)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
        at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
        at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
        at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
        at 
org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
        at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
        at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
        at 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
        at com.recko.spark.io.lake.HudiIO.writeDataFrame(HudiIO.scala:66)
        at 
com.recko.spark.utils.datalake.StructuredZoneUtils$.putValidationData(StructuredZoneUtils.scala:135)
        at 
com.recko.spark.utils.datalake.LakeUtils$.writeIntoLake(LakeUtils.scala:10)
        at 
com.recko.spark.jobs.v1.ingestion.IngestionWrapper.postExecute(IngestionWrapper.scala:184)
        at 
com.recko.spark.jobs.v1.ingestion.DataValidationJob.execute(DataValidationJob.scala:63)
        at 
com.recko.spark.jobs.v1.triggers.IngestionJob$.$anonfun$main$1(IngestionJob.scala:30)
        at com.recko.spark.executors.Executor.jobWrapper(Executor.scala:71)
        at 
com.recko.spark.jobs.v1.triggers.IngestionJob$.main(IngestionJob.scala:23)
        at 
com.recko.spark.jobs.v1.triggers.IngestionJob.main(IngestionJob.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1047)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1056)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
        at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:186)
        at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:93)
        at 
org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:107)
        at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:60)
        ... 16 more
   Failing Oozie Launcher, null
   java.lang.NullPointerException
        at 
org.apache.hudi.table.HoodieTimelineArchiveLog.lambda$getInstantsToArchive$9(HoodieTimelineArchiveLog.java:230)
        at 
java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
        at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
        at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at 
java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
        at 
java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
        at 
java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at 
java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
        at 
java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
        at 
org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:124)
        at 
org.apache.hudi.client.AbstractHoodieWriteClient.archive(AbstractHoodieWriteClient.java:766)
        at 
org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:459)
        at 
org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:198)
        at 
org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:635)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:286)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
        at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
        at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
        at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
        at 
org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
        at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
        at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
        at 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
        at com.recko.spark.io.lake.HudiIO.writeDataFrame(HudiIO.scala:66)
        at 
com.recko.spark.utils.datalake.StructuredZoneUtils$.putValidationData(StructuredZoneUtils.scala:135)
        at 
com.recko.spark.utils.datalake.LakeUtils$.writeIntoLake(LakeUtils.scala:10)
        at 
com.recko.spark.jobs.v1.ingestion.IngestionWrapper.postExecute(IngestionWrapper.scala:184)
        at 
com.recko.spark.jobs.v1.ingestion.DataValidationJob.execute(DataValidationJob.scala:63)
        at 
com.recko.spark.jobs.v1.triggers.IngestionJob$.$anonfun$main$1(IngestionJob.scala:30)
        at com.recko.spark.executors.Executor.jobWrapper(Executor.scala:71)
        at 
com.recko.spark.jobs.v1.triggers.IngestionJob$.main(IngestionJob.scala:23)
        at 
com.recko.spark.jobs.v1.triggers.IngestionJob.main(IngestionJob.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1047)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1056)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
        at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:186)
        at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:93)
        at 
org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:107)
        at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:60)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:412)
        at 
org.apache.oozie.action.hadoop.LauncherAM.access$400(LauncherAM.java:54)
        at org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:225)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:219)
        at org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:155)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:143)
   Oozie Launcher, uploading action data to HDFS sequence file: 
hdfs://qa-emr.reckostaging.com:8020/user/hadoop/oozie-oozi/0005119-220117203901194-oozie-oozi-W/IngestionJob--spark/action-data.seq
   2022-02-13 13:45:03,825 [main] INFO  org.apache.hadoop.io.compress.CodecPool 
 - Got brand-new compressor [.deflate]
   Stopping AM
   2022-02-13 13:45:03,837 [main] INFO  
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl  - Waiting for 
application to be successfully unregistered.
   Callback notification attempts left 0
   Callback notification trying 
http://ip-10-9-153-3.ap-south-1.compute.internal:11000/oozie/callback?id=0005119-220117203901194-oozie-oozi-W@IngestionJob&status=FAILED
   Callback notification to 
http://ip-10-9-153-3.ap-south-1.compute.internal:11000/oozie/callback?id=0005119-220117203901194-oozie-oozi-W@IngestionJob&status=FAILED
 succeeded
   Callback notification succeeded
   
   ```
   
   

