nsivabalan opened a new issue #4814: URL: https://github.com/apache/hudi/issues/4814
**Describe the problem you faced**

My job fails with an NPE during archival of the data table when the metadata table is enabled.

Hi all, I am facing an issue regarding table type after bumping to version 0.10.1. I have specified the COW table type, but in the logs I can also see the MERGE_ON_READ table type. Is that for the metadata table? The jobs are intermittently failing with a NullPointerException. Can someone please help me with this error? The stack trace is attached, and the write options used are below (a sketch of the full writer invocation follows the stack trace):

```scala
HoodieWriteConfig.TBL_NAME.key -> options.tableName,
DataSourceWriteOptions.TABLE_TYPE.key -> DataSourceWriteOptions.TABLE_TYPE.defaultValue(),
DataSourceWriteOptions.RECORDKEY_FIELD.key -> options.primaryKey,
DataSourceWriteOptions.PRECOMBINE_FIELD.key -> RECKO_UPDATED_TIME,
DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS.key -> classOf[MultiPartKeysValueExtractor].getName,
DataSourceWriteOptions.HIVE_STYLE_PARTITIONING.key -> "true",
DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> options.partitionCols.mkString(",")
```

**To Reproduce**

Steps to reproduce the behavior:

1.
2.
3.
4.

**Expected behavior**

**Environment Description**

* Hudi version : 0.10.1

* Spark version :

* Hive version : 3.1.2 (per the attached logs)

* Hadoop version :

* Storage (HDFS/S3/GCS..) : S3

* Running on Docker? (yes/no) :

**Additional context**

**Stacktrace**

```
2022-02-13 13:44:57,475 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,487 [main] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties
2022-02-13 13:44:57,487 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties' for reading
2022-02-13 13:44:57,505 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,505 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,562 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants upto : Option{val=[==>20220213134444484__commit__REQUESTED]}
2022-02-13 13:44:57,562 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,574 [main] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties
2022-02-13 13:44:57,574 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties' for reading
2022-02-13 13:44:57,592 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,592 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
2022-02-13 13:44:57,615 [main] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties
2022-02-13 13:44:57,615 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties' for reading
2022-02-13 13:44:57,650 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
2022-02-13 13:44:57,650 [main] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
2022-02-13 13:44:57,650 [main] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
2022-02-13 13:44:57,657 [dispatcher-BlockManagerMaster] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_55_piece0 on ip-10-9-157-234.ap-south-1.compute.internal:33735 in memory (size: 351.9 KiB, free: 693.4 MiB)
2022-02-13 13:44:57,658 [dispatcher-BlockManagerMaster] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_55_piece0 on ip-10-9-130-208.ap-south-1.compute.internal:39427 in memory (size: 351.9 KiB, free: 2004.5 MiB)
2022-02-13 13:44:57,658 [dispatcher-BlockManagerMaster] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_55_piece0 on ip-10-9-130-208.ap-south-1.compute.internal:37557 in memory (size: 351.9 KiB, free: 2004.5 MiB)
2022-02-13 13:44:57,714 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants upto : Option{val=[==>20220213134444484__commit__REQUESTED]}
2022-02-13 13:44:57,715 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,728 [main] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties
2022-02-13 13:44:57,728 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties' for reading
2022-02-13 13:44:57,745 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,745 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
2022-02-13 13:44:57,766 [main] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties
2022-02-13 13:44:57,766 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties' for reading
2022-02-13 13:44:57,785 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
2022-02-13 13:44:57,841 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants upto : Option{val=[20220213134350974001__commit__COMPLETED]}
2022-02-13 13:44:57,841 [main] INFO org.apache.hudi.table.HoodieTimelineArchiveLog - Limiting archiving of instants to latest compaction on metadata table at 20220213134350974001
2022-02-13 13:44:57,842 [main] INFO org.apache.hudi.client.heartbeat.HoodieHeartbeatClient - Stopping heartbeat for instant 20220213134342038
2022-02-13 13:44:57,842 [main] INFO org.apache.hudi.client.heartbeat.HoodieHeartbeatClient - Stopped heartbeat for instant 20220213134342038
2022-02-13 13:44:57,907 [main] INFO org.apache.hudi.client.heartbeat.HeartbeatUtils - Deleted the heartbeat for instant 20220213134342038
2022-02-13 13:44:57,907 [main] INFO org.apache.hudi.client.heartbeat.HoodieHeartbeatClient - Deleted heartbeat file for instant 20220213134342038
2022-02-13 13:44:57,907 [main] INFO org.apache.hudi.client.transaction.TransactionManager - Transaction ending with transaction owner Option{val=[==>20220213134342038__commit__INFLIGHT]}
2022-02-13 13:44:57,907 [main] INFO org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider - RELEASING lock atZkBasePath = /hudi, lock key = 691508e5-9af8-4254-8211-3effed593c20
2022-02-13 13:44:57,908 [main] INFO org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider - RELEASED lock atZkBasePath = /hudi, lock key = 691508e5-9af8-4254-8211-3effed593c20
2022-02-13 13:44:57,908 [main] INFO org.apache.hudi.client.transaction.TransactionManager - Transaction ended with transaction owner Option{val=[==>20220213134342038__commit__INFLIGHT]}
2022-02-13 13:44:57,910 [main] INFO com.recko.spark.jobs.v1.ingestion.DataValidationJob - Current State :: ================ FAILED
2022-02-13 13:44:57,912 [main] INFO com.recko.spark.jobs.v1.ingestion.listeners.TaskStatusLogger - TimeStamp : 2022-02-13 13:44:57 =========== TaskId : 420a79fd-b810-47f7-ad28-dfcd7ea03c10 ===== State : FAILED
2022-02-13 13:44:57,936 [main] INFO com.recko.spark.utils.db.RDSUtil$ - Executing procedure :: INSERT INTO scepter.task_metrics(id, task_id, task_type, stage_num, batch_id, policy_id, file_size, start_time, end_time, time_taken, estimated_eta, other_attributes, org_id, tenant_id) VALUES ( '4cf1da7d-2cf9-4d72-8be2-a679797c5a88', '420a79fd-b810-47f7-ad28-dfcd7ea03c10', 'DATA_VALIDATION', '1', '1b5fbfd4-e793-4b7a-b896-f6b73e033000', '691508e5-9af8-4254-8211-3effed593c20', 1, '2022-02-13 13:43:24.041', '2022-02-13 13:44:57.913', 93872, 129663, '', 'ego', 'ego');
2022-02-13 13:44:57,938 [main] INFO com.recko.spark.utils.db.RDSUtil$ - Execution complete, TimeTaken :: 25
2022-02-13 13:44:57,938 [main] INFO org.apache.kafka.clients.producer.ProducerConfig - ProducerConfig values:
    acks = 1
    batch.size = 16384
    bootstrap.servers = [ego-kafka.reckostaging.com:9094]
    buffer.memory = 33554432
    client.dns.lookup = use_all_dns_ips
    client.id = producer-5
    compression.type = none
    connections.max.idle.ms = 540000
    delivery.timeout.ms = 120000
    enable.idempotence = false
    interceptor.classes = []
    internal.auto.downgrade.txn.commit = false
    key.serializer = class org.apache.kafka.common.serialization.StringSerializer
    linger.ms = 0
    max.block.ms = 60000
    max.in.flight.requests.per.connection = 5
    max.request.size = 1048576
    metadata.max.age.ms = 300000
    metadata.max.idle.ms = 300000
    metric.reporters = []
    metrics.num.samples = 2
    metrics.recording.level = INFO
    metrics.sample.window.ms = 30000
    partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
    receive.buffer.bytes = 32768
    reconnect.backoff.max.ms = 1000
    reconnect.backoff.ms = 50
    request.timeout.ms = 30000
    retries = 2147483647
    retry.backoff.ms = 100
    sasl.client.callback.handler.class = null
    sasl.jaas.config = [hidden]
    sasl.kerberos.kinit.cmd = /usr/bin/kinit
    sasl.kerberos.min.time.before.relogin = 60000
    sasl.kerberos.service.name = null
    sasl.kerberos.ticket.renew.jitter = 0.05
    sasl.kerberos.ticket.renew.window.factor = 0.8
    sasl.login.callback.handler.class = null
    sasl.login.class = null
    sasl.login.refresh.buffer.seconds = 300
    sasl.login.refresh.min.period.seconds = 60
    sasl.login.refresh.window.factor = 0.8
    sasl.login.refresh.window.jitter = 0.05
    sasl.mechanism = PLAIN
    security.protocol = SASL_PLAINTEXT
    security.providers = null
    send.buffer.bytes = 131072
    ssl.cipher.suites = null
    ssl.enabled.protocols = [TLSv1.2]
    ssl.endpoint.identification.algorithm = https
    ssl.engine.factory.class = null
    ssl.key.password = null
    ssl.keymanager.algorithm = SunX509
    ssl.keystore.location = null
    ssl.keystore.password = null
    ssl.keystore.type = JKS
    ssl.protocol = TLSv1.2
    ssl.provider = null
    ssl.secure.random.implementation = null
    ssl.trustmanager.algorithm = PKIX
    ssl.truststore.location = null
    ssl.truststore.password = null
    ssl.truststore.type = JKS
    transaction.timeout.ms = 60000
    transactional.id = null
    value.serializer = class org.apache.kafka.common.serialization.StringSerializer
2022-02-13 13:44:57,941 [main] INFO org.apache.kafka.common.security.authenticator.AbstractLogin - Successfully logged in.
2022-02-13 13:44:57,942 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 2.6.0
2022-02-13 13:44:57,942 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: 62abe01bee039651
2022-02-13 13:44:57,942 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1644759897942
2022-02-13 13:44:57,953 [kafka-producer-network-thread | producer-5] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-5] Cluster ID: 8XeF52xMSRmjjqZb4A0Ceg
2022-02-13 13:44:57,963 [main] INFO com.recko.spark.utils.tasktracker.KafkaTaskTrackerService - recordMeta offset = 6114
2022-02-13 13:44:57,963 [main] INFO org.apache.kafka.clients.producer.KafkaProducer - [Producer clientId=producer-5] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
2022-02-13 13:44:57,985 [main] INFO com.recko.spark.utils.S3Util$ - cleaning up : s3://recko-ego/datalake_streamline_concurrency_1/journal_duplicates/420a79fd-b810-47f7-ad28-dfcd7ea03c10/
2022-02-13 13:44:58,047 [main] INFO com.recko.spark.services.ingester.RollbackService$ - Query :: ALTER TABLE ego_ext.raw_journal DROP IF EXISTS PARTITION (org_id = 'ego', tenant_id = 'ego', data_source = 'EXPECTED', batch_id = '1b5fbfd4-e793-4b7a-b896-f6b73e033000', task_id = '420a79fd-b810-47f7-ad28-dfcd7ea03c10' );
2022-02-13 13:44:58,102 [main] INFO org.apache.spark.sql.hive.HiveUtils - Initializing HiveMetastoreConnection version 3.1.2-amzn-5 using Spark classes.
2022-02-13 13:44:58,202 [main] INFO org.apache.hadoop.hive.conf.HiveConf - Found configuration file file:/mnt/yarn/usercache/hadoop/appcache/application_1642451881812_7589/container_1642451881812_7589_01_000001/hive-site.xml
2022-02-13 13:44:58,397 [main] INFO SessionState - Hive Session ID = 24800b67-fd35-4f59-b8c4-43956e2fdce1
2022-02-13 13:44:58,495 [main] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/hadoop/24800b67-fd35-4f59-b8c4-43956e2fdce1
2022-02-13 13:44:58,502 [main] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /tmp/yarn/24800b67-fd35-4f59-b8c4-43956e2fdce1
2022-02-13 13:44:58,503 [main] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/hadoop/24800b67-fd35-4f59-b8c4-43956e2fdce1/_tmp_space.db
2022-02-13 13:44:58,517 [main] INFO org.apache.spark.sql.hive.client.HiveClientImpl - Warehouse location for Hive client (version 3.1.2) is hdfs:///user/spark/warehouse
2022-02-13 13:44:58,663 [dispatcher-CoarseGrainedScheduler] INFO org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.9.157.234:45834) with ID 3, ResourceProfileId 0
2022-02-13 13:44:58,664 [spark-listener-group-executorManagement] INFO org.apache.spark.scheduler.dynalloc.ExecutorMonitor - New executor 3 has registered (new total is 4)
2022-02-13 13:44:58,767 [dispatcher-BlockManagerMaster] INFO org.apache.spark.storage.BlockManagerMasterEndpoint - Registering block manager ip-10-9-157-234.ap-south-1.compute.internal:38329 with 2004.6 MiB RAM, BlockManagerId(3, ip-10-9-157-234.ap-south-1.compute.internal, 38329, None)
2022-02-13 13:45:00,299 [block-manager-storage-async-thread-pool-165] INFO org.apache.spark.storage.BlockManager - Removing RDD 94
2022-02-13 13:45:00,315 [block-manager-storage-async-thread-pool-172] INFO org.apache.spark.storage.BlockManager - Removing RDD 114
2022-02-13 13:45:00,319 [block-manager-storage-async-thread-pool-5] INFO org.apache.spark.storage.BlockManager - Removing RDD 137
2022-02-13 13:45:00,322 [block-manager-storage-async-thread-pool-17] INFO org.apache.spark.storage.BlockManager - Removing RDD 122
2022-02-13 13:45:02,507 [main] INFO org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAccessController - Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=24800b67-fd35-4f59-b8c4-43956e2fdce1, clientType=HIVECLI]
2022-02-13 13:45:02,509 [main] WARN org.apache.hadoop.hive.ql.session.SessionState - METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
2022-02-13 13:45:03,707 [main] INFO org.sparkproject.jetty.server.AbstractConnector - Stopped Spark@60f662bd{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2022-02-13 13:45:03,710 [main] INFO org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://ip-10-9-157-234.ap-south-1.compute.internal:4040
2022-02-13 13:45:03,714 [YARN application state monitor] INFO org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - Interrupting monitor thread
2022-02-13 13:45:03,716 [main] INFO org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - Shutting down all executors
2022-02-13 13:45:03,716 [dispatcher-CoarseGrainedScheduler] INFO org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint - Asking each executor to shut down
2022-02-13 13:45:03,721 [main] INFO org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - YARN client scheduler backend Stopped
2022-02-13 13:45:03,750 [dispatcher-event-loop-2] INFO org.apache.spark.MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
2022-02-13 13:45:03,764 [main] INFO org.apache.spark.storage.memory.MemoryStore - MemoryStore cleared
2022-02-13 13:45:03,764 [main] INFO org.apache.spark.storage.BlockManager - BlockManager stopped
2022-02-13 13:45:03,772 [main] INFO org.apache.spark.storage.BlockManagerMaster - BlockManagerMaster stopped
2022-02-13 13:45:03,778 [dispatcher-event-loop-3] INFO org.apache.spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - OutputCommitCoordinator stopped!
2022-02-13 13:45:03,784 [main] INFO org.apache.spark.SparkContext - Successfully stopped SparkContext
<<< Invocation of Spark command completed <<<
Hadoop Job IDs executed by Spark: job_1642451881812_7593
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:412)
    at org.apache.oozie.action.hadoop.LauncherAM.access$400(LauncherAM.java:54)
    at org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:225)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:219)
    at org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:155)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:143)
Caused by: java.lang.NullPointerException
    at org.apache.hudi.table.HoodieTimelineArchiveLog.lambda$getInstantsToArchive$9(HoodieTimelineArchiveLog.java:230)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
    at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:124)
    at org.apache.hudi.client.AbstractHoodieWriteClient.archive(AbstractHoodieWriteClient.java:766)
    at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:459)
    at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:198)
    at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:635)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:286)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
    at com.recko.spark.io.lake.HudiIO.writeDataFrame(HudiIO.scala:66)
    at com.recko.spark.utils.datalake.StructuredZoneUtils$.putValidationData(StructuredZoneUtils.scala:135)
    at com.recko.spark.utils.datalake.LakeUtils$.writeIntoLake(LakeUtils.scala:10)
    at com.recko.spark.jobs.v1.ingestion.IngestionWrapper.postExecute(IngestionWrapper.scala:184)
    at com.recko.spark.jobs.v1.ingestion.DataValidationJob.execute(DataValidationJob.scala:63)
    at com.recko.spark.jobs.v1.triggers.IngestionJob$.$anonfun$main$1(IngestionJob.scala:30)
    at com.recko.spark.executors.Executor.jobWrapper(Executor.scala:71)
    at com.recko.spark.jobs.v1.triggers.IngestionJob$.main(IngestionJob.scala:23)
    at com.recko.spark.jobs.v1.triggers.IngestionJob.main(IngestionJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1047)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1056)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:186)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:93)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:107)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:60)
    ... 16 more
Failing Oozie Launcher, null
java.lang.NullPointerException
    at org.apache.hudi.table.HoodieTimelineArchiveLog.lambda$getInstantsToArchive$9(HoodieTimelineArchiveLog.java:230)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
    at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:124)
    at org.apache.hudi.client.AbstractHoodieWriteClient.archive(AbstractHoodieWriteClient.java:766)
    at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:459)
    at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:198)
    at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:635)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:286)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
    at com.recko.spark.io.lake.HudiIO.writeDataFrame(HudiIO.scala:66)
    at com.recko.spark.utils.datalake.StructuredZoneUtils$.putValidationData(StructuredZoneUtils.scala:135)
    at com.recko.spark.utils.datalake.LakeUtils$.writeIntoLake(LakeUtils.scala:10)
    at com.recko.spark.jobs.v1.ingestion.IngestionWrapper.postExecute(IngestionWrapper.scala:184)
    at com.recko.spark.jobs.v1.ingestion.DataValidationJob.execute(DataValidationJob.scala:63)
    at com.recko.spark.jobs.v1.triggers.IngestionJob$.$anonfun$main$1(IngestionJob.scala:30)
    at com.recko.spark.executors.Executor.jobWrapper(Executor.scala:71)
    at com.recko.spark.jobs.v1.triggers.IngestionJob$.main(IngestionJob.scala:23)
    at com.recko.spark.jobs.v1.triggers.IngestionJob.main(IngestionJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1047)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1056)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:186)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:93)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:107)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:60)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:412)
    at org.apache.oozie.action.hadoop.LauncherAM.access$400(LauncherAM.java:54)
    at org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:225)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:219)
    at org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:155)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:143)
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://qa-emr.reckostaging.com:8020/user/hadoop/oozie-oozi/0005119-220117203901194-oozie-oozi-W/IngestionJob--spark/action-data.seq
2022-02-13 13:45:03,825 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.deflate]
Stopping AM
2022-02-13 13:45:03,837 [main] INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Waiting for application to be successfully unregistered.
Callback notification attempts left 0
Callback notification trying http://ip-10-9-153-3.ap-south-1.compute.internal:11000/oozie/callback?id=0005119-220117203901194-oozie-oozi-W@IngestionJob&status=FAILED
Callback notification to http://ip-10-9-153-3.ap-south-1.compute.internal:11000/oozie/callback?id=0005119-220117203901194-oozie-oozi-W@IngestionJob&status=FAILED succeeded
Callback notification succeeded
```
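For reference, the writer invocation referenced above looks roughly like the following. This is a minimal sketch: the `WriteOptions` case class, the `RECKO_UPDATED_TIME` column name, `df`, and `basePath` are illustrative stand-ins for values our job derives at runtime, not Hudi APIs.

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.hudi.hive.MultiPartKeysValueExtractor
import org.apache.spark.sql.{DataFrame, SaveMode}

object HudiWriteSketch {
  // Hypothetical stand-ins for values that come from our job config.
  case class WriteOptions(tableName: String, primaryKey: String, partitionCols: Seq[String])
  val RECKO_UPDATED_TIME = "recko_updated_time" // assumed name of the precombine column

  def writeDataFrame(df: DataFrame, options: WriteOptions, basePath: String): Unit = {
    df.write
      .format("hudi")
      .option(HoodieWriteConfig.TBL_NAME.key, options.tableName)
      // TABLE_TYPE is left at its default, COPY_ON_WRITE, so the data table is COW.
      .option(DataSourceWriteOptions.TABLE_TYPE.key, DataSourceWriteOptions.TABLE_TYPE.defaultValue())
      .option(DataSourceWriteOptions.RECORDKEY_FIELD.key, options.primaryKey)
      .option(DataSourceWriteOptions.PRECOMBINE_FIELD.key, RECKO_UPDATED_TIME)
      .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS.key, classOf[MultiPartKeysValueExtractor].getName)
      .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING.key, "true")
      .option(DataSourceWriteOptions.PARTITIONPATH_FIELD.key, options.partitionCols.mkString(","))
      .mode(SaveMode.Append)
      .save(basePath)
  }
}
```

Since `TABLE_TYPE` is never overridden, the data table itself is COPY_ON_WRITE, which matches the `hoodie.properties` loaded at the top of the log.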
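On the COW vs. MERGE_ON_READ confusion: the MERGE_ON_READ lines in the log are loaded from `<base path>/.hoodie/metadata` (with `baseFileFormat=HFILE`), i.e. the internal metadata table, not the data table. A quick way to confirm which table each log line refers to is to load both meta clients; a sketch, assuming the 0.10.x `HoodieTableMetaClient.builder()` API:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hudi.common.table.HoodieTableMetaClient

object TableTypeCheck {
  def main(args: Array[String]): Unit = {
    val basePath = args(0) // the data table base path, e.g. the s3://... path in the log above

    // Data table: its hoodie.properties records COPY_ON_WRITE.
    val dataMeta = HoodieTableMetaClient.builder()
      .setConf(new Configuration())
      .setBasePath(basePath)
      .build()
    println(s"data table type     = ${dataMeta.getTableType}")

    // Internal metadata table: lives under <base path>/.hoodie/metadata and is
    // MERGE_ON_READ with HFILE base files, which is what the MOR log lines show.
    val metadataMeta = HoodieTableMetaClient.builder()
      .setConf(new Configuration())
      .setBasePath(basePath + "/.hoodie/metadata")
      .build()
    println(s"metadata table type = ${metadataMeta.getTableType}")
  }
}
```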