Souldiv opened a new issue, #13057: URL: https://github.com/apache/hudi/issues/13057
**Describe the problem you faced**

I am trying to store table metadata in the Hive metastore using the following Spark command. I have followed the config as shown [here](https://hudi.apache.org/docs/0.15.0/configurations/#META_SYNC). The following command is run:

```bash
spark-submit --class org.apache.hudi.utilities.streamer.HoodieStreamer $HUDI_UTILITIES_BUNDLE \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts \
  --target-base-path hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2 \
  --target-table stock_ticks_cow_2 \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --hoodie-conf hoodie.streamer.schemaprovider.registry.url=http://localhost:8081/subjects/stock_ticks-value/versions/latest \
  --hoodie-conf hoodie.streamer.source.kafka.topic=stock_ticks \
  --hoodie-conf hoodie.datasource.write.recordkey.field=key \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=date \
  --hoodie-conf schema.registry.url=http://localhost:8081 \
  --hoodie-conf auto.offset.reset=earliest \
  --hoodie-conf bootstrap.servers=localhost:9092 \
  --hoodie-conf hoodie.upsert.shuffle.parallelism=2 \
  --hoodie-conf hoodie.insert.shuffle.parallelism=2 \
  --hoodie-conf hoodie.delete.shuffle.parallelism=2 \
  --hoodie-conf hoodie.bulkinsert.shuffle.parallelism=2 \
  --hoodie-conf hoodie.datasource.hive_sync.mode=hms \
  --hoodie-conf hoodie.datasource.hive_sync.enable=true \
  --hoodie-conf hoodie.datasource.hive_sync.metastore.uris=thrift://localhost:9083 \
  --hoodie-conf hoodie.datasource.hive_sync.table=stock_ticks_cow_2 \
  --hoodie-conf hoodie.datasource.meta.sync.enable=true \
  --hoodie-conf hoodie.datasource.hive_sync.batch_num=10 \
  --props file:///dev/null
```

Spark writes the table to HDFS as intended, but I don't see the table metadata in Hive through Beeline. Please let me know if I am missing any required configuration or if I have misunderstood the purpose of this configuration. (One possibly missing flag is sketched after the environment details below.)

**To Reproduce**

Steps to reproduce the behavior:

1. Push stock data to the `stock_ticks` topic.
2. Run the above Spark command.
3. Check from Beeline whether the table shows up using `show tables;`.

**Expected behavior**

I was expecting the table metadata to be synced with Hive upon running the Spark command with the Hive sync configuration.

**Environment Description**

* Hudi version : 0.15
* Spark version : 3.5.5
* Hive version : 2.3.9
* Hadoop version : 3.4.1
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : No
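One thing worth double-checking (an assumption, not verified against 0.15 behavior): HoodieStreamer has its own CLI switch for meta sync, `--enable-sync`, and the `hoodie.datasource.*` sync properties alone may not turn sync on when submitting through the streamer. A minimal sketch of the same submit with that flag added, all other options unchanged:

```bash
# Hedged sketch: same spark-submit as above (remaining --source/--schemaprovider/
# --target/--hoodie-conf options elided for brevity), with the streamer-level
# --enable-sync flag added so meta sync runs as part of each successful commit.
# --enable-sync is a real HoodieStreamer option; whether it is required in this
# setup is the assumption being tested.
spark-submit --class org.apache.hudi.utilities.streamer.HoodieStreamer $HUDI_UTILITIES_BUNDLE \
  --table-type COPY_ON_WRITE \
  --enable-sync \
  --hoodie-conf hoodie.datasource.hive_sync.mode=hms \
  --hoodie-conf hoodie.datasource.hive_sync.metastore.uris=thrift://localhost:9083 \
  --hoodie-conf hoodie.datasource.hive_sync.table=stock_ticks_cow_2 \
  --props file:///dev/null
```

(The older `--enable-hive-sync` spelling should behave the same way.)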
**Stacktrace**

```
25/03/30 17:42:33 WARN Utils: Your hostname, hudi resolves to a loopback address: 127.0.1.1; using 10.0.0.108 instead (on interface eth0)
25/03/30 17:42:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
25/03/30 17:42:33 WARN SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
25/03/30 17:42:34 INFO SparkContext: Running Spark version 3.5.5
25/03/30 17:42:34 INFO SparkContext: OS info Linux, 6.8.4-3-pve, amd64
25/03/30 17:42:34 INFO SparkContext: Java version 1.8.0_442
25/03/30 17:42:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/03/30 17:42:34 INFO ResourceUtils: ==============================================================
25/03/30 17:42:34 INFO ResourceUtils: No custom resources configured for spark.driver.
25/03/30 17:42:34 INFO ResourceUtils: ==============================================================
25/03/30 17:42:34 INFO SparkContext: Submitted application: streamer-stock_ticks_cow_2
25/03/30 17:42:34 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
25/03/30 17:42:34 INFO ResourceProfile: Limiting resource is cpu
25/03/30 17:42:34 INFO ResourceProfileManager: Added ResourceProfile id: 0
25/03/30 17:42:34 INFO SecurityManager: Changing view acls to: conuser
25/03/30 17:42:34 INFO SecurityManager: Changing modify acls to: conuser
25/03/30 17:42:34 INFO SecurityManager: Changing view acls groups to:
25/03/30 17:42:34 INFO SecurityManager: Changing modify acls groups to:
25/03/30 17:42:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: conuser; groups with view permissions: EMPTY; users with modify permissions: conuser; groups with modify permissions: EMPTY
25/03/30 17:42:34 INFO deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
25/03/30 17:42:34 INFO deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
25/03/30 17:42:34 INFO deprecation: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
25/03/30 17:42:34 INFO Utils: Successfully started service 'sparkDriver' on port 44127.
25/03/30 17:42:34 INFO SparkEnv: Registering MapOutputTracker
25/03/30 17:42:34 INFO SparkEnv: Registering BlockManagerMaster
25/03/30 17:42:34 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
25/03/30 17:42:34 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
25/03/30 17:42:34 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/03/30 17:42:34 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-970f83dc-4465-4290-a3dd-b6a401ed3feb
25/03/30 17:42:34 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
25/03/30 17:42:34 INFO SparkEnv: Registering OutputCommitCoordinator
25/03/30 17:42:34 INFO JettyUtils: Start Jetty 0.0.0.0:8090 for SparkUI
25/03/30 17:42:34 WARN Utils: Service 'SparkUI' could not bind on port 8090. Attempting port 8091.
25/03/30 17:42:34 INFO Utils: Successfully started service 'SparkUI' on port 8091.
25/03/30 17:42:34 INFO SparkContext: Added JAR file:/home/conuser/downloads/hudi-0.15.0/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.15.0.jar at spark://10.0.0.108:44127/jars/hudi-utilities-bundle_2.12-0.15.0.jar with timestamp 1743356554014
25/03/30 17:42:34 INFO Executor: Starting executor ID driver on host 10.0.0.108
25/03/30 17:42:34 INFO Executor: OS info Linux, 6.8.4-3-pve, amd64
25/03/30 17:42:34 INFO Executor: Java version 1.8.0_442
25/03/30 17:42:34 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
25/03/30 17:42:34 INFO Executor: Created or updated repl class loader org.apache.spark.util.MutableURLClassLoader@365a6a43 for default.
25/03/30 17:42:34 INFO Executor: Fetching spark://10.0.0.108:44127/jars/hudi-utilities-bundle_2.12-0.15.0.jar with timestamp 1743356554014
25/03/30 17:42:34 INFO TransportClientFactory: Successfully created connection to /10.0.0.108:44127 after 19 ms (0 ms spent in bootstraps)
25/03/30 17:42:34 INFO Utils: Fetching spark://10.0.0.108:44127/jars/hudi-utilities-bundle_2.12-0.15.0.jar to /tmp/spark-8b36c157-3895-45ce-86b2-5a063c272795/userFiles-2caada7f-5b56-4053-8db1-5b00562db47c/fetchFileTemp821209291924917814.tmp
25/03/30 17:42:34 INFO Executor: Adding file:/tmp/spark-8b36c157-3895-45ce-86b2-5a063c272795/userFiles-2caada7f-5b56-4053-8db1-5b00562db47c/hudi-utilities-bundle_2.12-0.15.0.jar to class loader default
25/03/30 17:42:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35865.
25/03/30 17:42:34 INFO NettyBlockTransferService: Server created on 10.0.0.108:35865
25/03/30 17:42:34 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
25/03/30 17:42:34 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.0.108, 35865, None)
25/03/30 17:42:34 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.0.108:35865 with 366.3 MiB RAM, BlockManagerId(driver, 10.0.0.108, 35865, None)
25/03/30 17:42:34 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.0.108, 35865, None)
25/03/30 17:42:34 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.0.108, 35865, None)
25/03/30 17:42:35 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
25/03/30 17:42:35 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
25/03/30 17:42:35 INFO UtilHelpers: Adding overridden properties to file properties.
25/03/30 17:42:35 INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir.
25/03/30 17:42:35 INFO SharedState: Warehouse path is 'hdfs://localhost:9000/user/hive/warehouse'.
25/03/30 17:42:35 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2
25/03/30 17:42:35 INFO HoodieTableConfig: Loading table properties from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2/.hoodie/hoodie.properties
25/03/30 17:42:35 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2
25/03/30 17:42:35 INFO HoodieStreamer: Creating Hudi Streamer with configs:
   auto.offset.reset: earliest
   bootstrap.servers: localhost:9092
   hoodie.auto.adjust.lock.configs: true
   hoodie.bulkinsert.shuffle.parallelism: 2
   hoodie.datasource.hive_sync.batch_num: 10
   hoodie.datasource.hive_sync.enable: true
   hoodie.datasource.hive_sync.metastore.uris: thrift://localhost:9083
   hoodie.datasource.hive_sync.mode: hms
   hoodie.datasource.hive_sync.table: stock_ticks_cow_2
   hoodie.datasource.meta.sync.enable: true
   hoodie.datasource.write.partitionpath.field: date
   hoodie.datasource.write.reconcile.schema: false
   hoodie.datasource.write.recordkey.field: key
   hoodie.delete.shuffle.parallelism: 2
   hoodie.insert.shuffle.parallelism: 2
   hoodie.streamer.schemaprovider.registry.url: http://localhost:8081/subjects/stock_ticks-value/versions/latest
   hoodie.streamer.source.kafka.topic: stock_ticks
   hoodie.upsert.shuffle.parallelism: 2
   schema.registry.url: http://localhost:8081
25/03/30 17:42:35 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
25/03/30 17:42:35 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2
25/03/30 17:42:35 INFO HoodieTableConfig: Loading table properties from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2/.hoodie/hoodie.properties
25/03/30 17:42:35 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2
25/03/30 17:42:35 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20250330173718165__commit__COMPLETED__20250330173723152]}
25/03/30 17:42:35 INFO HoodieIngestionService: Ingestion service starts running in run-once mode
25/03/30 17:42:35 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2
25/03/30 17:42:35 INFO HoodieTableConfig: Loading table properties from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2/.hoodie/hoodie.properties
25/03/30 17:42:35 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2
25/03/30 17:42:35 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20250330173718165__commit__COMPLETED__20250330173723152]}
25/03/30 17:42:35 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2
25/03/30 17:42:35 INFO HoodieTableConfig: Loading table properties from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2/.hoodie/hoodie.properties
25/03/30 17:42:35 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2
25/03/30 17:42:36 INFO StreamSync: Checkpoint to resume from : Option{val=stock_ticks,0:3482}
25/03/30 17:42:36 INFO KafkaOffsetGen: SourceLimit not configured, set numEvents to default value : 5000000
25/03/30 17:42:36 INFO KafkaOffsetGen: getNextOffsetRanges set config hoodie.streamer.source.kafka.minPartitions to 0
25/03/30 17:42:36 INFO ConsumerConfig: ConsumerConfig values:
	allow.auto.create.topics = true
	auto.commit.interval.ms = 5000
	auto.offset.reset = earliest
	bootstrap.servers = [localhost:9092]
	check.crcs = true
	client.dns.lookup = use_all_dns_ips
	client.id = consumer-null-1
	client.rack =
	connections.max.idle.ms = 540000
	default.api.timeout.ms = 60000
	enable.auto.commit = true
	exclude.internal.topics = true
	fetch.max.bytes = 52428800
	fetch.max.wait.ms = 500
	fetch.min.bytes = 1
	group.id = null
	group.instance.id = null
	heartbeat.interval.ms = 3000
	interceptor.classes = []
	internal.leave.group.on.close = true
	internal.throw.on.fetch.stable.offset.unsupported = false
	isolation.level = read_uncommitted
	key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
	max.partition.fetch.bytes = 1048576
	max.poll.interval.ms = 300000
	max.poll.records = 500
	metadata.max.age.ms = 300000
	metric.reporters = []
	metrics.num.samples = 2
	metrics.recording.level = INFO
	metrics.sample.window.ms = 30000
	partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
	receive.buffer.bytes = 65536
	reconnect.backoff.max.ms = 1000
	reconnect.backoff.ms = 50
	request.timeout.ms = 30000
	retry.backoff.ms = 100
	sasl.client.callback.handler.class = null
	sasl.jaas.config = null
	sasl.kerberos.kinit.cmd = /usr/bin/kinit
	sasl.kerberos.min.time.before.relogin = 60000
	sasl.kerberos.service.name = null
	sasl.kerberos.ticket.renew.jitter = 0.05
	sasl.kerberos.ticket.renew.window.factor = 0.8
	sasl.login.callback.handler.class = null
	sasl.login.class = null
	sasl.login.refresh.buffer.seconds = 300
	sasl.login.refresh.min.period.seconds = 60
	sasl.login.refresh.window.factor = 0.8
	sasl.login.refresh.window.jitter = 0.05
	sasl.mechanism = GSSAPI
	security.protocol = PLAINTEXT
	security.providers = null
	send.buffer.bytes = 131072
	session.timeout.ms = 10000
	socket.connection.setup.timeout.max.ms = 30000
	socket.connection.setup.timeout.ms = 10000
	ssl.cipher.suites = null
	ssl.enabled.protocols = [TLSv1.2]
	ssl.endpoint.identification.algorithm = https
	ssl.engine.factory.class = null
	ssl.key.password = null
	ssl.keymanager.algorithm = SunX509
	ssl.keystore.certificate.chain = null
	ssl.keystore.key = null
	ssl.keystore.location = null
	ssl.keystore.password = null
	ssl.keystore.type = JKS
	ssl.protocol = TLSv1.2
	ssl.provider = null
	ssl.secure.random.implementation = null
	ssl.trustmanager.algorithm = PKIX
	ssl.truststore.certificates = null
	ssl.truststore.location = null
	ssl.truststore.password = null
	ssl.truststore.type = JKS
	value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
25/03/30 17:42:36 WARN ConsumerConfig: The configuration 'schema.registry.url' was supplied but isn't a known config.
25/03/30 17:42:36 INFO AppInfoParser: Kafka version: 2.8.0
25/03/30 17:42:36 INFO AppInfoParser: Kafka commitId: ebb1d6e21cc92130
25/03/30 17:42:36 INFO AppInfoParser: Kafka startTimeMs: 1743356556089
25/03/30 17:42:36 INFO Metadata: [Consumer clientId=consumer-null-1, groupId=null] Cluster ID: Nk-xOeixRZGj41miDeXdjQ
25/03/30 17:42:36 INFO Metrics: Metrics scheduler closed
25/03/30 17:42:36 INFO Metrics: Closing reporter org.apache.kafka.common.metrics.JmxReporter
25/03/30 17:42:36 INFO Metrics: Metrics reporters closed
25/03/30 17:42:36 INFO AppInfoParser: App info kafka.consumer for consumer-null-1 unregistered
25/03/30 17:42:36 INFO KafkaOffsetGen: final ranges [OffsetRange(topic: 'stock_ticks', partition: 0, range: [3482 -> 3482])]
25/03/30 17:42:36 INFO KafkaSource: About to read sourceLimit 9223372036854775807 in 0 spark partitions from kafka for topic stock_ticks with offset ranges [OffsetRange(topic: 'stock_ticks', partition: 0, range: [3482 -> 3482])]
25/03/30 17:42:36 INFO KafkaSource: About to read 0 from Kafka for topic :stock_ticks
25/03/30 17:42:36 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20250330173718165__commit__COMPLETED__20250330173723152]}
25/03/30 17:42:36 INFO UtilHelpers: Adding overridden properties to file properties.
25/03/30 17:42:36 INFO StreamSync: No new data, source checkpoint has not changed. Nothing to commit. Old checkpoint=(Option{val=stock_ticks,0:3482}). New Checkpoint=(stock_ticks,0:3482)
25/03/30 17:42:36 INFO StreamSync: Shutting down embedded timeline server
25/03/30 17:42:36 INFO HoodieIngestionService: Ingestion service (run-once mode) has been shut down.
25/03/30 17:42:36 INFO SparkContext: SparkContext is stopping with exitCode 0.
25/03/30 17:42:36 INFO SparkUI: Stopped Spark web UI at http://10.0.0.108:8091
25/03/30 17:42:36 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
25/03/30 17:42:36 INFO MemoryStore: MemoryStore cleared
25/03/30 17:42:36 INFO BlockManager: BlockManager stopped
25/03/30 17:42:36 INFO BlockManagerMaster: BlockManagerMaster stopped
25/03/30 17:42:36 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
25/03/30 17:42:36 INFO SparkContext: Successfully stopped SparkContext
25/03/30 17:42:36 INFO ShutdownHookManager: Shutdown hook called
25/03/30 17:42:36 INFO ShutdownHookManager: Deleting directory /tmp/spark-37076236-cc75-4ba3-a7bc-65a0778326a0
25/03/30 17:42:36 INFO ShutdownHookManager: Deleting directory /tmp/spark-8b36c157-3895-45ce-86b2-5a063c272795
```
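One detail visible in the log above: the offset range for this run is `[3482 -> 3482]`, so nothing was ingested and StreamSync reports "No new data ... Nothing to commit". Meta sync normally runs after a successful commit, so on a no-op run like this it would be skipped regardless of configuration. To register the already-written table independently of the streamer, the standalone HiveSyncTool is one option; below is a minimal sketch, assuming the flag names from the Hive sync docs and the `default` database (adjust both to your setup):

```bash
# Hedged sketch: run Hudi's standalone HiveSyncTool against the existing table.
# The script lives in the hudi-sync/hudi-hive-sync module of a Hudi checkout
# (path assumed); --sync-mode hms talks to the metastore over thrift.
cd hudi-hive-sync
./run_sync_tool.sh \
  --sync-mode hms \
  --metastore-uris thrift://localhost:9083 \
  --base-path hdfs://localhost:9000/user/hive/warehouse/stock_ticks_cow_2 \
  --database default \
  --table stock_ticks_cow_2 \
  --partitioned-by date
```

If the sync succeeds, `show tables;` from Beeline should then list `stock_ticks_cow_2`.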