STORM-938 adds a periodic flush to the HiveBolt using tick tuples, which would address this situation.
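Once that patch is in place, the flush interval should be settable on HiveOptions. A minimal sketch, assuming the patch exposes a setter named withTickTupleInterval (the name is taken from the JIRA patch and not verified against a released API), with metaStoreURI, databaseName, tableName, and mapper already defined:

    // Assumes STORM-938 is applied and exposes HiveOptions.withTickTupleInterval (seconds).
    // A partially filled batch would then be flushed every N seconds instead of waiting
    // for withBatchSize tuples to arrive.
    HiveOptions hiveOptions = new HiveOptions(metaStoreURI, databaseName, tableName, mapper)
            .withTxnsPerBatch(2)
            .withBatchSize(100)
            .withTickTupleInterval(15); // flush open batches every 15 seconds (assumed API)

    HiveBolt hiveBolt = new HiveBolt(hiveOptions);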
From: Harshit Raikar <harshit.rai...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, October 9, 2015 at 4:05 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Storm HiveBolt missing records due to batching of Hive transactions

To store the processed records I am using HiveBolt in my Storm topology with the following arguments:

  - id: "MyHiveOptions"
    className: "org.apache.storm.hive.common.HiveOptions"
    constructorArgs:
      - "${metastore.uri}"   # metaStoreURI
      - "${hive.database}"   # databaseName
      - "${hive.table}"      # tableName
    configMethods:
      - name: "withTxnsPerBatch"
        args:
          - 2
      - name: "withBatchSize"
        args:
          - 100
      - name: "withIdleTimeout"
        args:
          - 2       # default value 0
      - name: "withMaxOpenConnections"
        args:
          - 200     # default value 500
      - name: "withCallTimeout"
        args:
          - 30000   # default value 10000
      - name: "withHeartBeatInterval"
        args:
          - 240     # default value 240

Records are missing in Hive because the batch is never completed, so the buffered records are never flushed. (For example: 1330 records are processed but only 1200 records are in Hive; 130 records are missing.)

How can I overcome this situation? How can I make sure the batch fills up so that the transaction is triggered and the records are stored in Hive?

Topology:
Kafka-Spout --> DataProcessingBolt
DataProcessingBolt --> HiveBolt (Sink)
DataProcessingBolt --> JdbcBolt (Sink)

--
Thanks and Regards,
Harshit Raikar
Phone No. +4917655471932
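For reference, the Flux component above maps roughly onto the following Java setup (a minimal sketch: the record mapper, its column fields, and the class/variable names are assumptions, since the original config shows only the HiveOptions constructor arguments and config methods):

    import org.apache.storm.hive.bolt.HiveBolt;
    import org.apache.storm.hive.bolt.mapper.DelimitedRecordHiveMapper;
    import org.apache.storm.hive.common.HiveOptions;
    import backtype.storm.tuple.Fields; // org.apache.storm.tuple.Fields on Storm 1.x

    public class HiveSinkConfig {
        public static HiveBolt buildHiveBolt(String metaStoreURI, String databaseName, String tableName) {
            // Column names are placeholders; they must match the target Hive table.
            DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
                    .withColumnFields(new Fields("id", "name", "payload"));

            HiveOptions hiveOptions = new HiveOptions(metaStoreURI, databaseName, tableName, mapper)
                    .withTxnsPerBatch(2)         // 2 transactions per batch
                    .withBatchSize(100)          // commit a transaction after 100 tuples
                    .withIdleTimeout(2)          // default 0
                    .withMaxOpenConnections(200) // default 500
                    .withCallTimeout(30000)      // default 10000
                    .withHeartBeatInterval(240); // default 240

            return new HiveBolt(hiveOptions);
        }
    }

With withBatchSize(100), records only reach Hive once a transaction's batch fills up (or the writer is closed), which is why a trailing, partially filled batch shows up as "missing" records until some periodic flush kicks in.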