STORM-938 adds a periodic flush to the HiveBolt using tick tuples, which would address this situation.
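Once that patch is in place, the flush interval should be settable on HiveOptions. A minimal sketch, assuming the patch exposes a setter named withTickTupleInterval (the name is taken from the JIRA patch and not verified against a released API), with metaStoreURI, databaseName, tableName, and mapper already defined:

    // Assumes STORM-938 is applied and exposes HiveOptions.withTickTupleInterval (seconds).
    // A partially filled batch would then be flushed every N seconds instead of waiting
    // for withBatchSize tuples to arrive.
    HiveOptions hiveOptions = new HiveOptions(metaStoreURI, databaseName, tableName, mapper)
            .withTxnsPerBatch(2)
            .withBatchSize(100)
            .withTickTupleInterval(15); // flush open batches every 15 seconds (assumed API)

    HiveBolt hiveBolt = new HiveBolt(hiveOptions);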
From: Harshit Raikar <harshit.rai...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, October 9, 2015 at 4:05 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Storm HiveBolt missing records due to batching of Hive transactions

To store the processed records I am using HiveBolt in my Storm topology with the following arguments:

  - id: "MyHiveOptions"
    className: "org.apache.storm.hive.common.HiveOptions"
    constructorArgs:
      - "${metastore.uri}"   # metaStoreURI
      - "${hive.database}"   # databaseName
      - "${hive.table}"      # tableName
    configMethods:
      - name: "withTxnsPerBatch"
        args:
          - 2
      - name: "withBatchSize"
        args:
          - 100
      - name: "withIdleTimeout"
        args:
          - 2       # default value 0
      - name: "withMaxOpenConnections"
        args:
          - 200     # default value 500
      - name: "withCallTimeout"
        args:
          - 30000   # default value 10000
      - name: "withHeartBeatInterval"
        args:
          - 240     # default value 240

Records are missing in Hive because the batch is never completed, so the buffered records are never flushed. (For example: 1330 records are processed but only 1200 records are in Hive; 130 records are missing.)

How can I overcome this situation? How can I make sure the batch fills up so that the transaction is triggered and the records are stored in Hive?

Topology:
Kafka-Spout --> DataProcessingBolt
DataProcessingBolt --> HiveBolt (Sink)
DataProcessingBolt --> JdbcBolt (Sink)

--
Thanks and Regards,
Harshit Raikar
Phone No. +4917655471932
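For reference, the Flux component above maps roughly onto the following Java setup (a minimal sketch: the record mapper, its column fields, and the class/variable names are assumptions, since the original config shows only the HiveOptions constructor arguments and config methods):

    import org.apache.storm.hive.bolt.HiveBolt;
    import org.apache.storm.hive.bolt.mapper.DelimitedRecordHiveMapper;
    import org.apache.storm.hive.common.HiveOptions;
    import backtype.storm.tuple.Fields; // org.apache.storm.tuple.Fields on Storm 1.x

    public class HiveSinkConfig {
        public static HiveBolt buildHiveBolt(String metaStoreURI, String databaseName, String tableName) {
            // Column names are placeholders; they must match the target Hive table.
            DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
                    .withColumnFields(new Fields("id", "name", "payload"));

            HiveOptions hiveOptions = new HiveOptions(metaStoreURI, databaseName, tableName, mapper)
                    .withTxnsPerBatch(2)         // 2 transactions per batch
                    .withBatchSize(100)          // commit a transaction after 100 tuples
                    .withIdleTimeout(2)          // default 0
                    .withMaxOpenConnections(200) // default 500
                    .withCallTimeout(30000)      // default 10000
                    .withHeartBeatInterval(240); // default 240

            return new HiveBolt(hiveOptions);
        }
    }

With withBatchSize(100), records only reach Hive once a transaction's batch fills up (or the writer is closed), which is why a trailing, partially filled batch shows up as "missing" records until some periodic flush kicks in.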