Please try this version:
https://github.com/apache/storm/blob/master/external/storm-hive/pom.xml

On Oct 21, 2015 11:19 AM, "Harshit Raikar" <harshit.rai...@gmail.com> wrote:
> The withTickTupleInterval parameter is not available in the Storm
> version which I am using.
>
> On 21 October 2015 at 14:02, Harshit Raikar <harshit.rai...@gmail.com>
> wrote:
>
>> Hi Aaron,
>>
>> Thanks for the information.
>> Do I need to update my Storm version? I am currently using version
>> 0.10.0. Could you please tell me which parameters need to be set to
>> use tick tuples?
>>
>> Regards,
>> Harshit Raikar
>>
>> On 9 October 2015 at 14:49, Aaron.Dossett <aaron.doss...@target.com>
>> wrote:
>>
>>> STORM-938 adds a periodic flush to the HiveBolt using tick tuples,
>>> which would address this situation.
>>>
>>> From: Harshit Raikar <harshit.rai...@gmail.com>
>>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
>>> Date: Friday, October 9, 2015 at 4:05 AM
>>> To: "user@hive.apache.org" <user@hive.apache.org>
>>> Subject: Storm HiveBolt missing records due to batching of Hive
>>> transactions
>>>
>>> To store the processed records I am using a HiveBolt in my Storm
>>> topology with the following arguments:
>>>
>>>   - id: "MyHiveOptions"
>>>     className: "org.apache.storm.hive.common.HiveOptions"
>>>     - "${metastore.uri}"      # metaStoreURI
>>>     - "${hive.database}"      # databaseName
>>>     - "${hive.table}"         # tableName
>>>     configMethods:
>>>       - name: "withTxnsPerBatch"
>>>         args:
>>>           - 2
>>>       - name: "withBatchSize"
>>>         args:
>>>           - 100
>>>       - name: "withIdleTimeout"
>>>         args:
>>>           - 2       # default value 0
>>>       - name: "withMaxOpenConnections"
>>>         args:
>>>           - 200     # default value 500
>>>       - name: "withCallTimeout"
>>>         args:
>>>           - 30000   # default value 10000
>>>       - name: "withHeartBeatInterval"
>>>         args:
>>>           - 240     # default value 240
>>>
>>> Records are missing in Hive because a batch is never completed, so
>>> its records are never flushed. (For example: 1330 records are
>>> processed, but only 1200 records are in Hive; 130 records are
>>> missing.)
>>>
>>> How can I overcome this situation? How can I fill the batch so that
>>> the transaction is triggered and the records are stored in Hive?
>>>
>>> Topology: Kafka-Spout --> DataProcessingBolt
>>>           DataProcessingBolt --> HiveBolt (Sink)
>>>           DataProcessingBolt --> JdbcBolt (Sink)
>>>
>>> --
>>> Thanks and Regards,
>>> Harshit Raikar
>>> Phone No. +4917655471932
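[Editor's note] The periodic flush that STORM-938 adds (exposed as withTickTupleInterval on later Storm versions) boils down to: keep batching records as before, but also commit whatever is buffered whenever a timer tick arrives, so a partially filled batch is never stranded. A minimal self-contained sketch of that idea follows; BatchingSink, process, and onTick are hypothetical names chosen for illustration, not the real HiveBolt API, and the flush here just appends to a list where the real bolt would commit a Hive transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the STORM-938 behaviour: batch up to batchSize
// records per commit, but flush any partial batch when a tick fires.
public class BatchingSink {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    public BatchingSink(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Called for every normal record; commits only when the batch is full. */
    public void process(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Called on every tick tuple: commit whatever is buffered, full or not. */
    public void onTick() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    // Stands in for committing a Hive transaction batch.
    private void flush() {
        committed.addAll(buffer);
        buffer.clear();
    }

    public int committedCount() {
        return committed.size();
    }

    public static void main(String[] args) {
        // 200 records per commit mirrors withBatchSize 100 x withTxnsPerBatch 2.
        BatchingSink sink = new BatchingSink(200);
        for (int i = 0; i < 1330; i++) {
            sink.process("record-" + i);
        }
        // Only full batches were committed: 6 x 200 = 1200, 130 stranded.
        System.out.println(sink.committedCount()); // prints 1200
        sink.onTick(); // the periodic flush commits the remaining 130
        System.out.println(sink.committedCount()); // prints 1330
    }
}
```

The numbers in main deliberately mirror the thread: 1330 records processed, only 1200 reach Hive, and the last 130 sit in an unfinished batch until a tick (or an idle timeout) forces a flush. On Storm 0.10.0, which predates this change, the only built-in lever is withIdleTimeout; upgrading to a release containing STORM-938 makes the tick-based flush available.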