Geovani,

You can use HiveContext to do inserts into a Hive table in a Streaming app just 
as you would in a batch app. A DStream is really a sequence of RDDs, so you can 
run the insert from within foreachRDD (rough sketch below). You just have to be 
careful that you're not creating large numbers of small files, so you may want 
to either increase the duration of your Streaming batches or repartition right 
before you insert. You'll need to do some testing based on your ingest volume. 
You may also want to consider streaming into another data store, though.
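Something along these lines (a hypothetical sketch against the Spark 1.1/1.2-era 
API with HiveContext and SchemaRDD; the Event case class, the socket source, and 
the table name "my_hive_table" are placeholders for your actual schema, ingest 
source, and target table):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.hive.HiveContext

// Placeholder record schema
case class Event(id: String, value: Double)

object StreamToHive {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("StreamToHive"))
    // A longer batch interval means fewer, larger files per insert
    val ssc = new StreamingContext(sc, Seconds(60))

    val hiveContext = new HiveContext(sc)
    import hiveContext.createSchemaRDD  // implicit RDD[Event] -> SchemaRDD

    // Placeholder source; in practice this might be Kafka, Flume, etc.
    val events = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(id, v) = line.split(",")
      Event(id, v.toDouble)
    }

    events.foreachRDD { rdd =>
      if (rdd.take(1).nonEmpty) {
        // Repartition before inserting to limit the number of small files
        rdd.repartition(4).registerTempTable("batch_events")
        hiveContext.sql(
          "INSERT INTO TABLE my_hive_table SELECT * FROM batch_events")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Tune the repartition factor and batch duration against your ingest volume so 
each insert writes a reasonable number of reasonably sized files.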

Thanks,
Silvio

From: Luiz Geovani Vier <lgv...@gmail.com>
Date: Thursday, November 6, 2014 at 7:46 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Store DStreams into Hive using Hive Streaming

Hello,

Is there a built-in way or connector to store DStream results into an existing 
Hive ORC table using the Hive/HCatalog Streaming API?
Otherwise, do you have any suggestions regarding the implementation of such a 
component?

Thank you,
-Geovani
