Hi Mich,

I agree that the technology stack you describe is difficult to manage
because of the number of components involved (HDFS, Flume, Kafka, etc.).
One solution would be a database that can support mixed workloads (OLTP,
OLAP, streaming, etc.), and I think SnappyData
<http://www.snappydata.io/> is a good fit for your problem.
It is an open-source distributed in-memory data store with Spark as the
computational engine, and it supports real-time operational analytics,
delivering stream analytics, OLTP (online transaction processing) and OLAP
(online analytical processing) in a single integrated cluster. As it is
built on top of Spark, your existing Spark code will work as is. Please
have a look:
http://www.snappydata.io/
http://snappydatainc.github.io/snappydata/
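
A minimal sketch of what that could look like, assuming the SnappyContext
entry point described in the docs linked above (the table name, the price
schema, and pricesDF, standing for whatever DataFrame your existing job
produces, are made up for illustration):

  import org.apache.spark.sql.{SaveMode, SnappyContext}

  // Wrap the existing SparkContext; plain Spark code keeps working as is.
  val snc = SnappyContext(sc)

  // A column table suited to OLAP-style scans over the price stream.
  snc.sql(
    "CREATE TABLE prices (ticker STRING, ts TIMESTAMP, price DOUBLE) USING column")

  // Existing DataFrames write straight into the store...
  pricesDF.write.format("column").mode(SaveMode.Append).saveAsTable("prices")

  // ...and are queried with ordinary Spark SQL.
  snc.sql("SELECT ticker, avg(price) FROM prices GROUP BY ticker").show()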


Thanks and Regards,
Sachin Janani

On Thu, Sep 15, 2016 at 7:16 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> any ideas on this?
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 15 September 2016 at 09:35, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am fishing for some ideas here.
>>
>> In the design we get prices directly through Kafka into Flume and store
>> them on HDFS as text files. We can then use Spark with Zeppelin to
>> present the data to the users.
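>>
>> A minimal sketch of that read path (hedged: it assumes the Flume sink
>> writes CSV-like text under one landing directory; the path and the
>> three-column schema are made up for illustration):
>>
>>   // Spark 2.x, Scala. Reading every flat file under the landing directory
>>   // is exactly the part that degrades as small files accumulate.
>>   val prices = spark.read
>>     .option("inferSchema", "true")
>>     .csv("hdfs:///data/prices/landing/*")
>>     .toDF("ticker", "ts", "price")
>>
>>   // Register the view so Zeppelin users can query it with %sql.
>>   prices.createOrReplaceTempView("prices")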
>>
>> This works. However, I am aware that once the volume of flat files
>> rises, one needs to do housekeeping. You don't want to read all the
>> files every time.
>>
>> A more viable alternative would be to load the data periodically into
>> tables of some form (Hive etc.) through an hourly cron job, so the batch
>> process has accurate, up-to-date data to the last hour.
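>>
>> A hedged sketch of that hourly batch step (assumes Spark with Hive
>> support enabled; the table name, path and schema are illustrative only):
>>
>>   import org.apache.spark.sql.functions.lit
>>
>>   // Pick up the last hour's files and append them to a partitioned Hive
>>   // table, so queries hit compacted Parquet rather than raw text files.
>>   val hour = "2016-09-15-09"
>>   val lastHour = spark.read
>>     .csv(s"hdfs:///data/prices/landing/$hour/*")
>>     .toDF("ticker", "ts", "price")
>>
>>   lastHour.withColumn("hour", lit(hour))
>>     .write.mode("append")
>>     .partitionBy("hour")
>>     .format("parquet")
>>     .saveAsTable("prices_hist")
>>
>> The job would be driven by a cron entry along the lines of:
>>
>>   5 * * * * spark-submit --class LoadLastHour /opt/jobs/load-last-hour.jar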
>>
>> That would certainly be an easier option for the users as well.
>>
>> I was wondering what would be the best strategy here. Druid, Hive, others?
>>
>> The business case here is that users may want to access older data, so
>> perhaps a database of some sort is the better solution? In all
>> likelihood they want a week's data.
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>
>
