Re: using Kudu with Spark

Pierce Lamb Mon, 24 Jul 2017 10:51:23 -0700

Hi Mich,

I tried to compile a list of datastores that connect to Spark and provide a
bit of context. The list may help you in your research:


https://stackoverflow.com/a/39753976/3723346

I'm going to add Kudu, Druid and Ampool from this thread.

I'd like to point out SnappyData
<https://github.com/SnappyDataInc/snappydata> as an option you should try.
SnappyData provides many of the features you've discussed (columnar
storage, replication, in-place updates, etc.) while also integrating the
datastore with Spark directly. That is, database operations do not go
through a "connector"; Spark and the datastore share the same JVM and
block manager. Thus, if performance is one of your concerns, this should
give you some of the best performance
<http://www.snappydata.io/highlights/performance> in this area.

Hope this helps,

Pierce

On Mon, Jul 24, 2017 at 10:02 AM, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:

> Now they are bringing up Ampool with Spark for real-time analytics.
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 24 July 2017 at 11:15, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Sounds like Druid can do the same?
>>
>> Dr Mich Talebzadeh
>>
>> On 24 July 2017 at 08:38, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Yes, this storage layer is something I have been investigating in my own
>>> lab for mixed workloads such as a Lambda Architecture.
>>>
>>> It offers the convenience of a columnar RDBMS (much like Sybase IQ). Kudu
>>> tables look like those in SQL relational databases, each with a primary key
>>> made up of one or more columns that enforces uniqueness and acts as an index
>>> for efficient updates and deletes. Data is partitioned into units known as
>>> tablets, which together make up a table. Kudu replicates these tablets to
>>> other nodes for redundancy.
>>>
>>> As you said, there are a number of options. Kudu also claims in-place
>>> updates, which need to be tested for consistency.
>>>
>>> Cheers
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 24 July 2017 at 08:30, Jörn Franke <jornfra...@gmail.com> wrote:
>>>
>>>> I guess you have to find out yourself with experiments. Cloudera has
>>>> some benchmarks, but it always depends on what you test, your data volume
>>>> and what is meant by "fast". It is also more than a file format: it has
>>>> servers that communicate with each other, etc., which means more complexity.
>>>> Of course there are alternatives that you could benchmark against, such
>>>> as Apache HAWQ (which is basically Postgres on Hadoop), Apache Ignite or,
>>>> depending on your analysis, even Flink or Spark Streaming.
>>>>
>>>> On 24. Jul 2017, at 09:25, Mich Talebzadeh <mich.talebza...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Has anyone had experience of using Kudu for faster analytics with Spark?
>>>>
>>>> How efficient is it compared to using HBase and other traditional
>>>> storage for fast-changing data, please?
>>>>
>>>> Any insight will be appreciated.
>>>>
>>>> Thanks
>>>>
>>>> Dr Mich Talebzadeh
>>>
>>
>
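
As a rough illustration of the table model Mich describes in the quoted
thread (a primary key that enforces uniqueness and serves as the index for
updates and deletes, rows partitioned into tablets, tablets replicated to
other nodes), here is a minimal in-memory sketch. It is a toy model, not the
Kudu API: the class ToyTable, its method names, the tablet, replica and node
counts, and the round-robin replica placement are all assumptions made for
illustration. Hash partitioning is only one possible scheme.

```python
import zlib


class ToyTable:
    """Toy model of a primary-keyed, hash-partitioned, replicated table.

    Hypothetical names throughout; this is not how Kudu is implemented.
    """

    def __init__(self, num_tablets=4, replicas=3, num_nodes=5):
        self.num_tablets = num_tablets
        self.replicas = replicas
        self.num_nodes = num_nodes
        # One dict per tablet; the dict keys act as the primary-key index.
        self.tablets = [dict() for _ in range(num_tablets)]

    def tablet_for(self, key):
        # A stable hash of the key columns picks the tablet.
        return zlib.crc32(repr(key).encode("utf-8")) % self.num_tablets

    def nodes_for(self, tablet_id):
        # Assumed placement: replicas go on consecutive nodes.
        return [(tablet_id + i) % self.num_nodes for i in range(self.replicas)]

    def insert(self, key, row):
        tablet = self.tablets[self.tablet_for(key)]
        if key in tablet:
            # The primary key enforces uniqueness.
            raise KeyError(f"duplicate primary key: {key!r}")
        tablet[key] = dict(row)

    def update(self, key, changes):
        # The row is located through its key and modified in place.
        self.tablets[self.tablet_for(key)][key].update(changes)

    def delete(self, key):
        del self.tablets[self.tablet_for(key)][key]

    def get(self, key):
        return self.tablets[self.tablet_for(key)].get(key)


table = ToyTable()
key = ("sensor-1", 1500886283)  # composite primary key: (id, timestamp)
table.insert(key, {"value": 10})
table.update(key, {"value": 11})
print(table.get(key), "replicas on nodes", table.nodes_for(table.tablet_for(key)))
```

Whether a real deployment behaves consistently under concurrent in-place
updates is exactly what would still need to be tested, as noted in the thread.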
