Re: using Kudu with Spark

2017-07-24 Thread Mich Talebzadeh
Thanks Pierce. That compilation looks very cool. Now, as always, the question is what is the best fit for the job at hand, and I don't think there is a single answer. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: using Kudu with Spark

2017-07-24 Thread Pierce Lamb
Hi Mich, I tried to compile a list of datastores that connect to Spark and provide a bit of context. The list may help you in your research: https://stackoverflow.com/a/39753976/3723346 I'm going to add Kudu, Druid and Ampool from this thread. I'd like to point out SnappyData

Re: using Kudu with Spark

2017-07-24 Thread Mich Talebzadeh
Now they are bringing up Ampool with Spark for real-time analytics.

Re: using Kudu with Spark

2017-07-24 Thread Mich Talebzadeh
Sounds like Druid can do the same?

Re: using Kudu with Spark

2017-07-24 Thread Mich Talebzadeh
Yes, this storage layer is something I have been investigating in my own lab for mixed loads such as Lambda Architecture. It offers the convenience of a columnar RDBMS (much like Sybase IQ). Kudu tables look like those in SQL relational databases, each with a primary key made up of one or more colum

Re: using Kudu with Spark

2017-07-24 Thread Jörn Franke
I guess you have to find out yourself with experiments. Cloudera has some benchmarks, but it always depends on what you test, your data volume, and what is meant by "fast". It is also more than a file format, with servers that communicate with each other etc. - more complexity. Of course there are

using Kudu with Spark

2017-07-24 Thread Mich Talebzadeh
Hi, has anyone had experience of using Kudu for faster analytics with Spark? How efficient is it compared to using HBase and other traditional storage for fast-changing data, please? Any insight will be appreciated. Thanks