Hi Mich,

I tried to compile a list of datastores that connect to Spark and provide a bit of context. The list may help you in your research:
https://stackoverflow.com/a/39753976/3723346

I'm going to add Kudu, Druid and Ampool from this thread.

I'd like to point out SnappyData <https://github.com/SnappyDataInc/snappydata> as an option you should try. SnappyData provides many of the features you've discussed (columnar storage, replication, in-place updates, etc.) while also integrating the datastore with Spark directly. That is, there is no "connector" to go through for database operations; Spark and the datastore share the same JVM and block manager. Thus, if performance is one of your concerns, this should give you some of the best performance <http://www.snappydata.io/highlights/performance> in this area.

Hope this helps,

Pierce

On Mon, Jul 24, 2017 at 10:02 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> now they are bringing up Ampool with Spark for real time analytics
>
> Dr Mich Talebzadeh
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On 24 July 2017 at 11:15, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> sounds like Druid can do the same?
>>
>> Dr Mich Talebzadeh
>>
>> On 24 July 2017 at 08:38, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Yes this storage layer is something I have been investigating in my own
>>> lab for mixed load such as Lambda Architecture.
>>>
>>> It offers the convenience of columnar RDBMS (much like Sybase IQ). Kudu
>>> tables look like those in SQL relational databases, each with a primary key
>>> made up of one or more columns that enforces uniqueness and acts as an index
>>> for efficient updates and deletes. Data is partitioned using what is known
>>> as tablets that make up tables. Kudu replicates these tablets to other
>>> nodes for redundancy.
>>>
>>> As you said there are a number of options. Kudu also claims in-place
>>> updates, which need to be tested for consistency.
>>>
>>> Cheers
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 24 July 2017 at 08:30, Jörn Franke <jornfra...@gmail.com> wrote:
>>>
>>>> I guess you have to find out yourself with experiments.
>>>> Cloudera has some benchmarks, but it always depends on what you test, your
>>>> data volume and what is meant by "fast". It is also more than a file format,
>>>> with servers that communicate with each other etc. - more complexity.
>>>> Of course there are alternatives that you could benchmark against, such
>>>> as Apache HAWQ (which is basically Postgres on Hadoop), Apache Ignite or,
>>>> depending on your analysis, even Flink or Spark Streaming.
>>>>
>>>> On 24. Jul 2017, at 09:25, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>> hi,
>>>>
>>>> Has anyone had experience of using Kudu for faster analytics with Spark?
>>>>
>>>> How efficient is it compared to using HBase and other traditional
>>>> storage for fast changing data please?
>>>>
>>>> Any insight will be appreciated.
>>>>
>>>> Thanks
>>>>
>>>> Dr Mich Talebzadeh