Thanks Pierce. That compilation looks very cool. As always, the question is what best fits the job at hand, and I don't think there is a single answer.
Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.


On 24 July 2017 at 18:51, Pierce Lamb <richard.pierce.l...@gmail.com> wrote:

> Hi Mich,
>
> I tried to compile a list of datastores that connect to Spark, with a bit
> of context for each. The list may help you in your research:
>
> https://stackoverflow.com/a/39753976/3723346
>
> I'm going to add Kudu, Druid and Ampool from this thread.
>
> I'd like to point out SnappyData
> <https://github.com/SnappyDataInc/snappydata> as an option you should try.
> SnappyData provides many of the features you've discussed (columnar
> storage, replication, in-place updates, etc.) while also integrating the
> datastore with Spark directly. That is, there is no "connector" to go
> through for database operations; Spark and the datastore share the same
> JVM and block manager. Thus, if performance is one of your concerns, this
> should give you some of the best performance
> <http://www.snappydata.io/highlights/performance> in this area.
>
> Hope this helps,
>
> Pierce
>
> On Mon, Jul 24, 2017 at 10:02 AM, Mich Talebzadeh
> <mich.talebza...@gmail.com> wrote:
>
>> Now they are bringing up Ampool with Spark for real-time analytics.
>>
>> On 24 July 2017 at 11:15, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Sounds like Druid can do the same?
>>>
>>> On 24 July 2017 at 08:38, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> Yes, this storage layer is something I have been investigating in my
>>>> own lab for mixed loads such as the Lambda Architecture.
>>>>
>>>> It offers the convenience of a columnar RDBMS (much like Sybase IQ).
>>>> Kudu tables look like those in SQL relational databases: each has a
>>>> primary key made up of one or more columns that enforces uniqueness
>>>> and acts as an index for efficient updates and deletes. Data is
>>>> partitioned into what are known as tablets, which together make up a
>>>> table. Kudu replicates these tablets to other nodes for redundancy.
>>>>
>>>> As you said, there are a number of options. Kudu also claims in-place
>>>> updates, which need to be tried to check their consistency.
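[Editor's note: the Kudu usage described above can be sketched with the kudu-spark connector. This is a sketch only; the master address and table name are assumptions, and the short `kudu` format name assumes a recent kudu-spark package.]

```scala
// Sketch: reading a Kudu table into a Spark DataFrame via the kudu-spark
// connector. Master address and table name below are illustrative assumptions.
import org.apache.spark.sql.SparkSession

object KuduReadSketch {
  // Standard kudu-spark options; the values are assumptions, not real hosts.
  val kuduOptions: Map[String, String] = Map(
    "kudu.master" -> "kudu-master-1:7051",    // assumed Kudu master address
    "kudu.table"  -> "impala::default.events" // assumed table name
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kudu-sketch").getOrCreate()

    // Load the Kudu table as a DataFrame; updates/deletes against the
    // primary key happen on the Kudu side, as described above.
    val events = spark.read.format("kudu").options(kuduOptions).load()

    events.createOrReplaceTempView("events")
    spark.sql("SELECT COUNT(*) FROM events").show()
  }
}
```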
>>>>
>>>> Cheers,
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> On 24 July 2017 at 08:30, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>
>>>>> I guess you have to find out yourself with experiments. Cloudera has
>>>>> some benchmarks, but it always depends on what you test, your data
>>>>> volume and what is meant by "fast". Kudu is also more than a file
>>>>> format: it comes with servers that communicate with each other etc.,
>>>>> so there is more complexity.
>>>>> Of course there are alternatives that you could benchmark against,
>>>>> such as Apache HAWQ (which is basically Postgres on Hadoop), Apache
>>>>> Ignite or, depending on your analysis, even Flink or Spark Streaming.
>>>>>
>>>>> On 24. Jul 2017, at 09:25, Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Has anyone had experience of using Kudu for faster analytics with
>>>>> Spark?
>>>>>
>>>>> How efficient is it compared to using HBase and other traditional
>>>>> storage for fast-changing data, please?
>>>>>
>>>>> Any insight will be appreciated.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Dr Mich Talebzadeh
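[Editor's note: Pierce's SnappyData suggestion earlier in the thread can be sketched roughly as follows. This assumes SnappyData's `SnappySession` API; the table name and schema are invented for illustration, and `USING column` selects SnappyData's columnar store that Pierce mentions.]

```scala
// Sketch: SnappyData's embedded mode, where the store shares the Spark JVM
// and block manager (no external connector hop). Schema is an assumption.
import org.apache.spark.sql.{SparkSession, SnappySession}

object SnappySketch {
  // DDL for a columnar table; "USING column" is SnappyData's column store.
  val quotesDDL: String =
    """CREATE TABLE IF NOT EXISTS quotes (
      |  symbol VARCHAR(10),
      |  price  DOUBLE
      |) USING column""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark  = SparkSession.builder().appName("snappy-sketch").getOrCreate()
    // SnappySession wraps the existing SparkContext, so SQL against the
    // store runs in the same JVM as the Spark executors.
    val snappy = new SnappySession(spark.sparkContext)

    snappy.sql(quotesDDL)
    snappy.sql("INSERT INTO quotes VALUES ('ORCL', 51.2)")
    snappy.sql("SELECT symbol, price FROM quotes").show()
  }
}
```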