Re: RE: Fast write datastore...

2017-03-16 Thread Sudhir Menon
ate previously computed spark results. > > Regards, > Yohann > > > -- > *From:* Rick Moritz > *Sent:* Thursday, March 16, 2017 10:37 > *To:* user > *Subject:* Re: RE: Fast write datastore... > > If you have enough RAM/SSDs available, maybe ti

RE: RE: Fast write datastore...

2017-03-16 Thread Mal Edwin
/aggregate previously computed spark results. > > Regards, > Yohann > > From: Rick Moritz > Sent: Thursday, March 16, 2017 10:37 > To: user > Subject: Re: RE: Fast write datastore... > > If you have enough RAM/SSDs available, maybe tiered HDFS storage and Parquet > might al

RE: RE: Fast write datastore...

2017-03-16 Thread yohann jardin
Subject: Re: RE: Fast write datastore... If you have enough RAM/SSDs available, maybe tiered HDFS storage and Parquet might also be an option. Of course, management-wise it has much more overhead than using ES, since you need to manually define partitions and buckets, which is suboptimal. On the

Re: RE: Fast write datastore...

2017-03-16 Thread Rick Moritz
; >> >> *From:* Vova Shelgunov [mailto:vvs...@gmail.com] >> *Sent:* Wednesday, March 15, 2017 11:51 PM >> *To:* Muthu Jayakumar >> *Cc:* vincent gromakowski ; Richard >> Siebeling ; user ; Shiva >> Ramagopal >> *Subject:* Re: Fast write datastore... >&

RE: Fast write datastore...

2017-03-15 Thread jasbir.sing
Hi, Will MongoDB not fit this solution? From: Vova Shelgunov [mailto:vvs...@gmail.com] Sent: Wednesday, March 15, 2017 11:51 PM To: Muthu Jayakumar Cc: vincent gromakowski ; Richard Siebeling ; user ; Shiva Ramagopal Subject: Re: Fast write datastore... Hi Muthu, I did not catch from

Re: Fast write datastore...

2017-03-15 Thread Muthu Jayakumar
>Reading your original question again, it seems to me probably you don't need a fast data store Shiva, You are right. I only asked about fast writes and never mentioned reads :). For us, Cassandra may not be a good choice for reads because of its a. limited pagination support on the server side b.

Re: Fast write datastore...

2017-03-15 Thread Shiva Ramagopal
Hi, The choice of ES vs Cassandra should really be made depending on your query use-cases. ES and Cassandra have their own strengths which should be matched to what you want to do rather than making a choice based on their respective feature sets. Reading your original question again, it seems to

Re: Fast write datastore...

2017-03-15 Thread Koert Kuipers
we are using elasticsearch for this. the issue of elasticsearch falling over if the number of partitions/cores in spark writing to it is too high does suck indeed. and the answer every time i asked about it on elasticsearch mailing list has been to reduce spark tasks or increase elasticsearch node
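
A minimal sketch of the "reduce spark tasks" advice above, assuming PySpark and the elasticsearch-hadoop connector are available. The helper names and the writer cap are hypothetical and not from the thread; the idea is only to coalesce the DataFrame so that no more than a fixed number of tasks write to Elasticsearch at once.

```python
def capped_partition_count(current_partitions: int, max_concurrent_writers: int) -> int:
    """Return how many partitions to write with, never more than the cap.

    max_concurrent_writers is a hypothetical tuning knob: the number of
    simultaneous bulk writers the Elasticsearch cluster is assumed to tolerate.
    """
    if current_partitions < 1 or max_concurrent_writers < 1:
        raise ValueError("partition and writer counts must be positive")
    return min(current_partitions, max_concurrent_writers)


def write_with_capped_parallelism(df, resource: str, max_concurrent_writers: int) -> None:
    """Write a Spark DataFrame to Elasticsearch using a limited number of tasks.

    df is expected to be a pyspark.sql.DataFrame; resource is the target
    index, passed to the connector as the save path.
    """
    target = capped_partition_count(df.rdd.getNumPartitions(), max_concurrent_writers)
    # coalesce() merges partitions without a full shuffle, so the write
    # stage runs with at most `target` tasks.
    (df.coalesce(target)
       .write
       .format("org.elasticsearch.spark.sql")
       .mode("append")
       .save(resource))
```

With a DataFrame of 200 partitions and a cap of 16, the write stage would run 16 tasks, for example write_with_capped_parallelism(df, "results", 16), where "results" is a placeholder index name. The trade-off is a slower write stage in exchange for less pressure on the Elasticsearch nodes.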

Re: Fast write datastore...

2017-03-15 Thread Vova Shelgunov
Hi Muthu, I did not catch from your message what performance you expect from subsequent queries. Regards, Uladzimir On Mar 15, 2017 9:03 PM, "Muthu Jayakumar" wrote: > Hello Uladzimir / Shiva, > > From ElasticSearch documentation (i have to see the logical plan of a > query to confirm), t

Re: Fast write datastore...

2017-03-15 Thread Muthu Jayakumar
Hello Uladzimir / Shiva, From the ElasticSearch documentation (I have to see the logical plan of a query to confirm), the richness of filters (like regex, ...) is pretty good compared to Cassandra. As for aggregates, I think Spark DataFrames are rich enough to tackle them. Let me know your thoug

Re: Fast write datastore...

2017-03-15 Thread Shiva Ramagopal
Probably Cassandra is a good choice if you are mainly looking for a datastore that supports fast writes. You can ingest the data into a table and define one or more materialized views on top of it to support your queries. Since you mention that your queries are going to be simple you can define you

Re: Fast write datastore...

2017-03-15 Thread Muthu Jayakumar
Hello Vincent, Cassandra may not fit my bill if I need to define my partition and other indexes upfront. Is this right? Hello Richard, Let me evaluate Apache Ignite. I did evaluate it 3 months back and back then the connector to Apache Spark did not support Spark 2.0. Another drastic thought ma

Re: Fast write datastore...

2017-03-15 Thread Richard Siebeling
maybe Apache Ignite does fit your requirements On 15 March 2017 at 08:44, vincent gromakowski < vincent.gromakow...@gmail.com> wrote: > Hi > If queries are static and filters are on the same columns, Cassandra is a > good option. > > On March 15, 2017 7:04 AM, "muthu" wrote: > > Hello there, >

Re: Fast write datastore...

2017-03-15 Thread vincent gromakowski
Hi If queries are static and filters are on the same columns, Cassandra is a good option. On March 15, 2017 7:04 AM, "muthu" wrote: Hello there, I have one or more parquet files to read and perform some aggregate queries using Spark Dataframe. I would like to find a reasonably fast datastore