Re: Experiences about NoSQL databases with Spark

2015-12-06 Thread ayan guha
Hi, I have a general question. I want to do real-time aggregation using Spark. I have Kinesis as the source and am planning ES as the data source. There might be close to 2000 distinct events possible. I want to keep a running count of how many times each event occurs. Currently upon receiving an event
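
One way to picture the running count asked about above is as a per-key state update. The sketch below is plain Python that simulates micro-batches locally. The names `update_count` and `apply_batch` and the sample event names are hypothetical and do not come from the thread. The Kinesis source and the ES output are left out entirely.

```python
def update_count(new_values, running_count):
    """Fold the new occurrences of one event into its running total.

    new_values: list of 1s, one per occurrence in the current batch.
    running_count: previous total, or None if the event is new.
    """
    return sum(new_values) + (running_count or 0)


def apply_batch(state, batch):
    """Simulate one micro-batch: group events by name, then update the state."""
    grouped = {}
    for event in batch:
        grouped.setdefault(event, []).append(1)
    for event, ones in grouped.items():
        state[event] = update_count(ones, state.get(event))
    return state


# Two hypothetical micro-batches of event names.
state = {}
for batch in (["login", "click", "click"], ["click", "logout"]):
    apply_batch(state, batch)

print(state)  # {'login': 1, 'click': 3, 'logout': 1}
```

A function with the shape of `update_count` (new values plus previous state) is the kind of update function that Spark Streaming's `updateStateByKey` accepts, with checkpointing enabled. With roughly 2000 distinct events, the state stays small.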

Re: Experiences about NoSQL databases with Spark

2015-12-05 Thread Nick Pentreath
I've had great success using Elasticsearch with Spark - the integration works well (both ways - reading and indexing) and ES + Kibana make a powerful event / time-series storage, aggregation and data visualization stack. On Sun, Dec 6, 2015 at 9:07 AM, manasdebashiskar

Re: Experiences about NoSQL databases with Spark

2015-12-05 Thread manasdebashiskar
Depends on your needs. Have you looked at Elasticsearch, Accumulo or Cassandra? If post-processing your data is not your goal and you just want to retrieve the data later, Greenplum (based on PostgreSQL) can be an alternative. In short, there are many NoSQL databases out there, with each having differen

Re: Experiences about NoSQL databases with Spark

2015-11-28 Thread Jörn Franke
I would not use MongoDB because it does not fit well into the Spark or Hadoop architecture. You can use it if your data volume is very small and already pre-aggregated, but this is a very limited use case. You can use HBase or, with future versions, Hive (if they use Tez > 0.8). For interactive q

Re: Experiences about NoSQL databases with Spark

2015-11-28 Thread Yu Zhang
BTW, if you decide to try MongoDB, please use version 3.0+ with the "WiredTiger" engine. On Sat, Nov 28, 2015 at 11:30 PM, Yu Zhang wrote: > If you need to construct multiple indexes, hbase will perform better, the > writing speed is slow in mongodb with many indexes and the memory cost is >

Re: Experiences about NoSQL databases with Spark

2015-11-28 Thread Yu Zhang
If you need to construct multiple indexes, HBase will perform better; write speed in MongoDB is slow with many indexes, and the memory cost is huge! But my concern is: with MongoDB, you can easily work with JavaScript and with visualization tools like D3.js, and the work will become smooth as

Re: Experiences about NoSQL databases with Spark

2015-11-24 Thread Ted Yu
You should consider using HBase as the NoSQL database. Regarding 'The data in the DB should be indexed', you need to design the schema in HBase carefully so that retrieval is fast. Disclaimer: I work on HBase. On Tue, Nov 24, 2015 at 4:46 AM, sparkuser2345 wrote: > I'm interested in knowing wh