MySQL and PostgreSQL scale to millions of records. Spark or any distributed/clustered
computing environment would be inefficient for the kind of data size you
mention. That's because of the overhead of coordinating processes, moving data
around, etc.
On Mon, Jul 13, 2015 at 5:34 PM, Sandeep Giri
wrote:
> Even for 2L records
Hello,
Since RDDs are created from data from Hive tables or HDFS, how do we ensure
they are invalidated when the source data is updated?
Regards,
Ashish
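One generic way to think about this question is to keep a fingerprint of the source next to the cached result and rebuild when the fingerprint changes. The sketch below is plain Python, not a Spark API: `SourceBackedCache`, `loader` and `count_lines` are hypothetical names used only for illustration, and the file's modification time and size stand in for whatever change signal a Hive table or HDFS path would offer.

```python
import os


class SourceBackedCache:
    """Caches a value derived from a file and rebuilds it when the file changes."""

    def __init__(self, path, loader):
        self.path = path
        self.loader = loader   # hypothetical: function(path) -> derived data
        self._stamp = None     # (mtime_ns, size) observed at the last load
        self._value = None
        self.loads = 0         # how many times the loader actually ran

    def _current_stamp(self):
        st = os.stat(self.path)
        return (st.st_mtime_ns, st.st_size)

    def get(self):
        stamp = self._current_stamp()
        if stamp != self._stamp:
            # First access, or the source changed since the last load: rebuild.
            self._value = self.loader(self.path)
            self._stamp = stamp
            self.loads += 1
        return self._value


def count_lines(path):
    """Example loader: the derived data is simply the number of lines."""
    with open(path) as f:
        return sum(1 for _ in f)
```

Calling `get()` repeatedly reuses the cached value until the file's fingerprint changes. In a real deployment the check would probably have to come from the storage layer or from the job that writes the data, which this sketch does not model.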
ies on
> the same(custom to voltdb) store. Spark(SQL) is NOT suitable for
> transactions; it is designed for querying immutable data (which may exist
> in several different forms of stores).
>
> > On May 28, 2015, at 7:48 AM, Ashish Mukherjee <
> ashish.mukher...@gmail.com> wr
Hello,
I was wondering if there is any documented comparison of SparkSQL with
MemSQL/VoltDB kind of in-memory SQL databases. MemSQL etc. too allow
queries to be run in a clustered environment. What is the major
differentiation?
Regards,
Ashish
Hello,
Is there any published community roadmap for SparkSQL and the DataSources
API?
Regards,
Ashish
Hello,
As of now, if I have to execute a Spark job, I need to create a jar and
deploy it. If I need to run a dynamically formed SQL from a Web
application, is there any way of using SparkSQL in this manner? Perhaps,
through a Web Service or something similar.
Regards,
Ashish
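One possible shape for such a service is a long-running process that keeps the query context alive and accepts SQL strings from the Web tier, instead of packaging every query as a jar. The sketch below only illustrates that shape: it uses Python's built-in `sqlite3` as a stand-in for the SparkSQL context, and `make_engine`, `run_query` and the `sales` table with its rows are hypothetical and made up for the example.

```python
import json
import sqlite3


def make_engine():
    """Create one long-lived engine with a sample table.

    Assumption: sqlite3 stands in for a SparkSQL context here; the table
    and its rows are illustrative only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10), ("west", 5), ("east", 7)],
    )
    return conn


def run_query(conn, sql):
    """Run one SQL string sent by the Web tier and return the rows as JSON."""
    # Accept read-only statements only; a real service would need proper
    # authentication and validation, which this prefix check does not provide.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are accepted")
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return json.dumps([dict(zip(columns, row)) for row in cur.fetchall()])
```

A Web handler would call `run_query(conn, sql)` with the SQL built from the request and send the returned JSON back to the client. The engine is created once at startup and reused across requests.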
grouping
and sorting.
Essentially, I am trying to evaluate if this API can give me much of what
is possible with the Apache MetaModel project.
Regards,
Ashish
On Tue, Mar 24, 2015 at 1:57 PM, Michael Armbrust
wrote:
> On Tue, Mar 24, 2015 at 12:57 AM, Ashish Mukherjee <
> ashi
Hello,
I have some questions related to the Data Sources API -
1. Is the Data Source API stable as of Spark 1.3.0?
2. The Data Source API seems to be available only in Scala. Is there any
plan to make it available for Java too?
3. Are only filters and projections pushed down to the data source
Hello,
I understand Spark can be used with Hadoop or standalone. I have certain
questions related to use of the correct FS for Spark data.
What is the efficiency trade-off in feeding data to Spark from NFS vs. HDFS?
If one is not using Hadoop, is it still usual to house data in HDFS for
Spark to r
Hi,
I have been looking at Spark Streaming, which seems to be for the use case
of live streams which are processed one line at a time generally in
real-time.
Since SparkSQL reads data from some filesystem, I was wondering if there is
something which connects SparkSQL with Spark Streaming, so I c
Hi,
I am exploring SparkSQL for my purposes of performing large relational
operations across a cluster. However, it seems to be in alpha right now. Is
there any indication when it would be considered production-level? I don't
see any info on the site.
Regards,
Ashish
Hello,
I have the following scenario and was wondering if I can use Spark to
address it.
I want to query two different data stores (say, ElasticSearch and MySQL)
and then merge the two result sets based on a join key between the two. Is
it appropriate to use Spark to do this join, if the intermed
Hi,
I am planning to use Spark for a Web-based ad hoc reporting tool on massive
data sets on S3. Real-time queries with filters, aggregations and joins
could be constructed from UI selections.
Online documentation seems to suggest that Shark is deprecated and users
should move away from it. I u
13 matches