I think that is probably a question for the Spark Connector forum:
https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user
as it’s much more related to the function of the connector than to the
functionality of Cassandra itself.
Cheers
Ben
On Sat, 17 Mar 2018 at 21:18 onmsteste
Note that read repairs only occur for QUORUM/equivalent and higher, and
also with a 10% (default) chance on anything less than QUORUM
(ONE/LOCAL_ONE). This is configured at the table level through the
dclocal_read_repair_chance and read_repair_chance settings (which are going
away in 4.0). So if yo
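(A minimal sketch of those table-level options, assuming an invented table my_ks.my_table and a local contact point; the ALTER TABLE statement can equally be run from cqlsh:)

import org.apache.spark.SparkConf
import com.datastax.spark.connector.cql.CassandraConnector

// Hypothetical table and contact point; adjust to your cluster.
val conf = new SparkConf().set("spark.cassandra.connection.host", "127.0.0.1")

CassandraConnector(conf).withSessionDo { session =>
  // The per-table knobs mentioned above (removed in Cassandra 4.0).
  session.execute(
    """ALTER TABLE my_ks.my_table
      |WITH dclocal_read_repair_chance = 0.1
      |AND read_repair_chance = 0.0""".stripMargin)
}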
Hi Ben,
That makes sense. I also read about "read repairs". So, once an
inconsistent record is read, Cassandra synchronizes its replicas on the other
nodes as well. I ran the same Spark query again, this time with the default
consistency level (LOCAL_ONE), and the result was correct.
Thanks again for the
Hi Faraz
Yes, it likely does mean there is inconsistency in the replicas. However,
you shouldn’t be too freaked out about it - Cassandra is designed to allow
for this inconsistency to occur, and the consistency levels allow you to
achieve consistent results despite replicas not being consistent. To k
Thanks a lot for the response.
Setting consistency to ALL/TWO started giving me consistent count results
on both cqlsh and Spark. As expected, my query time has increased by 1.5x
(before, it was taking ~1.6 hours, but with consistency level ALL the same
query takes ~2.4 hours to complete.)
Does
Both cqlsh and the Spark Cassandra connector query at consistency level ONE
(LOCAL_ONE for the Spark connector) by default, so if there is any
inconsistency in your replicas this can result in inconsistent query results.
See http://cassandra.apache.org/doc/latest/tools/cqlsh.html and
https://github.com/datasta
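As an illustration, a hedged Scala sketch of overriding that default in the connector follows; my_ks/my_table and the contact point are made up. In cqlsh the equivalent is simply issuing CONSISTENCY ALL before the query.

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds cassandraTable / cassandraCount

// my_ks.my_table is a placeholder; the property is the connector's
// documented spark.cassandra.input.consistency.level.
val conf = new SparkConf()
  .setAppName("consistent-count")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.input.consistency.level", "ALL") // default: LOCAL_ONE

val sc = new SparkContext(conf)

// cassandraCount() pushes the counting down to Cassandra rather than
// pulling every row into Spark first.
val total = sc.cassandraTable("my_ks", "my_table").cassandraCount()
println(s"count at CL ALL: $total")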
The fact that cqlsh itself gives different results tells me that this has
nothing to do with Spark. Moreover, the Spark results are monotonically
increasing, which seems more consistent than cqlsh, so I believe
Spark can be taken out of the equation.
Now, while you are running these queries, is th
Which then act as a layer between Cassandra and your applications storing into
Cassandra (memory datagrid I think it is called)
Basically, think of it as a big cache
It is an in-memory thingi ☺
And then you can run some super fast queries
-Tobias
From: DuyHai Doan
Date: Thursday, 8 June 2017 at 15:42
To: Tobias Eriksson
Cc: 한 승호 , "user@cassandra.apache.org" <user@cassandra.apache.org>
Interesting
Tobias, when you said "Instead we transferred the data to Apache Kudu", did
you transfer all Cassandra data into Kudu with a single migration and
then tap into Kudu for aggregation, or did you run a data import every
day/week/month from Cassandra into Kudu ?
From my point of view,
Hi
Something to consider before moving to Apache Spark and Cassandra
I have a background where we have tons of data in Cassandra, and we wanted to
use Apache Spark to run various jobs
We loved what we could do with Spark, BUT….
We realized soon that we wanted to run multiple jobs in parallel
Some
If you use containers like Docker, Plan A can work provided you do the
resource and capacity planning. I tend to think that Plan B is more
standard and easier, although you can wait to hear from others for a second
opinion.
Caution: data locality will make sense if the disk throughput is
significant
Disclaimer: I have worked for DataStax.
Cassandra is fairly good for log analytics and has been used in many places
for that (
https://www.usenix.org/conference/lisa14/conference-program/presentation/josephsen
). Of course, requirements vary from place to place, but it has been a good
fit. Spark and
Though DSE Cassandra comes with Hadoop integration, this is clearly a use case
for Hadoop.
Any reason why Cassandra is your first choice?
> On 23 Jul 2015, at 6:12 a.m., Pierre Devops wrote:
>
> Cassandra is not very good at massive read/bulk read if you need to retrieve
> and compute a la
Cassandra is not very good at massive read/bulk read if you need to
retrieve and compute a large amount of data on multiple machines using
something like spark or hadoop (or you'll need to hack and process the
sstable directly, something which is not "natively" supported, you'll have
to hack your w
Hi Oleg,
Connectors don't deal with HA; they rely on Spark for that, so neither
the Datastax connector, Stratio Deep, nor Calliope has anything to do
with Spark's HA. You should have previously configured Spark so that it
meets your high-availability needs. Furthermore, as I mentioned in a
pr
Thank you Rohit.
I sent the email to you.
Thanks
Oleg.
On Thu, Sep 11, 2014 at 10:51 PM, Rohit Rai wrote:
> Hi Oleg,
>
> I am the creator of Calliope. Calliope doesn't force any deployment
> model... that means you can run it with Mesos or Hadoop or Standalone. To
> be fair, I think the
Hi Oleg,
I am the creator of Calliope. Calliope doesn't force any deployment
model... that means you can run it with Mesos or Hadoop or Standalone. To
be fair, I think the other libs mentioned here should work too.
The Spark cluster HA can be provided using ZooKeeper even in the standalone
d
Ok.
DataStax and Stratio require Mesos, Hadoop YARN, or another third party to
get Spark cluster HA.
What about Calliope?
Is it sufficient to have Cassandra + Calliope + Spark to be able to process
aggregations?
In my case we have quite a lot of data, so doing aggregation only in memory
- impossi
2. "still uses thrift for minor stuff" --> I think that the only call using
thrift is "describe_ring", to get an estimate of the ratio of partition keys
within the token range
3. Stratio has a talk today at the SF Summit, presenting Stratio META. For
the folks not attending the conference, video should
Adding to conversation...
there are 3 great open source options available
1. Calliope http://tuplejump.github.io/calliope/
This is the first library that was out some time late last year (as I
can recall) and I have been using it for a while; mostly very stable,
uses Hadoop I/O in Cassandra
Typo. I am talking about spark only.
Thanks
Oleg.
On Thursday, September 11, 2014, DuyHai Doan wrote:
> Stupid question: do you really need both Storm & Spark ? Can't you
> implement the Storm jobs in Spark ? It will be operationally simpler to
> have less moving parts. I'm not saying that Stor
Hi Oleg.
Spark can be configured to have high availability without the need for
Mesos (
https://spark.apache.org/docs/latest/spark-standalone.html#high-availability),
for instance using ZooKeeper and standby masters. If I'm not wrong, Storm
doesn't need Mesos to work, so I imagine you use it to mak
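A minimal sketch of the application side of that standby-master setup, assuming made-up host names (the masters themselves are configured for ZooKeeper recovery as described in the linked docs):

import org.apache.spark.{SparkConf, SparkContext}

// The masters are started with spark.deploy.recoveryMode=ZOOKEEPER
// (via SPARK_DAEMON_JAVA_OPTS); the application only needs to list
// every master so it can fail over between them.
val conf = new SparkConf()
  .setAppName("ha-app")
  .setMaster("spark://master1:7077,master2:7077")

val sc = new SparkContext(conf)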
Good to know. Thanks, DuyHai! I'll take a look (but most probably tomorrow
;-))
Paco
2014-09-10 20:15 GMT+02:00 DuyHai Doan :
> Source code check for the Java version:
> https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector-java/src/main/java/com/datastax/sp
Stupid question: do you really need both Storm & Spark ? Can't you
implement the Storm jobs in Spark ? It will be operationally simpler to
have less moving parts. I'm not saying that Storm is not the right fit, it
may be totally suitable for some usages.
But if you want to avoid the SPOF thing an
Interesting things actually:
We have Hadoop in our ecosystem. It has a single point of failure, and I
am not sure about inter-data-center replication.
The plan is to use Cassandra - no single point of failure, and there is data
center replication. For aggregation/transformation we would use Spark. BUT
Storm r
Source code check for the Java version:
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector-java/src/main/java/com/datastax/spark/connector/RDDJavaFunctions.java#L26
It's using the RDDFunctions from the Scala code, so yes, it's the Java driver again.
On Wed, Sep 10
"As far as I know, the Datastax connector uses thrift to connect Spark with
Cassandra although thrift is already deprecated, could someone confirm this
point?"
--> the Scala connector is using the latest Java driver, so no, there is no
Thrift there.
For the Java version, I'm not sure, have not lo
"can you share please where can I read about mesos integration for HA and
StandAlone mode execution?" --> You can find all the info in the Spark
documentation, read this:
http://spark.apache.org/docs/latest/cluster-overview.html
Basically, you have 3 choices:
1) Standalone mode: get your hands
Thanks for the info.
can you share please where can I read about mesos integration for HA and
StandAlone mode execution?
Thanks
Oleg.
On Thu, Sep 11, 2014 at 12:13 AM, DuyHai Doan wrote:
> Hello Oleg
>
> Question 2: yes. The official spark cassandra connector can be found here:
> https://git
Great stuff Paco.
Thanks for sharing.
Couple of questions:
Is additional installation required to be HA, like Apache Mesos?
Are you supporting PySpark?
How stable / ready for production is it?
Thanks
Oleg.
On Thu, Sep 11, 2014 at 12:01 AM, Francisco Madrid-Salvador <
pmad...@stratio.com> wrote:
>
Hello Oleg
Question 2: yes. The official spark cassandra connector can be found here:
https://github.com/datastax/spark-cassandra-connector
There are docs in the doc/ folder. You can read & write directly from/to
Cassandra without EVER using HDFS. You still need a resource manager like
Apache Meso
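A minimal sketch of that HDFS-free read/write path with the connector, assuming invented tables my_ks.words and my_ks.word_counts:

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Invented tables: my_ks.words (word text, ...) and
// my_ks.word_counts (word text PRIMARY KEY, count bigint).
val sc = new SparkContext(new SparkConf()
  .setAppName("no-hdfs")
  .set("spark.cassandra.connection.host", "127.0.0.1"))

val counts = sc.cassandraTable("my_ks", "words") // read straight from Cassandra
  .map(row => (row.getString("word"), 1L))
  .reduceByKey(_ + _)

counts.saveToCassandra("my_ks", "word_counts",   // write back, no HDFS involved
  SomeColumns("word", "count"))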