Typo. I am talking about Spark only.
Thanks
Oleg.
On Thursday, September 11, 2014, DuyHai Doan wrote:
> Stupid question: do you really need both Storm & Spark? Can't you
> implement the Storm jobs in Spark? It will be operationally simpler to
> have fewer moving parts. I'm not saying that Storm is not the right fit, it
> may be totally suitable for some usages.
Could you post the results of jstack on the process somewhere?
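(For example, something along these lines; the PID is a placeholder and
jstack must be run as the same user as the Cassandra process:)

    jstack <cassandra-pid> > /tmp/cassandra-threads.txt

Then put the resulting file on pastebin or similar.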
On Thu, Sep 11, 2014 at 7:07 AM, Robert Coli wrote:
> On Wed, Sep 10, 2014 at 1:53 PM, Eduardo Cusa <
> eduardo.c...@usmediaconsulting.com> wrote:
>
>> No, it is still running the Mutation Stage.
>>
>
> If you're sure that it is not receiving Hinted Handoff, then the only
> mutations in question can be from the replay of the commit log.
On Wed, Sep 10, 2014 at 1:53 PM, Eduardo Cusa <
eduardo.c...@usmediaconsulting.com> wrote:
> No, it is still running the Mutation Stage.
>
If you're sure that it is not receiving Hinted Handoff, then the only
mutations in question can be from the replay of the commit log.
The commit log should take
With the official release of 2.1, I highly recommend using the new stress
tool bundled with it - it is improved in many ways over the tool in 2.0,
and is compatible with older clusters.
It supports the same simple mode of operation as the old stress, with a
better command line interface and more acc
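(As a rough illustration of the new command style - node address and
counts are placeholders; see cassandra-stress help for the full options:)

    cassandra-stress write n=1000000 -rate threads=50 -node 10.0.0.1
    cassandra-stress read n=1000000 -node 10.0.0.1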
No, it is still running the Mutation Stage.
On Wed, Sep 10, 2014 at 5:38 PM, Robert Coli wrote:
> On Wed, Sep 10, 2014 at 12:16 PM, Eduardo Cusa <
> eduardo.c...@usmediaconsulting.com> wrote:
>
>> Yes, I restarted the node because the write latency was 2500 ms, when
>> usually it is 5 ms.
>>
>
> And did that help?
On Wed, Sep 10, 2014 at 12:16 PM, Eduardo Cusa <
eduardo.c...@usmediaconsulting.com> wrote:
> Yes, I restarted the node because the write latency was 2500 ms, when
> usually it is 5 ms.
>
And did that help?
=Rob
agreed
On Sep 10, 2014, at 3:27 PM, olek.stas...@gmail.com wrote:
> You're right, there is no data in a tombstone, only a column name. So
> there is only a small disk-size overhead after a delete. But I must
> agree with the post above: it's pointless to delete prior to inserting.
> Moreover, it needs one more op to compute the resulting row.
You're right, there is no data in a tombstone, only a column name. So
there is only a small disk-size overhead after a delete. But I must
agree with the post above: it's pointless to delete prior to inserting.
Moreover, it needs one more op to compute the resulting row.
cheers,
Olek
2014-09-10 22:18 GMT+02:00
A delete inserts a tombstone, which is likely smaller than the original record
(though it still (currently) has the overhead of the full key/column name).
The data for the insert after a delete would be identical to the data if you
just inserted/updated.
No real benefit I can think of for doing the delete.
I have a datacenter with a single node, and I want to start using vnodes. I
have followed the instructions (
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html),
and set up a new node in a new datacenter (auto_bootstrap=false, seed=node
in old dc,
I think so.
This is how I see it:
at the very beginning you have such a line in the data file:
{key: [col_name, col_value, date_of_last_change]} // something similar,
I don't remember exactly now
after the delete you're adding the line:
{key: [col_name, last_col_value, date_of_delete, 'd']} // this 'd'
indicates that the field is deleted
Yes, I restarted the node because the write latency was 2500 ms, when
usually it is 5 ms.
On Wed, Sep 10, 2014 at 4:05 PM, Robert Coli wrote:
> On Wed, Sep 10, 2014 at 12:03 PM, Eduardo Cusa <
> eduardo.c...@usmediaconsulting.com> wrote:
>
>> Yes, the tpstats is printing. OpCenter shows the node down.
Yes, the tpstats is printing. OpCenter shows the node down.
OpCenter also shows the following status:
Status: Unresponsive - Starting
Gossip: Down
Thrift: Down
Native Transport: Down
Pending Tasks: 0
On Wed, Sep 10, 2014 at 3:54 PM, Robert Coli wrote:
> On Wed, Sep 10, 2014 at 11:38 AM, Eduardo Cusa <
> eduardo.c...@usmediaconsulting.com> wrote:
Hi Oleg.
Spark can be configured to have high availability without the need for
Mesos (
https://spark.apache.org/docs/latest/spark-standalone.html#high-availability),
for instance using ZooKeeper and standby masters. If I'm not wrong, Storm
doesn't need Mesos to work, so I imagine you use it to make it highly
available.
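(For reference, a minimal sketch of the ZooKeeper-based recovery setup
described on that page; hostnames and ports are placeholders. In
conf/spark-env.sh on each master:

    SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181"

Then start a master on two or more machines and point applications at
spark://master1:7077,master2:7077; a standby takes over if the active
master dies.)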
On Wed, Sep 10, 2014 at 12:03 PM, Eduardo Cusa <
eduardo.c...@usmediaconsulting.com> wrote:
> Yes, the tpstats is printing. OpCenter shows the node down.
>
Have you recently restarted it or anything?
If not, try doing so?
=Rob
On Wed, Sep 10, 2014 at 11:38 AM, Eduardo Cusa <
eduardo.c...@usmediaconsulting.com> wrote:
> Actually the node is *down*.
>
The node can't be that "down" if it's printing tpstats...
https://issues.apache.org/jira/browse/CASSANDRA-4162
?
=Rob
Good to know. Thanks, DuyHai! I'll take a look (but most probably tomorrow
;-))
Paco
2014-09-10 20:15 GMT+02:00 DuyHai Doan :
> Source code check for the Java version:
> https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector-java/src/main/java/com/datastax/spark/connector/RDDJavaFunctions.java#L26
On 10.09.14 02:09, Robert Coli wrote:
On Tue, Sep 9, 2014 at 2:36 PM, Eugene Voytitsky <viy@gmail.com> wrote:
As I understand it, atomic batches for counters can't work correctly
(atomically) prior to 2.1 because of the counters implementation.
[Link:
http://www.datastax.com/de
Hello, I have a node that has been in MutationStage for the last 5 hours.
Actually the node is *down*.
The pending tasks go from 776 to 110 and then to 964.
Is there some way to finish this stage?
The last heavy write workload was 5 days ago.
Pool Name                    Active   Pending      Completed
Stupid question: do you really need both Storm & Spark? Can't you
implement the Storm jobs in Spark? It will be operationally simpler to
have fewer moving parts. I'm not saying that Storm is not the right fit, it
may be totally suitable for some usages.
But if you want to avoid the SPOF thing an
Interesting things actually:
We have Hadoop in our ecosystem. It has a single point of failure and I
am not sure about inter-data-center replication.
The plan is to use Cassandra - no single point of failure, and there is
data center replication.
For aggregation/transformation we would use Spark. BUT Storm r
Source code check for the Java version:
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector-java/src/main/java/com/datastax/spark/connector/RDDJavaFunctions.java#L26
It's using the RDDFunctions from the Scala code, so yes, it's the Java driver again.
On Wed, Sep 10
"As far as I know, the Datastax connector uses thrift to connect Spark with
Cassandra although thrift is already deprecated, could someone confirm this
point?"
--> the Scala connector is using the latest Java driver, so no, there is
no Thrift there.
For the Java version, I'm not sure, I have not looked at it yet.
Hi Oleg,
Stratio Deep is just a library you must include in your Spark deployment,
so it doesn't guarantee any high availability at all. To achieve HA you
must use Mesos or any other 3rd party resource manager.
Stratio doesn't currently support PySpark, just Scala and Java. Perhaps
in the future.
Consider that I have configured 1 MB of key cache (consider that it can hold 13000
keys).
Then I wrote some records to a column family (say 2).
Then I read them for the first time (all keys sequentially, in the same order used to write),
and the keys started to be stored in the key cache.
When the read reached @
"can you share please where can I read about mesos integration for HA and
StandAlone mode execution?" --> You can find all the info in the Spark
documentation, read this:
http://spark.apache.org/docs/latest/cluster-overview.html
Basically, you have 3 choices:
1) Stand alone mode: get your hands
Would the factor before compaction always be 2?
On Wed, Sep 10, 2014 at 6:38 PM, olek.stas...@gmail.com <
olek.stas...@gmail.com> wrote:
> IMHO, delete then insert will take two times more disk space than a
> single insert. But after compaction the difference will disappear.
> This was true in versions prior to 2.0, but it should still work this
> way.
Thanks for the info.
Can you please share where I can read about Mesos integration for HA and
standalone mode execution?
Thanks
Oleg.
On Thu, Sep 11, 2014 at 12:13 AM, DuyHai Doan wrote:
> Hello Oleg
>
> Question 2: yes. The official spark cassandra connector can be found here:
> https://github.com/datastax/spark-cassandra-connector
Great stuff Paco.
Thanks for sharing.
A couple of questions:
Is additional installation, like Apache Mesos, required to be HA?
Are you supporting PySpark?
How stable / ready for production is it?
Thanks
Oleg.
On Thu, Sep 11, 2014 at 12:01 AM, Francisco Madrid-Salvador <
pmad...@stratio.com> wrote:
>
IMHO, delete then insert will take two times more disk space than a
single insert. But after compaction the difference will disappear.
This was true in versions prior to 2.0, but it should still work this
way. But maybe someone will correct me if I'm wrong.
Cheers,
Olek
2014-09-10 18:30 GMT+02:00 Mi
One insert would be much better, e.g. for performance and network latency.
I wanted to know if there is a significant difference (apart from the
additional commit log entry) in the storage used between these 2 use cases.
Multi-DC is available in every version of Cassandra.
On Wed, Sep 10, 2014 at 9:21 AM, Oleg Ruchovets wrote:
> Thank you very much for the links.
> Just to be sure: is this capability available for the COMMUNITY EDITION?
>
> Thanks
> Oleg.
>
> On Wed, Sep 10, 2014 at 11:49 PM, Alain RODRIGUEZ
> wrote:
Thank you very much for the links.
Just to be sure: is this capability available for the COMMUNITY EDITION?
Thanks
Oleg.
On Wed, Sep 10, 2014 at 11:49 PM, Alain RODRIGUEZ
wrote:
> Hi Oleg,
>
> Yes, cross-DC replication has been available for a long time already,
> so it is assumed to be stable.
Hello Oleg
Question 2: yes. The official spark cassandra connector can be found here:
https://github.com/datastax/spark-cassandra-connector
There are docs in the doc/ folder. You can read & write directly from/to
Cassandra without EVER using HDFS. You still need a resource manager like
Apache Mesos.
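(For illustration, a minimal sketch against the connector's Scala API;
keyspace, table, and column names are placeholders, and sc is assumed to
be a SparkContext configured with spark.cassandra.connection.host:)

    import com.datastax.spark.connector._

    // Read a Cassandra table as an RDD of CassandraRow
    val rdd = sc.cassandraTable("my_ks", "my_table")
    println(rdd.count)

    // Write an RDD of tuples back to another table
    sc.parallelize(Seq((1, "a"), (2, "b")))
      .saveToCassandra("my_ks", "other_table", SomeColumns("id", "val"))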
My understanding is that an update is the same as an insert. So I would
think delete+insert is a bad idea. Also, insert+delete would put 2 entries
in the commit log.
On Sep 10, 2014 9:49 AM, "Michal Budzyn" wrote:
> Is there any serious difference in the disk and memory storage used
> between upsert and delete + insert?
Hi Oleg,
If you want to use Cassandra+Spark without Hadoop, perhaps Stratio Deep
is your best choice (https://github.com/Stratio/stratio-deep). It's an
open-source Spark + Cassandra connector that doesn't make any use of
Hadoop or any Hadoop component.
http://docs.openstratio.org/deep/0.3.3/abou
Hi Oleg,
Yes, cross-DC replication has been available for a long time already,
so it is assumed to be stable.
As discussed in this thread, the Cassandra documentation is often outdated
or nonexistent; the alternative is the DataStax one.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/in
Is there any serious difference in the disk and memory storage used
between upsert and delete + insert?
E.g. 2 vs 2A + 2B.
PK ((key), version, c1)
1. INSERT INTO A (key, version, c1, val) VALUES (1, 1, 4711, 'X1')
...
2. INSERT INTO A (key, version, c1, val) VALUES (1, 1, 4711, 'X2')
Vs.
2A
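(The example is cut off here; presumably 2A + 2B would be the delete
followed by the re-insert, something like this hypothetical pair:)

    2A. DELETE FROM A WHERE key = 1 AND version = 1 AND c1 = 4711
    2B. INSERT INTO A (key, version, c1, val) VALUES (1, 1, 4711, 'X2')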
Hi,
I am trying to evaluate different options for Spark + Cassandra, and I have
a couple of questions.
My aim is to use Cassandra+Spark without Hadoop:
1) Is it possible to use only Cassandra as the input/output parameter for
PySpark?
2) In case I use Spark (Java, Scala), is it possible to use only
Cassandra?
Hi All.
Is the multi-datacenter replication capability available in the community
edition?
If yes, can someone share their experience: how stable is it, and where
can I read about best practices for it?
Thanks
Oleg.
Will try... Thank you
Rahul Neelakantan
> On Sep 10, 2014, at 12:01 AM, Rahul Menon wrote:
>
> I use jmxterm (http://wiki.cyclopsgroup.org/jmxterm/): attach it to your c*
> process and then use the org.apache.cassandra.db:HintedHandoffManager bean
> and run deleteHintsforEndpoint to drop hints.
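(A rough sketch of such a session; the jar name, JMX port, and endpoint
address are placeholders, and the exact bean and operation names should
be verified with jmxterm's beans/info commands:)

    $ java -jar jmxterm-uber.jar
    $> open localhost:7199
    $> bean org.apache.cassandra.db:type=HintedHandOffManager
    $> run deleteHintsForEndpoint 10.0.0.1
    $> close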