You can find out which endpoints Cassandra will store your key on with
getNaturalEndpoints, but you can't specify the endpoints you want to use with
this API.
The partitioner decides which node a key will go to. With OPP, you may be able
to predict which key range will be stored on a node, so you can c
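As an illustration of the mechanism being described, here is a toy sketch of how a partitioner plus a replication factor determines a key's replica endpoints (roughly what getNaturalEndpoints reports). This is not Cassandra's actual implementation; the node addresses and tokens are invented:

```python
import hashlib
from bisect import bisect_right

# Hypothetical token ring: (token, node address) pairs, sorted by token.
RING = sorted([(0, "10.0.0.1"), (2**125, "10.0.0.2"), (2**126, "10.0.0.3")])

def token_for(key):
    # RandomPartitioner-style: hash the key onto the token space.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def natural_endpoints(key, rf=2):
    tokens = [t for t, _ in RING]
    i = bisect_right(tokens, token_for(key)) % len(RING)
    # Walk the ring clockwise to collect rf distinct replicas.
    return [RING[(i + j) % len(RING)][1] for j in range(rf)]

print(natural_endpoints("user:42"))
```

The point of the sketch is that placement falls out of the hash and the ring layout; there is no hook for the client to pick a specific endpoint.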
0.8 under load may turn out to be more stable and well behaving than any
release so far
Been doing a few test runs stuffing more than 1 billion records into a 12
node cluster and things look better than ever.
The VM's stable and nice at 11GB. No data corruption, dead nodes, full GCs or
any of the ot
On Sun, Jun 5, 2011 at 11:26 PM, Maki Watanabe wrote:
> getNaturalEndpoints tells you which nodes a given key will be stored on,
> but we can't force cassandra to store a given key on specific nodes.
>
> maki
I'm confused. Didn't you mention previously that I can use
OrderPreservingPartitioner to sto
Index updates require read-before-write (to find out what the prior
version was, if any, and update the index accordingly). This is
random i/o.
Index creation on the other hand is a lot of sequential i/o, hence
more efficient.
So, the classic bulk load advice to ingest data prior to creating
ind
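The read-before-write cost described above can be sketched with a toy in-memory model (this is not Cassandra internals, just an illustration of why index maintenance on writes needs a read):

```python
# Toy model: maintaining a secondary index on writes requires reading the
# prior value so the stale index entry can be removed -- that read is the
# random I/O the message refers to.

data = {}   # primary store: row_key -> indexed column value
index = {}  # secondary index: value -> set of row keys

def write(row_key, value):
    old = data.get(row_key)          # read-before-write
    if old is not None:
        index[old].discard(row_key)  # drop the stale index entry
    data[row_key] = value
    index.setdefault(value, set()).add(row_key)

write("r1", "blue")
write("r1", "red")   # must read "blue" back to clean it up
print(index)         # {'blue': set(), 'red': {'r1'}}
```

Bulk-loading first and building the index afterwards avoids doing that lookup once per row, replacing it with one sequential scan.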
getNaturalEndpoints tells you which nodes a given key will be stored on,
but we can't force cassandra to store a given key on specific nodes.
maki
2011/6/6 mcasandra :
>
> Khanh Nguyen wrote:
>>
>> Is there a way to tell where a piece of data is stored in a cluster?
>> For example, can I tell if Last
It may not be what you want, but please read about NetworkTopologyStrategy and
DC_QUORUM.
http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
You can configure Cassandra to be "data center aware". Your reads and writes will
be resolved in the local DC, but will be replica
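As a sketch of the kind of data-center-aware setup being referred to, a keyspace can be created with NetworkTopologyStrategy via cassandra-cli (the keyspace name, DC names, and replica counts here are made-up examples, and the exact syntax may vary between versions):

```
create keyspace MyApp
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = [{DC1:3, DC2:3}];
```

With a layout like this, each data center holds a full replica set, so a DC-local quorum can be satisfied without crossing the WAN.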
Fair enough.
I do have to keep reminding myself that a REST interface requires text.
And it does make more sense, at least, when coming from a human as
opposed to when you make a computer spend cycles converting binary to
text just so another computer can spend cycles turning it back again.
On Su
Khanh Nguyen wrote:
>
> Is there a way to tell where a piece of data is stored in a cluster?
> For example, can I tell if LastNameColumn['A'] is stored at node 1 in
> the ring?
>
I have not used it, but you can see getNaturalEndpoints in JMX. It will tell
you which nodes are responsible for a gi
You can create columns without values.
Are you talking about reading them back through the API ?
I would suggest looking at your data model to see if there is a better way to
support your read patterns.
Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://
From what I've seen of CQL there is no comparison between the potential
complexity of a CQL statement and that of a SQL statement. IMHO CQL is more or
less a human readable form of the current API, it does not add features. SQL
statements are arbitrarily complex and may generate many possible qu
Oops, I misread "150 GB" in one of your earlier emails as "150 MB", so forget
what I said before. You have loads of free space :)
How many files do you have in your data directory? If it's 1, then that log
message was a small bug that has been fixed.
Cheers
-
Aaron Morton
Freel
I did an insertion test with and without secondary indexes, and found that:
Without secondary index: ~10864 rows inserted per second
With secondary index on one column (BytesType): ~1515 rows inserted per
second
Is this normal? Why would a secondary index have so much effect?
I noticed that if I bu
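For a quick sense of scale, the slowdown implied by the two throughput figures above works out to roughly 7x:

```python
# Arithmetic on the numbers reported above.
without_index = 10864  # rows/sec, no secondary index
with_index = 1515      # rows/sec, one indexed column
slowdown = without_index / with_index
print(f"indexed inserts are ~{slowdown:.1f}x slower")  # ~7.2x
```

That magnitude is consistent with each insert turning into a read-before-write plus an extra index write, rather than a single sequential append.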
It would definitely be useful to be able to fetch column (or super column)
names WITHOUT their values. If the values are pretty big, or if there are a
lot of columns, that would generate traffic that isn't necessarily needed (if in
the end you are just interested in some of the columns).
Moreover it doesn't seem
So I can have one PagedIndex CF that holds a row for each data file I am
processing.
The columns for that row (in my example) would have X columns and I can make
those columns' values be 100 strings that represent keys in another PagedData
CF
This other PagedData CF for each row would have 10,000
If you need to parallelize (and scale) you need to distribute across
multiple rows. One Big Row means all your 100 workers are hammering
the same 3 (for instance) replicas at the same time.
On Sun, Jun 5, 2011 at 1:43 PM, Joseph Stein wrote:
> What is the best practices here to page and slice col
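The bucketing suggested above, splitting one huge row into many smaller rows, could be keyed along these lines (the `"<file>:<bucket>"` key scheme and the bucket size are made-up conventions for illustration, not an established pattern from the thread):

```python
# Instead of one row with 1,000,000 columns, bucket columns across many
# rows so 100 workers spread their reads over all replicas in the cluster.

BUCKET_SIZE = 10_000

def bucket_key(file_id, column_index):
    # Row key for the bucket holding this column.
    return f"{file_id}:{column_index // BUCKET_SIZE}"

def worker_row(file_id, worker_id):
    # Worker n reads exactly one bucket row, not a slice of one big row.
    return f"{file_id}:{worker_id}"

print(bucket_key("datafile1", 123_456))  # datafile1:12
```

Each worker then issues an independent row read, so the load fans out across the ring instead of hammering the few replicas that own the single big row.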
You're going to need to get a lot more specific.
On Sun, Jun 5, 2011 at 12:12 PM, Kevin wrote:
> Jonathan, I've upgraded to 0.8.0 and the problem got worse. Now, I can't
> delete any rows from the CLI, regardless of the type they're stored as.
>
>
>
> -Original Message-
> From: Jonathan E
On Sun, Jun 5, 2011 at 2:17 PM, mcasandra wrote:
> Please give more detailed info about what exactly you are worried about or
> trying to solve.
In general, we are trying to devise a partitioning and replication
scheme that takes into account social relations between data.
> Please take a step b
Great. Thank you, Eric.
-k
On Sun, Jun 5, 2011 at 2:13 PM, Eric tamme wrote:
> On Sun, Jun 5, 2011 at 12:18 PM, Khanh Nguyen
> wrote:
>> Hi Maki and Adrian,
>>
>> Thank you very much for the promptness. It's weekend after all :).
>>
>> I realized I forgot a part of my question until Adrian men
What is the best practices here to page and slice columns from a row.
So lets say I have 1,000,000 columns in a row
I read the row but want to have 1 thread read columns 0 - 9,999, second
thread (actor in my case) 10,000 - 19,999 ... and so on so I can have 100
workers processing 10,000 columns for
Please give more detailed info about what exactly you are worried about or
trying to solve.
Please take a step back and look at cassandra's architecture again and what
it's trying to solve. It's a distributed database so if you do what you are
describing there is a potential of getting hotspots. W
On Sun, Jun 5, 2011 at 12:18 PM, Khanh Nguyen wrote:
> Hi Maki and Adrian,
>
> Thank you very much for the promptness. It's weekend after all :).
>
> I realized I forgot a part of my question until Adrian mentioned the
> replication factor. Is it also possible to set where the replicas are
> store
Jonathan, I've upgraded to 0.8.0 and the problem got worse. Now, I can't
delete any rows from the CLI, regardless of the type they're stored as.
-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Sunday, June 05, 2011 10:56 AM
To: user@cassandra.apache.org
Subject:
Hi Maki and Adrian,
Thank you very much for the promptness. It's weekend after all :).
I realized I forgot a part of my question until Adrian mentioned the
replication factor. Is it also possible to set where the replicas are
stored as well? Thanks.
This is a research experiment we're exploring
On Sun, 2011-06-05 at 00:51 -0400, Jeffrey Kesselman wrote:
> Is CQL really the path for the future for Cassandra?
CQL is no more or less "official" than the Thrift interface, and TTBMK,
there is no secret cabal that met to decide it would be The Way. People
will use what works best for them, and
On Sun, Jun 5, 2011 at 9:38 AM, Timo Nentwig wrote:
> Hmm, worked around that by setting -Dcassandra.config (hmm, the client needs
> the server's config...?).
Yes, this is fixed for 0.8.1.
> Not very verbose :-\ May have something to do with my l/p being just "/" for
> AllowAll.
Correct, that's
If you're not using 0.8.0 the cli deals poorly with non-string row keys.
On Sat, Jun 4, 2011 at 7:48 PM, Kevin wrote:
> Currently I'm using a client (Pelops) to insert UUIDs (both lexical and
> time) in to Cassandra. I haven't yet implemented a facility to remove them
> with Pelops; i'm testing a
You may be swapping.
http://spyced.blogspot.com/2010/01/linux-performance-basics.html
explains how to check this as well as how to see what threads are busy
in the Java process.
On Sat, Jun 4, 2011 at 5:34 PM, Philippe wrote:
> Hello,
> I am evaluating using cassandra and I'm running into some s
On 6/5/11 16:26, Timo Nentwig wrote:
$ CLASSPATH=~/sqlshell/lib/ ~/sqlshell/bin/sqlshell
org.apache.cassandra.cql.jdbc.CassandraDriver,jdbc:cassandra:foo/bar@localhost:9160/ks
2011-06-05 16:21:54,452 INFO [main] org.apache.cassandra.cql.jdbc.Connection -
Connected to localhost:9160
2011-06-05
$ CLASSPATH=~/sqlshell/lib/ ~/sqlshell/bin/sqlshell
org.apache.cassandra.cql.jdbc.CassandraDriver,jdbc:cassandra:foo/bar@localhost:9160/ks
2011-06-05 16:21:54,452 INFO [main] org.apache.cassandra.cql.jdbc.Connection -
Connected to localhost:9160
2011-06-05 16:21:54,517 ERROR [main]
org.apache
Perfect thanks!
On Sun, Jun 5, 2011 at 4:43 AM, Victor Kabdebon
wrote:
> Again I don't really know the specifics of Solandra but in Solr (so
> Solandra being a cousin of Solr it should be true too) you have XML fields
> like this :
>
> Just turn indexed to false and it's not going to be indexed.
I found a patch for the php extension here:
https://issues.apache.org/jira/browse/THRIFT-1067
… this seemed to fix the issue. Thank you Jonathan and Aaron for taking time
to provide me with some help!
Regarding the compaction I would still love to hear your feedback on how to
configure Cassandra
I tracked down the timestamp submission and everything was fine within the
PHP libraries.
The thrift php extension however seems to have an overflow, because it was
setting timestamps with negative values ( -1242277493 ). I
disabled the php extension and as a result I now got correct
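A plausible way such negative timestamps arise (this is a guess at the bug, not an analysis of the thrift extension's source) is a 64-bit microsecond timestamp being truncated to a signed 32-bit integer:

```python
# A 2011-era microsecond timestamp is ~1.3e15, far beyond 32-bit range,
# so truncating it to a signed 32-bit int wraps and can go negative.
ts_micros = 1307303814000000  # ~June 2011 in microseconds (example value)

def to_int32(n):
    # Emulate C-style truncation to a signed 32-bit integer.
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

print(to_int32(ts_micros))  # wrapped value, no longer the real timestamp
```

Any value whose low 32 bits happen to have the top bit set comes out negative, which would match the -1242277493 seen above.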
Thanks for the feedback Aaron!
The schema of the CF is default, I just defined the name and the rest is
default, have a look:
Keyspace: TestKS
Read Count: 65
Read Latency: 657.8047076923076 ms.
Write Count: 10756
Write Latency: 0.03237039791744143 ms.
Pending Tasks: 0
Column Family: CFTest
SSTa
Again I don't really know the specifics of Solandra, but in Solr (and since
Solandra is a cousin of Solr, it should be true there too) you have XML fields
like this:
Just turn indexed to false and it's not going to be indexed...
Thrift won't affect Solandra at all.
2011/6/4 Jean-Nicolas Boulay Desjardins