Re: insert performance

2012-02-23 Thread Philippe
Definitely multi-thread writes... probably with a little batching (10 or so).
That's how I get my peak throughput.
On 23 Feb 2012 at 04:48, "Deno Vichas"  wrote:

> all,
>
> would i be better off (i'm in java land) with spawning a bunch of
> threads that all add a single item to a mutator or a single thread that
> adds a bunch of items to a mutator?
>
>
> thanks,
> deno
>
>
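The "several writer threads, small batches" idea can be sketched in plain Java. This is a hypothetical illustration: the real mutator (Hector or raw Thrift) is stood in by a counter, and the thread/batch sizes are just the numbers from the discussion, not recommendations.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchedWriters {
    // Stand-in for a client mutator; a real Hector/Thrift mutator would go here.
    static final AtomicInteger written = new AtomicInteger();

    static void executeBatch(List<String> batch) {
        // In a real client this would send the whole batch in one mutate call.
        written.addAndGet(batch.size());
    }

    public static void main(String[] args) throws Exception {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 1000; i++) items.add("row-" + i);

        int batchSize = 10;                                      // "a little batching (10 or so)"
        ExecutorService pool = Executors.newFixedThreadPool(8);  // multiple writer threads
        for (int i = 0; i < items.size(); i += batchSize) {
            final List<String> batch =
                items.subList(i, Math.min(i + batchSize, items.size()));
            pool.submit(() -> executeBatch(batch));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("written=" + written.get());  // written=1000
    }
}
```

The point is that batching amortizes the per-request round trip while the thread pool keeps several requests in flight at once.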


Re: 1.0.2 - nodetool ring and info reports wrong load after compact

2012-02-23 Thread Bill Au
Thanks for the info.

Upgrade within the 1.0.x branch is simply a rolling restart, right?

Bill

On Thu, Feb 16, 2012 at 9:20 PM, Jonathan Ellis  wrote:

> CASSANDRA-3496, fixed in 1.0.4+
>
> On Thu, Feb 16, 2012 at 8:27 AM, Bill Au  wrote:
> > I am running 1.0.2 with the default tiered compaction.  After running a
> > "nodetool compact", I noticed that on about half of the machines in my
> > cluster, both "nodetool ring" and "nodetool info" report that the load is
> > actually higher than before when I expect it to be lower.  It is almost
> > twice as much as before.  I did a du command on the data directory and
> found
> > that the actual disk usage is only about half of what's being reported by
> > nodetool.  Since I am running 1.0.2, there are no compacted sstables
> waiting
> > to be removed.  I manually trigger a full GC in the JVM but that made no
> > difference.  When I restarted Cassandra, nodetool once again report the
> > correct load.
> >
> > Is this a known problem?
> >
> > Bill
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


EC2 Best Practices

2012-02-23 Thread Philip Shon
Are there any good resources for best practices when running Cassandra
within EC2? I'm particularly interested in the security issues, when the
servers communicating w/ Cassandra are outside of EC2.

Thanks,

-Phil


Re: Flume and Cassandra

2012-02-23 Thread Alain RODRIGUEZ
Thanks for all this information. The Twitter Kestrel-Storm-Cassandra solution
looks very powerful, scalable, and well documented. I'll try to use this
solution.

Alain

2012/2/23 Milind Parikh 

> Cool. www.countandra.org calls them cascaded counters, and it will also be
> based on Kafka.
>
> /***
> sent from my android...please pardon occasional typos as I respond @ the
> speed of thought
> /
>
> On Feb 22, 2012 7:22 PM, "Edward Capriolo"  wrote:
>
> I have been working on IronCount
> (https://github.com/edwardcapriolo/IronCount/) which is designed to do
> what you are talking about. Kafka takes care of the distributed
> producer/consumer message queues and IronCount sets up custom
> consumers to process those messages.
>
> It might be what you are looking for. It is not as fancy as
> s4/storm/flume but that is supposed to be the charm of it.
>
>
> On Wed, Feb 22, 2012 at 1:55 PM, aaron morton 
> wrote:
> > Maybe Storm is wha...
>
>


Re: Best suitable value for flush_largest_memtables_at

2012-02-23 Thread aaron morton
> flush_largest_memtables_at
It is designed as a safety valve; reducing it may help prevent an OOM, but it
won't get at the cause.

Assuming you cannot just allocate more memory to the JVM, and you are running 
the default settings in cassandra-env.sh (other than the changes mentioned), 
and you are on 1.X

I would start with the following in order…

* set a value for memtable_total_space_in_mb in cassandra.yaml
* reduce CF caches
* reduce in_memory_compaction_limit and/or concurrent_compactors
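For the yaml-level items, the change would look something like the following (values are illustrative only, not recommendations; setting names as in the 1.0 cassandra.yaml — note the yaml key for the compaction limit is in_memory_compaction_limit_in_mb):

```yaml
# cassandra.yaml -- illustrative values, tune to your heap
memtable_total_space_in_mb: 1024        # hard cap on total memtable memory
in_memory_compaction_limit_in_mb: 32    # larger rows are compacted on disk instead
concurrent_compactors: 1                # fewer simultaneous compactions
```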
 
Hope that helps. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/02/2012, at 4:21 PM, Roshan Pradeep wrote:

> Hi Experts
> 
> Under massive write load what would be the best value for Cassandra 
> flush_largest_memtables_at setting? Yesterday I got an OOM exception in one 
> of our production Cassandra node under heavy write load within 5 minute 
> duration. 
> 
> I change the above setting value to .45 and also change the 
> -XX:CMSInitiatingOccupancyFraction=45 in cassandra-env.sh file.
> 
> Previously flush_largest_memtables_at was .75, and memtables were flushed 
> to SSTables at around 40MB. But with the change (reducing it to 
> .45) the flushed SSTable size is 90MB.
> 
> Could someone please explain whether my configuration change will help under 
> heavy write load?
> 
> Thanks.



Re: How to delete a range of columns using first N components of CompositeType Column?

2012-02-23 Thread aaron morton
Unfortunately you cannot use column ranges for delete operations. 

So while what you want to do is something like...

Delete 'Jack:*:*'...'Jack:*:*' from Test where KEY = "friends";

You cannot do it. 

You need to read and then delete by name.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/02/2012, at 8:08 PM, Praveen Baratam wrote:

> More precisely,
> 
> Lets say we have a CF with the following spec.
> 
> create column family Test
> with comparator = 'CompositeType(UTF8Type,UTF8Type,UTF8Type)'
> and key_validation_class = 'UTF8Type'
> and default_validation_class = 'UTF8Type';
> 
> And I have columns such as:
> 
> Jack:Name:First - Jackson
> Jack:Name:Last -  Samuel
> Jack:Age - 50
> 
> Now, to delete all columns related to Jack, as far as I can tell I need 
> to use
> 
> Delete 'Jack:Name:First', 'Jack:Name:Last', 'Jack:Age' from Test where KEY = 
> "friends";
> 
> The problem is we do not usually know what meta-data is associated with a 
> user as it may include Timestamp based columns.
> 
> such as: Jack:1234567890:Location - Chicago
> 
> Can something like -
> 
> Delete 'Jack' from Test where KEY = "friends";
> 
> be done using the First N components of the CompositeType?
> 
> Or should we read first and then delete?
> 
> Thank You.
> 
> On Thu, Feb 23, 2012 at 4:47 AM, Praveen Baratam  
> wrote:
> I am using CompositeType columns, and it's very convenient to query for a range 
> of columns using the first N components. But how do I delete a range of 
> columns using the first N components of the CompositeType column?
> 
> In order to specify the exact column names to delete, I would have to read 
> first and then delete.
> 
> Is there a better way?
> 
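The read-then-delete pattern Aaron describes can be sketched as follows. This is a hypothetical illustration: a TreeMap of string names stands in for one row's sorted columns and for the client's slice query, and string comparison stands in for Cassandra's actual CompositeType byte comparison.

```java
import java.util.*;

public class ReadThenDelete {
    public static void main(String[] args) {
        // Model one row's columns as a sorted map keyed by composite name.
        // A real client would page through a slice query instead.
        TreeMap<String, String> row = new TreeMap<>();
        row.put("Jack:Age", "50");
        row.put("Jack:Name:First", "Jackson");
        row.put("Jack:Name:Last", "Samuel");
        row.put("Jill:Age", "40");

        // Step 1: read the range of names sharing the first component.
        // "Jack:" .. "Jack;" works here because ';' sorts just after ':'.
        SortedMap<String, String> slice = row.subMap("Jack:", "Jack;");
        List<String> names = new ArrayList<>(slice.keySet());

        // Step 2: delete each column by its full, now-known name.
        for (String name : names) row.remove(name);

        System.out.println(row.keySet());  // [Jill:Age]
    }
}
```

Two round trips (one range read, one batched delete) rather than one, which is exactly the cost of not having range deletes.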



Re: EC2 Best Practices

2012-02-23 Thread aaron morton
General EC2 setup…

http://www.datastax.com/docs/1.0/install/install_ami
http://wiki.apache.org/cassandra/CloudConfig

Cassandra with a VPN on EC2. From memory it talks about using the VPN within 
EC2. 
http://blog.vcider.com/2011/09/running-cassandra-on-a-virtual-network-in-ec2/

Clients need a single port (9160) to talk to the cluster.

Hope that helps. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/02/2012, at 3:46 AM, Philip Shon wrote:

> Are there any good resources for best practices when running Cassandra within 
> EC2? I'm particularly interested in the security issues, when the servers 
> communicating w/ Cassandra are outside of EC2.
> 
> Thanks,
> 
> -Phil



Re: GC performance in 1.0.x

2012-02-23 Thread Jonathan Ellis
30ms pauses are on the low side of normal for an 800MB young gen under
parnew.  We're not going to be able to get rid of those, although it
looks like you're seeing objects in the young gen die *very* quickly,
so cutting that to say 200MB might give you shorter (and more
frequent) pauses for the young gen.

If you can mail me the entire log file I can look to see if there were
any stop-the-world full collections, which is where you'd see the
multi-second pauses that we really want to avoid.
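In cassandra-env.sh terms, the suggested experiment is a one-line change (illustrative; 800M is the value from the configuration quoted below):

```shell
# cassandra-env.sh -- try a smaller young generation: objects still die young,
# but each ParNew collection scans less, giving shorter (more frequent) pauses
MAX_HEAP_SIZE="9G"
HEAP_NEWSIZE="200M"   # was 800M
```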

On Sat, Feb 18, 2012 at 12:17 PM, Edward Capriolo  wrote:
> MAX_HEAP_SIZE="9G"
> HEAP_NEWSIZE="800M"
>
> 2 socket quad core
> 44GB RAM
> Cassandra 1.0.7
>
> [edward@cdbla120 cassandra]$ free -g
>             total       used       free     shared    buffers     cached
> Mem:            43         34          8          0          0         25
> -/+ buffers/cache:          8         34
> Swap:            0          0          0
>
> # GC logging options -- uncomment to enable
> JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
> JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
> JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
> JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
> JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
> JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
> JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"
> JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"
> JVM_OPTS="$JVM_OPTS -verbose:gc"
> JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
>
> Three major column families, using compression on two out of three. I
> never really watched the GC logs much before since I was disk
> bound, but in this case I have all the data in main memory, so limiting
> pauses would be big for me.
>
> I see a good number of these:
>
> Total time for which application threads were stopped: 0.0301370 seconds
> Total time for which application threads were stopped: 0.0046390 seconds
>
> [edward@cdbla120 cassandra]$ tail -400 gc-1329578417.log
> Total Free Space: 637346254
> Max   Chunk Size: 138752484
> Number of Blocks: 482066
> Av.  Block  Size: 1322
> Tree      Height: 34
> Before GC:
> Statistics for BinaryTreeDictionary:
> 
> Total Free Space: 1829376
> Max   Chunk Size: 1829376
> Number of Blocks: 1
> Av.  Block  Size: 1829376
> Tree      Height: 1
> 10516.947: [ParNew
> Desired survivor size 41943040 bytes, new threshold 1 (max 1)
> - age   1:      48752 bytes,      48752 total
> : 7689K->5164K(737280K), 0.0173210 secs] 3594487K->3597704K(9355264K)After GC:
> Statistics for BinaryTreeDictionary:
> 
> Total Free Space: 636661736
> Max   Chunk Size: 138752484
> Number of Blocks: 481773
> Av.  Block  Size: 1321
> Tree      Height: 34
> After GC:
> Statistics for BinaryTreeDictionary:
> 
> Total Free Space: 1829376
> Max   Chunk Size: 1829376
> Number of Blocks: 1
> Av.  Block  Size: 1829376
> Tree      Height: 1
> , 0.0204840 secs] [Times: user=0.19 sys=0.00, real=0.02 secs]
> Heap after GC invocations=3856 (full 4):
>  par new generation   total 737280K, used 5164K [0x0005bae0,
> 0x0005ece0, 0x0005ece0)
>  eden space 655360K,   0% used [0x0005bae0,
> 0x0005bae0, 0x0005e2e0)
>  from space 81920K,   6% used [0x0005e2e0,
> 0x0005e330b030, 0x0005e7e0)
>  to   space 81920K,   0% used [0x0005e7e0,
> 0x0005e7e0, 0x0005ece0)
>  concurrent mark-sweep generation total 8617984K, used 3592540K
> [0x0005ece0, 0x0007fae0, 0x0007fae0)
>  concurrent-mark-sweep perm gen total 35796K, used 21382K
> [0x0007fae0, 0x0007fd0f5000, 0x0008)
> }
> Total time for which application threads were stopped: 0.0251700 seconds
> Total time for which application threads were stopped: 0.0046210 seconds
> {Heap before GC invocations=3856 (full 4):
>  par new generation   total 737280K, used 660839K [0x0005bae0,
> 0x0005ece0, 0x0005ece0)
>  eden space 655360K, 100% used [0x0005bae0,
> 0x0005e2e0, 0x0005e2e0)
>  from space 81920K,   6% used [0x0005e2e0,
> 0x0005e3359df0, 0x0005e7e0)
>  to   space 81920K,   0% used [0x0005e7e0,
> 0x0005e7e0, 0x0005ece0)
>  concurrent mark-sweep generation total 8617984K, used 3592540K
> [0x0005ece0, 0x0007fae0, 0x0007fae0)
>  concurrent-mark-sweep perm gen total 35796K, used 21382K
> [0x0007fae0, 0x0007fd0f5000, 0x0008)
> 2012-02-18T13:15:35.980-0500: 10518.844: [GC Before GC:
> Statistics for BinaryTreeDictionary:
> 
> Total Free Space: 636661736
> Max   Chunk Size: 138752484
> Number of Blocks: 481773
> Av.  Block  Size: 1321
> Tree      Height: 34
> Before GC:
> Statistics for BinaryTreeDictionary:
> 
> Total Free Space: 1829376
> Max   Chunk Size: 1829376
> Number 

Re: nodetool ring runs very slow

2012-02-23 Thread Jonathan Ellis
The only time I've seen nodetool be that slow is when it was talking
to a machine that was either swapping or deep into (JVM) GC storming.

On Wed, Feb 22, 2012 at 3:49 PM, Feng Qu  wrote:
> We noticed that nodetool ring sometimes returns in 17-20 sec while it
> normally runs in less than a sec. There were some compaction running when it
> happened. Did compaction cause nodetool slowness? Anything else I should
> check?
> time nodetool -h hostname ring
> 
> real 0m17.595s
> user 0m0.339s
> sys 0m0.054s
>
> Feng Qu



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Best suitable value for flush_largest_memtables_at

2012-02-23 Thread Roshan
Thanks Aaron for the information.

I increased the JVM heap size to 2.4G from 1.4G. Please see my current CF
settings below.

Keyspace: WCache:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:3]
  Column Families:
ColumnFamily: WStandard
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator:
org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds / keys to save : 1000.0/0/all
  Row Cache Provider:
org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
  Key cache size / save period in seconds: 21.0/3600
  GC grace seconds: 3600
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: []
  Compaction Strategy:
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

I have already done the below configuration changes after getting the OOM.

/app/Cassandra/conf/cassandra-env.sh

JVM_OPTS -XX:CMSInitiatingOccupancyFraction=45 (reduce it from 75)

/app/Cassandra/conf/Cassandra.yaml

flush_largest_memtables_at: 0.45 (reduce it from .75)
reduce_cache_sizes_at: 0.55 (reduce it from .85)
reduce_cache_capacity_to: 0.3 (reduce it from .6)
concurrent_compactors: 1

I will also apply the configuration you suggest in locally first then to
production. Appreciate your comments regarding this. Thanks. 

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Best-suitable-value-for-flush-largest-memtables-at-tp7310767p7313260.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


hinted handoff 16 s delay

2012-02-23 Thread Hontvári József Levente
I have played with a test cluster, stopping cassandra on one node and 
updating a row on another. I noticed a delay in delivering hinted 
handoffs for which I don't know the rationale. After the node which 
originally received the update noticed that the other server is up, it 
waited 16 s before it started pushing the hints.


Here is the log:

 INFO [GossipStage:1] 2012-02-23 20:05:32,516 StorageService.java (line 
988) Node /192.0.2.1 state jump to normal
 INFO [HintedHandoff:1] 2012-02-23 20:05:49,766 
HintedHandOffManager.java (line 296) Started hinted handoff for token: 1 
with IP: /192.0.2.1
 INFO [HintedHandoff:1] 2012-02-23 20:05:50,048 ColumnFamilyStore.java 
(line 704) Enqueuing flush of 
Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live bytes, 2 ops)
 INFO [FlushWriter:31] 2012-02-23 20:05:50,049 Memtable.java (line 246) 
Writing Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live 
bytes, 2 ops)
 INFO [FlushWriter:31] 2012-02-23 20:05:50,192 Memtable.java (line 283) 
Completed flushing 
/media/data/cassandra/data/system/HintsColumnFamily-hc-10-Data.db (290 
bytes)
 INFO [CompactionExecutor:70] 2012-02-23 20:05:50,193 
CompactionTask.java (line 113) Compacting 
[SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-hc-10-Data.db'), 
SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-hc-9-Data.db')]
 INFO [HintedHandoff:1] 2012-02-23 20:05:50,195 
HintedHandOffManager.java (line 373) Finished hinted handoff of 1 rows 
to endpoint /192.0.2.1




Re: 1.0.2 - nodetool ring and info reports wrong load after compact

2012-02-23 Thread Tyler Hobbs
On Thu, Feb 23, 2012 at 7:29 AM, Bill Au  wrote:

>
> Upgrade within the 1.0.x branch is simply a rolling restart, right?


Generally, but you should always read NEWS.txt before upgrading.

-- 
Tyler Hobbs
DataStax 


Re: hinted handoff 16 s delay

2012-02-23 Thread Todd Burruss
if I remember correctly, cassandra has a random delay in it so hint
delivery is staggered and does not overwhelm the just restarted node.

On 2/23/12 1:46 PM, "Hontvári József Levente" 
wrote:

>I have played with a test cluster, stopping cassandra on one node and
>updating a row on another. I noticed a delay in delivering hinted
>handoffs for which I don't know the rationale. After the node which
>originally received the update noticed that the other server is up, it
>waited 16 s before it started pushing the hints.
>
>Here is the log:
>
>  INFO [GossipStage:1] 2012-02-23 20:05:32,516 StorageService.java (line
>988) Node /192.0.2.1 state jump to normal
>  INFO [HintedHandoff:1] 2012-02-23 20:05:49,766
>HintedHandOffManager.java (line 296) Started hinted handoff for token: 1
>with IP: /192.0.2.1
>  INFO [HintedHandoff:1] 2012-02-23 20:05:50,048 ColumnFamilyStore.java
>(line 704) Enqueuing flush of
>Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live bytes, 2
>ops)
>  INFO [FlushWriter:31] 2012-02-23 20:05:50,049 Memtable.java (line 246)
>Writing Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live
>bytes, 2 ops)
>  INFO [FlushWriter:31] 2012-02-23 20:05:50,192 Memtable.java (line 283)
>Completed flushing
>/media/data/cassandra/data/system/HintsColumnFamily-hc-10-Data.db (290
>bytes)
>  INFO [CompactionExecutor:70] 2012-02-23 20:05:50,193
>CompactionTask.java (line 113) Compacting
>[SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-h
>c-10-Data.db'), 
>SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-hc
>-9-Data.db')]
>  INFO [HintedHandoff:1] 2012-02-23 20:05:50,195
>HintedHandOffManager.java (line 373) Finished hinted handoff of 1 rows
>to endpoint /192.0.2.1
>



Frequency of Flushing in 1.0

2012-02-23 Thread Xaero S
I recently started using Cassandra 1.0.4 and observed that it takes a lot
longer to flush the commit logs to SSTables, than was observed in versions
0.7.X and 0.8.X under constant load conditions with commitlog_sync as
periodic and commitlog_sync_period_in_ms as 1. As more data gets
retained in commit logs, if a node goes down, it will take longer for
commitlog replay. I am wondering about the configuration options in 1.0
that are related to commit log and memtable flushing. What are the settings
that help flush memtables more frequently or less frequently under constant
load conditions?

Would appreciate any help on this subject.


Re: hinted handoff 16 s delay

2012-02-23 Thread Maki Watanabe
I've verified it in the source: deliverHintsToEndpointInternal in
HintedHandOffManager.java.
Yes, it adds a random delay before HH delivery.
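The staggering idea can be sketched as a random pause before delivery begins. This is a hypothetical illustration of the concept only, not the actual code in HintedHandOffManager; the 30-second bound is an assumed example value.

```java
import java.util.Random;

public class HintJitter {
    // Pick a random pause in [0, maxJitterMs) so that many nodes holding hints
    // don't all hammer a just-restarted node at the same instant.
    static long jitterMs(Random rng, long maxJitterMs) {
        return (long) (rng.nextDouble() * maxJitterMs);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        long delay = jitterMs(rng, 30_000);  // e.g. sleep up to 30 s, then deliver
        System.out.println(delay >= 0 && delay < 30_000);  // true
    }
}
```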

2012/2/24 Todd Burruss :
> if I remember correctly, cassandra has a random delay in it so hint
> delivery is staggered and does not overwhelm the just restarted node.
>
> On 2/23/12 1:46 PM, "Hontvári József Levente" 
> wrote:
>
>>I have played with a test cluster, stopping cassandra on one node and
>>updating a row on another. I noticed a delay in delivering hinted
>>handoffs for which I don't know the rationale. After the node which
>>originally received the update noticed that the other server is up, it
>>waited 16 s before it started pushing the hints.
>>
>>Here is the log:
>>
>>  INFO [GossipStage:1] 2012-02-23 20:05:32,516 StorageService.java (line
>>988) Node /192.0.2.1 state jump to normal
>>  INFO [HintedHandoff:1] 2012-02-23 20:05:49,766
>>HintedHandOffManager.java (line 296) Started hinted handoff for token: 1
>>with IP: /192.0.2.1
>>  INFO [HintedHandoff:1] 2012-02-23 20:05:50,048 ColumnFamilyStore.java
>>(line 704) Enqueuing flush of
>>Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live bytes, 2
>>ops)
>>  INFO [FlushWriter:31] 2012-02-23 20:05:50,049 Memtable.java (line 246)
>>Writing Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live
>>bytes, 2 ops)
>>  INFO [FlushWriter:31] 2012-02-23 20:05:50,192 Memtable.java (line 283)
>>Completed flushing
>>/media/data/cassandra/data/system/HintsColumnFamily-hc-10-Data.db (290
>>bytes)
>>  INFO [CompactionExecutor:70] 2012-02-23 20:05:50,193
>>CompactionTask.java (line 113) Compacting
>>[SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-h
>>c-10-Data.db'),
>>SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-hc
>>-9-Data.db')]
>>  INFO [HintedHandoff:1] 2012-02-23 20:05:50,195
>>HintedHandOffManager.java (line 373) Finished hinted handoff of 1 rows
>>to endpoint /192.0.2.1
>>
>



-- 
w3m


data model advice

2012-02-23 Thread Franc Carter
Hi,

I've finished my first model and experiments with Cassandra, with results I'm
pretty happy with - so I thought I'd move on to something harder.

We have a set of data that has a large number of entities (which is our
primary search key); for each of the entities we have a smallish (<100)
number of sets of data. Each set has a further set that contains column/value
pairs.

The queries will be for an Entity, for one or more days, for one or more of
the subsets. Conceptually I would like to model it like this:-

Entity {
   Day1: {
   TypeA: {col1:val1, col2:val2, . . . }
   TypeB: {col1:val1, col3:val3, . . . }
  .
  .
   }
   .
   .
   .
   DayN: {
   TypeB: {col3:val3, col5:val5, . . . }
   TypeD: {col3:val3, col6:val6, . . . }
  .
  .
   }
}

My understanding of the Cassandra data model is that I run out of map depth
to do this in my simplistic approach: the Days are super columns, the
types are columns, and then I don't have a col/val map left for the data.

Does anyone have advice on a good approach ?

thanks

-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: data model advice

2012-02-23 Thread Indranath Ghosh
How about using a composite row key like the following:

Entity.Day1.TypeA: {col1:val1, col2:val2, . . . }
Entity.Day1.TypeB: {col1:val1, col2:val2, . . . }
.
.
Entity.DayN.TypeA: {col1:val1, col2:val2, . . . }
Entity.DayN.TypeB: {col1:val1, col2:val2, . . . }

It is better to avoid super columns.
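Laid out concretely, the composite-row-key layout turns the nested map into flat rows and the query into a multi-get of known keys. This is a hypothetical sketch in plain Java, with an in-memory map standing in for the column family and a real client's multiget:

```java
import java.util.*;

public class CompositeRowKeys {
    public static void main(String[] args) {
        // One CF row per (entity, day, type); each row holds plain col:val pairs.
        Map<String, Map<String, String>> cf = new HashMap<>();
        cf.put("E1.Day1.TypeA", Map.of("col1", "val1", "col2", "val2"));
        cf.put("E1.Day1.TypeB", Map.of("col1", "val1", "col3", "val3"));
        cf.put("E1.Day2.TypeA", Map.of("col1", "val9"));

        // "Entity E1, days 1..2, type A" becomes a multi-get of computed keys.
        List<String> keys = List.of("E1.Day1.TypeA", "E1.Day2.TypeA");
        for (String k : keys) {
            System.out.println(k + " -> " + cf.get(k).size() + " cols");
        }
    }
}
```

The trade-off is that the client must be able to enumerate the (day, type) combinations it wants, since row keys cannot be range-scanned with the random partitioner.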

-indra

On Thu, Feb 23, 2012 at 6:36 PM, Franc Carter wrote:

>
> Hi,
>
> I've finished my first model and experiments with Cassandra with result
> I'm pretty happy with - so I thought I'd move on to something harder.
>
> We have a set of data that has a large number of entities (which is our
> primary search key), for each of the entities we have a smallish (<100)
> number of sets of data. Each set has a further set the contains column/vale
> pairs.
>
> The queries will be for an Entity, for one or more days for one or more of
> the subsets. Conceptually I would like to model like it like this:-
>
> Entity {
>Day1: {
>TypeA: {col1:val1, col2:val2, . . . }
>TypeB: {col1:val1, col3:val3, . . . }
>   .
>   .
>}
>.
>.
>.
>DayN: {
>TypeB: {col3:val3, col5:val5, . . . }
>TypeD: {col3:val3, col6:val6, . . . }
>   .
>   .
>}
> }
>
> My understanding of the Cassandra data model is that I run out of map-dept
> to do this in my simplistic approach as the Days are super columns, the
> types are column and then I don't have a col/val map left for data.
>
> Does anyone have advice on a good approach ?
>
> thanks
>
> --
>
> *Franc Carter* | Systems architect | Sirca Ltd
>  
>
> franc.car...@sirca.org.au | www.sirca.org.au
>
> Tel: +61 2 9236 9118
>
> Level 9, 80 Clarence St, Sydney NSW 2000
>
> PO Box H58, Australia Square, Sydney NSW 1215
>
>


-- 
*Indranath Ghosh
Phone: 408-813-9207*


Re: data model advice

2012-02-23 Thread Martin Arrowsmith
Hi Franc,

Or, you can consider using composite columns. It is not recommended to use
Super Columns anymore.

Best wishes,

Martin

On Thu, Feb 23, 2012 at 7:51 PM, Indranath Ghosh wrote:

> How about using a composite row key like the following:
>
> Entity.Day1.TypeA: {col1:val1, col2:val2, . . . }
> Entity.Day1.TypeB: {col1:val1, col2:val2, . . . }
> .
> .
> Entity.DayN.TypeA: {col1:val1, col2:val2, . . . }
> Entity.DayN.TypeB: {col1:val1, col2:val2, . . . }
>
> It is better to avoid super columns..
>
> -indra
>
> On Thu, Feb 23, 2012 at 6:36 PM, Franc Carter 
> wrote:
>
>>
>> Hi,
>>
>> I've finished my first model and experiments with Cassandra with result
>> I'm pretty happy with - so I thought I'd move on to something harder.
>>
>> We have a set of data that has a large number of entities (which is our
>> primary search key), for each of the entities we have a smallish (<100)
>> number of sets of data. Each set has a further set the contains column/vale
>> pairs.
>>
>> The queries will be for an Entity, for one or more days for one or more
>> of the subsets. Conceptually I would like to model like it like this:-
>>
>> Entity {
>>Day1: {
>>TypeA: {col1:val1, col2:val2, . . . }
>>TypeB: {col1:val1, col3:val3, . . . }
>>   .
>>   .
>>}
>>.
>>.
>>.
>>DayN: {
>>TypeB: {col3:val3, col5:val5, . . . }
>>TypeD: {col3:val3, col6:val6, . . . }
>>   .
>>   .
>>}
>> }
>>
>> My understanding of the Cassandra data model is that I run out of
>> map-dept to do this in my simplistic approach as the Days are super
>> columns, the types are column and then I don't have a col/val map left for
>> data.
>>
>> Does anyone have advice on a good approach ?
>>
>> thanks
>>
>> --
>>
>> *Franc Carter* | Systems architect | Sirca Ltd
>>  
>>
>> franc.car...@sirca.org.au | www.sirca.org.au
>>
>> Tel: +61 2 9236 9118
>>
>> Level 9, 80 Clarence St, Sydney NSW 2000
>>
>> PO Box H58, Australia Square, Sydney NSW 1215
>>
>>
>
>
> --
> *Indranath Ghosh
> Phone: 408-813-9207*
>
>


Re: data model advice

2012-02-23 Thread Franc Carter
On Fri, Feb 24, 2012 at 2:54 PM, Martin Arrowsmith <
arrowsmith.mar...@gmail.com> wrote:

> Hi Franc,
>
> Or, you can consider using composite columns. It is not recommended to use
> Super Columns anymore.
>

Thanks,

I'll look in to composite columns

cheers


>
> Best wishes,
>
> Martin
>
>
> On Thu, Feb 23, 2012 at 7:51 PM, Indranath Ghosh wrote:
>
>> How about using a composite row key like the following:
>>
>> Entity.Day1.TypeA: {col1:val1, col2:val2, . . . }
>> Entity.Day1.TypeB: {col1:val1, col2:val2, . . . }
>> .
>> .
>> Entity.DayN.TypeA: {col1:val1, col2:val2, . . . }
>> Entity.DayN.TypeB: {col1:val1, col2:val2, . . . }
>>
>> It is better to avoid super columns..
>>
>> -indra
>>
>> On Thu, Feb 23, 2012 at 6:36 PM, Franc Carter 
>> wrote:
>>
>>>
>>> Hi,
>>>
>>> I've finished my first model and experiments with Cassandra with result
>>> I'm pretty happy with - so I thought I'd move on to something harder.
>>>
>>> We have a set of data that has a large number of entities (which is our
>>> primary search key), for each of the entities we have a smallish (<100)
>>> number of sets of data. Each set has a further set the contains column/vale
>>> pairs.
>>>
>>> The queries will be for an Entity, for one or more days for one or more
>>> of the subsets. Conceptually I would like to model like it like this:-
>>>
>>> Entity {
>>>Day1: {
>>>TypeA: {col1:val1, col2:val2, . . . }
>>>TypeB: {col1:val1, col3:val3, . . . }
>>>   .
>>>   .
>>>}
>>>.
>>>.
>>>.
>>>DayN: {
>>>TypeB: {col3:val3, col5:val5, . . . }
>>>TypeD: {col3:val3, col6:val6, . . . }
>>>   .
>>>   .
>>>}
>>> }
>>>
>>> My understanding of the Cassandra data model is that I run out of
>>> map-dept to do this in my simplistic approach as the Days are super
>>> columns, the types are column and then I don't have a col/val map left for
>>> data.
>>>
>>> Does anyone have advice on a good approach ?
>>>
>>> thanks
>>>
>>> --
>>>
>>> *Franc Carter* | Systems architect | Sirca Ltd
>>>  
>>>
>>> franc.car...@sirca.org.au | www.sirca.org.au
>>>
>>> Tel: +61 2 9236 9118
>>>
>>> Level 9, 80 Clarence St, Sydney NSW 2000
>>>
>>> PO Box H58, Australia Square, Sydney NSW 1215
>>>
>>>
>>
>>
>> --
>> *Indranath Ghosh
>> Phone: 408-813-9207*
>>
>>
>


-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: data model advice

2012-02-23 Thread Franc Carter
On Fri, Feb 24, 2012 at 2:54 PM, Martin Arrowsmith <
arrowsmith.mar...@gmail.com> wrote:

> Hi Franc,
>
> Or, you can consider using composite columns. It is not recommended to use
> Super Columns anymore.
>
> Best wishes,
>

On first read it would seem that there is a fair bit of overhead with
composite columns, as it's my understanding that the column name is stored
with each value - or have I missed something?

cheers


>
> Martin
>
>
> On Thu, Feb 23, 2012 at 7:51 PM, Indranath Ghosh wrote:
>
>> How about using a composite row key like the following:
>>
>> Entity.Day1.TypeA: {col1:val1, col2:val2, . . . }
>> Entity.Day1.TypeB: {col1:val1, col2:val2, . . . }
>> .
>> .
>> Entity.DayN.TypeA: {col1:val1, col2:val2, . . . }
>> Entity.DayN.TypeB: {col1:val1, col2:val2, . . . }
>>
>> It is better to avoid super columns..
>>
>> -indra
>>
>> On Thu, Feb 23, 2012 at 6:36 PM, Franc Carter 
>> wrote:
>>
>>>
>>> Hi,
>>>
>>> I've finished my first model and experiments with Cassandra with result
>>> I'm pretty happy with - so I thought I'd move on to something harder.
>>>
>>> We have a set of data that has a large number of entities (which is our
>>> primary search key), for each of the entities we have a smallish (<100)
>>> number of sets of data. Each set has a further set the contains column/vale
>>> pairs.
>>>
>>> The queries will be for an Entity, for one or more days for one or more
>>> of the subsets. Conceptually I would like to model like it like this:-
>>>
>>> Entity {
>>>Day1: {
>>>TypeA: {col1:val1, col2:val2, . . . }
>>>TypeB: {col1:val1, col3:val3, . . . }
>>>   .
>>>   .
>>>}
>>>.
>>>.
>>>.
>>>DayN: {
>>>TypeB: {col3:val3, col5:val5, . . . }
>>>TypeD: {col3:val3, col6:val6, . . . }
>>>   .
>>>   .
>>>}
>>> }
>>>
>>> My understanding of the Cassandra data model is that I run out of
>>> map-dept to do this in my simplistic approach as the Days are super
>>> columns, the types are column and then I don't have a col/val map left for
>>> data.
>>>
>>> Does anyone have advice on a good approach ?
>>>
>>> thanks
>>>
>>> --
>>>
>>> *Franc Carter* | Systems architect | Sirca Ltd
>>>  
>>>
>>> franc.car...@sirca.org.au | www.sirca.org.au
>>>
>>> Tel: +61 2 9236 9118
>>>
>>> Level 9, 80 Clarence St, Sydney NSW 2000
>>>
>>> PO Box H58, Australia Square, Sydney NSW 1215
>>>
>>>
>>
>>
>> --
>> *Indranath Ghosh
>> Phone: 408-813-9207*
>>
>>
>


-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: [BETA RELEASE] Apache Cassandra 1.1.0-beta1 released

2012-02-23 Thread Maki Watanabe
No, I couldn't download the beta with the first link; the mirrors
returned 404 for it.
After exploring the link, I found that the latter URI worked.
So I don't think we need to wait.

2012/2/22 Sylvain Lebresne :
> Arf, you're right, sorry.
> I've fixed it (but it could take ~1 to get propagated to all apache mirrors).
>
> --
> SYlvain
>
> On Wed, Feb 22, 2012 at 2:46 AM, Maki Watanabe  
> wrote:
>> The link is wrong.
>> http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
>> Should be:
>> http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.1.0-beta1/apache-cassandra-1.1.0-beta1-bin.tar.gz
>>
>>
>> 2012/2/21 Sylvain Lebresne :
>>> The Cassandra team is pleased to announce the release of the first beta for
>>> the future Apache Cassandra 1.1.
>>>
>>> Let me first stress that this is beta software and as such is *not* ready 
>>> for
>>> production use.
>>>
>>> The goal of this release is to give a preview of what will become Cassandra
>>> 1.1 and to get wider testing before the final release. All help in testing
>>> this release would be therefore greatly appreciated and please report any
>>> problem you may encounter[3,4]. Have a look at the change log[1] and the
>>> release notes[2] to see where Cassandra 1.1 differs from the previous 
>>> series.
>>>
>>> Apache Cassandra 1.1.0-beta1[5] is available as usual from the cassandra
>>> website (http://cassandra.apache.org/download/) and a debian package is
>>> available using the 11x branch (see
>>> http://wiki.apache.org/cassandra/DebianPackaging).
>>>
>>> Thank you for your help in testing and have fun with it.
>>>
>>> [1]: http://goo.gl/6iURu (CHANGES.txt)
>>> [2]: http://goo.gl/hWilW (NEWS.txt)
>>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>>> [4]: user@cassandra.apache.org
>>> [5]: 
>>> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.1.0-beta1
>>
>>
>>
>> --
>> w3m



-- 
w3m