As a sanity check, I ran the same Python script with the same row IDs again
today and it was 10x faster. Something must be going wrong intermittently in my
cluster.

-Allan

On April 11, 2014 at 11:02:11 AM, Allan C (alla...@gmail.com) wrote:

It’s a fairly standard relational-like CF. `description` is the only field
that’s potentially big (can be up to 1k).

CREATE COLUMN FAMILY 'Event' WITH
  key_validation_class = 'UTF8Type' AND
  comparator = 'UTF8Type' AND
  default_validation_class = 'UTF8Type' AND
  bloom_filter_fp_chance = 0.1 AND
  compaction_strategy = 'LeveledCompactionStrategy' AND
  compaction_strategy_options = {sstable_size_in_mb:160} AND
  compression_options = {sstable_compression:SnappyCompressor,chunk_length_kb:64} AND
--  key_alias = 'eventId' AND
  column_metadata = [
      {column_name: 'createdAt', validation_class: 'DateType'},
      {column_name: 'creatorId', validation_class: 'UTF8Type'},
      {column_name: 'creatorName', validation_class: 'UTF8Type'},
      {column_name: 'description', validation_class: 'UTF8Type'},
      {column_name: 'privacy', validation_class: 'UTF8Type'},
      {column_name: 'location', validation_class: 'UTF8Type'},
      {column_name: 'locationId', validation_class: 'UTF8Type'},
      {column_name: 'endTime', validation_class: 'DateType'},
      {column_name: 'name', validation_class: 'UTF8Type'},
      {column_name: 'picture', validation_class: 'UTF8Type'},
      {column_name: 'startTime', validation_class: 'DateType'},
      {column_name: 'updatedAt', validation_class: 'DateType'},

      {column_name: 'lat', validation_class: 'UTF8Type'},
      {column_name: 'lng', validation_class: 'UTF8Type'},
      {column_name: 'street', validation_class: 'UTF8Type'},
      {column_name: 'city', validation_class: 'UTF8Type'},
      {column_name: 'state', validation_class: 'UTF8Type'},
      {column_name: 'zip', validation_class: 'UTF8Type'},
      {column_name: 'country', validation_class: 'UTF8Type'},

      {column_name: '~lastSync', validation_class: 'DateType'},
      {column_name: '~nextSync', validation_class: 'DateType'},

      {column_name: '~syncBlock', validation_class: 'IntegerType'},

      {column_name: 'noCount', validation_class: 'IntegerType'},
      {column_name: 'invitedCount', validation_class: 'IntegerType'},
      {column_name: 'maybeCount', validation_class: 'IntegerType'},
      {column_name: 'yesCount', validation_class: 'IntegerType'},

      {column_name: '~version', validation_class: 'IntegerType'}
];


-Allan

On April 10, 2014 at 4:49:34 PM, Tyler Hobbs (ty...@datastax.com) wrote:


On Thu, Apr 10, 2014 at 6:26 PM, Allan C <alla...@gmail.com> wrote:

Looks like the amount of data returned has a big effect. When I only return one
column, Python reports only 20 ms, compared to 150 ms when returning the whole
row. Each row is less than 1k in size, so the extra time is presumably client
overhead.
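
One way to check a comparison like this is to time both read shapes over several repeated runs, so that a single slow call is not mistaken for steady client overhead. The sketch below is not from the thread: `time_calls`, `summarize`, and `compare` are hypothetical helpers, and the pycassa usage in the trailing comment assumes a placeholder keyspace name, server address, and list of row IDs.

```python
from timeit import default_timer


def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def time_calls(fetch, keys, repeats=5):
    """Call fetch(key) for every key, `repeats` times; return latencies in ms."""
    samples = []
    for _ in range(repeats):
        for key in keys:
            start = default_timer()
            fetch(key)
            samples.append((default_timer() - start) * 1000.0)
    return samples


def summarize(samples):
    """Median shows the typical cost; max exposes intermittent slow calls."""
    return {
        "n": len(samples),
        "median_ms": median(samples),
        "max_ms": max(samples),
    }


def compare(fetch_one_column, fetch_whole_row, keys, repeats=5):
    """Time both read shapes over the same keys and summarize each."""
    return {
        "one_column": summarize(time_calls(fetch_one_column, keys, repeats)),
        "whole_row": summarize(time_calls(fetch_whole_row, keys, repeats)),
    }


# Against a live cluster, the two callables might look like this
# (keyspace, server address, and row_ids are placeholders):
#
#   from pycassa.pool import ConnectionPool
#   from pycassa.columnfamily import ColumnFamily
#   pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
#   cf = ColumnFamily(pool, 'Event')
#   report = compare(lambda k: cf.get(k, columns=['name']),
#                    lambda k: cf.get(k),
#                    row_ids)
```

If the whole-row median stays well above the one-column median across runs, that points at per-column cost on the client or server; if only the max spikes, the slowness is more likely intermittent.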

That's a surprising amount of overhead in pycassa.  What's your schema like for 
this CF?


--
Tyler Hobbs
DataStax
