I work with Amir and further experimentation I can shed a little more light on 
what exactly is going on under the hood.  For background our goal is to take 
data that is currently being read and written to via thrift, switch reads to 
CQL, and then switch writes to CQL.  This is in alternative to deleting all of 
our data and starting over, or being forever struck on super old thrift clients 
(both of those options obviously suck.)  The data models involved are absurdly 
simple (and single key with a handful of static columns).

TLDR: Metadata is complicated.  What is the least dangerous way to make direct 
changes to system.schema_columnfamilies and system.schema_columns?

Anyway, given some super simple Foo and Bar column families:

create keyspace Test with  placement_strategy = 
'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
{replication_factor:1};
use Test;
create column family Foo with comparator = UTF8Type and 
key_validation_class=UTF8Type and column_metadata = [ {column_name: title, 
validation_class: UTF8Type}];
create column family Bar with comparator = UTF8Type and 
key_validation_class=UTF8Type;
update column family Bar with column_metadata = [ {column_name: title, 
validation_class: UTF8Type}];

(The salient difference as described by Amir is when the column_metadata is 
set; at the same time as creation or later.)

Now we can inject a little data and see that from thrift everything looks fine:

[default@Test] set Foo['testkey']['title']='mytitle';
Value inserted.
Elapsed time: 19 msec(s).
[default@Test] set Bar['testkey']['title']='mytitle';
Value inserted.
Elapsed time: 4.47 msec(s).

[default@Test] list Foo;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: testkey
=> (name=title, value=mytitle, timestamp=1452108082972000)

1 Row Returned.
Elapsed time: 268 msec(s).
[default@Test] list Bar;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: testkey
=> (name=title, value=mytitle, timestamp=1452108093739000)

1 Row Returned.
Elapsed time: 9.3 msec(s).

But from cql the Bar column does not look like the data we wrote:

cqlsh> select * from "Test"."Foo";

 key     | title
---------+---------
 testkey | mytitle

(1 rows)


cqlsh> select * from "Test"."Bar";

 key     | column1 | value            | title
---------+---------+------------------+---------
 testkey |   title | 0x6d797469746c65 | mytitle


It's not just that these phantom columns are ugly, cql thinks column1 is part 
of a composite primary key.  Since there **is no column1**, that renderes the 
data un-query-able with WHERE clauses.

Just to make sure it's not thrift that is doing something unexpected, the 
sstables show the expected structure:

$ ./tools/bin/sstable2json 
/data/sstables/data/Test/Foo-d3348860b4af11e5b456639406f48f1b/Test-Foo-ka-1-Data.db
 
[
{"key": "testkey",
 "cells": [["title","mytitle",1452110466924000]]}
]


$ ./tools/bin/sstable2json 
/data/sstables/data/Test/Foo-d3348860b4af11e5b456639406f48f1b/Test-Foo-ka-1-Data.db
 
[
{"key": "testkey",
 "cells": [["title","mytitle",1452110466924000]]}
]


So, what appeared as innocent variation made years ago when the thrift schema 
was written causes very different results to cql.

Digging into the schema tables shows what is going on in more detail:

> select 
> keyspace_name,columnfamily_name,column_aliases,comparator,is_dense,key_aliases,value_alias
>  from system.schema_columnfamilies where keyspace_name='Test';

 keyspace_name | columnfamily_name | column_aliases | comparator                
              | is_dense | key_aliases | value_alias
---------------+-------------------+----------------+ 
----------------------------------------+----------+-------------+-------------
          Test |               Bar | ["column1"]   | 
org.apache.cassandra.db.marshal.UTF8Type |     True |     ["key"] |       value
          Test |               Foo |          []   | 
org.apache.cassandra.db.marshal.UTF8Type |    False |     ["key"] |        null

> select keyspace_name,columnfamily_name,column_name,validator from 
> system.schema_columns where keyspace_name='Test';

 keyspace_name | columnfamily_name | column_name | validator
---------------+-------------------+-------------+-------------------------------------------
          Test |               Bar |     column1 |  
org.apache.cassandra.db.marshal.UTF8Type
          Test |               Bar |         key |  
org.apache.cassandra.db.marshal.UTF8Type
          Test |               Bar |       title |  
org.apache.cassandra.db.marshal.UTF8Type
          Test |               Bar |       value | 
org.apache.cassandra.db.marshal.BytesType
          Test |               Foo |         key |  
org.apache.cassandra.db.marshal.UTF8Type
          Test |               Foo |       title |  
org.apache.cassandra.db.marshal.UTF8Type


Now the interesting bit is that the metadata can  be manually "fixed":

UPDATE system.schema_columnfamilies SET 
column_aliases='[]',is_dense=False,value_alias=null WHERE keyspace_name='Test' 
AND columnfamily_name = 'Bar';
DELETE FROM system.schema_columns  where keyspace_name='Test' and 
columnfamily_name='Bar' and column_name = 'column1';
DELETE FROM system.schema_columns  where keyspace_name='Test' and 
columnfamily_name='Bar' and column_name = 'value';
(drain and restart)
> select * from "Test"."Bar";

 key     | title
---------+---------
 testkey | mytitle


The problem with that approach is that manually editing the local schema tables 
in live cluster is wildly dangerous. I *think* this would work:
 * Make triple sure no schema changes are happening on the cluster.
 * Update schema tables on each node --> drain --> restart

Questions:
 * Is there a better/safer way to correct the problem without some sort of bulk 
import/export?
 * Is it possible to choose which node resetlocalschema pulls from?

Reply via email to