Re: Thrift to cql : mixed static and dynamic columns with secondary index

Clement Honore Fri, 17 Jul 2015 01:44:22 -0700

Thanks for your answer Tyler.

Unfortunately, I can't wait for 3.x to be released.


I think I will update my schema to explicitly declare all columns with
predictable names, and migrate only the dynamic ones to a new table.
This will drastically reduce the amount of data to migrate, and I'll be able
to run proper CQL3 queries on the old tables.
I will explore this approach before considering migrating all the data to a
new table that doesn't really follow the CQL3 logic.
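
A rough sketch of what I have in mind, under a few assumptions: the columns
name, size and mime have already been declared in the column metadata of
REF_File on the Thrift side, and the table ref_file_comments with its column
names is purely hypothetical (nothing like it exists yet):

```cql
-- Old table: once name, size and mime are declared (assumption), they should
-- be readable from CQL3, and the existing index on folder can still be used.
SELECT key, folder, name, size, mime
FROM "REF_File"
WHERE folder = 'folder1';

-- Hypothetical new table holding only the dynamic COM_X columns,
-- one CQL row per comment reference.
CREATE TABLE ref_file_comments (
    file_key text,
    comment_id text,
    value text,
    PRIMARY KEY (file_key, comment_id)
);

-- Migrating the COM_1 column of file id1 (its value was empty).
INSERT INTO ref_file_comments (file_key, comment_id, value)
VALUES ('id1', 'COM_1', '');

-- Listing the comments of one file.
SELECT comment_id FROM ref_file_comments WHERE file_key = 'id1';
```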

On Thu, 16 Jul 2015 at 17:39, Tyler Hobbs <[email protected]> wrote:

> This schema is something that we're providing a better CQL conversion for
> in 3.0.  The one column you defined will become a "static" column, meaning
> there is only one copy of it per partition.  The schema will look something
> like this:
>
> CREATE TABLE ref_file (
>     key text,
>     folder text static,
>     column1 text,
>     value text,
>     PRIMARY KEY (key, column1)
> ) WITH COMPACT STORAGE;
>
> The "column1" column will hold your dynamic field names, and the "value"
> column will hold your dynamic field values.
>
> Unfortunately, we probably won't support indexing the static column in
> 3.0.0, but we should be able to support that pretty soon afterwards.  The
> ticket for that is https://issues.apache.org/jira/browse/CASSANDRA-8103.
>
> If you don't want to wait for 3.x, migrating to a table like this is
> probably your best option:
>
> CREATE TABLE ref_file (
>     key text PRIMARY KEY,
>     folder text,
>     attributes map<text, text>
> )
>
> In this case, the attributes map would hold your dynamic fields.
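>
> As a rough sketch of how that table could be used (the values come from
> your example; the index on folder is an assumption based on your current
> setup, it is not created by the table definition above):
>
> ```cql
> -- Assumed: recreate the secondary index that existed on the old column family.
> CREATE INDEX ON ref_file (folder);
>
> -- Write a file with its declared column and its dynamic fields in the map.
> INSERT INTO ref_file (key, folder, attributes)
> VALUES ('id1', 'folder1', {'name': 'file1', 'size': '1234', 'COM_1': ''});
>
> -- Add or overwrite a single dynamic field without rewriting the whole map.
> UPDATE ref_file SET attributes['COM_2'] = '' WHERE key = 'id1';
>
> -- Remove a single dynamic field.
> DELETE attributes['COM_1'] FROM ref_file WHERE key = 'id1';
>
> -- Find files by folder through the index.
> SELECT key, folder, attributes FROM ref_file WHERE folder = 'folder1';
> ```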
>
> On Thu, Jul 16, 2015 at 4:22 AM, Clement Honore <[email protected]>
> wrote:
>
>> Hi,
>>
>> I'm trying to migrate from Cassandra 1.1 and Hector to a more up-to-date
>> stack like Cassandra 1.2+ and CQL3.
>>
>> I have read http://www.datastax.com/dev/blog/thrift-to-cql3 but my use
>> case adds a complication that does not seem to be documented: I have a
>> mixed column family with a secondary index.
>>
>> The column family has one explicitly declared column, which is indexed
>> natively.
>> In this column family, I'm also adding columns dynamically: some with
>> predictable names, some with dynamic names.
>>
>> If I try to query this table in cql, I can access only the declared
>> column (as stated in the documentation above).
>>
>> If I change the declaration by removing the explicitly declared column
>> (as explained in the documentation above), I lose the secondary index on
>> it.
>>
>> If I explicitly declare all the columns with an already known name
>> (accepting that I will get plenty of columns with a null value for
>> the rows which don't have those attributes), I still can't manage columns
>> with a dynamic name.
>> And I can't declare a collection, as my comparator is UTF8Type.
>>
>> Should I migrate to a new table if I want to keep all the
>> functionality? This is really a solution I want to avoid.
>>
>> Here is an example representing my current schema:
>>
>> I have a column family "REF_File" referencing my files.
>> A file always has a "folder". The "folder" is indexed to easily find my
>> files.
>> A file may have some attributes like "name", "size", "mime".
>> A file may have some comments referenced by a column "COM_X" where "X" is
>> the comment ID.
>>
>> Column family creation :
>>
>> Create column family REF_File with comparator=UTF8Type and
>> default_validation_class=UTF8Type and key_validation_class=UTF8Type and
>> column_metadata=[{column_name: folder, validation_class: UTF8Type,
>> index_type: KEYS}];
>>
>> set REF_File['id1']['folder']=folder1;
>> set REF_File['id1']['name']=file1;
>> set REF_File['id1']['size']=1234;
>> set REF_File['id1']['COM_1']='';
>> set REF_File['id1']['COM_2']='';
>> set REF_File['id2']['folder']=folder1;
>> set REF_File['id2']['name']=file2;
>> set REF_File['id2']['mime']='image/jpeg';
>> set REF_File['id2']['COM_1']='';
>>
>> Requesting :
>>
>> [default@DUNE_metadonnees] list REF_File;
>> Using default limit of 100
>> Using default cell limit of 100
>> -------------------
>> RowKey: id1
>> => (name=COM_1, value=, timestamp=1437034903045000)
>> => (name=COM_2, value=, timestamp=1437034911121000)
>> => (name=folder, value=folder1, timestamp=1437034833452000)
>> => (name=name, value=file1, timestamp=1437034851993000)
>> => (name=size, value=1234, timestamp=1437034871356000)
>> -------------------
>> RowKey: id2
>> => (name=COM_1, value=, timestamp=1437035169011000)
>> => (name=folder, value=folder1, timestamp=1437035062080000)
>> => (name=mime, value=image/jpeg, timestamp=1437035145227000)
>> => (name=name, value=file2, timestamp=1437035073596000)
>>
>> Thanks for your help !
>>
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
