> 3) check the system.schema_columns if these column_name(s) exist in the
table
> 4) If the column don't exist in the table "ALTER table tablename add new
column_name text"
Unless you have some external control on this so that you know two
processors will never attempt the same operation within a few minutes of
each other, that's a race condition that will eventually cause you to end
in schema disagreement.  Even if you have an external control, it sounds
like a landmine that will get you some day.

On Fri, Nov 20, 2015 at 8:58 AM Rajesh Radhakrishnan <
rajesh.radhakrish...@phe.gov.uk> wrote:

> Thank you Alex for answering.
>
> Yes the columns are dynamic.
> The scenario is like this
> 1) Initial create table with id uuid, name text and insert some data into
> these fields
> 2) Parse an XML from which we retrieve one or more new column names from
> this XML
> 3) check the system.schema_columns if these column_name(s) exist in the
> table
> 4) If the column don't exist in the table "ALTER table tablename add new
> column_name text"
> 5)  Inject data into this new column "Update table name set column_name
> =value where id=blah"
>
> We did tried the map columns, but the query part is the pain.
>
> All these pain we need to take because the column names have special
> character like " ' _- ( ) '' ¬ " etc.
> This solved our loading issue, then we hit with another problem while
> reading these column values via Spark.
>
> Exception was "*Some of types cannot be determined by the first 100 rows,
> please try again with sampling.* "
>
> We solved by setting the SamplinRatio to 1 in the Spark end while create
> DataFrame.
>
>
> ------------------------------
> *From:* Alex Popescu [al...@datastax.com]
> *Sent:* 13 November 2015 19:00
> *To:* user
>
> *Subject:* Re: Getting code=2200 [Invalid query] message=Invalid column
> name ... while executing ALTER statement
> I'm glad to hear that you got it working; but I'd suggest trying to answer
> these questions before moving forward with this solution:
>
> 1. is the set of columns really that dynamic? if not, then define them
> upfront. there's no weight to empty columns.
> 2. if the set of columns is really dynamic, would using 1/more map
> column(s) be better?
>
> Avoiding to modify the schema dynamically and avoid concurrent schema
> changes is always better.
>
>
> On Fri, Nov 13, 2015 at 7:40 AM, Rajesh Radhakrishnan <
> rajesh.radhakrish...@phe.gov.uk> wrote:
>
>> We got a work around now!
>>
>> Thank you Laing for the reply.
>> Yes I do agree with your point, but we got a scenario where the columns
>> need to be added in the later stage of the process.
>> We are doing the following:
>>
>>    1. CREATE THE TABLE IF NOT EXISTS
>>    2. INSERT IDS INTO THE TABLE
>>    3. CHECK THE COLUMN NAMES OF THE TABLE
>>    4. GET A LIST OF _NAMES (PYTHON SCRIPT)
>>    5. ALTER TABLE IF THE _NAME(S) DONT EXIST IN COLUMN NAME
>>    6. UPDATE TABLE WITH THE VALUE OF THE NEW COLUMN
>>
>> In our process step 3 to 6 are repeated.
>>
>> Now what I did is replaced session.execute(...) to
>> session.execute_async(...) only for ALTER and UPDATE statements.
>>
>> And introduced 1 sec sleep for each ALTER statement and 5 sec sleep
>> before UPDATE statement.
>>
>> It WORKS! now. I dont know this is right solution, but its a work around.
>>
>> So clearly some config value need to be updated for some parameter in
>> cassandra.yaml
>>
>> Do you know which one?
>>
>>
>>
>> ------------------------------
>> *From:* Laing, Michael [michael.la...@nytimes.com]
>> *Sent:* 13 November 2015 12:26
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Getting code=2200 [Invalid query] message=Invalid column
>> name ... while executing ALTER statement
>>
>> Dynamic schema changes are generally a bad idea, especially if they are
>> rapid.
>>
>> You should rethink your approach.
>>
>> On Fri, Nov 13, 2015 at 7:20 AM, Rajesh Radhakrishnan <
>> rajesh.radhakrish...@phe.gov.uk> wrote:
>>
>>>
>>> Thank you Carlos for looking.
>>> But when I rand the nodetool describecluster.
>>> It is showing the same schema versions for both nodes?
>>>
>>> So it is something else! Please help me from this bottleneck. Thank you.
>>>
>>> ------------------------------
>>> *From:* Carlos Alonso [i...@mrcalonso.com]
>>> *Sent:* 13 November 2015 11:55
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: Getting code=2200 [Invalid query] message=Invalid column
>>> name ... while executing ALTER statement
>>>
>>> Maybe schema disagreement?
>>>
>>> Run nodetool describecluster to discover
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> <https://twitter.com/calonso>
>>>
>>> On 13 November 2015 at 11:14, Rajesh Radhakrishnan <
>>> rajesh.radhakrish...@phe.gov.uk> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I am using Cassandra 2.1.5 in a cluster of two nodes (running CentOS)
>>>> and using Python driver to connect to Cassandra.
>>>> My Python code snippet is show here:
>>>>
>>>>
>>>> #-------------------------------------------------------------------------------------------------------------------
>>>> import time, os, datetime, keyword
>>>> import uuid
>>>> from cassandra.cluster import Cluster
>>>> import os.path, sys
>>>> ....
>>>> from cassandra.auth import PlainTextAuthProvider
>>>> ....
>>>>        auth_provider = PlainTextAuthProvider(username, password)
>>>>        cluster = Cluster([node1,node2],auth_provider=auth_provider)
>>>>        session = cluster.connect();
>>>>
>>>>         session.execute("CREATE table IF NOT EXISTS test.iau ("
>>>>                     "id uuid, "
>>>>                     "sample_id text, "
>>>>                     "PRIMARY KEY (sample_id) )");
>>>>
>>>>         print " \n created the table"
>>>>         #--------
>>>>
>>>>         sqlInsertSampleIdUid = "INSERT INTO test.iau (id, sample_id)
>>>> VALUES ("+str(uuid.uuid1())+",'sample123')"
>>>>         session.execute(sqlInsertSampleIdUid)
>>>>         print " \n Inserted main ids into the table"
>>>>         #-------
>>>>
>>>>         colNames
>>>> =['col1','col2','col3','col4','col5','col6','col7','col8','col9']
>>>>
>>>>         for colName in colNames :
>>>>
>>>>             sqlAlterStatement1 = " ALTER TABLE test.iau ADD "+colName+"
>>>> text"
>>>>             print sqlAlterStatement1
>>>>             session.execute(sqlAlterStatement1)
>>>>             sqlAlterStatement1 = None
>>>>
>>>>         print " Altered tables :: "
>>>>         # ----------------------------------------
>>>>         count = 0
>>>>         for colName in colNames :
>>>>             count = count +1
>>>>             sqlUpdateGeneDetection = " UPDATE test.iau SET "+colName+"
>>>> = '"+str(count)+"' WHERE sample_id = 'sample123'"
>>>>             session.execute(sqlUpdateGeneDetection)
>>>>             sqlUpdateGeneDetection = None
>>>>
>>>>         print " Updated tables :: "
>>>>         session.cluster.shutdown()
>>>> ....
>>>>
>>>> #-------------------------------------------------------------------------------------------------------------------
>>>>
>>>> Very rarely this code works, but most of the time it fails when it
>>>> reach ALTER statement.
>>>> FYI, I tried preparedstatement with binding in INSERT, UPDATE
>>>> statements too.
>>>>
>>>> The error with output is shown here:
>>>>
>>>> #------
>>>> created the table
>>>>
>>>>  Inserted main ids into the table
>>>>  ALTER TABLE test.iau ADD col1 text
>>>>  ALTER TABLE test.iau ADD col2 text
>>>>  ALTER TABLE test.iau ADD col3 text
>>>>  ALTER TABLE test.iau ADD col4 text
>>>>  ALTER TABLE test.iau ADD col5 text
>>>>  ALTER TABLE test.iau ADD col6 text
>>>>  ALTER TABLE test.iau ADD col7 text
>>>>  ALTER TABLE test.iau ADD col8 text
>>>>  ALTER TABLE test.iau ADD col9 text
>>>> E
>>>> ======================================================================
>>>>
>>>> ----------------------------------------------------------------------
>>>> Traceback (most recent call last):
>>>>   File "UnitTests.py", line 313, in test_insert_data
>>>>     session.execute(sqlAlterStatement1)
>>>>   File "/usr/local/lib/python2.7/site-packages/cassandra/cluster.py",
>>>> line 1405, in execute
>>>>     result = future.result(timeout)
>>>>   File "/usr/local/lib/python2.7/site-packages/cassandra/cluster.py",
>>>> line 2976, in result
>>>>     raise self._final_exception
>>>> InvalidRequest: code=2200 [Invalid query] message="Invalid column name
>>>> col9 because it conflicts with an existing column"
>>>>
>>>> ----------------------------------------------------------------------
>>>> Ran 1 test in 9.856s
>>>> #------
>>>>
>>>> But when I checked the table using CQL col9 is not there.
>>>>
>>>> Is there' schema' refresh issue or is it bug in Cassandra 2.1.5?
>>>>
>>>> Thank you.
>>>> Kind regards
>>>> Rajesh R
>>>>
>>>>
>>>>
>>>>
>>>> **************************************************************************
>>>> The information contained in the EMail and any attachments is
>>>> confidential and intended solely and for the attention and use of the named
>>>> addressee(s). It may not be disclosed to any other person without the
>>>> express authority of Public Health England, or the intended recipient, or
>>>> both. If you are not the intended recipient, you must not disclose, copy,
>>>> distribute or retain this message or any part of it. This footnote also
>>>> confirms that this EMail has been swept for computer viruses by
>>>> Symantec.Cloud, but please re-sweep any attachments before opening or
>>>> saving. http://www.gov.uk/PHE
>>>>
>>>> **************************************************************************
>>>>
>>>
>>>
>>>
>>> **************************************************************************
>>> The information contained in the EMail and any attachments is
>>> confidential and intended solely and for the attention and use of the named
>>> addressee(s). It may not be disclosed to any other person without the
>>> express authority of Public Health England, or the intended recipient, or
>>> both. If you are not the intended recipient, you must not disclose, copy,
>>> distribute or retain this message or any part of it. This footnote also
>>> confirms that this EMail has been swept for computer viruses by
>>> Symantec.Cloud, but please re-sweep any attachments before opening or
>>> saving. http://www.gov.uk/PHE
>>>
>>> **************************************************************************
>>>
>>
>>
>> **************************************************************************
>> The information contained in the EMail and any attachments is
>> confidential and intended solely and for the attention and use of the named
>> addressee(s). It may not be disclosed to any other person without the
>> express authority of Public Health England, or the intended recipient, or
>> both. If you are not the intended recipient, you must not disclose, copy,
>> distribute or retain this message or any part of it. This footnote also
>> confirms that this EMail has been swept for computer viruses by
>> Symantec.Cloud, but please re-sweep any attachments before opening or
>> saving. http://www.gov.uk/PHE
>> **************************************************************************
>>
>
>
>
> --
> Bests,
>
> Alex Popescu | @al3xandru
> Sen. Product Manager @ DataStax
>
>
> **************************************************************************
> The information contained in the EMail and any attachments is confidential
> and intended solely and for the attention and use of the named
> addressee(s). It may not be disclosed to any other person without the
> express authority of Public Health England, or the intended recipient, or
> both. If you are not the intended recipient, you must not disclose, copy,
> distribute or retain this message or any part of it. This footnote also
> confirms that this EMail has been swept for computer viruses by
> Symantec.Cloud, but please re-sweep any attachments before opening or
> saving. http://www.gov.uk/PHE
> **************************************************************************
>

Reply via email to