> 3) check the system.schema_columns if these column_name(s) exist in the table > 4) If the column don't exist in the table "ALTER table tablename add new column_name text"
Unless you have some external control on this so that you know two processors will never attempt the same operation within a few minutes of each other, that's a race condition that will eventually cause you to end in schema disagreement. Even if you have an external control, it sounds like a landmine that will get you some day. On Fri, Nov 20, 2015 at 8:58 AM Rajesh Radhakrishnan < rajesh.radhakrish...@phe.gov.uk> wrote: > Thank you Alex for answering. > > Yes the columns are dynamic. > The scenario is like this > 1) Initial create table with id uuid, name text and insert some data into > these fields > 2) Parse an XML from which we retrieve one or more new column names from > this XML > 3) check the system.schema_columns if these column_name(s) exist in the > table > 4) If the column don't exist in the table "ALTER table tablename add new > column_name text" > 5) Inject data into this new column "Update table name set column_name > =value where id=blah" > > We did tried the map columns, but the query part is the pain. > > All these pain we need to take because the column names have special > character like " ' _- ( ) '' ¬ " etc. > This solved our loading issue, then we hit with another problem while > reading these column values via Spark. > > Exception was "*Some of types cannot be determined by the first 100 rows, > please try again with sampling.* " > > We solved by setting the SamplinRatio to 1 in the Spark end while create > DataFrame. > > > ------------------------------ > *From:* Alex Popescu [al...@datastax.com] > *Sent:* 13 November 2015 19:00 > *To:* user > > *Subject:* Re: Getting code=2200 [Invalid query] message=Invalid column > name ... while executing ALTER statement > I'm glad to hear that you got it working; but I'd suggest trying to answer > these questions before moving forward with this solution: > > 1. is the set of columns really that dynamic? if not, then define them > upfront. there's no weight to empty columns. > 2. if the set of columns is really dynamic, would using 1/more map > column(s) be better? > > Avoiding to modify the schema dynamically and avoid concurrent schema > changes is always better. > > > On Fri, Nov 13, 2015 at 7:40 AM, Rajesh Radhakrishnan < > rajesh.radhakrish...@phe.gov.uk> wrote: > >> We got a work around now! >> >> Thank you Laing for the reply. >> Yes I do agree with your point, but we got a scenario where the columns >> need to be added in the later stage of the process. >> We are doing the following: >> >> 1. CREATE THE TABLE IF NOT EXISTS >> 2. INSERT IDS INTO THE TABLE >> 3. CHECK THE COLUMN NAMES OF THE TABLE >> 4. GET A LIST OF _NAMES (PYTHON SCRIPT) >> 5. ALTER TABLE IF THE _NAME(S) DONT EXIST IN COLUMN NAME >> 6. UPDATE TABLE WITH THE VALUE OF THE NEW COLUMN >> >> In our process step 3 to 6 are repeated. >> >> Now what I did is replaced session.execute(...) to >> session.execute_async(...) only for ALTER and UPDATE statements. >> >> And introduced 1 sec sleep for each ALTER statement and 5 sec sleep >> before UPDATE statement. >> >> It WORKS! now. I dont know this is right solution, but its a work around. >> >> So clearly some config value need to be updated for some parameter in >> cassandra.yaml >> >> Do you know which one? >> >> >> >> ------------------------------ >> *From:* Laing, Michael [michael.la...@nytimes.com] >> *Sent:* 13 November 2015 12:26 >> >> *To:* user@cassandra.apache.org >> *Subject:* Re: Getting code=2200 [Invalid query] message=Invalid column >> name ... while executing ALTER statement >> >> Dynamic schema changes are generally a bad idea, especially if they are >> rapid. >> >> You should rethink your approach. >> >> On Fri, Nov 13, 2015 at 7:20 AM, Rajesh Radhakrishnan < >> rajesh.radhakrish...@phe.gov.uk> wrote: >> >>> >>> Thank you Carlos for looking. >>> But when I rand the nodetool describecluster. >>> It is showing the same schema versions for both nodes? >>> >>> So it is something else! Please help me from this bottleneck. Thank you. >>> >>> ------------------------------ >>> *From:* Carlos Alonso [i...@mrcalonso.com] >>> *Sent:* 13 November 2015 11:55 >>> *To:* user@cassandra.apache.org >>> *Subject:* Re: Getting code=2200 [Invalid query] message=Invalid column >>> name ... while executing ALTER statement >>> >>> Maybe schema disagreement? >>> >>> Run nodetool describecluster to discover >>> >>> Carlos Alonso | Software Engineer | @calonso >>> <https://twitter.com/calonso> >>> >>> On 13 November 2015 at 11:14, Rajesh Radhakrishnan < >>> rajesh.radhakrish...@phe.gov.uk> wrote: >>> >>>> >>>> Hi, >>>> >>>> I am using Cassandra 2.1.5 in a cluster of two nodes (running CentOS) >>>> and using Python driver to connect to Cassandra. >>>> My Python code snippet is show here: >>>> >>>> >>>> #------------------------------------------------------------------------------------------------------------------- >>>> import time, os, datetime, keyword >>>> import uuid >>>> from cassandra.cluster import Cluster >>>> import os.path, sys >>>> .... >>>> from cassandra.auth import PlainTextAuthProvider >>>> .... >>>> auth_provider = PlainTextAuthProvider(username, password) >>>> cluster = Cluster([node1,node2],auth_provider=auth_provider) >>>> session = cluster.connect(); >>>> >>>> session.execute("CREATE table IF NOT EXISTS test.iau (" >>>> "id uuid, " >>>> "sample_id text, " >>>> "PRIMARY KEY (sample_id) )"); >>>> >>>> print " \n created the table" >>>> #-------- >>>> >>>> sqlInsertSampleIdUid = "INSERT INTO test.iau (id, sample_id) >>>> VALUES ("+str(uuid.uuid1())+",'sample123')" >>>> session.execute(sqlInsertSampleIdUid) >>>> print " \n Inserted main ids into the table" >>>> #------- >>>> >>>> colNames >>>> =['col1','col2','col3','col4','col5','col6','col7','col8','col9'] >>>> >>>> for colName in colNames : >>>> >>>> sqlAlterStatement1 = " ALTER TABLE test.iau ADD "+colName+" >>>> text" >>>> print sqlAlterStatement1 >>>> session.execute(sqlAlterStatement1) >>>> sqlAlterStatement1 = None >>>> >>>> print " Altered tables :: " >>>> # ---------------------------------------- >>>> count = 0 >>>> for colName in colNames : >>>> count = count +1 >>>> sqlUpdateGeneDetection = " UPDATE test.iau SET "+colName+" >>>> = '"+str(count)+"' WHERE sample_id = 'sample123'" >>>> session.execute(sqlUpdateGeneDetection) >>>> sqlUpdateGeneDetection = None >>>> >>>> print " Updated tables :: " >>>> session.cluster.shutdown() >>>> .... >>>> >>>> #------------------------------------------------------------------------------------------------------------------- >>>> >>>> Very rarely this code works, but most of the time it fails when it >>>> reach ALTER statement. >>>> FYI, I tried preparedstatement with binding in INSERT, UPDATE >>>> statements too. >>>> >>>> The error with output is shown here: >>>> >>>> #------ >>>> created the table >>>> >>>> Inserted main ids into the table >>>> ALTER TABLE test.iau ADD col1 text >>>> ALTER TABLE test.iau ADD col2 text >>>> ALTER TABLE test.iau ADD col3 text >>>> ALTER TABLE test.iau ADD col4 text >>>> ALTER TABLE test.iau ADD col5 text >>>> ALTER TABLE test.iau ADD col6 text >>>> ALTER TABLE test.iau ADD col7 text >>>> ALTER TABLE test.iau ADD col8 text >>>> ALTER TABLE test.iau ADD col9 text >>>> E >>>> ====================================================================== >>>> >>>> ---------------------------------------------------------------------- >>>> Traceback (most recent call last): >>>> File "UnitTests.py", line 313, in test_insert_data >>>> session.execute(sqlAlterStatement1) >>>> File "/usr/local/lib/python2.7/site-packages/cassandra/cluster.py", >>>> line 1405, in execute >>>> result = future.result(timeout) >>>> File "/usr/local/lib/python2.7/site-packages/cassandra/cluster.py", >>>> line 2976, in result >>>> raise self._final_exception >>>> InvalidRequest: code=2200 [Invalid query] message="Invalid column name >>>> col9 because it conflicts with an existing column" >>>> >>>> ---------------------------------------------------------------------- >>>> Ran 1 test in 9.856s >>>> #------ >>>> >>>> But when I checked the table using CQL col9 is not there. >>>> >>>> Is there' schema' refresh issue or is it bug in Cassandra 2.1.5? >>>> >>>> Thank you. >>>> Kind regards >>>> Rajesh R >>>> >>>> >>>> >>>> >>>> ************************************************************************** >>>> The information contained in the EMail and any attachments is >>>> confidential and intended solely and for the attention and use of the named >>>> addressee(s). It may not be disclosed to any other person without the >>>> express authority of Public Health England, or the intended recipient, or >>>> both. If you are not the intended recipient, you must not disclose, copy, >>>> distribute or retain this message or any part of it. This footnote also >>>> confirms that this EMail has been swept for computer viruses by >>>> Symantec.Cloud, but please re-sweep any attachments before opening or >>>> saving. http://www.gov.uk/PHE >>>> >>>> ************************************************************************** >>>> >>> >>> >>> >>> ************************************************************************** >>> The information contained in the EMail and any attachments is >>> confidential and intended solely and for the attention and use of the named >>> addressee(s). It may not be disclosed to any other person without the >>> express authority of Public Health England, or the intended recipient, or >>> both. If you are not the intended recipient, you must not disclose, copy, >>> distribute or retain this message or any part of it. This footnote also >>> confirms that this EMail has been swept for computer viruses by >>> Symantec.Cloud, but please re-sweep any attachments before opening or >>> saving. http://www.gov.uk/PHE >>> >>> ************************************************************************** >>> >> >> >> ************************************************************************** >> The information contained in the EMail and any attachments is >> confidential and intended solely and for the attention and use of the named >> addressee(s). It may not be disclosed to any other person without the >> express authority of Public Health England, or the intended recipient, or >> both. If you are not the intended recipient, you must not disclose, copy, >> distribute or retain this message or any part of it. This footnote also >> confirms that this EMail has been swept for computer viruses by >> Symantec.Cloud, but please re-sweep any attachments before opening or >> saving. http://www.gov.uk/PHE >> ************************************************************************** >> > > > > -- > Bests, > > Alex Popescu | @al3xandru > Sen. Product Manager @ DataStax > > > ************************************************************************** > The information contained in the EMail and any attachments is confidential > and intended solely and for the attention and use of the named > addressee(s). It may not be disclosed to any other person without the > express authority of Public Health England, or the intended recipient, or > both. If you are not the intended recipient, you must not disclose, copy, > distribute or retain this message or any part of it. This footnote also > confirms that this EMail has been swept for computer viruses by > Symantec.Cloud, but please re-sweep any attachments before opening or > saving. http://www.gov.uk/PHE > ************************************************************************** >