I’ve never been a big fan of the “COPY” statement. My preference for stuff like this, particularly for the amount of data you’re talking about, is to use the open source tool “cassandradump”, which is similar to mysqldump but for Cassandra (though I think I am definitely in the minority!).

Original: https://github.com/gianlucaborello/cassandradump

My version, which changes a number of things to make it a little easier to work with (and more mysqldump-like): https://github.com/terwilliger42/cassandradump/commits/develop-tw42

- Updated for Python 3 (only works on Python 3; tested on Python 3.5) <https://github.com/terwilliger42/cassandradump/commit/3aab25d74eb373fdf279d8eec89df7d57b3e07df>
- Added progress bar
- Added default reading/writing to stdin/stdout
- Added ability to prompt for password
- --protocol-version no longer required for C* 2.x+
- Added capability to export/import into a different keyspace
- Split commandline args into two subparsers "import" and "export"
- Added --truncate option
- Added shorter CLI params, similar to mysqldump (-h for --host, -u for --username, -p for --password, etc)

- Max

> On Oct 25, 2018, at 5:07 am, Philip Ó Condúin <philipocond...@gmail.com> wrote:
>
> Hi Alain,
>
> That is exactly what I did yesterday in the end. I ran the selects and output the results to a file, I ran some greps on that file to leave myself with just the data rows removing any white space and headers.
> I then copied this data into a notepad on my local machine and saved it as a csv. Luckily the results of the selects were delimited by pipe "|" so I imported the csv into a spreadsheet and was able to separate the values into columns.
>
> From here I was able to build up the insert statements and now have 4K insert statements as a backup.
>
> Thanks a lot for your reply.
>
> Kind regards,
> Phil
>
> On Thu, 25 Oct 2018 at 11:59, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>
> Does anyone have any ideas of what I can do to generate inserts based on primary key numbers in an excel spreadsheet?
>
> A quick thought:
>
> What about using a column of the spreadsheet to actually store the SELECT result and generate the INSERT statement (and I would probably do the DELETE too) corresponding to each row, using the power of the spreadsheet to write the query once and have it for all the partitions with the proper values?
>
> The spreadsheet would then be your backup somehow.
>
> We are a bit far from any Cassandra advice, but that's my first thought on your problem, use the spreadsheet :).
>
> Another option is probably to SELECT these rows and INSERT them into some other Cassandra table (same cluster or not). Here you would have to code it I think (client app of any kind).
>
> This might not be a good fit, but just in case, you might want to look at the 'COPY' statement: https://stackoverflow.com/questions/21363046/how-to-select-data-from-a-table-and-insert-into-another-table
>
> I'm not too sure what suits you the best.
>
> C*heers,
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Wed, 24 Oct 2018 at 12:46, Philip Ó Condúin <philipocond...@gmail.com> wrote:
>
> Hi All,
>
> I have a problem that I'm trying to work out and can't find anything online that may help me.
>
> I have been asked to delete 4K records from a Column Family that has a total of 1.8 million rows. I have been given an excel spreadsheet with a list of the 4K PRIMARY KEY numbers to be deleted. Great, the delete will be easy anyway.
>
> But before I delete them I want to take a backup of what I'm deleting, so that if the customer comes along and says they got the wrong numbers then I can quickly restore one or all of them.
>
> I have been trying to figure out how I can generate inserts from a select but it looks like this is not possible.
>
> I'm using CentOS and Cassandra 2.11
>
> Does anyone have any ideas of what I can do to generate inserts based on primary key numbers in an excel spreadsheet?
>
> Kind Regards,
> Phil
>
> --
> Regards,
> Phil
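
For anyone finding this thread later: the manual route Phil describes (pipe-delimited SELECT output, then a spreadsheet, then hand-built INSERTs) could probably be scripted instead. Below is a minimal sketch of that idea, not anything Phil or Alain actually ran. The function names, the table name ks.customers and the columns id/name/score are all hypothetical. It assumes the data rows have already been stripped of headers and separators (as Phil did with grep), that no value itself contains a "|", and that you tell it which columns hold text so they get quoted.

```python
def cql_literal(value, quoted):
    """Render one value as a CQL literal.

    Text values are wrapped in single quotes, with embedded single
    quotes doubled (the CQL escaping rule). Other values pass through.
    """
    if quoted:
        return "'" + value.replace("'", "''") + "'"
    return value


def rows_to_inserts(lines, table, columns, text_columns):
    """Turn pipe-delimited data rows into INSERT statements.

    lines        -- iterable of data rows, e.g. " 1001 | Smith | 17 "
    table        -- target table name (hypothetical in the example below)
    columns      -- column names, in the order they appear in each row
    text_columns -- set of column names whose values must be quoted
    """
    statements = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the grep step
        values = [v.strip() for v in line.split("|")]
        if len(values) != len(columns):
            raise ValueError(
                f"expected {len(columns)} values, got {len(values)}: {line!r}"
            )
        literals = [
            cql_literal(v, c in text_columns) for c, v in zip(columns, values)
        ]
        statements.append(
            f"INSERT INTO {table} ({', '.join(columns)}) "
            f"VALUES ({', '.join(literals)});"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical sample rows, shaped like cleaned-up cqlsh SELECT output.
    sample = [" 1001 | O'Brien | 42 ", " 1002 | Smith | 17 "]
    for stmt in rows_to_inserts(
        sample, "ks.customers", ["id", "name", "score"], {"name"}
    ):
        print(stmt)
```

Redirecting the printed statements to a file gives a backup that can be replayed one row at a time or all at once. Collections, blobs, timestamps and null values would need extra handling that this sketch does not attempt.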