Hi Vladimir,

Thanks for the response. I assume then that it is safe to remove the directories that are not current according to the system_schema.tables table. I have dozens of directories for the same table and haven't dropped and re-created it nearly that many times. Do any of the nodetool or other commands clean up these unused directories?
Thanks,
Jason Kania

From: Vladimir Yudovin <vla...@winguzone.com>
To: user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com>
Sent: Saturday, October 8, 2016 2:05 PM
Subject: Re: Understanding cassandra data directory contents

Each table has a unique ID (the suffix of its directory name). If you drop a table and then re-create it with the same name, it gets a new ID. Try

    SELECT keyspace_name, table_name, id FROM system_schema.tables;

to determine the current ID. You can limit the query to a specific keyspace or table.

Best regards,
Vladimir Yudovin,
Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer. Launch your cluster in minutes.

---- On Sat, 08 Oct 2016 13:42:19 -0400 Jason Kania <jason.ka...@ymail.com> wrote ----

Hello,

I am using Cassandra 3.0.9 and I have encountered an issue where the nodes in my 3-node cluster hold vastly different amounts of data even though they should be roughly the same. When I looked through the data directory for my database on two of the nodes, I saw a number of directories with the same prefix, e.g.:

periodicReading-76eb7510096811e68a7421c8b9466352,
periodicReading-453d55a0501d11e68623a9d2b6f96e86...

Only one directory with a given table-name prefix has a current date; the rest are older. In contrast, on the node with the least space used, each directory has a unique prefix (not shared).

I am wondering what the contents of a Cassandra database directory should look like. Are there supposed to be multiple entries for a given table or just one? If just one, what would be a procedure to determine whether the other directories for the same table are junk that can be removed?

Thanks,
Jason
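
The check discussed in this thread (comparing each directory's suffix against the id column of system_schema.tables) could be sketched roughly as follows. This is only a minimal illustration, not an official tool: the function name stale_table_dirs is hypothetical, it assumes the directory suffix is the table id with the dashes removed, and the "current" id used in the usage example is an assumption chosen for illustration, since the thread does not say which of the two directories is the live one. The sketch only reports candidates and deletes nothing.

```python
import uuid


def stale_table_dirs(dir_names, current_ids):
    """Return directory names whose id suffix does not match the live table id.

    dir_names:   directory names found under one keyspace's data directory
    current_ids: mapping of table_name -> id, as read from system_schema.tables
    """
    # Assumption: the directory suffix is the table id without dashes.
    live = {name: uuid.UUID(str(tid)).hex for name, tid in current_ids.items()}

    stale = []
    for d in dir_names:
        table, sep, suffix = d.rpartition("-")
        if not sep:
            continue  # not of the form <table>-<id>; leave it alone
        if live.get(table) != suffix.lower():
            stale.append(d)
    return stale


if __name__ == "__main__":
    dirs = [
        "periodicReading-76eb7510096811e68a7421c8b9466352",
        "periodicReading-453d55a0501d11e68623a9d2b6f96e86",
    ]
    # Hypothetical query result: which id is current is assumed here.
    current = {"periodicReading": "453d55a0-501d-11e6-8623-a9d2b6f96e86"}
    for name in stale_table_dirs(dirs, current):
        print(name)
```

The ids would be filled in by hand from the SELECT quoted above, once per keyspace, with the directory listing taken from that keyspace's data directory on each node.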