Hi,

Has anyone used Hive + Cassandra Community successfully? I am having problems mapping the keyspace, and I am starting to wonder whether only DSE supports this.
I am trying to use Hive 0.13 to access Cassandra 2.0.8 column families created with CQL3. Here is how I created my column families:

    CREATE KEYSPACE IF NOT EXISTS Identification
      WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 2 };
    USE Identification;
    CREATE TABLE IF NOT EXISTS entitylookup (
      name varchar,
      value varchar,
      entity_id uuid,
      PRIMARY KEY ((name, value), entity_id))
      WITH caching=all ;

I followed the instructions from the README of this project:
https://github.com/tuplejump/cash/tree/master/cassandra-handler

I generated hive-cassandra-1.2.6.jar and copied it, along with cassandra-all-1.2.6.jar and cassandra-thrift-1.2.6.jar, to the Hive lib folder. Then I started Hive and tried the following:

    CREATE EXTERNAL TABLE identification.entitylookup(name string, value string, entity_id binary)
    STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES("cql.primarykey" = "name, value", "cassandra.host" = "localhost", "cassandra.port "= "9160")
    TBLPROPERTIES ("cassandra.ks.name" = "identification", "cassandra.ks.stratOptions"="'DC1':2", "cassandra.ks.strategy"="NetworkTopologyStrategy");

Here is the output:

    hive> mvalle@mvalle:~/hadoop$ hive
    14/05/30 12:02:02 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    14/05/30 12:02:02 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
    14/05/30 12:02:02 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
    14/05/30 12:02:02 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
    14/05/30 12:02:02 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
    14/05/30 12:02:02 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
    14/05/30 12:02:02 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
    14/05/30 12:02:02 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
    Logging initialized using configuration in jar:file:/home/mvalle/hadoop/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
    OpenJDK 64-Bit Server VM warning: You have loaded library /home/mvalle/hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
    It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
    hive> CREATE EXTERNAL TABLE identification.entitylookup(name string, value string, entity_id binary)
        > STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES("cql.primarykey" = "name, value", "cassandra.host" = "ident.s1mbi0se.com", "cassandra.port "= "9160")
        > TBLPROPERTIES ("cassandra.ks.name" = "identification", "cassandra.ks.stratOptions"="'DC1':2", "cassandra.ks.strategy"="NetworkTopologyStrategy");
    FAILED: SemanticException [Error 10072]: Database does not exist: identification

Question: how can I get more information about what is going wrong? I tried the same Hive command using "Identification" (capital I), but got the same result.

Is it possible to access CQL3 column families in Cassandra Community? It seems the keyspace has not been mapped, but I don't see how to map it. In DSE, keyspaces are mapped automatically...

Best regards,
Marcelo.