Thanks, Michael, for the quick reply. I was looking forward to the automatic schema inference (I think that's what you mean by 'schema merging'?), and I think STORED AS would still require me to define the table columns, right?
Anyway, I am glad to hear you guys are already working to fix this in future releases.

Thanks,
Carlos

On Fri, May 8, 2015 at 2:43 PM, Michael Armbrust <[email protected]> wrote:

> This is an unfortunate limitation of the datasource api which does not
> support multiple databases. For parquet in particular (if you aren't using
> schema merging), you can create a hive table using STORED AS PARQUET
> today. I hope to fix this limitation in Spark 1.5.
>
> On Fri, May 8, 2015 at 2:41 PM, Carlos Pereira <[email protected]>
> wrote:
>
>> Hi, I would like to create a hive table on top of an existing parquet
>> file as described here:
>>
>> https://databricks.com/blog/2015/03/24/spark-sql-graduates-from-alpha-in-spark-1-3.html
>>
>> Due to network restrictions, I need to store the metadata definition in a
>> different path than '/user/hive/warehouse', so I first set a new database
>> on my own HDFS dir:
>>
>> CREATE DATABASE foo_db LOCATION '/user/foo';
>> USE foo_db;
>>
>> And then I run the following query:
>>
>> CREATE TABLE mytable_parquet
>> USING parquet
>> OPTIONS (path "/user/foo/data.parquet")
>>
>> The problem is that SparkSQL is not using the database defined in the
>> shell context, but the default database instead:
>>
>> ----------------------------
>> > CREATE TABLE mytable_parquet USING parquet OPTIONS (path
>> "/user/foo/data.parquet");
>> 15/05/08 20:42:21 INFO metastore.HiveMetaStore: 0: get_table : *db=foo_db*
>> tbl=mytable_parquet
>>
>> 15/05/08 20:42:21 INFO HiveMetaStore.audit: ugi=foo ip=unknown-ip-addr
>> cmd=get_table : db=foo_db tbl=mytable_parquet
>>
>> 15/05/08 20:42:21 INFO metastore.HiveMetaStore: 0: create_table:
>> Table(tableName:mytable_parquet, *dbName:default,* owner:foo,
>> createTime:1431117741, lastAccessTime:0, retention:0,
>> sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>,
>> comment:from deserializer)], location:null,
>> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
>> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
>> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
>> serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,
>> parameters:{serialization.format=1, path=/user/foo/data.parquet}),
>> bucketCols:[], sortCols:[], parameters:{},
>> skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
>> skewedColValueLocationMaps:{})), partitionKeys:[],
>> parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=parquet},
>> viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
>>
>> 15/05/08 20:42:21 INFO HiveMetaStore.audit: ugi=foo ip=unknown-ip-addr
>> cmd=create_table: Table(tableName:mytable_parquet, dbName:default,
>> owner:foo, createTime:1431117741, lastAccessTime:0, retention:0,
>> sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>,
>> comment:from deserializer)], location:null,
>> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
>> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
>> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
>> serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,
>> parameters:{serialization.format=1, path=/user/foo/data.parquet}),
>> bucketCols:[], sortCols:[], parameters:{},
>> skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
>> skewedColValueLocationMaps:{})), partitionKeys:[],
>> parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=parquet},
>> viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
>>
>> 15/05/08 20:42:21 ERROR hive.log: Got exception:
>> org.apache.hadoop.security.AccessControlException Permission denied:
>> user=foo, access=WRITE,
>> inode="/user/hive/warehouse":hive:grp_gdoop_hdfs:drwxr-xr-x
>> ----------------------------
>>
>> The permission error above happens because my linux user does not have
>> write access on the default metastore path. I can work around this issue
>> if I use CREATE TEMPORARY TABLE and have no metadata written on disk.
>>
>> I would like to know if I am doing anything wrong here and if there is any
>> additional property I can use to force the database/metastore_dir I need
>> to write on.
>>
>> Thanks,
>> Carlos
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/CREATE-TABLE-ignores-database-when-using-PARQUET-option-tp22824.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
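
For reference, the STORED AS PARQUET route mentioned in this thread might look roughly like the sketch below. It is only an illustration under assumptions: the column names and types are hypothetical placeholders (the thread never states the real schema), and it assumes '/user/foo/data.parquet' is a directory of Parquet files, since Hive's LOCATION expects a directory. In this form of Hive DDL the columns are listed explicitly rather than inferred from the Parquet files.

```sql
-- Sketch only: a Hive-native table definition instead of the
-- "USING parquet OPTIONS (...)" data source syntax.
USE foo_db;

CREATE EXTERNAL TABLE mytable_parquet (
  id   BIGINT,   -- hypothetical column, replace with the real schema
  name STRING    -- hypothetical column, replace with the real schema
)
STORED AS PARQUET
-- Assumed to be a directory containing the Parquet files.
LOCATION '/user/foo/data.parquet';
```

EXTERNAL is used here so that dropping the table would leave the underlying files in place.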
