Re: CREATE TABLE ignores database when using PARQUET option

Michael Armbrust Fri, 08 May 2015 14:46:06 -0700

This is an unfortunate limitation of the data source API, which does not
support multiple databases. For Parquet in particular (if you aren't using
schema merging), you can create a Hive table using STORED AS PARQUET
today. I hope to fix this limitation in Spark 1.5.
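A minimal sketch of that workaround, assuming /user/foo/data.parquet is a
directory of Parquet files. The column names and types below are
hypothetical and must be replaced with the file's actual schema. Because this
goes through Hive DDL rather than the data source API, the foo_db qualifier
should be respected, and the table should not need the default warehouse
directory:

```sql
-- Hypothetical columns: replace with the real Parquet schema.
CREATE EXTERNAL TABLE foo_db.mytable_parquet (
  id   BIGINT,
  name STRING
)
STORED AS PARQUET
LOCATION '/user/foo/data.parquet';
```

Declaring the table EXTERNAL means that dropping it later removes only the
metastore entry and leaves the Parquet data in place.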


On Fri, May 8, 2015 at 2:41 PM, Carlos Pereira <[email protected]> wrote:

> Hi, I would like to create a Hive table on top of an existing Parquet file,
> as described here:
>
> https://databricks.com/blog/2015/03/24/spark-sql-graduates-from-alpha-in-spark-1-3.html
>
> Due to network restrictions, I need to store the metadata definition in a
> different path than '/user/hive/warehouse', so I first created a new
> database in my own HDFS directory:
>
> CREATE DATABASE foo_db LOCATION '/user/foo';
> USE foo_db;
>
> And then I run the following query:
>
> CREATE TABLE mytable_parquet
> USING parquet
> OPTIONS (path "/user/foo/data.parquet");
>
> The problem is that Spark SQL does not use the database set in the shell
> context (foo_db); it creates the table in the default database instead:
>
> ----------------------------
>  > CREATE TABLE mytable_parquet USING parquet OPTIONS (path
> "/user/foo/data.parquet");
> 15/05/08 20:42:21 INFO metastore.HiveMetaStore: 0: get_table : *db=foo_db*
> tbl=mytable_parquet
>
> 15/05/08 20:42:21 INFO HiveMetaStore.audit: ugi=foo     ip=unknown-ip-addr
> cmd=get_table : db=foo_db tbl=mytable_parquet
>
> 15/05/08 20:42:21 INFO metastore.HiveMetaStore: 0: create_table:
> Table(tableName:mytable_parquet, *dbName:default,* owner:foo,
> createTime:1431117741, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>,
> comment:from deserializer)], location:null,
> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,
> parameters:{serialization.format=1, path=/user/foo/data.parquet}),
> bucketCols:[], sortCols:[], parameters:{},
> skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
> skewedColValueLocationMaps:{})), partitionKeys:[],
> parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=parquet},
> viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
> 15/05/08 20:42:21 INFO HiveMetaStore.audit: ugi=foo     ip=unknown-ip-addr
> cmd=create_table: Table(tableName:mytable_parquet, dbName:default,
> owner:foo, createTime:1431117741, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>,
> comment:from deserializer)], location:null,
> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,
> parameters:{serialization.format=1, path=/user/foo/data.parquet}),
> bucketCols:[], sortCols:[], parameters:{},
> skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
> skewedColValueLocationMaps:{})), partitionKeys:[],
> parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=parquet},
> viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
>
> 15/05/08 20:42:21 ERROR hive.log: Got exception:
> org.apache.hadoop.security.AccessControlException Permission denied:
> user=foo, access=WRITE,
> inode="/user/hive/warehouse":hive:grp_gdoop_hdfs:drwxr-xr-x
> ----------------------------
>
>
> The permission error above happens because my Linux user does not have
> write access to the default warehouse path. I can work around this issue by
> using CREATE TEMPORARY TABLE, which writes no metadata to disk.
>
> I would like to know whether I am doing anything wrong here, and whether
> there is any additional property I can set to force the database/metastore
> directory I need to write to.
>
> Thanks,
> Carlos
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/CREATE-TABLE-ignores-database-when-using-PARQUET-option-tp22824.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
