We actually leave all the DDL commands up to Hive, so there is no
programmatic way to access the things you are looking for.
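
If you do need those details programmatically today, one workaround is to go
around Spark and ask the Hive metastore directly. A minimal, untested sketch;
the "default" database name and a hive-site.xml on the classpath are
assumptions about your setup:

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.hadoop.hive.metastore.api.Table;

// Connects to the metastore configured in hive-site.xml.
HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
Table table = client.getTable("default", "parquet_table");
StorageDescriptor sd = table.getSd();
System.out.println(sd.getLocation());      // table directory
System.out.println(sd.getInputFormat());   // INPUTFORMAT
System.out.println(sd.getOutputFormat());  // OUTPUTFORMAT
for (FieldSchema key : table.getPartitionKeys()) {
  System.out.println(key.getName());       // partition key(s)
}
client.close();

Since this reads the same metastore that Hive's DDL wrote to, it returns
exactly what the CREATE TABLE statement recorded.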

On Thu, Oct 2, 2014 at 5:17 PM, Banias <calvi...@yahoo.com.invalid> wrote:

> Hi,
>
> Would anybody know how to get the following information from HiveContext
> given a Hive table name?
>
> - partition key(s)
> - table directory
> - input/output format
>
> I am new to Spark, and I have a couple of tables created using Parquet
> data, like:
>
> CREATE EXTERNAL TABLE parquet_table (
> COL1 string,
> COL2 string,
> COL3 string
> )
> ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
> STORED AS
> INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
> OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
> LOCATION '/user/foo/parquet_src';
>
> and some of the tables have partitions. In my Spark Java code, I am able
> to run queries using the HiveContext like:
>
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.api.java.JavaSchemaRDD;
> import org.apache.spark.sql.hive.api.java.JavaHiveContext;
> SparkConf sparkConf = new SparkConf().setAppName("example");
> JavaSparkContext ctx = new JavaSparkContext(sparkConf);
> JavaHiveContext hiveCtx = new JavaHiveContext(ctx);
> JavaSchemaRDD rdd = hiveCtx.sql("select * from parquet_table");
>
> Now, is there a way to get the INPUTFORMAT, OUTPUTFORMAT, LOCATION, and,
> for the partitioned tables, the partition key(s) programmatically through
> the HiveContext?
>
> The only way I know of (pardon my ignorance) is to parse the SchemaRDD
> returned by hiveCtx.sql("describe extended parquet_table");
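>
> Roughly, something like this (untested sketch; the exact column layout of
> the DESCRIBE output is an assumption and may vary across Hive versions):
>
> import org.apache.spark.sql.api.java.Row;
>
> JavaSchemaRDD described = hiveCtx.sql("describe extended parquet_table");
> for (Row row : described.collect()) {
>     // Each row is one line of DESCRIBE output; the extended table info
>     // (location, input/output format) is embedded as free-form text.
>     for (int i = 0; i < row.length(); i++) {
>         System.out.print(row.get(i) + "\t");
>     }
>     System.out.println();
> }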
>
> If anybody could shed some light on a better way, I would appreciate that.
> Thanks :)
>
> -BC
>
>
