We actually leave all the DDL commands up to Hive, so there is no
programmatic way to access the things you are looking for.
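
If you do need those details programmatically today, one workaround is to go
around Spark and ask the Hive metastore directly. A minimal, untested sketch;
the "default" database name and a hive-site.xml on the classpath are
assumptions about your setup:

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.hadoop.hive.metastore.api.Table;

// Connects to the metastore configured in hive-site.xml.
HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
Table table = client.getTable("default", "parquet_table");
StorageDescriptor sd = table.getSd();
System.out.println(sd.getLocation());      // table directory
System.out.println(sd.getInputFormat());   // INPUTFORMAT
System.out.println(sd.getOutputFormat());  // OUTPUTFORMAT
for (FieldSchema key : table.getPartitionKeys()) {
  System.out.println(key.getName());       // partition key(s)
}
client.close();

Since this reads the same metastore that Hive's DDL wrote to, it returns
exactly what the CREATE TABLE statement recorded.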

On Thu, Oct 2, 2014 at 5:17 PM, Banias <calvi...@yahoo.com.invalid> wrote:

> Hi,
>
> Would anybody know how to get the following information from HiveContext
> given a Hive table name?
>
> - partition key(s)
> - table directory
> - input/output format
>
> I am new to Spark, and I have a couple of tables created using Parquet
> data, like:
>
> CREATE EXTERNAL TABLE parquet_table (
> COL1 string,
> COL2 string,
> COL3 string
> )
> ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
> STORED AS
> INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
> OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
> LOCATION '/user/foo/parquet_src';
>
> and some of the tables have partitions. In my Spark Java code, I am able
> to run queries using the HiveContext like:
>
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.api.java.JavaSchemaRDD;
> import org.apache.spark.sql.hive.api.java.JavaHiveContext;
> SparkConf sparkConf = new SparkConf().setAppName("example");
> JavaSparkContext ctx = new JavaSparkContext(sparkConf);
> JavaHiveContext hiveCtx = new JavaHiveContext(ctx);
> JavaSchemaRDD rdd = hiveCtx.sql("select * from parquet_table");
>
> Now, is there a way to get the INPUTFORMAT, OUTPUTFORMAT, LOCATION, and,
> for the partitioned tables, the partition key(s) programmatically through
> the HiveContext?
>
> The only way I know of (pardon my ignorance) is to parse the SchemaRDD
> returned by hiveCtx.sql("describe extended parquet_table");
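>
> Roughly, something like this (untested sketch; the exact column layout of
> the DESCRIBE output is an assumption and may vary across Hive versions):
>
> import org.apache.spark.sql.api.java.Row;
>
> JavaSchemaRDD described = hiveCtx.sql("describe extended parquet_table");
> for (Row row : described.collect()) {
>     // Each row is one line of DESCRIBE output; the extended table info
>     // (location, input/output format) is embedded as free-form text.
>     for (int i = 0; i < row.length(); i++) {
>         System.out.print(row.get(i) + "\t");
>     }
>     System.out.println();
> }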
>
> If anybody could shed some light on a better way, I would appreciate that.
> Thanks :)
>
> -BC
>
>
