We actually leave all the DDL commands up to Hive, so there is no programmatic way to access the things you are looking for through the HiveContext itself.
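
If you're willing to go straight to the Hive metastore instead, its client API does expose exactly those pieces. Here is a rough, untested sketch. It assumes your table lives in the "default" database (adjust as needed) and that a hive-site.xml is on the classpath so HiveConf can locate your metastore:

import java.util.List;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.hadoop.hive.metastore.api.Table;

public class TableMetadata {
  public static void main(String[] args) throws Exception {
    // Picks up hive-site.xml from the classpath to find the metastore.
    HiveConf conf = new HiveConf();
    HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
    try {
      // "default" is an assumption; use your database name here.
      Table table = client.getTable("default", "parquet_table");

      // Partition keys (empty list for an unpartitioned table).
      for (FieldSchema key : table.getPartitionKeys()) {
        System.out.println("partition key: " + key.getName());
      }

      // Location and input/output formats come from the storage descriptor.
      StorageDescriptor sd = table.getSd();
      System.out.println("location:      " + sd.getLocation());
      System.out.println("input format:  " + sd.getInputFormat());
      System.out.println("output format: " + sd.getOutputFormat());
    } finally {
      client.close();
    }
  }
}

The same Table object also exposes the column schema via getSd().getCols(), and for partitioned tables you can enumerate the actual partitions with client.listPartitions(...). That may be less brittle than parsing the output of "describe extended".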
On Thu, Oct 2, 2014 at 5:17 PM, Banias <calvi...@yahoo.com.invalid> wrote:
> Hi,
>
> Would anybody know how to get the following information from HiveContext
> given a Hive table name?
>
> - partition key(s)
> - table directory
> - input/output format
>
> I am new to Spark. And I have a couple tables created using Parquet data
> like:
>
> CREATE EXTERNAL TABLE parquet_table (
>   COL1 string,
>   COL2 string,
>   COL3 string
> )
> ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
> STORED AS
>   INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>   OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
> LOCATION '/user/foo/parquet_src';
>
> and some of the tables have partitions. In my Spark Java code, I am able
> to run queries using the HiveContext like:
>
> SparkConf sparkConf = new SparkConf().setAppName("example");
> JavaSparkContext ctx = new JavaSparkContext(sparkConf);
> JavaHiveContext hiveCtx = new JavaHiveContext(ctx);
> JavaSchemaRDD rdd = hiveCtx.sql("select * from parquet_table");
>
> Now am I able to get the INPUTFORMAT, OUTPUTFORMAT, LOCATION, and in other
> cases partition key(s) programmatically through the HiveContext?
>
> The only way I know (pardon my ignorance) is to parse from the SchemaRDD
> returned by hiveCtx.sql("describe extended parquet_table");
>
> If anybody could shed some light on a better way, I would appreciate that.
> Thanks :)
>
> -BC