E.g. in Spark SQL I can create a temporary table from ORC, Parquet or JSON
files without specifying column names and types:

val myDf = sqlContext.read.format("orc").load("s3n://alex/test/mytable_orc")

myDf.printSchema
root
 |-- id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- rc_state: string (nullable = true)
 |-- rc_county_name: string (nullable = true)

myDf.registerTempTable("mytable")
val res = sqlContext.sql("""
  select rc_state, count(*) cnt
  from mytable
  group by rc_state
  order by rc_state""")

res.show(10)
+--------+---+
|rc_state|cnt|
+--------+---+
|      AK| 37|
|      AL|224|
|      AR|109|
|      AZ| 81|
|      CA|417|
|      CO|145|
|      CT| 71|
|      DC| 15|
|      DE| 27|
|      FL|452|
+--------+---+
only showing top 10 rows

Lots of companies are switching to Spark for ETL. But Hive is still used by
many people, reporting tools and legacy solutions to select data from the
files (tables) prepared by Spark.
It would be nice if Hive could create a table based on ORC or Parquet
file(s) without specifying the table columns and types. Integration with
Spark output would be easier.
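
For comparison, this is what Hive requires today: the full column list has
to be spelled out by hand, duplicating the schema that is already stored in
the ORC file footers (reusing the table name and path from the example
above, and assuming the cluster is configured to read s3n:// paths):

CREATE EXTERNAL TABLE mytable (
  id STRING,
  name STRING,
  rc_state STRING,
  rc_county_name STRING)
STORED AS ORC
LOCATION 's3n://alex/test/mytable_orc';

What I am asking for would be roughly the same statement minus the column
list, e.g. something like this (hypothetical syntax, not something Hive
supports today):

CREATE EXTERNAL TABLE mytable
STORED AS ORC
LOCATION 's3n://alex/test/mytable_orc';

with Hive reading the column names and types from the file footers, the
same way Spark does.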


On Wed, Dec 9, 2015 at 9:50 AM, Owen O'Malley <omal...@apache.org> wrote:

> So your use case is that you already have the ORC files and you want a
> table that can read those files without specifying the columns in the
> table? Obviously without the columns being specified Hive wouldn't be able
> to write to that table, so I assume you only care about reading it. Is that
> right?
>
> .. Owen
>
> On Wed, Dec 2, 2015 at 9:53 PM, Alexander Pivovarov <apivova...@gmail.com>
> wrote:
>
>> Hi Everyone
>>
>> Is it possible to create a Hive table from an ORC or Parquet file without
>> specifying field names and their types? ORC and Parquet files contain the
>> field name and type information inside.
>>
>> Alex
>>
>
>