Hi,

We used spark.sql to create a table using DELTA. We also have a Hive metastore attached to the Spark session, so the table gets created in the Hive metastore. We then tried to query the table from Hive and faced the following issues:
1. The table is registered with LazySimpleSerDe and the SequenceFile input/output formats; it should have been Parquet.
2. Schema fields are not passed: the schema stored in the table properties has no fields, and Hive shows only a single placeholder column.

Essentially the Hive DDL looks like:

    CREATE TABLE `TABLE NAME`(
      `col` array<string> COMMENT 'from deserializer')
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES (
      'path'='WASB PATH')
    STORED AS INPUTFORMAT
      'org.apache.hadoop.mapred.SequenceFileInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
    LOCATION
      'WASB PATH'
    TBLPROPERTIES (
      'spark.sql.create.version'='2.4.0',
      'spark.sql.sources.provider'='DELTA',
      'spark.sql.sources.schema.numParts'='1',
      'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
      'transient_lastDdlTime'='1556544657')

Is this expected? And will this use case be supported in future releases?

We are now experimenting

Best,
Ayan

On Fri, Jun 21, 2019 at 11:06 AM Liwen Sun <liwen....@databricks.com> wrote:

> Hi James,
>
> Right now we don't have plans for having a catalog component as part of
> Delta Lake, but we are looking to support Hive metastore and also DDL
> commands in the near future.
>
> Thanks,
> Liwen
>
> On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios <
> jamescotrots...@gmail.com> wrote:
>
>> Is there a plan to have a business catalog component for the Data Lake?
>> If not how would someone make a proposal to create an open source project
>> related to that. I would be interested in building out an open source data
>> catalog that would use the Hive metadata store as a baseline for technical
>> metadata.
>>
>>
>> On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun <liwen....@databricks.com>
>> wrote:
>>
>>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>>
>>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>>> https://docs.delta.io/0.2.0/quick-start.html
>>>
>>> To view the release notes:
>>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>>
>>> This release introduces two main features:
>>>
>>> *Cloud storage support*
>>> In addition to HDFS, you can now configure Delta Lake to read and write
>>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>>> For configuration instructions, please see:
>>> https://docs.delta.io/0.2.0/delta-storage.html
>>>
>>> *Improved concurrency*
>>> Delta Lake now allows concurrent append-only writes while still ensuring
>>> serializability. For concurrency control in Delta Lake, please see:
>>> https://docs.delta.io/0.2.0/delta-concurrency.html
>>>
>>> We have also greatly expanded the test coverage as part of this release.
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this release.
>>>
>>> Best regards,
>>> Liwen Sun
>>>
>>>

--
Best Regards,
Ayan Guha