Hi

Thanks for the confirmation. As a workaround, we are creating a separate
Hive external table STORED AS PARQUET pointing at the exact location of the
Delta table. Our use case is batch-driven, and we run VACUUM with zero
retention after every batch completes. Do you see any potential problem
with this workaround, other than that the table can return incorrect
results while a batch is running?

Best
Ayan

On Fri, Jun 21, 2019 at 8:03 PM Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> @ayan guha <guha.a...@gmail.com> @Gourav Sengupta
> <gourav.sengu...@gmail.com>
> Delta Lake OSS currently does not support defining tables in the Hive
> metastore using DDL commands. We are hoping to add the necessary
> compatibility fixes in Apache Spark to make Delta Lake work with tables and
> DDL commands, so we will support them in a future release. In the meantime,
> please read/write Delta tables using paths.
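>
> For example, a minimal path-based round trip (the path is a placeholder):
>
>     # Write and read a Delta table purely by path, with no metastore entry.
>     df.write.format("delta").mode("overwrite").save("/tmp/delta/events")
>     events = spark.read.format("delta").load("/tmp/delta/events")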
>
> TD
>
> On Fri, Jun 21, 2019 at 12:49 AM Gourav Sengupta <
> gourav.sengu...@gmail.com> wrote:
>
>> Hi Ayan,
>>
>> I may be wrong about this, but I think that Delta files are in Parquet
>> format. But I am sure that you have already checked this. Am I missing
>> something?
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Fri, Jun 21, 2019 at 6:39 AM ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi
>>> We used spark.sql to create a table using DELTA. We also have a Hive
>>> metastore attached to the Spark session, so a table gets created in the
>>> Hive metastore. We then tried to query the table from Hive and hit the
>>> following issues:
>>>
>>>    1. The SerDe is SequenceFile, when it should have been Parquet.
>>>    2. The schema fields are not passed through.
>>>
>>> Essentially the Hive DDL looks like:
>>>
>>>     CREATE TABLE `TABLE NAME`(
>>>       `col` array<string> COMMENT 'from deserializer')
>>>     ROW FORMAT SERDE
>>>       'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>>>     WITH SERDEPROPERTIES (
>>>       'path'='WASB PATH')
>>>     STORED AS INPUTFORMAT
>>>       'org.apache.hadoop.mapred.SequenceFileInputFormat'
>>>     OUTPUTFORMAT
>>>       'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
>>>     LOCATION
>>>       'WASB PATH'
>>>     TBLPROPERTIES (
>>>       'spark.sql.create.version'='2.4.0',
>>>       'spark.sql.sources.provider'='DELTA',
>>>       'spark.sql.sources.schema.numParts'='1',
>>>       'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
>>>       'transient_lastDdlTime'='1556544657')
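>>>
>>> For reference, a minimal sketch of the kind of creation call we ran (the
>>> table name and path are placeholders):
>>>
>>>     # Create a Delta table through spark.sql with a Hive metastore attached.
>>>     # Spark stores the real provider and schema in TBLPROPERTIES, while the
>>>     # Hive-visible SerDe falls back to SequenceFile with an empty schema.
>>>     spark.sql("""
>>>         CREATE TABLE events
>>>         USING DELTA
>>>         LOCATION 'wasb://container@account.blob.core.windows.net/delta/events'
>>>     """)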
>>>
>>> Is this expected? And will the use case be supported in future releases?
>>>
>>>
>>> We are now experimenting with workarounds.
>>>
>>> Best
>>>
>>> Ayan
>>>
>>> On Fri, Jun 21, 2019 at 11:06 AM Liwen Sun <liwen....@databricks.com>
>>> wrote:
>>>
>>>> Hi James,
>>>>
>>>> Right now we don't have plans for a catalog component as part of
>>>> Delta Lake, but we are looking to support the Hive metastore and DDL
>>>> commands in the near future.
>>>>
>>>> Thanks,
>>>> Liwen
>>>>
>>>> On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios <
>>>> jamescotrots...@gmail.com> wrote:
>>>>
>>>>> Is there a plan to have a business catalog component for Delta
>>>>> Lake? If not, how would someone propose creating an open source
>>>>> project for that? I would be interested in building out an open
>>>>> source data catalog that uses the Hive metastore as a baseline
>>>>> for technical metadata.
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun <liwen....@databricks.com>
>>>>> wrote:
>>>>>
>>>>>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>>>>>
>>>>>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>>>>>> https://docs.delta.io/0.2.0/quick-start.html
>>>>>>
>>>>>> To view the release notes:
>>>>>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>>>>>
>>>>>> This release introduces two main features:
>>>>>>
>>>>>> *Cloud storage support*
>>>>>> In addition to HDFS, you can now configure Delta Lake to read and
>>>>>> write data on cloud storage services such as Amazon S3 and Azure Blob
>>>>>> Storage. For configuration instructions, please see:
>>>>>> https://docs.delta.io/0.2.0/delta-storage.html
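>>>>>>
>>>>>> For example, a minimal S3 setup under 0.2.0 (a sketch; the bucket name
>>>>>> is a placeholder):
>>>>>>
>>>>>>     # Point Delta at the S3-aware LogStore before writing to s3a:// paths.
>>>>>>     from pyspark.sql import SparkSession
>>>>>>     spark = (SparkSession.builder
>>>>>>         .config("spark.delta.logStore.class",
>>>>>>                 "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore")
>>>>>>         .getOrCreate())
>>>>>>     df.write.format("delta").save("s3a://my-bucket/delta/events")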
>>>>>>
>>>>>> *Improved concurrency*
>>>>>> Delta Lake now allows concurrent append-only writes while still
>>>>>> ensuring serializability. For concurrency control in Delta Lake, please
>>>>>> see: https://docs.delta.io/0.2.0/delta-concurrency.html
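>>>>>>
>>>>>> For instance, two separate jobs can now append to the same table at the
>>>>>> same time (a sketch; the path is a placeholder):
>>>>>>
>>>>>>     # Run concurrently from different jobs: both commits succeed because
>>>>>>     # blind appends cannot conflict with each other.
>>>>>>     df.write.format("delta").mode("append").save("/delta/events")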
>>>>>>
>>>>>> We have also greatly expanded the test coverage as part of this
>>>>>> release.
>>>>>>
>>>>>> We would like to acknowledge all community members for contributing
>>>>>> to this release.
>>>>>>
>>>>>> Best regards,
>>>>>> Liwen Sun
>>>>>>
>>>>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>

-- 
Best Regards,
Ayan Guha
