Hi Joice,

Got it, thanks very much. We will set up a CDP environment and do some
compatibility work.

Best regards,
Kaka

Joice Jacob <joicejacob1...@gmail.com> wrote on Sat, Oct 7, 2023 at 14:58:

> Hi, dear Doris Community,
> I wanted to provide an update on my recent experience with integrating
> Doris with both Apache Hive and CDP Hive, as well as seek insights into a
> particular observation.
>
> In my integration with Apache Hive, I noticed that transaction tables
> retain the "_orc_acid_version" file, which aligns with the expected
> behavior. However, when I integrated with CDP Hive, I observed that this
> "_orc_acid_version" file was conspicuously absent.
>
> This discrepancy in behavior between the two environments has left me
> somewhat perplexed, and I'm keen to understand the underlying reasons
> behind it. It appears that in CDP Hive, *managed tables are set to be
> transactional by default*. Interestingly, when we create managed tables
> with the TBLPROPERTIES("transactional"="false") option, they are seemingly
> *treated as external tables*.
>
> I've attached screenshots to illustrate this scenario for further clarity.
> CREATE TABLE BABY2 (
>     id INT,
>     FNAME VARCHAR(50),
>     GENDER VARCHAR(2),
>     TOTCOUNT INT
> )
> PARTITIONED BY (DATA_ID INT)
> TBLPROPERTIES (
>     'transactional'='false',
>     'orc.compress'='snappy');
>
> If we execute the above statement, the table is created like this:
> [image: image.png]
>
> I would greatly appreciate any insights, suggestions, or explanations that
> the Doris community may have regarding this behavior. Your expertise will
> be invaluable in helping me navigate this integration and address any
> associated challenges.
>
> Thank you in advance for your time and support. I look forward to hearing
> from you and to collaborating with the community to better understand and
> resolve this matter.
>
> Best regards,
> joice
>
> On Sat, Oct 7, 2023 at 11:46 AM kaka chen <kaka11.c...@gmail.com> wrote:
>
>> Hi Joice:
>> Thanks for reporting.
>>
>> It seems the root cause of this issue is the missing "_orc_acid_version"
>> file. From Hive version >= 3.0, delta/base directories will always contain
>> the file '_orc_acid_version' with value >= '2'.
>> Maybe Hive 3 in HDP has a similar issue?
>> https://issues.apache.org/jira/browse/HIVE-16964
>>
>> A workaround is to try creating the table as non-transactional:
>> TBLPROPERTIES("transactional"="false")
>> Please try it, thanks.
>>
>> Best regards,
>> Kaka
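
For anyone following the thread, here is a minimal sketch (not Doris's actual check) of how one might scan a recursive listing of the table directory, for example the paths printed by `hdfs dfs -ls -R`, for base/delta directories that lack the "_orc_acid_version" file mentioned above. The function name and the sample paths are hypothetical.

```python
import posixpath
import re

# Directory names used for ACID data, e.g. base_0000002 or delta_0000001_0000001_0000.
ACID_DIR = re.compile(r"^(base|delta)_\d+")
MARKER = "_orc_acid_version"


def dirs_missing_marker(paths):
    """Return the sorted base/delta directories in `paths` that have no marker file.

    `paths` is any iterable of HDFS-style paths (directories and files),
    such as the last column of a recursive directory listing.
    """
    acid_dirs = set()
    with_marker = set()
    for path in paths:
        parent, name = posixpath.split(path.rstrip("/"))
        if ACID_DIR.match(name):
            # The entry is itself a base/delta directory.
            acid_dirs.add(posixpath.join(parent, name))
        elif ACID_DIR.match(posixpath.basename(parent)):
            # The entry is a file inside a base/delta directory.
            acid_dirs.add(parent)
            if name == MARKER:
                with_marker.add(parent)
    return sorted(acid_dirs - with_marker)


if __name__ == "__main__":
    # Hypothetical listing: one delta directory without the marker file,
    # one base directory with it.
    listing = [
        "/warehouse/baby1/data_id=1/delta_0000001_0000001_0000",
        "/warehouse/baby1/data_id=1/delta_0000001_0000001_0000/bucket_00000",
        "/warehouse/baby1/data_id=2/base_0000002",
        "/warehouse/baby1/data_id=2/base_0000002/_orc_acid_version",
        "/warehouse/baby1/data_id=2/base_0000002/bucket_00000",
    ]
    for directory in dirs_missing_marker(listing):
        print("missing", MARKER, "in", directory)
```

Any directory this reports is one where the marker file is absent, which matches the situation described for the CDP Hive tables in this thread.
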
>>
>>
>> Joice Jacob <joicejacob1...@gmail.com> wrote on Fri, Oct 6, 2023 at 22:58:
>>
>> > Hi,
>> > I've checked the Hive data directory, and I couldn't find the
>> > "_orc_acid_version" file. I have attached screenshots for your reference.
>> >
>> > Are there any recommended workarounds or alternative approaches that I can
>> > consider to resolve this issue? I'm open to exploring different solutions
>> > to ensure the successful integration of Doris with Hive in CDP. Any
>> > guidance or suggestions would be greatly appreciated.
>> >
>> > Screenshots attached for your reference.
>> >
>> > Best regards,
>> > Joice
>> >
>> > [image: image.png]
>> >
>> > On Fri, Oct 6, 2023 at 6:28 PM Mingyu Chen <morning...@163.com> wrote:
>> >
>> >> Could you list your Hive data dir to see what files exist, such as
>> >> _orc_acid_version?
>> >>
>> >> --
>> >> Best Regards
>> >> Mingyu Chen
>> >>
>> >> Email:
>> >> morning...@apache.org
>> >>
>> >> At 2023-10-05 22:31:58, "Mingyu Chen" <morning...@163.com> wrote:
>> >> >Oh, I see, let me check it again.
>> >> >
>> >> >--
>> >> >Best Regards
>> >> >Mingyu Chen
>> >> >
>> >> >Email:
>> >> >morning...@apache.org
>> >> >
>> >> >At 2023-10-05 21:49:28, "Joice Jacob" <joicejacob1...@gmail.com> wrote:
>> >> >>Hi,
>> >> >>I wanted to share some important information regarding the CDP Hive
>> >> >>version that I am currently using, which is Hive 3.1.3000.7.1.7.0-551.
>> >> >>
>> >> >>In CDP Hive version 3.x, a significant change has been introduced
>> >> >>regarding managed tables. By default, managed tables in Hive 3.x are
>> >> >>considered transactional. This means that if you create a managed table
>> >> >>explicitly specifying it as transactional with a value of "false," it
>> >> >>will be treated as an external table instead.
>> >> >>Thank you for your attention to this matter, and I appreciate your
>> >> >>continued support.
>> >> >>
>> >> >>https://www.thecodersstop.com/hadoop/apache-hive-3-changes-in-cdp-upgrade-part-1/
>> >> >>Best regards,
>> >> >>Joice
>> >> >>
>> >> >>On Thu, Oct 5, 2023 at 7:01 PM Mingyu Chen <morning...@163.com> wrote:
>> >> >>
>> >> >>> I saw that your Hive table has the property "transactional" = "true".
>> >> >>> Doris only supports ACID tables with Hive 3, not with Hive 2.x.
>> >> >>> So you may need to create a non-transactional Hive table for Doris to
>> >> >>> access.
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> Best Regards
>> >> >>> Mingyu Chen
>> >> >>>
>> >> >>> Email:
>> >> >>> morning...@apache.org
>> >> >>>
>> >> >>>
>> >> >>> On 2023-10-05 13:30:25, "Joice Jacob" <joicejacob1...@gmail.com> wrote:
>> >> >>>
>> >> >>> Dear Doris Community & Jiafeng.Zhang,
>> >> >>> Thank you for your prompt response and your willingness to assist with
>> >> >>> the issue I've been encountering with the integration of Doris and
>> >> >>> Cloudera's Hive.
>> >> >>>
>> >> >>> To provide you with the necessary information, here are the details you
>> >> >>> requested:
>> >> >>>
>> >> >>> *1. Hive Catalog Creation Statement:*
>> >> >>>
>> >> >>> My Hive catalog creation statement is as follows:
>> >> >>>
>> >> >>> CREATE CATALOG hive PROPERTIES (
>> >> >>>     'type'='hms',
>> >> >>>     'hive.metastore.uris' = 'thrift://10.128.0.4:9083',
>> >> >>>     'hive.version' = '3.1.3',
>> >> >>>     'hive.metastore.sasl.enabled' = 'true',
>> >> >>>     'hive.metastore.kerberos.principal' = 'hive/instanc...@hadoop.com',
>> >> >>>     'hadoop.security.authentication' = 'kerberos',
>> >> >>>     'hadoop.kerberos.keytab' = '/home/techuser/doris/hive.keytab',
>> >> >>>     'hadoop.kerberos.principal' = 'hive/instanc...@hadoop.com',
>> >> >>>     'yarn.resourcemanager.principal' = 'yarn/instanc...@hadoop.com'
>> >> >>> );
>> >> >>>
>> >> >>> *2. CDP Hive Version:*
>> >> >>>
>> >> >>> The CDP Hive version I am using is: Hive 3.1.3000.7.1.7.0-551
>> >> >>>
>> >> >>> *3. Configuration Attempt:*
>> >> >>>
>> >> >>> I have tried specifying the Hive version in my catalog configuration
>> >> >>> with the following statement:
>> >> >>> "hive.version=2.1.0"
>> >> >>> Despite attempting to set the Hive version to 2.1.0, I continue to
>> >> >>> experience the same issue, which is detailed in my previous emails.
>> >> >>>
>> >> >>> *4. Logs: *
>> >> >>> I will attach both the fe.log and fe.warn logs to this email for your
>> >> >>> reference. These logs should provide additional context regarding the
>> >> >>> issue I'm facing.
>> >> >>>
>> >> >>> Once again, I want to express my gratitude for your assistance and
>> >> >>> support in resolving this matter. I look forward to your insights and
>> >> >>> recommendations based on the provided logs and catalog configuration.
>> >> >>>
>> >> >>> Please feel free to let me know if you require any further information
>> >> >>> or if there are additional steps I should take to assist in diagnosing
>> >> >>> and resolving the issue.
>> >> >>>
>> >> >>> [image: doris_error.JPG]
>> >> >>> fe.log at Hive catalog creation time
>> >> >>> [image: fe_log_at_hivecatalog.JPG]
>> >> >>> Hive managed table script
>> >> >>> [image: image.png]
>> >> >>>
>> >> >>>
>> >> >>> Best regards,
>> >> >>> Joice
>> >> >>>
>> >> >>> On Thu, Oct 5, 2023 at 8:10 AM Jiafeng.Zhang <zhang...@gmail.com> wrote:
>> >> >>>
>> >> >>>> Can you provide your Hive catalog creation statement, your fe.log from
>> >> >>>> that time, and your CDP Hive version? This will help us locate the
>> >> >>>> problem, thank you.
>> >> >>>> You can also try specifying your Hive version in your catalog statement:
>> >> >>>> "hive.version=2.1.0"
>> >> >>>>
>> >> >>>> Joice Jacob <joicejacob1...@gmail.com> wrote on Thu, Oct 5, 2023 at 00:45:
>> >> >>>>
>> >> >>>> > I am reaching out once again to seek assistance and share a specific
>> >> >>>> > issue I've encountered while integrating Doris with Cloudera-flavored
>> >> >>>> > Hive. The error message I'm facing is as follows:
>> >> >>>> > detailMessage = get file split failed for table: baby1, err:
>> >> >>>> > java.lang.Exception: Hive 2.x versioned full-acid tables need to run
>> >> >>>> > major compaction.
>> >> >>>> >
>> >> >>>> > This error message appears when I attempt to query a Hive table from
>> >> >>>> > Doris, and it seems to be related to Hive 2.x versioned full-ACID
>> >> >>>> > tables requiring a major compaction.
>> >> >>>> > I would like to ask the Doris community for guidance on how to handle
>> >> >>>> > this issue effectively. Specifically, I am interested in understanding
>> >> >>>> > the best practices and steps to follow when dealing with
>> >> >>>> > Cloudera-flavored Hive tables that require major compaction for Doris
>> >> >>>> > integration.
>> >> >>>> >
>> >> >>>> > If anyone in the community has successfully addressed this issue or
>> >> >>>> > can provide insights into how to configure and manage
>> >> >>>> > Cloudera-flavored Hive tables for integration with Doris, your
>> >> >>>> > expertise would be highly appreciated.
>> >> >>>> >
>> >> >>>> > Thank you for your time and support, and I look forward to receiving
>> >> >>>> > your valuable input.
>> >> >>>> > Joice
>> >> >>>> >
>> >> >>>> > On Wed, Oct 4, 2023 at 9:23 PM Joice Jacob <joicejacob1...@gmail.com> wrote:
>> >> >>>> >
>> >> >>>> >> Dear Doris Community,
>> >> >>>> >> I have an update on the issue I previously mentioned regarding the
>> >> >>>> >> integration of Hive on a CDP distribution with Doris 2.0.1.1.
>> >> >>>> >>
>> >> >>>> >> After further investigation, I have identified that the issue is
>> >> >>>> >> related to Hive managed tables being transactional by default. This
>> >> >>>> >> appears to be causing the error I encountered earlier.
>> >> >>>> >>
>> >> >>>> >> To address this issue, I am seeking guidance from the community on any
>> >> >>>> >> specific configurations or settings that need to be adjusted for Hive
>> >> >>>> >> transactional tables when using Hive as the catalog in Doris. Are
>> >> >>>> >> there any recommended configurations or best practices that I should
>> >> >>>> >> follow to ensure smooth integration and query execution?
>> >> >>>> >>
>> >> >>>> >> Any insights or recommendations from the Doris community would be
>> >> >>>> >> greatly appreciated. Your expertise and guidance will be instrumental
>> >> >>>> >> in helping me resolve this challenge.
>> >> >>>> >>
>> >> >>>> >> Thank you for your continued support, and I look forward to your
>> >> >>>> >> valuable input.
>> >> >>>> >>
>> >> >>>> >> Best regards,
>> >> >>>> >> Joice
>> >> >>>> >>
>> >> >>>> >> On Wed, Oct 4, 2023 at 6:54 PM Joice Jacob <joicejacob1...@gmail.com> wrote:
>> >> >>>> >>
>> >> >>>> >>> Dear Doris Community,
>> >> >>>> >>>
>> >> >>>> >>> I am reaching out to the community to seek assistance with an
>> >> >>>> >>> integration issue I've encountered while trying to use Hive on a CDP
>> >> >>>> >>> distribution with Doris 2.0.1.1.
>> >> >>>> >>>
>> >> >>>> >>> Here are the details of my setup:
>> >> >>>> >>>
>> >> >>>> >>> Doris Version: 2.0.1.1
>> >> >>>> >>> Hive Version: 3.1.3
>> >> >>>> >>> Cluster Security: Kerberized
>> >> >>>> >>>
>> >> >>>> >>> I have successfully created a Hive catalog in Doris and have been able
>> >> >>>> >>> to set up the integration between Hive and Doris. However, when I
>> >> >>>> >>> attempt to query a Hive table using Doris, I encounter the following
>> >> >>>> >>> error:
>> >> >>>> >>>
>> >> >>>> >>> ERROR 1105 (HY000): errCode = 2, detailMessage = get file split failed
>> >> >>>> >>> for table: baby1, err: java.lang.Exception: Hive 2.x versioned
>> >> >>>> >>> full-acid tables need to run major compaction.
>> >> >>>> >>>
>> >> >>>> >>> I have already performed a major compaction as recommended, but I am
>> >> >>>> >>> still encountering the same error.
>> >> >>>> >>>
>> >> >>>> >>> I would greatly appreciate any insights, guidance, or solutions that
>> >> >>>> >>> the Doris community can offer to help me resolve this issue. If anyone
>> >> >>>> >>> has encountered a similar problem or has expertise in integrating Hive
>> >> >>>> >>> with Doris, your assistance would be invaluable.
>> >> >>>> >>>
>> >> >>>> >>> Thank you in advance for your time and support. I look forward to
>> >> >>>> >>> hearing from the community and working together to find a solution to
>> >> >>>> >>> this challenge.
>> >> >>>> >>>
>> >> >>>> >>> [image: baby_table.JPG]
>> >> >>>> >>> [image: doris_hive_catlog_result.JPG]
>> >> >>>> >>>
>> >> >>>> >>> [image: doris_fe_log.JPG]
>> >> >>>> >>>
>> >> >>>> >>> [image: hive_metastore_error.JPG]
>> >> >>>> >>>
>> >> >>>> >>> Thanks
>> >> >>>> >>> Joice
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>>
>> >> >>>> --
>> >> >>>> 张家峰
>> >> >>>>
>> >> >>>
>> >>
>> >
>>
>
