Hi Kaka & Doris Community,

Thank you for your suggestion. Following your advice, I manually placed the
"_orc_acid_version" file into the HDFS directory of Hive. I'm pleased to
report that this solution has resolved the issue, and the integration with
Doris is now functioning correctly.

Your guidance has been instrumental in finding a workaround for this
challenge. I appreciate your assistance and the valuable insights you've
provided.
Once again, thank you for your help, and I'm pleased to confirm that the
integration is now working as expected.

[image: image.png]
[image: image.png]

Best regards,
Joice

On Sun, Oct 8, 2023 at 7:48 AM kaka chen <kaka11.c...@gmail.com> wrote:

> Hi Joice,
>
> BTW,could you pls try to query it by Doris after putting the 
> "_orc_acid_version"
> file to the hdfs directory of hive manually? Thanks.
>
> Best regards,
> Kaka
>
> kaka chen <kaka11.c...@gmail.com> 于2023年10月8日周日 10:10写道:
>
>> Hi Joice,
>>
>> Got it, thanks very much. We will setup a cdp env and do some
>> compatibility work.
>>
>> Best regards,
>> Kaka
>>
>> Joice Jacob <joicejacob1...@gmail.com> 于2023年10月7日周六 14:58写道:
>>
>>> Hi,Dear Doris Community,
>>> I wanted to provide an update on my recent experience with integrating
>>> Doris with both Apache Hive and CDP Hive, as well as seek insights into a
>>> particular observation.
>>>
>>> In my integration with Apache Hive, I noticed that transaction tables
>>> retain the "_orc_acid_version" file, which aligns with the expected
>>> behavior. However, when I integrated with CDP Hive, I observed that this
>>> "_orc_acid_version" file was conspicuously absent.
>>>
>>> This discrepancy in behavior between the two environments has left me
>>> somewhat perplexed, and I'm keen to understand the underlying reasons
>>> behind it. It appears that in CDP Hive, *managed tables are set to be
>>> transactional by default*. Interestingly, when we create managed tables
>>> with the TBLPROPERTIES("transactional"="false") option, they are seemingly 
>>> *treated
>>> as external tables*.
>>>
>>> I've attached screenshots to illustrate this scenario for further
>>> clarity.
>>> CREATE  TABLE  BABY2(
>>> id int,
>>> FNAME VARCHAR(50),
>>> GENDER VARCHAR(2),
>>> TOTCOUNT INT
>>> )
>>> PARTITIONED BY(DATA_ID INT)
>>> TBLPROPERTIES (
>>> 'transactional'='false',
>>> 'orc.compress'='snappy');
>>>
>>> if we execute above statement table create like this.
>>> [image: image.png]
>>>
>>> I would greatly appreciate any insights, suggestions, or explanations
>>> that the Doris community may have regarding this behavior. Your expertise
>>> will be invaluable in helping me navigate this integration and address any
>>> associated challenges.
>>>
>>> Thank you in advance for your time and support. I look forward to
>>> hearing from you and to collaborating with the community to better
>>> understand and resolve this matter.
>>>
>>> Best regards,
>>> joice
>>>
>>> On Sat, Oct 7, 2023 at 11:46 AM kaka chen <kaka11.c...@gmail.com> wrote:
>>>
>>>> Hi Joice:
>>>> Thanks your reporting.
>>>>
>>>> It seems the root cause of this issue is missing "_orc_acid_version"
>>>> file.
>>>> From Hive version >= 3.0, delta/base files will always have file
>>>> '_orc_acid_version' with value >= '2'.
>>>> Maybe the hive3 of HDP has similar issue?
>>>> https://issues.apache.org/jira/browse/HIVE-16964
>>>>
>>>> A workaround is try to create table without transactional props.
>>>> TBLPROPERTIES("transactional"="true")
>>>> pls try it, thanks.
>>>>
>>>> Best regards,
>>>> Kaka
>>>>
>>>>
>>>> Joice Jacob <joicejacob1...@gmail.com> 于2023年10月6日周五 22:58写道:
>>>>
>>>> > Hi,
>>>> > I've checked the Hive data directory, and I couldn't find the
>>>> > "_orc_acid_version" file. I have attached screenshots for your
>>>> reference.
>>>> >
>>>> > Are there any recommended workarounds or alternative approaches that
>>>> I can
>>>> > consider to resolve this issue? I'm open to exploring different
>>>> solutions
>>>> > to ensure the successful integration of Doris with Hive in CDP. Any
>>>> > guidance or suggestions would be greatly appreciated.
>>>> >
>>>> > Screenshots attached for your reference.
>>>> >
>>>> > Best regards,
>>>> > Joice
>>>> >
>>>> > [image: image.png]
>>>> >
>>>> > On Fri, Oct 6, 2023 at 6:28 PM Mingyu Chen <morning...@163.com>
>>>> wrote:
>>>> >
>>>> >> Could you list your hive data dir, to see what files exist? like:
>>>> >> _orc_acid_version
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >>
>>>> >> Best Regards
>>>> >> Mingyu Chen
>>>> >>
>>>> >> Email:
>>>> >> morning...@apache.org
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> At 2023-10-05 22:31:58, "Mingyu Chen" <morning...@163.com> wrote:
>>>> >> >Oh, I see, let me check it again.
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >--
>>>> >> >
>>>> >> >Best Regards
>>>> >> >Mingyu Chen
>>>> >> >
>>>> >> >Email:
>>>> >> >morning...@apache.org
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >At 2023-10-05 21:49:28, "Joice Jacob" <joicejacob1...@gmail.com>
>>>> wrote:
>>>> >> >>Hi,
>>>> >> >>I wanted to share some important information regarding the CDP Hive
>>>> >> version
>>>> >> >>that I am currently using, which is Hive 3.1.3000.7.1.7.0-551.
>>>> >> >>
>>>> >> >>In CDP Hive version 3.x, a significant change has been introduced
>>>> >> regarding
>>>> >> >>managed tables. By default, managed tables in Hive 3.x are
>>>> considered
>>>> >> >>transactional. This means that if you create a managed table
>>>> explicitly
>>>> >> >>specifying it as transactional with a value of "false," it will be
>>>> >> treated
>>>> >> >>as an external table instead.
>>>> >> >>Thank you for your attention to this matter, and I appreciate your
>>>> >> >>continued support.
>>>> >> >>
>>>> >>
>>>> https://www.thecodersstop.com/hadoop/apache-hive-3-changes-in-cdp-upgrade-part-1/
>>>> >> >>Best regards,
>>>> >> >>Joice
>>>> >> >>
>>>> >> >>On Thu, Oct 5, 2023 at 7:01 PM Mingyu Chen <morning...@163.com>
>>>> wrote:
>>>> >> >>
>>>> >> >>> I saw that your hive table is with property "transactional" =
>>>> "true",
>>>> >> >>> And Doris only support ACID table with Hive 3, not support with
>>>> Hive
>>>> >> 2.x.
>>>> >> >>> So you may need to create non-transactional hive table for Doris
>>>> to
>>>> >> visit.
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> --
>>>> >> >>> Best Regards
>>>> >> >>> Mingyu Chen
>>>> >> >>>
>>>> >> >>> Email:
>>>> >> >>> morning...@apache.org
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> 在 2023-10-05 13:30:25,"Joice Jacob" <joicejacob1...@gmail.com>
>>>> 写道:
>>>> >> >>>
>>>> >> >>> Dear Doris Community & Jiafeng.Zhang
>>>> >> >>> ,
>>>> >> >>> Thank you for your prompt response and your willingness to assist
>>>> >> with the
>>>> >> >>> issue I've been encountering with the integration of Doris and
>>>> >> Cloudera's
>>>> >> >>> Hive.
>>>> >> >>>
>>>> >> >>> To provide you with the necessary information, here are the
>>>> details
>>>> >> you
>>>> >> >>> requested:
>>>> >> >>>
>>>> >> >>> *1. Hive Catalog Creation Statement:*
>>>> >> >>>
>>>> >> >>> My Hive catalog creation statement is as follows:
>>>> >> >>>
>>>> >> >>> CREATE CATALOG hive PROPERTIES (
>>>> >> >>>     'type'='hms',
>>>> >> >>>     'hive.metastore.uris' = 'thrift://10.128.0.4:9083',
>>>> >> >>> 'hive.version' = '3.1.3',
>>>> >> >>>     'hive.metastore.sasl.enabled' = 'true',
>>>> >> >>>     'hive.metastore.kerberos.principal' = 'hive/
>>>> instanc...@hadoop.com
>>>> >> ',
>>>> >> >>>     'hadoop.security.authentication' = 'kerberos',
>>>> >> >>>     'hadoop.kerberos.keytab' =
>>>> '/home/techuser/doris/hive.keytab',
>>>> >> >>>     'hadoop.kerberos.principal' = 'hive/instanc...@hadoop.com',
>>>> >> >>>     'yarn.resourcemanager.principal' = 'yarn/
>>>> instanc...@hadoop.com'
>>>> >> >>> );
>>>> >> >>>
>>>> >> >>> *2. CDP Hive Version:*
>>>> >> >>>
>>>> >> >>> The CDP Hive version I am using is: Hive 3.1.3000.7.1.7.0-551
>>>> >> >>>
>>>> >> >>> *3. Configuration Attempt:*
>>>> >> >>>
>>>> >> >>> I have tried specifying the Hive version in my catalog
>>>> configuration
>>>> >> with
>>>> >> >>> the following statement:
>>>> >> >>> "hive.version=2.1.0"
>>>> >> >>> Despite attempting to set the Hive version to 2.1.0, I continue
>>>> to
>>>> >> >>> experience the same issue, which is detailed in my previous
>>>> emails.
>>>> >> >>>
>>>> >> >>> *4. Logs: *
>>>> >> >>> I will attach both the fe.log and fe.warn logs to this email for
>>>> your
>>>> >> >>> reference. These logs should provide additional context
>>>> regarding the
>>>> >> issue
>>>> >> >>> I'm facing.
>>>> >> >>>
>>>> >> >>> Once again, I want to express my gratitude for your assistance
>>>> and
>>>> >> support
>>>> >> >>> in resolving this matter. I look forward to your insights and
>>>> >> >>> recommendations based on the provided logs and catalog
>>>> configuration.
>>>> >> >>>
>>>> >> >>> Please feel free to let me know if you require any further
>>>> >> information or
>>>> >> >>> if there are additional steps I should take to assist in
>>>> diagnosing
>>>> >> and
>>>> >> >>> resolving the issue.
>>>> >> >>>
>>>> >> >>> [image: doris_error.JPG]
>>>> >> >>>  fe.log log at hive catlog creation time
>>>> >> >>> [image: fe_log_at_hivecatalog.JPG]
>>>> >> >>> Hive managed table script
>>>> >> >>> [image: image.png]
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> Best regards,
>>>> >> >>> Joice
>>>> >> >>>
>>>> >> >>> On Thu, Oct 5, 2023 at 8:10 AM Jiafeng.Zhang <zhang...@gmail.com
>>>> >
>>>> >> wrote:
>>>> >> >>>
>>>> >> >>>> Can you provide your hive catalog creation statement, your
>>>> fe.log
>>>> >> log at
>>>> >> >>>> that time, and your cdp hive version? This will help us locate
>>>> the
>>>> >> >>>> problem,
>>>> >> >>>> thank you.
>>>> >> >>>> You can also try specifying your hive version in your catalog
>>>> >> statement:
>>>> >> >>>> "hive.version=2.1.0"
>>>> >> >>>>
>>>> >> >>>> Joice Jacob <joicejacob1...@gmail.com> 于2023年10月5日周四 00:45写道:
>>>> >> >>>>
>>>> >> >>>> > I am reaching out once again to seek assistance and share a
>>>> >> specific
>>>> >> >>>> issue
>>>> >> >>>> > I've encountered while integrating Doris with
>>>> Cloudera-flavored
>>>> >> Hive.
>>>> >> >>>> The
>>>> >> >>>> > error message I'm facing is as follows:
>>>> >> >>>> > detailMessage = get file split failed for table: baby1, err:
>>>> >> >>>> > java.lang.Exception: Hive 2.x versioned full-acid tables need
>>>> to
>>>> >> run
>>>> >> >>>> major
>>>> >> >>>> > compaction.
>>>> >> >>>> >
>>>> >> >>>> > This error message appears when I attempt to query a Hive
>>>> table
>>>> >> from
>>>> >> >>>> > Doris, and it seems to be related to Hive 2.x versioned
>>>> full-ACID
>>>> >> tables
>>>> >> >>>> > requiring a major compaction.
>>>> >> >>>> > I would like to ask the Doris community for guidance on how to
>>>> >> handle
>>>> >> >>>> this
>>>> >> >>>> > issue effectively. Specifically, I am interested in
>>>> understanding
>>>> >> the
>>>> >> >>>> best
>>>> >> >>>> > practices and steps to follow when dealing with
>>>> Cloudera-flavored
>>>> >> Hive
>>>> >> >>>> > tables that require major compaction for Doris integration.
>>>> >> >>>> >
>>>> >> >>>> > If anyone in the community has successfully addressed this
>>>> issue
>>>> >> or can
>>>> >> >>>> > provide insights into how to configure and manage
>>>> >> Cloudera-flavored Hive
>>>> >> >>>> > tables for integration with Doris, your expertise would be
>>>> highly
>>>> >> >>>> > appreciated.
>>>> >> >>>> >
>>>> >> >>>> > Thank you for your time and support, and I look forward to
>>>> >> receiving
>>>> >> >>>> your
>>>> >> >>>> > valuable input.
>>>> >> >>>> > Joice
>>>> >> >>>> >
>>>> >> >>>> > On Wed, Oct 4, 2023 at 9:23 PM Joice Jacob <
>>>> >> joicejacob1...@gmail.com>
>>>> >> >>>> > wrote:
>>>> >> >>>> >
>>>> >> >>>> >> Dear Doris Community,
>>>> >> >>>> >> I have an update on the issue I previously mentioned
>>>> regarding the
>>>> >> >>>> >> integration of Hive on a CDP distribution with Doris 2.0.1.1.
>>>> >> >>>> >>
>>>> >> >>>> >> After further investigation, I have identified that the
>>>> issue is
>>>> >> >>>> related
>>>> >> >>>> >> to Hive managed tables being transactional by default. This
>>>> >> appears to
>>>> >> >>>> be
>>>> >> >>>> >> causing the error I encountered earlier.
>>>> >> >>>> >>
>>>> >> >>>> >> To address this issue, I am seeking guidance from the
>>>> community
>>>> >> on any
>>>> >> >>>> >> specific configurations or settings that need to be adjusted
>>>> for
>>>> >> Hive
>>>> >> >>>> >> transactional tables when using Hive as the catalog in
>>>> Doris. Are
>>>> >> >>>> there any
>>>> >> >>>> >> recommended configurations or best practices that I should
>>>> follow
>>>> >> to
>>>> >> >>>> ensure
>>>> >> >>>> >> smooth integration and query execution?
>>>> >> >>>> >>
>>>> >> >>>> >> Any insights or recommendations from the Doris community
>>>> would be
>>>> >> >>>> greatly
>>>> >> >>>> >> appreciated. Your expertise and guidance will be
>>>> instrumental in
>>>> >> >>>> helping me
>>>> >> >>>> >> resolve this challenge.
>>>> >> >>>> >>
>>>> >> >>>> >> Thank you for your continued support, and I look forward to
>>>> your
>>>> >> >>>> valuable
>>>> >> >>>> >> input.
>>>> >> >>>> >>
>>>> >> >>>> >> Best regards,
>>>> >> >>>> >> Joice
>>>> >> >>>> >>
>>>> >> >>>> >> On Wed, Oct 4, 2023 at 6:54 PM Joice Jacob <
>>>> >> joicejacob1...@gmail.com>
>>>> >> >>>> >> wrote:
>>>> >> >>>> >>
>>>> >> >>>> >>> Dear Doris Community,
>>>> >> >>>> >>>
>>>> >> >>>> >>> I am reaching out to the community to seek assistance with
>>>> an
>>>> >> >>>> >>> integration issue I've encountered while trying to use Hive
>>>> on a
>>>> >> CDP
>>>> >> >>>> >>> distribution with Doris 2.0.1.1.
>>>> >> >>>> >>>
>>>> >> >>>> >>> Here are the details of my setup:
>>>> >> >>>> >>>
>>>> >> >>>> >>> Doris Version: 2.0.1.1
>>>> >> >>>> >>> Hive Version: 3.1.3
>>>> >> >>>> >>> Cluster Security: Kerberized
>>>> >> >>>> >>>
>>>> >> >>>> >>> I have successfully created a Hive catalog in Doris and have
>>>> >> been able
>>>> >> >>>> >>> to set up the integration between Hive and Doris. However,
>>>> when I
>>>> >> >>>> attempt
>>>> >> >>>> >>> to query a Hive table using Doris, I encounter the following
>>>> >> error:
>>>> >> >>>> >>>
>>>> >> >>>> >>> ERROR 1105 (HY000): errCode = 2, detailMessage = get file
>>>> split
>>>> >> failed
>>>> >> >>>> >>> for table: baby1, err: java.lang.Exception: Hive 2.x
>>>> versioned
>>>> >> >>>> full-acid
>>>> >> >>>> >>> tables need to run major compaction.
>>>> >> >>>> >>>
>>>> >> >>>> >>> I have already performed a major compaction as recommended,
>>>> but
>>>> >> I am
>>>> >> >>>> >>> still encountering the same error.
>>>> >> >>>> >>>
>>>> >> >>>> >>> I would greatly appreciate any insights, guidance, or
>>>> solutions
>>>> >> that
>>>> >> >>>> the
>>>> >> >>>> >>> Doris community can offer to help me resolve this issue. If
>>>> >> anyone has
>>>> >> >>>> >>> encountered a similar problem or has expertise in
>>>> integrating
>>>> >> Hive
>>>> >> >>>> with
>>>> >> >>>> >>> Doris, your assistance would be invaluable.
>>>> >> >>>> >>>
>>>> >> >>>> >>> Thank you in advance for your time and support. I look
>>>> forward to
>>>> >> >>>> >>> hearing from the community and working together to find a
>>>> >> solution to
>>>> >> >>>> this
>>>> >> >>>> >>> challenge.
>>>> >> >>>> >>>
>>>> >> >>>> >>> [image: baby_table.JPG]
>>>> >> >>>> >>> [image: doris_hive_catlog_result.JPG]
>>>> >> >>>> >>>
>>>> >> >>>> >>> [image: doris_fe_log.JPG]
>>>> >> >>>> >>>
>>>> >> >>>> >>> [image: hive_metastore_error.JPG]
>>>> >> >>>> >>>
>>>> >> >>>> >>> Thanks
>>>> >> >>>> >>> Joice
>>>> >> >>>> >>>
>>>> >> >>>> >>>
>>>> >> >>>> >>>
>>>> >> >>>> >>>
>>>> >> >>>> >>>
>>>> >> >>>>
>>>> >> >>>> --
>>>> >> >>>> 张家峰
>>>> >> >>>>
>>>> >> >>>
>>>> >>
>>>> >
>>>>
>>>

Reply via email to