Hi Joice, BTW,could you pls try to query it by Doris after putting the "_orc_acid_version" file to the hdfs directory of hive manually? Thanks.
Best regards, Kaka kaka chen <kaka11.c...@gmail.com> 于2023年10月8日周日 10:10写道: > Hi Joice, > > Got it, thanks very much. We will setup a cdp env and do some > compatibility work. > > Best regards, > Kaka > > Joice Jacob <joicejacob1...@gmail.com> 于2023年10月7日周六 14:58写道: > >> Hi,Dear Doris Community, >> I wanted to provide an update on my recent experience with integrating >> Doris with both Apache Hive and CDP Hive, as well as seek insights into a >> particular observation. >> >> In my integration with Apache Hive, I noticed that transaction tables >> retain the "_orc_acid_version" file, which aligns with the expected >> behavior. However, when I integrated with CDP Hive, I observed that this >> "_orc_acid_version" file was conspicuously absent. >> >> This discrepancy in behavior between the two environments has left me >> somewhat perplexed, and I'm keen to understand the underlying reasons >> behind it. It appears that in CDP Hive, *managed tables are set to be >> transactional by default*. Interestingly, when we create managed tables >> with the TBLPROPERTIES("transactional"="false") option, they are seemingly >> *treated >> as external tables*. >> >> I've attached screenshots to illustrate this scenario for further clarity. >> CREATE TABLE BABY2( >> id int, >> FNAME VARCHAR(50), >> GENDER VARCHAR(2), >> TOTCOUNT INT >> ) >> PARTITIONED BY(DATA_ID INT) >> TBLPROPERTIES ( >> 'transactional'='false', >> 'orc.compress'='snappy'); >> >> if we execute above statement table create like this. >> [image: image.png] >> >> I would greatly appreciate any insights, suggestions, or explanations >> that the Doris community may have regarding this behavior. Your expertise >> will be invaluable in helping me navigate this integration and address any >> associated challenges. >> >> Thank you in advance for your time and support. I look forward to hearing >> from you and to collaborating with the community to better understand and >> resolve this matter. >> >> Best regards, >> joice >> >> On Sat, Oct 7, 2023 at 11:46 AM kaka chen <kaka11.c...@gmail.com> wrote: >> >>> Hi Joice: >>> Thanks your reporting. >>> >>> It seems the root cause of this issue is missing "_orc_acid_version" >>> file. >>> From Hive version >= 3.0, delta/base files will always have file >>> '_orc_acid_version' with value >= '2'. >>> Maybe the hive3 of HDP has similar issue? >>> https://issues.apache.org/jira/browse/HIVE-16964 >>> >>> A workaround is try to create table without transactional props. >>> TBLPROPERTIES("transactional"="true") >>> pls try it, thanks. >>> >>> Best regards, >>> Kaka >>> >>> >>> Joice Jacob <joicejacob1...@gmail.com> 于2023年10月6日周五 22:58写道: >>> >>> > Hi, >>> > I've checked the Hive data directory, and I couldn't find the >>> > "_orc_acid_version" file. I have attached screenshots for your >>> reference. >>> > >>> > Are there any recommended workarounds or alternative approaches that I >>> can >>> > consider to resolve this issue? I'm open to exploring different >>> solutions >>> > to ensure the successful integration of Doris with Hive in CDP. Any >>> > guidance or suggestions would be greatly appreciated. >>> > >>> > Screenshots attached for your reference. >>> > >>> > Best regards, >>> > Joice >>> > >>> > [image: image.png] >>> > >>> > On Fri, Oct 6, 2023 at 6:28 PM Mingyu Chen <morning...@163.com> wrote: >>> > >>> >> Could you list your hive data dir, to see what files exist? like: >>> >> _orc_acid_version >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> -- >>> >> >>> >> Best Regards >>> >> Mingyu Chen >>> >> >>> >> Email: >>> >> morning...@apache.org >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> At 2023-10-05 22:31:58, "Mingyu Chen" <morning...@163.com> wrote: >>> >> >Oh, I see, let me check it again. >>> >> > >>> >> > >>> >> > >>> >> > >>> >> >-- >>> >> > >>> >> >Best Regards >>> >> >Mingyu Chen >>> >> > >>> >> >Email: >>> >> >morning...@apache.org >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > >>> >> >At 2023-10-05 21:49:28, "Joice Jacob" <joicejacob1...@gmail.com> >>> wrote: >>> >> >>Hi, >>> >> >>I wanted to share some important information regarding the CDP Hive >>> >> version >>> >> >>that I am currently using, which is Hive 3.1.3000.7.1.7.0-551. >>> >> >> >>> >> >>In CDP Hive version 3.x, a significant change has been introduced >>> >> regarding >>> >> >>managed tables. By default, managed tables in Hive 3.x are >>> considered >>> >> >>transactional. This means that if you create a managed table >>> explicitly >>> >> >>specifying it as transactional with a value of "false," it will be >>> >> treated >>> >> >>as an external table instead. >>> >> >>Thank you for your attention to this matter, and I appreciate your >>> >> >>continued support. >>> >> >> >>> >> >>> https://www.thecodersstop.com/hadoop/apache-hive-3-changes-in-cdp-upgrade-part-1/ >>> >> >>Best regards, >>> >> >>Joice >>> >> >> >>> >> >>On Thu, Oct 5, 2023 at 7:01 PM Mingyu Chen <morning...@163.com> >>> wrote: >>> >> >> >>> >> >>> I saw that your hive table is with property "transactional" = >>> "true", >>> >> >>> And Doris only support ACID table with Hive 3, not support with >>> Hive >>> >> 2.x. >>> >> >>> So you may need to create non-transactional hive table for Doris >>> to >>> >> visit. >>> >> >>> >>> >> >>> >>> >> >>> -- >>> >> >>> Best Regards >>> >> >>> Mingyu Chen >>> >> >>> >>> >> >>> Email: >>> >> >>> morning...@apache.org >>> >> >>> >>> >> >>> >>> >> >>> 在 2023-10-05 13:30:25,"Joice Jacob" <joicejacob1...@gmail.com> >>> 写道: >>> >> >>> >>> >> >>> Dear Doris Community & Jiafeng.Zhang >>> >> >>> , >>> >> >>> Thank you for your prompt response and your willingness to assist >>> >> with the >>> >> >>> issue I've been encountering with the integration of Doris and >>> >> Cloudera's >>> >> >>> Hive. >>> >> >>> >>> >> >>> To provide you with the necessary information, here are the >>> details >>> >> you >>> >> >>> requested: >>> >> >>> >>> >> >>> *1. Hive Catalog Creation Statement:* >>> >> >>> >>> >> >>> My Hive catalog creation statement is as follows: >>> >> >>> >>> >> >>> CREATE CATALOG hive PROPERTIES ( >>> >> >>> 'type'='hms', >>> >> >>> 'hive.metastore.uris' = 'thrift://10.128.0.4:9083', >>> >> >>> 'hive.version' = '3.1.3', >>> >> >>> 'hive.metastore.sasl.enabled' = 'true', >>> >> >>> 'hive.metastore.kerberos.principal' = 'hive/ >>> instanc...@hadoop.com >>> >> ', >>> >> >>> 'hadoop.security.authentication' = 'kerberos', >>> >> >>> 'hadoop.kerberos.keytab' = '/home/techuser/doris/hive.keytab', >>> >> >>> 'hadoop.kerberos.principal' = 'hive/instanc...@hadoop.com', >>> >> >>> 'yarn.resourcemanager.principal' = 'yarn/ >>> instanc...@hadoop.com' >>> >> >>> ); >>> >> >>> >>> >> >>> *2. CDP Hive Version:* >>> >> >>> >>> >> >>> The CDP Hive version I am using is: Hive 3.1.3000.7.1.7.0-551 >>> >> >>> >>> >> >>> *3. Configuration Attempt:* >>> >> >>> >>> >> >>> I have tried specifying the Hive version in my catalog >>> configuration >>> >> with >>> >> >>> the following statement: >>> >> >>> "hive.version=2.1.0" >>> >> >>> Despite attempting to set the Hive version to 2.1.0, I continue to >>> >> >>> experience the same issue, which is detailed in my previous >>> emails. >>> >> >>> >>> >> >>> *4. Logs: * >>> >> >>> I will attach both the fe.log and fe.warn logs to this email for >>> your >>> >> >>> reference. These logs should provide additional context regarding >>> the >>> >> issue >>> >> >>> I'm facing. >>> >> >>> >>> >> >>> Once again, I want to express my gratitude for your assistance and >>> >> support >>> >> >>> in resolving this matter. I look forward to your insights and >>> >> >>> recommendations based on the provided logs and catalog >>> configuration. >>> >> >>> >>> >> >>> Please feel free to let me know if you require any further >>> >> information or >>> >> >>> if there are additional steps I should take to assist in >>> diagnosing >>> >> and >>> >> >>> resolving the issue. >>> >> >>> >>> >> >>> [image: doris_error.JPG] >>> >> >>> fe.log log at hive catlog creation time >>> >> >>> [image: fe_log_at_hivecatalog.JPG] >>> >> >>> Hive managed table script >>> >> >>> [image: image.png] >>> >> >>> >>> >> >>> >>> >> >>> Best regards, >>> >> >>> Joice >>> >> >>> >>> >> >>> On Thu, Oct 5, 2023 at 8:10 AM Jiafeng.Zhang <zhang...@gmail.com> >>> >> wrote: >>> >> >>> >>> >> >>>> Can you provide your hive catalog creation statement, your fe.log >>> >> log at >>> >> >>>> that time, and your cdp hive version? This will help us locate >>> the >>> >> >>>> problem, >>> >> >>>> thank you. >>> >> >>>> You can also try specifying your hive version in your catalog >>> >> statement: >>> >> >>>> "hive.version=2.1.0" >>> >> >>>> >>> >> >>>> Joice Jacob <joicejacob1...@gmail.com> 于2023年10月5日周四 00:45写道: >>> >> >>>> >>> >> >>>> > I am reaching out once again to seek assistance and share a >>> >> specific >>> >> >>>> issue >>> >> >>>> > I've encountered while integrating Doris with Cloudera-flavored >>> >> Hive. >>> >> >>>> The >>> >> >>>> > error message I'm facing is as follows: >>> >> >>>> > detailMessage = get file split failed for table: baby1, err: >>> >> >>>> > java.lang.Exception: Hive 2.x versioned full-acid tables need >>> to >>> >> run >>> >> >>>> major >>> >> >>>> > compaction. >>> >> >>>> > >>> >> >>>> > This error message appears when I attempt to query a Hive table >>> >> from >>> >> >>>> > Doris, and it seems to be related to Hive 2.x versioned >>> full-ACID >>> >> tables >>> >> >>>> > requiring a major compaction. >>> >> >>>> > I would like to ask the Doris community for guidance on how to >>> >> handle >>> >> >>>> this >>> >> >>>> > issue effectively. Specifically, I am interested in >>> understanding >>> >> the >>> >> >>>> best >>> >> >>>> > practices and steps to follow when dealing with >>> Cloudera-flavored >>> >> Hive >>> >> >>>> > tables that require major compaction for Doris integration. >>> >> >>>> > >>> >> >>>> > If anyone in the community has successfully addressed this >>> issue >>> >> or can >>> >> >>>> > provide insights into how to configure and manage >>> >> Cloudera-flavored Hive >>> >> >>>> > tables for integration with Doris, your expertise would be >>> highly >>> >> >>>> > appreciated. >>> >> >>>> > >>> >> >>>> > Thank you for your time and support, and I look forward to >>> >> receiving >>> >> >>>> your >>> >> >>>> > valuable input. >>> >> >>>> > Joice >>> >> >>>> > >>> >> >>>> > On Wed, Oct 4, 2023 at 9:23 PM Joice Jacob < >>> >> joicejacob1...@gmail.com> >>> >> >>>> > wrote: >>> >> >>>> > >>> >> >>>> >> Dear Doris Community, >>> >> >>>> >> I have an update on the issue I previously mentioned >>> regarding the >>> >> >>>> >> integration of Hive on a CDP distribution with Doris 2.0.1.1. >>> >> >>>> >> >>> >> >>>> >> After further investigation, I have identified that the issue >>> is >>> >> >>>> related >>> >> >>>> >> to Hive managed tables being transactional by default. This >>> >> appears to >>> >> >>>> be >>> >> >>>> >> causing the error I encountered earlier. >>> >> >>>> >> >>> >> >>>> >> To address this issue, I am seeking guidance from the >>> community >>> >> on any >>> >> >>>> >> specific configurations or settings that need to be adjusted >>> for >>> >> Hive >>> >> >>>> >> transactional tables when using Hive as the catalog in Doris. >>> Are >>> >> >>>> there any >>> >> >>>> >> recommended configurations or best practices that I should >>> follow >>> >> to >>> >> >>>> ensure >>> >> >>>> >> smooth integration and query execution? >>> >> >>>> >> >>> >> >>>> >> Any insights or recommendations from the Doris community >>> would be >>> >> >>>> greatly >>> >> >>>> >> appreciated. Your expertise and guidance will be instrumental >>> in >>> >> >>>> helping me >>> >> >>>> >> resolve this challenge. >>> >> >>>> >> >>> >> >>>> >> Thank you for your continued support, and I look forward to >>> your >>> >> >>>> valuable >>> >> >>>> >> input. >>> >> >>>> >> >>> >> >>>> >> Best regards, >>> >> >>>> >> Joice >>> >> >>>> >> >>> >> >>>> >> On Wed, Oct 4, 2023 at 6:54 PM Joice Jacob < >>> >> joicejacob1...@gmail.com> >>> >> >>>> >> wrote: >>> >> >>>> >> >>> >> >>>> >>> Dear Doris Community, >>> >> >>>> >>> >>> >> >>>> >>> I am reaching out to the community to seek assistance with an >>> >> >>>> >>> integration issue I've encountered while trying to use Hive >>> on a >>> >> CDP >>> >> >>>> >>> distribution with Doris 2.0.1.1. >>> >> >>>> >>> >>> >> >>>> >>> Here are the details of my setup: >>> >> >>>> >>> >>> >> >>>> >>> Doris Version: 2.0.1.1 >>> >> >>>> >>> Hive Version: 3.1.3 >>> >> >>>> >>> Cluster Security: Kerberized >>> >> >>>> >>> >>> >> >>>> >>> I have successfully created a Hive catalog in Doris and have >>> >> been able >>> >> >>>> >>> to set up the integration between Hive and Doris. However, >>> when I >>> >> >>>> attempt >>> >> >>>> >>> to query a Hive table using Doris, I encounter the following >>> >> error: >>> >> >>>> >>> >>> >> >>>> >>> ERROR 1105 (HY000): errCode = 2, detailMessage = get file >>> split >>> >> failed >>> >> >>>> >>> for table: baby1, err: java.lang.Exception: Hive 2.x >>> versioned >>> >> >>>> full-acid >>> >> >>>> >>> tables need to run major compaction. >>> >> >>>> >>> >>> >> >>>> >>> I have already performed a major compaction as recommended, >>> but >>> >> I am >>> >> >>>> >>> still encountering the same error. >>> >> >>>> >>> >>> >> >>>> >>> I would greatly appreciate any insights, guidance, or >>> solutions >>> >> that >>> >> >>>> the >>> >> >>>> >>> Doris community can offer to help me resolve this issue. If >>> >> anyone has >>> >> >>>> >>> encountered a similar problem or has expertise in integrating >>> >> Hive >>> >> >>>> with >>> >> >>>> >>> Doris, your assistance would be invaluable. >>> >> >>>> >>> >>> >> >>>> >>> Thank you in advance for your time and support. I look >>> forward to >>> >> >>>> >>> hearing from the community and working together to find a >>> >> solution to >>> >> >>>> this >>> >> >>>> >>> challenge. >>> >> >>>> >>> >>> >> >>>> >>> [image: baby_table.JPG] >>> >> >>>> >>> [image: doris_hive_catlog_result.JPG] >>> >> >>>> >>> >>> >> >>>> >>> [image: doris_fe_log.JPG] >>> >> >>>> >>> >>> >> >>>> >>> [image: hive_metastore_error.JPG] >>> >> >>>> >>> >>> >> >>>> >>> Thanks >>> >> >>>> >>> Joice >>> >> >>>> >>> >>> >> >>>> >>> >>> >> >>>> >>> >>> >> >>>> >>> >>> >> >>>> >>> >>> >> >>>> >>> >> >>>> -- >>> >> >>>> 张家峰 >>> >> >>>> >>> >> >>> >>> >> >>> > >>> >>