Hi Joice, Got it, thanks very much. We will setup a cdp env and do some compatibility work.
Best regards, Kaka Joice Jacob <joicejacob1...@gmail.com> 于2023年10月7日周六 14:58写道: > Hi,Dear Doris Community, > I wanted to provide an update on my recent experience with integrating > Doris with both Apache Hive and CDP Hive, as well as seek insights into a > particular observation. > > In my integration with Apache Hive, I noticed that transaction tables > retain the "_orc_acid_version" file, which aligns with the expected > behavior. However, when I integrated with CDP Hive, I observed that this > "_orc_acid_version" file was conspicuously absent. > > This discrepancy in behavior between the two environments has left me > somewhat perplexed, and I'm keen to understand the underlying reasons > behind it. It appears that in CDP Hive, *managed tables are set to be > transactional by default*. Interestingly, when we create managed tables > with the TBLPROPERTIES("transactional"="false") option, they are seemingly > *treated > as external tables*. > > I've attached screenshots to illustrate this scenario for further clarity. > CREATE TABLE BABY2( > id int, > FNAME VARCHAR(50), > GENDER VARCHAR(2), > TOTCOUNT INT > ) > PARTITIONED BY(DATA_ID INT) > TBLPROPERTIES ( > 'transactional'='false', > 'orc.compress'='snappy'); > > if we execute above statement table create like this. > [image: image.png] > > I would greatly appreciate any insights, suggestions, or explanations that > the Doris community may have regarding this behavior. Your expertise will > be invaluable in helping me navigate this integration and address any > associated challenges. > > Thank you in advance for your time and support. I look forward to hearing > from you and to collaborating with the community to better understand and > resolve this matter. > > Best regards, > joice > > On Sat, Oct 7, 2023 at 11:46 AM kaka chen <kaka11.c...@gmail.com> wrote: > >> Hi Joice: >> Thanks your reporting. >> >> It seems the root cause of this issue is missing "_orc_acid_version" file. >> From Hive version >= 3.0, delta/base files will always have file >> '_orc_acid_version' with value >= '2'. >> Maybe the hive3 of HDP has similar issue? >> https://issues.apache.org/jira/browse/HIVE-16964 >> >> A workaround is try to create table without transactional props. >> TBLPROPERTIES("transactional"="true") >> pls try it, thanks. >> >> Best regards, >> Kaka >> >> >> Joice Jacob <joicejacob1...@gmail.com> 于2023年10月6日周五 22:58写道: >> >> > Hi, >> > I've checked the Hive data directory, and I couldn't find the >> > "_orc_acid_version" file. I have attached screenshots for your >> reference. >> > >> > Are there any recommended workarounds or alternative approaches that I >> can >> > consider to resolve this issue? I'm open to exploring different >> solutions >> > to ensure the successful integration of Doris with Hive in CDP. Any >> > guidance or suggestions would be greatly appreciated. >> > >> > Screenshots attached for your reference. >> > >> > Best regards, >> > Joice >> > >> > [image: image.png] >> > >> > On Fri, Oct 6, 2023 at 6:28 PM Mingyu Chen <morning...@163.com> wrote: >> > >> >> Could you list your hive data dir, to see what files exist? like: >> >> _orc_acid_version >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> >> Best Regards >> >> Mingyu Chen >> >> >> >> Email: >> >> morning...@apache.org >> >> >> >> >> >> >> >> >> >> >> >> At 2023-10-05 22:31:58, "Mingyu Chen" <morning...@163.com> wrote: >> >> >Oh, I see, let me check it again. >> >> > >> >> > >> >> > >> >> > >> >> >-- >> >> > >> >> >Best Regards >> >> >Mingyu Chen >> >> > >> >> >Email: >> >> >morning...@apache.org >> >> > >> >> > >> >> > >> >> > >> >> > >> >> >At 2023-10-05 21:49:28, "Joice Jacob" <joicejacob1...@gmail.com> >> wrote: >> >> >>Hi, >> >> >>I wanted to share some important information regarding the CDP Hive >> >> version >> >> >>that I am currently using, which is Hive 3.1.3000.7.1.7.0-551. >> >> >> >> >> >>In CDP Hive version 3.x, a significant change has been introduced >> >> regarding >> >> >>managed tables. By default, managed tables in Hive 3.x are considered >> >> >>transactional. This means that if you create a managed table >> explicitly >> >> >>specifying it as transactional with a value of "false," it will be >> >> treated >> >> >>as an external table instead. >> >> >>Thank you for your attention to this matter, and I appreciate your >> >> >>continued support. >> >> >> >> >> >> https://www.thecodersstop.com/hadoop/apache-hive-3-changes-in-cdp-upgrade-part-1/ >> >> >>Best regards, >> >> >>Joice >> >> >> >> >> >>On Thu, Oct 5, 2023 at 7:01 PM Mingyu Chen <morning...@163.com> >> wrote: >> >> >> >> >> >>> I saw that your hive table is with property "transactional" = >> "true", >> >> >>> And Doris only support ACID table with Hive 3, not support with >> Hive >> >> 2.x. >> >> >>> So you may need to create non-transactional hive table for Doris to >> >> visit. >> >> >>> >> >> >>> >> >> >>> -- >> >> >>> Best Regards >> >> >>> Mingyu Chen >> >> >>> >> >> >>> Email: >> >> >>> morning...@apache.org >> >> >>> >> >> >>> >> >> >>> 在 2023-10-05 13:30:25,"Joice Jacob" <joicejacob1...@gmail.com> 写道: >> >> >>> >> >> >>> Dear Doris Community & Jiafeng.Zhang >> >> >>> , >> >> >>> Thank you for your prompt response and your willingness to assist >> >> with the >> >> >>> issue I've been encountering with the integration of Doris and >> >> Cloudera's >> >> >>> Hive. >> >> >>> >> >> >>> To provide you with the necessary information, here are the details >> >> you >> >> >>> requested: >> >> >>> >> >> >>> *1. Hive Catalog Creation Statement:* >> >> >>> >> >> >>> My Hive catalog creation statement is as follows: >> >> >>> >> >> >>> CREATE CATALOG hive PROPERTIES ( >> >> >>> 'type'='hms', >> >> >>> 'hive.metastore.uris' = 'thrift://10.128.0.4:9083', >> >> >>> 'hive.version' = '3.1.3', >> >> >>> 'hive.metastore.sasl.enabled' = 'true', >> >> >>> 'hive.metastore.kerberos.principal' = 'hive/ >> instanc...@hadoop.com >> >> ', >> >> >>> 'hadoop.security.authentication' = 'kerberos', >> >> >>> 'hadoop.kerberos.keytab' = '/home/techuser/doris/hive.keytab', >> >> >>> 'hadoop.kerberos.principal' = 'hive/instanc...@hadoop.com', >> >> >>> 'yarn.resourcemanager.principal' = 'yarn/instanc...@hadoop.com >> ' >> >> >>> ); >> >> >>> >> >> >>> *2. CDP Hive Version:* >> >> >>> >> >> >>> The CDP Hive version I am using is: Hive 3.1.3000.7.1.7.0-551 >> >> >>> >> >> >>> *3. Configuration Attempt:* >> >> >>> >> >> >>> I have tried specifying the Hive version in my catalog >> configuration >> >> with >> >> >>> the following statement: >> >> >>> "hive.version=2.1.0" >> >> >>> Despite attempting to set the Hive version to 2.1.0, I continue to >> >> >>> experience the same issue, which is detailed in my previous emails. >> >> >>> >> >> >>> *4. Logs: * >> >> >>> I will attach both the fe.log and fe.warn logs to this email for >> your >> >> >>> reference. These logs should provide additional context regarding >> the >> >> issue >> >> >>> I'm facing. >> >> >>> >> >> >>> Once again, I want to express my gratitude for your assistance and >> >> support >> >> >>> in resolving this matter. I look forward to your insights and >> >> >>> recommendations based on the provided logs and catalog >> configuration. >> >> >>> >> >> >>> Please feel free to let me know if you require any further >> >> information or >> >> >>> if there are additional steps I should take to assist in diagnosing >> >> and >> >> >>> resolving the issue. >> >> >>> >> >> >>> [image: doris_error.JPG] >> >> >>> fe.log log at hive catlog creation time >> >> >>> [image: fe_log_at_hivecatalog.JPG] >> >> >>> Hive managed table script >> >> >>> [image: image.png] >> >> >>> >> >> >>> >> >> >>> Best regards, >> >> >>> Joice >> >> >>> >> >> >>> On Thu, Oct 5, 2023 at 8:10 AM Jiafeng.Zhang <zhang...@gmail.com> >> >> wrote: >> >> >>> >> >> >>>> Can you provide your hive catalog creation statement, your fe.log >> >> log at >> >> >>>> that time, and your cdp hive version? This will help us locate the >> >> >>>> problem, >> >> >>>> thank you. >> >> >>>> You can also try specifying your hive version in your catalog >> >> statement: >> >> >>>> "hive.version=2.1.0" >> >> >>>> >> >> >>>> Joice Jacob <joicejacob1...@gmail.com> 于2023年10月5日周四 00:45写道: >> >> >>>> >> >> >>>> > I am reaching out once again to seek assistance and share a >> >> specific >> >> >>>> issue >> >> >>>> > I've encountered while integrating Doris with Cloudera-flavored >> >> Hive. >> >> >>>> The >> >> >>>> > error message I'm facing is as follows: >> >> >>>> > detailMessage = get file split failed for table: baby1, err: >> >> >>>> > java.lang.Exception: Hive 2.x versioned full-acid tables need to >> >> run >> >> >>>> major >> >> >>>> > compaction. >> >> >>>> > >> >> >>>> > This error message appears when I attempt to query a Hive table >> >> from >> >> >>>> > Doris, and it seems to be related to Hive 2.x versioned >> full-ACID >> >> tables >> >> >>>> > requiring a major compaction. >> >> >>>> > I would like to ask the Doris community for guidance on how to >> >> handle >> >> >>>> this >> >> >>>> > issue effectively. Specifically, I am interested in >> understanding >> >> the >> >> >>>> best >> >> >>>> > practices and steps to follow when dealing with >> Cloudera-flavored >> >> Hive >> >> >>>> > tables that require major compaction for Doris integration. >> >> >>>> > >> >> >>>> > If anyone in the community has successfully addressed this issue >> >> or can >> >> >>>> > provide insights into how to configure and manage >> >> Cloudera-flavored Hive >> >> >>>> > tables for integration with Doris, your expertise would be >> highly >> >> >>>> > appreciated. >> >> >>>> > >> >> >>>> > Thank you for your time and support, and I look forward to >> >> receiving >> >> >>>> your >> >> >>>> > valuable input. >> >> >>>> > Joice >> >> >>>> > >> >> >>>> > On Wed, Oct 4, 2023 at 9:23 PM Joice Jacob < >> >> joicejacob1...@gmail.com> >> >> >>>> > wrote: >> >> >>>> > >> >> >>>> >> Dear Doris Community, >> >> >>>> >> I have an update on the issue I previously mentioned regarding >> the >> >> >>>> >> integration of Hive on a CDP distribution with Doris 2.0.1.1. >> >> >>>> >> >> >> >>>> >> After further investigation, I have identified that the issue >> is >> >> >>>> related >> >> >>>> >> to Hive managed tables being transactional by default. This >> >> appears to >> >> >>>> be >> >> >>>> >> causing the error I encountered earlier. >> >> >>>> >> >> >> >>>> >> To address this issue, I am seeking guidance from the community >> >> on any >> >> >>>> >> specific configurations or settings that need to be adjusted >> for >> >> Hive >> >> >>>> >> transactional tables when using Hive as the catalog in Doris. >> Are >> >> >>>> there any >> >> >>>> >> recommended configurations or best practices that I should >> follow >> >> to >> >> >>>> ensure >> >> >>>> >> smooth integration and query execution? >> >> >>>> >> >> >> >>>> >> Any insights or recommendations from the Doris community would >> be >> >> >>>> greatly >> >> >>>> >> appreciated. Your expertise and guidance will be instrumental >> in >> >> >>>> helping me >> >> >>>> >> resolve this challenge. >> >> >>>> >> >> >> >>>> >> Thank you for your continued support, and I look forward to >> your >> >> >>>> valuable >> >> >>>> >> input. >> >> >>>> >> >> >> >>>> >> Best regards, >> >> >>>> >> Joice >> >> >>>> >> >> >> >>>> >> On Wed, Oct 4, 2023 at 6:54 PM Joice Jacob < >> >> joicejacob1...@gmail.com> >> >> >>>> >> wrote: >> >> >>>> >> >> >> >>>> >>> Dear Doris Community, >> >> >>>> >>> >> >> >>>> >>> I am reaching out to the community to seek assistance with an >> >> >>>> >>> integration issue I've encountered while trying to use Hive >> on a >> >> CDP >> >> >>>> >>> distribution with Doris 2.0.1.1. >> >> >>>> >>> >> >> >>>> >>> Here are the details of my setup: >> >> >>>> >>> >> >> >>>> >>> Doris Version: 2.0.1.1 >> >> >>>> >>> Hive Version: 3.1.3 >> >> >>>> >>> Cluster Security: Kerberized >> >> >>>> >>> >> >> >>>> >>> I have successfully created a Hive catalog in Doris and have >> >> been able >> >> >>>> >>> to set up the integration between Hive and Doris. However, >> when I >> >> >>>> attempt >> >> >>>> >>> to query a Hive table using Doris, I encounter the following >> >> error: >> >> >>>> >>> >> >> >>>> >>> ERROR 1105 (HY000): errCode = 2, detailMessage = get file >> split >> >> failed >> >> >>>> >>> for table: baby1, err: java.lang.Exception: Hive 2.x versioned >> >> >>>> full-acid >> >> >>>> >>> tables need to run major compaction. >> >> >>>> >>> >> >> >>>> >>> I have already performed a major compaction as recommended, >> but >> >> I am >> >> >>>> >>> still encountering the same error. >> >> >>>> >>> >> >> >>>> >>> I would greatly appreciate any insights, guidance, or >> solutions >> >> that >> >> >>>> the >> >> >>>> >>> Doris community can offer to help me resolve this issue. If >> >> anyone has >> >> >>>> >>> encountered a similar problem or has expertise in integrating >> >> Hive >> >> >>>> with >> >> >>>> >>> Doris, your assistance would be invaluable. >> >> >>>> >>> >> >> >>>> >>> Thank you in advance for your time and support. I look >> forward to >> >> >>>> >>> hearing from the community and working together to find a >> >> solution to >> >> >>>> this >> >> >>>> >>> challenge. >> >> >>>> >>> >> >> >>>> >>> [image: baby_table.JPG] >> >> >>>> >>> [image: doris_hive_catlog_result.JPG] >> >> >>>> >>> >> >> >>>> >>> [image: doris_fe_log.JPG] >> >> >>>> >>> >> >> >>>> >>> [image: hive_metastore_error.JPG] >> >> >>>> >>> >> >> >>>> >>> Thanks >> >> >>>> >>> Joice >> >> >>>> >>> >> >> >>>> >>> >> >> >>>> >>> >> >> >>>> >>> >> >> >>>> >>> >> >> >>>> >> >> >>>> -- >> >> >>>> 张家峰 >> >> >>>> >> >> >>> >> >> >> > >> >