[
https://issues.apache.org/jira/browse/HUDI-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Selvaraj periyasamy updated HUDI-1232:
--------------------------------------
Summary: Caching takes lot of time (was: Caching take lot of time)
> Caching takes lot of time
> -------------------------
>
> Key: HUDI-1232
> URL: https://issues.apache.org/jira/browse/HUDI-1232
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Selvaraj periyasamy
> Priority: Major
> Attachments: log1.txt
>
>
> I am using Hudi 0.5.0.
>
> Issue 1) I have 15 different transaction source tables written using
> COPY_ON_WRITE, and all of them are partitioned by transaction_day and
> transaction_hour. Each of them holds more than 6 months of data. I need
> to join all of them, and a few of the tables have to be looked back up to
> 6 months for the join condition. When I run the join, I see the attached
> logs rolling for a long time. For example, listing all the files of one
> table takes more than 5 minutes before the join starts, and the listing
> happens sequentially. When I had to join 6 months of data from two
> different tables, it easily took more than 10 minutes to list the files.
> I have grepped some of the lines and attached them. The file name is
> log1.txt.
>
> Issue 2) After joining all 15 tables, I write the result into another
> Hudi target table called trr. The other slowness I see is that, as the
> number of partitions in the trr table grows, the logs below roll for
> every individual partition even though my write touches only a couple of
> partitions, and this takes up to 4 to 5 minutes. I have pasted only a few
> of them. I am concerned that in the future, when I have 3 years' worth of
> data, every write will be very slow even though it touches only a couple
> of partitions.
>
> 20/08/27 02:08:22 INFO HoodieTableConfig: Loading dataset properties from
> hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr/.hoodie/hoodie.properties
> 20/08/27 02:08:22 INFO HoodieTableMetaClient: Finished Loading Table of type
> COPY_ON_WRITE from
> hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr
> 20/08/27 02:08:22 INFO HoodieActiveTimeline: Loaded instants
> java.util.stream.ReferencePipeline$Head@fed0a8b
> 20/08/27 02:08:22 INFO HoodieTableFileSystemView: Adding file-groups for
> partition :20200714/01, #FileGroups=1
> 20/08/27 02:08:22 INFO AbstractTableFileSystemView: addFilesToView:
> NumFiles=4, FileGroupsCreationTime=0, StoreTimeTaken=1
> 20/08/27 02:08:22 INFO HoodieROTablePathFilter: Based on hoodie metadata from
> base path: hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr,
> caching 1 files under
> hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr/20200714/01
> 20/08/27 02:08:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
> from hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr
> 20/08/27 02:08:22 INFO FSUtils: Hadoop Configuration: fs.defaultFS:
> [hdfs://oprhqanameservice], Config:[Configuration: core-default.xml,
> core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml,
> yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem:
> [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-778362260_1,
> ugi=svchdc36q@VISA.COM (auth:KERBEROS)]]]
> 20/08/27 02:08:22 INFO HoodieTableConfig: Loading dataset properties from
> hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr/.hoodie/hoodie.properties
> 20/08/27 02:08:22 INFO HoodieTableMetaClient: Finished Loading Table of type
> COPY_ON_WRITE from
> hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr
> 20/08/27 02:08:22 INFO HoodieActiveTimeline: Loaded instants
> java.util.stream.ReferencePipeline$Head@285c67a9
> 20/08/27 02:08:22 INFO HoodieTableFileSystemView: Adding file-groups for
> partition :20200714/02, #FileGroups=1
> 20/08/27 02:08:22 INFO AbstractTableFileSystemView: addFilesToView:
> NumFiles=4, FileGroupsCreationTime=0, StoreTimeTaken=0
> 20/08/27 02:08:22 INFO HoodieROTablePathFilter: Based on hoodie metadata from
> base path: hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr,
> caching 1 files under
> hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr/20200714/02
> 20/08/27 02:08:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
> from hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr
> 20/08/27 02:08:22 INFO FSUtils: Hadoop Configuration: fs.defaultFS:
> [hdfs://oprhqanameservice], Config:[Configuration: core-default.xml,
> core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml,
> yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem:
> [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-778362260_1,
> ugi=svchdc36q@VISA.COM (auth:KERBEROS)]]]
> 20/08/27 02:08:22 INFO HoodieTableConfig: Loading dataset properties from
> hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr/.hoodie/hoodie.properties
> 20/08/27 02:08:22 INFO HoodieTableMetaClient: Finished Loading Table of type
> COPY_ON_WRITE from
> hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr
> 20/08/27 02:08:22 INFO HoodieActiveTimeline: Loaded instants
> java.util.stream.ReferencePipeline$Head@2edd9c8
> 20/08/27 02:08:22 INFO HoodieTableFileSystemView: Adding file-groups for
> partition :20200714/03, #FileGroups=1
> 20/08/27 02:08:22 INFO AbstractTableFileSystemView: addFilesToView:
> NumFiles=4, FileGroupsCreationTime=1, StoreTimeTaken=0
> 20/08/27 02:08:22 INFO HoodieROTablePathFilter: Based on hoodie metadata from
> base path: hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr,
> caching 1 files under
> hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr/20200714/03
> 20/08/27 02:08:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
> from hdfs://oprhqanameservice/projects/cdp/data/cdp_reporting/trr
> 20/08/27 02:08:22 INFO FSUtils: Hadoop Configuration: fs.defaultFS:
> [hdfs://oprhqanameservice], Config:[Configuration: core-default.xml,
> core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml,
> yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem:
> [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-778362260_1,
> ugi=svchdc36q@VISA.COM (auth:KERBEROS)]]]
>
>
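As a rough illustration of the growth described in Issue 2, the sketch below enumerates hourly partition paths in the `YYYYMMDD/HH` layout visible in the log excerpt (for example `20200714/01`). The function name `partition_paths`, the chosen date ranges, and the assumption that every hour has its own partition are illustrative only; none of this is Hudi code.

```python
from datetime import datetime, timedelta


def partition_paths(start, end):
    """Return hourly partition paths in [start, end) as 'YYYYMMDD/HH'.

    Hypothetical helper: assumes one partition per hour, matching the
    transaction_day/transaction_hour layout seen in the log excerpt.
    """
    paths = []
    # Align to the start of the hour so each step maps to one partition.
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        paths.append(current.strftime("%Y%m%d/%H"))
        current += timedelta(hours=1)
    return paths


if __name__ == "__main__":
    # Roughly six months of lookback, ending on the date in the log excerpt.
    six_months = partition_paths(datetime(2020, 2, 27), datetime(2020, 8, 27))
    # Three full years of hourly partitions (assumed range).
    three_years = partition_paths(datetime(2020, 1, 1), datetime(2023, 1, 1))
    print(len(six_months), len(three_years))
```

If, as the excerpt suggests, each partition produces its own sequence of HoodieTableMetaClient and HoodieROTablePathFilter log lines, the number of such sequences would scale with these counts rather than with the couple of partitions a single write touches.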
--
This message was sent by Atlassian Jira
(v8.3.4#803005)