Hi Wojtek, Thanks for explaining the detail. I understand you have a larger amount of data than `hive.fetch.task.conversion.threshold`.
Taking a glance, SimpleFetchOptimizer is likely to respect LIMIT if `hive.fetch.task.caching` is disabled and all predicates are for partition pruning. The case you mentioned [1] would work as far as I tried. Could you check if your query works as expected with `hive.fetch.task.caching=false`? Looks like, HIVE-25976 [2] introduced the option and I'm not sure why Hive ignores LIMIT with it enabled. Regards, Okumin - [1] select * from tbl limit 100 - [2] https://issues.apache.org/jira/browse/HIVE-25976 On Thu, Aug 3, 2023 at 9:14 PM Wojtek Meler <wme...@wp.pl> wrote: > Disabling hive.limit.optimize.enable make situation even worse - TEZ job > scans all files in partition which is unnecessary. > > I've run debugger and discovered that partition size in my table obtained > from metastore exceed hive.fetch.task.conversion.threshold. > It seems that since hive-1.0.1 optimization process changed a lot - > earlier version used estimator to get data size for the job, now hive use > stats from metastore if they are available. > SimpleFetchOpitimizer is unaware of GlobalLimitOptimizer and input data > size reduction through partition prunning. > > Solution for me is to tune the threshold and maybe fill RFE to change > SimpleFetchOptimizer behaviour in case of GlobalLimitOptimizer qualified > the optimize that reduces input size for limit. > > Regards > Wojtek > > Dnia 06 lipca 2023 18:07 Okumin <m...@okumin.com> napisał(a): > > Hi Wojtek, > I tried to submit the query with the given configurations on > Hive 4.0.0-alpha-2 on Tez on YARN. In my environment, the query is > converted to a single fetch task. > Could you please give us the precise revision of Hive, your table > definition, the amount of data, and so on? Also, I'm curious what if you > disable `hive.limit.optimize.enable`. > Regards, > Okumin > On Wed, Jul 5, 2023 at 9:40 PM Wojtek Meler <wme...@wp.pl> wrote: > > Hi, after switching to Hive 4.0 and Tez on yarn I've noticed that simple > fetch queries run much longer. > I have following configuration: > hive.fetch.task.conversion=more > hive.fetch.task.conversion.threshold=1073741824 > hive.limit.optimize.enable=true > hive.limit.optimize.fetch.max=50000 > hive.limit.optimize.limit.file=10 > hive.limit.pushdown.memory.usage=0.1 > hive.limit.row.max.size=100000 > and query > select * from tbl limit 100 > runs on Tez containers instead of being run internaly inside hiveserver2. > How to configure fetch task conversion properly as even clicking on Hue > has bad experience with data previews run on Tez on yarn... > Regards, > Wojtek > > > >