Hi, I have a query of the following form running in Hive. The UDF is slow and is the bottleneck for the query.

    SELECT X FROM A WHERE Y != 0 AND SlowUDF(Z)
    UNION ALL
    SELECT X FROM B WHERE SlowUDF(Z)

Table A is 10x larger than table B. However, only 1% of the tuples in table A satisfy Y != 0, so table B makes 10x more calls to the UDF than table A if we split them equally. This makes the query very slow: the mappers for table A finish in 10 minutes, while the mappers for table B take about 100 minutes.

Is there any way to set the split size per table? I searched the web but didn't find anything.

Thank you!

Best,
Wenlei