Hi,

I have a query of the following form running in Hive. The UDF is slow and
is the bottleneck for the query.

SELECT X FROM A WHERE Y != 0 AND SlowUDF(Z)
UNION ALL
SELECT X FROM B WHERE SlowUDF(Z)

Table A is 10x larger than table B. However, only 1% of the tuples in table A
satisfy Y!=0, so table B makes 10x more calls to the UDF than table A if we
split them equally. This makes the query very slow, since the mappers for
table A finish in 10 minutes while the mappers for table B require about 100
minutes.
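
To make the arithmetic concrete, here is a rough sketch in Python. The row
count for table B is made up for illustration; only the 10x and 1% ratios
come from my tables. It also assumes that the cheap Y!=0 filter is evaluated
before SlowUDF, so rows that fail it never reach the UDF.

```python
# Hypothetical row count for table B; only the ratios below are real.
rows_b = 1_000_000
rows_a = 10 * rows_b            # table A is 10x larger than table B

# Assumption: Y != 0 is checked first, so only 1% of A's rows reach SlowUDF.
udf_calls_a = rows_a // 100
udf_calls_b = rows_b            # every row of B reaches SlowUDF

ratio = udf_calls_b / udf_calls_a
print(f"UDF calls on A: {udf_calls_a}")
print(f"UDF calls on B: {udf_calls_b}")
print(f"B makes {ratio:.0f}x more UDF calls than A")
```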

Is there any way to set the split size per table? I searched the web but
didn't find one.


Thank you !

Best,
Wenlei
