I set hiveContext.setConf("spark.sql.orc.filterPushdown", "true"), but
according to the log, no ORC pushdown predicate is generated for my query
with the WHERE clause:
15/10/09 19:16:01 DEBUG OrcInputFormat: No ORC pushdown predicate
I do not understand what is wrong with this.
BR,
Patcharee
On 09. okt. 2015 19:10, Zhan Zhang wrote:
In your case, you manually set an AND pushdown, and the predicate is
correct for that setting: leaf-0 = (EQUALS x 320)
The right way is to enable predicate pushdown as follows:
sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
Thanks.
Zhan Zhang
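
A minimal spark-shell sketch of the setting above, assuming a Spark 1.5 build with Hive support and the shell-provided `sc`. The temporary directory and the three sample rows are hypothetical stand-ins for the real table. The sketch reads through the ORC data source because it is an assumption, not something confirmed in this thread, that `spark.sql.orc.filterPushdown` is honored on that path rather than on a Hive table scan.

```scala
import java.nio.file.Files
import org.apache.spark.sql.hive.HiveContext

// `sc` is the SparkContext that spark-shell provides (assumption).
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

// Enable ORC predicate pushdown before building the query.
hiveContext.setConf("spark.sql.orc.filterPushdown", "true")

// Hypothetical sample data written to a fresh directory under a temp folder.
val path = Files.createTempDirectory("orc-pushdown-demo").resolve("data").toString
sc.parallelize(Seq((320, 117, 2), (320, 5, 3), (1, 117, 4)))
  .toDF("x", "y", "z")
  .write.format("orc").save(path)

// Filter on the non-partition columns x and y, as in the query from this thread.
val filtered = hiveContext.read.format("orc").load(path)
  .filter("x = 320 and y = 117")

filtered.explain(true)          // print the logical and physical plans
val matches = filtered.count()  // force execution so OrcInputFormat logs appear
println(s"matching rows: $matches")
```

With DEBUG logging turned on, the OrcInputFormat lines quoted in this thread ("ORC pushdown predicate: ..." or "No ORC pushdown predicate") indicate whether a predicate reached the reader for this run.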
On Oct 9, 2015, at 9:58 AM, patcharee <patcharee.thong...@uni.no> wrote:
Hi Zhan Zhang
Actually, my query has a WHERE clause: "select date, month, year, hh,
(u*0.9122461 - v*-0.40964267), (v*0.9122461 + u*-0.40964267), z from
4D where x = 320 and y = 117 and zone == 2 and year=2009 and z >= 2
and z <= 8". Columns "x" and "y" are not partition columns; the others
are partition columns. I expected the system to use predicate pushdown.
I turned on debug logging and found that no pushdown predicate was
generated ("DEBUG OrcInputFormat: No ORC pushdown predicate").
Then I tried to set the search argument explicitly (on column "x",
which is not a partition column):
val xs = SearchArgumentFactory.newBuilder().startAnd().equals("x",
320).end().build()
hiveContext.setConf("hive.io.file.readcolumn.names", "x")
hiveContext.setConf("sarg.pushdown", xs.toKryo())
This time the log shows that a pushdown predicate was generated, but the
results were wrong (no results at all):
15/10/09 18:36:06 INFO OrcInputFormat: ORC pushdown predicate: leaf-0
= (EQUALS x 320)
expr = leaf-0
Any ideas what is wrong with this? Why is the ORC pushdown predicate not
applied by the system?
BR,
Patcharee
On 09. okt. 2015 18:31, Zhan Zhang wrote:
Hi Patcharee,
From the query, it looks like only column pruning will be applied.
Partition pruning and predicate pushdown have no effect. Do you see a
big IO difference between the two methods?
One potential reason for the speed difference that I can think of is
a difference in the OrcInputFormat version. The Hive path may use
NewOrcInputFormat, but the Spark path uses OrcInputFormat.
Thanks.
Zhan Zhang
On Oct 8, 2015, at 11:55 PM, patcharee <patcharee.thong...@uni.no> wrote:
Yes, predicate pushdown is enabled, but the second method still takes
longer than the first.
BR,
Patcharee
On 08. okt. 2015 18:43, Zhan Zhang wrote:
Hi Patcharee,
Did you enable the predicate pushdown in the second method?
Thanks.
Zhan Zhang
On Oct 8, 2015, at 1:43 AM, patcharee <patcharee.thong...@uni.no> wrote:
Hi,
I am using Spark SQL 1.5 to query a Hive table stored as partitioned
ORC files. There are about 6000 files in total, and each file is
about 245 MB.
What is the difference between the two query methods below?
1. Querying the Hive table directly
hiveContext.sql("select col1, col2 from table1")
2. Reading from the ORC files, registering a temp table, and querying
the temp table
val c =
hiveContext.read.format("orc").load("/apps/hive/warehouse/table1")
c.registerTempTable("regTable")
hiveContext.sql("select col1, col2 from regTable")
When the number of files is large (querying all 6000 files), the
second method is much slower than the first one. Any ideas why?
BR,
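
One way to put numbers on the difference between the two methods above is a small wall-clock timing helper. This is only a sketch: `timed` is a hypothetical helper name, and the summation is a stand-in workload so the snippet runs without a cluster.

```scala
// Runs `body` once and returns its result together with the elapsed
// wall-clock time in milliseconds.
def timed[T](body: => T): (T, Long) = {
  val start = System.nanoTime()
  val result = body
  val elapsedMs = (System.nanoTime() - start) / 1000000L
  (result, elapsedMs)
}

// Stand-in workload; replace it with a Spark action in spark-shell.
val (sum, ms) = timed { (1 to 1000).sum }
println(s"sum=$sum took $ms ms")
```

In spark-shell, the same helper could wrap each method, for example `timed { hiveContext.sql("select col1, col2 from table1").count() }` and the equivalent query on `regTable`, using the same action in both cases so the comparison is like for like. Calling `explain(true)` on each query additionally shows whether the two plans differ.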