Instead of >=, can you try = if you only want the top 100? (b being a
partition, I guess a single partition will have more than 100 records,
enough to fill your limit.)
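
A rough sketch of that change, reusing the table and column names from your query. The literal 'c' is only a stand-in for whichever single partition value you actually want, and this returns a different result set than the >= version, so it only helps if one partition is enough:

```sql
-- Equality on the partition column: Hive only needs to read that one partition.
-- Every returned row has the same value of b, so ORDER BY b adds nothing
-- and is dropped here.
SELECT * FROM a WHERE b = 'c' LIMIT 100;
```

Without the ORDER BY, the query may not need a reduce stage at all, though that depends on your Hive version and settings.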

Your table's file format also matters for query performance.
Which one are you using?
How many partitions are there?
What's the size of the cluster?
You can set the number of reducers, but if your query has just one key, then
only one reducer will get the data and the rest will run empty.
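
For reference, a minimal sketch of setting the reducer count for a session and then checking the plan Hive generates. The value 8 is an arbitrary example, not a recommendation for your cluster:

```sql
-- Request 8 reducers for jobs launched from this session (example value only).
SET mapred.reduce.tasks=8;

-- Print the query plan so you can see the map and reduce stages Hive builds.
EXPLAIN
SELECT * FROM a WHERE b >= 'c' ORDER BY b DESC LIMIT 100;
```

As far as I know, ORDER BY in Hive still sends everything through a single reducer to produce a total order, so this setting alone may not change the reduce stage for this particular query.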



On Sat, Mar 23, 2013 at 4:32 AM, Keith Wiley <kwi...@keithwiley.com> wrote:

> The following query translates into a many-map-single-reduce job (which is
> common) and also slags through the reduce stage...it's killing the overall
> query:
>
> select * from a where b >= 'c' order by b desc limit 100
>
> Note that b is a partition.  What component is making the reducer heavy?
>  Is it the order by or the limit (I'm sure it's not the partition-specific
> where clause, right?)?  Are there ways to improve its performance?
>
>
> ________________________________________________________________________________
> Keith Wiley     kwi...@keithwiley.com     keithwiley.com
> music.keithwiley.com
>
> "You can scratch an itch, but you can't itch a scratch. Furthermore, an
> itch can
> itch but a scratch can't scratch. Finally, a scratch can itch, but an itch
> can't
> scratch. All together this implies: He scratched the itch from the scratch
> that
> itched but would never itch the scratch from the itch that scratched."
>                                            --  Keith Wiley
>
> ________________________________________________________________________________
>
>


-- 
Nitin Pawar
