Hi Tan,
It depends on how the data is organised and what your filter is.
For example, in my case I store the data partitioned by the fields time and network_id.
If I filter by time, network_id, or both, together with other fields, Spark only loads
the partitions matching the time and network_id values in the filter, then applies
the remaining filter conditions to the rows it loaded.
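
To make the idea concrete, here is a toy model in plain Python. It is not Spark code and not how Spark is implemented internally; it is only a sketch of the behaviour I described. The partition layout, the host and bytes fields, and the query helper are all made up for illustration; only time and network_id come from my setup.

```python
# Toy model of partition pruning (NOT Spark code; everything here is hypothetical).
# Each key stands for one partition directory, e.g. time=2016-07-07/network_id=1.
partitions = {
    ("2016-07-06", 1): [{"host": "a", "bytes": 10}, {"host": "b", "bytes": 900}],
    ("2016-07-06", 2): [{"host": "c", "bytes": 500}],
    ("2016-07-07", 1): [{"host": "a", "bytes": 700}, {"host": "d", "bytes": 20}],
    ("2016-07-07", 2): [{"host": "e", "bytes": 800}],
}


def query(partitions, time=None, network_id=None, row_filter=lambda r: True):
    """Return (matching rows, number of partitions actually read)."""
    scanned = 0
    out = []
    for (t, n), rows in partitions.items():
        # Step 1: skip whole partitions by looking only at the partition key.
        if time is not None and t != time:
            continue
        if network_id is not None and n != network_id:
            continue
        scanned += 1
        # Step 2: apply the rest of the filter to rows of surviving partitions.
        out.extend(r for r in rows if row_filter(r))
    return out, scanned


# Filter on both partition fields plus one ordinary field.
rows, scanned = query(
    partitions,
    time="2016-07-07",
    network_id=1,
    row_filter=lambda r: r["bytes"] > 100,
)
print(rows, scanned)
```

With the filter on time and network_id only one of the four partitions is read. If you drop those two conditions and keep only the bytes condition, every partition has to be read.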



> On Jul 7, 2016, at 4:43 PM, Ted Yu <[email protected]> wrote:
> 
> Does the filter under consideration operate on sorted column(s) ?
> 
> Cheers
> 
>> On Jul 7, 2016, at 2:25 AM, tan shai <[email protected]> wrote:
>> 
>> Hi, 
>> 
>> I have a sorted dataframe and need to optimize filter operations on it.
>> How does Spark perform filter operations on a sorted dataframe?
>> 
>> Does it scan all the data?
>> 
>> Many thanks. 
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: [email protected]
> 

