After upgrading from 1.4.1 to 1.5.1, I found that some of my Spark SQL queries
no longer worked. The problem seems to be related to using count(1) or count(*)
in a nested query. I can reproduce the issue in a pyspark shell with the sample
code below. The 'people' table is from spark-1.5.1-bin-hadoop2.4/examples/
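The sample code referenced above was truncated from this archive. A minimal, runnable illustration of the nested-aggregate pattern being described (count(1) inside a subquery) is sketched below. It uses Python's stdlib sqlite3 rather than Spark, so that it runs standalone; the table contents mirror the bundled people.json example data (Michael, Andy, Justin), which is an assumption about the repro:

```python
import sqlite3

# Runnable stand-in for the lost pyspark repro: a nested query whose inner
# select uses count(1). In the report, this pattern broke on Spark 1.5.1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Michael", None), ("Andy", 30), ("Justin", 19)],
)

# count(1) in a subquery, then selected from the outer query:
row = conn.execute(
    "SELECT cnt FROM (SELECT count(1) AS cnt FROM people) t"
).fetchone()
print(row[0])  # 3
```

In Spark the equivalent would be `sqlContext.sql(...)` over a registered `people` temp table; the SQL text itself is the same shape.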
des would be helpful in debugging?
>
>
>
> On Sat, Oct 3, 2015 at 1:08 PM, Jeff Thompson <
> jeffreykeatingthomp...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm running a simple SQL query over a ~700 million row table of the form:
>>
>> SELECT * FROM my_table WHERE id = '12345';
Hi,

I'm running a simple SQL query over a ~700 million row table of the form:

SELECT * FROM my_table WHERE id = '12345';

When I submit the query via beeline & the JDBC thrift server, it returns in
35s. When I submit the exact same query using Spark SQL from a pyspark shell
(sqlContext.sql("SELECT *
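One way to narrow down where the extra time goes is to time the identical query from both submission paths. A generic DB-API timing helper is sketched below; it is demonstrated against an in-memory sqlite3 table (since a live Spark deployment isn't available here), but the same helper would apply to any DB-API-style connection, e.g. one opened against the thrift server via a bridge such as pyhive (an assumption, not from the thread):

```python
import sqlite3
import time

def timed_query(conn, sql):
    """Run a query, returning (rows, wall-clock seconds) for path comparison."""
    start = time.time()
    rows = conn.execute(sql).fetchall()
    return rows, time.time() - start

# Demonstration table standing in for the ~700M-row my_table in the report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT)")
conn.execute("INSERT INTO my_table VALUES ('12345')")

rows, secs = timed_query(conn, "SELECT * FROM my_table WHERE id = '12345'")
print(len(rows))  # 1
```

Comparing the timings (and the physical plans, via EXPLAIN) from both paths would show whether the slowdown is in planning, execution, or result transfer.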