Hm, the existence queries even in 2.4.x had LIMIT 1. Are you sure nothing
else is generating or changing those queries?
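
For reference, the two existence-check shapes mentioned further down in this
thread can be sketched in plain Java as below. This is a simplified model for
comparison only, not Spark's actual source; the class and method names are
hypothetical, and the alias is copied from the DB logs quoted below.

```java
// Simplified model of the query shapes named in this thread.
// NOT Spark's source code; class and method names are hypothetical.
final class ExistenceQuerySketch {

    // Shape that returns at most one row: "SELECT 1 ... LIMIT 1".
    static String limitOneForm(String table) {
        return "SELECT 1 FROM " + table + " LIMIT 1";
    }

    // Shape that returns zero rows: "SELECT * ... WHERE 1=0".
    static String whereFalseForm(String table) {
        return "SELECT * FROM " + table + " WHERE 1=0";
    }

    // Wraps a user query as a derived table, using the alias seen in the DB logs.
    static String asSubquery(String inputQuery) {
        return "(" + inputQuery + ") SPARK_GEN_SUB_0";
    }

    public static void main(String[] args) {
        String table = asSubquery("SELECT * FROM ONE_MILLION_ROWS_TABLE");
        System.out.println(limitOneForm(table));
        System.out.println(whereFalseForm(table));
    }
}
```

Both shapes bound the result to at most one row, so a bare `SELECT 1 FROM (...)`
with neither LIMIT nor WHERE matches neither of them.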

On Thu, Nov 17, 2022 at 11:20 AM Ramakrishna Rayudu <
ramakrishna560.ray...@gmail.com> wrote:

> We are using Spark version 2.4.4.
> I can see two types of queries in the DB logs.
>
> SELECT 1 FROM (INPUT_QUERY) SPARK_GEN_SUB_0
>
> SELECT * FROM (INPUT_QUERY) SPARK_GEN_SUB_0 WHERE 1=0
>
> The `SELECT *` query ends with `WHERE 1=0`, but the query that starts
> with `SELECT 1` has no WHERE condition.
>
> Thanks,
> Rama
>
> On Thu, Nov 17, 2022, 10:39 PM Sean Owen <sro...@gmail.com> wrote:
>
>> Hm, actually that doesn't look like the queries that Spark uses to test
>> existence, which will be "SELECT 1 ... LIMIT 1" or "SELECT * ... WHERE 1=0"
>> depending on the dialect. What version, and are you sure something else is
>> not sending those queries?
>>
>> On Thu, Nov 17, 2022 at 11:02 AM Ramakrishna Rayudu <
>> ramakrishna560.ray...@gmail.com> wrote:
>>
>>> Hi Sean,
>>>
>>> Thanks for your response. I think it does have a performance impact: if
>>> the query returns one million rows, then the response itself will contain
>>> one million rows unnecessarily, like below.
>>>
>>> 1
>>> 1
>>> 1
>>> 1
>>> .
>>> .
>>> 1
>>>
>>>
>>> This impacts performance. Is there an alternative solution for this?
>>>
>>> Thanks,
>>> Rama
>>>
>>>
>>> On Thu, Nov 17, 2022, 10:17 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> This is a query to check the existence of the table upfront.
>>>> It is nearly a no-op query; can it have a perf impact?
>>>>
>>>> On Thu, Nov 17, 2022 at 10:42 AM Ramakrishna Rayudu <
>>>> ramakrishna560.ray...@gmail.com> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> I am facing an issue. Can you please help me with it?
>>>>>
>>>>> We are connecting to Teradata from Spark SQL with the API below:
>>>>>
>>>>> Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, 
>>>>> connectionProperties);
>>>>>
>>>>> When we execute the above logic on a large table with a million rows,
>>>>> we see the extra query below executed every time, which results in a
>>>>> performance hit on the DB.
>>>>>
>>>>> We got the information below from our DBA. We don't have any logs on the
>>>>> Spark SQL side.
>>>>>
>>>>> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
>>>>>
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>>
>>>>> Can you please clarify why this query is executing? Is there any chance
>>>>> that this type of query is being issued from our own code when it checks
>>>>> the row count of the DataFrame?
>>>>>
>>>>> Please share your input on this.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Rama
>>>>>
>>>>
