Hm, the existence queries even in 2.4.x had LIMIT 1. Are you sure nothing else is generating or changing those queries?
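
One rough way to check a DB log line against the two probe shapes mentioned in this thread is a small plain-Java sketch like the one below. The names `ProbeQueryCheck`, `Kind`, and `classify` are hypothetical and made up for illustration; they are not Spark APIs. The matching is simple string comparison and assumes the probe looks exactly like "SELECT 1 FROM ... LIMIT 1" or "SELECT * FROM ... WHERE 1=0".

```java
import java.util.Locale;

// Hypothetical helper (not part of Spark): classifies a logged SQL statement
// against the two existence-probe shapes described in this thread.
class ProbeQueryCheck {

    enum Kind { LIMIT_1_PROBE, WHERE_FALSE_PROBE, OTHER }

    // Probe shapes as quoted in the thread; the table argument may be a
    // plain table name or a parenthesized subquery with an alias.
    static String limitProbe(String table) {
        return "SELECT 1 FROM " + table + " LIMIT 1";
    }

    static String whereFalseProbe(String table) {
        return "SELECT * FROM " + table + " WHERE 1=0";
    }

    static Kind classify(String sql) {
        // Drop a trailing semicolon, collapse whitespace, ignore case.
        String s = sql.replaceAll(";+\\s*$", "")
                      .trim()
                      .replaceAll("\\s+", " ")
                      .toUpperCase(Locale.ROOT);
        if (s.startsWith("SELECT 1 FROM ") && s.endsWith(" LIMIT 1")) {
            return Kind.LIMIT_1_PROBE;
        }
        if (s.startsWith("SELECT * FROM ") && s.endsWith(" WHERE 1=0")) {
            return Kind.WHERE_FALSE_PROBE;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        // The two statements reported from the DB logs in this thread.
        String[] logged = {
            "SELECT 1 FROM (INPUT_QUERY) SPARK_GEN_SUB_0",
            "SELECT * FROM (INPUT_QUERY) SPARK_GEN_SUB_0 WHERE 1=0"
        };
        for (String sql : logged) {
            System.out.println(classify(sql) + " <- " + sql);
        }
    }
}
```

With this sketch, the logged "SELECT 1 FROM (INPUT_QUERY) SPARK_GEN_SUB_0" statement comes out as OTHER because it has no LIMIT 1, which is why it does not look like an existence probe to me.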

On Thu, Nov 17, 2022 at 11:20 AM Ramakrishna Rayudu <ramakrishna560.ray...@gmail.com> wrote:

> We are using Spark version 2.4.4.
> I can see two types of queries in the DB logs:
>
> SELECT 1 FROM (INPUT_QUERY) SPARK_GEN_SUB_0
>
> SELECT * FROM (INPUT_QUERY) SPARK_GEN_SUB_0 WHERE 1=0
>
> The `SELECT *` query ends with `WHERE 1=0`, but the query that starts
> with `SELECT 1` has no WHERE condition.
>
> Thanks,
> Rama
>
> On Thu, Nov 17, 2022, 10:39 PM Sean Owen <sro...@gmail.com> wrote:
>
>> Hm, actually that doesn't look like the queries that Spark uses to test
>> existence, which will be "SELECT 1 ... LIMIT 1" or "SELECT * ... WHERE 1=0"
>> depending on the dialect. What version, and are you sure something else is
>> not sending those queries?
>>
>> On Thu, Nov 17, 2022 at 11:02 AM Ramakrishna Rayudu <ramakrishna560.ray...@gmail.com> wrote:
>>
>>> Hi Sean,
>>>
>>> Thanks for your response. I think it does have a performance impact,
>>> because if the query returns one million rows, then the response itself
>>> will contain one million rows unnecessarily, like below.
>>>
>>> 1
>>> 1
>>> 1
>>> 1
>>> .
>>> .
>>> 1
>>>
>>> It impacts performance. Is there any alternative solution for this?
>>>
>>> Thanks,
>>> Rama
>>>
>>> On Thu, Nov 17, 2022, 10:17 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> This is a query to check the existence of the table upfront.
>>>> It is nearly a no-op query; can it have a perf impact?
>>>>
>>>> On Thu, Nov 17, 2022 at 10:42 AM Ramakrishna Rayudu <ramakrishna560.ray...@gmail.com> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> I am facing an issue. Can you please help me with this?
>>>>>
>>>>> We are connecting to Teradata from Spark SQL with the API below:
>>>>>
>>>>> Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery,
>>>>> connectionProperties);
>>>>>
>>>>> When we execute the above logic on a large table with a million rows,
>>>>> we see the extra query below executed every time, which results in a
>>>>> performance hit on the DB.
>>>>>
>>>>> We got the information below from the DBA. We don't have any logs on
>>>>> the Spark SQL side.
>>>>>
>>>>> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
>>>>>
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>>
>>>>> Can you please clarify why this query is executed, or whether there is
>>>>> any chance that this type of query comes from our own code while
>>>>> checking the row count of the DataFrame?
>>>>>
>>>>> Please share your input on this.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Rama