Re: [Spark SQL] spark.sql insert overwrite on existing partition not updating hive metastore partition transient_lastddltime and column_stats

2025-05-02 Thread Sathi Chowdhury
I think it is not happening because TRANSIENT_LASTDDLTIME is a DDL timestamp and an insert overwrite into an existing partition does not recreate the partition; it is just a DML statement. Sent from Yahoo Mail for iPhone On Friday, May 2, 2025, 7:53 AM, Pradeep wrote: I have a partitioned hive external table as below scala> spark.sql("describe
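If the metastore statistics need to be current after such an overwrite, recomputing them explicitly is the usual remedy. A sketch in PySpark, with hypothetical table, column, and partition names:

```python
from pyspark.sql import SparkSession

# Hive support is needed so the statistics land in the Hive metastore.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Overwrite one partition of a (hypothetical) external table.
spark.sql("""
    INSERT OVERWRITE TABLE sales PARTITION (dt = '2025-05-01')
    SELECT id, amount FROM staging_sales WHERE dt = '2025-05-01'
""")

# DML alone does not refresh metastore stats; recompute them explicitly.
spark.sql("""
    ANALYZE TABLE sales PARTITION (dt = '2025-05-01')
    COMPUTE STATISTICS FOR COLUMNS id, amount
""")
```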

Re: [Spark SQL]: Are SQL User-Defined Functions on the Roadmap?

2025-02-12 Thread Frank Bertsch
Amazing, looking forward to trying it out, thank you Allison! -Frank On Tue, Feb 11, 2025 at 11:16 PM Allison Wang wrote: > Hi Frank, > > I am actively working on SPARK-46057. You should be able > to try it out once the Spark 4.0 RC is out.

Re: [Spark SQL]: Are SQL User-Defined Functions on the Roadmap?

2025-02-11 Thread Allison Wang
Hi Frank, I am actively working on SPARK-46057. You should be able to try it out once the Spark 4.0 RC is out. Thanks, Allison On Wed, Feb 5, 2025 at 5:37 PM Reynold Xin wrote: > There's already one here https://issues.apache.org/jira/browse/
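For readers finding this thread in the archives, a sketch of the kind of SQL user-defined function SPARK-46057 describes. The function below is hypothetical, and the final grammar should be verified against the Spark 4.0 documentation:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A SQL-only scalar UDF: no Python/Scala code runs per row, so it can be
# optimized by Catalyst like any other expression.
spark.sql("""
    CREATE TEMPORARY FUNCTION to_fahrenheit(c DOUBLE)
    RETURNS DOUBLE
    RETURN c * 9.0 / 5.0 + 32.0
""")
spark.sql("SELECT to_fahrenheit(100.0) AS f").show()
```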

Re: [Spark SQL]: Are SQL User-Defined Functions on the Roadmap?

2025-02-05 Thread Reynold Xin
There's already one here https://issues.apache.org/jira/browse/SPARK-46057 On Wed, Feb 5, 2025 at 5:16 PM Soumasish wrote: > Here I create one, https://issues.apache.org/jira/browse/SPARK-51102 > > Best Regards > Soumasish Goswami > in: www.linkedin.com/in/soumasish > # (415) 530-0405 > >-

Re: [Spark SQL]: Are SQL User-Defined Functions on the Roadmap?

2025-02-05 Thread Soumasish
Here I create one, https://issues.apache.org/jira/browse/SPARK-51102 Best Regards Soumasish Goswami in: www.linkedin.com/in/soumasish # (415) 530-0405 - On Wed, Feb 5, 2025 at 4:49 PM Frank Bertsch wrote: > Thank you Mich. > > Hi Folks, any lead on this? Just a pointer to a Jira ticket or

Re: [Spark SQL]: Are SQL User-Defined Functions on the Roadmap?

2025-02-05 Thread Frank Bertsch
Thank you Mich. Hi Folks, any lead on this? Just a pointer to a Jira ticket or email discussion would be great! -Frank On Fri, Jan 31, 2025 at 10:06 AM Mich Talebzadeh wrote: > Hi Frank, > > I think this would be for the Spark dev team. I have added to the email. > > HTH > > Dr Mich Talebzadeh

Re: [Spark SQL]: Are SQL User-Defined Functions on the Roadmap?

2025-01-31 Thread Mich Talebzadeh
Hi Frank, I think this would be for the Spark dev team. I have added to the email. HTH Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR view my Linkedin profile On Fri, 31 Jan 2025 at 14:

Re: [Spark SQL] [DISK_ONLY Persistence] getting "this.inMemSorter" is null exception

2024-11-13 Thread Ashwani Pundir
Thanks for the response. Seems like a limitation. If resources are available then why bother about splitting the jobs in smaller durations(performance is not the concern). This issue is not about the performance optimization but rather the job is failing with null pointer exception. Do you have

Re: [Spark SQL] [DISK_ONLY Persistence] getting "this.inMemSorter" is null exception

2024-11-12 Thread Gurunandan
You should be able to split a large job into more manageable jobs based on stages using checkpoints. If a job fails, it can be restarted from the latest checkpoint, saving time and resources; checkpoints can thus be used as recovery points. Smaller stages can be optimized independently, leading to be
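A minimal sketch of that checkpoint-as-recovery-point pattern in PySpark (all paths are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Checkpoints must go to reliable storage such as HDFS.
spark.sparkContext.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

df = spark.read.parquet("hdfs:///data/input")
stage1 = df.repartition(200).sortWithinPartitions("key")

# Materialize the lineage so later stages (or a restarted job) can resume
# from this point instead of recomputing everything from the source.
stage1 = stage1.checkpoint(eager=True)

stage1.write.parquet("hdfs:///data/output")
```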

Re: [Spark SQL] [DISK_ONLY Persistence] getting "this.inMemSorter" is null exception

2024-11-11 Thread Gurunandan
Hi Ashwani, Please verify the input data by ensuring that the data being processed is valid and free of null values or unexpected data types. If the data undergoes complex transformations before sorting, review the transformations and verify that they don't introduce inconsistencies or nul

Re: [Spark SQL]: Does Spark support processing records with timestamp NULL in stateful streaming?

2024-05-27 Thread Mich Talebzadeh
When you use applyInPandasWithState, Spark processes each input row as it arrives, regardless of whether certain columns, such as the timestamp column, contain NULL values. This behavior is useful where you want to handle incomplete or missing data gracefully within your stateful processing logic.
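A simpler defensive pattern than handling NULLs inside the stateful function is to split them off upstream. A sketch with hypothetical broker, topic, and column names:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "host:9092")
          .option("subscribe", "sensor-events")
          .load()
          .selectExpr("CAST(value AS STRING) AS value", "timestamp"))

# Rows with a NULL event-time cannot participate in watermark-based
# state expiry, so route them aside before the stateful operator.
valid = (events.filter(F.col("timestamp").isNotNull())
               .withWatermark("timestamp", "10 minutes"))
orphans = events.filter(F.col("timestamp").isNull())  # handle separately
```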

Re: [Spark SQL]: Source code for PartitionedFile

2024-04-11 Thread Ashley McManamon
Hi Mich, Thanks for the reply. I did come across that file but it didn't align with the appearance of `PartitionedFile`: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala In fact, the code snippet you shared also referenc

Re: Re: [Spark SQL] How can I use .sql() in conjunction with watermarks?

2024-04-09 Thread Mich Talebzadeh
interesting. So below should be the corrected code with the suggestion in the [SPARK-47718] .sql() does not recognize watermark defined upstream - ASF JIRA (apache.org) # Define schema for parsing Kafka messages schema = StructType([ StructFi
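A condensed sketch of that corrected shape, with the watermark attached before the view is registered so the SQL aggregation inherits it (broker, topic, and window sizes are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "host:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS value", "timestamp"))

# Define the watermark upstream, *before* creating the temporary view.
events = events.withWatermark("timestamp", "10 minutes")
events.createOrReplaceTempView("events_view")

agg = spark.sql("""
    SELECT window(timestamp, '5 minutes') AS w, count(*) AS n
    FROM events_view
    GROUP BY window(timestamp, '5 minutes')
""")

# Append mode is now accepted because the aggregation sees the watermark.
query = (agg.writeStream.outputMode("append").format("console")
         .option("checkpointLocation", "/tmp/chk")
         .start())
```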

Re: Re: [Spark SQL] How can I use .sql() in conjunction with watermarks?

2024-04-09 Thread 刘唯
Sorry, this is not a bug but essentially a user error. Spark throws a really confusing error and I'm also confused. Please see the reply in the ticket for how to make things correct. https://issues.apache.org/jira/browse/SPARK-47718 刘唯 wrote on Sat, Apr 6, 2024 at 11:41: > This indeed looks like a bug. I will

Re: [Spark SQL]: Source code for PartitionedFile

2024-04-08 Thread Mich Talebzadeh
Hi, I believe this is the package https://raw.githubusercontent.com/apache/spark/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FilePartition.scala And the code case class FilePartition(index: Int, files: Array[PartitionedFile]) extends Partition with InputPartition

Re: Re: [Spark SQL] How can I use .sql() in conjunction with watermarks?

2024-04-06 Thread 刘唯
This indeed looks like a bug. I will take some time to look into it. Mich Talebzadeh wrote on Wed, Apr 3, 2024 at 01:55: > > hm. you are getting below > > AnalysisException: Append output mode not supported when there are > streaming aggregations on streaming DataFrames/DataSets without watermark; > > The pro

Re: Re: [Spark SQL] How can I use .sql() in conjunction with watermarks?

2024-04-02 Thread Mich Talebzadeh
hm. you are getting below AnalysisException: Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark; The problem seems to be that you are using the append output mode when writing the streaming query results to Kafka. This mode is

RE: Re: [Spark SQL] How can I use .sql() in conjunction with watermarks?

2024-04-02 Thread Chloe He
Hi Mich, Thank you so much for your response. I really appreciate your help! You mentioned "defining the watermark using the withWatermark function on the streaming_df before creating the temporary view” - I believe this is what I’m doing and it’s not working for me. Here is the exact code snip

Re: [Spark SQL] How can I use .sql() in conjunction with watermarks?

2024-04-02 Thread Mich Talebzadeh
ok let us take it for a test. The original code of mine def fetch_data(self): self.sc.setLogLevel("ERROR") schema = StructType() \ .add("rowkey", StringType()) \ .add("timestamp", TimestampType()) \ .add("temperature", IntegerType()) checkpoi

Re: [ SPARK SQL ]: UPPER in WHERE condition is not working in Apache Spark 3.5.0 for Mysql ENUM Column

2023-11-07 Thread Suyash Ajmera
Any update on this? On Fri, 13 Oct, 2023, 12:56 pm Suyash Ajmera, wrote: > This issue is related to CharVarcharCodegenUtils readSidePadding method . > > Appending white spaces while reading ENUM data from mysql > > Causing issue in querying , writing the same data to Cassandra. > > On Thu, 12 O

Re: [ SPARK SQL ]: UPPER in WHERE condition is not working in Apache Spark 3.5.0 for Mysql ENUM Column

2023-10-13 Thread Suyash Ajmera
This issue is related to the CharVarcharCodegenUtils readSidePadding method, which appends white spaces while reading ENUM data from MySQL, causing issues in querying and in writing the same data to Cassandra. On Thu, 12 Oct, 2023, 7:46 pm Suyash Ajmera, wrote: > I have upgraded my spark job from spark 3.3.
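A hedged sketch of the trim-before-compare workaround that diagnosis suggests (connection details and column names are hypothetical):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical JDBC read of a table with a MySQL ENUM column `status`.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://host:3306/db")
      .option("dbtable", "orders")
      .load())

# If read-side padding appends trailing spaces, strip them before the
# case-insensitive comparison so predicates match again.
matched = df.filter(F.upper(F.rtrim(F.col("status"))) == "SHIPPED")
```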

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-18 Thread Mich Talebzadeh
Yes, it sounds like it. So the broadcast DF size seems to be between 1 and 4GB. So I suggest that you leave it as it is. I have not used the standalone mode since spark-2.4.3 so I may be missing a fair bit of context here. I am sure there are others like you that are still using it! HTH Mich Ta

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
No, the driver memory was not set explicitly. So it was likely the default value, which appears to be 1GB. On Thu, Aug 17, 2023, 16:49 Mich Talebzadeh wrote: > One question, what was the driver memory before setting it to 4G? Did you > have it set at all before? > > HTH > > Mich Talebzadeh, > So
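For the archives, a sketch of setting driver memory explicitly (the value and app name are illustrative):

```python
from pyspark.sql import SparkSession

# spark.driver.memory only takes effect if set before the driver JVM
# starts; under spark-submit it belongs on the command line
# (--driver-memory 4g) or in spark-defaults.conf rather than in code.
spark = (SparkSession.builder
         .appName("thrift-or-batch-job")
         .config("spark.driver.memory", "4g")
         .getOrCreate())

print(spark.conf.get("spark.driver.memory"))
```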

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Mich Talebzadeh
One question, what was the driver memory before setting it to 4G? Did you have it set at all before? HTH Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile https://en.everybodywiki

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
Hi Mich, Here are my config values from spark-defaults.conf: spark.eventLog.enabled true spark.eventLog.dir hdfs://10.0.50.1:8020/spark-logs spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider spark.history.fs.logDirectory hdfs://10.0.50.1:8020/spark-logs spark.history.fs.upd

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Mich Talebzadeh
Hello Patrick, As a matter of interest, what parameters and their respective values do you use in spark-submit? I assume it is running in YARN mode. HTH Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
Hi Mich, Yes, that's the sequence of events. I think the big breakthrough is that (for now at least) Spark is throwing errors instead of the queries hanging. Which is a big step forward. I can at least troubleshoot issues if I know what they are. When I reflect on the issues I faced and the solut

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Mich Talebzadeh
Hi Patrick, glad that you have managed to sort this problem out. Hopefully it will go away for good. Still, we are in the dark about how this problem is going away and coming back :( As I recall, the chronology of events was as follows: 1. The issue with the hanging Spark job was reported 2. concur

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
Hi Everyone, I just wanted to follow up on this issue. This issue has continued since our last correspondence. Today I had a query hang and couldn't resolve the issue. I decided to upgrade my Spark install from 3.4.0 to 3.4.1. After doing so, instead of the query hanging, I got an error message th

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-13 Thread Mich Talebzadeh
OK I use Hive 3.1.1 My suggestion is to put your hive issues to u...@hive.apache.org and for JAVA version compatibility They will give you better info. HTH Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-13 Thread Patrick Tucci
I attempted to install Hive yesterday. The experience was similar to other attempts at installing Hive: it took a few hours and at the end of the process, I didn't have a working setup. The latest stable release would not run. I never discovered the cause, but similar StackOverflow questions sugges

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Mich Talebzadeh
OK you would not have known unless you went through the process so to speak. Let us do something revolutionary here 😁 Install hive and its metastore. You already have hadoop anyway https://cwiki.apache.org/confluence/display/hive/adminmanual+installation hive metastore https://data-flair.train

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Patrick Tucci
Yes, on premise. Unfortunately after installing Delta Lake and re-writing all tables as Delta tables, the issue persists. On Sat, Aug 12, 2023 at 11:34 AM Mich Talebzadeh wrote: > ok sure. > > Is this Delta Lake going to be on-premise? > > Mich Talebzadeh, > Solutions Architect/Engineering Lead

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Mich Talebzadeh
ok sure. Is this Delta Lake going to be on-premise? Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Patrick Tucci
Hi Mich, Thanks for the feedback. My original intention after reading your response was to stick to Hive for managing tables. Unfortunately, I'm running into another case of SQL scripts hanging. Since all tables are already Parquet, I'm out of troubleshooting options. I'm going to migrate to Delta

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-11 Thread Mich Talebzadeh
Hi Patrick, There is nothing wrong with Hive. On-premise, it is the best data warehouse there is. Hive handles both ORC and Parquet formats well. They are both columnar implementations of the relational model. What you are seeing is the Spark API to Hive, which prefers Parquet. I found out a few year

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-11 Thread Patrick Tucci
Thanks for the reply Stephen and Mich. Stephen, you're right, it feels like Spark is waiting for something, but I'm not sure what. I'm the only user on the cluster and there are plenty of resources (+60 cores, +250GB RAM). I even tried restarting Hadoop, Spark and the host servers to make sure not

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Mich Talebzadeh
Steve may have a valid point. You raised an issue with concurrent writes before, if I recall correctly. This limitation may be due to the Hive metastore. By default, Spark uses Apache Derby for its database persistence. *However, it is limited to only one Spark session at any time for the purposes
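A sketch of pointing Spark at a shared metastore service instead of the embedded Derby database (the thrift URI is hypothetical):

```python
from pyspark.sql import SparkSession

# A standalone Hive metastore service backed by a real RDBMS admits
# concurrent sessions, unlike the default embedded Derby.
spark = (SparkSession.builder
         .appName("shared-metastore")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())
```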

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Stephen Coy
Hi Patrick, When this has happened to me in the past (admittedly via spark-submit) it has been because another job was still running and had already claimed some of the resources (cores and memory). I think this can also happen if your configuration tries to claim resources that will never be

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Patrick Tucci
Hi Mich, I don't believe Hive is installed. I set up this cluster from scratch. I installed Hadoop and Spark by downloading them from their project websites. If Hive isn't bundled with Hadoop or Spark, I don't believe I have it. I'm running the Thrift server distributed with Spark, like so: ~/spa

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Mich Talebzadeh
sorry host is 10.0.50.1 Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all respons

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Mich Talebzadeh
Hi Patrick, That beeline on port 1 is a hive thrift server running on your hive host 10.0.50.1:1. If you can access that host, you should be able to log into hive by typing hive. The OS user is hadoop in your case and it sounds like there is no password! Once inside that host, hive logs a

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Patrick Tucci
Hi Mich, Thanks for the reply. Unfortunately I don't have Hive set up on my cluster. I can explore this if there are no other ways to troubleshoot. I'm using beeline to run commands against the Thrift server. Here's the command I use: ~/spark/bin/beeline -u jdbc:hive2://10.0.50.1:1 -n hadoop

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Mich Talebzadeh
Can you run this sql query through hive itself? Are you using this command or similar for your thrift server? beeline -u jdbc:hive2:///1/default org.apache.hive.jdbc.HiveDriver -n hadoop -p xxx HTH Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Link

Re: Spark-SQL - Concurrent Inserts Into Same Table Throws Exception

2023-07-30 Thread Mich Talebzadeh
ok so as expected the underlying database is Hive. Hive uses hdfs storage. You said you encountered limitations on concurrent writes. The order and limitations are introduced by the Hive metastore, so to speak. Since this is all happening through Spark, the default implementation of the Hive metastore <

Re: Spark-SQL - Concurrent Inserts Into Same Table Throws Exception

2023-07-30 Thread Patrick Tucci
Hi Mich and Pol, Thanks for the feedback. The database layer is Hadoop 3.3.5. The cluster restarted so I lost the stack trace in the application UI. In the snippets I saved, it looks like the exception being thrown was from Hive. Given the feedback you've provided, I suspect the issue is with how

Re: Spark-SQL - Concurrent Inserts Into Same Table Throws Exception

2023-07-30 Thread Pol Santamaria
Hi Patrick, You can have multiple writers simultaneously writing to the same table in HDFS by utilizing an open table format with concurrency control. Several formats, such as Apache Hudi, Apache Iceberg, Delta Lake, and Qbeast Format, offer this capability. All of them provide advanced features t
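A sketch of one of those options, Delta Lake (assumes the delta-spark package is on the classpath; the path is hypothetical):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.extensions",
                 "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

df = spark.range(100).withColumnRenamed("id", "event_id")  # stand-in data

# Concurrent appends from separate jobs are coordinated by Delta's
# optimistic concurrency control rather than by a Hive lock manager.
df.write.format("delta").mode("append").save("hdfs:///warehouse/events_delta")
```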

Re: Spark-SQL - Concurrent Inserts Into Same Table Throws Exception

2023-07-29 Thread Mich Talebzadeh
It is not Spark SQL that throws the error. It is the underlying database or layer that throws the error. Spark acts as an ETL tool. What is the underlying DB where the table resides? Is concurrency supported? Please send the error to this list HTH Mich Talebzadeh, Solutions Architect/Engineeri

Re: [Spark SQL] Data objects from query history

2023-07-03 Thread Jack Wells
Hi Ruben, I'm not sure if this answers your question, but if you're interested in exploring the underlying tables, you could always try something like the below in a Databricks notebook: display(spark.read.table('samples.nyctaxi.trips')) (For vanilla Spark users, it would be spark.read.table('s

Re: Spark-Sql - Slow Performance With CTAS and Large Gzipped File

2023-06-26 Thread Mich Talebzadeh
OK, good news. You have made some progress here :) bzip (bzip2) works (splittable) because it is block-oriented whereas gzip is stream-oriented. I also noticed that you are creating a managed ORC file. You can bucket and partition an ORC (Optimized Row Columnar) table. An example below: DR
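A sketch of the kind of partitioned, bucketed ORC CTAS being described (table and column names are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# One statement: create a partitioned, bucketed ORC table from staging.
spark.sql("""
    CREATE TABLE sales_orc
    USING ORC
    PARTITIONED BY (dt)
    CLUSTERED BY (customer_id) INTO 64 BUCKETS
    AS SELECT customer_id, amount, dt FROM stg_sales
""")
```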

Re: Spark-Sql - Slow Performance With CTAS and Large Gzipped File

2023-06-26 Thread Patrick Tucci
Hi Mich, Thanks for the reply. I started running ANALYZE TABLE on the external table, but the progress was very slow. The stage had only read about 275MB in 10 minutes. That equates to about 5.5 hours just to analyze the table. This might just be the reality of trying to process a 240m record fil

Re: Spark-Sql - Slow Performance With CTAS and Large Gzipped File

2023-06-26 Thread Mich Talebzadeh
OK for now have you analyzed statistics in Hive external table spark-sql (default)> ANALYZE TABLE test.stg_t2 COMPUTE STATISTICS FOR ALL COLUMNS; spark-sql (default)> DESC EXTENDED test.stg_t2; Hive external tables have little optimization HTH Mich Talebzadeh, Solutions Architect/Engineering

Re: Spark SQL question

2023-01-28 Thread Bjørn Jørgensen
Hi Mich. This is a Spark user group mailing list where people can ask *any* questions about spark. You know SQL and streaming, but I don't think it's necessary to start a reply with "*LOL*" to the question that's being asked. No questions are too stupid to be asked. On Sat, 28 Jan 2023 at 09:22

Re: Spark SQL question

2023-01-28 Thread Mich Talebzadeh
LOL First one spark-sql> select 1 as `data.group` from abc group by data.group; 1 Time taken: 0.198 seconds, Fetched 1 row(s) means that you are assigning the alias data.group in the select and then using that alias -> data.group in your group by statement. This is equivalent to spark-sql> select 1 as

Re: [Spark SQL]: unpredictable errors: java.io.IOException: can not read class org.apache.parquet.format.PageHeader

2022-12-19 Thread Eric Hanchrow
We’ve discovered a workaround for this; it’s described here. From: Eric Hanchrow Date: Thursday, December 8, 2022 at 17:03 To: user@spark.apache.org Subject: [Spark SQL]: unpredictable errors: java.io.IOException: can not read class org.apach

RE: Re: [Spark Sql] Global Setting for Case-Insensitive String Compare

2022-11-22 Thread Patrick Tucci
Thanks. How would I go about formally submitting a feature request for this? On 2022/11/21 23:47:16 Andrew Melo wrote: > I think this is the right place, just a hard question :) As far as I > know, there's no "case insensitive flag", so YMMV > > On Mon, Nov 21, 2022 at 5:40 PM Patrick Tucci wrot

Re: [Spark Sql] Global Setting for Case-Insensitive String Compare

2022-11-21 Thread Andrew Melo
I think this is the right place, just a hard question :) As far as I know, there's no "case insensitive flag", so YMMV On Mon, Nov 21, 2022 at 5:40 PM Patrick Tucci wrote: > > Is this the wrong list for this type of question? > > On 2022/11/12 16:34:48 Patrick Tucci wrote: > > Hello, > > > > I
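A sketch of the LOWER()-on-both-sides workaround discussed in this thread:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("London",), ("LONDON",), ("Paris",)], ["city"])

# No global switch exists for case-insensitive string equality, so
# normalize both sides of each comparison.
df.filter(F.lower(F.col("city")) == "london").show()

# The same idea in SQL, for codebases driven by .sql() scripts:
df.createOrReplaceTempView("cities")
spark.sql("SELECT * FROM cities WHERE LOWER(city) = LOWER('London')").show()
```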

RE: [Spark Sql] Global Setting for Case-Insensitive String Compare

2022-11-21 Thread Patrick Tucci
Is this the wrong list for this type of question? On 2022/11/12 16:34:48 Patrick Tucci wrote: > Hello, > > Is there a way to set string comparisons to be case-insensitive globally? I > understand LOWER() can be used, but my codebase contains 27k lines of SQL > and many string comparisons. I wou

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-18 Thread Sean Owen
Taking this off list. Start here: https://github.com/apache/spark/blob/70ec696bce7012b25ed6d8acec5e2f3b3e127f11/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala#L144 Look at subclasses of JdbcDialect too, like TeradataDialect. Note that you are using an old unsupported version, t

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-17 Thread Sean Owen
Weird, does Teradata not support LIMIT n? looking at the Spark source code suggests it won't. The syntax is "SELECT TOP"? I wonder if that's why the generic query that seems to test existence loses the LIMIT. But, that "SELECT 1" test seems to be used for MySQL, Postgres, so I'm still not sure wher

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-17 Thread Sean Owen
Hm, the existence queries even in 2.4.x had LIMIT 1. Are you sure nothing else is generating or changing those queries? On Thu, Nov 17, 2022 at 11:20 AM Ramakrishna Rayudu < ramakrishna560.ray...@gmail.com> wrote: > We are using spark 2.4.4 version. > I can see two types of queries in DB logs. >

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-17 Thread Ramakrishna Rayudu
We are using spark 2.4.4 version. I can see two types of queries in DB logs. SELECT 1 FROM (INPUT_QUERY) SPARK_GEN_SUB_0 SELECT * FROM (INPUT_QUERY) SPARK_GEN_SUB_0 WHERE 1=0 The `SELECT *` query ends with `WHERE 1=0`, but the query starting with `SELECT 1` has no WHERE condition. T

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-17 Thread Sean Owen
Hm, actually that doesn't look like the queries that Spark uses to test existence, which will be "SELECT 1 ... LIMIT 1" or "SELECT * ... WHERE 1=0" depending on the dialect. What version, and are you sure something else is not sending those queries? On Thu, Nov 17, 2022 at 11:02 AM Ramakrishna Ray

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-17 Thread Sean Owen
This is a query to check the existence of the table upfront. It is nearly a no-op query; can it have a perf impact? On Thu, Nov 17, 2022 at 10:42 AM Ramakrishna Rayudu < ramakrishna560.ray...@gmail.com> wrote: > Hi Team, > > I am facing one issue. Can you please help me on this. > >

Re: EXT: Re: Spark SQL

2022-09-15 Thread Vibhor Gupta
...function, does the underlying thread get killed when a TimeoutExc... stackoverflow.com  Regards, Vibhor From: Gourav Sengupta Sent: Thursday, September 15, 2022 10:22 PM To: Mayur Benodekar Cc: user ; i...@spark.apache.org Subject: EXT: Re: Spark SQL EXTERNAL:

Re: Spark SQL

2022-09-15 Thread Gourav Sengupta
Okay, so for the problem to the solution 👍 that is powerful On Thu, 15 Sept 2022, 14:48 Mayur Benodekar, wrote: > Hi Gourav, > > It’s the way the framework is > > > Sent from my iPhone > > On Sep 15, 2022, at 02:02, Gourav Sengupta > wrote: > >  > Hi, > > Why spark and why scala? > > Regards,

Re: Spark SQL

2022-09-15 Thread Mayur Benodekar
Hi Gourav, It's the way the framework is. Sent from my iPhone On Sep 15, 2022, at 02:02, Gourav Sengupta wrote: Hi, Why spark and why scala? Regards, Gourav On Wed, 7 Sept 2022, 21:42 Mayur Benodekar wrote: I am new to scala and spark both. I have code in scala which executes queries

Re: Spark SQL

2022-09-14 Thread Gourav Sengupta
Hi, Why spark and why scala? Regards, Gourav On Wed, 7 Sept 2022, 21:42 Mayur Benodekar, wrote: > I am new to scala and spark both. > > I have code in scala which executes queries in a while loop one after the > other. > > What we need to do is: if a particular query takes more than a certain t
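A sketch of one way to impose such a timeout from outside the query, using Spark's job-group cancellation. The helper is illustrative, and job-group behavior across Python threads varies with the PySpark version (see pinned-thread mode):

```python
import threading
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

def run_with_timeout(query, timeout_s):
    """Run a SQL query; cancel its Spark jobs if it exceeds timeout_s."""
    result = {}

    def target():
        # The job group must be set in the thread that triggers the action.
        sc.setJobGroup("timed-query", query)
        result["rows"] = spark.sql(query).collect()

    t = threading.Thread(target=target)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        sc.cancelJobGroup("timed-query")  # stops the running Spark jobs
        raise TimeoutError(f"query exceeded {timeout_s}s")
    return result.get("rows")
```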

Re: [Spark SQL] Omit Create Table Statement in Spark Sql

2022-08-08 Thread pengyh
you have to saveAsTable or create a view to make a SQL query. As in the title, does Spark SQL have a feature like the Flink Catalog to omit the `CREATE TABLE` statement and write a SQL query directly? - To unsubscribe e-mail: user-unsubscr...@sp
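A sketch of the temp-view route, which avoids a persisted CREATE TABLE (the path and names are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A DataFrame is not queryable by name until it is registered; a temp
# view is the lightweight alternative to saveAsTable.
df = spark.read.parquet("hdfs:///data/events")
df.createOrReplaceTempView("events")
spark.sql("SELECT count(*) FROM events").show()
```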

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-19 Thread Someshwar Kale
Hi Ram, Have you seen this stackoverflow query and response- https://stackoverflow.com/questions/39685744/apache-spark-how-to-cancel-job-in-code-and-kill-running-tasks if not, please have a look. seems to have a similar problem . *Regards,* *Someshwar Kale* On Fri, May 20, 2022 at 7:34 AM Artem

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-19 Thread Artemis User
WAITFOR is part of the Transact-SQL and it's Microsoft SQL server specific, not supported by Spark SQL.  If you want to impose a delay in a Spark program, you may want to use the thread sleep function in Java or Scala.  Hope this helps... On 5/19/22 1:45 PM, K. N. Ramachandran wrote: Hi Sean,

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-19 Thread K. N. Ramachandran
Hi Sean, I'm trying to test a timeout feature in a tool that uses Spark SQL. Basically, if a long-running query exceeds a configured threshold, then the query should be canceled. I couldn't see a simple way to make a "sleep" SQL statement to test the timeout. Instead, I just ran a "select count(*)

Re: [Spark SQL]: Configuring/Using Spark + Catalyst optimally for read-heavy transactional workloads in JDBC sources?

2022-05-18 Thread Gavin Ray
Following up on this in case anyone runs across it in the archives in the future. From reading through the config docs and trying various combinations, I've discovered that: - You don't want to disable codegen. This roughly doubled the time to perform simple, few-column/few-row queries from basic

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-17 Thread Sean Owen
I don't think that is standard SQL? what are you trying to do, and why not do it outside SQL? On Tue, May 17, 2022 at 6:03 PM K. N. Ramachandran wrote: > Gentle ping. Any info here would be great. > > Regards, > Ram > > On Sun, May 15, 2022 at 5:16 PM K. N. Ramachandran > wrote: > >> Hello Spar

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-17 Thread K. N. Ramachandran
Gentle ping. Any info here would be great. Regards, Ram On Sun, May 15, 2022 at 5:16 PM K. N. Ramachandran wrote: > Hello Spark Users Group, > > I've just recently started working on tools that use Apache Spark. > When I try WAITFOR in the spark-sql command line, I just get: > > Error: Error ru

Re: {EXT} Re: Spark sql slowness in Spark 3.0.1

2022-04-15 Thread Anil Dasari
Hello, DF is checkpointed here. So it is written to HDFS. DF is written in parquet format with default parallelism. Thanks. From: wilson Date: Thursday, April 14, 2022 at 2:54 PM To: user@spark.apache.org Subject: {EXT} Re: Spark sql slowness in Spark 3.0.1 just curious, where to write

Re: Spark sql slowness in Spark 3.0.1

2022-04-14 Thread wilson
just curious, where to write? Anil Dasari wrote: We are upgrading spark from 2.4.7 to 3.0.1. we use spark sql (hive) to checkpoint data frames (intermediate data). DF write is very slow in 3.0.1 compared to 2.4.7. - To un

Re: Spark sql slowness in Spark 3.0.1

2022-04-14 Thread Sergey B.
The suggestion is to check: 1. Used format for write 2. Used parallelism On Thu, Apr 14, 2022 at 7:13 PM Anil Dasari wrote: > Hello, > > > > We are upgrading spark from 2.4.7 to 3.0.1. we use spark sql (hive) to > checkpoint data frames (intermediate data). DF write is very slow in 3.0.1 > comp

Re: [Spark SQL] Structured Streaming in pyhton can connect to cassandra ?

2022-03-25 Thread Gourav Sengupta
Hi, completely agree with Alex. Also, if you are just writing to Cassandra, then what is the purpose of writing to the Kafka broker? It often sounds as if adding more components to an architecture is great, but sadly it is not. Remove the Kafka broker, in case you are not broadcas

Re: [Spark SQL] Structured Streaming in pyhton can connect to cassandra ?

2022-03-25 Thread Alex Ott
You don't need to use foreachBatch to write to Cassandra. You just need to use Spark Cassandra Connector version 2.5.0 or higher - it supports native writing of stream data into Cassandra. Here is an announcement: https://www.datastax.com/blog/advanced-apache-cassandra-analytics-now-open-all gui
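A sketch of that native streaming sink (keyspace, table, and checkpoint path are hypothetical; the rate source stands in for real data; assumes spark-cassandra-connector >= 2.5.0 on the classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

readings = (spark.readStream.format("rate").load()
            .selectExpr("value AS id", "timestamp AS ts"))

# No foreachBatch needed: the connector exposes a streaming sink directly.
query = (readings.writeStream
         .format("org.apache.spark.sql.cassandra")
         .option("keyspace", "iot")
         .option("table", "readings")
         .option("checkpointLocation", "hdfs:///chk/cassandra")
         .outputMode("append")
         .start())
```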

Re: [Spark SQL] Structured Streaming in pyhton can connect to cassandra ?

2022-03-21 Thread Mich Talebzadeh
Dear student, check this article of mine on LinkedIn: Processing Change Data Capture with Spark Structured Streaming. There is a link to GitHub

Re: [Spark SQL] Structured Streaming in pyhton can connect to cassandra ?

2022-03-21 Thread Sean Owen
Looks like you are trying to apply this class/function across Spark, but it contains a non-serializable object, the connection. That has to be initialized on use, otherwise you try to send it from the driver and that can't work. On Mon, Mar 21, 2022 at 11:51 AM guillaume farcy < guillaume.fa...@imt-
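A common shape of the fix, sketched with a hypothetical client library (`client_connect` and `insert` are stand-ins, not a real API):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000)  # stand-in for the real DataFrame

def write_partition(rows):
    # Build the non-serializable connection here, on the executor, once
    # per partition, instead of capturing one created on the driver.
    conn = client_connect("db-host")      # hypothetical helper
    try:
        for row in rows:
            conn.insert(row.asDict())     # hypothetical API
    finally:
        conn.close()

df.rdd.foreachPartition(write_partition)
```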

Re: [Spark SQL] Null when trying to use corr() with a Window

2022-02-28 Thread Edgar H
Oh I see now, using currentRow will give the correlation per ID within the group based on its ordering and using unbounded both will result in the overall correlation value for the whole group? El lun, 28 feb 2022 a las 16:33, Sean Owen () escribió: > The results make sense then. You want a corre

Re: [Spark SQL] Null when trying to use corr() with a Window

2022-02-28 Thread Sean Owen
The results make sense then. You want a correlation per group right? because it's over the sums by ID within the group. Then currentRow is wrong; needs to be unbounded preceding and following. On Mon, Feb 28, 2022 at 9:22 AM Edgar H wrote: > The window is defined as you said yes, unboundedPrece
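For readers skimming the archive, a sketch of the fully unbounded window Sean describes (the salesSum column is hypothetical):

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("group1", 1, 6), ("group1", 4, 6), ("group1", 2, 5)],
    ["group", "orderCountSum", "salesSum"])

# Both edges unbounded, so every row sees the whole group and the
# correlation is constant within each group.
w = (Window.partitionBy("group")
     .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing))

df.withColumn("corr", F.corr("orderCountSum", "salesSum").over(w)).show()
```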

Re: [Spark SQL] Null when trying to use corr() with a Window

2022-02-28 Thread Edgar H
The window is defined as you said yes, unboundedPreceding and currentRow ordering by orderCountSum. val initialSetWindow = Window .partitionBy("group") .orderBy("orderCountSum") .rowsBetween(Window.unboundedPreceding, Window.currentRow) I'm trying to obtain the correlation for each of the m

Re: [Spark SQL] Null when trying to use corr() with a Window

2022-02-28 Thread Sean Owen
How are you defining the window? It looks like it's something like "rows unbounded preceding, current" or the reverse, as the correlation varies across the elements of the group as if it's computing them on 1, then 2, then 3 elements. Don't you want the correlation across the group? otherwise this

Re: [Spark SQL] Null when trying to use corr() with a Window

2022-02-28 Thread Edgar H
My bad completely, missed the example by a mile sorry for that, let me change a couple of things. - Got to add "id" to the initial grouping and also add more elements to the initial set; val sampleSet = Seq( ("group1", "id1", 1, 1, 6), ("group1", "id1", 4, 4, 6), ("group1", "id2", 2, 2, 5),

Re: [Spark SQL] Null when trying to use corr() with a Window

2022-02-28 Thread Sean Owen
You're computing correlations of two series of values, but each series has one value, a sum. Correlation is not defined in this case (both variances are undefined). This is sample correlation, note. On Mon, Feb 28, 2022 at 7:06 AM Edgar H wrote: > Morning all, been struggling with this for a whi

RE: Spark-SQL : Getting current user name in UDF

2022-02-22 Thread Lavelle, Shawn
Apologies, this is Spark 3.2.0. ~ Shawn From: Lavelle, Shawn Sent: Monday, February 21, 2022 5:39 PM To: 'user@spark.apache.org' Subject: Spark-SQL : Getting current user name in UDF Hello Spark Users, I have a UDF I wrote for use with Spark-SQL that performs a look up. In that look up,

Re: [Spark SQL]: Aggregate Push Down / Spark 3.2

2021-11-04 Thread Kapoor, Rohit
My basic test is here - https://github.com/rohitkapoor1/sparkPushDownAggregate From: German Schiavon Date: Thursday, 4 November 2021 at 2:17 AM To: huaxin gao Cc: Kapoor, Rohit , user@spark.apache.org Subject: Re: [Spark SQL]: Aggregate Push Down / Spark 3.2 EXTERNAL MAIL: USE CAUTION BEFORE

Re: [Spark SQL]: Aggregate Push Down / Spark 3.2

2021-11-04 Thread Sunil Prabhakara
Unsubscribe. On Mon, Nov 1, 2021 at 6:57 PM Kapoor, Rohit wrote: > Hi, > > > > I am testing the aggregate push down for JDBC after going through the JIRA > - https://issues.apache.org/jira/browse/SPARK-34952 > > I have the latest Spark 3.2 setup in local mode (laptop). > > > > I have PostgreSQL

Re: [Spark SQL]: Aggregate Push Down / Spark 3.2

2021-11-03 Thread German Schiavon
to test the push down >> operators successfully against Postgresql using DS v2. >> >> >> >> >> >> *From: *huaxin gao >> *Date: *Tuesday, 2 November 2021 at 12:35 AM >> *To: *Kapoor, Rohit >> *Subject: *Re: [Spark SQL]: Aggregate Push Down / S

Re: [Spark SQL]: Aggregate Push Down / Spark 3.2

2021-11-03 Thread huaxin gao
Tuesday, 2 November 2021 at 12:35 AM > *To: *Kapoor, Rohit > *Subject: *Re: [Spark SQL]: Aggregate Push Down / Spark 3.2 > > *EXTERNAL MAIL: USE CAUTION BEFORE CLICKING LINKS OR OPENING ATTACHMENTS. ALWAYS VERIFY THE SOURCE OF MESSAGES.*

Re: [Spark SQL]: Aggregate Push Down / Spark 3.2

2021-11-03 Thread Kapoor, Rohit
Thanks for your guidance Huaxin. I have been able to test the push down operators successfully against Postgresql using DS v2. From: huaxin gao Date: Tuesday, 2 November 2021 at 12:35 AM To: Kapoor, Rohit Subject: Re: [Spark SQL]: Aggregate Push Down / Spark 3.2 EXTERNAL MAIL: USE CAUTION

Re: [Spark SQL]: Aggregate Push Down / Spark 3.2

2021-11-01 Thread Kapoor, Rohit
Cc: user@spark.apache.org Subject: Re: [Spark SQL]: Aggregate Push Down / Spark 3.2 EXTERNAL MAIL: USE CAUTION BEFORE CLICKING LINKS OR OPENING ATTACHMENTS. ALWAYS VERIFY THE SOURCE OF MESSAGES.

Re: [Spark SQL]: Aggregate Push Down / Spark 3.2

2021-11-01 Thread huaxin gao
Hi Rohit, Thanks for testing this. Seems to me that you are using DS v1. We only support aggregate push down in DS v2. Could you please try again using DS v2 and let me know how it goes? Thanks, Huaxin On Mon, Nov 1, 2021 at 10:39 AM Chao Sun wrote: > > > -- Forwarded message -
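The repository linked earlier in the thread shows a full test; below is a condensed sketch of the DS v2 route. Catalog name, URL, credentials, and table are hypothetical, and the option names should be checked against the Spark 3.2 JDBC documentation:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.catalog.pg",
                 "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
         .config("spark.sql.catalog.pg.url", "jdbc:postgresql://host:5432/db")
         .config("spark.sql.catalog.pg.driver", "org.postgresql.Driver")
         .config("spark.sql.catalog.pg.pushDownAggregate", "true")
         .getOrCreate())

# With the v2 catalog, MAX/MIN/COUNT/SUM/AVG can be pushed to PostgreSQL;
# the plan should show pushed aggregates instead of a full scan.
spark.sql(
    "SELECT dept, max(salary) FROM pg.public.employees GROUP BY dept"
).explain()
```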

Re: Spark-sql can replace Hive ?

2021-06-15 Thread Mich Talebzadeh
...Talebzadeh > *Date: *Thursday, 10 June 2021 at 8:12 PM > *To: *Battula, Brahma Reddy > *Cc: *ayan guha , d...@spark.apache.org < > d...@spark.apache.org>, user@spark.apache.org > *Subject: *Re: Spark-sql can replace Hive ? > > These are different things. Spark provides a comput

Re: Spark-sql can replace Hive ?

2021-06-15 Thread Battula, Brahma Reddy
Currently I am using the hive sql engine for adhoc queries. As spark-sql also supports this, I want to migrate from hive. From: Mich Talebzadeh Date: Thursday, 10 June 2021 at 8:12 PM To: Battula, Brahma Reddy Cc: ayan guha , d...@spark.apache.org , user@spark.apache.org Subject: Re: Spark-sql

Re: Spark-sql can replace Hive ?

2021-06-10 Thread Mich Talebzadeh
...Brahma Reddy > *Cc: *d...@spark.apache.org , user@spark.apache.org < > user@spark.apache.org> > *Subject: *Re: Spark-sql can replace Hive ? > > Would you mind expanding the ask? Spark Sql can use hive by itself > > > > On Thu, 10 Jun 2021 at 8:58 pm, Battula, Brahma

Re: Spark-sql can replace Hive ?

2021-06-10 Thread Battula, Brahma Reddy
Thanks for prompt reply. I want to replace hive with spark. From: ayan guha Date: Thursday, 10 June 2021 at 4:35 PM To: Battula, Brahma Reddy Cc: d...@spark.apache.org , user@spark.apache.org Subject: Re: Spark-sql can replace Hive ? Would you mind expanding the ask? Spark Sql can use
