Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-18 Thread Mich Talebzadeh
Yes, it sounds like it. So the broadcast DF size seems to be between 1 and 4GB, so I suggest that you leave it as it is. I have not used standalone mode since spark-2.4.3, so I may be missing a fair bit of context here. I am sure there are others like you who are still using it! HTH Mich Ta

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
No, the driver memory was not set explicitly. So it was likely the default value, which appears to be 1GB. On Thu, Aug 17, 2023, 16:49 Mich Talebzadeh wrote: > One question, what was the driver memory before setting it to 4G? Did you > have it set at all before? > > HTH > > Mich Talebzadeh, > So
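
For context, a sketch of how the driver memory discussed here is typically raised. The 1GB default and the 4G value come from the thread; the property form below is standard Spark configuration.

```properties
# spark-defaults.conf -- spark.driver.memory defaults to 1g when unset
spark.driver.memory    4g
```

The same effect can be had per job on the command line with `spark-submit --driver-memory 4g`.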

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Mich Talebzadeh
One question, what was the driver memory before setting it to 4G? Did you have it set at all before? HTH Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile https://en.everybodywiki

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
Hi Mich, Here are my config values from spark-defaults.conf: spark.eventLog.enabled true spark.eventLog.dir hdfs://10.0.50.1:8020/spark-logs spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider spark.history.fs.logDirectory hdfs://10.0.50.1:8020/spark-logs spark.history.fs.upd
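
Patrick's values, reassembled into a complete spark-defaults.conf fragment. The last line in the message is cut off mid-property; spark.history.fs.update.interval is the real Spark setting that matches the truncated prefix, and the 10s shown is only its documented default, not necessarily his value.

```properties
# event logging + history server (host/port as given in the thread)
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://10.0.50.1:8020/spark-logs
spark.history.provider           org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory    hdfs://10.0.50.1:8020/spark-logs
# truncated in the original message; assumed property and default value
spark.history.fs.update.interval 10s
```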

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Mich Talebzadeh
Hello Patrick, As a matter of interest, what parameters and their respective values do you use in spark-submit? I assume it is running in YARN mode. HTH Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
Hi Mich, Yes, that's the sequence of events. I think the big breakthrough is that (for now at least) Spark is throwing errors instead of the queries hanging, which is a big step forward. I can at least troubleshoot issues if I know what they are. When I reflect on the issues I faced and the solut

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Mich Talebzadeh
Hi Patrick, glad that you have managed to sort this problem out. Hopefully it will go away for good. Still, we are in the dark about how this problem keeps going away and coming back :( As I recall, the chronology of events was as follows: 1. The issue with the hanging Spark job was reported 2. concur

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
Hi Everyone, I just wanted to follow up on this issue. This issue has continued since our last correspondence. Today I had a query hang and couldn't resolve the issue. I decided to upgrade my Spark install from 3.4.0 to 3.4.1. After doing so, instead of the query hanging, I got an error message th

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-13 Thread Mich Talebzadeh
OK, I use Hive 3.1.1. My suggestion is to put your Hive issues, and the Java version compatibility questions, to u...@hive.apache.org. They will give you better info. HTH Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-13 Thread Patrick Tucci
I attempted to install Hive yesterday. The experience was similar to other attempts at installing Hive: it took a few hours and at the end of the process, I didn't have a working setup. The latest stable release would not run. I never discovered the cause, but similar StackOverflow questions sugges

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Mich Talebzadeh
OK, you would not have known unless you went through the process, so to speak. Let us do something revolutionary here 😁 Install Hive and its metastore. You already have Hadoop anyway https://cwiki.apache.org/confluence/display/hive/adminmanual+installation hive metastore https://data-flair.train

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Patrick Tucci
Yes, on premise. Unfortunately after installing Delta Lake and re-writing all tables as Delta tables, the issue persists. On Sat, Aug 12, 2023 at 11:34 AM Mich Talebzadeh wrote: > ok sure. > > Is this Delta Lake going to be on-premise? > > Mich Talebzadeh, > Solutions Architect/Engineering Lead

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Mich Talebzadeh
ok sure. Is this Delta Lake going to be on-premise? Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Patrick Tucci
Hi Mich, Thanks for the feedback. My original intention after reading your response was to stick to Hive for managing tables. Unfortunately, I'm running into another case of SQL scripts hanging. Since all tables are already Parquet, I'm out of troubleshooting options. I'm going to migrate to Delta

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-11 Thread Mich Talebzadeh
Hi Patrick, There is nothing wrong with Hive. On-premise, it is the best data warehouse there is. Hive handles both the ORC and Parquet formats well; they are both columnar implementations of the relational model. What you are seeing is the Spark API to Hive, which prefers Parquet. I found out a few year

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-11 Thread Patrick Tucci
Thanks for the reply Stephen and Mich. Stephen, you're right, it feels like Spark is waiting for something, but I'm not sure what. I'm the only user on the cluster and there are plenty of resources (+60 cores, +250GB RAM). I even tried restarting Hadoop, Spark and the host servers to make sure not

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Mich Talebzadeh
Steve may have a valid point. You raised an issue with concurrent writes before, if I recall correctly; this limitation may be due to the Hive metastore. By default Spark uses Apache Derby for its database persistence. *However, it is limited to only one Spark session at any time for the purposes
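
A minimal sketch of what moving off the embedded Derby metastore looks like: point the metastore at an external RDBMS in hive-site.xml so multiple Spark sessions can share it. The hostname, database name, driver and credentials below are placeholders, not values from the thread.

```xml
<!-- hive-site.xml: external metastore instead of embedded Derby (all values hypothetical) -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://metastore-host:5432/hive_metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
</configuration>
```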

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Stephen Coy
Hi Patrick, When this has happened to me in the past (admittedly via spark-submit) it has been because another job was still running and had already claimed some of the resources (cores and memory). I think this can also happen if your configuration tries to claim resources that will never be

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Patrick Tucci
Hi Mich, I don't believe Hive is installed. I set up this cluster from scratch. I installed Hadoop and Spark by downloading them from their project websites. If Hive isn't bundled with Hadoop or Spark, I don't believe I have it. I'm running the Thrift server distributed with Spark, like so: ~/spa

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Mich Talebzadeh
Sorry, the host is 10.0.50.1. Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Mich Talebzadeh
Hi Patrick, That beeline on port 1 is a Hive Thrift server running on host 10.0.50.1:1. If you can access that host, you should be able to log into Hive by typing hive. The OS user is hadoop in your case, and it sounds like there is no password! Once inside that host, hive logs a

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Patrick Tucci
Hi Mich, Thanks for the reply. Unfortunately I don't have Hive set up on my cluster. I can explore this if there are no other ways to troubleshoot. I'm using beeline to run commands against the Thrift server. Here's the command I use: ~/spark/bin/beeline -u jdbc:hive2://10.0.50.1:1 -n hadoop
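
The Thrift server / beeline pairing Patrick describes, sketched end to end. The port digits are garbled in the archived message; 10000 is HiveServer2's documented default Thrift port and is assumed here, as is the `~/spark` install path from the thread.

```shell
# start the Thrift server that ships with Spark
~/spark/sbin/start-thriftserver.sh

# connect with the bundled beeline client (default Thrift port 10000 assumed)
~/spark/bin/beeline -u jdbc:hive2://10.0.50.1:10000 -n hadoop
```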

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Mich Talebzadeh
Can you run this SQL query through Hive itself? Are you using this command or similar for your Thrift server? beeline -u jdbc:hive2:///1/default org.apache.hive.jdbc.HiveDriver -n hadoop -p xxx HTH Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom

Re: Spark SQL query

2021-02-03 Thread Mich Talebzadeh
I suggest one thing you can do is to open another thread for this feature request "Having functionality in Spark to allow queries to be gathered and analyzed" and see what forum responds to it. HTH LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8P

Re: Spark SQL query

2021-02-03 Thread Arpan Bhandari
Yes Mich, mapping the Spark SQL query that got executed to the corresponding application Id on YARN would greatly help in analyzing and debugging the query for any potential problems. Thanks, Arpan Bhandari -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: Spark SQL query

2021-02-03 Thread Mich Talebzadeh
I gather what you are after is a code sniffer for Spark that provides a form of GUI to get the code that applications run against Spark. I don't think Spark has this type of plug-in, although it would be potentially useful. Some RDBMSs provide this, usually stored on some form of persistent storage

Re: Spark SQL query

2021-02-02 Thread Arpan Bhandari
Mich, The directory is already there and event logs are getting generated. I have checked them; they contain the query plan but not the actual query. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-02 Thread Mich Talebzadeh
create a directory in hdfs hdfs dfs -mkdir /spark_event_logs modify file $SPARK_HOME/conf/spark-defaults.conf and add these two lines spark.eventLog.enabled=true # do not use quotes below spark.eventLog.dir=hdfs://rhes75:9000/spark_event_logs Then run a job and check it hdfs dfs -ls /spark_eve
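
The steps Mich lists can be sketched as a runnable sequence (the hostname rhes75:9000 is from the message; adjust for your cluster):

```shell
# create the event log directory in HDFS
hdfs dfs -mkdir /spark_event_logs

# append to $SPARK_HOME/conf/spark-defaults.conf (no quotes around the values)
cat >> $SPARK_HOME/conf/spark-defaults.conf <<'EOF'
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://rhes75:9000/spark_event_logs
EOF

# run a job, then confirm logs appear
hdfs dfs -ls /spark_event_logs
```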

Re: Spark SQL query

2021-02-02 Thread Arpan Bhandari
Yes, I can see the jobs on 8088 and also on the Spark history URL. The Spark history server shows the plan details on the SQL tab but does not give the query. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-02 Thread Arpan Bhandari
Hi Mich, I do see the .scala_history directory, but it contains all the queries executed up till now. If I have to map a specific query to an application Id in YARN, that would not correlate, hence this method alone won't suffice. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-02 Thread Mich Talebzadeh
Hi Arpan. I believe all applications, including Spark and Scala, create a hidden history file. You can go to the home directory: cd # see list of all hidden files ls -a | egrep '^\.' If you are using Scala, do you see a .scala_history file? HTH

Re: Spark SQL query

2021-02-02 Thread Arpan Bhandari
Hi Mich, I repeated the steps as suggested, but there is still no such folder created in the home directory. Do we need to enable some property so that it creates one? Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-02 Thread Arpan Bhandari
Sanchit, It seems I have to do some sort of analysis on the plan to get the query. Appreciate all your help on this. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-01 Thread Mich Talebzadeh
Hi Arpan, log in as any user that has execution rights for Spark. Type spark-shell, do some simple commands, then exit. Go to the home directory of that user and look for the hidden file ${HOME}/.spark_history; it will be there. HTH,

Re: Spark SQL query

2021-02-01 Thread Sachit Murarka
Application-wise it won't show as such. You can try to correlate it with the explain plan output using some filters or attributes. Or, if you do not have too many queries in history, just take the queries, find the plans of those queries, and match them against what is shown in the UI. I know that's a tedious task. But I

Re: Spark SQL query

2021-02-01 Thread Arpan Bhandari
Sachit, That shows all the queries that got executed, but how would it get mapped to the specific application Id it was associated with? Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-01 Thread Sachit Murarka
Hi Arpan, In spark-shell, when you type :history, is it still not showing? Thanks Sachit On Mon, 1 Feb 2021, 21:13 Arpan Bhandari, wrote: > Hey Sachit, > > It shows the query plan, which is difficult to diagnose and depict the > actual query from. > > > Thanks, > Arpan Bhandari > > > > -- >

Re: Spark SQL query

2021-02-01 Thread Arpan Bhandari
Hey Mich, Thanks for the suggestions, but I don't see any such folder created on the edge node. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-01 Thread Arpan Bhandari
Hey Sachit, It shows the query plan, from which it is difficult to reconstruct the actual query. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-01-31 Thread Mich Talebzadeh
Hi Arpan, I presume you are interested in what the client was doing. If you have access to the edge node (where the Spark code is submitted), look for the following file: ${HOME}/.spark_history example -rw-r--r--. 1 hduser hadoop 111997 Jun 2 2018 .spark_history just use shell tools (cat, grep etc) t

Re: Spark SQL query

2021-01-31 Thread Sachit Murarka
Hi Arpan, Launch spark-shell and in the shell type ":history"; you will see the queries executed. In the Spark UI, under the SQL tab, you can see the query plan when you click on the details button (though it won't show you the complete query). But by looking at the plan you can get your query. Hope t

Re: Spark SQL query

2021-01-29 Thread Arpan Bhandari
Hi Sachit, Yes, it was executed using spark-shell, and history is already enabled. I already checked the SQL tab but it is not showing the query. My Spark version is 2.4.5. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-01-29 Thread Sachit Murarka
Hi Arpan, Was it executed using spark-shell? If yes, type :history Do you have the history server enabled? If yes, go to the history server and open the SQL tab in the History UI. Thanks Sachit On Fri, 29 Jan 2021, 19:19 Arpan Bhandari, wrote: > Hi , > > Is there a way to track back spark sql after it has be

RE: Spark-SQL Query Optimization: overlapping ranges

2017-05-01 Thread Lavelle, Shawn
Jacek, Thanks for your help. I didn’t want to write a bug/enhancement unless warranted. ~ Shawn From: Jacek Laskowski [mailto:ja...@japila.pl] Sent: Thursday, April 27, 2017 8:39 AM To: Lavelle, Shawn Cc: user Subject: Re: Spark-SQL Query Optimization: overlapping ranges Hi Shawn, If

Re: Spark-SQL Query Optimization: overlapping ranges

2017-04-27 Thread Jacek Laskowski
sort of thing. We’re probably going to write our own > org.apache.spark.sql.catalyst.rules.Rule to handle it. > > ~ Shawn > > > > *From:* Jacek Laskowski [mailto:ja...@japila.pl] > *Sent:* Wednesday, April 26, 2017 2:55 AM > *To:* Lavelle, Shawn > *Cc:* user > *Subject:* Re: Spark-SQL Q

RE: Spark-SQL Query Optimization: overlapping ranges

2017-04-27 Thread Lavelle, Shawn
thing. We’re probably going to write our own org.apache.spark.sql.catalyst.rules.Rule to handle it. ~ Shawn From: Jacek Laskowski [mailto:ja...@japila.pl] Sent: Wednesday, April 26, 2017 2:55 AM To: Lavelle, Shawn Cc: user Subject: Re: Spark-SQL Query Optimization: overlapping ranges explain it

Re: Spark-SQL Query Optimization: overlapping ranges

2017-04-26 Thread Jacek Laskowski
explain it and you'll know what happens under the covers. i.e. Use explain on the Dataset. Jacek On 25 Apr 2017 12:46 a.m., "Lavelle, Shawn" wrote: > Hello Spark Users! > >Does the Spark Optimization engine reduce overlapping column ranges? > If so, should it push this down to a Data Sourc
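
What "explain on the Dataset" looks like in practice. The DataFrame below is a made-up stand-in for Shawn's overlapping-range predicates, and extended=true prints the parsed, analyzed, and optimized logical plans plus the physical plan, so you can see whether Catalyst collapsed the ranges.

```scala
// sketch: inspect what the optimizer does with overlapping range predicates
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("explain-demo").getOrCreate()
import spark.implicits._

val df = spark.range(100).toDF("x")
  .filter($"x" > 10 && $"x" > 20)   // overlapping ranges on the same column

df.explain(true)  // extended output: parsed, analyzed, optimized, physical plans
```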

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
nd corresponding all PRs. *spark.sql.hive.convertMetastoreParquet false* *spark.sql.hive.metastorePartitionPruning true* *I had set the above properties from *SPARK-6910 & PRs. > > Yong > > > -- > *From:* Raju Bairishetti > *Sent:* Tuesday, January 17, 2017 3:00

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Yong Zhang
From: Raju Bairishetti Sent: Tuesday, January 17, 2017 3:00 AM To: user @spark Subject: Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided Had a high level look into the code. Seems getHiveQlPartitions method

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Had a high-level look into the code. It seems the getHiveQlPartitions method from HiveMetastoreCatalog is getting called irrespective of the metastorePartitionPruning conf value. It should not fetch all partitions if we set metastorePartitionPruning to true (the default value for this is false). def getHiveQlP
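
The two settings Raju reports elsewhere in the thread as the workaround, as a spark-defaults.conf fragment:

```properties
# prune partitions in the metastore instead of fetching all of them (see SPARK-6910)
spark.sql.hive.convertMetastoreParquet    false
spark.sql.hive.metastorePartitionPruning  true
```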

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-15 Thread Raju Bairishetti
Waiting for suggestions/help on this... On Wed, Jan 11, 2017 at 12:14 PM, Raju Bairishetti wrote: > Hello, > >Spark sql is generating query plan with all partitions information even > though if we apply filters on partitions in the query. Due to this, spark > driver/hive metastore is hittin

Re: Spark SQL query for List

2016-04-26 Thread Ramkumar V
I'm getting the following exception if I form a query like this. It's not getting to the point where get(0) or get(1) is called. Exception in thread "main" java.lang.RuntimeException: [1.22] failure: ``*'' expected but `cities' found *Thanks*, On Tue, Apr 26, 2016 at

Re: Spark SQL query for List

2016-04-26 Thread Hyukjin Kwon
Doesn't get(0) give you the Array[String] for CITY (am I missing something?) On 26 Apr 2016 11:02 p.m., "Ramkumar V" wrote: JavaSparkContext ctx = new JavaSparkContext(sparkConf); SQLContext sqlContext = new SQLContext(ctx); DataFrame parquetFile = sqlContext.parquetFile( "hdfs:/XYZ:802

Re: Spark SQL query for List

2016-04-26 Thread Ramkumar V
JavaSparkContext ctx = new JavaSparkContext(sparkConf); SQLContext sqlContext = new SQLContext(ctx); DataFrame parquetFile = sqlContext.parquetFile( "hdfs:/XYZ:8020/user/hdfs/parquet/*.parquet"); parquetFile.registerTempTable("parquetFile"); DataFrame tempDF = sqlContext.sql("SEL

Re: Spark SQL query for List

2016-04-26 Thread Hyukjin Kwon
Could you maybe share your code? On 26 Apr 2016 9:51 p.m., "Ramkumar V" wrote: > Hi, > > I loaded a JSON file in Parquet format into SparkSQL. I can't read a List > that is inside the JSON. > > Sample JSON > > { > "TOUR" : { > "CITIES" : ["Paris","Berlin","Prague"] > }, > "BUDJET" : 10

Re: Spark sql query taking long time

2016-03-03 Thread Gourav Sengupta
Hi, Using dataframes you can use SQL, and SQL has JOIN, BETWEEN, IN and LIKE operations. Why would someone use a dataframe and then use it as an RDD? :) Regards, Gourav Sengupta On Thu, Mar 3, 2016 at 4:28 PM, Sumedh Wale wrote: > On Thursday 03 March 2016 09:15 PM, Gourav Sengup

Re: Spark sql query taking long time

2016-03-03 Thread Sumedh Wale
On Thursday 03 March 2016 09:15 PM, Gourav Sengupta wrote: Hi, why not read the table into a dataframe directly using SPARK CSV package. You are trying to solve the problem the roun

Re: Spark sql query taking long time

2016-03-03 Thread Gourav Sengupta
Hi, why not read the table into a dataframe directly using SPARK CSV package. You are trying to solve the problem the round about way. Regards, Gourav Sengupta On Thu, Mar 3, 2016 at 12:33 PM, Sumedh Wale wrote: > On Thursday 03 March 2016 11:03 AM, Angel Angel wrote: > > Hello Sir/Madam, > >

Re: Spark sql query taking long time

2016-03-03 Thread Sumedh Wale
On Thursday 03 March 2016 11:03 AM, Angel Angel wrote: Hello Sir/Madam, I am writing one application using Spark SQL. I made a very big table using the following command val

Re: Spark sql query taking long time

2016-03-02 Thread Ted Yu
Have you seen the thread 'Filter on a column having multiple values' where Michael gave this example ? https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/107522969592/2840265927289860/2388bac36e.html FYI On Wed, Mar 2, 2016 at 9:3

RE: Spark SQL query AVRO file

2015-08-07 Thread java8964
Good to know that. Let me research it and give it a try. Thanks Yong From: mich...@databricks.com Date: Fri, 7 Aug 2015 11:44:48 -0700 Subject: Re: Spark SQL query AVRO file To: java8...@hotmail.com CC: user@spark.apache.org You can register your data as a table using this library and then query

Re: Spark SQL query AVRO file

2015-08-07 Thread Michael Armbrust
run the query? > > Thanks > > Yong > > -- > From: mich...@databricks.com > Date: Fri, 7 Aug 2015 11:32:21 -0700 > Subject: Re: Spark SQL query AVRO file > To: java8...@hotmail.com > CC: user@spark.apache.org > > > Have you considered trying Spark SQL's native supp

RE: Spark SQL query AVRO file

2015-08-07 Thread java8964
...@databricks.com Date: Fri, 7 Aug 2015 11:32:21 -0700 Subject: Re: Spark SQL query AVRO file To: java8...@hotmail.com CC: user@spark.apache.org Have you considered trying Spark SQL's native support for avro data? https://github.com/databricks/spark-avro On Fri, Aug 7, 2015 at 11:30 AM, java8964

Re: Spark SQL query AVRO file

2015-08-07 Thread Michael Armbrust
Have you considered trying Spark SQL's native support for avro data? https://github.com/databricks/spark-avro On Fri, Aug 7, 2015 at 11:30 AM, java8964 wrote: > Hi, Spark users: > > We currently are using Spark 1.2.2 + Hive 0.12 + Hadoop 2.2.0 on our > production cluster, which has 42 data/task
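
A sketch of the spark-avro route Michael points to, using the Spark 1.x API from the thread's era. The package version and file path are placeholders, not values from the thread.

```scala
// launch with e.g.: spark-shell --packages com.databricks:spark-avro_2.10:1.0.0
// (artifact version hypothetical; check the spark-avro README for your Spark version)
val df = sqlContext.read
  .format("com.databricks.spark.avro")
  .load("hdfs:///path/to/data.avro")   // placeholder path

df.registerTempTable("avro_table")
sqlContext.sql("SELECT * FROM avro_table LIMIT 10").show()
```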

Re: Spark SQL query key/value in Map

2015-04-16 Thread JC Francisco
Ah yeah, didn't notice that difference. Thanks! It worked. On Fri, Apr 17, 2015 at 4:27 AM, Yin Huai wrote: > For Map type column, fields['driver'] is the syntax to retrieve the map > value (in the schema, you can see "fields: map"). The syntax of > fields.driver is used for struct type. > > On

Re: Spark SQL query key/value in Map

2015-04-16 Thread Yin Huai
For Map type column, fields['driver'] is the syntax to retrieve the map value (in the schema, you can see "fields: map"). The syntax of fields.driver is used for struct type. On Thu, Apr 16, 2015 at 12:37 AM, jc.francisco wrote: > Hi, > > I'm new with both Cassandra and Spark and am experimentin
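
The distinction Yin describes, side by side. The table name is a placeholder; `fields` as a map column is from the thread, and the struct example is illustrative.

```sql
-- fields is map<string,string>: bracket syntax retrieves the map value
SELECT fields['driver'] FROM my_table;

-- for a struct column (e.g. info struct<driver:string>), dot syntax applies instead
SELECT info.driver FROM my_table;
```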

Re: Spark-sql query got exception.Help

2015-03-26 Thread 李铖
Yes, the exception occurred sometimes, but in the end the final result was produced. 2015-03-26 11:08 GMT+08:00 Saisai Shao : > Would you mind running again to see if this exception can be reproduced > again, since exception in MapOutputTracker seldom occurs, maybe some other > exceptions which lead to t

Re: Spark-sql query got exception.Help

2015-03-25 Thread Saisai Shao
Would you mind running again to see if this exception can be reproduced again, since exception in MapOutputTracker seldom occurs, maybe some other exceptions which lead to this error. Thanks Jerry 2015-03-26 10:55 GMT+08:00 李铖 : > One more exception.How to fix it .Anybody help me ,please. > > >

Re: Spark-sql query got exception.Help

2015-03-25 Thread 李铖
One more exception. How do I fix it? Can anybody help me, please? org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:386) at org

Re: Spark-sql query got exception.Help

2015-03-25 Thread 李铖
Yes, it works after I appended the two properties in spark-defaults.conf. As I use Python on the Spark platform, the Python API does not have the SparkConf api. Thanks. 2015-03-25 21:07 GMT+08:00 Cheng Lian : > Oh, just noticed that you were calling sc.setSystemProperty. Actually > you need t

Re: Spark-sql query got exception.Help

2015-03-25 Thread Cheng Lian
Oh, just noticed that you were calling sc.setSystemProperty. Actually you need to set this property in SparkConf or in spark-defaults.conf. And there are two configurations related to the Kryo buffer size: * spark.kryoserializer.buffer.mb, which is the initial size, and * spark.kryoserializer.b
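
The two Kryo settings as spark-defaults.conf lines. The second property name is cut off in the message; spark.kryoserializer.buffer.max.mb is the Spark 1.x maximum-buffer setting and is assumed to be what was meant. The values are illustrative, not recommendations.

```properties
# Spark 1.x Kryo buffer sizes, in MB (initial and maximum)
spark.kryoserializer.buffer.mb       256
spark.kryoserializer.buffer.max.mb   512
```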

Re: Spark-sql query got exception.Help

2015-03-25 Thread 李铖
Here is the full track 15/03/25 17:48:34 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, cloud1): com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 39135 at com.esotericsoftware.kryo.io.Output.require(Output.java:138) at com.esotericsoftware.kryo.io.Output.w

Re: Spark-sql query got exception.Help

2015-03-25 Thread Cheng Lian
Could you please provide the full stack trace? On 3/25/15 6:26 PM, 李铖 wrote: It is OK when I query data from a small HDFS file, but if the HDFS file is 152m, I get this exception. I tried this code: sc.setSystemProperty("spark.kryoserializer.buffer.mb", '256'). The error persists. ``` com.esoterics

Re: spark sql query optimization , and decision tree building

2014-10-27 Thread Yanbo Liang
If you want to calculate the mean, variance, minimum, maximum and total count for each column, especially for machine learning features, you can try MultivariateOnlineSummarizer. MultivariateOnlineSummarizer implements a numerically stable algorithm to compute the sample mean and variance by column in
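
A small sketch of the MultivariateOnlineSummarizer usage pattern. The data is made up; aggregate folds each vector into a per-partition summary, then merges the partial summaries.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer

// hypothetical feature vectors
val data = sc.parallelize(Seq(
  Vectors.dense(1.0, 10.0),
  Vectors.dense(3.0, 30.0)))

val summary = data.aggregate(new MultivariateOnlineSummarizer)(
  (summ, v) => summ.add(v),        // fold each vector into the running summary
  (s1, s2) => s1.merge(s2))        // merge partition-level summaries

println(summary.mean)      // per-column mean
println(summary.variance)  // per-column variance
println(summary.max)       // per-column maximum
```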

Re: spark sql query optimization , and decision tree building

2014-10-22 Thread sanath kumar
Thank you very much. Two more small questions: 1) val output = sqlContext.sql("SELECT * From people") My output has 128 columns and a single row. How to find which column has the maximum value in a single row using Scala? 2) As each row has 128 columns, how to print each row into a text wh

RE: spark sql query optimization , and decision tree building

2014-10-22 Thread Cheng, Hao
The “output” variable is actually a SchemaRDD; it provides lots of DSL APIs, see http://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD 1) How to save the result values of a query into a list? [CH:] val list: Array[Row] = output.collect, however getting 1M records into a
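
For the first question, a sketch of finding the max-valued column in the single collected Row. This assumes all 128 columns are numeric and readable as Double, which the thread does not confirm.

```scala
import org.apache.spark.sql.Row

val row: Row = output.collect()(0)   // the single row from the query above

// read every column as Double (assumption: all columns are numeric)
val values = (0 until row.length).map(i => row.getDouble(i))
val maxIdx = values.indexOf(values.max)

println(s"column $maxIdx has the maximum value ${values(maxIdx)}")
```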

Re: Spark SQL Query Plan optimization

2014-08-02 Thread Michael Armbrust
The number of partitions (which decides the number of tasks) is fixed after any shuffle and can be configured using 'spark.sql.shuffle.partitions' through SQLConf (i.e. sqlContext.set(...) or "SET spark.sql.shuffle.partitions=..." in sql). It is possible we will auto select this based on statistics
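
Both ways of setting it that Michael mentions, sketched for a Spark 1.x sqlContext (the setConf spelling is from later 1.x releases; the message's sqlContext.set(...) is the original form). The value 64 is illustrative.

```scala
// programmatically through SQLConf
sqlContext.setConf("spark.sql.shuffle.partitions", "64")

// or via SQL
sqlContext.sql("SET spark.sql.shuffle.partitions=64")
```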