system to make the execution ordered?
Such updating logic does not seem like functional programming, correct?
Thanks a lot
prosp4300
Please see the link below for the available tuning options:
https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#performance-tuning
For example, reducing spark.sql.shuffle.partitions from 200 to 10 could
improve performance significantly.
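For instance, a minimal PySpark sketch (assuming the 1.3-era SQLContext API;
the app name is illustrative):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="shuffle-partitions-demo")
sqlContext = SQLContext(sc)

# Default is 200; fewer partitions mean less per-task overhead for small data.
sqlContext.setConf("spark.sql.shuffle.partitions", "10")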
Hi, bdev
Derby is the default embedded DB for the Hive metastore if you do not specify
hive.metastore.uris. Take a look at the lib directory of Hive; you will find
the Derby jar there. Spark does not require Derby by default.
At 2015-07-07 17:07:28, "bdev" wrote:
>Just trying to get sta
Each time you run the jar, a new JVM is started; maintaining a connection
across different JVMs is not the correct way to think about it.
> each time when I run that jar it tries to make connection with hive metastore
At 2015-07-07 17:07:06, "wazza" wrote:
>Hi I am new to Apache Spark and I have t
It seems what Feynman mentioned is the source code rather than the
documentation; vectorMean is private, see
https://github.com/apache/spark/blob/v1.3.0/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala
At 2015-07-09 10:10:58, "诺铁" wrote:
thanks, I understand now.
but I c
As mentioned in the Spark SQL programming guide, Spark SQL supports Hive UDFs.
Take a look at the built-in UDFs of Hive below; getting the day of year should
be as simple as in an existing RDBMS.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
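For example, a hedged sketch of computing day of year with Hive's built-in
date UDFs from a HiveContext (1.3-era API; the sample date is illustrative):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-date-udf-demo")
sqlContext = HiveContext(sc)

# datediff, year, cast and concat are Hive builtins:
# day of year = days since Jan 1 of the same year, plus 1.
sqlContext.sql("""
    SELECT datediff('2015-07-09',
                    concat(cast(year('2015-07-09') AS string), '-01-01')) + 1
           AS day_of_year
""").show()  # -> 190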
At 2015-07-09 12:
What's the result of "list jar" in both 1.3.1 and 1.4.0? Please check whether
there is any difference.
At 2015-07-15 08:10:44, "ogoh" wrote:
>Hello,
>I am using SparkSQL along with ThriftServer so that we can access using Hive
>queries.
>With Spark 1.3.1, I can register UDF function. But, Spark
Congratulations!
At 2016-07-27 14:00:22, "Reynold Xin" wrote:
Hi all,
Apache Spark 2.0.0 is the first release of the Spark 2.x line. It includes 2500+
patches from 300+ contributors.
To download Spark 2.0, head over to the download page:
http://spark.apache.org/downloads.html
To view the releas
Thanks for this immediate correction :)
At 2016-07-27 15:17:54, "Gourav Sengupta" wrote:
Sorry,
in my email above I was referring to KUDU, and there it goes: how can KUDU be
right if it is first mentioned in forums with a wrong spelling? It's got a
difficult beginning where people were trying to
Additionally, in the paragraph about MLlib, three links are missing; it would
be better to provide the links to give us more information. Thanks a lot.
See this blog post for details
See this talk to learn more
This talk lists many of these new features.
At 2016-07-27 15:18:41, "Ofir Manor" wrote:
Hold the
The page mentioned before is the release notes page that is missing the links:
http://spark.apache.org/releases/spark-release-2-0-0.html#mllib
At 2016-07-27 15:56:00, "prosp4300" wrote:
Additionally, in the paragraph about MLlib, three links are missing; it would
be better to provide the links to gi
This looks like a classpath problem. If you can provide the command you used
to run your application and the SPARK_HOME environment variable, it will help
others identify the root problem.
At 2015-11-20 18:59, Satish wrote:
Hi Michael,
As my current Spark version is 1.4.0, why does it error out as "error:
ng JDBCRDD
I tried a couple of DataFrame-related methods, most of which error out
stating that the method has been overloaded.
Please let me know if any further inputs are needed to analyze it.
Regards,
Satish Chandra
On Fri, Nov 20, 2015 at 5:46 PM, prosp4300 wrote:
Looks like a classpath problem
Hi,
I want to know how you coalesce the partitions down to one to improve
performance.
Thanks
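For reference, a minimal sketch of what I mean by coalescing to a single
partition (PySpark 1.4-era DataFrame API; the data is illustrative):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="coalesce-demo")
sqlContext = SQLContext(sc)

df = sqlContext.createDataFrame([(i, i * 2) for i in range(100)], ["k", "v"])
one = df.coalesce(1)  # merge existing partitions without a full shuffle
print(one.rdd.getNumPartitions())  # -> 1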
At 2015-08-11 23:31, Al M wrote:
I am using DataFrames with Spark 1.4.1. I really like DataFrames but the
partitioning makes no sense to me.
I am loading lots of very small files and joining them together.
The answer is simply NO,
but I hope someone can give deeper insight or a meaningful reference.
At 2015-08-19 15:21, Todd wrote:
I can't find any discussion of whether Spark SQL supports column indexing. If
it does, is there a guide on how to do it? Thanks.
By the way, turning off code generation could be an option to try; sometimes
code generation can introduce slowness.
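For example, assuming the spark.sql.codegen flag documented for Spark 1.3/1.4
(later versions removed the flag when code generation became mandatory):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="codegen-off-demo")
sqlContext = SQLContext(sc)

# Disable runtime bytecode generation for expression evaluation.
sqlContext.setConf("spark.sql.codegen", "false")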
At 2015-09-11 15:58, "Cheng, Hao" wrote:
Can you confirm whether the query really runs in cluster mode, not local
mode? Can you print the call stack of the executor when the query
Spark on YARN supports customized log4j configuration by default;
RollingFileAppender can be used to avoid disk overflow, as documented below:
If you need a reference to the proper location to put log files in the YARN so
that YARN can properly display and aggregate them, use
spark.yarn.app.co
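For illustration, a minimal log4j.properties sketch using RollingFileAppender
(the file name and size limits are illustrative; spark.yarn.app.container.log.dir
is the YARN log directory property the docs refer to):

# Route everything to a size-capped rolling file in the YARN container log dir.
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.rolling.MaxFileSize=50MB
log4j.appender.rolling.MaxBackupIndex=5
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n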
The way to use the Kryo serializer is similar to Scala, like below; the only
difference is the lack of the convenient method "conf.registerKryoClasses",
but it should be easy to make one yourself:
from pyspark import SparkConf

conf = SparkConf()
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
# Comma-separated fully qualified class names; the class name is illustrative.
conf.set("spark.kryo.classesToRegister", "my.package.MyClass")
Hi, Spark Users
As far as I know, Spray Client depends on an Akka ActorSystem. Does this
dependency theoretically mean it is not possible to use spray-client in a
Spark job run from Spark executor nodes?
I believe PlayWS should work as a RESTful client run from Spark executors;
how about traditiona
Hi, Spark Users
I'm playing with Spark metrics monitoring and want to add a custom sink, an
HttpSink that sends the metrics through a RESTful API.
A subclass of Sink, "org.apache.spark.metrics.sink.HttpSink", is created and
packaged within the application jar.
It works for the driver instance, but once en
ystem
>classpath; the application jar is not in the system classpath, so that
>does not work. There are different ways for you to get it there, most
>of them manual (YARN is, I think, the only RM supported in Spark where
>the application itself can do it).
>
>On Thu, Dec 20, 2018
It looks like there is no obvious relationship between the partitions or
tables; maybe try putting them in different jobs, so they can run at the same
time and fully utilize the cluster resources.
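For example, a hedged sketch of running independent jobs concurrently within
one application by submitting actions from separate threads (the FAIR
scheduler setting is an assumption, not a requirement):

from concurrent.futures import ThreadPoolExecutor
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("concurrent-jobs").set("spark.scheduler.mode", "FAIR")
sc = SparkContext(conf=conf)

def count_range(start, end):
    # Each action submitted from its own thread becomes an independent job.
    return sc.parallelize(range(start, end)).count()

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(count_range, 0, 1000),
               pool.submit(count_range, 1000, 2000)]
    print([f.result() for f in futures])  # -> [1000, 1000]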
prosp4300
Email: prosp4...@163.com
On 02/27/2020 22:50, Manjunath