[ANNOUNCE] Apache Kyuubi released 1.7.3

2023-09-24 Thread Zhen Wang
Hi all, The Apache Kyuubi community is pleased to announce that Apache Kyuubi 1.7.3 has been released! Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses. Kyuubi provides a pure SQL gateway through Thrift JDBC/ODBC interface for en

[ANNOUNCE] Apache Kyuubi released 1.7.2

2023-09-18 Thread Zhen Wang
Hi all, The Apache Kyuubi community is pleased to announce that Apache Kyuubi 1.7.2 has been released! Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses. Kyuubi provides a pure SQL gateway through Thrift JDBC/ODBC interface for en

Re: read compressed hdfs files using SparkContext.textFile?

2015-09-08 Thread shenyan zhen
Realized I was using spark-shell, so it assumed a local file. When I submitted a Spark job, the same code worked fine. On Tue, Sep 8, 2015 at 3:13 PM, shenyan zhen wrote: > Hi, > > For hdfs files written with below code: > > rdd.saveAsTextFile(getHdfsPat
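One plausible explanation for the behaviour above: in Hadoop-based file APIs, a path without a scheme is resolved against the default filesystem (`fs.defaultFS`). That default is `file:///` when no Hadoop configuration is picked up, so a shell would then read the local disk. The following is a minimal plain-Scala sketch of that qualification rule for absolute paths. It is not Spark or Hadoop code. `PathScheme.qualify` and the `namenode:8020` host are hypothetical.

```scala
import java.net.URI

object PathScheme {
  /** For an absolute path: return it unchanged if it already carries a scheme
    * (hdfs://, file://, ...); otherwise prefix it with the default filesystem URI.
    * A simplified model of how a scheme-less path gets resolved. */
  def qualify(path: String, defaultFs: String): String =
    if (new URI(path).getScheme != null) path
    else defaultFs.stripSuffix("/") + "/" + path.stripPrefix("/")
}

// The same scheme-less path lands on different filesystems depending on the default.
println(PathScheme.qualify("/lz/streaming/am/144173460", "file:///"))
println(PathScheme.qualify("/lz/streaming/am/144173460", "hdfs://namenode:8020")) // hypothetical host
```

Passing an explicit `hdfs://` URI to `sc.textFile` removes the dependence on the default filesystem. Hadoop picks the decompression codec from the file extension, so `.gz` parts should be read transparently either way.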

read compressed hdfs files using SparkContext.textFile?

2015-09-08 Thread shenyan zhen
Hi, For HDFS files written with the code below: rdd.saveAsTextFile(getHdfsPath(...), classOf[org.apache.hadoop.io.compress.GzipCodec]) I can see the HDFS files have been generated: 0 /lz/streaming/am/144173460/_SUCCESS 1.6 M /lz/streaming/am/144173460/part-0.gz 1.6 M /lz/streamin

Re: SparkContext initialization error- java.io.IOException: No space left on device

2015-09-06 Thread shenyan zhen
> On Sun, Sep 6, 2015 at 6:15 AM, Shixiong Zhu wrote: > >> The folder is in "/tmp" by default. Could you use "df -h" to check the >> free space of /tmp? >> >> Best Regards, >> Shixiong Zhu >> >> 2015-09-05 9:50 GMT+08:00 shenyan
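The `df -h` check suggested above can also be run from the JVM with `java.io.File.getUsableSpace`, for example on the driver host before the job starts. This is a minimal sketch. The object and method names are hypothetical, and the byte count is the JVM's own estimate, so it may differ slightly from the "Avail" column of `df`.

```scala
import java.io.File
import java.util.Locale

object ScratchSpaceCheck {
  /** Bytes this JVM can probably write on the filesystem holding `dir`;
    * getUsableSpace returns 0 if `dir` does not name an existing path. */
  def usableBytes(dir: File): Long = dir.getUsableSpace

  /** Formats a byte count as GiB with one decimal place (locale-independent). */
  def toGiB(bytes: Long): String =
    "%.1f GiB".formatLocal(Locale.ROOT, bytes / (1024.0 * 1024 * 1024))

  /** True if at least `requiredBytes` appear to be free under `dir`. */
  def hasAtLeast(dir: File, requiredBytes: Long): Boolean =
    usableBytes(dir) >= requiredBytes
}

println(ScratchSpaceCheck.toGiB(ScratchSpaceCheck.usableBytes(new File("/tmp"))))
```

If `/tmp` turns out to be nearly full, one option is to point Spark's scratch directory (`spark.local.dir`, default `/tmp`) at a larger volume. On YARN, executors use the NodeManager's local directories instead of this setting.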

SparkContext initialization error- java.io.IOException: No space left on device

2015-09-04 Thread shenyan zhen
Has anyone seen this error? Not sure which dir the program was trying to write to. I am running Spark 1.4.1, submitting Spark job to Yarn, in yarn-client mode. 15/09/04 21:36:06 ERROR SparkContext: Error adding jar (java.io.IOException: No space left on device), was the --addJars option used? 15

Re: Fighting against performance: JDBC RDD badly distributed

2015-07-28 Thread shenyan zhen
your objective and show some code snippet? Shenyan On Tue, Jul 28, 2015 at 3:23 PM, wrote: > Thank you for your response Zhen, > > > > I am using a vendor-specific JDBC driver JAR file (honestly I don't know > where it came from). Its API is NOT like JdbcRDD, instead, mor

Re: Fighting against performance: JDBC RDD badly distributed

2015-07-28 Thread shenyan zhen
Hi Saif, Are you using JdbcRDD directly from Spark? If yes, then the poor distribution could be due to the bound key you used. See the JdbcRDD Scala doc at https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.JdbcRDD : sql the text of the query. The query must contain t
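The JdbcRDD Scaladoc explains the split with an example: lowerBound 1, upperBound 20 and numPartitions 2 run the query twice, once with (1, 10) and once with (11, 20). The sketch below reproduces that inclusive, equal-width split in plain Scala. It is an illustration, not Spark's own code. It then counts rows per range for a hypothetical, clustered set of keys. Equal-width key ranges only yield equally sized partitions when the bound key is spread fairly evenly across [lowerBound, upperBound].

```scala
object JdbcRangeSketch {
  /** Splits the inclusive key range [lowerBound, upperBound] into numPartitions
    * contiguous sub-ranges of (nearly) equal width. BigInt avoids Long overflow. */
  def ranges(lowerBound: Long, upperBound: Long, numPartitions: Int): Seq[(Long, Long)] = {
    val lo = BigInt(lowerBound)
    val length = BigInt(upperBound) - lo + BigInt(1) // bounds are inclusive
    val n = BigInt(numPartitions)
    (0 until numPartitions).map { i =>
      val start = lo + BigInt(i) * length / n
      val end = lo + BigInt(i + 1) * length / n - BigInt(1)
      (start.toLong, end.toLong)
    }
  }

  /** Number of keys that fall into each (inclusive) range. */
  def rowsPerRange(keys: Seq[Long], rs: Seq[(Long, Long)]): Seq[Int] =
    rs.map { case (from, to) => keys.count(k => k >= from && k <= to) }
}

// Hypothetical skewed keys: ten rows in 1..10 and a single row at 20.
val keys: Seq[Long] = (1L to 10L) :+ 20L
println(JdbcRangeSketch.ranges(1L, 20L, 2))
println(JdbcRangeSketch.rowsPerRange(keys, JdbcRangeSketch.ranges(1L, 20L, 2)))
```

With those keys the first partition gets ten rows and the second gets one. This is the kind of imbalance a poorly chosen bound key can cause.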

Re: Meets class not found error in spark console with newly hive context

2015-07-02 Thread shenyan zhen
In case it helps: I got around it temporarily by saving and resetting the context class loader around creating HiveContext. On Jul 2, 2015 4:36 AM, "Terry Hole" wrote: > Found this a bug in spark 1.4.0: SPARK-8368 > > > Thanks! > Terry > > On Thu,
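The workaround described above is to save the thread's context class loader, create the `HiveContext`, and then put the loader back. That pattern can be wrapped in a small helper. This sketch assumes that restoring the loader after construction is what was meant. `withSavedContextClassLoader` is a hypothetical name, and the helper uses only `java.lang.Thread`.

```scala
object ClassLoaderGuard {
  /** Runs `body`, then puts back the context class loader that was current
    * before it ran, even if `body` throws or installs a different loader. */
  def withSavedContextClassLoader[T](body: => T): T = {
    val thread = Thread.currentThread()
    val saved = thread.getContextClassLoader
    try body
    finally thread.setContextClassLoader(saved)
  }
}
```

In a Spark 1.4 shell it would be used as `val hiveContext = ClassLoaderGuard.withSavedContextClassLoader { new org.apache.spark.sql.hive.HiveContext(sc) }`.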

Re: Spark Cluster Benchmarking Frameworks

2015-06-03 Thread Zhen Jia
Hi Jonathan, Maybe you can try BigDataBench: http://prof.ict.ac.cn/BigDataBench/ . It provides lots of workloads, including both Hadoop and Spark based workloads. Zhen Jia hodgesz wrote > Hi Spark Experts, > > I am curious what peop

Re: Driver hangs on running mllib word2vec

2015-01-05 Thread Eric Zhen
ends on the vocabSize. Even without overflow, there > are still other bottlenecks, for example, syn0Global and syn1Global, each > of them has vocabSize * vectorSize elements. > > Thanks. > > Zhan Zhang > > > > On Jan 5, 2015, at 7:47 PM, Eric Zhen wrote: > > Hi X
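As the reply notes, syn0Global and syn1Global each hold vocabSize * vectorSize elements, so driver memory grows with the product of the two. The helper below does that arithmetic in Long. It flags when the product exceeds `Int.MaxValue`, beyond which an Int-sized count overflows and no single JVM array can hold it. It assumes 4-byte Float elements, and the vocabulary sizes used with it are hypothetical.

```scala
object Word2VecSizing {
  /** Elements in one of syn0Global / syn1Global, computed in Long to avoid overflow. */
  def elementsPerMatrix(vocabSize: Int, vectorSize: Int): Long =
    vocabSize.toLong * vectorSize

  /** True if vocabSize * vectorSize would not fit in an Int (i.e. Int math overflows). */
  def overflowsInt(vocabSize: Int, vectorSize: Int): Boolean =
    elementsPerMatrix(vocabSize, vectorSize) > Int.MaxValue

  /** Bytes for both matrices together, assuming 4-byte Float elements. */
  def bytesForBothMatrices(vocabSize: Int, vectorSize: Int): Long =
    2L * elementsPerMatrix(vocabSize, vectorSize) * 4L
}
```

Consider a hypothetical vocabulary of 10,000,000 words with vectorSize 100. That gives 1,000,000,000 elements per matrix, which does not overflow, and 8,000,000,000 bytes (about 7.5 GiB) for the two matrices. At 30,000,000 words the Int product would overflow.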

Re: Driver hangs on running mllib word2vec

2015-01-05 Thread Eric Zhen
ary size? -Xiangrui > > On Sun, Jan 4, 2015 at 11:18 PM, Eric Zhen wrote: > > Hi, > > > > When we run mllib word2vec(spark-1.1.0), the driver gets stuck with 100% CPU > > usage. Here is the jstack output: > > > > "main" prio=10 tid=0x

Driver hangs on running mllib word2vec

2015-01-04 Thread Eric Zhen
Hi, When we run MLlib word2vec (spark-1.1.0), the driver gets stuck at 100% CPU usage. Here is the jstack output: "main" prio=10 tid=0x40112800 nid=0x46f2 runnable [0x4162e000] java.lang.Thread.State: RUNNABLE at java.io.ObjectOutputStream$BlockDataOutputStream.drain(Object

Re: SparkSQL exception on spark.sql.codegen

2014-11-18 Thread Eric Zhen
n't have the resources > to investigate backporting a fix. However, if you can reproduce the > problem in Spark 1.2 then please file a JIRA. > > On Mon, Nov 17, 2014 at 9:37 PM, Eric Zhen wrote: > >> Yes, it always appears in a subset of the tasks in a stage (i.e. 1

Re: SparkSQL exception on spark.sql.codegen

2014-11-17 Thread Eric Zhen
17, 2014 at 7:04 PM, Eric Zhen wrote: > >> Hi Michael, >> >> We use Spark v1.1.1-rc1 with jdk 1.7.0_51 and scala 2.10.4. >> >> On Tue, Nov 18, 2014 at 7:09 AM, Michael Armbrust > > wrote: >> >>> What version of Spark SQL? >>> >>

Re: SparkSQL exception on spark.sql.codegen

2014-11-17 Thread Eric Zhen
Hi Michael, We use Spark v1.1.1-rc1 with jdk 1.7.0_51 and scala 2.10.4. On Tue, Nov 18, 2014 at 7:09 AM, Michael Armbrust wrote: > What version of Spark SQL? > > On Sat, Nov 15, 2014 at 10:25 PM, Eric Zhen wrote: > >> Hi all, >> >> We run SparkS

SparkSQL exception on spark.sql.codegen

2014-11-15 Thread Eric Zhen
Hi all, When we run SparkSQL on TPCDS benchmark Q19 with spark.sql.codegen=true, we get the exceptions below. Has anyone else seen these before? java.lang.ExceptionInInitializerError at org.apache.spark.sql.execution.SparkPlan.newProjection(SparkPlan.scala:92) at org.apache.spark.sql.ex

Re: multiple passes in mapPartitions

2014-06-13 Thread zhen
Thank you for your suggestion. We will try it out and see how it performs. We think the single call to mapPartitions will be faster but we could be wrong. It would be nice to have a "clone method" on the iterator. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.

spark master UI does not keep detailed application history

2014-06-13 Thread zhen
I have been trying to get detailed history of previous spark shell executions (after exiting spark shell). In standalone mode and Spark 1.0, I think the spark master UI is supposed to provide detailed execution statistics of all previously run jobs. This is supposed to be viewable by clicking on th

spark.eventLog.enabled not working on spark on AWS EC2

2014-06-13 Thread zhen
following directories (I made sure all the directories were created and had the right permissions): hdfs:///spark_logs /root/spark_logs hdfs://:8020/spark_logs Nothing seems to work. Can you give me some advice on why it is not working? Zhen -- View this message in context: http://apache-spark-user

multiple passes in mapPartitions

2014-06-12 Thread zhen
memory. Which is also bad in terms of more GC. Is there a faster/better way of taking multiple passes without copying all the data? Thank you, Zhen -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/multiple-passes-in-mapPartitions-tp7555.html Sent from the
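Scala's standard library offers one way to get two passes from a single iterator: `Iterator.duplicate` returns two iterators over the same elements. It does not avoid copying, though. It buffers every element that one copy has consumed and the other has not, so a complete first pass buffers the whole partition. The sketch below shows the pattern for a function passed to `mapPartitions`, using a hypothetical mean-and-squared-deviation computation. `TwoPassSketch` is an illustrative name.

```scala
object TwoPassSketch {
  /** Two passes over one partition: the first computes the mean, the second the
    * sum of squared deviations from it. `duplicate` buffers whatever the first
    * copy has read and the second has not, i.e. the whole partition here. */
  def meanAndSquaredDeviation(it: Iterator[Double]): Iterator[(Double, Double)] = {
    val (firstPass, secondPass) = it.duplicate
    var n = 0L
    var total = 0.0
    firstPass.foreach { x => n += 1; total += x }
    if (n == 0) Iterator.empty
    else {
      val mean = total / n
      val ssd = secondPass.map(x => (x - mean) * (x - mean)).sum
      Iterator.single((mean, ssd))
    }
  }
}
```

For an `RDD[Double]` it would be used as `rdd.mapPartitions(it => TwoPassSketch.meanAndSquaredDeviation(it))`. When the second pass only needs aggregates, a single-pass formulation such as Welford's online variance avoids the buffering altogether.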

Re: problem starting the history server on EC2

2014-06-11 Thread zhen
started the history server as follows: ./start-history-server.sh hdfs:///spark_logs --port 18080 In order to see the history server UI, I needed to open up inbound traffic on port 18080 in AWS, as follows: custom TCP, port 18080, from anywhere. Hope this will help others. Zhen -- View

Re: problem starting the history server on EC2

2014-06-10 Thread zhen
Sure here it is: drwxrwxrwx 2 1000 root 4096 Jun 11 01:05 spark_logs Zhen -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-starting-the-history-server-on-EC2-tp7361p7373.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: problem starting the history server on EC2

2014-06-10 Thread zhen
root root 4096 Jun 11 02:08 tmp drwxrwxrwx 2 root root 4096 Jun 11 02:08 spark_log Thanks Zhen -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-starting-the-history-server-on-EC2-tp7361p7370.html

problem starting the history server on EC2

2014-06-10 Thread zhen
exist. But I have definitely created the directory and made sure everyone can read/write/execute in the directory. Can you tell me why it does not work? Thank you Zhen -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-starting-the-history-server-on

Re: A new resource for getting examples of Spark RDD API calls

2014-05-21 Thread zhen
Great, thanks for that tip. I will update the documents! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/A-new-resource-for-getting-examples-of-Spark-RDD-API-calls-tp5529p6210.html

Re: A new resource for getting examples of Spark RDD API calls

2014-05-16 Thread zhen
Thanks for the suggestion. I will look into this. Zhen -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/A-new-resource-for-getting-examples-of-Spark-RDD-API-calls-tp5529p5532.html

A new resource for getting examples of Spark RDD API calls

2014-05-15 Thread zhen
into it. Hope you find it useful. Zhen -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/A-new-resource-for-getting-examples-of-Spark-RDD-API-calls-tp5529.html