Spark-sql(yarn-client) java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher

2015-06-18 Thread Sea
Hi, all: I want to run Spark SQL on YARN (yarn-client), but ... I already set "spark.yarn.jar" and "spark.jars" in conf/spark-defaults.conf. ./bin/spark-sql -f game.sql --executor-memory 2g --num-executors 100 > game.txt Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spar
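A sketch of the two settings mentioned above, with placeholder paths (the jar name depends on the build); spark.yarn.jar must point at a Spark assembly that actually contains org.apache.spark.deploy.yarn.ExecutorLauncher, otherwise the YARN ApplicationMaster cannot start:

    # conf/spark-defaults.conf (illustrative values)
    spark.yarn.jar   hdfs:///libs/spark/spark-assembly-1.4.0-hadoop2.2.0.jar
    spark.jars       /path/to/app-dep-a.jar,/path/to/app-dep-b.jar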

About Jobs UI in yarn-client mode

2015-06-19 Thread Sea
Hi, all: I run Spark on YARN and I want to see the Jobs UI at http://ip:4040/, but it redirects to http://${yarn.ip}/proxy/application_1428110196022_924324/, which cannot be found. Why? Can anyone help?

Re: About Jobs UI in yarn-client mode

2015-06-21 Thread Sea
Thanks, it is OK now. -- Original -- From: "Gavin Yue"; Date: Jun 21, 2015 4:40; To: "Sea"<261810...@qq.com>; Cc: "user"; Subject: Re: About Jobs UI in yarn-client mode I got the same problem when

How to use a different version of Hive

2015-06-21 Thread Sea
Hi, all: We have our own version of Hive 0.13.1; we altered the code for table operation permissions and for a Hive 0.13.1 issue, HIVE-6131. Spark 1.4.0 supports different versions of the Hive metastore; can anyone give an example? I am confused by these settings: spark.sql.hive.metastore.jars spark.sql.hive.me
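A sketch of the settings in question, with illustrative values (the jar path is a placeholder):

    # conf/spark-defaults.conf
    spark.sql.hive.metastore.version   0.13.1
    # either let Spark download the matching Hive jars from Maven ...
    spark.sql.hive.metastore.jars      maven
    # ... or point at a classpath holding the patched Hive 0.13.1 jars plus Hadoop jars
    # spark.sql.hive.metastore.jars    /path/to/hive-0.13.1/lib/*:/path/to/hadoop/lib/*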

Time is ugly in Spark Streaming....

2015-06-26 Thread Sea
Hi, all: I found a problem in Spark Streaming: when I use the time in the foreachRDD function... I find the time is very interesting. val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet) dataStream.map(x => createGroup(x._2, dimensio

Re: Time is ugly in Spark Streaming....

2015-06-26 Thread Sea
Yes, I made it. -- Original -- From: "Gerard Maas"; Date: Jun 26, 2015 5:40; To: "Sea"<261810...@qq.com>; Cc: "user"; "dev"; Subject: Re: Time is ugly in Spark Streaming Are you shari

Re: Uncaught exception in thread delete Spark local dirs

2015-06-27 Thread Sea
SPARK_CLASSPATH is nice; spark.jars needs to list all the jars one by one when submitting to YARN because spark.driver.classpath and spark.executor.classpath are not available in YARN mode. Can someone remove the warning from the code, or upload the jars in spark.driver.classpath and spark.execut

Re: Time is ugly in Spark Streaming....

2015-06-27 Thread Sea
Yes, things go well now. It is a problem of SimpleDateFormat. Thank you all. -- Original -- From: "Dumas Hwang"; Date: Jun 27, 2015 8:16; To: "Tathagata Das"; Cc: "Emrehan Tüzün"; "Sea"<2
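For context, a minimal sketch of the usual root cause and fix (an assumption, since the original code is truncated): java.text.SimpleDateFormat is not thread-safe, so sharing one instance across Spark tasks can produce garbled times; creating the formatter inside the closure avoids it.

    // Sketch only; `dstream` and the pattern are placeholders.
    import java.text.SimpleDateFormat

    dstream.foreachRDD { (rdd, time) =>
      // create the formatter inside the closure rather than sharing one instance
      val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
      val batchTime = fmt.format(new java.util.Date(time.milliseconds))
      // ... use batchTime inside this batch
    }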

Re: How to recover in case user errors in streaming

2015-06-29 Thread Sea
Hi, TD: In my code, I write like this: dstream.foreachRDD { rdd => try { } catch { } } It will still throw an exception, and the driver will be killed... I need to catch the exception in rdd.foreachPartition just like this, so I need to retry by myself... dstream.foreach
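A minimal sketch of the shape described above (assumed, since the snippet is truncated): catching inside rdd.foreachPartition runs the try/catch on the executors, where the user code actually throws, rather than only around the lazily-evaluated driver-side call.

    // `dstream` and `processRecord` are placeholders for the poster's code.
    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        try {
          records.foreach(processRecord)
        } catch {
          case e: Exception =>
            // log and retry here so a bad partition does not kill the driver
            println(s"partition failed: ${e.getMessage}")
        }
      }
    }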

[SPARK-SQL] libgplcompression.so already loaded in another classloader

2015-07-07 Thread Sea
Hi, all: I found an exception when using spark-sql: java.lang.UnsatisfiedLinkError: Native Library /data/lib/native/libgplcompression.so already loaded in another classloader ... I set spark.sql.hive.metastore.jars=. in the file spark-defaults.conf. It does not happen every time. Who knows why

About extra memory on yarn mode

2015-07-14 Thread Sea
Hi, all: I have a question about why Spark on YARN needs extra memory. I apply for 10 executors with executor memory 6g, and I find that it allocates 1g more per executor, 7g in total per executor. I tried to set spark.yarn.executor.memoryOverhead, but it did not help. 1g per executor is too muc
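For reference, a hedged explanation (not from the thread itself): in Spark 1.x on YARN, each container is sized as executor memory plus an off-heap overhead of roughly max(384m, 10% of executor memory), and YARN then rounds the request up to its allocation increment, which is how 6g can become 7g. A sketch of the knobs involved, with illustrative values:

    # spark-defaults.conf
    spark.yarn.executor.memoryOverhead   512
    # also check the rounding on the YARN side (yarn-site.xml):
    #   yarn.scheduler.minimum-allocation-mb
    #   yarn.scheduler.increment-allocation-mb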

Re: mapwithstate Hangs with Error cleaning broadcast

2016-03-15 Thread Sea
Hi, manas: Maybe you can look at this bug: https://issues.apache.org/jira/browse/SPARK-13566 -- Original -- From: "manas kar"; Date: Mar 15, 2016 10:48; To: "Ted Yu"; Cc: "user"; Subject: Re: mapwithstate Hangs with Error cleaning b

Re: Limit pyspark.daemon threads

2016-03-18 Thread Sea
It's useless... The Python worker will go above 1.5g in my production environment. -- Original -- From: "Ted Yu"; Date: Mar 17, 2016 10:50; To: "Carlile, Ken"; Cc: "user"; Subject: Re: Limit pyspark.daemon threads I took a look at

Re: spark sql on hive

2016-04-18 Thread Sea
It's a bug of Hive. Please use the Hive metastore service instead of visiting MySQL directly: set hive.metastore.uris in hive-site.xml. -- Original -- From: "Jieliang Li"; Date: Apr 19, 2016 12:55; To: "user"; Subject: spark sql on hive hi
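A minimal hive-site.xml sketch of the suggested setting; the host and port are placeholders:

    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://metastore-host:9083</value>
    </property>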

Re: G1 GC takes too much time

2016-05-29 Thread Sea
Yes, it seems that CMS is better. I have tried G1 as Databricks' blog recommended, but it's too slow. -- Original -- From: "condor join"; Date: May 30, 2016 10:17; To: "Ted Yu"; Cc: "user@spark.apache.org"; Subject: G1 GC takes
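A sketch of how one might switch executors from G1 back to CMS; the flags are illustrative and not taken from the thread:

    # spark-defaults.conf
    spark.executor.extraJavaOptions   -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70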

Bug about reading parquet files

2016-07-08 Thread Sea
I have a problem reading parquet files. SQL: select count(1) from omega.dwd_native where year='2016' and month='07' and day='05' and hour='12' and appid='6'; The Hive partitioning is (year,month,day,appid). There are only two tasks, and it will list all directories in my table, not only /user/omega/events/

Re: Bug about reading parquet files

2016-07-08 Thread Sea
Relation.(LogicalRelation.scala:37) -- Original -- From: "lian.cs.zju"; Date: Jul 8, 2016 4:47; To: "Sea"<261810...@qq.com>; Cc: "user"; Subject: Re: Bug about reading parquet files What's the Spark version? Could you please also attach result

Re: Spark hangs at "Removed broadcast_*"

2016-07-12 Thread Sea
Please provide your jstack info. -- Original -- From: "dhruve ashar"; Date: Jul 13, 2016 3:53; To: "Anton Sviridov"; Cc: "user"; Subject: Re: Spark hangs at "Removed broadcast_*" Looking at the jstack, it seems that it doesn't contain a

How to use Java8

2016-01-05 Thread Sea
Hi, all: I want to support Java 8. I use JDK 1.8.0_65 in the production environment, but it doesn't work. Should I build Spark using JDK 1.8 and set 1.8 in pom.xml? java.lang.UnsupportedClassVersionError: Unsupported major.minor version 52.
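For context (an assumption, since the thread is truncated): class file version 52 corresponds to Java 8, so this error usually means Java-8-compiled classes are being loaded by an older JVM somewhere in the cluster. A sketch of pointing YARN containers at the same JDK 8 used for the build; paths are placeholders:

    # spark-defaults.conf
    spark.executorEnv.JAVA_HOME          /usr/java/jdk1.8.0_65
    spark.yarn.appMasterEnv.JAVA_HOME    /usr/java/jdk1.8.0_65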

Re: How to use Java8

2016-01-05 Thread Sea
Thanks. -- Original -- From: "Andy Davidson"; Date: Jan 6, 2016 11:04; To: "Sea"<261810...@qq.com>; "user"; Subject: Re: How to use Java8 Hi Sea From: Sea <261810...@qq.com> Date: Tues

How to query data in tachyon with spark-sql

2016-01-20 Thread Sea
Hi, all: I want to mount some Hive tables in Tachyon, but I don't know how to query data in Tachyon with spark-sql. Who knows?

Re: Shuffle memory woes

2016-02-07 Thread Sea
Hi, Corey: "The dataset is 100gb at most, the spills can be up to 10T-100T." Are your input files in LZO format, and do you use sc.text()? If memory is not enough, Spark will spill 3-4x of the input data to disk. -- Original -- From: "Corey Nolet"; Date: Feb 2016

Re: off-heap certain operations

2016-02-11 Thread Sea
spark.memory.offHeap.enabled (default is false); it is wrong in the Spark docs. Spark 1.6 does not recommend using off-heap memory. -- Original -- From: "Ovidiu-Cristian MARCU"; Date: Feb 12, 2016 5:51; To: "user"; Subject: off-heap certain oper

Deadlock between UnifiedMemoryManager and BlockManager

2016-02-29 Thread Sea
Hi, all: My Spark version is 1.6.0. I found a deadlock in the production environment; can anyone help? I created an issue in JIRA: https://issues.apache.org/jira/browse/SPARK-13566 === "block-manager-slave-async-thread-pool-1": at org.apach

Re: Spark UI standalone "crashes" after an application finishes

2016-02-29 Thread Sea
Hi, Sumona: It's a bug in old Spark versions; it is fixed in Spark 1.6.0. After the application completes, the Spark master loads the event log into memory, and it is synchronous because of the actor. If the event log is big, the Spark master will hang for a long time, and you cannot submit any applications,

[Spark Streaming] Unable to write checkpoint when restart

2015-11-21 Thread Sea
When I restart my streaming program, this bug appears, and it will kill my program. I am using Spark 1.4.1. 15/11/22 03:20:00 WARN CheckpointWriter: Error in attempt 1 of writing checkpoint to hdfs://streaming/user/dm/order_predict/streaming_ v2/10/checkpoint/checkpoint-144813360 org.apa

Re: Asked to remove non-existent executor exception

2015-07-26 Thread Sea
This exception is so ugly!!! The screen is full of this information when the program runs for a long time, and it will not fail the job. I commented it out in the source code. I think this information is useless because the executor is already removed, and I don't know what the executor id mean

About memory leak in spark 1.4.1

2015-08-01 Thread Sea
Hi, all: I upgraded Spark to 1.4.1, and many applications failed... I find the heap memory is not full, but the CoarseGrainedExecutorBackend process takes more memory than I expect, and it increases as time goes on, finally exceeding the max limit of the server, and the worker dies. An

Re: About memory leak in spark 1.4.1

2015-08-02 Thread Sea
To: "Sea"<261810...@qq.com>; "user"; Cc: "rxin"; "joshrosen"; "davies"; Subject: Re: About memory leak in spark 1.4.1 Hi, reducing spark.storage.memoryFraction did the trick for me. Heap doesn't get filled because it

Re: About memory leak in spark 1.4.1

2015-08-02 Thread Sea
spark.storage.memoryFraction is in heap memory, but my situation is that the memory used is more than the heap memory! Does anyone else use Spark 1.4.1 in production? -- Original -- From: "Ted Yu"; Date: Aug 2, 2015 5:45; To:

Re: About memory leak in spark 1.4.1

2015-08-02 Thread Sea
e point! I have tried setting memoryFraction to 0.2, but it didn't help. I don't know whether it will still exist in the next release, 1.5; I wish not. -- Original -- From: "Barak Gitsis"; Date: Aug 2, 2015 9:55; To: "Sea"<

Re: About memory leak in spark 1.4.1

2015-08-04 Thread Sea
-- Original -- From: "Igor Berman"; Date: Aug 3, 2015 7:56; To: "Sea"<261810...@qq.com>; Cc: "Barak Gitsis"; "Ted Yu"; "user@spark.apache.org"; "rxin"; "joshrosen";

Re: About memory leak in spark 1.4.1

2015-08-05 Thread Sea
No one helped me... I helped myself: I split the cluster into two clusters, 1.4.1 and 1.3.0. -- Original -- From: "Ted Yu"; Date: Aug 4, 2015 10:28; To: "Igor Berman"; Cc: "Sea"<261810...@qq.com>;

How to specify file

2016-09-22 Thread Sea
Hi, I want to run SQL directly on files. I find that Spark supports SQL like select * from csv.`/path/to/file`, but files may not be split by ','. Maybe a file is split by '\001'; how can I specify the delimiter? Thank you!
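A sketch of the usual workaround (not the thread's answer, and it assumes a Spark 2.x SparkSession): read the file with an explicit delimiter through the DataFrame reader and then query it with SQL; note the follow-up entry below says the poster would rather avoid the temp-view step.

    val df = spark.read
      .option("delimiter", "\u0001")   // '\001' as the field separator
      .csv("/path/to/file")
    df.createOrReplaceTempView("t")
    spark.sql("select * from t").show()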

Re: How to specify file

2016-09-23 Thread Sea
Hi, Hemant, Aditya: I don't want to create a temp table and write code; I just want to run SQL directly on files: "select * from csv.`/path/to/file`". -- Original -- From: "Hemant Bhanawat"; Date: Sep 23, 2016

Re: SparkSQL not able to read an empty table location

2017-05-21 Thread Sea
Please try spark.sql.hive.verifyPartitionPath true. -- Original -- From: "Steve Loughran"; Date: Sat, May 20, 2017 09:19 PM; To: "Bajpai, Amit X. -ND"; Cc: "user@spark.apache.org"; Subject: Re: SparkSQL not able to read an empty table location On 20 Ma

InvalidAuxServiceException in dynamicAllocation

2015-03-17 Thread Sea
Hi, all: Spark 1.3.0, Hadoop 2.2.0. I put the following params in spark-defaults.conf: spark.dynamicAllocation.enabled true spark.dynamicAllocation.minExecutors 20 spark.dynamicAllocation.maxExecutors 300 spark.dynamicAllocation.executorIdleTimeout 300 spark.shuffle.service.enabled true I
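For context, a hedged note (the root cause is an assumption, since the thread is truncated): an InvalidAuxServiceException with dynamic allocation usually means the Spark external shuffle service is not registered as a YARN auxiliary service on the NodeManagers. A yarn-site.xml sketch; it also requires the spark yarn-shuffle jar on the NodeManager classpath:

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>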

Filesystem closed Exception

2015-03-20 Thread Sea
Hi, all: When I exit the console of spark-sql, the following exception is thrown... My Spark version is 1.3.0, Hadoop version is 2.2.0. Exception in thread "Thread-3" java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) at

Filesystem closed Exception

2015-03-20 Thread Sea
Hi, all: When I exit the console of spark-sql, the following exception is thrown... Exception in thread "Thread-3" java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:629) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClien

Exception in using updateStateByKey

2015-04-27 Thread Sea
Hi, all: I use the function updateStateByKey in Spark Streaming. I need to store the states for one minute, so I set "spark.cleaner.ttl" to 120; the duration is 2 seconds, but it throws an exception: Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist

Re: Exception in using updateStateByKey

2015-04-27 Thread Sea
Yes, I can make it larger, but I also want to know whether there is a formula to estimate it. -- Original -- From: "Ted Yu"; Date: Apr 27, 2015 10:20; To: "Sea"<261810...@qq.com>; Subject: Re: Exception in

Re: Exception in using updateStateByKey

2015-04-27 Thread Sea
Maybe I found the solution: do not set 'spark.cleaner.ttl'; just use the function 'remember' in StreamingContext to set the rememberDuration. -- Original -- From: "Ted Yu"; Date: Apr 27, 2015 10:20; To: "
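A minimal sketch of the fix described above; the batch and remember durations are illustrative:

    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

    // sparkConf is assumed to be an existing SparkConf
    val ssc = new StreamingContext(sparkConf, Seconds(2))   // 2-second batch duration
    ssc.remember(Minutes(1))   // keep generated RDDs for one minute instead of using spark.cleaner.ttl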

How does Spark deal with Data Skewness?

2017-06-22 Thread Sea aj
Hi everyone, I have read about some interesting ideas on how to manage skew, but I was not sure whether any of these techniques are used in the Spark 2.x versions. To name a few, "Salting the Data" and "Dynamic Repartitioning" are techniques introduced at Spark Summits. I am really curious to k
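For reference, a sketch of manual key salting on DataFrames (a hand-rolled technique, not a built-in Spark feature; df, the column names, and the 8 salt buckets are illustrative): a hot key is spread across several sub-keys before the wide aggregation, and the partial results are folded back together afterwards.

    import org.apache.spark.sql.functions._

    val partials = df
      .withColumn("salt", (rand() * 8).cast("int"))
      .groupBy(col("key"), col("salt"))
      .agg(sum("value").as("partial"))

    val result = partials
      .groupBy("key")
      .agg(sum("partial").as("total"))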

Reading csv.gz files

2017-07-05 Thread Sea aj
I need to import a set of files with the csv.gz extension into Spark. Each file contains a table of data. I was wondering if anyone knows how to read them?
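A sketch, assuming Spark 2.x: the CSV reader decompresses .gz files transparently, so a plain read is usually enough; note that gzip is not splittable, so each file becomes a single partition. The path and header option are placeholders.

    val df = spark.read
      .option("header", "true")
      .csv("/data/input/*.csv.gz")
    df.show(5)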

Re: SPARK Issue in Standalone cluster

2017-08-22 Thread Sea aj
Hi everyone, I have a huge dataframe with 1 billion rows, and each row is a nested list. That being said, I want to train some ML models on this df, but due to the huge size I get an out-of-memory error on one of my nodes when I run the fit function. Currently, my configuration is: 144 cores, 16 cores for e

Re: UI for spark machine learning.

2017-08-22 Thread Sea aj
I have a large dataframe of 1 billion rows of type LabeledPoint. I tried to train a linear regression model on the df but it failed due to lack of memory although I'm using 9 slaves, each with 100gb of ram and 16 cores of CPU. I decided to split my data into multiple chunks and train the model in

Re: UI for spark machine learning.

2017-08-22 Thread Sea aj
obably your model would do equally well with much less > samples. Have you checked bias and variance if you use much less random > samples? > > On 22. Aug 2017, at 12:58, Sea aj wrote: > > I have a large dataframe of 1 billion rows of type LabeledPoint. I tried > to train a linear

Training A ML Model on a Huge Dataframe

2017-08-23 Thread Sea aj
Hi, I am trying to feed a huge dataframe to an ML algorithm in Spark, but it crashes due to a shortage of memory. Is there a way to train the model on a subset of the data in multiple steps? Thanks
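A sketch of one assumed approach (not the thread's answer): fit on a random subset of the dataframe to stay within memory; df, the column names, and the 10% fraction are illustrative.

    import org.apache.spark.ml.regression.LinearRegression

    val subset = df.sample(withReplacement = false, fraction = 0.1, seed = 42)
    val model = new LinearRegression()
      .setFeaturesCol("features")
      .setLabelCol("label")
      .fit(subset)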

Re: Training A ML Model on a Huge Dataframe

2017-08-23 Thread Sea aj
nt-descent-sgd > > On 23 August 2017 at 14:27, Sea aj wrote: > >> Hi, >> >> I am trying to feed a huge dataframe to a ml algorithm in Spark but it >> crashes due to the shortage of memory. >> >> Is there a way to train the model on a subset of the data

Re: Weight column values not used in Binary Logistic Regression Summary

2017-12-09 Thread Sea aj
Hello everyone, I have a data frame which has two columns: ids and features. Each cell in the features column is an array of Vectors.dense type, like: [(DenseVector([0.5692]),), (DenseVector([0.5086]),)] I need to train a new model for every single row of my data frame. How can I do it? On S