[ANNOUNCE] Apache Kyuubi (Incubating) released 1.4.0-incubating

2021-12-10 Thread Fei Wang
Hello Spark Community, The Apache Kyuubi (Incubating) community is pleased to announce that Apache Kyuubi (Incubating) 1.4.0-incubating has been released! Apache Kyuubi (Incubating) is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache

spark-sql[1.4.0] not compatible hive sql when using in with date_sub or regexp_replace

2016-01-25 Thread our...@cnsuning.com
data local inpath 'data.txt' into table spark.test_or1; when executing this hive command in spark-sql (1.4.0) it encounters UnresolvedException: Invalid call to dataType on unresolved object, tree: 'ta.lbl_nm. However, this command can be executed successfully in hive-shell; sele

Re: Window Functions importing issue in Spark 1.4.0

2016-01-20 Thread satish chandra j
Hi Ted, Thanks for sharing the link to the rowNumber usage example. Could you please let me know if I can use the rowNumber window function in my current Spark 1.4.0 version? If yes, then why am I getting an error on "import org.apache.spark.sql.expressions.Window" a

Re: Window Functions importing issue in Spark 1.4.0

2016-01-07 Thread Jacek Laskowski
tter.com/jaceklaskowski On Thu, Jan 7, 2016 at 12:11 PM, Ted Yu wrote: > Please take a look at the following for sample on how rowNumber is used: > https://github.com/apache/spark/pull/9050 > > BTW 1.4.0 was an old release. > > Please consider upgrading. > > On Thu, Jan 7, 2

Re: Window Functions importing issue in Spark 1.4.0

2016-01-07 Thread Ted Yu
Please take a look at the following for a sample of how rowNumber is used: https://github.com/apache/spark/pull/9050 BTW, 1.4.0 is an old release. Please consider upgrading. On Thu, Jan 7, 2016 at 3:04 AM, satish chandra j wrote: > Hi All, > Currently using Spark 1.4.0 version, I

Window Functions importing issue in Spark 1.4.0

2016-01-07 Thread satish chandra j
Hi All, Currently using Spark 1.4.0, I have a requirement to add a column with sequential numbering to an existing DataFrame. I understand the Window function "rowNumber" serves my purpose, hence I have the below import statement to include the same: import org.apache.spark.sql.expressi
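For reference, a minimal sketch of how this is typically wired up on Spark 1.4.x, assuming a DataFrame df with an ordering column "id" (both names are placeholders, and window functions in 1.4.x generally require a HiveContext):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.rowNumber

    // Window specification: rowNumber needs an ordering (and usually a
    // partitioning) to be well defined.
    val w = Window.orderBy("id")

    // Adds a sequential number per row according to the window's ordering.
    val numbered = df.withColumn("seq", rowNumber().over(w))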

Re: Cannot start REPL shell since 1.4.0

2015-10-23 Thread emlyn
emlyn wrote > > xjlin0 wrote >> I cannot enter REPL shell in 1.4.0/1.4.1/1.5.0/1.5.1(with pre-built with >> or without Hadoop or home compiled with ant or maven). There was no >> error message in v1.4.x, system prompt nothing. On v1.5.x, once I enter >> $SPARK_HOME

Re: Cannot start REPL shell since 1.4.0

2015-10-23 Thread Emlyn Corrin
have JAVA_HOME set to a java 7 jdk? > > 2015-10-23 7:12 GMT-04:00 emlyn : > >> xjlin0 wrote >> > I cannot enter REPL shell in 1.4.0/1.4.1/1.5.0/1.5.1(with pre-built with >> > or without Hadoop or home compiled with ant or maven). There was no >> error >>

Re: Cannot start REPL shell since 1.4.0

2015-10-23 Thread Jonathan Coveney
do you have JAVA_HOME set to a java 7 jdk? 2015-10-23 7:12 GMT-04:00 emlyn : > xjlin0 wrote > > I cannot enter REPL shell in 1.4.0/1.4.1/1.5.0/1.5.1(with pre-built with > > or without Hadoop or home compiled with ant or maven). There was no > error > > message in v1.4.

Re: Cannot start REPL shell since 1.4.0

2015-10-23 Thread emlyn
xjlin0 wrote > I cannot enter the REPL shell in 1.4.0/1.4.1/1.5.0/1.5.1 (pre-built with > or without Hadoop, or home-compiled with ant or maven). There is no error > message in v1.4.x; the system prompts nothing. On v1.5.x, once I enter > $SPARK_HOME/bin/pyspark or spark-shell, I go

Provide sampling ratio while loading json in spark version > 1.4.0

2015-09-23 Thread Udit Mehta
Hi, In earlier versions of Spark (< 1.4.0), we were able to specify the sampling ratio while using *sqlContext.jsonFile* or *sqlContext.jsonRDD* so that we don't inspect each and every element while inferring the schema. I see that the use of these methods is deprecated in the newer Spark vers
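A rough sketch of the DataFrameReader equivalent, assuming the JSON data source still honours a samplingRatio option in 1.4.x (the path is a placeholder):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)  // sc is an existing SparkContext

    // Pass the sampling ratio as a data source option instead of the
    // deprecated jsonFile/jsonRDD arguments; here ~10% of the records are
    // inspected for schema inference.
    val df = sqlContext.read
      .format("json")
      .option("samplingRatio", "0.1")
      .load("hdfs:///path/to/data.json")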

Re: Spark Streaming stop gracefully doesn't return to command line after upgrade to 1.4.0 and beyond

2015-09-18 Thread Petr Novak
I removed the custom shutdown hook and it still doesn't work. I'm using KafkaDirectStream. I sometimes get java.lang.InterruptedException on Ctrl+C; sometimes it goes through fine. I have this code now: ... some stream processing ... ssc.start() ssc.awaitTermination() ssc.stop(stopSparkContext = fals

Re: Spark Streaming stop gracefully doesn't return to command line after upgrade to 1.4.0 and beyond

2015-09-10 Thread Tathagata Das
Spark 1.4.0 introduced built-in shutdown hooks that shut down the StreamingContext and SparkContext (similar to yours). If you are also introducing your own shutdown hook, I wonder what the behavior is going to be. Try doing a jstack to see where the system is stuck. Alternatively, remove your

Spark Streaming stop gracefully doesn't return to command line after upgrade to 1.4.0 and beyond

2015-09-10 Thread Petr Novak
Hello, my Spark Streaming v1.3.0 code uses sys.ShutdownHookThread { ssc.stop(stopSparkContext = true, stopGracefully = true) } so that Ctrl+C on the command line stops it. It returned to the command line after it finished the batch, but it doesn't with v1.4.0-v1.5.0. Was the behaviour or required cod
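A minimal sketch of the 1.4-style setup without a custom hook, assuming the spark.streaming.stopGracefullyOnShutdown flag (which, as far as I know, arrived alongside the built-in shutdown hook in 1.4.0):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("graceful-stop-example")
      // Let Spark's own shutdown hook stop the StreamingContext gracefully on
      // Ctrl+C / SIGTERM instead of registering sys.ShutdownHookThread ourselves.
      .set("spark.streaming.stopGracefullyOnShutdown", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    // ... stream setup ...
    ssc.start()
    ssc.awaitTermination()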

Re: TimeoutException on start-slave spark 1.4.0

2015-08-28 Thread Alexander Pivovarov
; "), stdout=subprocess.PIPE).communicate()[0] == "'200'": break On Thu, Aug 27, 2015 at 3:07 PM, Alexander Pivovarov wrote: > I see the following error time to time when try to start slaves on spark > 1.4.0 > > > [hadoop@ip-10-0-27-24

TimeoutException on start-slave spark 1.4.0

2015-08-27 Thread Alexander Pivovarov
I see the following error from time to time when trying to start slaves on Spark 1.4.0 [hadoop@ip-10-0-27-240 apps]$ pwd /mnt/var/log/apps [hadoop@ip-10-0-27-240 apps]$ cat spark-hadoop-org.apache.spark.deploy.worker.Worker-1-ip-10-0-27-240.ec2.internal.out Spark Command: /usr/java/latest/bin/java -cp

Spark 1.4.0 Docker Slave GPU Access

2015-08-11 Thread Nastooh Avessta (navesta)
Hi, Trying to access the GPU from a Spark 1.4.0 Docker slave, without much luck. In my Spark program, I make a system call to a script which performs various calculations using the GPU. I am able to run this script standalone, or via Mesos Marathon; however, calling the script through Spark fails

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Proust GZ Feng
Thanks Vanzin, spark-submit.cmd works. Thanks Proust From: Marcelo Vanzin To: Proust GZ Feng/China/IBM@IBMCN Cc: Sean Owen, user Date: 07/29/2015 10:35 AM Subject: Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0 Can you run the Windows batch files (e.g. spark

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Marcelo Vanzin
nch lib, see > below stacktrace > > LAUNCH_CLASSPATH: > C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar > java -cp > *C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar* > org.apache.spark.launcher.Main org.apache.spark.deploy.

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Proust GZ Feng
Although I'm not sure how valuable Cygwin support is, at least the release notes need to mention that Cygwin is not supported by design from 1.4.0. From the description of the changeset, it looks like removing the support was not intended by the author. Thanks Proust From: Sachin

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Proust GZ Feng
Hi Owen, Adding back the Cygwin classpath detection gets past the issue mentioned before, but there seems to be a lack of further support in the launch lib, see the below stacktrace LAUNCH_CLASSPATH: C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar java -cp C:\spark-1.4.0-bin-hadoop2.3

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Sachin Naik
nslation in one script, seems > reasonable. > > On Tue, Jul 28, 2015 at 9:13 PM, Steve Loughran > wrote: >> >> there's a spark-submit.cmd file for windows. Does that work? >> >> On 27 Jul 2015, at 21:19, Proust GZ Feng wrote: >> >> Hi, Spark

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Sean Owen
> > there's a spark-submit.cmd file for windows. Does that work? > > On 27 Jul 2015, at 21:19, Proust GZ Feng wrote: > > Hi, Spark Users > > Looks like Spark 1.4.0 cannot work with Cygwin due to the removing of Cygwin > support in bin/spark-class > >

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Steve Loughran
there's a spark-submit.cmd file for windows. Does that work? On 27 Jul 2015, at 21:19, Proust GZ Feng mailto:pf...@cn.ibm.com>> wrote: Hi, Spark Users Looks like Spark 1.4.0 cannot work with Cygwin due to the removing of Cygwin support in bin/spark-class The changeset is https:/

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Sean Owen
Owen, the problem under Cygwin is that while running spark-submit under 1.4.0, > it simply reports > > Error: Could not find or load main class org.apache.spark.launcher.Main > > This is because under Cygwin spark-class makes the LAUNCH_CLASSPATH as > "/c/spark-1.4.0-bin-hadoop2.3/lib/

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Proust GZ Feng
Thanks Owen, the problem under Cygwin is that while running spark-submit under 1.4.0, it simply reports Error: Could not find or load main class org.apache.spark.launcher.Main This is because under Cygwin spark-class makes the LAUNCH_CLASSPATH as "/c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-27 Thread Sean Owen
It wasn't removed, but rewritten. Cygwin is just a distribution of POSIX-related utilities, so you should be able to use the normal .sh scripts. In any event, you didn't say what the problem is. On Tue, Jul 28, 2015 at 5:19 AM, Proust GZ Feng wrote: > Hi, Spark Users > > Loo

NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-27 Thread Proust GZ Feng
Hi, Spark Users Looks like Spark 1.4.0 cannot work with Cygwin due to the removing of Cygwin support in bin/spark-class The changeset is https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3 The changeset said "Add a librar

Re: 1.4.0 classpath issue with spark-submit

2015-07-25 Thread Michal Haris
park-shell for exploration and I have a runner class that >> executes some tasks with spark-submit. I used to run against >> 1.4.0-SNAPSHOT. Since then 1.4.0 and 1.4.1 were released so I tried to >> switch to the official release. Now, when I run the program as a shell, >>

Re: Is IndexedRDD available in Spark 1.4.0?

2015-07-23 Thread Ruslan Dautkhanov
>> data, that is, key-value stores, databases, etc. >> >> On Tue, Jul 14, 2015 at 5:44 PM, Ted Yu wrote: >>> Please take a look at SPARK-2365 which is in progress. >>> >>> On Tue, Jul 14, 2015 at 5:18 PM, swetha >>> wrote: >>

Re: 1.4.0 classpath issue with spark-submit

2015-07-23 Thread Akhil Das
have a runner class that > executes some tasks with spark-submit. I used to run against > 1.4.0-SNAPSHOT. Since then 1.4.0 and 1.4.1 were released so I tried to > switch to the official release. Now, when I run the program as a shell, > everything works but when I try to run it wi

1.4.0 classpath issue with spark-submit

2015-07-21 Thread Michal Haris
I have a Spark program that uses DataFrames to query Hive. I run it both in the spark-shell for exploration and via a runner class that executes some tasks with spark-submit. I used to run against 1.4.0-SNAPSHOT. Since then 1.4.0 and 1.4.1 were released, so I tried to switch to the official

Re: How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames

2015-07-18 Thread Naveen Madhire
is an array of structs. So, can you change your >> pattern matching to the following? >> >> case Row(rows: Seq[_]) => rows.asInstanceOf[Seq[Row]].map(elem => ...) >> >> On Wed, Jun 24, 2015 at 5:27 AM, Gustavo Arjones < >> garjo...@socialmetrix.com> wrote: >
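Filling in the truncated suggestion, a self-contained sketch of that pattern match (the column and field names are made up for illustration; the real schema comes from the Twitter entities JSON):

    import org.apache.spark.sql.Row

    // "entities.hashtags" is assumed to be an array of structs with a "text" field.
    val hashtags = df.select("entities.hashtags").map {
      // The outer Row wraps the array column; its elements are themselves Rows.
      case Row(rows: Seq[_]) =>
        rows.asInstanceOf[Seq[Row]].map(elem => elem.getAs[String]("text"))
    }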

Spark 1.4.0 org.apache.spark.sql.AnalysisException: cannot resolve 'probability' given input columns

2015-07-16 Thread lokeshkumar
Hi forum, I am currently using Spark 1.4.0, and started using the ML pipeline framework. I ran the example program "ml.JavaSimpleTextClassificationPipeline" which uses LogisticRegression. But I wanted to do multiclass classification, so I used DecisionTreeClassifier pres

Re: Spark 1.4.0 compute-classpath.sh

2015-07-15 Thread Lokesh Kumar Padhnavis
; On Wed, Jul 15, 2015 at 9:43 AM, lokeshkumar wrote: > >> Hi forum >> >> I have downloaded the latest spark version 1.4.0 and started using it. >> But I couldn't find the compute-classpath.sh file in bin/ which I am using >> in previous versions to provide third party l

Re: NotSerializableException in spark 1.4.0

2015-07-15 Thread Chen Song
ou point me to the patch to fix the serialization stack? Maybe I >>> can pull it in and rerun my job. >>> >>> Chen >>> >>> On Wed, Jul 15, 2015 at 4:40 PM, Tathagata Das >>> wrote: >>> >>>> Your streaming job may have been se

Re: NotSerializableException in spark 1.4.0

2015-07-15 Thread Tathagata Das
t; On Wed, Jul 15, 2015 at 4:40 PM, Tathagata Das >> wrote: >> >>> Your streaming job may have been seemingly running ok, but the DStream >>> checkpointing must have been failing in the background. It would have been >>> visible in the log4j logs. In 1.4.0, we

Re: NotSerializableException in spark 1.4.0

2015-07-15 Thread Ted Yu
ling in the background. It would have been >> visible in the log4j logs. In 1.4.0, we enabled fast-failure for that so >> that checkpointing failures don't get hidden in the background. >> >> The fact that the serialization stack is not being shown correctly, is a >>

Re: NotSerializableException in spark 1.4.0

2015-07-15 Thread Chen Song
in the background. It would have been > visible in the log4j logs. In 1.4.0, we enabled fast-failure for that so > that checkpointing failures don't get hidden in the background. > > The fact that the serialization stack is not being shown correctly, is a > known bug in Spark 1.4.0, but is

Re: NotSerializableException in spark 1.4.0

2015-07-15 Thread Tathagata Das
Your streaming job may have been seemingly running OK, but the DStream checkpointing must have been failing in the background. It would have been visible in the log4j logs. In 1.4.0, we enabled fast-failure for that so that checkpointing failures don't get hidden in the background. The fact that
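Not from this thread, but a minimal sketch of the checkpoint-enabled setup this error message points at, assuming a placeholder checkpoint directory; the key points are that all DStream setup lives inside the factory function and that the closures used there must not capture non-serializable objects from an enclosing class:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///tmp/checkpoints"  // placeholder path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("checkpointed-stream")
      val ssc = new StreamingContext(conf, Seconds(30))
      ssc.checkpoint(checkpointDir)
      // Define sources, transformations and output operations here; everything
      // referenced by these closures has to be serializable for checkpointing.
      ssc
    }

    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()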

Re: NotSerializableException in spark 1.4.0

2015-07-15 Thread Ted Yu
anything changed in 1.4.0 w.r.t. > DStream checkpointing? > > Detailed error from driver: > > 15/07/15 18:00:39 ERROR yarn.ApplicationMaster: User class threw > exception: *java.io.NotSerializableException: DStream checkpointing has > been enabled but the DStreams with their

NotSerializableException in spark 1.4.0

2015-07-15 Thread Chen Song
The streaming job has been running OK in 1.2 and 1.3. After I upgraded to 1.4, I started seeing the error below. It appears that it fails in the validate method in StreamingContext. Has anything changed in 1.4.0 w.r.t. DStream checkpointing? Detailed error from driver: 15/07/15 18:00:39 ERROR

Re: Spark 1.4.0 compute-classpath.sh

2015-07-15 Thread Marcelo Vanzin
That has never been the correct way to set your app's classpath. Instead, look at http://spark.apache.org/docs/latest/configuration.html and search for "extraClassPath". On Wed, Jul 15, 2015 at 9:43 AM, lokeshkumar wrote: > Hi forum > > I have downloaded the latest
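For the executor side this can also be set programmatically before the context is created; a rough sketch (the jar path is a placeholder and must already exist on each node), while the driver-side spark.driver.extraClassPath still has to go through spark-defaults.conf or --conf because the driver JVM is already running:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("extra-classpath-example")
      // Prepended to the executor JVM classpath; this replaces the old
      // compute-classpath.sh mechanism for third-party jars.
      .set("spark.executor.extraClassPath", "/opt/libs/thirdparty.jar")

    val sc = new SparkContext(conf)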

Spark 1.4.0 compute-classpath.sh

2015-07-15 Thread lokeshkumar
Hi forum, I have downloaded the latest Spark version 1.4.0 and started using it. But I couldn't find the compute-classpath.sh file in bin/, which I was using in previous versions to provide third-party libraries to my application. Can anyone please let me know where I can provide CLASSPATH wi

Tasks unevenly distributed in Spark 1.4.0

2015-07-15 Thread gisleyt
Hello all, I upgraded from spark 1.3.1 to 1.4.0, but I'm experiencing a massive drop in performance for the application I'm running. I've (somewhat) reproduced this behaviour in the attached file. My current spark setup may not be optimal exactly for this reproduction, but I

Re: Is IndexedRDD available in Spark 1.4.0?

2015-07-14 Thread Ted Yu
swetha >> wrote: >> >>> Hi, >>> >>> Is IndexedRDD available in Spark 1.4.0? We would like to use this in >>> Spark >>> Streaming to do lookups/updates/deletes in RDDs using keys by storing >>> them >>> as key/value pairs

Re: Is IndexedRDD available in Spark 1.4.0?

2015-07-14 Thread Tathagata Das
: > Please take a look at SPARK-2365 which is in progress. > > On Tue, Jul 14, 2015 at 5:18 PM, swetha wrote: > >> Hi, >> >> Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark >> Streaming to do lookups/updates/deletes in RDDs using keys b

Re: Is IndexedRDD available in Spark 1.4.0?

2015-07-14 Thread Ted Yu
Please take a look at SPARK-2365 which is in progress. On Tue, Jul 14, 2015 at 5:18 PM, swetha wrote: > Hi, > > Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark > Streaming to do lookups/updates/deletes in RDDs using keys by storing them > as

Is IndexedRDD available in Spark 1.4.0?

2015-07-14 Thread swetha
Hi, Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark Streaming to do lookups/updates/deletes in RDDs using keys by storing them as key/value pairs. Thanks, Swetha -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-IndexedRDD

Re: Problems after upgrading to spark 1.4.0

2015-07-14 Thread Luis Ángel Vicente Sánchez
rk 1.4; is there any config option that has been added and is mandatory? 2015-07-13 22:12 GMT+01:00 Tathagata Das : > Spark 1.4.0 added shutdown hooks in the driver to cleanly shut down the > SparkContext in the driver, which would shut down the executors. I am not > sure whether thi

Upgrade Spark-1.3.0 to Spark-1.4.0 in CDH5.4

2015-07-13 Thread ashishdutt
Hello all, The configuration of my cluster is as follows: # 4-node cluster running on CentOS 6.4 # spark-1.3.0 installed on all nodes. I would like to use SparkR, which ships with spark-1.4.0. I checked Cloudera and found that the latest release, CDH5.4, still does not have spark-1.4.0. Forums like

Re: Problems after upgrading to spark 1.4.0

2015-07-13 Thread Tathagata Das
Spark 1.4.0 added shutdown hooks in the driver to cleanly shut down the SparkContext, which would shut down the executors. I am not sure whether this is related or not, but somehow the executor's shutdown hook is being called. Can you check the driver logs to see if the driver'

Re: Problems after upgrading to spark 1.4.0

2015-07-13 Thread Luis Ángel Vicente Sánchez
my spark jobs from spark 1.2.1 to spark 1.4.0 > and after deploying it to mesos, it's not working anymore. > > The upgrade process was quite easy: > > - Create a new docker container for spark 1.4.0. > - Upgrade spark job to use spark 1.4.0 as a dependency and create a new &

Problems after upgrading to spark 1.4.0

2015-07-13 Thread Luis Ángel Vicente Sánchez
I have just upgraded one of my Spark jobs from Spark 1.2.1 to Spark 1.4.0, and after deploying it to Mesos, it's not working anymore. The upgrade process was quite easy: - Create a new Docker container for Spark 1.4.0. - Upgrade the Spark job to use Spark 1.4.0 as a dependency and create a new f

Re: Spark 1.4.0 - Using SparkR on EC2 Instance

2015-07-09 Thread RedOakMark
> <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-4-0-Using-SparkR-on-EC2-Instance-tp23506p23732.html> > To unsubscribe from Spark 1.4.0 - Using SparkR on EC2 Instance, click here > <http://apache-spark-user-list.1001560.n3.nabbl

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Yin Huai
nov.com | 617.299.6746 > > > From: Yin Huai > Date: Monday, July 6, 2015 at 11:41 AM > To: Denny Lee > Cc: Simeon Simeonov , Andy Huang , > user > > Subject: Re: 1.4.0 regression: out-of-memory errors on small data > > Hi Sim, > > I

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Simeon Simeonov
ang <andy.hu...@servian.com.au>, user <user@spark.apache.org> Subject: Re: 1.4.0 regression: out-of-memory errors on small data Hi Sim, I think the right way to set the PermGen size is through driver extra JVM options, i.e. --conf "spark.driver.extraJavaOptions=-XX:MaxPe

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Yin Huai
, Jul 6, 2015 at 1:36 PM Simeon Simeonov wrote: > >> The file is at >> https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-0.gz?dl=1 >> >> The command was included in the gist >> >> SPARK_REPL_OPTS="-XX:MaxPermSize=256m" >&g

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Denny Lee
.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-0.gz?dl=1 > > The command was included in the gist > > SPARK_REPL_OPTS="-XX:MaxPermSize=256m" > spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages > com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Simeon Simeonov
The file is at https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-0.gz?dl=1 The command was included in the gist SPARK_REPL_OPTS="-XX:MaxPermSize=256m" spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 --driver

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Yin Huai
ain" > 15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed > broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1 > GB) > > That did not change up until 4Gb of PermGen space and 8Gb for driver & > executor each. > > I stopped at this point bec

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Simeon Simeonov
ece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1 GB) That did not change up until 4Gb of PermGen space and 8Gb for driver & executor each. I stopped at this point because the exercise started looking silly. It is clear that 1.4.0 is using memory in a substantially different manner.

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Yin Huai
rved it happening with those who had JDK 6. The problem went away >> after installing jdk 8. This was only for the tutorial materials which was >> about loading a parquet file. >> >> Regards >> Andy >> >> On Sat, Jul 4, 2015 at 2:54 AM, sim wrote: &

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Denny Lee
Jul 4, 2015 at 2:54 AM, sim wrote: > >> @bipin, in my case the error happens immediately in a fresh shell in >> 1.4.0. >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-mem

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Andy Huang
, sim wrote: > @bipin, in my case the error happens immediately in a fresh shell in 1.4.0. > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html > Sent from th

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-03 Thread sim
@bipin, in my case the error happens immediately in a fresh shell in 1.4.0. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html Sent from the Apache Spark User List mailing list archive at

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-03 Thread bipin
I have a hunch I want to share: I feel that data is not being deallocated in memory (at least like in 1.3). Once it goes in-memory it just stays there. Spark SQL works fine, the same query when run on a new shell won't throw that error, but when run on a shell which has been used for other queries

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-03 Thread bipin
I will second this. I very rarely used to get out-of-memory errors in 1.3. Now I get these errors all the time. I feel that I could work on 1.3 spark-shell for long periods of time without spark throwing that error, whereas in 1.4 the shell needs to be restarted or gets killed frequently. -- Vie

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread Simeon Simeonov
From: Yin Huai <yh...@databricks.com> Date: Thursday, July 2, 2015 at 4:34 PM To: Simeon Simeonov <s...@swoop.com> Cc: user <user@spark.apache.org> Subject: Re: 1.4.0 regression: out-of-memory errors on small data Hi Sim, Seems you already set the PermGe

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread Yin Huai
HiveContext's methods). Can you just use the sqlContext created by the shell and try again? Thanks, Yin On Thu, Jul 2, 2015 at 12:50 PM, Yin Huai wrote: > Hi Sim, > > Spark 1.4.0's memory consumption on PermGen is higher than Spark 1.3 > (explained in https://issues.apache.org/

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread Yin Huai
Hi Sim, Spark 1.4.0's memory consumption on PermGen is higher than Spark 1.3's (explained in https://issues.apache.org/jira/browse/SPARK-8776). Can you add --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" to the command you used to launch the Spark shell? This will increase t

1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread sim
A very simple Spark SQL COUNT operation succeeds in spark-shell for 1.3.1 and fails with a series of out-of-memory errors in 1.4.0. This gist <https://gist.github.com/ssimeonov/a49b75dc086c3ac6f3c4> includes the code and the full output from the 1.3.1 and 1.4.0 runs, including the comman

Re: Issue with parquet write after join (Spark 1.4.0)

2015-07-01 Thread Michael Armbrust
onnectException: Connection refused: slave2/...:54845 >> >> Could you look in the executor logs (stderr on slave2) and see what made >> it shut down? Since you are doing a join there's a high possibility of OOM >> etc. >> >> >> Thanks >> Best Regard

Re: Issue with parquet write after join (Spark 1.4.0)

2015-07-01 Thread Raghavendra Pandey
rr on slave2) and see what made >> it shut down? Since you are doing a join there's a high possibility of OOM >> etc. >> >> >> Thanks >> Best Regards >> >> On Wed, Jul 1, 2015 at 10:20 AM, Pooja Jain >> wrote: >> >>> Hi, >&g

Re: Issue with parquet write after join (Spark 1.4.0)

2015-07-01 Thread Pooja Jain
d: slave2/...:54845 > > Could you look in the executor logs (stderr on slave2) and see what made > it shut down? Since you are doing a join there's a high possibility of OOM > etc. > > > Thanks > Best Regards > > On Wed, Jul 1, 2015 at 10:20 AM, Pooja Jain wrote: >

Re: Issue with parquet write after join (Spark 1.4.0)

2015-07-01 Thread Akhil Das
20 AM, Pooja Jain wrote: > Hi, > > We are using Spark 1.4.0 on hadoop using yarn-cluster mode via > spark-submit. We are facing parquet write issue after doing dataframe joins > > We have a full data set and then an incremental data. We are reading them > as dataframes, joining

Issue with parquet write after join (Spark 1.4.0)

2015-06-30 Thread Pooja Jain
Hi, We are using Spark 1.4.0 on Hadoop in yarn-cluster mode via spark-submit. We are facing a Parquet write issue after doing DataFrame joins. We have a full data set and then incremental data. We are reading them as DataFrames, joining them, and then writing the data to the HDFS system in
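For context, the flow described reduces to roughly this sketch (paths and the join key are placeholders; hiveContext is an existing HiveContext):

    // Read the full snapshot and the increment as DataFrames.
    val full        = hiveContext.read.parquet("hdfs:///data/full")
    val incremental = hiveContext.read.parquet("hdfs:///data/incremental")

    // Join them on a key and write the merged result back out as Parquet.
    val merged = full.join(incremental, full("id") === incremental("id"), "outer")
    merged.write.parquet("hdfs:///data/merged")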

Re: Spark 1.4.0: read.df() causes excessive IO

2015-06-30 Thread Exie
Just to add to this, here's some more info: val myDF = hiveContext.read.parquet("s3n://myBucket/myPath/") Produces these... 2015-07-01 03:25:50,450 INFO [pool-14-thread-4] (org.apache.hadoop.fs.s3native.NativeS3FileSystem) - Opening 's3n://myBucket/myPath/part-r-00339.parquet' for reading That

RE: 1.4.0

2015-06-30 Thread yana
I wonder if this could be a side effect of SPARK-3928. Does ending the path with *.parquet work? Original message From: Exie Date: 06/30/2015 9:20 PM (GMT-05:00) To: user@spark.apache.org Subject: 1.4.0 So I was delighted with Spark 1.3.1 using Parquet 1.6.0 which would

Spark 1.4.0: Parquet partitions / folder hierarchy changed from 1.3.1

2015-06-30 Thread Exie
rquet("s3n://myBucket/myPath/2014/07/01") or val myDataFrame = hiveContext.read.parquet("s3n://myBucket/myPath/2014/07") However since upgrading to Spark 1.4.0 it doesnt seem to be working the same way. The first line works, in the "01" folder is all the normal

1.4.0

2015-06-30 Thread Exie
rquet("s3n://myBucket/myPath/2014/07/01") or val myDataFrame = hiveContext.read.parquet("s3n://myBucket/myPath/2014/07") However since upgrading to Spark 1.4.0 it doesnt seem to be working the same way. The first line works, in the "01" folder is all the normal files: 20

Re: Spark 1.4.0 - Using SparkR on EC2 Instance

2015-06-30 Thread Shivaram Venkataraman
k >>>>>> though and it might need some more manual tweaks. >>>>>> >>>>>> Thanks >>>>>> Shivaram >>>>>> >>>>>> On Fri, Jun 26, 2015 at 9:59 AM, Mark Stephenson < >>>>>> m...@redo

Re: Spark 1.4.0 - Using SparkR on EC2 Instance

2015-06-30 Thread Shivaram Venkataraman
orkarounds - fire up a separate EC2 instance with RStudio Server >>>>> that initializes the spark context against a separate Spark cluster. >>>>> >>>>> On Jun 26, 2015, at 11:46 AM, Shivaram Venkataraman < >>>>> shiva...@eecs.berkeley

Spark 1.4.0: read.df() causes excessive IO

2015-06-29 Thread Exie
Hi Folks, I just stepped up from 1.3.1 to 1.4.0; the most notable difference for me so far is the DataFrame reader/writer. Previously: val myData = hiveContext.load("s3n://someBucket/somePath/","parquet") Now: val myData = hiveContext.read.parquet("s3n://someBuc

Re: Spark 1.4.0 - Using SparkR on EC2 Instance

2015-06-27 Thread Shivaram Venkataraman
Thanks Mark for the update. For those interested Vincent Warmerdam also has some details on making the /root/spark installation work at https://issues.apache.org/jira/browse/SPARK-8596?focusedCommentId=14604328&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14604328

Re: Spark 1.4.0 - Using SparkR on EC2 Instance

2015-06-27 Thread RedOakMark
For anyone monitoring the thread, I was able to successfully install and run a small Spark cluster and model using this method: First, make sure that the username being used to login to RStudio Server is the one that was used to install Spark on the EC2 instance. Thanks to Shivaram for his help h

Re: Spark 1.4.0 - Using SparkR on EC2 Instance

2015-06-26 Thread Shivaram Venkataraman
a...@eecs.berkeley.edu> wrote: >>> >>> We don't have a documented way to use RStudio on EC2 right now. We have >>> a ticket open at https://issues.apache.org/jira/browse/SPARK-8596 to >>> discuss work-arounds and potential solutions for this. >>&g

Re: Spark 1.4.0 - Using SparkR on EC2 Instance

2015-06-26 Thread mark
nstallation and usage of the newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using RStudio to run on top of it. Using these instructions ( http://spark.apache.org/docs/latest/ec2-scripts.html ) we can fire up an

Re: Spark 1.4.0 - Using SparkR on EC2 Instance

2015-06-26 Thread Shivaram Venkataraman
Fri, Jun 26, 2015 at 6:27 AM, RedOakMark > wrote: > >> Good morning, >> >> I am having a bit of trouble finalizing the installation and usage of the >> newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using >> RStudio to run on top of it. >> >

Re: Spark 1.4.0 - Using SparkR on EC2 Instance

2015-06-26 Thread Mark Stephenson
t; > I am having a bit of trouble finalizing the installation and usage of the > newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using > RStudio to run on top of it. > > Using these instructions ( > http://spark.apache.org/docs/latest/ec2-scripts.html > <

Re: Spark 1.4.0 - Using SparkR on EC2 Instance

2015-06-26 Thread Shivaram Venkataraman
t; I am having a bit of trouble finalizing the installation and usage of the > newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using > RStudio to run on top of it. > > Using these instructions ( > http://spark.apache.org/docs/latest/ec2-scripts.html > <http

Spark 1.4.0 - Using SparkR on EC2 Instance

2015-06-26 Thread RedOakMark
Good morning, I am having a bit of trouble finalizing the installation and usage of the newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using RStudio to run on top of it. Using these instructions ( http://spark.apache.org/docs/latest/ec2-scripts.html <h

Re: Spark 1.4.0, Secure YARN Cluster, Application Master throws 500 connection refused (Resolved)

2015-06-25 Thread Nachiketa
anyway). > > 3. Found this issue : https://issues.apache.org/jira/browse/SPARK-5837 > and multiple references to other YARN issues to the same. Continuing to > understand and explore the possibilities documented there. > > Regards, > Nachiketa > > On Fri, Jun 26, 2015 a

Re: Spark 1.4.0, Secure YARN Cluster, Application Master throws 500 connection refused

2015-06-25 Thread Nachiketa
issue : https://issues.apache.org/jira/browse/SPARK-5837 and multiple references to other YARN issues to the same. Continuing to understand and explore the possibilities documented there. Regards, Nachiketa On Fri, Jun 26, 2015 at 12:52 AM, Nachiketa wrote: > Spark 1.4.0 - Custom built f

Spark 1.4.0, Secure YARN Cluster, Application Master throws 500 connection refused

2015-06-25 Thread Nachiketa
Spark 1.4.0 - Custom built from source against Hortonworks HDP 2.2 (Hadoop 2.6.0+) HDP 2.2 cluster (secure, Kerberos) spark-shell (--master yarn-client) launches fine and the prompt shows up. Clicking on the Application Master URL on the YARN RM UI throws a 500 connect error. The same build works

Re: How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames

2015-06-24 Thread Michael Armbrust
e Row(rows: Seq[_]) => rows.asInstanceOf[Seq[Row]].map(elem => ...) > > On Wed, Jun 24, 2015 at 5:27 AM, Gustavo Arjones < > garjo...@socialmetrix.com> wrote: > >> Hi All, >> >> I am using the new *Apache Spark version 1.4.0 Data-frames API* to >> ext

Re: How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames

2015-06-24 Thread Yin Huai
es wrote: > Hi All, > > I am using the new *Apache Spark version 1.4.0 Data-frames API* to > extract information from Twitter's Status JSON, mostly focused on the Entities > Object <https://dev.twitter.com/overview/api/entities> - the relevant >

How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames

2015-06-24 Thread Gustavo Arjones
Hi All, I am using the new Apache Spark version 1.4.0 DataFrames API to extract information from Twitter's Status JSON, mostly focused on the Entities Object <https://dev.twitter.com/overview/api/entities> - the relevant part to this question is shown below: { ... ...

Re: [Spark Streaming 1.4.0] SPARK-5063, Checkpointing and queuestream

2015-06-23 Thread Tathagata Das
Queue stream does not support driver checkpoint recovery, since the RDDs in the queue are arbitrarily generated by the user and it's hard for Spark Streaming to keep track of the data in the RDDs (which is necessary for recovering from a checkpoint). Anyway, queue stream is meant for testing and development,

Re: [Spark Streaming 1.4.0] SPARK-5063, Checkpointing and queuestream

2015-06-22 Thread Shaanan Cohney
It's a generated set of shell commands to run (written in C, a highly optimized numerical computation), which is created from a set of user-provided parameters. The snippet above is: task_outfiles_to_cmds = OrderedDict(run_sieving.leftover_tasks) task_outfiles_to_cmds.update(generate_sieving_t
