Hello Spark Community,
The Apache Kyuubi (Incubating) community is pleased to announce that
Apache Kyuubi (Incubating) 1.4.0-incubating has been released!
Apache Kyuubi (Incubating) is a distributed multi-tenant JDBC server for
large-scale data processing and analytics, built on top of Apache
data local inpath 'data.txt' into table spark.test_or1;
When executing this Hive command in spark-sql (1.4.0) I encounter
UnresolvedException: Invalid call to dataType on unresolved object, tree:
'ta.lbl_nm. However, this command can be executed successfully in hive-shell;
sele
Hi Ted,
Thanks for sharing the link with the rowNumber usage example.
Could you please let me know if I can use the rowNumber window function in my
current Spark 1.4.0 version?
If yes, then why am I getting an error on "import org.apache.spark.sql.
expressions.Window" a
https://twitter.com/jaceklaskowski
On Thu, Jan 7, 2016 at 12:11 PM, Ted Yu wrote:
> Please take a look at the following for sample on how rowNumber is used:
> https://github.com/apache/spark/pull/9050
>
> BTW 1.4.0 was an old release.
>
> Please consider upgrading.
>
> On Thu, Jan 7, 2
Please take a look at the following for sample on how rowNumber is used:
https://github.com/apache/spark/pull/9050
BTW 1.4.0 was an old release.
Please consider upgrading.
On Thu, Jan 7, 2016 at 3:04 AM, satish chandra j
wrote:
> HI All,
> Currently using Spark 1.4.0 version, I
Hi All,
I am currently using Spark version 1.4.0, and I have a requirement to add a
column with sequential numbering to an existing DataFrame.
I understand the Window function "rowNumber" serves my purpose,
hence I have the below import statement to include the same:
import org.apache.spark.sql.expressi
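For reference, a minimal sketch of what that usage might look like in 1.4.x, assuming a HiveContext named hiveContext (window functions in 1.4.x need Hive support) and a toy DataFrame with an "id" column:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rowNumber
// toy DataFrame with an "id" column to order by
val df = hiveContext.range(0, 10)
// global ordering, no partitioning; rowNumber() assigns 1, 2, 3, ...
val w = Window.orderBy("id")
val numbered = df.withColumn("seq_no", rowNumber().over(w))
numbered.show()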
emlyn wrote
>
> xjlin0 wrote
>> I cannot enter REPL shell in 1.4.0/1.4.1/1.5.0/1.5.1(with pre-built with
>> or without Hadoop or home compiled with ant or maven). There was no
>> error message in v1.4.x, system prompt nothing. On v1.5.x, once I enter
>> $SPARK_HOME
have JAVA_HOME set to a java 7 jdk?
>
> 2015-10-23 7:12 GMT-04:00 emlyn :
>
>> xjlin0 wrote
>> > I cannot enter REPL shell in 1.4.0/1.4.1/1.5.0/1.5.1(with pre-built with
>> > or without Hadoop or home compiled with ant or maven). There was no
>> error
>>
do you have JAVA_HOME set to a java 7 jdk?
2015-10-23 7:12 GMT-04:00 emlyn :
> xjlin0 wrote
> > I cannot enter REPL shell in 1.4.0/1.4.1/1.5.0/1.5.1(with pre-built with
> > or without Hadoop or home compiled with ant or maven). There was no
> error
> > message in v1.4.
xjlin0 wrote
> I cannot enter the REPL shell in 1.4.0/1.4.1/1.5.0/1.5.1 (whether pre-built
> with or without Hadoop, or compiled at home with ant or maven). There is no
> error message in v1.4.x; the system prompts nothing. On v1.5.x, once I enter
> $SPARK_HOME/bin/pyspark or spark-shell, I go
Hi,
In earlier versions of Spark (< 1.4.0), we were able to specify the sampling
ratio when using *sqlContext.jsonFile* or *sqlContext.jsonRDD* so that we
don't inspect each and every element while inferring the schema.
I see that the use of these methods is deprecated in the newer Spark
vers
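In 1.4.0 the same effect can still be obtained through the DataFrameReader; a minimal sketch, assuming an existing sqlContext and that the JSON source honours a samplingRatio option (the path and the 0.1 ratio are illustrative):
val df = sqlContext.read
  .format("json")
  .option("samplingRatio", "0.1")
  .load("hdfs:///data/events.json")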
I removed the custom shutdown hook and it still doesn't work. I'm using
KafkaDirectStream.
I sometimes get java.lang.InterruptedException on Ctrl+C; sometimes it goes
through fine.
I have this code now:
... some stream processing ...
ssc.start()
ssc.awaitTermination()
ssc.stop(stopSparkContext = fals
Spark 1.4.0 introduced built-in shutdown hooks that would shutdown
StreamingContext and SparkContext (similar to yours). If you are also
introducing your shutdown hook, I wonder what the behavior is going to be.
Try doing a jstack to see where the system is stuck. Alternatively, remove
your
Hello,
my Spark streaming v1.3.0 code uses
sys.ShutdownHookThread {
ssc.stop(stopSparkContext = true, stopGracefully = true)
}
to allow Ctrl+C on the command line to stop it. It returned to the command line
after it finished the batch, but it doesn't with v1.4.0-v1.5.0. Was the
behaviour or required cod
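For comparison, a minimal sketch of relying on the built-in hook in 1.4.0+ instead of a custom one, assuming the batch logic itself is unchanged (the app name is hypothetical):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
// ask Spark's own shutdown hook (added in 1.4.0) to stop streaming gracefully
val conf = new SparkConf()
  .setAppName("streaming-app")
  .set("spark.streaming.stopGracefullyOnShutdown", "true")
val ssc = new StreamingContext(conf, Seconds(10))
// ... some stream processing ...
ssc.start()
ssc.awaitTermination()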
; "),
stdout=subprocess.PIPE).communicate()[0] == "'200'":
break
On Thu, Aug 27, 2015 at 3:07 PM, Alexander Pivovarov
wrote:
> I see the following error time to time when try to start slaves on spark
> 1.4.0
>
>
> [hadoop@ip-10-0-27-24
I see the following error from time to time when trying to start slaves on
Spark 1.4.0:
[hadoop@ip-10-0-27-240 apps]$ pwd
/mnt/var/log/apps
[hadoop@ip-10-0-27-240 apps]$ cat
spark-hadoop-org.apache.spark.deploy.worker.Worker-1-ip-10-0-27-240.ec2.internal.out
Spark Command: /usr/java/latest/bin/java -cp
Hi
I am trying to access a GPU from a Spark 1.4.0 Docker slave, without much luck. In my
Spark program, I make a system call to a script, which performs various
calculations using GPU. I am able to run this script as standalone, or via
Mesos Marathon; however, calling the script through Spark fails
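A minimal sketch of the kind of external call being made from inside a task, assuming a SparkContext sc; the script path and arguments are hypothetical:
import scala.sys.process._
// run the external GPU script from each task and collect the exit codes
val exitCodes = sc.parallelize(1 to 4).map { i =>
  Seq("/opt/scripts/gpu_calc.sh", i.toString).!
}.collect()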
Thanks Vanzin, spark-submit.cmd works
Thanks
Proust
From: Marcelo Vanzin
To: Proust GZ Feng/China/IBM@IBMCN
Cc: Sean Owen , user
Date: 07/29/2015 10:35 AM
Subject:Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0
Can you run the windows batch files (e.g. spark
nch lib, see
> below stacktrace
>
> LAUNCH_CLASSPATH:
> C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar
> java -cp
> *C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar*
> org.apache.spark.launcher.Main org.apache.spark.deploy.
Although I'm not sure how valuable Cygwin support is, at the least the
release notes should mention that Cygwin is not supported by design from
1.4.0.
From the description of the changeset, it looks like removing the support was
not intended by the author.
Thanks
Proust
From: Sachin
Hi, Owen
Adding back the Cygwin classpath detection gets past the issue mentioned
before, but there seems to be a lack of further support in the launch lib; see
the stacktrace below:
LAUNCH_CLASSPATH:
C:\spark-1.4.0-bin-hadoop2.3\lib\spark-assembly-1.4.0-hadoop2.3.0.jar
java -cp
C:\spark-1.4.0-bin-hadoop2.3
nslation in one script, seems
> reasonable.
>
> On Tue, Jul 28, 2015 at 9:13 PM, Steve Loughran
> wrote:
>>
>> there's a spark-submit.cmd file for windows. Does that work?
>>
>> On 27 Jul 2015, at 21:19, Proust GZ Feng wrote:
>>
>> Hi, Spark
>
> there's a spark-submit.cmd file for windows. Does that work?
>
> On 27 Jul 2015, at 21:19, Proust GZ Feng wrote:
>
> Hi, Spark Users
>
> Looks like Spark 1.4.0 cannot work with Cygwin due to the removing of Cygwin
> support in bin/spark-class
>
>
There's a spark-submit.cmd file for Windows. Does that work?
On 27 Jul 2015, at 21:19, Proust GZ Feng
mailto:pf...@cn.ibm.com>> wrote:
Hi, Spark Users
Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of Cygwin
support in bin/spark-class
The changeset is
https:/
Owen, the problem under Cygwin is while run spark-submit under 1.4.0,
> it simply report
>
> Error: Could not find or load main class org.apache.spark.launcher.Main
>
> This is because under Cygwin spark-class make the LAUNCH_CLASSPATH as
> "/c/spark-1.4.0-bin-hadoop2.3/lib/
Thanks Owen, the problem under Cygwin is that when running spark-submit under
1.4.0, it simply reports
Error: Could not find or load main class org.apache.spark.launcher.Main
This is because under Cygwin spark-class makes the LAUNCH_CLASSPATH as "
/c/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-
It wasn't removed, but rewritten. Cygwin is just a distribution of
POSIX-related utilities so you should be able to use the normal .sh
scripts. In any event, you didn't say what the problem is?
On Tue, Jul 28, 2015 at 5:19 AM, Proust GZ Feng wrote:
> Hi, Spark Users
>
> Loo
Hi, Spark Users
Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of
Cygwin support in bin/spark-class
The changeset is
https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97#diff-fdf4d3e600042c63ffa17b692c4372a3
The changeset said "Add a librar
park-shell for exploration and I have a runner class that
>> executes some tasks with spark-submit. I used to run against
>> 1.4.0-SNAPSHOT. Since then 1.4.0 and 1.4.1 were released so I tried to
>> switch to the official release. Now, when I run the program as a shell,
>>
gt;> data, that is, key-value stores, databases, etc.
>>
>> On Tue, Jul 14, 2015 at 5:44 PM, Ted Yu wrote:
>>
>>> Please take a look at SPARK-2365 which is in progress.
>>>
>>> On Tue, Jul 14, 2015 at 5:18 PM, swetha
>>> wrote:
>>
have a runner class that
> executes some tasks with spark-submit. I used to run against
> 1.4.0-SNAPSHOT. Since then 1.4.0 and 1.4.1 were released so I tried to
> switch to the official release. Now, when I run the program as a shell,
> everything works but when I try to run it wi
I have a Spark program that uses DataFrames to query Hive. I run it both
in spark-shell for exploration, and I have a runner class that executes
some tasks with spark-submit. I used to run against 1.4.0-SNAPSHOT. Since
then 1.4.0 and 1.4.1 were released, so I tried to switch to the official
is an array of structs. So, can you change your
>> pattern matching to the following?
>>
>> case Row(rows: Seq[_]) => rows.asInstanceOf[Seq[Row]].map(elem => ...)
>>
>> On Wed, Jun 24, 2015 at 5:27 AM, Gustavo Arjones <
>> garjo...@socialmetrix.com> wrote:
>
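A slightly fuller sketch of that pattern match, assuming a DataFrame df whose "entities" column is an array of structs and that only the first string field of each struct is wanted:
import org.apache.spark.sql.Row
// each selected row holds a Seq of Row, one per struct element
val firstFields = df.select("entities").map {
  case Row(rows: Seq[_]) => rows.asInstanceOf[Seq[Row]].map(_.getString(0))
}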
Hi forum, I am currently using Spark 1.4.0, and started using the ML pipeline
framework. I ran the example program
"ml.JavaSimpleTextClassificationPipeline" which uses LogisticRegression.
But I wanted to do multiclass classification, so I used
DecisionTreeClassifier pres
; On Wed, Jul 15, 2015 at 9:43 AM, lokeshkumar wrote:
>
>> Hi forum
>>
>> I have downloaded the latest spark version 1.4.0 and started using it.
>> But I couldn't find the compute-classpath.sh file in bin/ which I am using
>> in previous versions to provide third party l
ou point me to the patch to fix the serialization stack? Maybe I
>>> can pull it in and rerun my job.
>>>
>>> Chen
>>>
>>> On Wed, Jul 15, 2015 at 4:40 PM, Tathagata Das
>>> wrote:
>>>
>>>> Your streaming job may have been se
t; On Wed, Jul 15, 2015 at 4:40 PM, Tathagata Das
>> wrote:
>>
>>> Your streaming job may have been seemingly running ok, but the DStream
>>> checkpointing must have been failing in the background. It would have been
>>> visible in the log4j logs. In 1.4.0, we
ling in the background. It would have been
>> visible in the log4j logs. In 1.4.0, we enabled fast-failure for that so
>> that checkpointing failures dont get hidden in the background.
>>
>> The fact that the serialization stack is not being shown correctly, is a
>>
in the background. It would have been
> visible in the log4j logs. In 1.4.0, we enabled fast-failure for that so
> that checkpointing failures dont get hidden in the background.
>
> The fact that the serialization stack is not being shown correctly, is a
> known bug in Spark 1.4.0, but is
Your streaming job may have been seemingly running ok, but the DStream
checkpointing must have been failing in the background. It would have been
visible in the log4j logs. In 1.4.0, we enabled fast-failure for that so
that checkpointing failures don't get hidden in the background.
The fact that
anything changed on 1.4.0 w.r.t
> DStream checkpointint?
>
> Detailed error from driver:
>
> 15/07/15 18:00:39 ERROR yarn.ApplicationMaster: User class threw
> exception: *java.io.NotSerializableException: DStream checkpointing has
> been enabled but the DStreams with their
The streaming job has been running OK in 1.2 and 1.3. After I upgraded to
1.4, I started seeing the error below. It appears that it fails in the validate
method in StreamingContext. Has anything changed in 1.4.0 w.r.t.
DStream checkpointing?
Detailed error from driver:
15/07/15 18:00:39 ERROR
That has never been the correct way to set your app's classpath.
Instead, look at http://spark.apache.org/docs/latest/configuration.html and
search for "extraClassPath".
On Wed, Jul 15, 2015 at 9:43 AM, lokeshkumar wrote:
> Hi forum
>
> I have downloaded the latest
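A minimal sketch of the executor-side setting, with a hypothetical jar path; the driver-side spark.driver.extraClassPath normally has to be supplied at launch time (e.g. in spark-defaults.conf or on the spark-submit command line), since the driver JVM is already running:
import org.apache.spark.{SparkConf, SparkContext}
// make a third-party jar visible on the executors' classpath
val conf = new SparkConf()
  .setAppName("classpath-example")
  .set("spark.executor.extraClassPath", "/opt/libs/thirdparty.jar")
val sc = new SparkContext(conf)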
Hi forum
I have downloaded the latest Spark version, 1.4.0, and started using it.
But I couldn't find the compute-classpath.sh file in bin/, which I was using
in previous versions to provide third-party libraries to my application.
Can anyone please let me know where I can provide CLASSPATH wi
Hello all,
I upgraded from spark 1.3.1 to 1.4.0, but I'm experiencing a massive drop in
performance for the application I'm running. I've (somewhat) reproduced this
behaviour in the attached file.
My current spark setup may not be optimal exactly for this reproduction, but
I
swetha
>> wrote:
>>
>>> Hi,
>>>
>>> Is IndexedRDD available in Spark 1.4.0? We would like to use this in
>>> Spark
>>> Streaming to do lookups/updates/deletes in RDDs using keys by storing
>>> them
>>> as key/value pairs
:
> Please take a look at SPARK-2365 which is in progress.
>
> On Tue, Jul 14, 2015 at 5:18 PM, swetha wrote:
>
>> Hi,
>>
>> Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark
>> Streaming to do lookups/updates/deletes in RDDs using keys b
Please take a look at SPARK-2365 which is in progress.
On Tue, Jul 14, 2015 at 5:18 PM, swetha wrote:
> Hi,
>
> Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark
> Streaming to do lookups/updates/deletes in RDDs using keys by storing them
> as
Hi,
Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark
Streaming to do lookups/updates/deletes in RDDs using keys by storing them
as key/value pairs.
Thanks,
Swetha
rk 1.4;
is there any config option that has been added and is mandatory?
2015-07-13 22:12 GMT+01:00 Tathagata Das :
> Spark 1.4.0 added shutdown hooks in the driver to cleanly shutdown the
> Sparkcontext in the driver, which would shutdown the executors. I am not
> sure whether thi
Hello all,
The configuration of my cluster is as follows:
# 4-node cluster running on CentOS 6.4
# spark-1.3.0 installed on all
I would like to use SparkR shipped with spark-1.4.0. I checked Cloudera and
found that the latest release, CDH 5.4, still does not have spark-1.4.0.
Forums like
Spark 1.4.0 added shutdown hooks in the driver to cleanly shut down the
SparkContext in the driver, which would shut down the executors. I am not
sure whether this is related or not, but somehow the executor's shutdown
hook is being called.
Can you check the driver logs to see if driver'
my spark jobs from spark 1.2.1 to spark 1.4.0
> and after deploying it to mesos, it's not working anymore.
>
> The upgrade process was quite easy:
>
> - Create a new docker container for spark 1.4.0.
> - Upgrade spark job to use spark 1.4.0 as a dependency and create a new
&
I have just upgraded one of my Spark jobs from Spark 1.2.1 to Spark 1.4.0,
and after deploying it to Mesos, it's not working anymore.
The upgrade process was quite easy:
- Create a new docker container for spark 1.4.0.
- Upgrade spark job to use spark 1.4.0 as a dependency and create a new
f
>
>
> From: Yin Huai
> Date: Monday, July 6, 2015 at 11:41 AM
> To: Denny Lee
> Cc: Simeon Simeonov , Andy Huang ,
> user
>
> Subject: Re: 1.4.0 regression: out-of-memory errors on small data
>
> Hi Sim,
>
> I
ang <andy.hu...@servian.com.au>, user <user@spark.apache.org>
Subject: Re: 1.4.0 regression: out-of-memory errors on small data
Hi Sim,
I think the right way to set the PermGen Size is through driver extra JVM
options, i.e.
--conf "spark.driver.extraJavaOptions=-XX:MaxPe
, Jul 6, 2015 at 1:36 PM Simeon Simeonov wrote:
>
>> The file is at
>> https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-0.gz?dl=1
>>
>> The command was included in the gist
>>
>> SPARK_REPL_OPTS="-XX:MaxPermSize=256m"
>&g
.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-0.gz?dl=1
>
> The command was included in the gist
>
> SPARK_REPL_OPTS="-XX:MaxPermSize=256m"
> spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages
> com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory
The file is at
https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-0.gz?dl=1
The command was included in the gist
SPARK_REPL_OPTS="-XX:MaxPermSize=256m"
spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages
com.databricks:spark-csv_2.10:1.0.3 --driver
ain"
> 15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed
> broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1
> GB)
>
> That did not change up until 4Gb of PermGen space and 8Gb for driver &
> executor each.
>
> I stopped at this point bec
ece0 on
localhost:65464 in memory (size: 19.7 KB, free: 2.1 GB)
That did not change up until 4Gb of PermGen space and 8Gb for driver & executor
each.
I stopped at this point because the exercise started looking silly. It is clear
that 1.4.0 is using memory in a substantially different manner.
rved it happening with those who had JDK 6. The problem went away
>> after installing jdk 8. This was only for the tutorial materials which was
>> about loading a parquet file.
>>
>> Regards
>> Andy
>>
>> On Sat, Jul 4, 2015 at 2:54 AM, sim wrote:
&
Jul 4, 2015 at 2:54 AM, sim wrote:
>
>> @bipin, in my case the error happens immediately in a fresh shell in
>> 1.4.0.
>>
>>
>>
, sim wrote:
> @bipin, in my case the error happens immediately in a fresh shell in 1.4.0.
>
>
>
@bipin, in my case the error happens immediately in a fresh shell in 1.4.0.
I have a hunch I want to share: I feel that data is not being deallocated in
memory (at least not the way it was in 1.3). Once it goes in-memory, it just
stays there. Spark SQL works fine: the same query, when run in a new shell,
won't throw that error, but when run in a shell which has been used for other queries
I will second this. I very rarely used to get out-of-memory errors in 1.3.
Now I get these errors all the time. I feel that I could work on 1.3
spark-shell for long periods of time without spark throwing that error,
whereas in 1.4 the shell needs to be restarted or gets killed frequently.
From: Yin Huai <yh...@databricks.com>
Date: Thursday, July 2, 2015 at 4:34 PM
To: Simeon Simeonov <s...@swoop.com>
Cc: user <user@spark.apache.org>
Subject: Re: 1.4.0 regression: out-of-memory errors on small data
Hi Sim,
Seems you already set the PermGe
HiveContext's
methods). Can you just use the sqlContext created by the shell and try
again?
Thanks,
Yin
On Thu, Jul 2, 2015 at 12:50 PM, Yin Huai wrote:
> Hi Sim,
>
> Spark 1.4.0's memory consumption on PermGen is higher then Spark 1.3
> (explained in https://issues.apache.org/
Hi Sim,
Spark 1.4.0's memory consumption on PermGen is higher than Spark 1.3's
(explained in https://issues.apache.org/jira/browse/SPARK-8776). Can you
add --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" in the
command you used to launch Spark shell? This will increase t
A very simple Spark SQL COUNT operation succeeds in spark-shell for 1.3.1 and
fails with a series of out-of-memory errors in 1.4.0.
This gist <https://gist.github.com/ssimeonov/a49b75dc086c3ac6f3c4>
includes the code and the full output from the 1.3.1 and 1.4.0 runs,
including the comman
onnectException: Connection refused: slave2/...:54845
>>
>> Could you look in the executor logs (stderr on slave2) and see what made
>> it shut down? Since you are doing a join there's a high possibility of OOM
>> etc.
>>
>>
>> Thanks
>> Best Regard
rr on slave2) and see what made
>> it shut down? Since you are doing a join there's a high possibility of OOM
>> etc.
>>
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Jul 1, 2015 at 10:20 AM, Pooja Jain
>> wrote:
>>
>>> Hi,
>&g
d: slave2/...:54845
>
> Could you look in the executor logs (stderr on slave2) and see what made
> it shut down? Since you are doing a join there's a high possibility of OOM
> etc.
>
>
> Thanks
> Best Regards
>
> On Wed, Jul 1, 2015 at 10:20 AM, Pooja Jain wrote:
>
20 AM, Pooja Jain wrote:
> Hi,
>
> We are using Spark 1.4.0 on hadoop using yarn-cluster mode via
> spark-submit. We are facing parquet write issue after doing dataframe joins
>
> We have a full data set and then an incremental data. We are reading them
> as dataframes, joining
Hi,
We are using Spark 1.4.0 on Hadoop in yarn-cluster mode via
spark-submit. We are facing a Parquet write issue after doing DataFrame joins.
We have a full data set and then incremental data. We are reading them
as DataFrames, joining them, and then writing the data to the HDFS system
in
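A minimal sketch of the flow being described, assuming an existing sqlContext; the paths and the join key "id" are hypothetical:
val full = sqlContext.read.parquet("hdfs:///data/full")
val incremental = sqlContext.read.parquet("hdfs:///data/incremental")
// join full and incremental data, then write the result back to HDFS
val merged = full.join(incremental, full("id") === incremental("id"), "left_outer")
merged.write.parquet("hdfs:///data/merged")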
Just to add to this, here's some more info:
val myDF = hiveContext.read.parquet("s3n://myBucket/myPath/")
Produces these...
2015-07-01 03:25:50,450 INFO [pool-14-thread-4]
(org.apache.hadoop.fs.s3native.NativeS3FileSystem) - Opening
's3n://myBucket/myPath/part-r-00339.parquet' for reading
That
I wonder if this could be a side effect of Spark-3928. Does ending the path
with *.parquet work?
Original message From: Exie
Date:06/30/2015 9:20 PM (GMT-05:00)
To: user@spark.apache.org Subject: 1.4.0
So I was delighted with Spark 1.3.1 using Parquet 1.6.0 which would
rquet("s3n://myBucket/myPath/2014/07/01")
or
val myDataFrame = hiveContext.read.parquet("s3n://myBucket/myPath/2014/07")
However since upgrading to Spark 1.4.0 it doesnt seem to be working the same
way.
The first line works, in the "01" folder is all the normal
rquet("s3n://myBucket/myPath/2014/07/01")
or
val myDataFrame = hiveContext.read.parquet("s3n://myBucket/myPath/2014/07")
However, since upgrading to Spark 1.4.0 it doesn't seem to be working the same
way.
The first line works, in the "01" folder is all the normal files:
20
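One possible workaround is to list the leaf directories explicitly, since read.parquet accepts multiple paths; a sketch (the second path is hypothetical):
val myDataFrame = hiveContext.read.parquet(
  "s3n://myBucket/myPath/2014/07/01",
  "s3n://myBucket/myPath/2014/07/02")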
k
>>>>>> though and it might need some more manual tweaks.
>>>>>>
>>>>>> Thanks
>>>>>> Shivaram
>>>>>>
>>>>>> On Fri, Jun 26, 2015 at 9:59 AM, Mark Stephenson <
>>>>>> m...@redo
orkarounds - fire up a separate EC2 instance with RStudio Server
>>>>> that initializes the spark context against a separate Spark cluster.
>>>>>
>>>>> On Jun 26, 2015, at 11:46 AM, Shivaram Venkataraman <
>>>>> shiva...@eecs.berkeley
Hi Folks,
I just stepped up from 1.3.1 to 1.4.0; the most notable difference for me so
far is the DataFrame reader/writer. Previously:
val myData = hiveContext.load("s3n://someBucket/somePath/","parquet")
Now:
val myData = hiveContext.read.parquet("s3n://someBuc
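For completeness, a sketch of the corresponding 1.4.0 writer call (the output path is hypothetical):
myData.write.parquet("s3n://someBucket/someOutputPath/")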
Thanks Mark for the update. For those interested Vincent Warmerdam also has
some details on making the /root/spark installation work at
https://issues.apache.org/jira/browse/SPARK-8596?focusedCommentId=14604328&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14604328
For anyone monitoring the thread, I was able to successfully install and run
a small Spark cluster and model using this method:
First, make sure that the username being used to login to RStudio Server is
the one that was used to install Spark on the EC2 instance. Thanks to
Shivaram for his help h
a...@eecs.berkeley.edu> wrote:
>>>
>>> We don't have a documented way to use RStudio on EC2 right now. We have
>>> a ticket open at https://issues.apache.org/jira/browse/SPARK-8596 to
>>> discuss work-arounds and potential solutions for this.
>>&g
nstallation and usage of the
newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using
RStudio to run on top of it.
Using these instructions (
http://spark.apache.org/docs/latest/ec2-scripts.html
<http://spark.apache.org/docs/latest/ec2-scripts.html> ) we can fire up an
Fri, Jun 26, 2015 at 6:27 AM, RedOakMark
> wrote:
>
>> Good morning,
>>
>> I am having a bit of trouble finalizing the installation and usage of the
>> newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using
>> RStudio to run on top of it.
>>
>
t;
> I am having a bit of trouble finalizing the installation and usage of the
> newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using
> RStudio to run on top of it.
>
> Using these instructions (
> http://spark.apache.org/docs/latest/ec2-scripts.html
> <
t; I am having a bit of trouble finalizing the installation and usage of the
> newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using
> RStudio to run on top of it.
>
> Using these instructions (
> http://spark.apache.org/docs/latest/ec2-scripts.html
> <http
Good morning,
I am having a bit of trouble finalizing the installation and usage of the
newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using
RStudio to run on top of it.
Using these instructions (
http://spark.apache.org/docs/latest/ec2-scripts.html
<h
anyway).
>
> 3. Found this issue : https://issues.apache.org/jira/browse/SPARK-5837
> and multiple references to other YARN issues to the same. Continuing to
> understand and explore the possibilities documented there.
>
> Regards,
> Nachiketa
>
> On Fri, Jun 26, 2015 a
issue : https://issues.apache.org/jira/browse/SPARK-5837 and
multiple references to other YARN issues to the same. Continuing to
understand and explore the possibilities documented there.
Regards,
Nachiketa
On Fri, Jun 26, 2015 at 12:52 AM, Nachiketa
wrote:
> Spark 1.4.0 - Custom built f
Spark 1.4.0 - Custom built from source against Hortonworks HDP 2.2 (hadoop
2.6.0+)
HDP 2.2 Cluster (Secure, kerberos)
spark-shell (--master yarn-client) launches fine and the prompt shows up.
Clicking on the Application Master URL on the YARN RM UI throws a 500
connect error.
The same build works
e Row(rows: Seq[_]) => rows.asInstanceOf[Seq[Row]].map(elem => ...)
>
> On Wed, Jun 24, 2015 at 5:27 AM, Gustavo Arjones <
> garjo...@socialmetrix.com> wrote:
>
>> Hi All,
>>
>> I am using the new *Apache Spark version 1.4.0 Data-frames API* to
>> ext
es
wrote:
> Hi All,
>
> I am using the new *Apache Spark version 1.4.0 Data-frames API* to
> extract information from Twitter's Status JSON, mostly focused on the Entities
> Object <https://dev.twitter.com/overview/api/entities> - the relevant
>
Hi All,
I am using the new Apache Spark version 1.4.0 Data-frames API to extract
information from Twitter's Status JSON, mostly focused on the Entities Object
<https://dev.twitter.com/overview/api/entities> - the relevant part to this
question is shown below:
{
...
...
Queue stream does not support driver checkpoint recovery, since the RDDs in
the queue are arbitrarily generated by the user and it's hard for Spark
Streaming to keep track of the data in the RDDs (that's necessary for
recovering from a checkpoint). Anyway, queue stream is meant for testing and
development,
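A minimal sketch of the testing-only usage in question, assuming an existing StreamingContext ssc:
import scala.collection.mutable
import org.apache.spark.rdd.RDD
// RDDs pushed into this queue are generated by the user,
// so they cannot be recovered from a checkpoint
val rddQueue = new mutable.Queue[RDD[Int]]()
val stream = ssc.queueStream(rddQueue)
stream.print()
rddQueue += ssc.sparkContext.makeRDD(1 to 100)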
It's a generated set of shell commands to run (a highly optimized numerical
computation written in C), which is created from a set of user-provided
parameters.
The snippet above is:
task_outfiles_to_cmds = OrderedDict(run_sieving.leftover_tasks)
task_outfiles_to_cmds.update(generate_sieving_t