Hi, I am using Spark 1.3.1 and I am trying to submit a Spark job to the REST
URL of Spark (spark::6066) using a standalone Spark cluster
with deploy mode as cluster. Both driver and application are getting
submitted; after doing their work (output created), both end up with KILLED
status. In the work
I figured it out. I needed a block-style map and a check for null. The case
class is just to name the transformed columns.
case class Component(name: String, loadTimeMs: Long)
avroFile.filter($"lazyComponents.components".isNotNull)
.explode($"lazyComponents.components")
{ case Row(la
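For reference, here is a sketch of how the completed call might look, reusing the Component case class above (the inner struct layout, name followed by loadTimeMs, is assumed from the schema posted later in this digest, so adjust the accessors to your actual schema):
import org.apache.spark.sql.Row

val components = avroFile
  .filter($"lazyComponents.components".isNotNull)
  .explode($"lazyComponents.components") { case Row(cs: Seq[Row @unchecked]) =>
    cs.map(c => Component(c.getString(0), c.getLong(1)))  // one output row per array element
  }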
Please file a bug here: https://issues.apache.org/jira/browse/SPARK/
Could you also provide a way to reproduce this bug (including some datasets)?
On Thu, Jun 4, 2015 at 11:30 PM, Sam Stoelinga wrote:
> I've changed the SIFT feature extraction to SURF feature extraction and it
> works...
>
> Fol
I've changed the SIFT feature extraction to SURF feature extraction and it
works...
Following line was changed:
sift = cv2.xfeatures2d.SIFT_create()
to
sift = cv2.xfeatures2d.SURF_create()
Where should I file this as a bug? When not running on Spark it works fine
so I'm saying it's a spark bug.
Yeah, I should have emphasized that. I'm running the same code on the same VM.
It's a VM with Spark in standalone mode and I run the unit test directly on
that same VM. So OpenCV is working correctly on that same machine, but when
moving the exact same OpenCV code to Spark it just crashes.
On Tue, Jun
I see this:
Is this a problem with my code or the cluster? Is there any way to fix it?
FetchFailed(BlockManagerId(2, phxdpehdc9dn2441.stratus.phx.ebay.com,
59574), shuffleId=1, mapId=80, reduceId=20, message=
org.apache.spark.shuffle.FetchFailedException: Failed to connect to
phxdpehdc9dn2441.st
I see these in my mapper-only task:
- *Input Size / Records:* 68.0 GB / 577195178
- *Shuffle write:* 95.1 GB / 282559291
- *Shuffle spill (memory):* 2.8 TB
- *Shuffle spill (disk):* 90.3 GB
I understand the first one; can someone give one- or two-liners for the next three?
Also tel
Hello,
I have a question about the Spark 1.4 ml library. In the copy function, it is
stated that the default implementation of Params doesn't work
for models. (
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Model.scala#L49
)
As a result, some feature
Hi,
org.apache.spark.mllib.linalg.Vector =
(1048576,[35587,884670],[3.458767233,3.458767233])
It is the sparse vector representation of the terms:
the first value (1048576) is the length of the vector,
[35587,884670] are the indices of the words, and
[3.458767233,3.458767233] are the tf-idf values of the terms.
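For reference, the same vector can be built by hand (a minimal sketch using the numbers above):
import org.apache.spark.mllib.linalg.Vectors

val tfidfVector = Vectors.sparse(
  1048576,                           // length of the vector (size of the feature space)
  Array(35587, 884670),              // indices of the terms
  Array(3.458767233, 3.458767233))   // tf-idf values of the terms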
Thanks
S
Which version of Hive jar are you using? Hive 0.13.1 or Hive 0.12.0?
-Original Message-
From: ogoh [mailto:oke...@gmail.com]
Sent: Friday, June 5, 2015 10:10 AM
To: user@spark.apache.org
Subject: SparkSQL: using Hive UDF returning Map throws "Error:
scala.MatchError: interface java.util.
Hi Lee,
You should be able to create a PairRDD using the Nonce as the key, and the
AnalyticsEvent as the value. I'm very new to Spark, but here is some
uncompilable pseudo code that may or may not help:
events.map(event => (event.getNonce, event))  // key each event by its nonce
  .reduceByKey((a, b) => a)                   // keep one event per nonce
  .map(_._2)                                  // drop the keys again
The above cod
Hi, I have an RDD with MANY columns (e.g., hundreds), and most of my operations
are on columns, e.g., I need to create many intermediate variables from
different columns. What is the most efficient way to do this?
For example, if my dataRDD[Array[String]] is like below:
123, 523, 534, ..., 893
Hello,
I tested some custom UDFs on Spark SQL's ThriftServer & Beeline (Spark 1.3.1).
Some UDFs work fine (accessing an array parameter and returning int or string
types).
But my UDF returning a map type throws an error:
"Error: scala.MatchError: interface java.util.Map (of class java.lang.Class)
(state=,co
But compile scope is supposed to be added to the assembly.
https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope
On Thu, Jun 4, 2015 at 1:24 PM, algermissen1971
wrote:
> Hi Iulian,
>
> On 26 May 2015, at 13:04, Iulian Dragoș
> wrote:
>
> >
> >
Thanks so much, Yiannis, Olivier, Huang!
On Thu, Jun 4, 2015 at 6:44 PM, Yiannis Gkoufas
wrote:
> Hi there,
>
> I would recommend checking out
> https://github.com/spark-jobserver/spark-jobserver which I think gives
> the functionality you are looking for.
> I haven't tested it though.
>
> BR
>
My current example doesn't use a Hive UDAF, but you would do something
pretty similar (it calls a new user defined UDAF, and there are wrappers to
make Spark SQL UDAFs from Hive UDAFs but they are private). So this is
doable, but since it pokes at internals it will likely break between
versions of
Hi there!
I'm trying to read a large .csv file (14GB) into a dataframe from S3 via the
spark-csv package. I want to load this data in parallel utilizing all 20
executors that I have; however, by default only 3 executors are being used
(which downloaded 5 GB/5 GB/4 GB).
Here is my script (I'm using pys
For the first round, you will have 16 reducers working since you have
32 partitions. Each pair of the 32 partitions shares the same key, so
reduceByKey sends both of them to the same reducer.
After this step is done, you will have 16 partitions, so the next
round will have 8 reducers.
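As a usage sketch (the depth is just an argument to treeReduce/treeAggregate; rdd here stands for any RDD[Int] you already have):
val total = rdd.treeReduce((a, b) => a + b, depth = 4)   // depth defaults to 2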
Sincerely,
DB Tsai
Hi there,
I would recommend checking out
https://github.com/spark-jobserver/spark-jobserver which I think gives the
functionality you are looking for.
I haven't tested it though.
BR
On 5 June 2015 at 01:35, Olivier Girardot wrote:
> You can use it as a broadcast variable, but if it's "too" lar
You can use it as a broadcast variable, but if it's "too" large (more than
1 GB, I guess), you may need to share it by joining it to the other RDDs
using some kind of key.
But this is the kind of thing broadcast variables were designed for.
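A minimal sketch of the broadcast approach (loadDictionary() is only a placeholder for however you build the map on the driver):
val dictBc = sc.broadcast(loadDictionary())   // shipped once per executor, read-only

val enriched = rdd.map { record =>
  val dict = dictBc.value    // local reference on the executor
  // ... look up whatever you need in dict ...
  record
}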
Regards,
Olivier.
On Thu, Jun 4, 2015 at 11:50 PM, dgoldenberg
Is the dictionary read-only?
Did you look at
http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables ?
-Original Message-
From: dgoldenberg [mailto:dgoldenberg...@gmail.com]
Sent: Thursday, June 04, 2015 4:50 PM
To: user@spark.apache.org
Subject: How to share larg
We have some pipelines defined where sometimes we need to load potentially
large resources such as dictionaries.
What would be the best strategy for sharing such resources among the
transformations/actions within a consumer? Can they be shared somehow
across the RDD's?
I'm looking for a way to l
Hi all,
I'm trying to run the standalone application SimpleApp.scala following the
instructions on the
http://spark.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala
I was able to create a .jar file by doing sbt package. However when I tried
to do
$ YOUR_SPARK_HOME/bin/spark-submit
Hi Iulian,
On 26 May 2015, at 13:04, Iulian Dragoș wrote:
>
> On Tue, May 26, 2015 at 10:09 AM, algermissen1971
> wrote:
> Hi,
>
> I am setting up a project that requires Kafka support and I wonder what the
> roadmap is for Scala 2.11 Support (including Kafka).
>
> Can we expect to see 2.1
Hey DB,
Thanks for the reply!
I still don't think this answers my question. For example, if I have a
top() action being executed and I have 32 workers (32 partitions), and I
choose a depth of 4, what does the overlay of intermediate reducers look
like? How many reducers are there excluding the mas
Hi Doug,
sqlContext.table does not officially support database name. It only
supports table name as the parameter. We will add a method to support
database name in future.
Thanks,
Yin
On Thu, Jun 4, 2015 at 8:10 AM, Doug Balog wrote:
> Hi Yin,
> I’m very surprised to hear that its not suppor
Hi Brandon, they are available, but private to the ml package. They are now
public in 1.4. For 1.3.1 you can define your transformer in the
org.apache.spark.ml package - then you can use these traits.
Thanks,
Peter Rudenko
On 2015-06-04 20:28, Brandon Plaster wrote:
Is "HasInputCol" and "HasOutputCo
Got it. Ignore my similar question on Github comments.
On Thu, Jun 4, 2015 at 11:48 AM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:
> Yeah - We don't have support for running UDFs on DataFrames yet. There is
> an open issue to track this
> https://issues.apache.org/jira/browse/SPAR
Yeah - We don't have support for running UDFs on DataFrames yet. There is
an open issue to track this https://issues.apache.org/jira/browse/SPARK-6817
Thanks
Shivaram
On Thu, Jun 4, 2015 at 3:10 AM, Daniel Emaasit
wrote:
> Hello Shivaram,
> Was the includePackage() function deprecated in SparkR
Hi - I'm having a similar problem with switching from ephemeral to persistent
HDFS - it always looks for port 9000 regardless of the options I set for
persistent HDFS on port 9010. Have you figured out a solution? Thanks
I think if you create a bidirectional mapping from AnalyticsEvent to
another type that would wrap it and use the nonce as its equality, you
could then do something like reduceByKey to group by nonce and map back to
AnalyticsEvent after.
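For example (a rough Scala sketch; AnalyticsEvent and getNonce are assumed from the original code):
class NonceKey(val event: AnalyticsEvent) extends Serializable {
  override def equals(other: Any): Boolean = other match {
    case that: NonceKey => that.event.getNonce == event.getNonce
    case _              => false
  }
  override def hashCode: Int = event.getNonce.hashCode
}

val deduped = events
  .map(e => (new NonceKey(e), e))   // key by the nonce-equality wrapper
  .reduceByKey((a, b) => a)         // keep one event per nonce
  .map(_._2)                        // map back to AnalyticsEvent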
On Thu, Jun 4, 2015 at 1:10 PM, lbierman wrote:
> I'm still
By default, the depth of the tree is 2. Each partition will be one node.
Sincerely,
DB Tsai
---
Blog: https://www.dbtsai.com
On Thu, Jun 4, 2015 at 10:46 AM, Raghav Shankar wrote:
> Hey Reza,
>
> Thanks for your response!
>
> Your response cl
Additionally, I think this document (
https://spark.apache.org/docs/latest/building-spark.html ) should mention
that the protobuf.version might need to be changed to match the one used in
the chosen Hadoop version. For instance, with Hadoop 2.7.0 I had to change
protobuf.version to 2.5.0 to be able
That might work, but there might also be other steps that are required.
-Sandy
On Thu, Jun 4, 2015 at 11:13 AM, Saiph Kappa wrote:
> Thanks! It is working fine now with spark-submit. Just out of curiosity,
> how would you use org.apache.spark.deploy.yarn.Client? Adding that
> spark_yarn jar to
Thanks! It is working fine now with spark-submit. Just out of curiosity,
how would you use org.apache.spark.deploy.yarn.Client? Adding that
spark_yarn jar to the configuration inside the application?
On Thu, Jun 4, 2015 at 6:37 PM, Vova Shelgunov wrote:
> You should run it with spark-submit or u
Hey Reza,
Thanks for your response!
Your response clarifies some of my initial thoughts. However, what I don't
understand is how the depth of the tree is used to identify how many
intermediate reducers there will be, and how many partitions are sent to
the intermediate reducers. Could you provide
Deenar,
Thanks for the suggestion.
That is one of the ideas that I have, but I didn't get a chance to try it out yet.
One of the things that could potentially cause problems is that we use wide
rows. In addition, the schema is dynamic, with new columns getting added on a
regular basis. That is why I
The issue was that the SSH key generated on the Spark master was not transferred to
this new slave. The spark-ec2 script with the `start` command omits this step. The
solution is to use the `launch` command with the `--resume` option. Then the SSH
key is transferred to the new slave and everything goes smoothly.
spark-submit is the recommended way of launching Spark applications on
YARN, because it takes care of submitting the right jars as well as setting
up the classpath and environment variables appropriately.
-Sandy
On Thu, Jun 4, 2015 at 10:30 AM, Saiph Kappa wrote:
> No, I am not. I run it with s
Is "HasInputCol" and "HasOutputCol" available in 1.3.1? I'm getting the
following message when I'm trying to implement a Transformer and importing
org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}:
error: object shared is not a member of package org.apache.spark.ml.param
and
error: tr
No, I am not. I run it with sbt «sbt "run-main Branchmark"». I thought it
was the same thing since I am passing all the configurations through the
application code. Is that the problem?
On Thu, Jun 4, 2015 at 6:26 PM, Sandy Ryza wrote:
> Hi Saiph,
>
> Are you launching using spark-submit?
>
> -S
In a regular reduce, all partitions have to send their reduced value to a
single machine, and that machine can become a bottleneck.
In a treeReduce, the partitions talk to each other in a logarithmic number
of rounds. Imagine a binary tree that has all the partitions at its leaves
and the root wil
Hi Saiph,
Are you launching using spark-submit?
-Sandy
On Thu, Jun 4, 2015 at 10:20 AM, Saiph Kappa wrote:
> Hi,
>
> I've been running my spark streaming application in standalone mode
> without any worries. Now, I've been trying to run it on YARN (hadoop 2.7.0)
> but I am having some problems
Hi,
I've been running my spark streaming application in standalone mode without
any worries. Now, I've been trying to run it on YARN (hadoop 2.7.0) but I
am having some problems.
Here are the config parameters of my application:
«
val sparkConf = new SparkConf()
sparkConf.setMaster("yarn-client"
The closest algorithm to decision lists that we have is decision trees
https://spark.apache.org/docs/latest/mllib-decision-tree.html
On Thu, Jun 4, 2015 at 2:14 AM, Sateesh Kavuri
wrote:
> Hi,
>
> I have used weka machine learning library for generating a model for my
> training set. I have used
vm.swappiness=0? Some vendors recommend setting this to 0 (zero), although I've
seen this cause even the kernel to fail to allocate memory.
It may cause a node reboot. If that's the case, set vm.swappiness to 5-10 and
decrease spark.*.memory. Your spark.driver.memory +
spark.executor.memory + OS + etc >> amo
I'm still a bit new to Spark and am struggling to figure out the best way to
dedupe my events.
I load my Avro files from HDFS and then I want to dedupe events that have
the same nonce.
For example my code so far:
JavaRDD events = ((JavaRDD>)
context.newAPIHadoopRDD(
context.hadoopC
take(5) will only evaluate enough partitions to provide 5 elements
(sometimes a few more but you get the idea), so it won't trigger a full
evaluation of all partitions unlike count().
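For example (a quick sketch; rdd stands for whatever RDD you are checking):
val sample = rdd.take(5)    // scans only as many partitions as needed to collect 5 elements
val total  = rdd.count()    // forces evaluation of every partition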
On Thursday, June 4, 2015, Piero Cinquegrana
wrote:
> Hi DB,
>
>
>
> Yes I am running count() operations on the
Hi DB,
Yes I am running count() operations on the previous steps and it appears that
something is slow prior to the scaler. I thought that running take(5) and print
the results would execute the command at each step and materialize the RDD, but
is that not the case? That’s how I was testing eac
Hi all!
I have a .txt file where each row is a collection of terms of a
document separated by spaces. For example:
1 "Hola spark"
2 ..
I followed this example from the Spark site
https://spark.apache.org/docs/latest/mllib-feature-extraction.html and I get
something like this:
tfidf.first()
or
I talked to Don outside the list and he says that he's seeing this issue
with Apache Spark 1.3 too (not just CDH Spark), so it seems like there is a
real issue here.
On Wed, Jun 3, 2015 at 1:39 PM, Don Drake wrote:
> As part of upgrading a cluster from CDH 5.3.x to CDH 5.4.x I noticed that
> Spa
It is possible to start multiple concurrent drivers; Spark dynamically
allocates ports per "spark application" on the driver, master, and workers from
a port range. When you collect results back to the driver, they do not go
through the master. The master is mostly there as a coordinator between the
dr
I would try setting PYSPARK_DRIVER_PYTHON environment variable to the
location of your python binary, especially if you are using a virtual
environment.
-Don
On Wed, Jun 3, 2015 at 8:24 PM, AlexG wrote:
> I have libskylark installed on both machines in my two node cluster in the
> same location
Thanks for the confirmation! We're quite new to Spark, so a little
reassurance is a good thing to have sometimes :-)
The thing that's concerning me at the moment is that my job doesn't seem to
run any faster with more compute resources added to the cluster, and this
is proving a little tricky to d
Hi,
As far as I understand, you shouldn't send data to the driver. Suppose you have
a file in HDFS/S3 or Cassandra partitioning; you should create your job such
that every executor/worker of Spark handles part of your input,
transforms and filters it, and at the end writes back to Cassandra as output (once
ag
Are you using RC4?
On Wed, Jun 3, 2015 at 10:58 PM, Night Wolf wrote:
> Thanks Yin, that seems to work with the Shell. But on a compiled
> application with Spark-submit it still fails with the same exception.
>
> On Thu, Jun 4, 2015 at 2:46 PM, Yin Huai wrote:
>
>> Can you put the following set
Hi all,
I'm running Spark on AWS EMR and I'm having some issues getting the correct
permissions on the output files using
rdd.saveAsTextFile(''). In Hive, I would add a line at the
beginning of the script with
set fs.s3.canned.acl=BucketOwnerFullControl
and that would set the correct grantees f
Hi Yin,
I'm very surprised to hear that it's not supported in 1.3 because I've been
using it since 1.3.0.
It worked great up until SPARK-6908 was merged into master.
What is the supported way to get DF for a table that is not in the default
database ?
IMHO, if you are not going to support “da
I have a spark app that reads avro & sequence file data and performs join,
reduceByKey
Results:
Command for all runs:
./bin/spark-submit -v --master yarn-cluster --driver-class-path
/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/ha
Hi all,
I am new to Spark. I am trying to deploy HDFS (hadoop-2.6.0) and Spark-1.3.1
with four nodes, and each node has 8 cores and 8 GB memory.
One is configured as the head node running the masters, and the 3 others are workers.
But when I try to run PageRank from HiBench, it always causes a node to
reb
Mohammed
Have you tried registering your Cassandra tables in Hive/Spark SQL using
the data frames API? These should then be available to query via the Spark
SQL/Thrift JDBC Server.
Deenar
On 1 June 2015 at 19:33, Mohammed Guller wrote:
> Nobody using Spark SQL JDBC/Thrift server with DSE Cass
Hello,
I am relatively new to Spark and I am currently trying to understand how to
scale large numbers of jobs with Spark.
I understand that the Spark architecture is split into "Driver", "Master" and
"Workers". The Master has a standby node in case of failure, and workers can scale
out.
All the examples I
Hi,
I encountered a performance issue when joining 3 tables in Spark SQL.
Here is the query:
SELECT g.period, c.categoryName, z.regionName, action, list_id, cnt
FROM t_category c, t_zipcode z, click_meter_site_grouped g
WHERE c.refCategoryID = g.category AND z.regionCode = g.region
I need to pay a
Hi
2015-06-04 15:29 GMT+02:00 James Aley :
> Hi,
>
> We have a load of Avro data coming into our data systems in the form of
> relatively small files, which we're merging into larger Parquet files with
> Spark. I've been following the docs and the approach I'm taking seemed
> fairly obvious, and
So, a few updates. When I run locally as stated before, it works fine. When I
run in YARN (via Apache Myriad on Mesos) it also runs fine. The only issue
is specifically with Mesos. I wonder if there is some sort of classpath
goodness I need to fix or something along those lines. Any tips would be
ap
For your first question, please take a look at HADOOP-9922.
The fix is in hadoop-common module.
Cheers
On Thu, Jun 4, 2015 at 2:53 AM, Jean-Charles RISCH <
risch.jeanchar...@gmail.com> wrote:
> Hello,
> *(Before everything : I use IntellijIdea 14.0.1, SBT and Scala 2.11.6)*
>
> This morning, I w
The direct stream isn't a receiver; it isn't required to cache data anywhere
unless you want it to.
If you want that, just call cache.
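For example (a sketch; directStream stands for the DStream returned by createDirectStream):
import org.apache.spark.storage.StorageLevel

directStream.persist(StorageLevel.MEMORY_AND_DISK)   // or simply directStream.cache()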
On Thu, Jun 4, 2015 at 8:20 AM, Dmitry Goldenberg
wrote:
> "set the storage policy for the DStream RDDs to MEMORY AND DISK" - it
> appears the storage level can be sp
Hi Holden, Olivier
>>So for column you need to pass in a Java function, I have some sample
code which does this but it does terrible things to access Spark internals.
I also need to call a Hive UDAF in a dataframe agg function. Are there any
examples of what Column expects?
Deenar
On 2 June 201
Hi,
We have a load of Avro data coming into our data systems in the form of
relatively small files, which we're merging into larger Parquet files with
Spark. I've been following the docs and the approach I'm taking seemed
fairly obvious, and pleasingly simple, but I'm wondering if perhaps it's
not
or you could
1) convert the dataframe to an RDD
2) use mapPartitions and zipWithIndex within each partition
3) convert the RDD back to a dataframe; you will need to make sure you preserve
partitioning (see the sketch below)
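A rough sketch of steps 1-3 (Spark 1.3-era API; df and sqlContext are assumed to exist, and combining the partition id with a per-partition offset is just one way to keep the index unique):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val indexed = df.rdd.mapPartitionsWithIndex { case (partId, rows) =>
  rows.zipWithIndex.map { case (row, i) =>
    Row.fromSeq(row.toSeq :+ ((partId.toLong << 32) | i.toLong))   // partition-local index tagged with the partition id
  }
}
val schema = StructType(df.schema.fields :+ StructField("rowIdx", LongType, nullable = false))
val dfWithIndex = sqlContext.createDataFrame(indexed, schema)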
Deenar
On 1 June 2015 at 02:23, ayan guha wrote:
> If you are on spark 1.3, use repartitionandSort followed
"set the storage policy for the DStream RDDs to MEMORY AND DISK" - it
appears the storage level can be specified in the createStream methods but
not createDirectStream...
On Thu, May 28, 2015 at 9:05 AM, Evo Eftimov wrote:
> You can also try Dynamic Resource Allocation
>
>
>
>
> https://spark.a
Hi Tariq
You need to handle the transaction semantics yourself. You could for
example save from the dataframe to a staging table and then write to the
final table using a single atomic "INSERT INTO finalTable from
stagingTable" call. Remember to clear the staging table first to recover
from previo
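A rough sketch of that pattern, assuming Hive-managed tables and a HiveContext (the table names are illustrative; with a plain JDBC target you would run the equivalent SQL through your database connection):
df.insertInto("staging_table", true)   // overwrite = true clears any previous failed attempt
sqlContext.sql("INSERT INTO TABLE final_table SELECT * FROM staging_table")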
Shixiong,
Thanks, interesting point. So if we want to only process one batch then
terminate the consumer, what's the best way to achieve that? Presumably the
listener could set a flag on the driver notifying it that it can terminate.
But the driver is not in a loop, it's basically blocked in
await
It worked.
On Thu, Jun 4, 2015 at 5:14 PM, MEETHU MATHEW
wrote:
> Try using coalesce
>
> Thanks & Regards,
> Meethu M
>
>
>
> On Wednesday, 3 June 2015 11:26 AM, "ÐΞ€ρ@Ҝ (๏̯͡๏)"
> wrote:
>
>
> I am running a series of spark functions with 9000 executors and its
> resulting in 9000+ files that
Try using coalesce.
Thanks & Regards,
Meethu M
On Wednesday, 3 June 2015 11:26 AM, "ÐΞ€ρ@Ҝ (๏̯͡๏)"
wrote:
I am running a series of spark functions with 9000 executors and it's resulting
in 9000+ files, which is exceeding the namespace file count quota.
How can Spark be configured to
Hi,
I am running my graphx application with Spark 1.3.1 on a small cluster. Then it
failed on this exception:
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
location for shuffle 1
But I actually found it is indeed caused by “ExecutorLostFailure”, and someone
told it
Hi,
I've worked out how to use explode on my input avro dataset with the
following structure
root
|-- pageViewId: string (nullable = false)
|-- components: array (nullable = true)
|    |-- element: struct (containsNull = false)
|    |    |-- name: string (nullable = false)
|    |    |-- loadT
Hello Shivaram,
Was the includePackage() function deprecated in SparkR 1.4.0?
I don't see it in the documentation. If it was, does that mean that we can
use R packages on Spark DataFrames the usual way we do for local R
dataframes?
Daniel
--
Daniel Emaasit
Ph.D. Research Assistant
Transportation
Hello,
*(Before everything : I use IntellijIdea 14.0.1, SBT and Scala 2.11.6)*
This morning, I was looking to resolve the "Failed to locate the winutils
binary in the hadoop binary path" error.
I noticed that I can solve it configuring my build.sbt to
"...
libraryDependencies += "org.apache.ha
Hi Matei,
thank you for answering.
According to what you said, am I mistaken when I say that tuples with the
same key might eventually be spread across more than one node in case an
overloaded worker can no longer accept tuples?
In other words, suppose a worker (processing key K) cannot accept
Hi,
I have used weka machine learning library for generating a model for my
training set. I have used the PART algorithm (decision lists) from weka.
Now, I would like to use spark ML for the PART algo for my training set and
could not seem to find a parallel. Could anyone point out the correspond
That's because you need to add the master's public key (~/.ssh/id_rsa.pub)
to the newly added slave's ~/.ssh/authorized_keys.
I add slaves this way:
- Launch a new instance by clicking on the slave instance and choosing *launch
more like this*
- Once it's launched, ssh into it and add the master p
Replace this line:
img_data = sc.parallelize( list(im.getdata()) )
With:
img_data = sc.parallelize(list(im.getdata()), 3 * num_cores)  # num_cores = total number of cores you have
Thanks
Best Regards
On Thu, Jun 4, 2015 at 1:57 AM, Justin Spargur wrote:
> Hi all,
>
> I'm playing around with manipulating images via Pyth
You should not call `jssc.stop(true);` in a StreamingListener. It will
cause a dead-lock: `jssc.stop` won't return until `listenerBus` exits. But
since `jssc.stop` blocks `StreamingListener`, `listenerBus` cannot exit.
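If you do need to stop the context once some condition is met, one workaround (a sketch, not taken from the original thread) is to trigger the stop from a separate thread, so that the listener callback returns immediately and `listenerBus` can exit:
import org.apache.spark.streaming.api.java.JavaStreamingContext
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

class StopAfterOneBatch(jssc: JavaStreamingContext) extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    // Stop asynchronously; this callback returns right away, so listenerBus is never blocked.
    new Thread("stop-streaming-context") {
      override def run(): Unit = jssc.stop(true)
    }.start()
  }
}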
Best Regards,
Shixiong Zhu
2015-06-04 0:39 GMT+08:00 dgoldenberg :
> Hi,
>
>
Hi,
I am using Hive 0.14 and spark 0.13. I got a
java.lang.NullPointerException when inserting into Hive. Any suggestions,
please.
hiveContext.sql("INSERT OVERWRITE table 4dim partition (zone=" + ZONE +
",z=" + zz + ",year=" + YEAR + ",month=" + MONTH + ") " +
"select date, hh, x, y, hei
Hi,
I set spark.shuffle.io.preferDirectBufs to false in SparkConf and this
setting can be seen in the web UI's Environment tab. But it still eats memory,
i.e. -Xmx is set to 512M but RES grows to 1.5G in half a day.
On Wed, Jun 3, 2015 at 12:02 PM, Shixiong Zhu wrote:
> Could you set "spark.shuffle.
Hi
Here's a working example: https://gist.github.com/akhld/b10dc491aad1a2007183
Thanks
Best Regards
On Wed, Jun 3, 2015 at 10:09 PM, dgoldenberg
wrote:
> Hi,
>
> I've got a Spark Streaming driver job implemented and in it, I register a
> streaming listener, like so:
>