Hi there,
I would recommend checking out
https://github.com/spark-jobserver/spark-jobserver, which I think provides the
functionality you are looking for.
I haven't tested it myself, though.
BR
On 5 June 2015 at 01:35, Olivier Girardot wrote:
> You can use it as a broadcast variable, but if it's "too" lar
Hi there,
I have been using the approach described here:
http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job
In addition to that, I was wondering if there is a way to customize
the order of the values contained in each file.
Thanks a lot!
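For the archives, a rough, untested sketch of that approach with the values sorted before they are written out, based on the output-format trick from the StackOverflow answer above. The input parsing, paths and class names are placeholders, and it assumes a single key's values fit in an executor's memory:

import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

// route each record to a file named after its key, as in the StackOverflow answer
class KeyBasedOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  override def generateActualKey(key: Any, value: Any): Any = NullWritable.get()
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String] + "/" + name
}

object SortedPerKeyOutput {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sorted-per-key-output"))

    // placeholder input: lines of "key,value"
    val pairs = sc.textFile("/input").map { line =>
      val Array(k, v) = line.split(",", 2)
      (k, v)
    }

    // group each key's values, sort them in memory, then emit them back in order;
    // since groupByKey puts a key into a single partition, its file is written in that order
    pairs.groupByKey()
      .flatMap { case (k, vs) => vs.toSeq.sorted.map(v => (k, v)) }
      .saveAsHadoopFile("/output", classOf[String], classOf[String], classOf[KeyBasedOutputFormat])

    sc.stop()
  }
}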
t all values for a specific key will
> reside in just one partition, but it might happen that one partition might
> contain more than one key (with values). Of this I’m not sure, but that
> shouldn’t be a big deal, as you would iterate over each (key, Iterable[values]) tuple and store one key to a specific file.
Hi Mike,
I am forwarding you a mail I sent a while ago regarding some related work I
did; I hope you find it useful.
Hi all,
I recently sent a mail to the dev mailing list about this contribution, but I
thought it might be useful to post it here, since I have seen a lot of
people asking about OS-level met
Hi there,
I have been getting a strange error in spark-1.6.1.
The submitted job uses only the executor launched on the master node while
the other workers are idle.
When I check the web UI to investigate the killed
executors, I see the error:
Error: invalid log directory
/disk/spa
Hi there,
I have been using Spark 1.5.2 on my cluster without a problem and wanted to
try Spark 1.6.0.
I have the exact same configuration on both clusters.
I am able to start the standalone cluster, but I fail to submit a job,
getting errors like the following:
16/01/05 14:24:14 INFO AppClient$Cli
un Java 8?
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
afe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Tue, Jan 5, 2016 at 9:01 AM, Yiannis Gkoufas
> wrote:
>
>> Hi Dean,
>>
>> thanks so much for the response! It works without a problem now!
>>
Hi,
I am exploring the dynamic resource allocation provided by
standalone cluster mode, and I was wondering whether the behavior I am
experiencing is expected.
In my configuration I have 3 slaves with 24 cores each.
I have in my spark-defaults.conf:
spark.shuffle.service.enabled true
sp
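A minimal sketch of the settings dynamic allocation needs, expressed through SparkConf; the executor bounds are placeholder values, not the ones from the truncated config above. Note that in standalone mode the workers themselves also need spark.shuffle.service.enabled=true in the configuration they are started with, so the external shuffle service comes up on each node:

import org.apache.spark.{SparkConf, SparkContext}

object DynamicAllocationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("dynamic-allocation-sketch")
      .set("spark.shuffle.service.enabled", "true")      // external shuffle service must be on
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "1")  // placeholder bound
      .set("spark.dynamicAllocation.maxExecutors", "9")  // placeholder bound
    val sc = new SparkContext(conf)

    // a small job just to exercise the executors
    println(sc.parallelize(1 to 1000).map(_ * 2).sum())
    sc.stop()
  }
}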
Hi Matt,
there is some related work I recently did at IBM Research on visualizing
the metrics produced.
You can read about it here:
http://www.spark.tc/sparkoscope-enabling-spark-optimization-through-cross-stack-monitoring-and-visualization-2/
We recently open-sourced it, if you are interested to ha
Hi all,
I recently sent a mail to the dev mailing list about this contribution, but I
thought it might be useful to post it here, since I have seen a lot of
people asking about OS-level metrics for Spark. This is the result of the
work we have been doing recently at IBM Research around Spark.
Essentially,
pre-deployed on all Executor nodes?
>
> On Wed, Feb 3, 2016 at 10:36 AM, Yiannis Gkoufas
> wrote:
>
>> Hi Matt,
>>
>> there is some related work I recently did in IBM Research for visualizing
>> the metrics produced.
>> You can read about it here
>>
Hi there,
I have my data stored in HDFS partitioned by month in Parquet format.
The directory looks like this:
-month=201411
-month=201412
-month=201501
-
I want to compute some aggregates for every timestamp.
How is it possible to achieve that by taking advantage of the existing
partitionin
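A rough sketch of one way this is usually done, assuming a Spark version with Parquet partition discovery (1.3+): point the reader at the table root so month becomes a regular column, then filter on it before aggregating. The column names below are placeholders:

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object PartitionedParquetAggregates {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partitioned-parquet-aggregates"))
    val sqlContext = new SQLContext(sc)

    // reading the table root lets partition discovery expose "month" as a column
    val df = sqlContext.parquetFile("/data")

    // filtering on the partition column prunes the month=... directories that get read;
    // "timestamp" is a placeholder for a column inside the data files
    val res = df.filter(df("month") === 201412)
      .groupBy("timestamp")
      .count()

    res.show()
    sc.stop()
  }
}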
ld be to try to split the RDD into multiple RDDs by key.
> You can get the distinct keys, collect them on the driver, and loop over those
> keys, filtering a new RDD out of the original one for each key.
>
> for( key : keys ) {
> RDD.filter( key ).saveAsTextfile()
> }
>
>
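A runnable Scala version of the loop sketched above, with placeholder input parsing and paths; note it launches one job per key, so it is only practical when the number of distinct keys is small:

import org.apache.spark.{SparkConf, SparkContext}

object SplitByKey {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("split-by-key"))

    // placeholder input: lines of "key,value"
    val pairs = sc.textFile("/input").map { line =>
      val Array(k, v) = line.split(",", 2)
      (k, v)
    }.cache() // cached because it is scanned once per key below

    // collect the distinct keys on the driver, then write one output directory per key
    val keys = pairs.keys.distinct().collect()
    for (key <- keys) {
      pairs.filter { case (k, _) => k == key }
        .values
        .saveAsTextFile(s"/output/key=$key")
    }

    sc.stop()
  }
}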
Hi there,
I am trying to increase the number of executors per worker in standalone mode,
but I have failed to achieve that.
I loosely followed the instructions in this thread:
http://stackoverflow.com/questions/26645293/spark-configuration-memory-instance-cores
and did this:
spark.executor.memory 1g
S
Spark on each worker node. Nothing to do with
> # of executors.
>
>
>
>
>
> Mohammed
>
>
>
> *From:* Yiannis Gkoufas [mailto:johngou...@gmail.com]
> *Sent:* Friday, February 20, 2015 4:55 AM
> *To:* user@spark.apache.org
> *Subject:* Setting the numb
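For context, in the standalone mode of that period an application got at most one executor per worker, so the usual workaround for getting more executors per node was to run several worker instances per node via spark-env.sh. A rough sketch with placeholder values:

# spark-env.sh on every slave node (placeholder values)
export SPARK_WORKER_INSTANCES=3   # three worker daemons per node
export SPARK_WORKER_CORES=8       # cores offered by each worker instance
export SPARK_WORKER_MEMORY=4g     # memory offered by each worker instance

Each application then gets up to one executor on each of those worker instances, sized by spark.executor.memory.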
Hi,
I have experienced the same behavior. You are talking about standalone
cluster mode right?
BR
On 21 February 2015 at 14:37, Deep Pradhan
wrote:
> Hi,
> I have been running some jobs in my local single node stand alone cluster.
> I am varying the worker instances for the same job, and the t
Hi all,
I am trying to do the following.
val myObject = new MyObject();
val myObjectBroadcasted = sc.broadcast(myObject);
val rdd1 = sc.textFile("/file1").map(e =>
{
myObject.insert(e._1);
(e._1,1)
});
rdd1.cache.count(); // to force the transformation to run
val rdd2 = sc.textFile("/file2").map(e
ng to modify myObject directly, which won't work because you
> are modifying the serialized copy on the executor. You want to do
> myObjectBroadcasted.value.insert and myObjectBroadcasted.value.lookup.
>
>
>
> Sent with Good (www.good.com)
>
>
>
> -Original Me
Hi there,
I assume you are using Spark 1.2.1, right?
I faced the exact same issue; I switched to 1.1.1 with the same
configuration and the problem went away.
On 24 Feb 2015 19:22, "Ted Yu" wrote:
> Here is a tool which may give you some clue:
> http://file-leak-detector.kohsuke.org/
>
> Cheers
>
> On Tu
e's a bug report for this regression? For some
> other (possibly connected) reason I upgraded from 1.1.1 to 1.2.1, but I
> can't remember what the bug was.
>
> Joe
>
>
>
>
> On 24 February 2015 at 19:26, Yiannis Gkoufas
> wrote:
>
>> Hi there,
>>
, Yiannis Gkoufas wrote:
> Sorry for the mistake, I actually have it this way:
>
> val myObject = new MyObject();
> val myObjectBroadcasted = sc.broadcast(myObject);
>
> val rdd1 = sc.textFile("/file1").map(e =>
> {
> myObjectBroadcasted.value.insert(e._1);
>
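One caveat worth adding here: a broadcast value is effectively read-only on the executors, since each executor works on its own deserialized copy, so inserts made inside the map will not be visible back on the driver or on other executors. The pattern that reliably works is to build the structure completely on the driver and only read it inside closures; a rough sketch, with the input format and lookup structure as placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastLookupSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-lookup-sketch"))

    // build the lookup structure completely on the driver first...
    val keys: Set[String] = sc.textFile("/file1").map(_.split(",")(0)).collect().toSet

    // ...then broadcast the finished, read-only value
    val keysBroadcast = sc.broadcast(keys)

    // executors only read the broadcast value inside the closure
    val matches = sc.textFile("/file2")
      .map(_.split(",")(0))
      .filter(k => keysBroadcast.value.contains(k))

    println(matches.count())
    sc.stop()
  }
}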
Hi all,
I have downloaded version 1.3.0-rc1 from
https://github.com/apache/spark/archive/v1.3.0-rc1.zip, extracted it and
built it using:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests clean package
It doesn't complain about any issues, but when I call sbin/start-all.sh I see
in the logs:
t really the error?
>
> java.lang.NoClassDefFoundError: L akka/event/LogSou
>
> Looks like an incomplete class name. Is something corrupt, maybe a config
> file?
>
> On Tue, Mar 3, 2015 at 2:13 AM, Yiannis Gkoufas
> wrote:
> > Hi all,
> >
> > I have downlo
Hi there,
I was trying the new DataFrame API with some basic operations on a parquet
dataset.
I have 7 nodes, with 12 cores and 8GB RAM allocated to each worker, in
standalone cluster mode.
The code is the following:
val people = sqlContext.parquetFile("/data.parquet");
val res =
people.groupBy("na
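The snippet is cut off in the archive; for context, a minimal sketch of the kind of aggregation being described, with placeholder column names (not necessarily the ones used in the original job):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
import org.apache.spark.{SparkConf, SparkContext}

object ParquetGroupBySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parquet-groupby-sketch"))
    val sqlContext = new SQLContext(sc)

    val people = sqlContext.parquetFile("/data.parquet")
    // "name" and "amount" are placeholder column names
    val res = people.groupBy("name").agg(sum("amount"))
    res.show()

    sc.stop()
  }
}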
//spark.apache.org/docs/latest/configuration.html
>
> Cheng
>
>
> On 3/18/15 9:15 PM, Yiannis Gkoufas wrote:
>
>> Hi there,
>>
>> I was trying the new DataFrame API with some basic operations on a
>> parquet dataset.
>> I have 7 nodes of 12 cores and 8GB RA
aggregation in the
> second stage. Can you take a look at the numbers of tasks launched in these
> two stages?
>
> Thanks,
>
> Yin
>
> On Wed, Mar 18, 2015 at 11:42 AM, Yiannis Gkoufas
> wrote:
>
>> Hi there, I set the executor memory to 8g but it didn't help
nd increase it until the OOM disappears. Hopefully this
> will help.
>
> Thanks,
>
> Yin
>
>
> On Wed, Mar 18, 2015 at 4:46 PM, Yiannis Gkoufas
> wrote:
>
>> Hi Yin,
>>
>> Thanks for your feedback. I have 1700 parquet files, sized 100MB each.
>> T
Hi all,
I have 6 nodes in the cluster and one of the nodes is clearly
under-performing.
I was wondering what the impact of such an issue is. Also, what is the
recommended way to work around it?
Thanks a lot,
Yiannis
number of tasks and again the same error. :(
Thanks a lot!
On 19 March 2015 at 17:40, Yiannis Gkoufas wrote:
> Hi Yin,
>
> thanks a lot for that! Will give it a shot and let you know.
>
> On 19 March 2015 at 16:30, Yin Huai wrote:
>
>> Was the OOM thrown during the exe
Actually I realized that the correct way is:
sqlContext.sql("set spark.sql.shuffle.partitions=1000")
but I am still experiencing the same behavior/error.
On 20 March 2015 at 16:04, Yiannis Gkoufas wrote:
> Hi Yin,
>
> the way I set the configuration is:
>
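For later readers, a short sketch of the other ways spark.sql.shuffle.partitions can be set besides the SQL "set" statement (the value 1000 is just carried over from the mail above):

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object ShufflePartitionsSketch {
  def main(args: Array[String]): Unit = {
    // option 1: set it on the SparkConf before the context is created
    val conf = new SparkConf()
      .setAppName("shuffle-partitions-sketch")
      .set("spark.sql.shuffle.partitions", "1000")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // option 2: set it on the SQLContext at runtime, before the aggregation runs
    sqlContext.setConf("spark.sql.shuffle.partitions", "1000")

    sc.stop()
  }
}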
k applications to use on the machine,
> e.g. 1000m, 2g (default: total memory minus 1 GB); note that each
> application's individual memory is configured using its
> spark.executor.memory property.
>
>
> On Fri, Mar 20, 2015 at 9:25 AM, Yiannis Gkoufas
> wr
Hi all,
I am trying to start the thriftserver and I get some errors.
I have hive running and placed hive-site.xml under the conf directory.
From the logs I can see that the error is:
Call From localhost to localhost:54310 failed
I am assuming that it tries to connect to the wrong port for the n
t 3:52 PM, Yiannis Gkoufas
> wrote:
>
>> Hi all,
>>
>> I am trying to start the thriftserver and I get some errors.
>> I have hive running and placed hive-site.xml under the conf directory.
>> From the logs I can see that the error is:
>>
>> Call Fro