Hi,
I've implemented this "node-centered accumulator" and written a short piece about it:
http://blog.guillaume-pitel.fr/2015/06/spark-trick-shot-node-centered-aggregator/
Hope it can help someone
Guillaume
2015-06-18 15:17 GMT+02:00 Guillaume Pitel <mailto:gui
you lose an executor in between, then that doesn't work anymore; you could probably detect it and recompute the sketches, but it would become overcomplicated.
2015-06-18 14:27 GMT+02:00 Guillaume Pitel <mailto:guillaume.pi...@exensa.com>>:
Hi,
Thank you for this confirmation.
GMT+02:00 Guillaume Pitel <mailto:guillaume.pi...@exensa.com>>:
Hi,
I'm trying to figure out the smartest way to implement a global
count-min-sketch on accumulators. For now, we are doing that with RDDs. It
works well, but with one sketch per partition, merging tak
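For readers unfamiliar with the merge step: two count-min sketches with the same width, depth, and hash functions merge by element-wise addition of their counter tables, which is what makes a one-sketch-per-partition scheme work as a reduce. A minimal plain-Python illustration (not Spark code; the salted-hash scheme here is a simplification, not a production CMS):

```python
import hashlib

class CountMinSketch:
    """Toy count-min sketch: a depth x width table of counters."""
    def __init__(self, depth=4, width=64):
        self.depth, self.width = depth, width
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One bucket per row, derived from a salted hash (simplified scheme).
        for row in range(self.depth):
            h = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
            yield row, int(h, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so estimate >= true count.
        return min(self.table[row][col] for row, col in self._buckets(item))

    def merge(self, other):
        # Element-wise sum: valid only for identical depth/width/hashes.
        assert (self.depth, self.width) == (other.depth, other.width)
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        return self

# One sketch per "partition", merged as a reduce step:
a, b = CountMinSketch(), CountMinSketch()
for w in ["spark", "spark", "rdd"]:
    a.add(w)
for w in ["spark", "sketch"]:
    b.add(w)
merged = a.merge(b)
```

The commutative, associative merge is what lets the sketches be combined in any order across partitions.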
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
--
eXenSa
*Guillaume PITEL, Président*
+33(0)626 222 431
eXenSa S.A.S. <http://www.exensa.com/>
41, rue Périer - 92120 Montrouge - FRANCE
accumulator is initialized locally, updated, then sent back to the driver for merging? So I guess accumulators may not be the way to go after all.
Any advice ?
Guillaume
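For what it's worth, a custom accumulator only needs a zero value and an in-place merge, which is the contract PySpark exposes through AccumulatorParam (zero() / addInPlace()). A plain-Python sketch of that contract for a vector of counters (illustrative only; no SparkContext involved):

```python
class VectorAccumulatorParam:
    """Mimics the PySpark AccumulatorParam contract: zero() and addInPlace().
    Illustrative sketch only -- not wired to a real SparkContext."""
    def zero(self, value):
        # Neutral element with the same shape as the initial value.
        return [0] * len(value)

    def addInPlace(self, v1, v2):
        # Element-wise merge, as the driver would do with per-task updates.
        for i, x in enumerate(v2):
            v1[i] += x
        return v1

param = VectorAccumulatorParam()
acc = param.zero([0, 0, 0])              # driver-side initial value
acc = param.addInPlace(acc, [1, 0, 2])   # update coming back from "task" 1
acc = param.addInPlace(acc, [0, 5, 1])   # update coming back from "task" 2
```

The same shape would apply to a sketch-valued accumulator: zero() returns an empty sketch, addInPlace() merges two of them.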
distribution due to collisions, but I don't think it should hurt
too much.
Guillaume
Hi everyone,
However, I am not happy with this solution because each element is most likely to be paired with elements that are "close by" in the partition. This is because sample returns an
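One way to avoid pairing only with neighbours from the same partition is to apply a random permutation to the whole collection before zipping halves together. A plain-Python sketch of the idea (pairing a collection with a uniformly random partner; the function name and seed are illustrative assumptions):

```python
import random

def random_pairs(items, seed=None):
    """Pair each element with a uniformly random partner,
    instead of a neighbour from the same 'partition'."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)            # global random permutation of the data
    half = len(shuffled) // 2
    return list(zip(shuffled[:half], shuffled[half:]))

pairs = random_pairs(range(10), seed=42)
```

In Spark terms this corresponds to keying each element by a fresh random number and repartitioning/sorting on that key before zipping.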
comments:
// SPARK_LOCAL_DIRS environment variable, and deleted by the Worker when the
// application finishes.
On 13.04.2015, at 11:26, Guillaume Pitel <mailto:guillaume.pi...@exensa.com>> wrote:
Does it also clean up the Spark local dirs? I thought it was only cleaning $SPARK_HOME/work/
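If one does end up cleaning leftovers by hand, a conservative approach is to remove only subdirectories older than some threshold, with a dry run first. A hedged sketch (the "spark-"/"app-" name prefixes and the age cutoff are assumptions about the deployment; verify them against your own directories before deleting anything):

```python
import os, shutil, tempfile, time

def clean_old_dirs(base, prefixes=("spark-", "app-"), max_age_days=7, dry_run=True):
    """List (and optionally remove) subdirectories of `base` whose name starts
    with one of `prefixes` and whose mtime is older than the cutoff.
    Dry run by default: look before you delete."""
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        if (os.path.isdir(path)
                and name.startswith(prefixes)
                and os.path.getmtime(path) < cutoff):
            stale.append(path)
            if not dry_run:
                shutil.rmtree(path)
    return stale

# Demo on a throwaway directory (dry run only):
base = tempfile.mkdtemp()
old = os.path.join(base, "app-20150101000000-0001")
os.mkdir(old)
os.utime(old, (0, 0))                 # pretend this one is ancient
os.mkdir(os.path.join(base, "app-fresh"))
stale = clean_old_dirs(base)
```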
simply received a SIGTERM signal, so perhaps the daemon was terminated by someone or a parent process. Just my guess.
Tim
On Mon, Apr 13, 2015 at 2:28 AM, Guillaume Pitel
mailto:guillaume.pi...@exensa.com>> wrote:
Very likely to be this:
http://www.linuxdevcenter.com/pub/a/linux/2
XXX08:7077
15/04/13 08:35:07 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM
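That log line just means the worker's signal handler fired because something outside the process asked it to stop (an admin, a supervisor process, or a parent reacting to the OOM killer). The mechanism is easy to reproduce in plain Python (illustrative, not Spark code):

```python
import os, signal

received = []

def handler(signum, frame):
    # Equivalent of the worker's "RECEIVED SIGNAL 15: SIGTERM" log line.
    received.append(signum)

signal.signal(signal.SIGTERM, handler)
os.kill(os.getpid(), signal.SIGTERM)   # simulate an external `kill <pid>`
```

So the interesting question is never the signal itself, but who sent it; on Linux, `dmesg` is the place to check for OOM-killer activity.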
disk space in this folder once the shuffle operation is done? If not, I need to write a job to clean it up myself. But how do I know which subfolders can be removed?
Ningjun
&& k != "to" && k != "and"
}.cache()
val res = big.leftOuterJoin(small)
res.saveAsTextFile(args(3))
}
list, that's why I am sending you this email here. I would be really grateful if you could reply to it.
Thanks,
On Wed, Apr 8, 2015 at 1:23 PM, Guillaume Pitel
mailto:guillaume.pi...@exensa.com>> wrote:
This kind of operation is not scalable, no matter what you do, at least if you _really_ w
downloading it and running:
mvn -Pnetlib-lgpl -DskipTests clean package
parallelism, make sure that your combineByKey has enough different keys, and see what happens.
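The reason the key count matters: combineByKey's parallelism is bounded by the number of distinct keys, because all combiners for one key end up merged together. Its three functions can be modelled in plain Python (not Spark code; same createCombiner / mergeValue / mergeCombiners roles):

```python
def combine_by_key(pairs, create_combiner, merge_value, merge_combiners, n_partitions=2):
    """Plain-Python model of combineByKey: combine within each 'partition',
    then merge combiners per key across partitions (the shuffle)."""
    partitions = [pairs[i::n_partitions] for i in range(n_partitions)]
    per_partition = []
    for part in partitions:
        combs = {}
        for k, v in part:
            if k in combs:
                combs[k] = merge_value(combs[k], v)
            else:
                combs[k] = create_combiner(v)
        per_partition.append(combs)
    result = {}
    for combs in per_partition:
        for k, c in combs.items():
            result[k] = merge_combiners(result[k], c) if k in result else c
    return result

# Per-key (sum, count), the classic building block for a mean:
data = [("a", 1), ("b", 10), ("a", 3), ("b", 20), ("a", 5)]
sums = combine_by_key(
    data,
    create_combiner=lambda v: (v, 1),
    merge_value=lambda c, v: (c[0] + v, c[1] + 1),
    merge_combiners=lambda c1, c2: (c1[0] + c2[0], c1[1] + c2[1]),
)
```

With only two distinct keys, the final merge stage has at most two units of work, however many partitions feed it.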
Guillaume
Thank you, Guillaume; my dataset is not that large, it's only ~2GB in total.
2014-10-20 16:58 GMT+08:00 Guillaume Pitel <mailto:guillaume.pi...@exensa.com>>:
Hi,
It happened t
Arian
Hi
Could it be due to GC? I've read it may happen if your program starts with a small heap. What are your -Xms and -Xmx values?
Print GC stats with -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
Guillaume
Hello spark users and developers!
I am using hdfs + spark sql + hive schema + p
suspect a configuration error on our side, but we are unable to pin it down. Does anyone have an idea of the origin of the problem?
For now we're sticking with the HttpBroadcast workaround.
Guillaume
(garbled binary data elided)
Clearly a stream corruption problem.
We've been running fine (AFAIK) on 1.0.0 for two weeks, switched to 1.0.1 this Monday, and since then this kind of problem occurs randomly.
Guillaume Pitel
Not sure if this helps, but it does seem to
Thanks.
Wikipedia in your spare time :)
Guillaume
Maybe with "MEMORY_ONLY", Spark has to recompute the RDD several times because it doesn't fit in memory, which makes things run slower.
As a general safe rule, use MEMORY_AND_DISK_SER.
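The recompute cost is easy to picture: without persisting, every action that touches the RDD re-runs its lineage. A plain-Python caricature (counting how often the "expensive" step runs with and without a cache; the squaring step is just a stand-in for a real transformation):

```python
calls = {"n": 0}

def expensive_map(xs):
    # Stands in for an RDD transformation that gets recomputed
    # each time the result is used without persist().
    calls["n"] += 1
    return [x * x for x in xs]

data = list(range(5))

# No caching: two "actions" -> the work runs twice.
total = sum(expensive_map(data))
count = len(expensive_map(data))

# "Persisted": compute once, reuse the materialized result.
cached = expensive_map(data)
total2, count2 = sum(cached), len(cached)
```

MEMORY_AND_DISK_SER keeps that materialized result around (spilling to disk when needed) instead of silently falling back to recomputation.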
Guillaume Pitel - Président d'eXenSa
Prashant Sharma wrote:
>I think Mahout use
in SPARK_JAVA_OPTS during SparkContext creation? It should probably be set in spark-env.sh because it can differ on each node.
Guillaume
On 13 Mar, 2014, at 5:33 pm, Guillaume Pitel <guillaume.pi...@exensa.com>
the application.
Thanks!
s other than the physical memory? Do they stop working when the array element count exceeds a certain number?