Is one Spark partition mapped to one and only one Spark task?

2024-03-24 Thread Sreyan Chakravarty
I am trying to understand the Spark architecture for my upcoming certification, but there seems to be conflicting information available. https://stackoverflow.com/questions/47782099/what-is-the-relationship-between-tasks-and-partitions Does Spark assign a Spark partition to only a single
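Within a single stage the mapping is indeed one task per partition; a partition is only processed by more than one task attempt on retries or with speculative execution. A minimal sketch, assuming a live SparkContext sc:

    val rdd = sc.parallelize(1 to 1000, numSlices = 8)
    println(rdd.getNumPartitions)  // 8
    rdd.map(_ * 2).count()         // this stage runs exactly 8 tasks, one per partition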

Spark partition formula on standalone mode?

2016-06-27 Thread kali.tumm...@gmail.com
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-partition-formula-on-standalone-mode-tp27237.html

Re: Spark Partition by Columns doesn't work properly

2016-06-09 Thread Chanh Le
Ok, thanks. On Thu, Jun 9, 2016, 12:51 PM Jasleen Kaur wrote: > The github repo is https://github.com/datastax/spark-cassandra-connector > > The talk video and slides should be uploaded soon on spark summit website > > > On Wednesday, June 8, 2016, Chanh Le wrote: > >> Thanks, I'll look into it

Re: Spark Partition by Columns doesn't work properly

2016-06-08 Thread Jasleen Kaur
The github repo is https://github.com/datastax/spark-cassandra-connector The talk video and slides should be uploaded soon on spark summit website On Wednesday, June 8, 2016, Chanh Le wrote: > Thanks, I'll look into it. Any luck to get link related to. > > On Thu, Jun 9, 2016, 12:43 PM Jasleen

Re: Spark Partition by Columns doesn't work properly

2016-06-08 Thread Chanh Le
Thanks, I'll look into it. Any luck getting the related link? On Thu, Jun 9, 2016, 12:43 PM Jasleen Kaur wrote: > Try using the datastax package. There was a great talk at Spark Summit about it. It will take care of the boilerplate code and you can focus on real business value > On Wednesda

Re: Spark Partition by Columns doesn't work properly

2016-06-08 Thread Jasleen Kaur
Try using the datastax package. There was a great talk at Spark Summit about it. It will take care of the boilerplate code so you can focus on real business value. On Wednesday, June 8, 2016, Chanh Le wrote: > Hi everyone, > I tested the partition by columns of data frame but it’s not good I me
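A minimal sketch of the connector usage suggested here, assuming the spark-cassandra-connector is on the classpath and spark.cassandra.connection.host is set on the SparkConf (keyspace and table names are illustrative):

    import com.datastax.spark.connector._

    val rows = sc.cassandraTable("my_keyspace", "my_table")  // server-side partitioned read
    println(rows.count())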

Spark Partition by Columns doesn't work properly

2016-06-08 Thread Chanh Le
Hi everyone, I tested partitioning a data frame by columns, but the result looks wrong. I am using Spark 1.6.1 and load data from Cassandra. I repartition by 2 fields (date, network_id) -> 200 partitions; I repartition by 1 field (date) -> 200 partitions; but my data covers 90 days -> I mean if we repa
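The likely explanation: DataFrame.repartition(cols) hashes rows into spark.sql.shuffle.partitions buckets (200 by default), so the partition count does not track the number of distinct dates. A sketch of both remedies, with an illustrative column name:

    // default: hashes date values into spark.sql.shuffle.partitions = 200 buckets
    val byDate = df.repartition(df("date"))
    // pass the target count explicitly, e.g. one partition per day of data
    val ninety = df.repartition(90, df("date"))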

[Problem Solved]Re: Spark partition size tuning

2016-01-27 Thread Jia Zou
>>> …Tachyon (default is 512MB), but above method can't work for Tachyon data. >>> Do you have any suggestions? Thanks very much! >>> Best Regards, >>> Jia >>> -- Forwarded message --

Re: Spark partition size tuning

2016-01-27 Thread Jia Zou
>> …suggestions? Thanks very much! >> Best Regards, >> Jia >> -- Forwarded message -- >> From: Jia Zou >> Date: Thu, Jan 21, 2016 at 10:05 PM >> Subject: Spark partition size tuning >> To: "user @spark"

Re: Spark partition size tuning

2016-01-26 Thread Gene Pang
> …a partition size for input data that is stored in Tachyon (default is 512MB), but above method can't work for Tachyon data. > Do you have any suggestions? Thanks very much! > Best Regards, > Jia > -- Forwarded message -- > From: Jia Zou

Re: Spark partition size tuning

2016-01-26 Thread Pavel Plotnikov
> …very much! > Best Regards, > Jia > -- Forwarded message -- > From: Jia Zou > Date: Thu, Jan 21, 2016 at 10:05 PM > Subject: Spark partition size tuning > To: "user @spark" > Dear all! > When using Spark to read from local

Fwd: Spark partition size tuning

2016-01-25 Thread Jia Zou
…method can't work for Tachyon data. Do you have any suggestions? Thanks very much! Best Regards, Jia -- Forwarded message -- From: Jia Zou Date: Thu, Jan 21, 2016 at 10:05 PM Subject: Spark partition size tuning To: "user @spark" Dear all! When using Spark to re

Spark partition size tuning

2016-01-21 Thread Jia Zou
Dear all! When using Spark to read from the local file system, the default partition size is 32MB. How can I increase the partition size to 128MB, to reduce the number of tasks? Thank you very much! Best Regards, Jia
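For files read through Hadoop input formats, the split size (and hence the partition size) can be raised via the Hadoop configuration; a sketch with illustrative values, showing both the new and the legacy key since sc.textFile uses the old API:

    // ask for input splits of at least 128MB so fewer, larger partitions are created
    sc.hadoopConfiguration.setLong("mapreduce.input.fileinputformat.split.minsize", 128L * 1024 * 1024)
    sc.hadoopConfiguration.setLong("mapred.min.split.size", 128L * 1024 * 1024)
    val lines = sc.textFile("file:///data/input.tsv")
    // or merge partitions after loading, without a shuffle:
    val merged = lines.coalesce(lines.getNumPartitions / 4)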

Re: Pass spark partition explicitly ?

2015-10-18 Thread sri hari kali charan Tummala
om" > wrote: > >> Hi All, >> >> can I pass number of partitions to all the RDD explicitly while submitting >> the spark Job or di=o I need to mention in my spark code itself ? >> >> Thanks >> Sri >> >> >> >> -- >> View this

Re: Pass spark partition explicitly ?

2015-10-18 Thread Richard Eggert
> http://apache-spark-user-list.1001560.n3.nabble.com/Pass-spark-partition-explicitly-tp25113.html

Pass spark partition explicitly ?

2015-10-18 Thread kali.tumm...@gmail.com
Hi All, can I pass the number of partitions to all the RDDs explicitly while submitting the Spark job, or do I need to specify it in my Spark code itself? Thanks Sri -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Pass-spark-partition-explicitly-tp25113.html
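Both options exist; a sketch of each, with illustrative numbers. A submit-time default applies to shuffles and parallelize, while per-RDD counts are set in code:

    // at submit time (shell invocation shown as a comment for reference):
    //   spark-submit --conf spark.default.parallelism=64 ...
    val rdd = sc.textFile("hdfs:///data/input.tsv", 64)  // minimum-partition hint for this RDD
    val reshaped = rdd.repartition(64)                   // explicit count, incurs a shuffle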

Re: Kafka Spark Partition Mapping

2015-08-24 Thread Syed, Nehal (Contractor)
From: Cody Koeninger <c...@koeninger.org> Date: Thursday, August 20, 2015 at 6:33 PM To: Microsoft Office User <nehal_s...@cable.comcast.com> Cc: "user@spark.apache.org" Subject: Re: Kafka Spark Partition Mapping

Re: Kafka Spark Partition Mapping

2015-08-24 Thread Cody Koeninger
stays where JdbcRDD lives? > > Nehal > > From: Cody Koeninger > Date: Thursday, August 20, 2015 at 6:33 PM > To: Microsoft Office User > Cc: "user@spark.apache.org" > Subject: Re: Kafka Spark Partition Mapping > > In general you cannot guarantee which no

Re: Kafka Spark Partition Mapping

2015-08-20 Thread Cody Koeninger
> …Kafka partition always land on same machine on Spark RDD so I can cache some decoration data locally and later reuse with other messages (that belong to same key). Can anyone tell me how I can achieve it? Thanks
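Since node placement cannot be guaranteed, a common fallback is co-location by key: a fixed partitioner sends the same keys to the same Spark partition every batch, and an executor-local cache absorbs the lookups. A sketch, where keyedStream and lookupDecoration are illustrative stand-ins:

    import org.apache.spark.HashPartitioner

    object DecorationCache {  // JVM-wide on each executor, survives across batches
      val map = new java.util.concurrent.ConcurrentHashMap[String, String]()
    }

    val partitioner = new HashPartitioner(32)  // keep fixed across batches
    keyedStream.foreachRDD { rdd =>            // rdd: RDD[(String, String)]
      rdd.partitionBy(partitioner).mapPartitions { iter =>
        iter.map { case (k, v) =>
          var d = DecorationCache.map.get(k)   // a cache miss repopulates locally
          if (d == null) { d = lookupDecoration(k); DecorationCache.map.put(k, d) }
          (k, v + d)
        }
      }.count()
    }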

Kafka Spark Partition Mapping

2015-08-20 Thread nehalsyed
http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-Spark-Partition-Mapping-tp24372.html

Re: Spark partition issue with Stanford NLP

2015-05-27 Thread mathewvinoj
…NLP class is not serializable, so I cannot broadcast it. Any thoughts or suggestions? The reason we need to scale to 200 partitions is that it will run quickly, with less time to process this data. Any thoughts or suggestions are really helpful.
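The usual workaround for a non-serializable pipeline is to build it on the executors rather than broadcast it, e.g. once per partition inside mapPartitions; a sketch where docs and annotateText are illustrative:

    import java.util.Properties
    import edu.stanford.nlp.pipeline.StanfordCoreNLP

    val processed = docs.repartition(200).mapPartitions { iter =>
      val props = new Properties()
      props.setProperty("annotators", "tokenize, ssplit, pos")
      val pipeline = new StanfordCoreNLP(props)  // constructed on the executor, never serialized
      iter.map(text => annotateText(pipeline, text))
    }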

Re: Spark partition issue with Stanford NLP

2015-05-27 Thread vishalvibhandik
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-partition-issue-with-Stanford-NLP-tp23048p23055.html

Re: How to make spark partition sticky, i.e. stay with node?

2015-01-23 Thread Tathagata Das
> I can make my auxiliary data an RDD, partition it, and cache it. > Later, I can cogroup it with other RDDs and Spark will try to keep the cached RDD partitions where they are and not shuffle them.
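A sketch of that recipe, with illustrative names: give both sides the same partitioner, cache the auxiliary RDD, and cogroup; the cached side is then a narrow dependency and the scheduler prefers the nodes already holding its blocks:

    import org.apache.spark.HashPartitioner

    val partitioner = new HashPartitioner(16)
    val aux = auxData.partitionBy(partitioner).cache()  // auxData: a pair RDD of (key, value)
    aux.count()                                         // materialize the cached partitions

    val joined = otherRdd.partitionBy(partitioner).cogroup(aux)  // no shuffle of aux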

Re: How to make spark partition sticky, i.e. stay with node?

2015-01-23 Thread mingyu
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-make-spark-partition-sticky-i-e-stay-with-node-tp21322p21338.html

Re: How to make spark partition sticky, i.e. stay with node?

2015-01-22 Thread mingyu
Also, setting spark.locality.wait=100 did not work for me.
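One thing worth double-checking in that experiment (an assumption about the intent): spark.locality.wait is how long the scheduler waits for a data-local slot before downgrading locality, and in Spark 1.x a bare value of 100 means 100 milliseconds, so tasks give up locality almost immediately. Making partitions stickier means raising it, e.g.:

    val conf = new org.apache.spark.SparkConf()
      .set("spark.locality.wait", "10000")       // wait up to 10s for a data-local slot
      .set("spark.locality.wait.node", "10000")  // node-level wait, same units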

Re: How to make spark partition sticky, i.e. stay with node?

2015-01-22 Thread davidkl
+1 :)

How to make spark partition sticky, i.e. stay with node?

2015-01-22 Thread mingyu
I posted a question on Stack Overflow and haven't gotten any answer yet: http://stackoverflow.com/questions/28079037/how-to-make-spark-partition-sticky-i-e-stay-with-node Is there a way to make a partition stay with a node in Spark Streaming? I need this since I have to load large a

Re: Spark partition

2014-07-30 Thread Haiyang Fu
Hi, you may refer to http://spark.apache.org/docs/latest/tuning.html#level-of-parallelism and http://spark.apache.org/docs/latest/programming-guide.html#parallelized-collections, both of which are about RDD partitions. As you are going to load data from HDFS, you may also need to know h

Spark partition

2014-07-30 Thread Sameer Tilak
Hi All, From the documentation, RDDs are already partitioned and distributed. However, there is a way to repartition a given RDD using the following function. Can someone please point out the best practices for using this? I have a 10 GB TSV file stored in HDFS and I have a 4 node cluster with 1 ma
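A short sketch of the pieces involved (path and counts illustrative): an HDFS file loads with roughly one partition per block, and repartition or coalesce adjust the count afterwards:

    val lines = sc.textFile("hdfs:///data/file.tsv")  // ~1 partition per HDFS block
    println(lines.getNumPartitions)
    val wider = lines.repartition(32)  // full shuffle, raises parallelism
    val fewer = lines.coalesce(8)      // narrow dependency, merges without a shuffle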