I am trying to understand the Spark architecture for my upcoming
certification; however, there seems to be conflicting information available.
https://stackoverflow.com/questions/47782099/what-is-the-relationship-between-tasks-and-partitions
Does Spark assign a Spark partition to only a single task at a time?
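For what it's worth, within a stage each partition is processed by exactly one task (leaving aside retries and speculative execution), so the task count you see for a stage equals the partition count of the RDD it computes. A minimal sketch to check the numbers yourself; the input path is a placeholder:

  // Run in spark-shell, where sc is already defined; the path is a placeholder.
  val rdd = sc.textFile("hdfs:///some/input", minPartitions = 8)
  println(rdd.getNumPartitions)   // the Spark UI shows this many tasks for the count() stage
  println(rdd.count())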
Ok, thanks.

On Thu, Jun 9, 2016, 12:51 PM Jasleen Kaur wrote:
> The github repo is https://github.com/datastax/spark-cassandra-connector
> The talk video and slides should be uploaded soon on the Spark Summit website.

Thanks, I'll look into it. Any luck getting a link related to it?

On Thu, Jun 9, 2016, 12:43 PM Jasleen Kaur wrote:
> Try using the datastax package. There was a great talk about it at Spark
> Summit. It will take care of the boilerplate code and you can focus on
> real business value.

On Wednesday, June 8, 2016, Chanh Le wrote:
Hi everyone,
I tested partitioning a DataFrame by columns, but the result is not what I expected; I think it is wrong.
I am using Spark 1.6.1 to load data from Cassandra.
When I repartition by 2 fields (date, network_id), I get 200 partitions.
When I repartition by 1 field (date), I also get 200 partitions.
But my data spans 90 days, so if we repartition by date alone I would expect about 90 partitions.
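If it helps, the 200 in both cases is almost certainly spark.sql.shuffle.partitions (default 200): in Spark 1.6, DataFrame.repartition(cols*) hashes rows into that many partitions regardless of how many distinct column values exist. A sketch, with the keyspace and table names as placeholders:

  // Run in spark-shell with the spark-cassandra-connector package on the classpath;
  // sqlContext is already defined. Keyspace and table names are placeholders.
  val df = sqlContext.read
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))
    .load()

  // repartition(col) alone falls back to spark.sql.shuffle.partitions (200 by default).
  println(df.repartition(df("date")).rdd.getNumPartitions)   // 200

  // Pass the count explicitly to get roughly one partition per day of data.
  val byDate = df.repartition(90, df("date"))
  println(byDate.rdd.getNumPartitions)                       // 90

  // Or change the session default:
  // sqlContext.setConf("spark.sql.shuffle.partitions", "90")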
I'd like to also set a partition size for input data that is stored in Tachyon
(the default there is 512MB), but the method above doesn't work for Tachyon data.

Do you have any suggestions? Thanks very much!

Best Regards,
Jia

-- Forwarded message --
From: Jia Zou
Date: Thu, Jan 21, 2016 at 10:05 PM
Subject: Spark partition size tuning
To: "user @spark"

Dear all!

When using Spark to read from the local file system, the default partition size
is 32MB. How can I increase the partition size to 128MB, to reduce the number
of tasks?

Thank you very much!

Best Regards,
Jia
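For Hadoop-style input (local files or HDFS), one knob that usually works is the minimum split size, which puts a floor under the input partition size; for sources where that has no effect (as reported here for Tachyon), coalescing after the read is a blunt but reliable alternative. A sketch, with paths and counts purely illustrative:

  // Run in spark-shell, where sc is already defined; paths and sizes are illustrative.
  // Raise the minimum input split size to 128MB so input partitions get bigger
  // and the task count drops.
  sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.minsize",
    (128 * 1024 * 1024).toString)
  val bigSplits = sc.textFile("file:///data/input")
  println(bigSplits.getNumPartitions)

  // Source-agnostic fallback: read with the defaults, then merge partitions
  // without a shuffle (40 is an arbitrary example target).
  val merged = sc.textFile("tachyon://master:19998/data/input").coalesce(40)
  println(merged.getNumPartitions)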
om"
> wrote:
>
>> Hi All,
>>
>> can I pass number of partitions to all the RDD explicitly while submitting
>> the spark Job or di=o I need to mention in my spark code itself ?
>>
>> Thanks
>> Sri
>>
>>
>>
>> --
>> View this
t:
> http://apache-spark-user-list.1001560.n3.nabble.com/Pass-spark-partition-explicitly-tp25113.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail:
Hi All,
Can I pass the number of partitions to all the RDDs explicitly while submitting
the Spark job, or do I need to specify it in my Spark code itself?
Thanks
Sri
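Both options exist, but they do different things. Passing --conf spark.default.parallelism=200 to spark-submit sets the default partition count for shuffle operations and for parallelize(); partition counts for specific input RDDs still have to be requested in code. A sketch with illustrative values:

  // Run in spark-shell, where sc is already defined; values are illustrative.
  // spark.default.parallelism (settable via --conf at submit time) drives this default:
  val nums = sc.parallelize(1 to 1000000)
  println(nums.getNumPartitions)

  // Input RDDs take their partition count from the input splits, so an explicit
  // count still has to be asked for in code:
  val lines = sc.textFile("hdfs:///data/in", minPartitions = 200)
  println(lines.getNumPartitions)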
stays where JdbcRDD lives?

Nehal

From: Cody Koeninger
Date: Thursday, August 20, 2015 at 6:33 PM
To: Microsoft Office User
Cc: "user@spark.apache.org"
Subject: Re: Kafka Spark Partition Mapping

> In general you cannot guarantee which node a given partition will be
> processed on.

> I want messages from the same Kafka partition to always land on the same
> machine in the Spark RDD, so I can cache some decoration data locally and
> later reuse it with other messages (that belong to the same key). Can anyone
> tell me how I can achieve this? Thanks
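Since Spark does not guarantee which node a given partition is processed on, a common workaround is a per-executor singleton cache: the decoration data then only has to be loaded once per executor JVM instead of being pinned to one machine. A sketch in which DecorationCache, loadFromStore, and keyOf are made-up names:

  import java.util.concurrent.ConcurrentHashMap

  // One instance per executor JVM; it survives across tasks and batches.
  object DecorationCache {
    private val cache = new ConcurrentHashMap[String, String]()

    def getOrLoad(key: String): String = {
      val hit = cache.get(key)
      if (hit != null) hit
      else {
        val loaded = loadFromStore(key)   // stand-in for a lookup against an external store
        cache.putIfAbsent(key, loaded)
        loaded
      }
    }

    private def loadFromStore(key: String): String = s"decoration-for-$key"  // placeholder
  }

  // Usage inside the streaming job (sketch):
  //   stream.mapPartitions { records =>
  //     records.map(r => (r, DecorationCache.getOrLoad(keyOf(r))))
  //   }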
The Stanford NLP class is not serializable, so I cannot broadcast it. Any
thoughts or suggestions?
The reason we need to scale to 200 partitions is so the job runs more quickly
and takes less time to process this data. Any thoughts or suggestions would be
really helpful.
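One common way around a non-serializable pipeline object is to build it on the executors inside mapPartitions instead of broadcasting it from the driver: one instance per partition, and nothing non-serializable ever leaves the driver. A sketch in which HeavyPipeline stands in for the real Stanford NLP class and the paths are placeholders:

  // Run in spark-shell, where sc is already defined.
  class HeavyPipeline {                              // stand-in for the non-serializable pipeline
    def annotate(doc: String): String = doc.toUpperCase
  }

  val docs = sc.textFile("hdfs:///path/to/docs")     // placeholder path

  val annotated = docs.repartition(200).mapPartitions { iter =>
    val pipeline = new HeavyPipeline()               // built once per partition, on the executor
    iter.map(pipeline.annotate)
  }
  annotated.saveAsTextFile("hdfs:///path/to/out")    // placeholder path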
> I can make my auxiliary data an RDD, partition it, and cache it. Later, I can
> cogroup it with other RDDs, and Spark will try to keep the cached RDD
> partitions where they are and not shuffle them.
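In code, that works when both RDDs share the same partitioner: cogroup can then use narrow dependencies and does not reshuffle the cached side. A small sketch with illustrative data:

  import org.apache.spark.HashPartitioner

  // Run in spark-shell, where sc is already defined; the data is illustrative.
  val partitioner = new HashPartitioner(8)

  // Auxiliary data: partition it once and cache it.
  val aux = sc.parallelize(Seq(("k1", "meta1"), ("k2", "meta2")))
    .partitionBy(partitioner)
    .cache()

  // Other data partitioned the same way can be cogrouped without shuffling
  // the cached auxiliary partitions again.
  val incoming = sc.parallelize(Seq(("k1", 42), ("k2", 7)))
    .partitionBy(partitioner)

  val joined = aux.cogroup(incoming)
  println(joined.getNumPartitions)   // 8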
Also, setting spark.locality.wait=100 did not work for me.
+1 :)
I posted a question on Stack Overflow and haven't gotten any answer yet:
http://stackoverflow.com/questions/28079037/how-to-make-spark-partition-sticky-i-e-stay-with-node
Is there a way to make a partition stay with a node in Spark Streaming? I need
this since I have to load a large amount of auxiliary data on each node.
Hi,
You may refer to
http://spark.apache.org/docs/latest/tuning.html#level-of-parallelism
and
http://spark.apache.org/docs/latest/programming-guide.html#parallelized-collections
both of which are about RDD partitions. Since you are going to load data from
HDFS, you may also need to know how the HDFS block size determines the default
number of input partitions.
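As a rough worked example: with the common 128 MB HDFS block size, a 10 GB file is split into about 10 * 1024 / 128 = 80 blocks, so sc.textFile would give roughly 80 input partitions by default (about 160 on older clusters that use 64 MB blocks).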
Hi All,
From the documentation, RDDs are already partitioned and distributed. However,
there is a way to repartition a given RDD using the repartition function. Can
someone please point out the best practices for using this? I have a 10 GB TSV
file stored in HDFS, and I have a 4-node cluster with 1 master.
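A sketch of the usual guidance, with the path and partition counts purely illustrative: let the input splits set the initial count, use repartition (a full shuffle) when you need more parallelism or better balance, and prefer coalesce when merging down:

  // Run in spark-shell, where sc is already defined; path and counts are illustrative.
  val tsv = sc.textFile("hdfs:///data/file.tsv")   // roughly 80 partitions for 10 GB with 128 MB blocks
  println(tsv.getNumPartitions)

  // Full shuffle: useful when partitions are too few or skewed,
  // e.g. aiming for 2-4 partitions per core in the cluster.
  val wider = tsv.repartition(96)

  // Merge down without a full shuffle, e.g. before writing output.
  val narrower = wider.coalesce(16)
  println(narrower.getNumPartitions)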