It would be nice to have a "what's new in 2.0.0" equivalent to
https://spark.apache.org/releases/spark-release-1-6-0.html available, or am
I just missing it?
On Wed, 8 Jun 2016 at 13:15 Sean Owen wrote:
> OK, this is done:
>
> http://spark.apache.org/documentation.html
> http://spark.apache.org/d
Frankly speaking, I think reduceByKey with a Partitioner has the same problem
and should not be exposed to public users either, because it is a little hard
to fully understand how the partitioner behaves without looking at the actual
code.
And if there exists a basic contract of a Partitioner
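For reference, a hedged reading of that contract: numPartitions is fixed, and
getPartition must be a pure, deterministic function of the key that returns a
value in [0, numPartitions), so equal keys always land in the same partition.
A minimal sketch of a partitioner honoring it (the class name is mine; only
org.apache.spark.Partitioner is real API):

import org.apache.spark.Partitioner

// Contract-respecting sketch: the partition id depends only on the key and
// always falls within [0, numPartitions).
class ModPartitioner(val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = {
    val h = key.hashCode % numPartitions
    if (h < 0) h + numPartitions else h
  }
}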
Hi,
Pipeline, as of now, seems to be a series of transformers and estimators
applied in a serial fashion.
Is it possible to create a DAG sort of thing -
Eg -
Two transformers running in parallel to cleanse data (a custom-built
Transformer) in some way, and then their outputs (two outputs) used for
s
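For context, a minimal sketch of the serial arrangement being described (the
column names and training DataFrame are assumptions of mine): two independent
cleansing Transformers can today only be chained one after another, each
writing to its own output column.

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Stages run strictly in sequence: tokenizer -> hashingTF -> lr.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline()
  .setStages(Array[PipelineStage](tokenizer, hashingTF, lr))
// val model = pipeline.fit(trainingDF)  // trainingDF assumed to exist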
reduceByKey(randomPartitioner, (a, b) => a + b) also gives an incorrect result.
Why does reduceByKey with a Partitioner exist, then?
On Wed, Jun 8, 2016 at 9:22 PM, 汪洋 wrote:
> Hi Alexander,
>
> I think it does not guarantee to be right if an arbitrary Partitioner is
> passed in.
>
> I have created a note
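To make the failure mode concrete, a hedged sketch of the kind of partitioner
being discussed: because getPartition ignores the key, identical keys can be
routed to different partitions, and reduceByKey can emit several rows for the
same key.

import scala.util.Random
import org.apache.spark.Partitioner

// Contract-violating sketch: the partition id does not depend on the key at all.
class RandomPartitioner(val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = Random.nextInt(numPartitions)
}

// rdd.reduceByKey(new RandomPartitioner(4), _ + _)  // may return duplicate keys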
The example violates the basic contract of a Partitioner.
It does make sense to take a Partitioner as a param to distinct - though it
is fairly trivial to simulate that in user code as well ...
Regards
Mridul
On Wednesday, June 8, 2016, 汪洋 wrote:
> Hi Alexander,
>
> I think it does not guarantee
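For completeness, the user-code simulation Mridul mentions could look roughly
like this (a sketch only; distinctWith is my name, not an RDD method):

import scala.reflect.ClassTag
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// Sketch: pair each element with a dummy value, reduce per key under the
// supplied partitioner, then drop the dummy value again.
def distinctWith[T: ClassTag](rdd: RDD[T], partitioner: Partitioner): RDD[T] =
  rdd.map(x => (x, null)).reduceByKey(partitioner, (a, _) => a).map(_._1)

// e.g. distinctWith(rdd, new org.apache.spark.HashPartitioner(16))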
Hi Alexander,
I think the result is not guaranteed to be right if an arbitrary Partitioner
is passed in.
I have created a notebook and you can check it out.
(https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/2110745399505739/58107563000366
Most of the RDD methods which shuffle data take a Partitioner as a parameter,
but rdd.distinct does not have such a signature.
Should I open a PR for that?
/**
 * Return a new RDD containing the distinct elements in this RDD.
 */
def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]
Yes you can :)
On Wed, Jun 8, 2016 at 6:00 PM, Alexander Pivovarov
wrote:
> Can I just enable spark.kryo.registrationRequired and look at error
> messages to get unregistered classes?
>
> On Wed, Jun 8, 2016 at 5:52 PM, Reynold Xin wrote:
>
>> Due to type erasure they have no difference, altho
Can I just enable spark.kryo.registrationRequired and look at error
messages to get unregistered classes?
On Wed, Jun 8, 2016 at 5:52 PM, Reynold Xin wrote:
> Due to type erasure they have no difference, although watch out for Scala
> tuple serialization.
>
>
> On Wednesday, June 8, 2016, Ted Yu
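For reference, a sketch of the setup being asked about (these are the real
Spark config keys; with registrationRequired enabled, Kryo fails fast and
names each unregistered class in the error message):

import org.apache.spark.SparkConf

// Serializing an unregistered class now throws an exception naming it, so the
// errors effectively enumerate what still needs registering.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")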
Due to type erasure they have no difference, although watch out for Scala
tuple serialization.
On Wednesday, June 8, 2016, Ted Yu wrote:
> I think the second group (3 classOf's) should be used.
>
> Cheers
>
> On Wed, Jun 8, 2016 at 4:53 PM, Alexander Pivovarov wrote:
>
>> if my RDD is RDD[(St
I think the second group (3 classOf's) should be used.
Cheers
On Wed, Jun 8, 2016 at 4:53 PM, Alexander Pivovarov
wrote:
> if my RDD is RDD[(String, (Long, MyClass))]
>
> Do I need to register
>
> classOf[MyClass]
> classOf[(Any, Any)]
>
> or
>
> classOf[MyClass]
> classOf[(Long, MyClass)]
> cl
if my RDD is RDD[(String, (Long, MyClass))]
Do I need to register
classOf[MyClass]
classOf[(Any, Any)]
or
classOf[MyClass]
classOf[(Long, MyClass)]
classOf[(String, (Long, MyClass))]
?
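Pulling the replies above together, a minimal sketch (MyClass stands in for
the user's class; per the type-erasure point, classOf[(Any, Any)] and
classOf[(Long, MyClass)] both refer to the same Tuple2 class at runtime):

import org.apache.spark.SparkConf

class MyClass  // stand-in for the user's class

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(
    classOf[MyClass],
    classOf[(Any, Any)]  // erases to Tuple2, covering the nested pairs too
  ))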
We were able to reproduce it with a minimal example. I've opened a jira
issue:
https://issues.apache.org/jira/browse/SPARK-15825
On Wed, Jun 8, 2016 at 12:43 PM, Koert Kuipers wrote:
> great!
>
> we weren't able to reproduce it because the unit tests use a
> broadcast-join while on the cluster
Great!
We weren't able to reproduce it because the unit tests use a broadcast join
while on the cluster it uses a sort-merge join, so the issue is in the
sort-merge join.
We are now able to reproduce it in tests using
spark.sql.autoBroadcastJoinThreshold=-1.
It produces weird-looking garbled results in
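For anyone reproducing it the same way, a sketch of forcing the sort-merge
path in a local test (the session setup is mine; the config key is the one
mentioned above):

import org.apache.spark.sql.SparkSession

// Setting the broadcast threshold to -1 disables broadcast joins, so equi-joins
// fall back to sort-merge join, the code path where the garbled results appear.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("smj-repro")
  .config("spark.sql.autoBroadcastJoinThreshold", "-1")
  .getOrCreate()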
I just raised https://issues.apache.org/jira/browse/SPARK-15822 for a
similar-looking issue. Analyzing the core dump from the SEGV with Memory
Analyzer, it looks very much like a UTF8String is badly corrupted.
Cheers,
On Fri, 27 May 2016 at 21:00 Koert Kuipers wrote:
> hello all,
> after getting ou
OK, this is done:
http://spark.apache.org/documentation.html
http://spark.apache.org/docs/2.0.0-preview/
http://spark.apache.org/docs/preview/
On Tue, Jun 7, 2016 at 4:59 PM, Shivaram Venkataraman
wrote:
> As far as I know the process is just to copy docs/_site from the build
> to the appropriat