shall we document this in the API doc?
Best,
--
Nan Zhu
On Sunday, September 21, 2014 at 12:18 PM, Debasish Das wrote:
> zipWithUniqueId is also affected...
>
> I had to persist the dictionaries to make use of the indices lower down in
> the flow...
>
> On Sun, Sep 21, 2014 at 1:15 AM, Sean Owen wrote: ...
Yes, Matei made a JIRA last week and I just suggested a PR:
https://github.com/apache/spark/pull/2508
On Sep 23, 2014 2:55 PM, "Nan Zhu" wrote:
> shall we document this in the API doc?
>
> Best,
>
> --
> Nan Zhu
>
> On Sunday, September 21, 2014 at 12:18 PM, Debasish Das wrote:
>
> zipWithUniqueId is also affected...
great, thanks
--
Nan Zhu
On Tuesday, September 23, 2014 at 9:58 AM, Sean Owen wrote:
> Yes, Matei made a JIRA last week and I just suggested a PR:
> https://github.com/apache/spark/pull/2508
> On Sep 23, 2014 2:55 PM, "Nan Zhu" <zhunanmcg...@gmail.com> wrote:
> > shall we document this in the API doc? ...
zipWithUniqueId is also affected...
I had to persist the dictionaries to make use of the indices lower down in
the flow...
On Sun, Sep 21, 2014 at 1:15 AM, Sean Owen wrote:
> Reference - https://issues.apache.org/jira/browse/SPARK-3098
> I imagine zipWithUniqueId is also affected, but may not happen to have
> exhibited in your test.
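One hedged sketch of what "make use of the indices lower down in the flow" could look like, with the persisted dictionary joined back against the raw records to replace product names with their ids; every name below (dict, transactions, encoded) is an illustrative assumption, not taken from the thread:

    import org.apache.spark.SparkContext._   // pair-RDD implicits (Spark 1.x)
    import org.apache.spark.rdd.RDD

    // dict is the persisted dictionary RDD[(String, Long)] built with zipWithIndex;
    // transactions is an assumed RDD[(String, Double)] of (product, amount) pairs.
    val encoded: RDD[(Long, Double)] =
      transactions.join(dict).map { case (product, (amount, id)) => (id, amount) }

    // Because dict is persisted, each product maps to the same id here as in
    // every other place the dictionary is used.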
Reference - https://issues.apache.org/jira/browse/SPARK-3098
I imagine zipWithUniqueId is also affected, but may not happen to have
exhibited in your test.
On Sun, Sep 21, 2014 at 2:13 AM, Debasish Das wrote:
> Some more debugging revealed that, as Sean said, I have to keep the dictionaries
> persisted until I am done with the RDD manipulation.
Some more debugging revealed that, as Sean said, I have to keep the dictionaries
persisted until I am done with the RDD manipulation.
Thanks Sean for the pointer... would it be possible to point me to the JIRA
as well?
Are there plans to make it more transparent for the users?
Is it possible for t...
I changed zipWithIndex to zipWithUniqueId and that seems to be working...
What's the difference between zipWithIndex and zipWithUniqueId?
For zipWithIndex we don't need to run the count to compute the offset, which
is needed for zipWithUniqueId, and so zipWithIndex is more efficient? It's not
very clear...
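For reference, the extra job actually belongs to zipWithIndex: it first counts every partition so it can compute consecutive offsets, while zipWithUniqueId derives its ids purely from the partition number and element position and needs no extra job, at the cost of non-consecutive ids. A small sketch of the difference (the sample data is made up for illustration, and sc is assumed to be an existing SparkContext):

    // Illustrative 6-element RDD split across 2 partitions.
    val data = sc.parallelize(Seq("a", "b", "c", "d", "e", "f"), 2)

    // zipWithIndex: consecutive indices 0..5 in partition order.
    // It first runs a job to count each partition and compute the offsets.
    data.zipWithIndex().collect()
    // e.g. Array((a,0), (b,1), (c,2), (d,3), (e,4), (f,5))

    // zipWithUniqueId: element i of partition k gets id k + i * numPartitions.
    // No extra job, but the ids are unique rather than consecutive.
    data.zipWithUniqueId().collect()
    // e.g. Array((a,0), (b,2), (c,4), (d,1), (e,3), (f,5))

Neither set of ids is stable across re-evaluation unless the zipped RDD is persisted, which is the issue the rest of the thread is about.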
I did not persist / cache it as I assumed zipWithIndex would preserve
order...
There is also zipWithUniqueId... I am trying that... If that also shows the
same issue, we should make it clear in the docs...
On Sat, Sep 20, 2014 at 1:44 PM, Sean Owen wrote:
> From an offline question - zipWithIndex is being used to assign IDs. ...
From an offline question - zipWithIndex is being used to assign IDs. From a
recent JIRA discussion I understand this is not deterministic within a
partition so the index can be different when the RDD is reevaluated. If you
need it fixed, persist the zipped RDD on disk or in memory.
On Sep 20, 2014 8:...
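A minimal sketch of the fix described above, persisting the zipped RDD so the assigned indices are not silently recomputed; the pipeline below (the products RDD, the distinct() step, the variable names) is an assumption for illustration, not taken from this thread:

    import org.apache.spark.storage.StorageLevel

    // products is assumed to be an RDD[String]; distinct() involves a shuffle,
    // so the order zipWithIndex sees inside each partition can change when the
    // RDD is re-evaluated.
    val dict = products.distinct().zipWithIndex()

    // Without persisting, every action that recomputes dict may assign a
    // different index to the same product. Materialize it once and reuse it.
    dict.persist(StorageLevel.MEMORY_AND_DISK)
    dict.count()  // first action populates the cache; later reads reuse it

    dict.filter { case (product, index) => product == "almonds" }.collect()

If the ids must also survive executor loss (a lost persisted partition is recomputed from lineage), writing the dictionary out and reading it back, or checkpointing it, is the more durable option.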
Hi,
I am building a dictionary as an RDD[(String, Long)], and after the dictionary
is built and cached, I find the key "almonds" at value 5187 using:
rdd.filter{case(product, index) => product == "almonds"}.collect
Output:
Debug product almonds index 5187
Now I take the same dictionary and write it out...
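A hedged reconstruction of the kind of pipeline described here; the products input RDD, the distinct()/zipWithIndex steps, and the output path are illustrative assumptions, not details from the message:

    import org.apache.spark.rdd.RDD

    // Assumed input: an RDD[String] of product names.
    val dictionary: RDD[(String, Long)] = products.distinct().zipWithIndex()

    // The lookup quoted above:
    dictionary.filter { case (product, index) => product == "almonds" }.collect()
    // reported output: Debug product almonds index 5187

    // Writing the dictionary out triggers another evaluation; unless the zipped
    // RDD was persisted first, that recomputation can assign a different index
    // to "almonds" (see Sean's explanation earlier in the thread).
    dictionary.map { case (product, index) => s"$product\t$index" }
      .saveAsTextFile("/tmp/product-dictionary")  // path is illustrative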