Re: Does reduceByKey only work properly for numeric keys?

Sean Owen Sat, 18 Apr 2015 08:32:43 -0700

Do these datetime objects implement the notion of equality you'd
expect? (This may be a dumb question; I'm thinking of the equivalent
of equals() / hashCode() from the Java world.)
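
One quick way to check this is in a plain Python interpreter, without Spark. The sketch below uses one of the key values from the output quoted further down; the variable names are only illustrative. It checks that two separately constructed datetime objects, and tuples containing them, compare equal and hash equally inside a single process. It does not show whether hashes also agree across different worker processes or machines.

```python
from datetime import datetime

# Two separately constructed objects for the same instant.
a = datetime(2009, 10, 6, 3, 0)
b = datetime(2009, 10, 6, 3, 0)

# Python's counterparts of equals() / hashCode() are __eq__ and __hash__.
keys_equal = (a == b)
hashes_equal = (hash(a) == hash(b))

# The keys in the question are (time, word_id) tuples, so check those too.
tuple_keys_equal = ((a, 0) == (b, 0)) and (hash((a, 0)) == hash((b, 0)))

print(keys_equal, hashes_equal, tuple_keys_equal)
```

If all three checks hold locally, the key type itself has sensible equality, and it is worth looking at whether the hashes are computed the same way on every worker.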


On Sat, Apr 18, 2015 at 4:17 PM, SecondDatke
<lovejay-lovemu...@outlook.com> wrote:
> I'm trying to solve a word-count-like problem. The difference is that
> I need the count of a specific word within a specific timespan in a social
> message stream.
>
> My data is in the format of (time, message), and I transformed it (flatMap
> etc.) into a series of (time, word_id), where the time is represented with
> the Python datetime.datetime class. I then transformed it to ((time,
> word_id), 1) and used reduceByKey to get the result.
>
> But the dataset returned is a little weird, just like the following:
>
> format:
> ((timespan with datetime.datetime, wordid), freq)
>
> ((datetime.datetime(2009, 10, 6, 2, 0), 0), 8)
> ((datetime.datetime(2009, 10, 6, 3, 0), 0), 3)
> ((datetime.datetime(2009, 10, 6, 3, 0), 0), 14)
>
> As you can see, there are DUPLICATED keys, but as a result of reduceByKey,
> all keys SHOULD BE UNIQUE.
>
> I tried to convert the key to a string (like '2006-12-02 21:00:00-000') and
> ran reduceByKey again, but the problem remains. It seems the only way left
> for me is to convert the date to a timestamp, and this time it works.
>
> Is this the expected behavior of reduceByKey (and all other transformations
> that work with keys)?
>
> Currently I'm still working on it.
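
The timestamp workaround described above can be sketched locally as follows. This is a minimal illustration, not the poster's actual code: `to_epoch_seconds` and `count_by_key` are hypothetical helpers, the datetimes are assumed to be naive values interpreted as UTC, and a plain dictionary stands in for reduceByKey so that the snippet runs without a Spark cluster.

```python
import calendar
from collections import defaultdict
from datetime import datetime


def to_epoch_seconds(dt):
    """Convert a naive datetime, treated as UTC (an assumption), to an int."""
    return calendar.timegm(dt.timetuple())


def count_by_key(pairs):
    """Local stand-in for reduceByKey with addition: sum the values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# (time, word_id) records shaped like the ones in the question.
records = [
    (datetime(2009, 10, 6, 2, 0), 0),
    (datetime(2009, 10, 6, 3, 0), 0),
    (datetime(2009, 10, 6, 3, 0), 0),
]

# Build ((timestamp, word_id), 1) pairs with an integer in place of the datetime.
pairs = [((to_epoch_seconds(t), word_id), 1) for t, word_id in records]
counts = count_by_key(pairs)
```

With integer keys, the two 03:00 records collapse into a single entry, which is the behavior expected from reduceByKey.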

