If the number of user-item pairs to predict isn't too large, say millions,
you could transform the target DataFrame and save the result to a Hive
table, then build a cache based on that table for online services.
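A minimal sketch of that first suggestion, assuming an ALS recommendation
model (the thread doesn't name the model type) and illustrative table and
path names:

```scala
import org.apache.spark.ml.recommendation.ALSModel
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("batch-scoring")
  .enableHiveSupport() // needed to write into Hive tables
  .getOrCreate()

// Assumption: a previously trained ALS model was saved at this path.
val model = ALSModel.load("/models/als")

// Assumption: a Hive table holds the user-item pairs to predict, with
// columns matching the model's userCol/itemCol.
val pairs = spark.table("target_user_item_pairs")

// transform() adds a "prediction" column; nothing runs until the write.
val predictions = model.transform(pairs)

// Persist the scored pairs; the online cache can be built from this table.
predictions.write
  .mode(SaveMode.Overwrite)
  .saveAsTable("user_item_predictions")
```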
If that's not the case (such as billions of user-item pairs to predict), you
have to start a
To my understanding, all transformations are thread-safe because a DataFrame
is just a description of the computation and is immutable, so the case
above is fine. Just be careful with the actions.
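A small sketch of that point: several threads can each build their own
transformations on the same immutable DataFrame; the concurrent collect-style
actions simply run as separate Spark jobs. The data and thresholds are made up
for illustration:

```scala
import org.apache.spark.sql.SparkSession
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val spark = SparkSession.builder().appName("concurrent-jobs").getOrCreate()
import spark.implicits._

val base = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")

// Each future defines its own lineage from the shared DataFrame; filter()
// is a transformation, count() is the action that triggers a job.
val jobs = (1 to 3).map { n =>
  Future { base.filter($"id" >= n).count() }
}

val counts = Await.result(Future.sequence(jobs), 5.minutes)
counts.foreach(println)
```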
On Sun, Feb 12, 2017 at 4:06 PM, Mendelson, Assaf wrote:
> Hi,
>
> I was wondering if dataframe
Why not sync the binlog of MySQL (hopefully the data is immutable and the
table is append-only), send the log through Kafka, and then consume it with
Spark Streaming?
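A hedged sketch of the Kafka-consuming end of that pipeline, using the
spark-streaming-kafka-0-10 DStream API. The broker address, topic name, and
group id are illustrative assumptions; getting the binlog into Kafka in the
first place would be handled by an external CDC tool:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val conf = new SparkConf().setAppName("binlog-consumer")
val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "binlog-consumers",
  "auto.offset.reset"  -> "earliest"
)

// Each record's value is one binlog event published by the CDC pipeline.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Seq("mysql-binlog"), kafkaParams)
)

stream.foreachRDD { rdd =>
  rdd.map(_.value()).foreach(println) // replace with real processing
}

ssc.start()
ssc.awaitTermination()
```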
On Fri, Dec 30, 2016 at 9:01 AM, Michael Armbrust wrote:
> We don't support this yet, but I've opened this JIRA as it sounds
> generally