Hi, Sining,

There are a few questions to ask so that we can understand your
application's use case better:

1) In what format is your old userid-db data?
2) Is the old userid-db data partitioned on the same key, and into the same
number of partitions, as the input you expect to consume in your Samza job?

Generally speaking, you would have to employ a batch-to-stream push job
(a rough sketch follows below), because:
1) Your old userid-db may not already be in RocksDB's on-disk format.
2) Your old userid-db may not be partitioned the same way as the input you
expect to consume in your Samza job.
3) The location of a specific partition of your userid-db in a Samza job is
dynamically allocated as YARN schedules the containers in the cluster, so
where to copy the offline data over is not known a priori.
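To make that concrete, here is a minimal sketch of the push job's producer
side, assuming a plain Kafka producer. The topic name "userid-db-bootstrap",
the String serdes, and the exportRecords() iterator over your old dump are
all hypothetical stand-ins for whatever your batch system actually provides:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class UseridDbPushJob {

  // Hypothetical row type for the exported userid-db dump.
  static class UseridRecord {
    final String userId;
    final String payload;
    UseridRecord(String userId, String payload) {
      this.userId = userId;
      this.payload = payload;
    }
  }

  // Placeholder for however you read your old batch data (an HDFS file,
  // a DB dump, etc.); hard-coded here only to keep the sketch
  // self-contained.
  static Iterable<UseridRecord> exportRecords() {
    return Arrays.asList(new UseridRecord("user-1", "{\"name\":\"alice\"}"));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      for (UseridRecord r : exportRecords()) {
        // Keying each message by userid lets Kafka's default partitioner
        // hash it to the same partition your Samza job will read, provided
        // the topic has the same partition count as your input stream.
        producer.send(
            new ProducerRecord<>("userid-db-bootstrap", r.userId, r.payload));
      }
    }
  }
}

On the Samza side, if I remember the config keys correctly, you would mark
that topic as a bootstrap stream (e.g.
systems.kafka.streams.userid-db-bootstrap.samza.bootstrap=true together with
samza.offset.default=oldest) so the job fully consumes it before processing
your other inputs.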

-Yi


On Thu, Jun 30, 2016 at 5:35 PM, 李斯宁 <lisin...@gmail.com> wrote:

> hi guys,
> I am trying to use Samza for realtime processing.  I need to join a
> stream with a userid-db.  How can I import initial data from another
> source into the KV store?
>
> From the documentation, I can see how to build the userid-db from empty
> by consuming a log stream.  But in my case, I have historical userid-db
> data, and I don't want to process a long historical log to build the
> userid-db from scratch.  So I need to import the userid-db from my old
> batch-processing system.
>
> Any reply is appreciated; thanks in advance.
>
> --
> 李斯宁
>
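
Regarding the store-building step described above: a minimal sketch of a
StreamTask that writes each incoming record into the local store might look
like the following. The store name "userid-db" and the String types are
assumptions, and the store would need matching stores.userid-db.* entries
in the job config:

import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

public class UseridDbBootstrapTask implements StreamTask, InitableTask {
  private KeyValueStore<String, String> useridDb;

  @Override
  @SuppressWarnings("unchecked")
  public void init(Config config, TaskContext context) {
    // "userid-db" must match the store name declared in the job config.
    useridDb = (KeyValueStore<String, String>) context.getStore("userid-db");
  }

  @Override
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector,
                      TaskCoordinator coordinator) {
    // Each bootstrap message is one userid-db row; writing it into the
    // RocksDB-backed store makes it available for the stream join later.
    useridDb.put((String) envelope.getKey(), (String) envelope.getMessage());
  }
}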
