Hi Sining,

A few questions so that we can understand your application use case better:
1) In what format is your old userid-db data?
2) Is the old userid-db data partitioned with the same key and the same number of partitions that you expect to consume in your Samza job?

Generally speaking, we would have to employ a batch-to-stream push job, because:

1) Your old userid-db may not already be a RocksDB database file.
2) Your old userid-db may not be partitioned the same way you expect to consume it in your Samza job.
3) The location of a specific partition of your userid-db in a Samza job is dynamically allocated as YARN schedules the containers in the cluster, so where to copy the offline data over is not known a priori.

A rough sketch of such a push job, and of the Samza config to consume it, follows below the quoted message.

-Yi

On Thu, Jun 30, 2016 at 5:35 PM, 李斯宁 <lisin...@gmail.com> wrote:
> hi guys,
> I am trying to use Samza for real-time processing. I need to join a
> stream with a userid-db. How can I import initial data from another
> place into the KV store?
>
> From the documentation, I can see how to build the userid-db from empty
> by consuming the log stream. But in my case, I have historical userid-db
> data, and I don't want to process a long history of logs to build the
> userid-db from empty. So I need to import the userid-db from my old
> batch processing system.
>
> Any reply is appreciated; thanks in advance.
>
> --
> 李斯宁
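To make the push job concrete, here is a minimal sketch. It assumes the old userid-db can be exported as string key/value pairs; the topic name "userid-db-bootstrap" and the readRecords() helper are placeholders you would replace with a reader over your own export. The important part is keying each message by userid and creating the topic with the same partition count as your Samza job's input topic, so the join stays co-partitioned.

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class UserDbPushJob {

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");

    // NOTE: create this topic with the same number of partitions as the
    // Samza job's input stream. Keying by userid makes Kafka's default
    // partitioner place each user on a deterministic partition.
    String topic = "userid-db-bootstrap";

    // KafkaProducer implements Closeable, so try-with-resources flushes
    // and closes it when the push is done.
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      for (Map.Entry<String, String> rec : readRecords()) {
        producer.send(new ProducerRecord<>(topic, rec.getKey(), rec.getValue()));
      }
    }
  }

  // Placeholder: replace with an iterator over the old userid-db export
  // (e.g. a dump file produced by your batch system).
  static Iterable<Map.Entry<String, String>> readRecords() {
    return java.util.Collections.<String, String>emptyMap().entrySet();
  }
}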
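On the Samza side, one way to load the pushed data is to declare the topic as a bootstrap stream, so the job drains it into the local store before processing the main input. A sketch of the relevant job config follows; the store, stream, and serde names are illustrative, and your task's process() method would put() each bootstrap message into the store.

# Consume the main input plus the bootstrap topic (names are illustrative).
task.inputs=kafka.clicks,kafka.userid-db-bootstrap

# Fully consume the bootstrap topic, from the beginning, before other inputs.
systems.kafka.streams.userid-db-bootstrap.samza.bootstrap=true
systems.kafka.streams.userid-db-bootstrap.samza.offset.default=oldest

# Local RocksDB store that the task fills from the bootstrap stream.
stores.userid-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.userid-store.changelog=kafka:userid-store-changelog
stores.userid-store.key.serde=string
stores.userid-store.msg.serde=string
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory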