Hi, Sining,

Yes! What you did is exactly what I meant by "batch-to-stream job"! Enjoy Samza!
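[Editor's note: the bootstrap setup described in the quoted steps below maps to Samza job configuration roughly as follows. This is a sketch: only the topic name "import_uid" comes from the thread; the system name "kafka", the second input topic, the store name "uid-store", and the serde choices are assumptions.]

```properties
# Consume both the one-off import topic and the real-time stream
# (system name "kafka" and topic "realtime-events" are assumptions).
task.inputs=kafka.import_uid,kafka.realtime-events

# Mark import_uid as a bootstrap stream: Samza reads it up to its current
# head before delivering messages from any other input stream.
systems.kafka.streams.import_uid.samza.bootstrap=true
systems.kafka.streams.import_uid.samza.reset.offset=true
systems.kafka.streams.import_uid.samza.offset.default=oldest

# Local KV store the task writes the imported userid data into
# (store name and serdes are assumptions).
stores.uid-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.uid-store.key.serde=string
stores.uid-store.msg.serde=string
```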
-Yi

On Mon, Jul 11, 2016 at 8:50 AM, 李斯宁 <lisin...@gmail.com> wrote:

> hi, Yi
> Thanks for your response.
> My old userid-db is stored in an HDFS folder, and I have found a way to
> import my userid data:
>
> 1) Create a MapReduce job to write the complete userid data to a Kafka
> topic, let's call it "import_uid".
> 2) In the Samza JoinTask's configuration, set "import_uid" as a bootstrap
> input stream.
> 3) In the task, when processing the "import_uid" topic, write to the KV
> store.
> 4) In the task, when processing my real-time stream, read from the KV
> store and do the join.
>
> The key point is importing the data with a bootstrap stream.
> Is this what you meant by the "batch-to-stream" approach?
>
>
> On Thu, Jul 7, 2016 at 1:05 AM, Yi Pan <nickpa...@gmail.com> wrote:
>
> > Hi, Sining,
> >
> > There are a few questions to be asked so that we know your application
> > use case better.
> >
> > 1) In what format is your old userid-db data?
> > 2) Is the old userid-db data partitioned using the same key and the same
> > number of partitions as you expect to consume in your Samza job?
> >
> > Generally speaking, we would have to employ a batch-to-stream push job
> > because:
> > 1) Your old userid-db may not already be a RocksDB database file.
> > 2) Your old userid-db may not be partitioned the same way as you expect
> > to consume it in your Samza job.
> > 3) The location of a specific partition of your userid-db in a Samza job
> > is dynamically allocated as YARN schedules the containers in the
> > cluster. Hence, where to copy the offline data is not known a priori.
> >
> > -Yi
> >
> >
> > On Thu, Jun 30, 2016 at 5:35 PM, 李斯宁 <lisin...@gmail.com> wrote:
> >
> > > hi guys,
> > > I am trying to use Samza for real-time processing. I need to join a
> > > stream with a userid-db. How can I import initial data from another
> > > place into the KV store?
> > >
> > > From the documentation, I can imagine how to build the userid-db from
> > > empty by consuming a log stream. But in my case, I have historical
> > > userid-db data, and I don't want to process a long history of logs to
> > > build the userid-db from empty. So I need to import the userid-db from
> > > my old batch processing system.
> > >
> > > any reply is appreciated, thanks in advance.
> > >
> > > --
> > > 李斯宁
>
>
> --
> 李斯宁
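[Editor's note: the four-step flow in Sining's reply can be sketched without any Samza dependencies as follows. This is a minimal simulation, not the Samza API: the HashMap stands in for the RocksDB-backed KeyValueStore a real task would obtain via context.getStore(...) in init(), and a real process() would take an IncomingMessageEnvelope and read the stream name from envelope.getSystemStreamPartition().getStream(). The stream name "events" is hypothetical; "import_uid" comes from the thread.]

```java
import java.util.HashMap;
import java.util.Map;

// Dependency-free sketch of the bootstrap-then-join flow: because
// "import_uid" is a bootstrap stream, Samza delivers all of it before any
// real-time messages, so the store is fully populated when the join runs.
public class BootstrapJoinTask {
    // Stand-in for the RocksDB-backed KeyValueStore<String, String>.
    private final Map<String, String> uidStore = new HashMap<>();

    /**
     * Returns the joined record for a real-time message, or null for
     * bootstrap writes and for keys missing from the store.
     */
    public String process(String stream, String key, String message) {
        if ("import_uid".equals(stream)) {
            // Step 3: bootstrap topic -> populate the KV store.
            uidStore.put(key, message);
            return null;
        }
        // Step 4: real-time stream -> look up imported user data and join.
        String user = uidStore.get(key);
        return user == null ? null : user + ":" + message;
    }
}
```

A task like this would implement Samza's StreamTask and InitableTask interfaces, with the dispatch on stream name happening inside process().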