Hi, Sining,

Yes! What you did is exactly what I meant by "batch-to-stream job"! Enjoy
Samza!

-Yi

On Mon, Jul 11, 2016 at 8:50 AM, 李斯宁 <lisin...@gmail.com> wrote:

> Hi, Yi,
> Thanks for your response.
> My old userid-db is stored in an HDFS folder, and I have found a way to
> import my userid data:
>
> 1) Create a MapReduce job that writes the complete userid data to a Kafka
> topic; let's call it "import_uid".
> 2) In the Samza JoinTask's configuration, set "import_uid" as a bootstrap
> input stream.
> 3) In the task, when processing the "import_uid" topic, write to the KV store.
> 4) In the task, when processing my realtime stream, read from the KV store
> and do the join.
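[Editor's note: the bootstrap configuration in step 2 might look roughly like the sketch below. The store name and the realtime stream name ("clicks") are illustrative assumptions; the exact property keys should be checked against the Samza version in use.]

```properties
# Consume both streams; marking import_uid as a bootstrap stream makes Samza
# read it fully to its head before delivering any other input to the task.
task.inputs=kafka.import_uid,kafka.clicks
systems.kafka.streams.import_uid.samza.bootstrap=true
systems.kafka.streams.import_uid.samza.reset.offset=true
systems.kafka.streams.import_uid.samza.offset.default=oldest

# Local key-value store used for the join (assumed names and serdes).
stores.userid-db.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.userid-db.key.serde=string
stores.userid-db.msg.serde=json
```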
>
> The key point is importing the data via a bootstrap stream.
> Is this what your "batch-to-stream" approach means?
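[Editor's note: the four steps above can be sketched as the following simulation. This is a hedged, Samza-free illustration of the bootstrap-then-join logic only; real Samza tasks implement `org.apache.samza.task.StreamTask` in Java, and the class and stream names here ("UserIdJoinTask", "clicks") are hypothetical.]

```python
class UserIdJoinTask:
    """Simulates a task that bootstraps a KV store, then joins a realtime stream."""

    def __init__(self):
        # Stands in for the RocksDB-backed key-value store in a real Samza job.
        self.kv_store = {}

    def process(self, stream, key, value):
        if stream == "import_uid":
            # Bootstrap phase: a bootstrap stream is drained to its head
            # before any realtime messages are delivered to the task.
            self.kv_store[key] = value
            return None
        # Realtime phase: enrich the incoming event with bootstrapped data.
        return {"event": value, "user": self.kv_store.get(key)}

task = UserIdJoinTask()
task.process("import_uid", "u1", {"name": "alice"})
joined = task.process("clicks", "u1", {"page": "/home"})
# joined now carries both the click event and the bootstrapped user record.
```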
>
>
> On Thu, Jul 7, 2016 at 1:05 AM, Yi Pan <nickpa...@gmail.com> wrote:
>
> > Hi, Sining,
> >
> > There are a few questions to ask so that we understand your application's
> > use case better.
> >
> > 1) In what format is your old userid-db data?
> > 2) Is the old userid-db data partitioned using the same key and the same
> > number of partitions as you expect to consume in your Samza job?
> >
> > Generally speaking, we would have to employ a batch-to-stream push job
> > because:
> > 1) Your old userid-db may not already be a RocksDB database file.
> > 2) Your old userid-db may not be partitioned the same way as you expect
> > to consume it in your Samza job.
> > 3) The location of a specific partition of your userid-db in a Samza job
> > is dynamically allocated as YARN schedules the containers in the cluster.
> > Hence, where to copy the offline data is not known a priori.
> >
> > -Yi
> >
> >
> > On Thu, Jun 30, 2016 at 5:35 PM, 李斯宁 <lisin...@gmail.com> wrote:
> >
> > > Hi guys,
> > > I am trying to use Samza for realtime processing. I need to join a
> > > stream with a userid-db. How can I import initial data from another
> > > place into the KV store?
> > >
> > > From the documentation, I can imagine how to build the userid-db from
> > > empty by consuming a log stream. But in my case, I have historical
> > > userid-db data, and I don't want to process a long history of logs to
> > > build the userid-db from scratch. So I need to import the userid-db
> > > from my old batch processing system.
> > >
> > > Any reply is appreciated; thanks in advance.
> > >
> > > --
> > > 李斯宁
> > >
> >
>
>
>
> --
> 李斯宁
>
