Thanks for the reply, Jihoon. I am slightly worried about simply switching
the parameter values as described above, since the two tables are used
extensively throughout the code base. I will raise an issue.
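
For anyone following along, here is a minimal, self-contained sketch of the
failure mode being discussed. The class and argument names below are
simplified stand-ins, not the real MetadataStorageTablesConfig signature;
they only illustrate how passing segmentTable in the pending-segments
position makes reads fall back to the default druid_segments table, and what
the "simple swap" would change.

// Hypothetical, simplified stand-in for Druid's MetadataStorageTablesConfig;
// the real constructor takes more parameters. This only illustrates how a
// positional mix-up between two adjacent String arguments produces the
// behavior described in the quoted message below.
class TablesConfigSketch
{
  private static final String DEFAULT_SEGMENTS_TABLE = "druid_segments";

  private final String pendingSegmentsTable;
  private final String segmentsTable;

  TablesConfigSketch(String pendingSegmentsTable, String segmentsTable)
  {
    this.pendingSegmentsTable = pendingSegmentsTable;
    // If the caller puts the configured segment table in the
    // pending-segments slot, this falls back to the default name.
    this.segmentsTable = segmentsTable != null ? segmentsTable : DEFAULT_SEGMENTS_TABLE;
  }

  String getSegmentsTable()
  {
    return segmentsTable;
  }

  public static void main(String[] args)
  {
    String segmentTable = "customprefix_segments"; // from the metadataUpdateSpec

    // Current behavior as described below: segmentTable lands in the
    // pending-segments position, so segment metadata is read from "druid_segments".
    TablesConfigSketch current = new TablesConfigSketch(segmentTable, null);
    System.out.println(current.getSegmentsTable()); // prints druid_segments

    // The "simple swap": pass it in the segments position instead.
    TablesConfigSketch swapped = new TablesConfigSketch(null, segmentTable);
    System.out.println(swapped.getSegmentsTable()); // prints customprefix_segments
  }
}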

On Mon, Apr 15, 2019 at 2:04 PM Jihoon Son <ghoon...@gmail.com> wrote:

> Hi Samarth,
>
> It definitely looks like a bug to me.
> I'm not sure there's a workaround for this problem though.
>
> Jihoon
>
> On Mon, Apr 15, 2019 at 1:39 PM Samarth Jain <samarth.j...@gmail.com>
> wrote:
>
> > Hi,
> >
> > We are building out a realtime ingestion pipeline using the Kafka
> > indexing service for Druid. In order to achieve better rollup, I was
> > trying out the Hadoop-based re-ingestion job
> > http://druid.io/docs/latest/ingestion/update-existing-data.html, which
> > basically uses the datasource itself as the input.
> >
> > When I ran the job, it failed because it was trying to read segment
> > metadata from the druid_segments table and not from the table I
> > specified in the metadataUpdateSpec, customprefix_segments.
> >
> > "metadataUpdateSpec": {
> >       "connectURI": "jdbc:mysql...",
> >       "password": "XXXXXXX",
> >       "segmentTable": "customprefix_segments",
> >       "type": "mysql",
> >       "user": "XXXXXXXX"
> > },
> >
> > Looking at the code, I see that the segmentTable specified in the spec
> > is actually passed in as the pending_segments table (the 3rd param is
> > for pending_segments and the 4th param is for the segments table):
> >
> > https://github.com/apache/incubator-druid/blob/master/indexing-hadoop/src/main/java/org/apache/druid/indexer/updater/MetadataStorageUpdaterJobSpec.java#L92
> >
> > As a result, the re-ingestion job tries to read from the default
> > segments table named druid_segments, which isn't present.
> >
> > Is this intentional or a bug?
> >
> > Is there a way to configure the segments table name for this kind of
> > re-ingestion job?
> >
>
