Hi Samarth,

It definitely looks like a bug to me.
I'm not sure there's a workaround for this problem, though.

Jihoon

On Mon, Apr 15, 2019 at 1:39 PM Samarth Jain <samarth.j...@gmail.com> wrote:

> Hi,
>
> We are building out a realtime ingestion pipeline using the Kafka Indexing
> Service for Druid. In order to achieve better rollup, I was trying out the
> Hadoop-based reingestion job
> (http://druid.io/docs/latest/ingestion/update-existing-data.html), which
> basically uses the datasource itself as the input.
>
> When I ran the job, it failed because it was trying to read segment
> metadata from the druid_segments table rather than from the table I
> specified in the metadataUpdateSpec, customprefix_segments.
>
> "metadataUpdateSpec": {
>       "connectURI": "jdbc:mysql...",
>       "password": "XXXXXXX",
>       "segmentTable": "customprefix_segments",
>       "type": "mysql",
>       "user": "XXXXXXXX"
> },
>
> Looking at the code, I see that the segmentTable specified in the spec is
> actually passed in as the pending_segments table (the 3rd constructor
> parameter is for pending_segments and the 4th is for the segments table):
>
> https://github.com/apache/incubator-druid/blob/master/indexing-hadoop/src/main/java/org/apache/druid/indexer/updater/MetadataStorageUpdaterJobSpec.java#L92
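>
> For illustration, here is a rough, hypothetical sketch of that
> parameter-ordering pattern (simplified stand-in classes, not the actual
> Druid code; see the linked source for the real constructor):
>
> // Hypothetical, simplified stand-in for the metadata tables config class.
> class TablesConfigSketch {
>   final String pendingSegmentsTable; // 3rd constructor parameter
>   final String segmentsTable;        // 4th constructor parameter
>
>   TablesConfigSketch(String base, String dataSourceTable,
>                      String pendingSegmentsTable, String segmentsTable) {
>     this.pendingSegmentsTable = pendingSegmentsTable;
>     // Falls back to the default name when no segments table is supplied.
>     this.segmentsTable = segmentsTable != null ? segmentsTable : "druid_segments";
>   }
> }
>
> class UpdaterJobSpecSketch {
>   public static void main(String[] args) {
>     // Value taken from the metadataUpdateSpec above.
>     String segmentTable = "customprefix_segments";
>
>     // Passing segmentTable as the 3rd argument (the pending_segments slot)
>     // leaves the 4th argument (the segments table) null, so the job ends up
>     // reading the default druid_segments table instead of customprefix_segments.
>     TablesConfigSketch config = new TablesConfigSketch(null, null, segmentTable, null);
>
>     System.out.println(config.pendingSegmentsTable); // customprefix_segments
>     System.out.println(config.segmentsTable);        // druid_segments
>   }
> }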
>
> As a result, the re-ingestion job tries to read from the default segments
> table named DRUID_SEGMENTS, which isn't present.
>
> Is this intentional or a bug?
>
> Is there a way to configure the segments table name for this kind of
> re-ingestion job?
>
