yashmayya opened a new pull request, #13548:
URL: https://github.com/apache/pinot/pull/13548

   - Currently, a number of quickstarts (`GenericQuickstart`, 
`MultistageEngineQuickStart` etc.) fail locally with errors like:
   ```
   java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
        at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152) ~[classes/:?]
        at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:121) ~[classes/:?]
        at org.apache.pinot.tools.BootstrapTableTool.setupOfflineData(BootstrapTableTool.java:253) ~[classes/:?]
        at org.apache.pinot.tools.BootstrapTableTool.bootstrapOfflineTable(BootstrapTableTool.java:194) ~[classes/:?]
        at org.apache.pinot.tools.BootstrapTableTool.execute(BootstrapTableTool.java:104) ~[classes/:?]
        at org.apache.pinot.tools.admin.command.QuickstartRunner.bootstrapTable(QuickstartRunner.java:232) ~[classes/:?]
        at org.apache.pinot.tools.Quickstart.execute(Quickstart.java:86) ~[classes/:?]
        at org.apache.pinot.tools.GenericQuickstart.execute(GenericQuickstart.java:78) ~[classes/:?]
        at org.apache.pinot.tools.GenericQuickstart.main(GenericQuickstart.java:83) ~[classes/:?]
   Caused by: java.lang.RuntimeException: Failed to generate Pinot segment for file - file:/Users/yash/Repos/pinot/pinot-tools/target/classes/examples/batch/fineFoodReviews/rawdata/fine_food_reviews_with_embeddings_1k.parquet.gzip
        at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:287) ~[classes/:?]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) ~[?:?]
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
   Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
        at org.apache.pinot.plugin.inputformat.parquet.ParquetUtils.getParquetHadoopConfiguration(ParquetUtils.java:91) ~[classes/:?]
        at org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader.init(ParquetNativeRecordReader.java:66) ~[classes/:?]
        at org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReaderByClass(RecordReaderFactory.java:148) ~[classes/:?]
        at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.getRecordReader(SegmentIndexCreationDriverImpl.java:144) ~[classes/:?]
        at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:120) ~[classes/:?]
        at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:115) ~[classes/:?]
        at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:265) ~[classes/:?]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) ~[?:?]
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:840) ~[?:?]
   Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) ~[?:?]
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) ~[?:?]
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) ~[?:?]
        at org.apache.pinot.plugin.inputformat.parquet.ParquetUtils.getParquetHadoopConfiguration(ParquetUtils.java:91) ~[classes/:?]
        at org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader.init(ParquetNativeRecordReader.java:66) ~[classes/:?]
        at org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReaderByClass(RecordReaderFactory.java:148) ~[classes/:?]
        at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.getRecordReader(SegmentIndexCreationDriverImpl.java:144) ~[classes/:?]
        at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:120) ~[classes/:?]
        at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:115) ~[classes/:?]
        at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:265) ~[classes/:?]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) ~[?:?]
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:?]
   ```
   - The shaded Woodstox dependency comes from `hadoop-client-runtime`, which 
wasn't in the dependency tree for `pinot-parquet` before this change. The 
change brings in only a few additional transitive dependencies:
   ```
    +- org.apache.hadoop:hadoop-client-runtime:jar:3.3.4:compile
    |  +- org.apache.hadoop:hadoop-client-api:jar:3.3.4:runtime
    |  \- commons-logging:commons-logging:jar:1.1.3:runtime
   ```
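
A quick way to confirm locally whether the shaded Woodstox class from the stack trace is visible on a given classpath is a small probe like the one below. This is only a sketch and not part of this PR: the class name `ShadedClassCheck` and the helper `isClassPresent` are hypothetical, and it assumes the probe is run with the same classpath as the failing quickstart.

```java
public class ShadedClassCheck {
    // Class name copied from the ClassNotFoundException in the stack trace.
    static final String SHADED_WOODSTOX_CLASS =
        "org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper";

    /**
     * Hypothetical helper: returns true if the named class can be located by
     * the given loader. The class is not initialized (second argument false).
     */
    static boolean isClassPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ShadedClassCheck.class.getClassLoader();
        boolean present = isClassPresent(SHADED_WOODSTOX_CLASS, loader);
        System.out.println(SHADED_WOODSTOX_CLASS
            + (present ? " is" : " is NOT") + " on the classpath");
    }
}
```

If the probe reports the class as missing, the `NoClassDefFoundError` above is expected on that classpath; with `hadoop-client-runtime` present it should report the class as found.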


