yashmayya opened a new pull request, #13548:
URL: https://github.com/apache/pinot/pull/13548
- Currently, a number of quickstarts (`GenericQuickstart`,
`MultistageEngineQuickStart` etc.) fail locally with errors like:
```
java.lang.RuntimeException: Caught exception during running -
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
at
org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152)
~[classes/:?]
at
org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:121)
~[classes/:?]
at
org.apache.pinot.tools.BootstrapTableTool.setupOfflineData(BootstrapTableTool.java:253)
~[classes/:?]
at
org.apache.pinot.tools.BootstrapTableTool.bootstrapOfflineTable(BootstrapTableTool.java:194)
~[classes/:?]
at
org.apache.pinot.tools.BootstrapTableTool.execute(BootstrapTableTool.java:104)
~[classes/:?]
at
org.apache.pinot.tools.admin.command.QuickstartRunner.bootstrapTable(QuickstartRunner.java:232)
~[classes/:?]
at org.apache.pinot.tools.Quickstart.execute(Quickstart.java:86)
~[classes/:?]
at
org.apache.pinot.tools.GenericQuickstart.execute(GenericQuickstart.java:78)
~[classes/:?]
at
org.apache.pinot.tools.GenericQuickstart.main(GenericQuickstart.java:83)
~[classes/:?]
Caused by: java.lang.RuntimeException: Failed to generate Pinot segment for
file -
file:/Users/yash/Repos/pinot/pinot-tools/target/classes/examples/batch/fineFoodReviews/rawdata/fine_food_reviews_with_embeddings_1k.parquet.gzip
at
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:287)
~[classes/:?]
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
~[?:?]
at
java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
~[?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:?]
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
~[?:?]
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
~[?:?]
at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.lang.NoClassDefFoundError:
org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
at
org.apache.pinot.plugin.inputformat.parquet.ParquetUtils.getParquetHadoopConfiguration(ParquetUtils.java:91)
~[classes/:?]
at
org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader.init(ParquetNativeRecordReader.java:66)
~[classes/:?]
at
org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReaderByClass(RecordReaderFactory.java:148)
~[classes/:?]
at
org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.getRecordReader(SegmentIndexCreationDriverImpl.java:144)
~[classes/:?]
at
org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:120)
~[classes/:?]
at
org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:115)
~[classes/:?]
at
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:265)
~[classes/:?]
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
~[?:?]
at
java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
~[?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:?]
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
~[?:?]
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
~[?:?]
at java.base/java.lang.Thread.run(Thread.java:840) ~[?:?]
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
at
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
~[?:?]
at
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
~[?:?]
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
~[?:?]
at
org.apache.pinot.plugin.inputformat.parquet.ParquetUtils.getParquetHadoopConfiguration(ParquetUtils.java:91)
~[classes/:?]
at
org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader.init(ParquetNativeRecordReader.java:66)
~[classes/:?]
at
org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReaderByClass(RecordReaderFactory.java:148)
~[classes/:?]
at
org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.getRecordReader(SegmentIndexCreationDriverImpl.java:144)
~[classes/:?]
at
org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:120)
~[classes/:?]
at
org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:115)
~[classes/:?]
at
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:265)
~[classes/:?]
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
~[?:?]
at
java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
~[?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:?]
```
- The shaded woodstox dependency comes from `hadoop-client-runtime`, which
wasn't in the dependency tree for `pinot-parquet` before this change. Adding it
brings in only a few additional transitive dependencies:
```
+- org.apache.hadoop:hadoop-client-runtime:jar:3.3.4:compile
| +- org.apache.hadoop:hadoop-client-api:jar:3.3.4:runtime
| \- commons-logging:commons-logging:jar:1.1.3:runtime
```
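
A quick way to confirm whether the shaded class from the stack trace is visible on a given classpath is a small probe like the one below. This is only a sketch: the `ShadedClassCheck` class and its `isClassPresent` helper are hypothetical names introduced for illustration and are not part of Pinot. Only the class name being probed is taken from the `NoClassDefFoundError` above.

```java
// Hypothetical diagnostic helper, not part of Pinot.
final class ShadedClassCheck {
    // Class name copied from the NoClassDefFoundError in the stack trace.
    static final String SHADED_WOODSTOX_CLASS =
        "org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper";

    // Returns true if the named class can be located by this class's loader.
    // The class is not initialized, so no static initializers are run.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, ShadedClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(SHADED_WOODSTOX_CLASS + " present: "
            + isClassPresent(SHADED_WOODSTOX_CLASS));
    }
}
```

Run with the same classpath the quickstart uses, this should report the class as missing when `hadoop-client-runtime` is absent and as present once it is in the dependency tree.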