kkrugler opened a new issue #7393:
URL: https://github.com/apache/pinot/issues/7393
This is with Pinot 0.8.0, Hadoop 2.7.1, and Java 8. When starting a Hadoop
batch ingestion job, we get the error below. This doesn't happen with Pinot 0.7.1.
```
21/09/03 17:05:49 INFO batch.IngestionJobLauncher: Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner
21/09/03 17:05:49 ERROR command.LaunchDataIngestionJobCommand: Got exception to kick off standalone data ingestion job -
java.lang.RuntimeException: Failed to create IngestionJobRunner instance for class - org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner
    at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:139)
    at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:103)
    at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:143)
    at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:77)
    at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:302)
    at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:273)
    at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:254)
    at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:137)
    ... 9 more
```
Looking earlier in the logs, I see:
```
21/09/03 17:05:49 INFO plugin.PluginManager: Trying to load plugin [pinot-batch-ingestion-hadoop] from location [/ebs/workflow/apache-pinot-0.8.0-SNAPSHOT-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop]
21/09/03 17:05:49 INFO plugin.PluginManager: Successfully loaded plugin [pinot-batch-ingestion-hadoop] from jar files: [file:/ebs/workflow/apache-pinot-0.8.0-SNAPSHOT-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop/pinot-batch-ingestion-hadoop-0.8.0-SNAPSHOT-shaded.jar]
21/09/03 17:05:49 INFO plugin.PluginManager: Successfully Loaded plugin [pinot-batch-ingestion-hadoop] from dir [/ebs/workflow/apache-pinot-0.8.0-SNAPSHOT-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop]
```
So it seems like the plugin was loaded successfully.
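
One sanity check at this point is to confirm that the shaded jar really contains the runner class. Below is a minimal sketch in Python, assuming only that a jar is a zip archive. The helper names `class_entry_name` and `jar_contains_class` are made up for illustration and are not part of Pinot.

```python
import zipfile


def class_entry_name(class_name):
    """Convert a fully qualified Java class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) has an entry for class_name."""
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry_name(class_name) in jar.namelist()
```

Calling `jar_contains_class` with the path of the shaded jar from the log above and the class name from the stack trace would show whether the class is missing from the jar or is present but not visible to the class loader that tries to load it.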
If we change the run script to set `HADOOP_CLASSPATH` to explicitly include
the `pinot-batch-ingestion-hadoop-0.8.0-SNAPSHOT-shaded.jar`, then we get past
this first ClassNotFoundException. The job then fails in the Hadoop map task
with a similar issue, even though we see the plugin tarball in the Hadoop
distributed cache, it has the expected set of plugins, and we see the
plugin getting loaded. (I added extra logging, which is why the build
version above is 0.8.0-SNAPSHOT.)
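
For reference, the `HADOOP_CLASSPATH` change amounts to prepending the shaded jar to whatever value is already set. The actual change was made in the run script, so the Python below is only a sketch of the same idea, and the function name `prepend_to_classpath` is hypothetical.

```python
import os


def prepend_to_classpath(env, jar_path, var="HADOOP_CLASSPATH"):
    """Return a copy of env with jar_path placed first in the classpath variable.

    The input mapping is not modified. Entries are joined with os.pathsep,
    which is ':' on Linux and macOS.
    """
    updated = dict(env)
    existing = updated.get(var, "")
    if existing:
        updated[var] = jar_path + os.pathsep + existing
    else:
        updated[var] = jar_path
    return updated
```

The returned mapping can be passed as the `env=` argument of `subprocess.run` when launching the job from a wrapper script. As described above, this only helps the client-side launch and does not fix the failure inside the map task.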
Kulbir Nijjer had the same issue with his Spark-based segment building job,
which he worked around via:
> @Ken Krugler yes there are some oddities around how things are working in
> certain Spark versions vs. others with same exact steps and we are looking into
> it. On Spark side in between using dependencyJarDir in ingestion yaml (need to
> manually upload all plugin jars to HDFS/S3/GCS path) and using --jars has
> worked well in my testing. Just by using -Dplugins in the command should cause
> jars to be packaged and added to YARN distributed cache but somehow that's not
> always happening.
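
The `--jars` option of `spark-submit` takes a comma-separated list of jar paths, so that workaround needs the full list of plugin jars. Below is a rough sketch of building that list by walking a plugins directory. The function names are hypothetical, and the directory layout is assumed to match the `plugins/` tree shown in the log above.

```python
import os


def collect_plugin_jars(plugins_dir):
    """Return a sorted list of all .jar files found anywhere under plugins_dir."""
    jars = []
    for root, _dirs, files in os.walk(plugins_dir):
        for name in files:
            if name.endswith(".jar"):
                jars.append(os.path.join(root, name))
    return sorted(jars)


def spark_jars_argument(plugins_dir):
    """Join the plugin jar paths with commas, the format --jars expects."""
    return ",".join(collect_plugin_jars(plugins_dir))
```

The same list could also drive a manual upload of the jars to the location that `dependencyJarDir` points at.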