kkrugler opened a new issue #7393:
URL: https://github.com/apache/pinot/issues/7393


   This is with Pinot 0.8.0, Hadoop 2.7.1, Java 8. When starting the job, we get the error below. This doesn't happen with Pinot 0.7.1.
   
   ```
   21/09/03 17:05:49 INFO batch.IngestionJobLauncher: Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner
   21/09/03 17:05:49 ERROR command.LaunchDataIngestionJobCommand: Got exception to kick off standalone data ingestion job -
   java.lang.RuntimeException: Failed to create IngestionJobRunner instance for class - org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner
           at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:139)
           at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:103)
           at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:143)
           at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:74)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
           at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
   Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner
           at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
           at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
           at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:77)
           at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:302)
           at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:273)
           at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:254)
           at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:137)
           ... 9 more
   ```
   Looking earlier in the logs, I see:
   ```
   21/09/03 17:05:49 INFO plugin.PluginManager: Trying to load plugin [pinot-batch-ingestion-hadoop] from location [/ebs/workflow/apache-pinot-0.8.0-SNAPSHOT-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop]
   21/09/03 17:05:49 INFO plugin.PluginManager: Successfully loaded plugin [pinot-batch-ingestion-hadoop] from jar files: [file:/ebs/workflow/apache-pinot-0.8.0-SNAPSHOT-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop/pinot-batch-ingestion-hadoop-0.8.0-SNAPSHOT-shaded.jar]
   21/09/03 17:05:49 INFO plugin.PluginManager: Successfully Loaded plugin [pinot-batch-ingestion-hadoop] from dir [/ebs/workflow/apache-pinot-0.8.0-SNAPSHOT-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop]
   ```
   So it seems like the plugin was loaded successfully.
   
   If we change the run script to set `HADOOP_CLASSPATH` to explicitly include the `pinot-batch-ingestion-hadoop-0.8.0-SNAPSHOT-shaded.jar`, we get past this first ClassNotFoundException, but the job then fails in the Hadoop map task with a similar error. This happens even though we see the plugin tarball in the Hadoop distributed cache, it contains the expected set of plugins, and we see the plugin getting loaded (we added extra logging, which is why the build version above is 0.8.0-SNAPSHOT).
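   For reference, the run-script change we made looks roughly like this (a sketch, not the intended fix; the paths are from our environment, and the admin-command invocation may differ slightly across Pinot versions):

   ```shell
   # Sketch of the HADOOP_CLASSPATH workaround described above.
   # PINOT_DIST and the job-spec filename are examples from our setup.
   PINOT_DIST=/ebs/workflow/apache-pinot-0.8.0-SNAPSHOT-bin
   PLUGIN_JAR=${PINOT_DIST}/plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop/pinot-batch-ingestion-hadoop-0.8.0-SNAPSHOT-shaded.jar

   # Explicitly put the shaded plugin jar on the Hadoop client classpath,
   # since the plugin loader alone doesn't make the class visible here.
   export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${PLUGIN_JAR}

   hadoop jar ${PINOT_DIST}/lib/pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar \
     org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
     -jobSpecFile ingestionJobSpec.yaml
   ```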
   
   Kulbir Nijjer hit the same issue with his Spark-based segment-building job, and worked around it as follows:
   
   > @Ken Krugler yes there are some oddities around how things are working in certain Spark versions vs. others with same exact steps and we are looking into it. On Spark side in between using dependencyJarDir in ingestion yaml (need to manually upload all plugin jars to HDFS/S3/GCS path) and using --jars has worked well in my testing. Just by using -Dplugins in the command should cause jars to be packaged and added to YARN distributed cache but somehow that's not always happening.
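   For anyone hitting this on the Spark side, the `dependencyJarDir` workaround from the quote above amounts to an ingestion-spec fragment along these lines (field names follow the Pinot batch ingestion job spec as we understand it; the HDFS path is hypothetical):

   ```yaml
   # Spark-side workaround: stage all plugin jars in a deep-store directory
   # and point the job spec at it, instead of relying on -Dplugins.dir alone.
   executionFrameworkSpec:
     name: 'spark'
     segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
     extraConfigs:
       # Plugin jars must be uploaded here manually (HDFS/S3/GCS); path is an example.
       dependencyJarDir: 'hdfs://namenode:8020/pinot/plugin-jars'
   ```

   The alternative Kulbir mentions is passing the same jars to `spark-submit` via `--jars`, which ships them to the executors through the YARN distributed cache.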
   
   

