Hi, have you tried using the bundled Hadoop uber jar [1]? It looks like some Hadoop dependencies are missing.
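In case it helps to narrow this down, here is a rough sketch for checking whether any jar in a given directory actually contains the class named in the exception (org.apache.hadoop.conf.Configuration). It only uses the Python standard library. The helper name jars_containing and the FLINK_HOME lookup are assumptions made for illustration, not part of Flink or PyFlink; point it at whichever lib/ directory your job's JVM really uses.

```python
import os
import zipfile
from pathlib import Path

# The class the JVM failed to load, written as a jar entry path.
MISSING_CLASS = "org/apache/hadoop/conf/Configuration.class"


def jars_containing(directory, class_entry=MISSING_CLASS):
    """Return the names of the jars in `directory` that contain `class_entry`.

    Hypothetical helper for illustration only; not a Flink API.
    """
    found = []
    for jar in sorted(Path(directory).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if class_entry in archive.namelist():
                    found.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid jar/zip archives.
            continue
    return found


if __name__ == "__main__":
    # Assumption: FLINK_HOME points at the Flink installation; adjust as needed.
    lib_dir = Path(os.environ.get("FLINK_HOME", ".")) / "lib"
    print(jars_containing(lib_dir))
```

If this prints an empty list for your lib/ directory, that would be consistent with the NoClassDefFoundError below.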

Best,
Matthias

[1] https://flink.apache.org/downloads.html#additional-components

On Wed, Feb 10, 2021 at 1:24 PM meneldor <[email protected]> wrote:

> Hello,
> I am using PyFlink and I want to write records from the table sql api as
> parquet files on AWS S3. I followed the documentations but it seems that
> I'm missing some dependencies or/and configuration. Here is the SQL:
>
>> CREATE TABLE sink_table(
>>     `id` VARCHAR,
>>     `type` VARCHAR,
>>     `machn` VARCHAR,
>>     `lastacct_id` BIGINT,
>>     `upd_ts` BIGINT
>> ) WITH (
>>     'connector' = 'filesystem',
>>     'path' = 's3a://my-bucket/flink_sink',
>>     'format' = 'parquet'
>> )
>
> This is the configuration in flink-conf.yaml:
>
>> s3.endpoint: https://s3.us-west-1.amazonaws.com
>> s3.path.style.access: true
>> s3.access-key: ***KEY-STRING***
>> s3.secret-key: ***KEY-SECRET-STRING***
>> s3.entropy.key: _entropy_
>> s3.entropy.length: 8
>> hadoop.s3.socket-timeout: 10m
>
> I downloaded flink-s3-fs-hadoop-1.12.1.jar and
> flink-hadoop-compatibility_2.11-1.12.1.jar in plugins/ and
> flink-sql-parquet_2.11-1.12.1.jar in lib/
>
> Here is the exception:
>
>> Traceback (most recent call last):
>>   File "s3_sink.py", line 101, in <module>
>>     """)
>>   File "/home/user/miniconda3/lib/python3.7/site-packages/pyflink/table/table_environment.py", line 766, in execute_sql
>>     return TableResult(self._j_tenv.executeSql(stmt))
>>   File "/home/user/miniconda3/lib/python3.7/site-packages/py4j/java_gateway.py", line 1286, in __call__
>>     answer, self.gateway_client, self.target_id, self.name)
>>   File "/home/user/miniconda3/lib/python3.7/site-packages/pyflink/util/exceptions.py", line 147, in deco
>>     return f(*a, **kw)
>>   File "/home/user/miniconda3/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
>>     format(target_id, ".", name), value)
>> py4j.protocol.Py4JJavaError: An error occurred while calling o14.executeSql.
>> : java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
>>     at org.apache.flink.formats.parquet.ParquetFileFormatFactory.getParquetConfiguration(ParquetFileFormatFactory.java:115)
>>     at org.apache.flink.formats.parquet.ParquetFileFormatFactory.access$000(ParquetFileFormatFactory.java:51)
>>     at org.apache.flink.formats.parquet.ParquetFileFormatFactory$2.createRuntimeEncoder(ParquetFileFormatFactory.java:103)
>>     at org.apache.flink.formats.parquet.ParquetFileFormatFactory$2.createRuntimeEncoder(ParquetFileFormatFactory.java:97)
>>     at org.apache.flink.table.filesystem.FileSystemTableSink.createWriter(FileSystemTableSink.java:373)
>>     at org.apache.flink.table.filesystem.FileSystemTableSink.createStreamingSink(FileSystemTableSink.java:183)
>>     at org.apache.flink.table.filesystem.FileSystemTableSink.consume(FileSystemTableSink.java:145)
>>     at org.apache.flink.table.filesystem.FileSystemTableSink.lambda$getSinkRuntimeProvider$0(FileSystemTableSink.java:134)
>>     at org.apache.flink.table.planner.plan.nodes.common.CommonPhysicalSink.createSinkTransformation(CommonPhysicalSink.scala:95)
>>     at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:103)
>>     at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:43)
>>     at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
>>     at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlan(StreamExecSink.scala:43)
>>     at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:66)
>>     at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:65)
>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>     at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>     at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>>     at org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:65)
>>     at org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:167)
>>     at org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:1329)
>>     at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:676)
>>     at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeOperation(TableEnvironmentImpl.java:767)
>>     at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:666)
>>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>>     at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>>     at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>>     at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
>>     at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>     at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
>>     at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
>>     at java.base/java.lang.Thread.run(Thread.java:834)
>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
>>     at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
>>     at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
>>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
>>     ... 40 more
>
> Thank you!
