[ https://issues.apache.org/jira/browse/FLINK-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536560#comment-16536560 ]
ASF GitHub Bot commented on FLINK-9407:
---------------------------------------

Github user zhangminglei commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6075#discussion_r200888987

    --- Diff: flink-connectors/flink-orc/pom.xml ---
    @@ -54,6 +54,14 @@ under the License.
     			<optional>true</optional>
     		</dependency>
     
    +		<dependency>
    +			<groupId>org.apache.flink</groupId>
    +			<artifactId>flink-connector-filesystem_${scala.binary.version}</artifactId>
    +			<version>${project.version}</version>
    +			<!-- Projects depending on this project, won't depend on flink-filesystem. -->
    +			<optional>true</optional>
    +		</dependency>
    +
     		<dependency>
     			<groupId>org.apache.orc</groupId>
     			<artifactId>orc-core</artifactId>
    --- End diff --
    
    Yes. We can upgrade it. Will update.

> Support orc rolling sink writer
> -------------------------------
>
>                 Key: FLINK-9407
>                 URL: https://issues.apache.org/jira/browse/FLINK-9407
>             Project: Flink
>          Issue Type: New Feature
>          Components: filesystem-connector
>            Reporter: zhangminglei
>            Assignee: zhangminglei
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, we only support {{StringWriter}}, {{SequenceFileWriter}}, and
> {{AvroKeyValueSinkWriter}}. I would suggest adding an ORC writer for the
> rolling sink.
> Below, FYI.
> I tested the PR and verified the results with Spark SQL. Obviously, we can
> get back the results of what we had written down before. But I will run more
> tests in the next couple of days, including the performance under compression
> with short checkpoint intervals, and more UTs.
> {code:java}
> scala> spark.read.orc("hdfs://10.199.196.0:9000/data/hive/man/2018-07-06--21")
> res1: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
>
> scala> res1.registerTempTable("tablerice")
> warning: there was one deprecation warning; re-run with -deprecation for details
>
> scala> spark.sql("select * from tablerice")
> res3: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
>
> scala> res3.show(3)
> +-----+---+-------+
> | name|age|married|
> +-----+---+-------+
> |Sagar| 26|  false|
> |Sagar| 30|  false|
> |Sagar| 34|  false|
> +-----+---+-------+
> only showing top 3 rows
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)