Thank you very much for coming back and reporting the good news! :-) If you think that there is something we can do to improve Flink's ORC input format, for example logging a warning when a declared field matches no column in the file, please open a Jira. A small sketch of the matching difference follows below.
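To make the issue concrete, here is a minimal, hypothetical sketch (not Flink's actual OrcInputFormat code) contrasting a strict, case-sensitive column lookup with the equalsIgnoreCase-based lookup you observed in ORC's includeColumns; the upper-case field names are assumptions for illustration:

import java.util.Arrays;
import java.util.List;

public class ColumnMatching {

    // Strict, case-sensitive lookup: "tagIndustry" does not match
    // "TAGINDUSTRY", so the field would silently read as null.
    static int findColumnStrict(List<String> fileSchema, String name) {
        return fileSchema.indexOf(name);
    }

    // Case-insensitive lookup, in the spirit of ORC's includeColumns,
    // which compares column names with equalsIgnoreCase.
    static int findColumnLenient(List<String> fileSchema, String name) {
        for (int i = 0; i < fileSchema.size(); i++) {
            if (fileSchema.get(i).equalsIgnoreCase(name)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> fileSchema = Arrays.asList("CONTENT", "TAGINDUSTRY");
        System.out.println(findColumnStrict(fileSchema, "tagIndustry"));  // -1, field lost
        System.out.println(findColumnLenient(fileSchema, "tagIndustry")); // 1, field found
    }
}

A warning could be logged whenever the strict lookup fails but the lenient one succeeds, since that almost always indicates a case mismatch rather than a genuinely missing column.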
Thank you,
Fabian

On Wed, Sep 25, 2019 at 05:14, 163 <shuyun...@163.com> wrote:

> Hi Fabian,
>
> After debugging in local mode, I found that the Flink ORC connector is not
> the problem: some fields in our schema are written in upper case, so those
> fields could not be matched.
> But a program that reads the ORC file directly uses the includeColumns
> method, which matches columns with equalsIgnoreCase, so it can read these
> fields.
>
> Thanks for your help!
>
> Qi Shu
>
>
> On Sep 24, 2019, at 4:36 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
> Hi QiShu,
>
> It might be that Flink's OrcInputFormat has a bug.
> Can you open a Jira issue to report the problem?
> In order to be able to fix this, we need as much information as possible.
> It would be great if you could create a minimal example of an ORC file and
> a program that reproduces the issue.
> If that's not possible, we need the schema of an ORC file that cannot be
> correctly read.
>
> Thanks,
> Fabian
>
> On Mon, Sep 23, 2019 at 11:40, ShuQi <shuyun...@163.com> wrote:
>
>> Hi Guys,
>>
>> The Flink version is 1.9.0. I use OrcTableSource to read an ORC file in
>> HDFS, and the job executes successfully with no exception or error. But
>> some fields (such as tagIndustry) are always null, although these fields
>> are not null in the file. I can read them when I read the file directly.
>> Below is my code:
>>
>> // main
>> final ParameterTool params = ParameterTool.fromArgs(args);
>>
>> final ExecutionEnvironment env =
>>     ExecutionEnvironment.getExecutionEnvironment();
>>
>> env.getConfig().setGlobalJobParameters(params);
>>
>> Configuration config = new Configuration();
>>
>> OrcTableSource orcTableSource = OrcTableSource
>>     .builder()
>>     .path(params.get("input"))
>>     .forOrcSchema(TypeDescription.fromString(TYPE_INFO))
>>     .withConfiguration(config)
>>     .build();
>>
>> DataSet<Row> dataSet = orcTableSource.getDataSet(env);
>>
>> DataSet<Tuple2<String, Integer>> counts =
>>     dataSet.flatMap(new Tokenizer()).groupBy(0).sum(1);
>>
>> // read fields in Tokenizer.flatMap
>> public void flatMap(Row row, Collector<Tuple2<String, Integer>> out) {
>>
>>     String content = (String) row.getField(6);
>>     String tagIndustry = (String) row.getField(35);
>>
>>     LOGGER.info("arity: " + row.getArity());
>>     LOGGER.info("content: " + content);
>>     LOGGER.info("tagIndustry: " + tagIndustry);
>>     LOGGER.info("===========================");
>>
>>     if (Strings.isNullOrEmpty(content) ||
>>             Strings.isNullOrEmpty(tagIndustry) ||
>>             !tagIndustry.contains("FD001")) {
>>         return;
>>     }
>>
>>     // normalize and split the line
>>     String[] tokens = content.toLowerCase().split("\\W+");
>>
>>     // emit the pairs
>>     for (String token : tokens) {
>>         if (token.length() > 0) {
>>             out.collect(new Tuple2<>(token, 1));
>>         }
>>     }
>> }
>>
>> Thanks for your help!
>>
>> QiShu
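For anyone who finds this thread later: below is a minimal sketch of the workaround, assuming the file was written with upper-case field names. The schema string, path, and class name are hypothetical; the point is that the names passed to forOrcSchema must match the case stored in the file exactly, otherwise those columns are read back as null.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.orc.OrcTableSource;
import org.apache.flink.types.Row;
import org.apache.hadoop.conf.Configuration;
import org.apache.orc.TypeDescription;

public class OrcExactCaseExample {

    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env =
            ExecutionEnvironment.getExecutionEnvironment();

        // Field names copied verbatim from the file's schema; writing
        // "content"/"tagIndustry" here would silently yield nulls instead.
        String schema = "struct<CONTENT:string,TAGINDUSTRY:string>";

        OrcTableSource orcTableSource = OrcTableSource
            .builder()
            .path("hdfs:///path/to/data.orc") // hypothetical path
            .forOrcSchema(TypeDescription.fromString(schema))
            .withConfiguration(new Configuration())
            .build();

        DataSet<Row> rows = orcTableSource.getDataSet(env);
        rows.first(5).print(); // print a few rows to verify the fields are non-null
    }
}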