Just in case it is a metrics bug, I will add a step to do my own counting in the Flink job.
Michael > On Apr 25, 2018, at 9:52 AM, TechnoMage <mla...@technomage.com> wrote: > > I have another java program reading the topic to monitor the test. It > receives 60,000 records on the “travel” topic, while the kafka consumer only > reports 4,138. That and the incongruity of the source to the maps are what > seems very weird. I have several other topics where the source is built with > the exact same code, and where there are multiple maps following, and the > numbers all match expectations. > > Michael > >> On Apr 25, 2018, at 12:59 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org >> <mailto:tzuli...@apache.org>> wrote: >> . >> Hi, >> >> Just to clarify your observation here: >> >> Is the problem the fact that map operators after the “source: travel” Kafka >> topic source do not receive all records from the source? >> This does seem weird, but as of now I don’t really have ideas yet of how >> this could maybe be Flink related. >> >> One other thing to be sure of - have you verified that the outputs of your >> test are incorrect? >> Or are you assuming that it is incorrect based on the weird metric numbers >> shown on the web ui? >> >> Cheers, >> Gordon >> >> On 25 April 2018 at 6:13:07 AM, TechnoMage (mla...@technomage.com >> <mailto:mla...@technomage.com>) wrote: >> >>> Still trying to figure out what is happening with this kafka topic. >>> >>> 1) No exceptions in the task manager log or the UI exceptions tab. >>> 2) Topic partitions being reset to offset 0 confirmed in log. >>> 3) Other topics in this and other tests show full consumption of messages >>> (all JSON format text). >>> 4) The source shows more records output than are received by 2 of the 3 >>> following stages. >>> >>> The diagram for the job is below, as is the GUI showing tasks and record >>> counts. >>> >>> <38f1e887-8f89-453c-8289-442ae2af1...@hsd1.co.comcast.net >>> <mailto:38f1e887-8f89-453c-8289-442ae2af1...@hsd1.co.comcast.net>.> >>> >>>> On Apr 23, 2018, at 11:23 AM, TechnoMage <mla...@technomage.com >>>> <mailto:mla...@technomage.com>> wrote: >>>> >>>> I have been using the kafka connector sucessfully for a while now. But, >>>> am getting weird results in one case. >>>> >>>> I have a test that submits 3 streams to kafka topics, and monitors them on >>>> a separate process. The flink job has a source for each topic, and one >>>> such is fed to 3 separate map functions that lead to other operators. >>>> This topic only shows 6097 out of 30000 published, and the map functions >>>> following the source only show a fraction of that as received. The >>>> consumer is configured to start at the begining and in other cases the >>>> same code receives all messages published. The parallelism is 6 if that >>>> makes a difference, as is the partitioning on the topics. >>>> >>>> The code for creating the topic is below. >>>> >>>> Any suggestions on why it is missing so many messages would be welcome. >>>> >>>> Michael >>>> >>>> String topic = a.kafkaTopicName(ets); >>>> Properties props = new Properties(); >>>> props.setProperty("bootstrap.servers", servers); >>>> props.setProperty("group.id <http://group.id/>", >>>> UUID.randomUUID().toString()); >>>> props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); >>>> DataStream<String> ds = consumers.get(a.eventType); >>>> if (ds == null) { >>>> FlinkKafkaConsumer011<String> cons = new >>>> FlinkKafkaConsumer011<String>( >>>> topic, new SimpleStringSchema(), props); >>>> cons.setStartFromEarliest(); >>>> ds = env.addSource(cons).name(et.name).rebalance(); >>>> consumers.put(a.eventType, ds); >>>> }