Hi Youssef,

You need to provide more background context:
- Which Hive sink are you using? We are working on the official Hive sink for the community, which will be released in 1.9. So did you develop yours in house?
- What do you mean by 1st, 2nd, 3rd window? Do you mean the parallel instances of the same operator, or do you have 3 windowing operations chained?
- What does your Hive table look like? E.g. is it partitioned or non-partitioned? If partitioned, how many partitions do you have? Is it writing in static or dynamic partition mode? What format? How large?
- What does your sink do: is each parallel instance writing to multiple partitions or a single partition/table? Is it only appending data or upserting?

On Wed, Jul 3, 2019 at 1:38 AM Youssef Achbany <youssef.achb...@euranova.eu> wrote:
> Dear all,
>
> I'm working on a big project, and one of the challenges is to read Kafka
> topics and copy them via Hive commands into Hive managed tables in order
> to enable Hive's ACID properties.
>
> I tried it, but I have an issue with back pressure:
> - The first window read 20,000 events and wrote them to Hive tables.
> - The second, third, ... send only 100 events, because the write to Hive
> takes more time than the read from a Kafka topic. But writing 100 events
> or 50,000 events takes roughly the same time for Hive.
>
> Has someone already built this kind of source and sink? Could you help
> with this, or do you have some tips?
> It seems that defining a window size based on the number of events
> instead of time is not possible. Is that true?
>
> Thank you for your help
>
> Youssef
>
> --
> ♻ Be green, keep it on the screen
>
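P.S. On the count-based window question: Flink's DataStream API does offer count windows (e.g. `countWindow` on a keyed stream), so windowing by number of events rather than time is possible. The underlying idea — buffer events and flush once a size threshold is reached, since one Hive write costs roughly the same whether it carries 100 or 50,000 rows — can be sketched in plain, framework-agnostic code. This is only an illustration under that assumption; the class and parameter names are hypothetical, not Flink API:

```python
# Minimal, framework-agnostic sketch of count-based batching (hypothetical
# names, not Flink API): buffer incoming events and flush once `batch_size`
# is reached, amortizing the roughly constant per-write cost of a Hive commit.

class CountBatcher:
    def __init__(self, batch_size, flush_fn):
        self.batch_size = batch_size
        self.flush_fn = flush_fn  # e.g. a function issuing one bulk Hive INSERT
        self.buffer = []

    def add(self, event):
        """Buffer one event; flush automatically when the batch is full."""
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write out whatever is buffered, if anything, then reset."""
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []


# Usage: collect flushed batches in a list instead of writing to Hive.
batches = []
batcher = CountBatcher(3, batches.append)
for event in range(7):
    batcher.add(event)
batcher.flush()  # flush the partial final batch
# batches is now [[0, 1, 2], [3, 4, 5], [6]]
```

In a real pipeline the `flush_fn` would issue one large insert per batch; the point is that the write frequency is governed by event count, not wall-clock time, which sidesteps the tiny-batch problem described above.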