Hi Youssef,

You need to provide more background context:

- Which Hive sink are you using? We are working on an official Hive sink
for the community, which will be released in 1.9. Did you develop yours in
house?
- What do you mean by 1st, 2nd, 3rd window? Do you mean the parallel
instances of the same operator, or do you have 3 windowing operations
chained?
- What does your Hive table look like? E.g. is it partitioned or
non-partitioned? If partitioned, how many partitions do you have? Is it
writing in static-partition or dynamic-partition mode? What format? How
large?
- What does your sink do - is each parallel instance writing to multiple
partitions or a single partition/table? Is it only appending data, or
upserting?
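
On the last point in your mail: Flink's DataStream API does offer count-based windows (`countWindow` on a keyed stream), so windowing by number of events rather than time is possible. Independently of windowing, the usual remedy when each write costs roughly the same regardless of batch size is to buffer events and flush in large batches. A minimal sketch of that idea, in plain Python with illustrative names (not Flink APIs): flush when the buffer reaches a target size or when a timeout elapses.

```python
import time

class BatchingSink:
    """Hypothetical count-or-time batching buffer in front of a slow writer."""

    def __init__(self, write_fn, max_batch=20_000, max_wait_s=5.0):
        self.write_fn = write_fn        # e.g. a function issuing one Hive INSERT
        self.max_batch = max_batch      # flush when this many events are buffered
        self.max_wait_s = max_wait_s    # ...or when the oldest event is this old
        self.buffer = []
        self.first_event_at = None

    def add(self, event):
        if self.first_event_at is None:
            self.first_event_at = time.monotonic()
        self.buffer.append(event)
        if (len(self.buffer) >= self.max_batch
                or time.monotonic() - self.first_event_at >= self.max_wait_s):
            self.flush()

    def flush(self):
        if self.buffer:
            self.write_fn(self.buffer)  # one large write instead of many small ones
            self.buffer = []
            self.first_event_at = None

# Usage: collect flushed batches instead of writing to Hive, for illustration.
batches = []
sink = BatchingSink(batches.append, max_batch=3, max_wait_s=60.0)
for e in range(7):
    sink.add(e)
sink.flush()  # flush the trailing partial batch
# batches is now [[0, 1, 2], [3, 4, 5], [6]]
```

With a fixed per-write overhead, fewer and larger writes raise throughput and relieve back pressure on the Kafka source.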

On Wed, Jul 3, 2019 at 1:38 AM Youssef Achbany <youssef.achb...@euranova.eu>
wrote:

> Dear all,
>
> I'm working on a big project, and one of the challenges is to read Kafka
> topics and copy them via Hive commands into Hive managed tables in order to
> enable Hive ACID properties.
>
> I tried it, but I have an issue with back pressure:
> - The first window read 20,000 events and wrote them to Hive tables
> - The second, third, ... windows send only 100 events each, because the
> write to Hive takes more time than the read from the Kafka topic. But
> writing 100 events or 50,000 events takes roughly the same time in Hive.
>
> Has someone already built this source and sink? Could you help with this,
> or do you have some tips?
> It seems that defining a window size based on the number of events instead
> of time is not possible. Is that true?
>
> Thank you for your help
>
> Youssef
>
> --
> ♻ Be green, keep it on the screen
>