Re: Performance issue with RegistryAvroSerializationSchema

2020-02-07 Thread Robert Metzger
Steve, thanks a lot for looking into this closer! Let's discuss the resolution of the issue in the ticket Dawid has created: https://issues.apache.org/jira/browse/FLINK-15941 Best, Robert On Thu, Feb 6, 2020 at 6:59 PM Steve Whelan wrote: > Robert, > > You are correct that it is using a *Cache

Re: Performance issue with RegistryAvroSerializationSchema

2020-02-06 Thread Steve Whelan
Robert, You are correct that it is using a *CachedSchemaRegistryClient* object. Therefore, *schemaRegistryClient.*register() should be checking the cache first before sending a request to the Registry. However, turning on debug logging of my Registry, I can see a request being sent for every seria

Re: Performance issue with RegistryAvroSerializationSchema

2020-02-06 Thread Dawid Wysakowicz
Hi Steve, I think your observation is correct. If I am not mistaken we should use *schemaRegistryClient.getId(subject, schema); *instead of** **schemaRegistryClient.register(subject, schema);. **The former should perform an http request only if the schema is not in the cache. I created an issue t

Re: Performance issue with RegistryAvroSerializationSchema

2020-02-06 Thread Robert Metzger
Hi, thanks a lot for your message. It's certainly not intentional to do a HTTP request for every single message :) Isn't the *schemaRegistryClient *an instance of CachedSchemaRegistryClient, which, as the name says, caches? Can you check with a debugger at runtime what registry client is used, and

Performance issue with RegistryAvroSerializationSchema

2020-02-03 Thread Steve Whelan
Hi, I'm running Flink v1.9. I backported the commit adding serialization support for Confluent's schema registry[1]. Using the code as is, I saw a nearly 50% drop in peak throughput for my job compared to using *AvroRowSerializationSchema*. Looking at the code, *RegistryAvroSerializationSchema.se