MG> That depends on the underlying protocol you are attempting to proxy (see below).
________________________________
From: James Grant <ja...@queeg.org>
Sent: Monday, March 25, 2019 1:21 PM
To: users@kafka.apache.org
Subject: Re: Proxying the Kafka protocol

Thank you all.

We have in the past exposed message streams backed by Kafka via an HTTP POST and WebSocket service, which worked very well. We were able to filter messages based on schema compliance, and it was very simple for the teams that generate the data to use. It also had no trouble scaling to the 100K messages/sec level. However, not exposing the Kafka protocol has its drawbacks when you try to bring in other tools and teams who are already familiar with Kafka. So we looked for something that would provide:

* Native Kafka protocol support

MG> If the protocol you are trying to proxy is TCP/IP, then try the Juniper TCP proxy:
MG> https://www.juniper.net/documentation/en_US/junos/topics/concept/denial-of-service-network-tcp-proxy-understanding.html
MG> If the protocol you are trying to proxy is HTTP or HTTPS, then try implementing Squid: http://www.squid-cache.org/
MG> Either of the above proxies would let you whitelist/blacklist and implement NAT configurations for *all applications*.
MG> If, on the other hand, all you need to do is rewrite metadata, then stick with your "kafka proxy".

* Single endpoint access to make access between networks easier
* Schema (and possibly other business logic) enforcement

I took a couple of weeks to create a PoC that works, at least, with the producer and consumer command line tools. I have this working now and can insert a predicate into the PRODUCE message handler that can reject messages.

We plan to develop this further and take it beyond a PoC. I’d be keen to understand if you think this kind of component could be a good addition to the Kafka ecosystem. Are there any other capabilities that might be a good fit with this proxy layer? And most importantly, does anybody foresee any fundamental issues with this approach?

James Grant
Developer - Expedia Group

On Tue, 19 Mar 2019 at 16:13, Hans Jespersen <h...@confluent.io> wrote:

> You might want to take a look at kafka-proxy (see
> https://github.com/grepplabs/kafka-proxy).
> It’s a true Kafka protocol proxy and modifies the metadata, such as
> advertised listeners, so it works when there is no IP routing between the
> client and the brokers.
>
> -hans
>
> > On Mar 19, 2019, at 8:19 AM, James Grant <ja...@queeg.org> wrote:
> >
> > Hello,
> >
> > We would like to expose a Kafka cluster running on one network to clients
> > that are running on other networks without having to have full routing
> > between the two networks. In this case these networks are in different AWS
> > accounts, but the concept applies more widely. We would like to access Kafka
> > over a single (or very few) host names.
> >
> > In addition, we would like to filter incoming messages to enforce some level
> > of data quality and also impose some access control.
> >
> > A solution we are looking into is to provide a Kafka protocol level proxy
> > that presents to clients as a single-node Kafka cluster holding all the
> > topics and partitions of the cluster behind it. This proxy would be able to
> > operate in a load-balanced cluster behind a single DNS entry and would also
> > be able to intercept and filter/alter messages as they pass through.
> >
> > The advantages we see in this approach over the HTTP proxy are that it
> > presents the Kafka protocol and that we can put it behind a typical
> > TCP-level load balancer, which is easy to route connections to. This means
> > that we continue to use native Kafka clients.
> >
> > Does anything like this already exist? Does anybody think it would be useful?
> > Does anybody know of any reason it would be impossible (or a bad idea) to
> > do?
> >
> > James Grant
> >
> > Developer - Expedia Group
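
For readers following the thread, the two proxy behaviours discussed above (presenting the cluster as a single node, and a predicate on the PRODUCE path that can reject messages) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the PoC's code: it does not speak the Kafka wire protocol, and every name in it (`PartitionInfo`, `present_as_single_node`, `has_required_fields`, `filter_produce`) is hypothetical. The "schema" check is assumed to be nothing more than required top-level JSON fields.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class PartitionInfo:
    """Simplified stand-in for one partition entry in a metadata response."""
    topic: str
    partition: int
    leader: int  # node id of the broker that leads this partition


def present_as_single_node(partitions: List[PartitionInfo],
                           proxy_node_id: int = 0) -> List[PartitionInfo]:
    """Rewrite every partition leader to the proxy's node id, so a client
    sees one node holding all topics and partitions (hypothetical helper)."""
    return [PartitionInfo(p.topic, p.partition, proxy_node_id)
            for p in partitions]


# A predicate decides whether a single message value may be produced.
Predicate = Callable[[bytes], bool]


def has_required_fields(required: Tuple[str, ...]) -> Predicate:
    """Build a predicate that accepts only JSON objects containing every
    required top-level field (an assumed, very loose notion of schema)."""
    def check(value: bytes) -> bool:
        try:
            doc = json.loads(value)
        except ValueError:  # covers invalid JSON and undecodable bytes
            return False
        return isinstance(doc, dict) and all(k in doc for k in required)
    return check


def filter_produce(values: List[bytes],
                   accept: Predicate) -> Tuple[List[bytes], List[bytes]]:
    """Split incoming message values into (accepted, rejected)."""
    accepted: List[bytes] = []
    rejected: List[bytes] = []
    for value in values:
        (accepted if accept(value) else rejected).append(value)
    return accepted, rejected
```

In a real proxy the rejected values would presumably be turned into an error code in the produce response rather than returned as a list, and the metadata rewrite would also have to replace the broker host and port with the proxy's own endpoint.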