1) It would be great to figure out the Kafka Gradle thing. Samza, Aurora
(and maybe a few other Apache projects) would benefit too. We need a better
way to bootstrap the gradle-wrapper.jar. I created
https://issues.apache.org/jira/browse/KAFKA-1714 to track that.

2) I have seen, more than a few times, a Kafka deployment set
advertised.host.name to the same value as broker.id, except with the "."
separators (host.name is often not even used). That is especially helpful
when using CloudFormation, Chef, Puppet, Ansible, etc. We could
auto-advertise the broker list better; I created
https://issues.apache.org/jira/browse/KAFKA-1715 for that.
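
A minimal sketch of the convention in point 2, reading it as "broker.id is the
IPv4 address with the dots removed". The function name kafka_broker_props is
made up for illustration, and how a provisioning tool obtains the IP is left
out. The mapping can be ambiguous (1.11.1.1 and 11.1.1.1 both give 11111), and
longer addresses may not fit if broker.id is parsed as a 32-bit integer, so
this is probably a convenience rather than a general scheme.

```shell
#!/bin/sh
# Hypothetical helper: print the two broker settings for one IPv4 address.
# Assumption: broker.id is simply the address with the dots stripped.
kafka_broker_props() {
  ip="$1"
  # Strip the dots, e.g. 172.30.2.76 -> 17230276
  id=$(printf '%s' "$ip" | tr -d '.')
  printf 'broker.id=%s\n' "$id"
  printf 'advertised.host.name=%s\n' "$ip"
}

# Example: emit the lines a provisioning tool could append to server.properties.
kafka_broker_props 172.30.2.76
```

The output could be appended to the broker's properties file by whatever
provisioning tool is in use.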

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Sat, Oct 18, 2014 at 12:03 AM, Ewen Cheslack-Postava <m...@ewencp.org>
wrote:

> The first issue he runs into is one I also find frustrating -- with
> cloud providers pushing SSDs, you have to use a pretty large instance
> type to get a reasonable test setup. I'm not sure if he couldn't launch
> an older type like m1.large (I think some newer AWS accounts aren't able
> to) or if he just didn't see it as an option since they are hidden by
> default. Even the largest general purpose instance types are pretty
> wimpy wrt storage, only 80GB local instance storage.
>
> The hostname issues are a well known pain point and unfortunately there
> aren't any great solutions that aren't EC2-specific. Here's a quick run
> down:
>
> * None of the images for popular distros on EC2 will auto-set the
> hostname beyond what EC2 already sets up (which isn't publicly
> routable). The following details might explain why they can't. For
> example, a recent Ubuntu image gives:
>
>   ubuntu@ip-172-30-2-76:~$ hostname
>   ip-172-30-2-76
>
>   ubuntu@ip-172-30-2-76:~$ cat /etc/hosts
>   127.0.0.1 localhost
>
>   # The following lines are desirable for IPv6 capable hosts
>   ::1 ip6-localhost ip6-loopback
>   --- cut irrelevant pieces ---
>
> * Sometimes the hostname is set, but isn't useful. For example, in this
> Ubuntu image, the hostname is set to "ip-[ip-address-]", but that isn't
> routable, so generates really irritating behavior. Running on the server
> itself (which is running in a VPC, see below for more details):
>
>   scala> InetAddress.getLocalHost
>   java.net.UnknownHostException: ip-172-30-2-76: ip-172-30-2-76: Name or service not known
>           at java.net.InetAddress.getLocalHost(InetAddress.java:1473)
>           at .<init>(<console>:9)
>           at .<clinit>(<console>)
>           at .<init>(<console>:11)
>           at .<clinit>(<console>)
>           at $print(<console>)
>           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>           at java.lang.reflect.Method.invoke(Method.java:606)
>           at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:704)
>           at scala.tools.nsc.interpreter.IMain$Request$$anonfun$14.apply(IMain.scala:920)
>           at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
>           at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
>           at java.lang.Thread.run(Thread.java:745)
>   Caused by: java.net.UnknownHostException: ip-172-30-2-76: Name or service not known
>           at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>           at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
>           at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
>           at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
>           ... 14 more
>
> * As described in a bunch of places, the only reliable way to get public
> DNS info is through EC2's own instance metadata API:
> https://forums.aws.amazon.com/thread.jspa?threadID=77788 For example:
>
>   curl -s http://169.254.169.254/latest/meta-data/public-hostname
>
> might give something like:
>
>   ec2-203-0-113-25.compute-1.amazonaws.com
>
> * But you may not even *have* a public DNS hostname. If you launch in a
> VPC, you'll only get one if you set the VPC to generate them (and I'm
> pretty sure the default is to not create them):
> http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-dns.html The
> output of the curl call above will just be empty.
>
> * AWS is pretty aggressively trying to move away from EC2-Classic (i.e.
> non-VPC instances), so most new instances will end up in VPCs unless you
> are working in a grandfathered account + AZ. If VPC without public DNS
> is the default, we'll have to carefully guide new users in generating a
> setup that works properly if we try to use hostnames.
>
> * Even if you try moving to IP addresses, you still have to deal with
> VPCs. You can't directly get your public IP address without accessing
> something outside the host since you're in a VPC. You need to use the
> instance metadata API to look it up, i.e.,
>
>   curl -s http://169.254.169.254/latest/meta-data/public-ipv4
>
> * And yet another problem with IPs: unless you use an elastic IP, you're
> not guaranteed they'll be stable:
>
>   Auto-assign Public IP
>
>   Requests a public IP address from Amazon's public IP address pool,
>   to make your instance reachable from the Internet. In most cases, the
>   public IP address is associated with the instance until it’s stopped
>   or terminated, after which it’s no longer available for you to use.
>   If you require a persistent public IP address that you can associate
>   and disassociate at will, use an Elastic IP address (EIP) instead. You
>   can allocate your own EIP, and associate it to your instance after
>   launch.
>
> I know Spark had some similar issues -- using their (very convenient!)
> ec2 script, you still ended up with some stuff in their web interface
> that linked to internal addresses such that the links wouldn't work. I'm
> not sure if they have figured out a decent workaround. But as you can
> see from the above, it's unlikely you can use generic approaches to get
> the info we need -- it'll need to be platform specific, which probably
> means it's better to determine it outside the main Kafka code and
> provide it via advertised.host.name.
>
> -Ewen
>
> On Fri, Oct 17, 2014, at 05:11 PM, Gwen Shapira wrote:
> > Basically, the issue (or at least one of very many possible network
> > issues...) is that the server has "localhost" hardcoded as its
> > canonical name in /etc/hosts:
> >
> > [root@Billc-cent70x64 ~]# cat /etc/hosts
> > 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 Billc-cent70x64
> > ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
> >
> > Unfortunately, this is a very common default for RedHat and CentOS
> > machines.
> >
> > As the blog mentions, a good solution (other than instructing Kafka on
> > the right name to advertise) is to add the correct IP and hostname to
> > /etc/hosts. We may want to add this option to the FAQ.
> >
> > Gwen
> >
> >
> >
> >
> > On Fri, Oct 17, 2014 at 7:56 PM, Gwen Shapira <gshap...@cloudera.com>
> > wrote:
> > > It looks like we are using canonical hostname:
> > >
> > >   def register() {
> > >     val advertisedHostName =
> > >       if (advertisedHost == null || advertisedHost.trim.isEmpty)
> > >         InetAddress.getLocalHost.getCanonicalHostName
> > >       else
> > >         advertisedHost
> > >     val jmxPort = System.getProperty("com.sun.management.jmxremote.port", "-1").toInt
> > >     ZkUtils.registerBrokerInZk(zkClient, brokerId, advertisedHostName,
> > >       advertisedPort, zkSessionTimeoutMs, jmxPort)
> > >   }
> > >
> > > So never mind :)
> > >
> > >
> > > On Fri, Oct 17, 2014 at 6:36 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > >> Hmm, yes, actually I don't think I actually understand the issue.
> > >> Basically as I understand it we do
> > >> InetAddress.getLocalHost.getHostAddress which on AWS picks the wrong
> > >> hostname/ip and then the producer can't connect. People eventually find
> > >> this FAQ, but I was hoping there was a more automatic way since everyone
> > >> is on AWS these days. Maybe getCanonicalHostName would fix it?
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whycan'tmyconsumers/producersconnecttothebrokers?
> > >>
> > >> -Jay
> > >>
> > >> On Fri, Oct 17, 2014 at 3:19 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > >>
> > >>> In #2, do you refer to advertising the "internal" hostname instead of
> > >>> the external one?
> > >>> In this case, will it be enough to use getCanonicalHostName (which
> > >>> uses a name service)?
> > >>>
> > >>> Note that I think the problem the blog reported (wrong name
> > >>> advertised) is somewhat orthogonal to the question of which interface
> > >>> we bind to (which should probably be the default interface).
> > >>>
> > >>> Gwen
> > >>>
> > >>> On Fri, Oct 17, 2014 at 5:28 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > >>> > This guy documented a few struggles getting going with Kafka. Not
> > >>> > sure if there is anything we can do to make it better?
> > >>> > http://ispyker.blogspot.com/2014/10/kafka-part-1.html
> > >>> >
> > >>> > 1. Would be great to figure out the apache/gradle thing.
> > >>> > 2. The problem of having Kafka advertise localhost on AWS is really
> > >>> > common. I was thinking one possible solution for this would be to
> > >>> > get all the interfaces and prefer non-localhost interfaces if they
> > >>> > exist.
> > >>> >
> > >>> > -Jay
> > >>>
>
