If spooldir doesn't suit, there's also https://github.com/streamthoughts/kafka-connect-file-pulse to check out. Also bear in mind that tools like Filebeat from Elastic support Kafka as an output target.
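For the "each line becomes a message" case (#1-#3 below), something along the lines of the spooldir line-delimited connector is usually the closest fit. A rough sketch of a connector config follows; the paths, topic name, and file pattern are made up for illustration, and the exact option names should be checked against the connector's docs before use:

{
  "name": "httpd-log-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirLineDelimitedSourceConnector",
    "topic": "httpd_logs_raw",
    "input.path": "/var/spool/kafka/input",
    "finished.path": "/var/spool/kafka/finished",
    "error.path": "/var/spool/kafka/error",
    "input.file.pattern": "^http.*\\.log$"
  }
}

POST that JSON to the Connect worker's REST API (e.g. curl -X POST -H "Content-Type: application/json" --data @spooldir.json http://connect:8083/connectors) and each line of every matched file is written as the string value of a message on the target topic.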
--
Robin Moffatt | Senior Developer Advocate | ro...@confluent.io | @rmoff


On Wed, 15 Jan 2020 at 12:48, George <george...@gmail.com> wrote:

> Hi Tom
>
> Will do. For now I have five specific integrations I need:
>
> 1. reading Apache web server log files (http.log)
> 2. reading in our custom log files
> 3. reading in log4j log files
> 4. MySQL connection as a source
> 5. Cassandra connection as a sink
>
> I cannot NFS-mount the source file system onto the Connect cluster; we
> don't allow NFS.
>
> I'm hoping to pull #1-#3 in with each line as the value field of a JSON
> message, then maybe use stream processing or kSQL to unpack it into a
> second message which can then be consumed, analysed etc.
>
> Bit amazed there is not a predefined connector for HTTP log files, though.
>
> G
>
>
> On Wed, Jan 15, 2020 at 12:32 PM Tom Bentley <tbent...@redhat.com> wrote:
>
> > Hi George,
> >
> > Since you mentioned CDC specifically, you might want to check out
> > Debezium (https://debezium.io/), which operates as a connector of the
> > sort Robin referred to and does CDC for MySQL and others.
> >
> > Cheers,
> >
> > Tom
> >
> > On Wed, Jan 15, 2020 at 10:18 AM Robin Moffatt <ro...@confluent.io> wrote:
> >
> > > The integration part of Apache Kafka that you're talking about is
> > > called Kafka Connect. Kafka Connect runs as its own process, known as
> > > a Kafka Connect worker, either on its own or as part of a cluster.
> > > Kafka Connect will usually be deployed on a separate instance from
> > > the Kafka brokers.
> > >
> > > Kafka Connect connectors will usually connect to the external system
> > > over the network if that makes sense (e.g. a database), but not always
> > > (e.g. if it's acting as a syslog endpoint, or maybe processing local
> > > files).
> > >
> > > You can learn more about Kafka Connect and its deployment model here:
> > > https://rmoff.dev/crunch19-zero-to-hero-kafka-connect
> > >
> > >
> > > --
> > > Robin Moffatt | Senior Developer Advocate | ro...@confluent.io | @rmoff
> > >
> > >
> > > On Wed, 15 Jan 2020 at 03:43, George <george...@gmail.com> wrote:
> > >
> > > > Hi all.
> > > >
> > > > Please advise; I'm still a real noob here, unpacking how the stack
> > > > works...
> > > >
> > > > Say I have a MySQL server, a web server, or a 2-node JBoss cluster.
> > > >
> > > > If I want to use the MySQL connector to connect to the MySQL DB and
> > > > pull data using CDC, do I then need to install the Kafka stack on
> > > > the DB server? I understand that this would be a standalone install,
> > > > presumably with no ZooKeeper involved.
> > > >
> > > > Similarly for the Apache web server and the 2 JBoss servers.
> > > >
> > > > G
>
> --
> You have the obligation to inform one honestly of the risk, and as a person
> you are committed to educate yourself to the total risk in any activity!
>
> Once informed & totally aware of the risk,
> every fool has the right to kill or injure themselves as they see fit!
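Footnote on the "unpack with kSQL" step mentioned above: a minimal sketch of what that could look like once the raw lines have landed on a topic. The topic name (httpd_logs_raw, matching the hypothetical spooldir config earlier), the column positions, and the formats are assumptions to illustrate the idea; exact syntax depends on the KSQL/ksqlDB version in use.

-- register the raw topic: each message value is one plain log line
CREATE STREAM httpd_logs_raw (line VARCHAR)
  WITH (KAFKA_TOPIC='httpd_logs_raw', VALUE_FORMAT='KAFKA');

-- derive a second, structured stream by crudely splitting each line on
-- spaces; SPLIT returns an array and ksqlDB arrays are 1-indexed
CREATE STREAM httpd_logs AS
  SELECT SPLIT(line, ' ')[1] AS client_ip,
         SPLIT(line, ' ')[6] AS method,
         SPLIT(line, ' ')[7] AS request_path
  FROM httpd_logs_raw;

A naive space split like this is only a starting point (e.g. the method field would still carry the leading quote from the combined log format); a real pipeline would use a proper regex or a stream processor to parse the line before writing the second message.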