Kafka and Spark 1.3.0

2015-03-13 Thread Niek Sanders
The newest version of Spark came out today. https://spark.apache.org/releases/spark-release-1-3-0.html Apparently they made improvements to the Kafka connector for Spark Streaming (see Approach 2): http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html Best, Niek

Re: Is AWS Kinesis Kafka?

2014-11-13 Thread Niek Sanders
Many similarities. For Kinesis right now:
* only a 1-day max retention
* max 50 KB message size
* guaranteed throughput based on MB/sec in and out
* servers hosting the shards abstracted away by the managed service
For coordinating consumers, Kinesis uses DynamoDB whereas Kafka uses Zookeeper. Until recent

Re: Announcing Confluent

2014-11-06 Thread Niek Sanders
Congrats! On Thu, Nov 6, 2014 at 10:28 AM, Jay Kreps wrote: > Hey all, > > I’m happy to announce that Jun Rao, Neha Narkhede and I are creating a > company around Kafka called Confluent. We are planning on productizing the > kind of Kafka-based real-time data platform we built out at LinkedIn. We

Re: starting off at a small scale, single ec2 instance with 7.5 GB RAM with kafka

2014-05-20 Thread Niek Sanders
If you really only care about small scale (no HA, no horizontal scaling), you could also consider using Redis instead of Kafka for queueing. - Niek On Tue, May 20, 2014 at 2:23 PM, S Ahmed wrote: > Yes agreed, but I have done some load testing before and kafka was doing > 10's of thousands of m

Re: Memory consumption in Kafka

2014-03-24 Thread Niek Sanders
How are you measuring memory usage? I would expect the OS page cache to take 100% of unused memory, but that's not the same as being OOM. - Niek On Mon, Mar 24, 2014 at 12:38 PM, Cassa L wrote: > Hi, > We have been doing some evaluation testing against Kafka. We have 48GB RAM > on each broker.

Re: 0.8.1 stability

2014-03-18 Thread Niek Sanders
A point release focusing on stability would definitely be nice. And maybe a table in a wiki marking the stability of various features (core, replication, synchronous messaging, compaction, rebalancing, topic creation, etc), so that people don't end up in the danger zone on prod deployments. - Niek

Re: Why would one choose a partition when producing?

2013-11-05 Thread Niek Sanders
Using a custom partitioner lets you do a "gather" step and exploit data locality. Example use case: a consumer splits a topic's messages by customer id, and each customer id has its own database table. With a custom partitioner, you can send all data for a given customer id to the same partition and
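The key-to-partition mapping described above can be sketched as follows. This is a standalone illustration of the idea, not Kafka's actual producer `Partitioner` API; the function name and customer ids are made up:

```python
# Sketch: key-based partitioning by customer id, so that all of one
# customer's messages land on the same partition (data locality).
# Illustrative only -- not the real Kafka partitioner interface.

def partition_for_customer(customer_id: str, num_partitions: int) -> int:
    """Deterministically map a customer id to one partition."""
    # A stable hash (not Python's randomized hash()) keeps the
    # mapping consistent across producer processes.
    h = sum(ord(c) * 31 ** i for i, c in enumerate(customer_id))
    return h % num_partitions

# Every message for customer "acme" goes to the same partition,
# so one consumer instance sees all of acme's data in order.
p = partition_for_customer("acme", 8)
assert all(partition_for_customer("acme", 8) == p for _ in range(100))
```

Because the consumer that owns that partition sees every message for a given customer, it can batch writes to that customer's table without any cross-consumer coordination.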

Re: at-least-once guarantee?

2013-08-07 Thread Niek Sanders
With at-least-once, you can retry until your target confirms delivery. Trivial. Exactly-once means handling all sorts of nasty cases. E.g. delivered but not confirmed by the recipient due to a crash. Delivered but not yet processed by the client because it's in the incoming queue. (Sending again would mean >
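The retry half of at-least-once, plus consumer-side deduplication to tolerate the resulting duplicates, can be sketched as follows. All names and structure here are illustrative, not any real Kafka client API:

```python
# Sketch: at-least-once delivery via retry, with the consumer
# deduplicating on message id so redelivery is harmless.

def deliver_at_least_once(send, msg_id, payload, max_attempts=5):
    """Retry until the target confirms delivery (send returns True)."""
    for _ in range(max_attempts):
        if send(msg_id, payload):
            return True
    raise RuntimeError("delivery not confirmed")

class DedupingConsumer:
    """Processes each message id at most once, so duplicates
    produced by retries are silently dropped."""
    def __init__(self):
        self.seen = set()
        self.processed = []

    def handle(self, msg_id, payload):
        if msg_id in self.seen:
            return  # duplicate from a retry; ignore it
        self.seen.add(msg_id)
        self.processed.append(payload)

# Simulate the nasty case from the text: the message is delivered,
# but the confirmation is lost the first time (e.g. a crash).
consumer = DedupingConsumer()
attempts = {"n": 0}

def flaky_send(msg_id, payload):
    consumer.handle(msg_id, payload)  # delivered...
    attempts["n"] += 1
    return attempts["n"] > 1          # ...but only confirmed on retry

deliver_at_least_once(flaky_send, "m1", "hello")
assert consumer.processed == ["hello"]  # processed exactly once
```

The retry loop alone gives at-least-once; it is the dedup set on the consumer side that turns duplicate deliveries into effectively-once processing, which is why the exactly-once cases in the message above are the hard part.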