The newest version of Spark (1.3.0) came out today.
https://spark.apache.org/releases/spark-release-1-3-0.html
Apparently they made improvements to the Kafka connector for Spark
Streaming (see Approach 2):
http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
Best,
Niek
Many similarities.
For Kinesis right now:
* only a 1 day max retention
* max 50KB message size
* guaranteed throughput based on MB/sec in and out.
* servers hosting the shards abstracted away by SaaS
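To make the MB/sec point concrete, here is a rough Python sketch of sizing a Kinesis stream from expected traffic. It assumes per-shard limits of 1 MB/s in and 2 MB/s out (what AWS documented at the time; check the current docs). `shards_needed` is a hypothetical helper, not part of any AWS SDK.

```python
import math

# Assumed per-shard limits: 1 MB/s in and 2 MB/s out, as documented by
# AWS at the time of writing. Verify against current Kinesis docs.
SHARD_IN_MB_PER_SEC = 1.0
SHARD_OUT_MB_PER_SEC = 2.0


def shards_needed(in_mb_per_sec: float, out_mb_per_sec: float,
                  shard_in: float = SHARD_IN_MB_PER_SEC,
                  shard_out: float = SHARD_OUT_MB_PER_SEC) -> int:
    """Hypothetical helper: smallest shard count covering both directions."""
    if in_mb_per_sec < 0 or out_mb_per_sec < 0:
        raise ValueError("throughput must be non-negative")
    by_ingress = math.ceil(in_mb_per_sec / shard_in)
    by_egress = math.ceil(out_mb_per_sec / shard_out)
    # A stream always has at least one shard.
    return max(1, by_ingress, by_egress)


if __name__ == "__main__":
    # 5 MB/s written, read by 3 independent consumers -> 15 MB/s out.
    print(shards_needed(5.0, 15.0))  # prints 8
```

Note that with several consumers the outbound side usually dominates, since every consumer counts against the same per-shard read limit.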
For collaborative consumption, Kinesis uses DynamoDB whereas Kafka
uses Zookeeper.
Until recent
Congrats!
On Thu, Nov 6, 2014 at 10:28 AM, Jay Kreps wrote:
> Hey all,
>
> I’m happy to announce that Jun Rao, Neha Narkhede and I are creating a
> company around Kafka called Confluent. We are planning on productizing the
> kind of Kafka-based real-time data platform we built out at LinkedIn. We
If you really only care about small scale (no HA, no horizontal
scaling), you could also consider using Redis instead of Kafka for
queueing.
- Niek
On Tue, May 20, 2014 at 2:23 PM, S Ahmed wrote:
> Yes agreed, but I have done some load testing before and kafka was doing
> 10's of thousands of m
How are you measuring memory usage? I would expect the OS page cache
to take 100% of unused memory, but that's not the same as being OOM.
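One way to check is to read /proc/meminfo on a Linux broker and subtract the reclaimable buffers and page cache before concluding that memory is exhausted. A minimal Python sketch, assuming the classic `MemTotal`/`MemFree`/`Buffers`/`Cached` fields; the helper names and the sample numbers (sized for a 48GB box) are made up for illustration.

```python
import os

# Hypothetical sample for a 48GB machine where most RAM is page cache.
SAMPLE = """MemTotal:       50331648 kB
MemFree:          512000 kB
Buffers:          204800 kB
Cached:         45056000 kB
"""


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip anything that is not a "Field: value" line
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def used_excluding_cache_kb(info: dict) -> int:
    """Memory that is neither free nor buffers/page cache (roughly the
    '-/+ buffers/cache' used figure shown by older versions of `free`)."""
    return (info["MemTotal"] - info["MemFree"]
            - info.get("Buffers", 0) - info.get("Cached", 0))


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE  # not on Linux: fall back to the sample
    info = parse_meminfo(text)
    print("used excluding cache: %d kB of %d kB"
          % (used_excluding_cache_kb(info), info["MemTotal"]))
```

With the sample above, "free" memory looks tiny (about 500MB), but only about 4.3GB is actually held by processes; the rest is cache the kernel can reclaim.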
- Niek
On Mon, Mar 24, 2014 at 12:38 PM, Cassa L wrote:
> Hi,
> We have been doing some evaluation testing against Kafka. We have 48GB RAM
> on each broker.
A point release focusing on stability would definitely be nice.
And maybe a table in a wiki marking the stability of various features
(core, replication, synchronous messaging, compaction, rebalancing,
topic create, etc), so that people don't end up in the danger zone on
prod deployments.
- Niek
Using a custom partitioner lets you do a "gather" step and exploit data
locality.
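What makes the "gather" work is that routing is deterministic per key, so every message with the same key (e.g. a customer id) lands in one partition and is therefore seen by one consumer. A hypothetical Python sketch of that routing; this is not Kafka's producer API, and `zlib.crc32` is an arbitrary stable hash, not the one Kafka's default partitioner uses.

```python
import zlib
from collections import defaultdict


def partition_for(customer_id: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash.

    Python's built-in hash() of a str is randomized per process, so a
    stable hash (crc32 here) is used to keep routing consistent.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(customer_id.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Group (customer_id, payload) pairs by their target partition."""
    partitions = defaultdict(list)
    for customer_id, payload in messages:
        p = partition_for(customer_id, num_partitions)
        partitions[p].append((customer_id, payload))
    return dict(partitions)


if __name__ == "__main__":
    msgs = [("cust-1", "a"), ("cust-2", "b"), ("cust-1", "c")]
    for p, items in sorted(route(msgs, 4).items()):
        print(p, items)
```

Both "cust-1" messages always end up in the same partition, in the order they were sent, so a single consumer can batch them into one write.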
Example use case: a consumer reads messages from a topic and splits them by customer id.
Each customer id has its own database table. With a custom partitioner,
you can send all data for a given customer id to the same partition and
With at-least-once, you can retry until your target confirms delivery.
Trivial.
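A minimal Python sketch of that retry loop, using a hypothetical `FlakyTarget` whose confirmations are sometimes lost after the message was already delivered. A lost confirmation makes the sender deliver again, so the target can end up holding duplicates; that is the price of at-least-once.

```python
import random


class FlakyTarget:
    """Hypothetical receiver: every delivery is stored, but the ack sent
    back to the sender is dropped with probability ack_loss_rate."""

    def __init__(self, ack_loss_rate: float, seed: int = 0):
        self.received = []
        self.ack_loss_rate = ack_loss_rate
        self.rng = random.Random(seed)

    def deliver(self, msg) -> bool:
        self.received.append(msg)  # delivery itself always succeeds
        return self.rng.random() >= self.ack_loss_rate  # True = ack arrived


def send_at_least_once(target, msg, max_attempts: int = 100) -> int:
    """Retry until the target confirms; return the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        if target.deliver(msg):
            return attempt
    raise RuntimeError("no confirmation after %d attempts" % max_attempts)


if __name__ == "__main__":
    target = FlakyTarget(ack_loss_rate=0.5, seed=1)
    attempts = send_at_least_once(target, "order-42")
    # Every attempt delivered a copy, so attempts - 1 copies are duplicates.
    print(attempts, target.received)
```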
Exactly once means handling all sorts of nasty cases. E.g. delivered but
not confirmed by recipient due to crash. Delivered but not yet processed
by client because it's still in the incoming queue. (Sending again would mean >