Build failed in Jenkins: Kafka-trunk #187

2014-05-20 Thread Apache Jenkins Server
See 

Changes:

[jjkoshy] KAFKA-1437; Consumer metadata response should include (empty) 
coordinator information if the coordinator is unavailable; reviewed by Neha 
Narkhede and Guozhang Wang.

[neha.narkhede] KAFKA-1431 ConsoleConsumer - Option to clean zk consumer 
path; reviewed by Neha Narkhede and Jun Rao

[jay.kreps] KAFKA-1445 Send all partitions, regardless of how full, whenever we 
are sending a request to a broker. Patch from Guozhang.

[neha.narkhede] KAFKA-1179 createMessageStreams() in 
javaapi.ZookeeperConsumerConnector does not throw; reviewed by Neha Narkhede

[junrao] kafka-1453; Add a channel queue jmx in Mirror Maker;  patched by 
Guozhang Wang; reviewed by Jun Rao

[junrao] kafka-1453 (follow-up); Add a channel queue jmx in Mirror Maker;  
patched by Guozhang Wang; reviewed by Jun Rao

[junrao] kafka-1453 (2nd follow-up); Add a channel queue jmx in Mirror Maker;  
patched by Guozhang Wang; reviewed by Jun Rao

--
[...truncated 1689 lines...]

kafka.integration.PrimitiveApiTest > testMultiProduce FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at org.apache.zookeeper.server.NIOServerCnxn$Factory.<init>(NIOServerCnxn.java:144)
at org.apache.zookeeper.server.NIOServerCnxn$Factory.<init>(NIOServerCnxn.java:125)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:32)
at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.integration.PrimitiveApiTest.kafka$integration$KafkaServerTestHarness$$super$setUp(PrimitiveApiTest.scala:39)
at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:35)
at kafka.integration.PrimitiveApiTest.kafka$integration$ProducerConsumerTestHarness$$super$setUp(PrimitiveApiTest.scala:39)
at kafka.integration.ProducerConsumerTestHarness$class.setUp(ProducerConsumerTestHarness.scala:33)
at kafka.integration.PrimitiveApiTest.setUp(PrimitiveApiTest.scala:39)

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at org.apache.zookeeper.server.NIOServerCnxn$Factory.<init>(NIOServerCnxn.java:144)
at org.apache.zookeeper.server.NIOServerCnxn$Factory.<init>(NIOServerCnxn.java:125)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:32)
at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.integration.PrimitiveApiTest.kafka$integration$KafkaServerTestHarness$$super$setUp(PrimitiveApiTest.scala:39)
at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:35)
at kafka.integration.PrimitiveApiTest.kafka$integration$ProducerConsumerTestHarness$$super$setUp(PrimitiveApiTest.scala:39)
at kafka.integration.ProducerConsumerTestHarness$class.setUp(ProducerConsumerTestHarness.scala:33)
at kafka.integration.PrimitiveApiTest.setUp(PrimitiveApiTest.scala:39)

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at org.apache.zookeeper.server.NIOServerCnxn$Factory.<init>(NIOServerCnxn.java:144)
at org.apache.zookeeper.server.NIOServerCnxn$Factory.<init>(NIOServerCnxn.java:125)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:32)
at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.integration.PrimitiveApiTest.kafka$integration$KafkaServerTestHarness$$super$setUp(PrimitiveApiTest.scala:39)
at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:35)
at kafka.integration.PrimitiveApiTest.kafka$integration$ProducerConsumerTestHarness$$super$setUp(PrimitiveApiTest.scala:39)
at kafka.integration.ProducerConsumerTestHarness$class.setUp(ProducerConsumerTestHarness.scala:33)
at kafka.integration.PrimitiveApiTest.setUp(PrimitiveApiTest.scala:39)

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooHigh 
FAILED
java.net.Bind

Re: Make kafka storage engine pluggable and provide a HDFS plugin?

2014-05-20 Thread François Langelier
Take a look at Camus 





On 19 May 2014 05:28, Hangjun Ye  wrote:

> Hi there,
>
> I recently started to use Kafka for our data analysis pipeline and it works
> very well.
>
> One problem to us so far is expanding our cluster when we need more storage
> space.
> Kafka provides some scripts for helping do this but the process wasn't
> smooth.
>
> To make it work perfectly, it seems Kafka needs to do work that a distributed 
> file system has already done.
> So I am wondering if there are any thoughts on making Kafka work on top of 
> HDFS? Maybe make the Kafka storage engine pluggable, with HDFS as one option?
>
> The pros might be that HDFS has already handled storage management
> (replication, corrupted disk/machine, migration, load balance, etc.) very
> well and it frees Kafka and the users from the burden, and the cons might
> be performance degradation.
> As Kafka performs very well, it would likely remain competitive in most 
> situations even with some degree of degradation.
>
> Best,
> --
> Hangjun Ye
>


[jira] [Commented] (KAFKA-1444) kafka.javaapi.TopicMetadata and PartitionMetadata doesn't forward the toString method

2014-05-20 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003643#comment-14003643
 ] 

Sriharsha Chintalapani commented on KAFKA-1444:
---

Created reviewboard https://reviews.apache.org/r/21712/diff/
 against branch origin/trunk

> kafka.javaapi.TopicMetadata and PartitionMetadata doesn't forward the 
> toString method
> -
>
> Key: KAFKA-1444
> URL: https://issues.apache.org/jira/browse/KAFKA-1444
> Project: Kafka
>  Issue Type: Bug
>Reporter: Simon Cooper
>Priority: Minor
>  Labels: newbie
> Attachments: KAFKA-1444.patch
>
>
> the kafka.javaapi.PartitionMetadata and TopicMetadata classes don't forward 
> the toString method to the underlying 
> kafka.api.PartitionMetadata/TopicMetadata classes along with the other 
> methods. This means toString on these classes doesn't work properly.
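The missing forwarding amounts to a one-line override. A minimal sketch of the pattern (class names and the returned string are illustrative, not the actual kafka.javaapi source):

```java
// Sketch of forwarding toString from a javaapi wrapper to the underlying
// class. Names are illustrative stand-ins, not the actual Kafka classes.
class UnderlyingTopicMetadata {
    @Override
    public String toString() {
        return "TopicMetadata(topic=demo,partitions=1)";
    }
}

class JavaApiTopicMetadata {
    private final UnderlyingTopicMetadata underlying;

    JavaApiTopicMetadata(UnderlyingTopicMetadata underlying) {
        this.underlying = underlying;
    }

    // The reported bug: without this override, toString() falls back to
    // Object.toString() and prints only a class name and hash code.
    @Override
    public String toString() {
        return underlying.toString();
    }
}

public class ToStringForwarding {
    public static void main(String[] args) {
        System.out.println(new JavaApiTopicMetadata(new UnderlyingTopicMetadata()));
    }
}
```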



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1444) kafka.javaapi.TopicMetadata and PartitionMetadata doesn't forward the toString method

2014-05-20 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani updated KAFKA-1444:
--

Attachment: KAFKA-1444.patch

> kafka.javaapi.TopicMetadata and PartitionMetadata doesn't forward the 
> toString method
> -
>
> Key: KAFKA-1444
> URL: https://issues.apache.org/jira/browse/KAFKA-1444
> Project: Kafka
>  Issue Type: Bug
>Reporter: Simon Cooper
>Priority: Minor
>  Labels: newbie
> Attachments: KAFKA-1444.patch
>
>
> the kafka.javaapi.PartitionMetadata and TopicMetadata classes don't forward 
> the toString method to the underlying 
> kafka.api.PartitionMetadata/TopicMetadata classes along with the other 
> methods. This means toString on these classes doesn't work properly.





Review Request 21712: Patch for KAFKA-1444

2014-05-20 Thread Sriharsha Chintalapani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21712/
---

Review request for kafka.


Bugs: KAFKA-1444
https://issues.apache.org/jira/browse/KAFKA-1444


Repository: kafka


Description
---

KAFKA-1444. kafka.javaapi.TopicMetadata and PartitionMetadata doesn't forward 
the toString method.


Diffs
-

  core/src/main/scala/kafka/javaapi/TopicMetadata.scala 
d08c3f4af51e742ac441e65c97d547f097c169ca 
  core/src/main/scala/kafka/javaapi/TopicMetadataResponse.scala 
252a0c9d6c12b5e79426b7eca85ab0bb067b42b0 

Diff: https://reviews.apache.org/r/21712/diff/


Testing
---


Thanks,

Sriharsha Chintalapani



Re: Review Request 21588: Fix KAFKA-1430

2014-05-20 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21588/#review43423
---


The patch looks promising. The places that we make a call to unblock 
producer/consumer need some adjustment. The following is the summary.

1. To unblock produce requests:
1.1 ack=-1: when HW moves (not completely handled in the patch)
1.2 ack>1: when a follower fetch request is received (not handled in the patch)

2. To unblock regular consumer requests:
2.1 when HW actually moves (not completely handled in the patch)

3. To unblock follower consumer requests:
3.1 when log append completes in leader replica (handled in the patch)

To handle 1.1 and 2.1:
We can probably register an onLeaderAndHWChangeCallback in Partition. The 
callback will be invoked in Partition.maybeIncrementLeaderHW() when the HW 
actually moves, and will try to unblock both produce and consumer requests. 
Also, when a replica transitions from leader to follower, we should call 
onLeaderAndHWChangeCallback.

To handle 1.2:
We need to call unblockProducer() in KafkaApis.maybeUpdatePartitionHw().
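The callback in points 1.1 and 2.1 is essentially an observer on high-watermark movement. A minimal sketch of the pattern, with illustrative names (the real Partition/ReplicaManager code differs):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative observer pattern for the onLeaderAndHWChangeCallback idea:
// the partition fires registered callbacks only when the HW actually moves.
class Partition {
    private long highWatermark = 0L;
    private final List<Runnable> hwChangeCallbacks = new ArrayList<>();

    void onLeaderAndHWChange(Runnable callback) {
        hwChangeCallbacks.add(callback);
    }

    // Mirrors the maybeIncrementLeaderHW idea: fire only on an actual move.
    void maybeIncrementLeaderHW(long candidate) {
        if (candidate > highWatermark) {
            highWatermark = candidate;
            hwChangeCallbacks.forEach(Runnable::run);
        }
    }

    long highWatermark() { return highWatermark; }
}

public class HwCallbackDemo {
    public static void main(String[] args) {
        Partition p = new Partition();
        int[] fired = {0};
        p.onLeaderAndHWChange(() -> fired[0]++); // e.g. unblock produce/fetch requests
        p.maybeIncrementLeaderHW(5);  // HW moves -> callback fires
        p.maybeIncrementLeaderHW(5);  // no move -> no callback
        System.out.println(fired[0]); // 1
    }
}
```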



core/src/main/scala/kafka/cluster/Partition.scala


It seems that we can maintain both segmentBaseOffset and position 
incrementally. We can pass both info through 
ReplicaManager.recordFollowerPosition() while updating the logEndOffset in the 
follower replicas.

Since we always update/read offset/position/baseOffset together, we 
probably can package them better. I was thinking of creating the following case 
class. In each Replica (including the leader replica), we maintain a volatile 
var logEndOffset of OffsetMetadata. Every time logEndOffset changes, we create a 
new instance of OffsetMetadata and update logEndOffset. This makes sure that 
any reader will always read the three values in OffsetMetadata corresponding to 
the same offset. The implementation in the patch has the issue that a reader may 
see values corresponding to different offsets, since they are maintained 
separately.

case class OffsetMetadata(
  offset: Long,
  segmentBaseOffset: Long,
  relativePositionInSegment: Int
)

In the leader replica (perhaps it's better in Partition?), we also maintain 
a volatile var highWatermark of OffsetMetadata.



core/src/main/scala/kafka/log/FileMessageSet.scala


Log.read() could return an empty ByteBufferMessageSet. In this case, we 
still need to know the initial fetch position and baseOffset. So, instead of 
tracking position and baseOffset in FileMessageSet, we could probably return a 
FetchDataInfo which includes a MessageSet and position and baseOffset.



core/src/main/scala/kafka/server/KafkaApis.scala


We probably can just simplify the comment to "Add the produce request for 
watch if it's not satisfied".



core/src/main/scala/kafka/server/KafkaApis.scala


Perhaps we can optimize this a bit: only call unblock if HW is actually 
updated. We can return whether HW is updated in recordFollowerPosition().



core/src/main/scala/kafka/server/KafkaApis.scala


typo satisified



core/src/main/scala/kafka/server/RequestPurgatory.scala


either "the rest of" or "remaining"



core/src/main/scala/kafka/server/RequestPurgatory.scala


Should that be decrement?



core/src/main/scala/kafka/server/RequestPurgatory.scala


Should that be decrement?



core/src/main/scala/kafka/server/RequestPurgatory.scala


The purgeSatisfied() here is redundant since it's already done in 
purgeSatisfied().



core/src/main/scala/kafka/server/RequestPurgatory.scala





- Jun Rao


On May 16, 2014, 10:55 p.m., Guozhang Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21588/
> ---
> 
> (Updated May 16, 2014, 10:55 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1430
> https://issues.apache.org/jira/browse/KAFKA-1430
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> 1. change the watch() API to checkAndMaybeWatch(). In that function, 
> purgatory will try to add the delayed request to each keyed watchers list.
> 
> 1.a. When the watcher is trying to add the delayed request, it first checks if 
> it is already satisfied, and only adds the re

[jira] [Commented] (KAFKA-1458) kafka hanging on shutdown

2014-05-20 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003725#comment-14003725
 ] 

Jun Rao commented on KAFKA-1458:


Could you try 0.8.1.1, which fixed some deadlock issues during shutdown?

> kafka hanging on shutdown
> -
>
> Key: KAFKA-1458
> URL: https://issues.apache.org/jira/browse/KAFKA-1458
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.1
>Reporter: James Blackburn
>
> I tried to restart the kafka broker because of KAFKA-1407. However a normal 
> kill wouldn't kill it:
> jstack shows:
> {code}
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.0-b56 mixed mode):
> "Attach Listener" daemon prio=10 tid=0x1c2e8800 nid=0x6174 waiting on 
> condition [0x]
>java.lang.Thread.State: RUNNABLE
> "SIGTERM handler" daemon prio=10 tid=0x1c377800 nid=0x6076 waiting 
> for monitor entry [0x431f2000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Shutdown.exit(Shutdown.java:212)
> - waiting to lock <0xd042f068> (a java.lang.Class for 
> java.lang.Shutdown)
> at java.lang.Terminator$1.handle(Terminator.java:52)
> at sun.misc.Signal$1.run(Signal.java:212)
> at java.lang.Thread.run(Thread.java:724)
> "SIGTERM handler" daemon prio=10 tid=0x1c652800 nid=0x6069 waiting 
> for monitor entry [0x430f1000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Shutdown.exit(Shutdown.java:212)
> - waiting to lock <0xd042f068> (a java.lang.Class for 
> java.lang.Shutdown)
> at java.lang.Terminator$1.handle(Terminator.java:52)
> at sun.misc.Signal$1.run(Signal.java:212)
> at java.lang.Thread.run(Thread.java:724)
> "SIGTERM handler" daemon prio=10 tid=0x1c204000 nid=0x6068 waiting 
> for monitor entry [0x44303000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Shutdown.exit(Shutdown.java:212)
> - waiting to lock <0xd042f068> (a java.lang.Class for 
> java.lang.Shutdown)
> at java.lang.Terminator$1.handle(Terminator.java:52)
> at sun.misc.Signal$1.run(Signal.java:212)
> at java.lang.Thread.run(Thread.java:724)
> "SIGTERM handler" daemon prio=10 tid=0x1c319000 nid=0x605b waiting 
> for monitor entry [0x409f3000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Shutdown.exit(Shutdown.java:212)
> - waiting to lock <0xd042f068> (a java.lang.Class for 
> java.lang.Shutdown)
> at java.lang.Terminator$1.handle(Terminator.java:52)
> at sun.misc.Signal$1.run(Signal.java:212)
> at java.lang.Thread.run(Thread.java:724)
> "SIGTERM handler" daemon prio=10 tid=0x1c625000 nid=0x604c waiting 
> for monitor entry [0x439fa000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Shutdown.exit(Shutdown.java:212)
> - waiting to lock <0xd042f068> (a java.lang.Class for 
> java.lang.Shutdown)
> at java.lang.Terminator$1.handle(Terminator.java:52)
> at sun.misc.Signal$1.run(Signal.java:212)
> at java.lang.Thread.run(Thread.java:724)
> "SIGTERM handler" daemon prio=10 tid=0x1c2e9800 nid=0x5d8b waiting 
> for monitor entry [0x438f9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Shutdown.exit(Shutdown.java:212)
> - waiting to lock <0xd042f068> (a java.lang.Class for 
> java.lang.Shutdown)
> at java.lang.Terminator$1.handle(Terminator.java:52)
> at sun.misc.Signal$1.run(Signal.java:212)
> at java.lang.Thread.run(Thread.java:724)
> "Thread-2" prio=10 tid=0x1c31a000 nid=0x3d4f waiting on condition 
> [0x44707000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0xd04f28b8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> at kafka.utils.Utils$.inLock(Utils.scala:536)
> at 
> kafka.controller.KafkaController.shutdo

[jira] [Updated] (KAFKA-1459) kafka.tools.ConsumerOffsetChecker throws NoNodeException

2014-05-20 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-1459:
---

Labels: newbie  (was: )

> kafka.tools.ConsumerOffsetChecker throws NoNodeException
> 
>
> Key: KAFKA-1459
> URL: https://issues.apache.org/jira/browse/KAFKA-1459
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.8.1.1
>Reporter: Scott Clasen
>  Labels: newbie
>
> When using the kafka.tools.ConsumerOffsetChecker to check offsets for 
> consumers that are doing manual offset management, and offsets for some but 
> not all partitions have been stored, the offset checker will throw a no node 
> exception. It should probably return 0 for partitions that don't have an 
> offset recorded yet.
> In this case I was using github.com/pinterest/secor, which may read thousands 
> or millions of messages from a partition before committing an offset.
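The suggested behavior amounts to defaulting a missing ZooKeeper node to offset 0. A minimal sketch, where readOffsetNode is a hypothetical stand-in for the actual ZkClient read in ConsumerOffsetChecker:

```java
import java.util.Optional;

// Sketch: treat a missing per-partition offset znode as offset 0 instead of
// letting a NoNodeException propagate. readOffsetNode simulates the ZK read.
public class OffsetLookup {
    // Simulated ZK read: empty means the znode does not exist yet.
    static Optional<String> readOffsetNode(int partition) {
        return partition == 0 ? Optional.of("42") : Optional.empty();
    }

    static long offsetOrZero(int partition) {
        return readOffsetNode(partition).map(Long::parseLong).orElse(0L);
    }

    public static void main(String[] args) {
        System.out.println(offsetOrZero(0)); // committed offset present -> 42
        System.out.println(offsetOrZero(1)); // no offset recorded yet -> 0
    }
}
```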





[jira] [Commented] (KAFKA-1449) Extend wire protocol to allow CRC32C

2014-05-20 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003736#comment-14003736
 ] 

Jun Rao commented on KAFKA-1449:


If we need to change the attribute or the magic in the message format, we 
probably have to wait until all consumers are upgraded before allowing the 
producer/broker to use the new format. So, the change will have to be guarded by 
a config option that defaults to not enabling the feature.

> Extend wire protocol to allow CRC32C
> 
>
> Key: KAFKA-1449
> URL: https://issues.apache.org/jira/browse/KAFKA-1449
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Albert Strasheim
>Assignee: Neha Narkhede
> Fix For: 0.9.0
>
>
> Howdy
> We are currently building out a number of Kafka consumers in Go, based on a 
> patched version of the Sarama library that Shopify released a while back.
> We have a reasonably fast serialization protocol (Cap'n Proto), a 10G network 
> and lots of cores. We have various consumers computing all kinds of 
> aggregates on a reasonably high volume access log stream (1.1e6 messages/sec 
> peak, about 500-600 bytes per message uncompressed).
> When profiling our consumer, our single hottest function (until we disabled 
> it), was the CRC32 checksum validation, since the deserialization and 
> aggregation in these consumers is pretty cheap.
> We believe things could be improved by extending the wire protocol to support 
> CRC-32C (Castagnoli), since SSE 4.2 has an instruction to accelerate its 
> calculation.
> https://en.wikipedia.org/wiki/SSE4#SSE4.2
> It might be hard to use from Java, but consumers written in most other 
> languages will benefit a lot.
> To give you an idea, here are some benchmarks for the Go CRC32 functions 
> running on a Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz core:
> BenchmarkCrc32KB   90196 ns/op 363.30 MB/s
> BenchmarkCrcCastagnoli32KB 3404 ns/op 9624.42 MB/s
> I believe BenchmarkCrc32 written in C would do about 600-700 MB/sec, and the 
> CRC32-C speed should be close to what one achieves in Go.
> (Met Todd and Clark at the meetup last night. Thanks for the great 
> presentation!)
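For reference, the JDK exposes both polynomials since Java 9 (java.util.zip.CRC32C). A quick sketch comparing them on the same payload (illustrative only; Kafka's wire format at this point specifies plain CRC32):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class ChecksumDemo {
    // Run any java.util.zip.Checksum implementation over a byte array.
    static long checksum(Checksum c, byte[] data) {
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] payload = "kafka message body".getBytes(StandardCharsets.UTF_8);

        // CRC32 is the polynomial Kafka's message format uses; CRC32C
        // (Castagnoli) is the variant SSE 4.2 can accelerate in hardware.
        long plain = checksum(new CRC32(), payload);
        long castagnoli = checksum(new CRC32C(), payload);

        // Same input, different polynomials -> different checksums.
        System.out.println(Long.toHexString(plain));
        System.out.println(Long.toHexString(castagnoli));
    }
}
```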





[jira] [Created] (KAFKA-1462) Add new request and response formats for the new consumer and coordinator communication

2014-05-20 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-1462:


 Summary: Add new request and response formats for the new consumer 
and coordinator communication
 Key: KAFKA-1462
 URL: https://issues.apache.org/jira/browse/KAFKA-1462
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang


We need to add the request / response formats according to the new format 
protocol once their design is final:

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design





[jira] [Updated] (KAFKA-1462) Add new request and response formats for the new consumer and coordinator communication

2014-05-20 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1462:
-

Fix Version/s: 0.9.0

> Add new request and response formats for the new consumer and coordinator 
> communication
> ---
>
> Key: KAFKA-1462
> URL: https://issues.apache.org/jira/browse/KAFKA-1462
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
> Fix For: 0.9.0
>
>
> We need to add the request / response formats according to the new format 
> protocol once their design is final:
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design





[jira] [Commented] (KAFKA-1316) Refactor Sender

2014-05-20 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004009#comment-14004009
 ] 

Jay Kreps commented on KAFKA-1316:
--

I started on this and I think I have a design that at least works for the 
producer, let's think if it would also work for the consumer.

The plan is that Sender remains, but most of the logic is moved into a new 
class Client which will be shared by the producer, consumer, and any additional 
clients (admin, for example).

The Client class will encapsulate the connection state and the management of 
metadata. The client will expose two methods:
{noformat}
class Client {
  /* Initiate a connection to the given node (if one doesn't already exist).
     Return true if we already have a ready connection. */
  boolean ready(Node node, long now);

  /* Send new requests and return any completed requests */
  List<ClientResponse> poll(List<ClientRequest> requests, long ms);
}
{noformat}

The poll request takes a list of requests for ready connections and attempts to 
send them. It returns any completed requests. The responses returned will not 
generally be for the requests being sent but for previous requests.

ClientRequest is just a renaming and generalization of InFlightRequest. 
ClientResponse is a new class that will reference the original ClientRequest as 
well as maintain the response information received (which we currently handle 
inline in Sender).

The user of this class (e.g. Sender) has to use the ready() method to ensure it 
only initiates requests to ready connections.

What needs to be thought through is whether these APIs suffice for the consumer.
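Under the proposed API, the caller's loop might look like the following sketch, using mock stand-ins for Client, Node, ClientRequest, and ClientResponse (all illustrative; the real refactor differs):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative stand-ins for the proposed Client API (not the actual classes).
class Node { final int id; Node(int id) { this.id = id; } }
class ClientRequest { final Node destination; ClientRequest(Node d) { destination = d; } }
class ClientResponse { final ClientRequest request; ClientResponse(ClientRequest r) { request = r; } }

class MockClient {
    private final List<ClientRequest> inFlight = new ArrayList<>();

    boolean ready(Node node, long now) { return true; } // pretend connected

    // Accept new requests; complete previously in-flight ones.
    List<ClientResponse> poll(List<ClientRequest> requests, long now) {
        List<ClientResponse> completed = new ArrayList<>();
        for (ClientRequest r : inFlight) completed.add(new ClientResponse(r));
        inFlight.clear();
        inFlight.addAll(requests);
        return completed;
    }
}

public class SenderLoop {
    public static void main(String[] args) {
        MockClient client = new MockClient();
        Node broker = new Node(0);
        long now = System.currentTimeMillis();

        // The first poll sends the request; its response comes back on a
        // later poll, matching "the responses returned will not generally be
        // for the requests being sent but for previous requests".
        ClientRequest req = new ClientRequest(broker);
        if (client.ready(broker, now)) {
            List<ClientResponse> first = client.poll(Collections.singletonList(req), now);
            List<ClientResponse> second = client.poll(Collections.emptyList(), now);
            System.out.println(first.size());   // 0
            System.out.println(second.size());  // 1
        }
    }
}
```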

> Refactor Sender
> ---
>
> Key: KAFKA-1316
> URL: https://issues.apache.org/jira/browse/KAFKA-1316
> Project: Kafka
>  Issue Type: Sub-task
>  Components: producer 
>Reporter: Jay Kreps
>Assignee: Jay Kreps
>
> Currently most of the logic of the producer I/O thread is in Sender.java.
> However we will need to do a fair number of similar things in the new 
> consumer. Specifically:
>  - Track in-flight requests
>  - Fetch metadata
>  - Manage connection lifecycle
> It may be possible to refactor some of this into a helper class that can be 
> shared with the consumer. This will require some detailed thought.





Re: Review Request 19731: Patch for KAFKA-1328

2014-05-20 Thread Neha Narkhede


> On May 19, 2014, 5:30 p.m., Jun Rao wrote:
> > clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java, 
> > lines 70-71
> > 
> >
> > Instead of introducing ConsumerRecords, I'd prefer just returning 
> > Map>.
> 
> Guozhang Wang wrote:
> The question I think we are trying to answer is how to expose the 
> per-partition error code back to the user. So far it seems we do not have an 
> ideal solution yet.

Agree with Guozhang. For that, it's worth keeping it the way it is.


> On May 19, 2014, 5:30 p.m., Jun Rao wrote:
> > clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java, 
> > lines 99-119
> > 
> >
> > Since these apis have Map in the return value, they are really intended 
> > as a batch api. So, would it better to have the input as a set of 
> > TopicPartitions? This will also make sure that the passed in partitions are 
> > unique.

We discussed this offline, changing it to Collection.


> On May 19, 2014, 5:30 p.m., Jun Rao wrote:
> > clients/src/main/java/org/apache/kafka/clients/consumer/internals/FutureOffsetMetadata.java,
> >  lines 1-36
> > 
> >
> > Do we still need this class?

Removed.


> On May 19, 2014, 5:30 p.m., Jun Rao wrote:
> > clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRebalanceCallback.java,
> >  lines 36-47
> > 
> >
> > Would it be better to use Set[TopicPartition] instead of ellipsis? This 
> > will make it clear that they are unique.

Changing to Collection.


> On May 19, 2014, 5:30 p.m., Jun Rao wrote:
> > clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java, 
> > lines 84-87
> > 
> >
> > It's going to be a bit confusing to the caller to expect both an error 
> > code in the return value and an exception. It seems that we can just 
> > translate exceptions into error codes. In async mode, the return value will 
> > be null. So it's impossible for the caller to get the error code. However, 
> > by choosing the async mode, the caller doesn't assume the commit to succeed 
> > immediately. It's probably ok just to try to commit again when it's called.

Makes sense. Changed the javadoc to reflect that it does not throw any exceptions.


- Neha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19731/#review43365
---


On May 16, 2014, 6:46 p.m., Neha Narkhede wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19731/
> ---
> 
> (Updated May 16, 2014, 6:46 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1328
> https://issues.apache.org/jira/browse/KAFKA-1328
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> 1. Improved documentation on the position() API 2. Changed signature of 
> commit API to remove Future and include a sync flag
> 
> 
> Included Jun's review suggestions part 2, except change to the commit() API 
> since it needs more thought
> 
> 
> Review comments from Jun and Guozhang
> 
> 
> Checked in ConsumerRecordMetadata
> 
> 
> Fixed the javadoc usage examples in KafkaConsumer to match the API changes
> 
> 
> Changed the signature of poll to return Map<String, ConsumerRecords> to 
> organize the ConsumerRecords around topic and then optionally around 
> partition. This will serve the group management as well as custom partition 
> subscription use cases
> 
> 
> 1. Changed the signature of poll() to return Map<String, 
> List<ConsumerRecord>> 2. Changed ConsumerRecord to throw an exception if an 
> error is detected for the partition. For example, if a single large message 
> is larger than the total memory just for that partition, we don't want poll() 
> to throw an exception since that will affect the processing of the remaining 
> partitions as well
> 
> 
> Fixed MockConsumer to make subscribe(topics) and subscribe(partitions) 
> mutually exclusive
> 
> 
> Changed the package to org.apache.kafka.clients.consumer from 
> kafka.clients.consumer
> 
> 
> Changed the package to org.apache.kafka.clients.consumer from 
> kafka.clients.consumer
> 
> 
> 1. Removed the commitAsync() APIs 2. Changed the commit() APIs to return a 
> Future
> 
> 
> Fixed configs to match the producer side configs for metrics
> 
> 
> Renamed AUTO_COMMIT_ENABLE_CONFIG to ENABLE_AUTO_COMMIT_CONFIG
> 
> 
> Addressing review comments from Tim and Guozhang
> 
> 
> Rebasing after producer side config cleanup
> 
> 
> Added license headers
> 
> 
> Cleaned java

Re: Review Request 19731: Patch for KAFKA-1328

2014-05-20 Thread Neha Narkhede


> On May 19, 2014, 6:22 p.m., Guozhang Wang wrote:
> > clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java, 
> > lines 99-119
> > 
> >
> > Should we discuss about code conventions of using ellipse (i.e. an 
> > array list) versus using Set, since Jun's reason here also applies to 
> > subscribe/unsubscribe?

It depends on the nature of usage. It is inconvenient to use 
subscribe/unsubscribe with a Set (it requires multiple lines of code). Also, this 
call modifies in-memory state, so it doesn't need to be batched. Since 
performance is not a concern and it aids usability, I would argue for keeping the 
ellipses in subscribe/unsubscribe and probably changing to Collection in other 
places. 
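The trade-off can be seen in a small sketch of a hypothetical subscription API offering both shapes (all names illustrative, not the actual consumer API):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical consumer surface illustrating the trade-off discussed above:
// varargs for the common interactive case, Collection where uniqueness and
// batching matter. Backing the state with a Set absorbs duplicates either way.
public class SubscriptionApi {
    private final Set<String> topics = new LinkedHashSet<>();

    // Convenient one-liner: consumer.subscribe("a", "b");
    public void subscribe(String... newTopics) {
        topics.addAll(Arrays.asList(newTopics));
    }

    // Collection variant for callers that already hold a Set or List.
    public void subscribe(Collection<String> newTopics) {
        topics.addAll(newTopics);
    }

    public Set<String> subscriptions() { return topics; }

    public static void main(String[] args) {
        SubscriptionApi c = new SubscriptionApi();
        c.subscribe("logs", "metrics");           // varargs: no Set ceremony
        c.subscribe(Arrays.asList("metrics"));    // duplicate is absorbed
        System.out.println(c.subscriptions());    // [logs, metrics]
    }
}
```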


> On May 19, 2014, 6:22 p.m., Guozhang Wang wrote:
> > clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java,
> >  lines 39-53
> > 
> >
> > I thought originally this class is used to wrap and expose the 
> > per-partition error code so that we do not need to throw an exception on 
> > every record of that partition. Is that still true?

The purpose of this class is to return the records grouped easily by topic as 
well as (optionally) by partition.


- Neha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19731/#review43385
---


On May 16, 2014, 6:46 p.m., Neha Narkhede wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19731/
> ---
> 
> (Updated May 16, 2014, 6:46 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1328
> https://issues.apache.org/jira/browse/KAFKA-1328
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> 1. Improved documentation on the position() API 2. Changed signature of 
> commit API to remove Future and include a sync flag
> 
> 
> Included Jun's review suggestions part 2, except change to the commit() API 
> since it needs more thought
> 
> 
> Review comments from Jun and Guozhang
> 
> 
> Checked in ConsumerRecordMetadata
> 
> 
> Fixed the javadoc usage examples in KafkaConsumer to match the API changes
> 
> 
> Changed the signature of poll to return Map<String, ConsumerRecords> to 
> organize the ConsumerRecords around topic and then optionally around 
> partition. This will serve the group management as well as custom partition 
> subscription use cases
> 
> 
> 1. Changed the signature of poll() to return Map<String, 
> List<ConsumerRecord>> 2. Changed ConsumerRecord to throw an exception if an 
> error is detected for the partition. For example, if a single large message 
> is larger than the total memory just for that partition, we don't want poll() 
> to throw an exception since that will affect the processing of the remaining 
> partitions as well
> 
> 
> Fixed MockConsumer to make subscribe(topics) and subscribe(partitions) 
> mutually exclusive
> 
> 
> Changed the package to org.apache.kafka.clients.consumer from 
> kafka.clients.consumer
> 
> 
> Changed the package to org.apache.kafka.clients.consumer from 
> kafka.clients.consumer
> 
> 
> 1. Removed the commitAsync() APIs 2. Changed the commit() APIs to return a 
> Future
> 
> 
> Fixed configs to match the producer side configs for metrics
> 
> 
> Renamed AUTO_COMMIT_ENABLE_CONFIG to ENABLE_AUTO_COMMIT_CONFIG
> 
> 
> Addressing review comments from Tim and Guozhang
> 
> 
> Rebasing after producer side config cleanup
> 
> 
> Added license headers
> 
> 
> Cleaned javadoc for ConsumerConfig
> 
> 
> Fixed minor indentation in ConsumerConfig
> 
> 
> Improve docs on ConsumerConfig
> 
> 
> 1. Added ClientUtils 2. Added basic constructor implementation for 
> KafkaConsumer
> 
> 
> Improved MockConsumer
> 
> 
> Chris's feedback and also consumer rewind example code
> 
> 
> Added commit() and commitAsync() APIs to the consumer and updated docs and 
> examples to reflect that
> 
> 
> 1. Added consumer usage examples to javadoc 2. Changed signature of APIs that 
> accept or return offsets from list of offsets to map of offsets
> 
> 
> Improved example for using ConsumerRebalanceCallback
> 
> 
> Improved example for using ConsumerRebalanceCallback
> 
> 
> Included Jun's review comments and renamed positions to seek. Also included 
> position()
> 
> 
> Changes to javadoc for positions()
> 
> 
> Changed the javadoc for ConsumerRebalanceCallback
> 
> 
> Changing unsubscribe to also take in var args for topic list
> 
> 
> Incorporated first round of feedback from Jay, Pradeep and Mattijs on the 
> mailing list
> 
> 
> Updated configs
> 
> 
> Javadoc for consumer complete
> 
> 
> Completed docs for Consumer and ConsumerRebalanceCallback. Added MockConsumer
> 
> 
> Added the initial interfaces and related documentation for the consumer. More 
> docs required to complete the public API

Re: Review Request 19731: Patch for KAFKA-1328

2014-05-20 Thread Neha Narkhede


> On May 19, 2014, 6:51 p.m., Jun Rao wrote:
> > clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java,
> >  lines 27-28
> > 
> >
> > Have we added the config for selecting the strategy for assigning 
> > partitions?

yes
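For illustration only (the exact config key name and value below are assumptions, not something confirmed in this thread), such a strategy config would be set like any other consumer property:

```java
import java.util.Properties;

public class AssignmentStrategyConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("group.id", "test-group");
        // Hypothetical key/value: the real config name and its allowed
        // values are defined in ConsumerConfig, not here.
        props.put("partition.assignment.strategy", "range");
        System.out.println(props.getProperty("partition.assignment.strategy"));
    }
}
```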


- Neha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19731/#review43395
---


On May 20, 2014, 10:55 p.m., Neha Narkhede wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19731/
> ---
> 
> (Updated May 20, 2014, 10:55 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1328
> https://issues.apache.org/jira/browse/KAFKA-1328
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Converted ellipsis to Collection in a bunch of places, removed TimeUnit from 
> the poll() API
> 
> 
> 1. Improved documentation on the position() API 2. Changed signature of 
> commit API to remove Future and include a sync flag
> 
> 
> Included Jun's review suggestions part 2, except change to the commit() API 
> since it needs more thought
> 
> 
> Review comments from Jun and Guozhang
> 
> 
> Checked in ConsumerRecordMetadata
> 
> 
> Fixed the javadoc usage examples in KafkaConsumer to match the API changes
> 
> 
> Changed the signature of poll to return Map<String, ConsumerRecords> to 
> organize the ConsumerRecords around topic and then optionally around 
> partition. This will serve the group management as well as custom partition 
> subscription use cases
> 
> 
> 1. Changed the signature of poll() to return Map<String, 
> List<ConsumerRecord>> 2. Changed ConsumerRecord to throw an exception if an 
> error is detected for the partition. For example, if a single large message 
> is larger than the total memory just for that partition, we don't want poll() 
> to throw an exception since that will affect the processing of the remaining 
> partitions as well
> 
> 
> Fixed MockConsumer to make subscribe(topics) and subscribe(partitions) 
> mutually exclusive
> 
> 
> Changed the package to org.apache.kafka.clients.consumer from 
> kafka.clients.consumer
> 
> 
> Changed the package to org.apache.kafka.clients.consumer from 
> kafka.clients.consumer
> 
> 
> 1. Removed the commitAsync() APIs 2. Changed the commit() APIs to return a 
> Future
> 
> 
> Fixed configs to match the producer side configs for metrics
> 
> 
> Renamed AUTO_COMMIT_ENABLE_CONFIG to ENABLE_AUTO_COMMIT_CONFIG
> 
> 
> Addressing review comments from Tim and Guozhang
> 
> 
> Rebasing after producer side config cleanup
> 
> 
> Added license headers
> 
> 
> Cleaned javadoc for ConsumerConfig
> 
> 
> Fixed minor indentation in ConsumerConfig
> 
> 
> Improve docs on ConsumerConfig
> 
> 
> 1. Added ClientUtils 2. Added basic constructor implementation for 
> KafkaConsumer
> 
> 
> Improved MockConsumer
> 
> 
> Chris's feedback and also consumer rewind example code
> 
> 
> Added commit() and commitAsync() APIs to the consumer and updated docs and 
> examples to reflect that
> 
> 
> 1. Added consumer usage examples to javadoc 2. Changed signature of APIs that 
> accept or return offsets from list of offsets to map of offsets
> 
> 
> Improved example for using ConsumerRebalanceCallback
> 
> 
> Improved example for using ConsumerRebalanceCallback
> 
> 
> Included Jun's review comments and renamed positions to seek. Also included 
> position()
> 
> 
> Changes to javadoc for positions()
> 
> 
> Changed the javadoc for ConsumerRebalanceCallback
> 
> 
> Changing unsubscribe to also take in var args for topic list
> 
> 
> Incorporated first round of feedback from Jay, Pradeep and Mattijs on the 
> mailing list
> 
> 
> Updated configs
> 
> 
> Javadoc for consumer complete
> 
> 
> Completed docs for Consumer and ConsumerRebalanceCallback. Added MockConsumer
> 
> 
> Added the initial interfaces and related documentation for the consumer. More 
> docs required to complete the public API
> 
> 
> Diffs
> -
> 
>   clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 
> PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
> PRE-CREATION 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRebalanceCallback.java
>  PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecord.java 
> PRE-CREATION 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java 
> PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
> PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java 
> PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/OffsetMetadata.java 
> PRE-CREATION 

Re: Review Request 19731: Patch for KAFKA-1328

2014-05-20 Thread Neha Narkhede

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19731/
---

(Updated May 20, 2014, 10:55 p.m.)


Review request for kafka.


Bugs: KAFKA-1328
https://issues.apache.org/jira/browse/KAFKA-1328


Repository: kafka


Description (updated)
---

Converted ellipsis to Collection in a bunch of places, removed TimeUnit from 
the poll() API


1. Improved documentation on the position() API 2. Changed signature of commit 
API to remove Future and include a sync flag


Included Jun's review suggestions part 2, except change to the commit() API 
since it needs more thought


Review comments from Jun and Guozhang


Checked in ConsumerRecordMetadata


Fixed the javadoc usage examples in KafkaConsumer to match the API changes


Changed the signature of poll to return Map<String, ConsumerRecords> to 
organize the ConsumerRecords around topic and then optionally around partition. 
This will serve the group management as well as custom partition subscription 
use cases


1. Changed the signature of poll() to return Map<String, List<ConsumerRecord>> 
2. Changed ConsumerRecord to throw an exception if an error is detected for the 
partition. For example, if a single large message is larger than the total 
memory just for that partition, we don't want poll() to throw an exception 
since that will affect the processing of the remaining partitions as well


Fixed MockConsumer to make subscribe(topics) and subscribe(partitions) mutually 
exclusive


Changed the package to org.apache.kafka.clients.consumer from 
kafka.clients.consumer


Changed the package to org.apache.kafka.clients.consumer from 
kafka.clients.consumer


1. Removed the commitAsync() APIs 2. Changed the commit() APIs to return a 
Future


Fixed configs to match the producer side configs for metrics


Renamed AUTO_COMMIT_ENABLE_CONFIG to ENABLE_AUTO_COMMIT_CONFIG


Addressing review comments from Tim and Guozhang


Rebasing after producer side config cleanup


Added license headers


Cleaned javadoc for ConsumerConfig


Fixed minor indentation in ConsumerConfig


Improve docs on ConsumerConfig


1. Added ClientUtils 2. Added basic constructor implementation for KafkaConsumer


Improved MockConsumer


Chris's feedback and also consumer rewind example code


Added commit() and commitAsync() APIs to the consumer and updated docs and 
examples to reflect that


1. Added consumer usage examples to javadoc 2. Changed signature of APIs that 
accept or return offsets from list of offsets to map of offsets


Improved example for using ConsumerRebalanceCallback


Improved example for using ConsumerRebalanceCallback


Included Jun's review comments and renamed positions to seek. Also included 
position()


Changes to javadoc for positions()


Changed the javadoc for ConsumerRebalanceCallback


Changing unsubscribe to also take in var args for topic list


Incorporated first round of feedback from Jay, Pradeep and Mattijs on the 
mailing list


Updated configs


Javadoc for consumer complete


Completed docs for Consumer and ConsumerRebalanceCallback. Added MockConsumer


Added the initial interfaces and related documentation for the consumer. More 
docs required to complete the public API


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRebalanceCallback.java
 PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecord.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/OffsetMetadata.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java 
90cacbd8941b7c8f15d1417c821945c1ac1b4d00 
  clients/src/main/java/org/apache/kafka/common/utils/ClientUtils.java 
PRE-CREATION 
  
clients/src/test/java/org/apache/kafka/clients/consumer/ConsumerExampleTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/19731/diff/


Testing
---


Thanks,

Neha Narkhede



[jira] [Commented] (KAFKA-1328) Add new consumer APIs

2014-05-20 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004094#comment-14004094
 ] 

Neha Narkhede commented on KAFKA-1328:
--

Updated reviewboard https://reviews.apache.org/r/19731/
 against branch trunk

> Add new consumer APIs
> -
>
> Key: KAFKA-1328
> URL: https://issues.apache.org/jira/browse/KAFKA-1328
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.9.0
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
> Attachments: KAFKA-1328.patch, KAFKA-1328_2014-04-10_17:13:24.patch, 
> KAFKA-1328_2014-04-10_18:30:48.patch, KAFKA-1328_2014-04-11_10:54:19.patch, 
> KAFKA-1328_2014-04-11_11:16:44.patch, KAFKA-1328_2014-04-12_18:30:22.patch, 
> KAFKA-1328_2014-04-12_19:12:12.patch, KAFKA-1328_2014-05-05_11:35:07.patch, 
> KAFKA-1328_2014-05-05_11:35:41.patch, KAFKA-1328_2014-05-09_17:18:55.patch, 
> KAFKA-1328_2014-05-16_11:46:02.patch, KAFKA-1328_2014-05-20_15:55:01.patch
>
>
> New consumer API discussion is here - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201402.mbox/%3CCAOG_4QYBHwyi0xN=hl1fpnrtkvfjzx14ujfntft3nn_mw3+...@mail.gmail.com%3E
> This JIRA includes reviewing and checking in the new consumer APIs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1328) Add new consumer APIs

2014-05-20 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1328:
-

Attachment: KAFKA-1328_2014-05-20_15:55:01.patch

> Add new consumer APIs
> -
>
> Key: KAFKA-1328
> URL: https://issues.apache.org/jira/browse/KAFKA-1328
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.9.0
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
> Attachments: KAFKA-1328.patch, KAFKA-1328_2014-04-10_17:13:24.patch, 
> KAFKA-1328_2014-04-10_18:30:48.patch, KAFKA-1328_2014-04-11_10:54:19.patch, 
> KAFKA-1328_2014-04-11_11:16:44.patch, KAFKA-1328_2014-04-12_18:30:22.patch, 
> KAFKA-1328_2014-04-12_19:12:12.patch, KAFKA-1328_2014-05-05_11:35:07.patch, 
> KAFKA-1328_2014-05-05_11:35:41.patch, KAFKA-1328_2014-05-09_17:18:55.patch, 
> KAFKA-1328_2014-05-16_11:46:02.patch, KAFKA-1328_2014-05-20_15:55:01.patch
>
>
> New consumer API discussion is here - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201402.mbox/%3CCAOG_4QYBHwyi0xN=hl1fpnrtkvfjzx14ujfntft3nn_mw3+...@mail.gmail.com%3E
> This JIRA includes reviewing and checking in the new consumer APIs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 19731: Patch for KAFKA-1328

2014-05-20 Thread Neha Narkhede

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19731/
---

(Updated May 20, 2014, 11:34 p.m.)


Review request for kafka.


Bugs: KAFKA-1328
https://issues.apache.org/jira/browse/KAFKA-1328


Repository: kafka


Description (updated)
---

Fixed inconsistent javadoc for position(), committed() and offsetsBeforeTime()


Converted ellipsis to Collection in a bunch of places, removed TimeUnit from 
the poll() API


1. Improved documentation on the position() API 2. Changed signature of commit 
API to remove Future and include a sync flag


Included Jun's review suggestions part 2, except change to the commit() API 
since it needs more thought


Review comments from Jun and Guozhang


Checked in ConsumerRecordMetadata


Fixed the javadoc usage examples in KafkaConsumer to match the API changes


Changed the signature of poll to return Map<String, ConsumerRecords> to 
organize the ConsumerRecords around topic and then optionally around partition. 
This will serve the group management as well as custom partition subscription 
use cases


1. Changed the signature of poll() to return Map<String, List<ConsumerRecord>> 
2. Changed ConsumerRecord to throw an exception if an error is detected for the 
partition. For example, if a single large message is larger than the total 
memory just for that partition, we don't want poll() to throw an exception 
since that will affect the processing of the remaining partitions as well


Fixed MockConsumer to make subscribe(topics) and subscribe(partitions) mutually 
exclusive


Changed the package to org.apache.kafka.clients.consumer from 
kafka.clients.consumer


Changed the package to org.apache.kafka.clients.consumer from 
kafka.clients.consumer


1. Removed the commitAsync() APIs 2. Changed the commit() APIs to return a 
Future


Fixed configs to match the producer side configs for metrics


Renamed AUTO_COMMIT_ENABLE_CONFIG to ENABLE_AUTO_COMMIT_CONFIG


Addressing review comments from Tim and Guozhang


Rebasing after producer side config cleanup


Added license headers


Cleaned javadoc for ConsumerConfig


Fixed minor indentation in ConsumerConfig


Improve docs on ConsumerConfig


1. Added ClientUtils 2. Added basic constructor implementation for KafkaConsumer


Improved MockConsumer


Chris's feedback and also consumer rewind example code


Added commit() and commitAsync() APIs to the consumer and updated docs and 
examples to reflect that


1. Added consumer usage examples to javadoc 2. Changed signature of APIs that 
accept or return offsets from list of offsets to map of offsets


Improved example for using ConsumerRebalanceCallback


Improved example for using ConsumerRebalanceCallback


Included Jun's review comments and renamed positions to seek. Also included 
position()


Changes to javadoc for positions()


Changed the javadoc for ConsumerRebalanceCallback


Changing unsubscribe to also take in var args for topic list


Incorporated first round of feedback from Jay, Pradeep and Mattijs on the 
mailing list


Updated configs


Javadoc for consumer complete


Completed docs for Consumer and ConsumerRebalanceCallback. Added MockConsumer


Added the initial interfaces and related documentation for the consumer. More 
docs required to complete the public API


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRebalanceCallback.java
 PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecord.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/OffsetMetadata.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java 
90cacbd8941b7c8f15d1417c821945c1ac1b4d00 
  clients/src/main/java/org/apache/kafka/common/utils/ClientUtils.java 
PRE-CREATION 
  
clients/src/test/java/org/apache/kafka/clients/consumer/ConsumerExampleTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/19731/diff/


Testing
---


Thanks,

Neha Narkhede



[jira] [Commented] (KAFKA-1328) Add new consumer APIs

2014-05-20 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004127#comment-14004127
 ] 

Neha Narkhede commented on KAFKA-1328:
--

Updated reviewboard https://reviews.apache.org/r/19731/
 against branch trunk

> Add new consumer APIs
> -
>
> Key: KAFKA-1328
> URL: https://issues.apache.org/jira/browse/KAFKA-1328
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.9.0
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
> Attachments: KAFKA-1328.patch, KAFKA-1328_2014-04-10_17:13:24.patch, 
> KAFKA-1328_2014-04-10_18:30:48.patch, KAFKA-1328_2014-04-11_10:54:19.patch, 
> KAFKA-1328_2014-04-11_11:16:44.patch, KAFKA-1328_2014-04-12_18:30:22.patch, 
> KAFKA-1328_2014-04-12_19:12:12.patch, KAFKA-1328_2014-05-05_11:35:07.patch, 
> KAFKA-1328_2014-05-05_11:35:41.patch, KAFKA-1328_2014-05-09_17:18:55.patch, 
> KAFKA-1328_2014-05-16_11:46:02.patch, KAFKA-1328_2014-05-20_15:55:01.patch, 
> KAFKA-1328_2014-05-20_16:34:37.patch
>
>
> New consumer API discussion is here - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201402.mbox/%3CCAOG_4QYBHwyi0xN=hl1fpnrtkvfjzx14ujfntft3nn_mw3+...@mail.gmail.com%3E
> This JIRA includes reviewing and checking in the new consumer APIs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1328) Add new consumer APIs

2014-05-20 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1328:
-

Attachment: KAFKA-1328_2014-05-20_16:34:37.patch

> Add new consumer APIs
> -
>
> Key: KAFKA-1328
> URL: https://issues.apache.org/jira/browse/KAFKA-1328
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.9.0
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
> Attachments: KAFKA-1328.patch, KAFKA-1328_2014-04-10_17:13:24.patch, 
> KAFKA-1328_2014-04-10_18:30:48.patch, KAFKA-1328_2014-04-11_10:54:19.patch, 
> KAFKA-1328_2014-04-11_11:16:44.patch, KAFKA-1328_2014-04-12_18:30:22.patch, 
> KAFKA-1328_2014-04-12_19:12:12.patch, KAFKA-1328_2014-05-05_11:35:07.patch, 
> KAFKA-1328_2014-05-05_11:35:41.patch, KAFKA-1328_2014-05-09_17:18:55.patch, 
> KAFKA-1328_2014-05-16_11:46:02.patch, KAFKA-1328_2014-05-20_15:55:01.patch, 
> KAFKA-1328_2014-05-20_16:34:37.patch
>
>
> New consumer API discussion is here - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201402.mbox/%3CCAOG_4QYBHwyi0xN=hl1fpnrtkvfjzx14ujfntft3nn_mw3+...@mail.gmail.com%3E
> This JIRA includes reviewing and checking in the new consumer APIs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 19731: Patch for KAFKA-1328

2014-05-20 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19731/#review43558
---

Ship it!



clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java


Could we add in the comment that the timeout is in ms?


- Jun Rao


On May 20, 2014, 11:34 p.m., Neha Narkhede wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19731/
> ---
> 
> (Updated May 20, 2014, 11:34 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1328
> https://issues.apache.org/jira/browse/KAFKA-1328
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixed inconsistent javadoc for position(), committed() and offsetsBeforeTime()
> 
> 
> Converted ellipsis to Collection in a bunch of places, removed TimeUnit from 
> the poll() API
> 
> 
> 1. Improved documentation on the position() API 2. Changed signature of 
> commit API to remove Future and include a sync flag
> 
> 
> Included Jun's review suggestions part 2, except change to the commit() API 
> since it needs more thought
> 
> 
> Review comments from Jun and Guozhang
> 
> 
> Checked in ConsumerRecordMetadata
> 
> 
> Fixed the javadoc usage examples in KafkaConsumer to match the API changes
> 
> 
> Changed the signature of poll to return Map<String, ConsumerRecords> to 
> organize the ConsumerRecords around topic and then optionally around 
> partition. This will serve the group management as well as custom partition 
> subscription use cases
> 
> 
> 1. Changed the signature of poll() to return Map<String, 
> List<ConsumerRecord>> 2. Changed ConsumerRecord to throw an exception if an 
> error is detected for the partition. For example, if a single large message 
> is larger than the total memory just for that partition, we don't want poll() 
> to throw an exception since that will affect the processing of the remaining 
> partitions as well
> 
> 
> Fixed MockConsumer to make subscribe(topics) and subscribe(partitions) 
> mutually exclusive
> 
> 
> Changed the package to org.apache.kafka.clients.consumer from 
> kafka.clients.consumer
> 
> 
> Changed the package to org.apache.kafka.clients.consumer from 
> kafka.clients.consumer
> 
> 
> 1. Removed the commitAsync() APIs 2. Changed the commit() APIs to return a 
> Future
> 
> 
> Fixed configs to match the producer side configs for metrics
> 
> 
> Renamed AUTO_COMMIT_ENABLE_CONFIG to ENABLE_AUTO_COMMIT_CONFIG
> 
> 
> Addressing review comments from Tim and Guozhang
> 
> 
> Rebasing after producer side config cleanup
> 
> 
> Added license headers
> 
> 
> Cleaned javadoc for ConsumerConfig
> 
> 
> Fixed minor indentation in ConsumerConfig
> 
> 
> Improve docs on ConsumerConfig
> 
> 
> 1. Added ClientUtils 2. Added basic constructor implementation for 
> KafkaConsumer
> 
> 
> Improved MockConsumer
> 
> 
> Chris's feedback and also consumer rewind example code
> 
> 
> Added commit() and commitAsync() APIs to the consumer and updated docs and 
> examples to reflect that
> 
> 
> 1. Added consumer usage examples to javadoc 2. Changed signature of APIs that 
> accept or return offsets from list of offsets to map of offsets
> 
> 
> Improved example for using ConsumerRebalanceCallback
> 
> 
> Improved example for using ConsumerRebalanceCallback
> 
> 
> Included Jun's review comments and renamed positions to seek. Also included 
> position()
> 
> 
> Changes to javadoc for positions()
> 
> 
> Changed the javadoc for ConsumerRebalanceCallback
> 
> 
> Changing unsubscribe to also take in var args for topic list
> 
> 
> Incorporated first round of feedback from Jay, Pradeep and Mattijs on the 
> mailing list
> 
> 
> Updated configs
> 
> 
> Javadoc for consumer complete
> 
> 
> Completed docs for Consumer and ConsumerRebalanceCallback. Added MockConsumer
> 
> 
> Added the initial interfaces and related documentation for the consumer. More 
> docs required to complete the public API
> 
> 
> Diffs
> -
> 
>   clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 
> PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
> PRE-CREATION 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRebalanceCallback.java
>  PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecord.java 
> PRE-CREATION 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java 
> PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
> PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java 
> PRE-CREATION 
>   clients/src/main/java/org/apache/kafka/clients/consumer/OffsetMetadata.java 
> PRE-CREATION 
>   clients/src/main/java/org/apac

[jira] [Commented] (KAFKA-1328) Add new consumer APIs

2014-05-20 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004136#comment-14004136
 ] 

Neha Narkhede commented on KAFKA-1328:
--

Checked in the latest patch. Will keep this JIRA open until the javadoc is 
updated with the new APIs.

> Add new consumer APIs
> -
>
> Key: KAFKA-1328
> URL: https://issues.apache.org/jira/browse/KAFKA-1328
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.9.0
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
> Attachments: KAFKA-1328.patch, KAFKA-1328_2014-04-10_17:13:24.patch, 
> KAFKA-1328_2014-04-10_18:30:48.patch, KAFKA-1328_2014-04-11_10:54:19.patch, 
> KAFKA-1328_2014-04-11_11:16:44.patch, KAFKA-1328_2014-04-12_18:30:22.patch, 
> KAFKA-1328_2014-04-12_19:12:12.patch, KAFKA-1328_2014-05-05_11:35:07.patch, 
> KAFKA-1328_2014-05-05_11:35:41.patch, KAFKA-1328_2014-05-09_17:18:55.patch, 
> KAFKA-1328_2014-05-16_11:46:02.patch, KAFKA-1328_2014-05-20_15:55:01.patch, 
> KAFKA-1328_2014-05-20_16:34:37.patch
>
>
> New consumer API discussion is here - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201402.mbox/%3CCAOG_4QYBHwyi0xN=hl1fpnrtkvfjzx14ujfntft3nn_mw3+...@mail.gmail.com%3E
> This JIRA includes reviewing and checking in the new consumer APIs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 21744: Patch for KAFKA-1446

2014-05-20 Thread Sriharsha Chintalapani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21744/
---

Review request for kafka.


Bugs: KAFKA-1446
https://issues.apache.org/jira/browse/KAFKA-1446


Repository: kafka


Description
---

KAFKA-1446. Consumer metrics for rebalance.


Diffs
-

  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
c032d263b5eedd8f3075e92cc9a9b0be864720ec 

Diff: https://reviews.apache.org/r/21744/diff/


Testing
---


Thanks,

Sriharsha Chintalapani



[jira] [Updated] (KAFKA-1446) Consumer metrics for rebalance

2014-05-20 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani updated KAFKA-1446:
--

Attachment: KAFKA-1446.patch

> Consumer metrics for rebalance
> --
>
> Key: KAFKA-1446
> URL: https://issues.apache.org/jira/browse/KAFKA-1446
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Clark Haskins
>  Labels: newbie, usability
> Attachments: KAFKA-1446.patch
>
>
> The Kafka consumer should have metrics around the number of seconds spent in 
> rebalance over the last minute as well as the number of rebalances started 
> during the previous minute.
> The other important thing about these metrics is that they should only be 
> updated once per minute for example the rebalance time should not increase 
> from second to second during a rebalance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1446) Consumer metrics for rebalance

2014-05-20 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004167#comment-14004167
 ] 

Sriharsha Chintalapani commented on KAFKA-1446:
---

Created reviewboard https://reviews.apache.org/r/21744/diff/
 against branch origin/trunk

> Consumer metrics for rebalance
> --
>
> Key: KAFKA-1446
> URL: https://issues.apache.org/jira/browse/KAFKA-1446
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Clark Haskins
>  Labels: newbie, usability
> Attachments: KAFKA-1446.patch
>
>
> The Kafka consumer should have metrics around the number of seconds spent in 
> rebalance over the last minute as well as the number of rebalances started 
> during the previous minute.
> The other important thing about these metrics is that they should only be 
> updated once per minute for example the rebalance time should not increase 
> from second to second during a rebalance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1446) Consumer metrics for rebalance

2014-05-20 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004180#comment-14004180
 ] 

Sriharsha Chintalapani commented on KAFKA-1446:
---

[~nehanarkhede] [~clarkhaskins] Can you please check the reviewboard and let me 
know if these are the metrics you are looking for?


> Consumer metrics for rebalance
> --
>
> Key: KAFKA-1446
> URL: https://issues.apache.org/jira/browse/KAFKA-1446
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Clark Haskins
>  Labels: newbie, usability
> Attachments: KAFKA-1446.patch
>
>
> The Kafka consumer should have metrics around the number of seconds spent in 
> rebalance over the last minute as well as the number of rebalances started 
> during the previous minute.
> The other important thing about these metrics is that they should only be 
> updated once per minute for example the rebalance time should not increase 
> from second to second during a rebalance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 21744: Patch for KAFKA-1446

2014-05-20 Thread Joel Koshy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21744/#review43564
---


This looks good. One minor issue with the rebalance time metric in the
patch is that if a consumer spends an excessive amount of time in rebalance
(say, several minutes due to retries and a large number of partitions), then
that will not get reflected in the stats until after the rebalance completes
(or fails). That is probably why Clark was suggesting an update once a
minute for these stats. However, I don't see a good place to "poke" the
meter/timer during the rebalance. We do plan to add a consumer state metric
that I think will cover almost the same thing, i.e., what Clark suggested,
so we can keep this patch as-is.
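
The per-minute semantics Clark asked for, and the blind spot Joel describes
(a long-running rebalance is invisible until it completes), can be
illustrated with a toy sliding-window tracker. This is a hypothetical
sketch only; none of these names appear in the actual patch, and timestamps
are injected so the behavior is deterministic:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration: tracks rebalance durations and counts over
// a sliding one-minute window.
class RebalanceMetrics {
    private static final long WINDOW_MS = 60_000L;
    // Each entry is {completedAtMs, durationMs}.
    private final Deque<long[]> completions = new ArrayDeque<>();
    private long rebalanceStartMs = -1L;

    void rebalanceStarted(long nowMs) {
        rebalanceStartMs = nowMs;
    }

    // The duration is only recorded here, on completion. This is exactly
    // the limitation Joel points out: a rebalance that drags on for several
    // minutes contributes nothing to the stats until it finishes or fails.
    void rebalanceCompleted(long nowMs) {
        if (rebalanceStartMs >= 0) {
            completions.addLast(new long[] {nowMs, nowMs - rebalanceStartMs});
            rebalanceStartMs = -1L;
        }
    }

    // Seconds spent in rebalances that completed during the last minute.
    long rebalanceSecondsLastMinute(long nowMs) {
        purge(nowMs);
        long totalMs = 0;
        for (long[] c : completions)
            totalMs += c[1];
        return totalMs / 1000;
    }

    // Number of rebalances that completed during the last minute.
    int rebalancesLastMinute(long nowMs) {
        purge(nowMs);
        return completions.size();
    }

    // Drop entries older than the one-minute window.
    private void purge(long nowMs) {
        while (!completions.isEmpty()
                && completions.peekFirst()[0] < nowMs - WINDOW_MS)
            completions.removeFirst();
    }
}
```

A periodic "poke" during a long rebalance would mean also counting the
in-flight interval (nowMs minus rebalanceStartMs) in
rebalanceSecondsLastMinute(), which is roughly the visibility the planned
consumer state metric would provide.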




core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala


Can just call these -RebalancesPerMin and -RebalanceTime


- Joel Koshy


On May 21, 2014, 12:48 a.m., Sriharsha Chintalapani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21744/
> ---
> 
> (Updated May 21, 2014, 12:48 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1446
> https://issues.apache.org/jira/browse/KAFKA-1446
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1446: Consumer metrics for rebalance.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
> c032d263b5eedd8f3075e92cc9a9b0be864720ec 
> 
> Diff: https://reviews.apache.org/r/21744/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>



Jenkins build is back to normal : Kafka-trunk #188

2014-05-20 Thread Apache Jenkins Server
See 



Re: Make kafka storage engine pluggable and provide a HDFS plugin?

2014-05-20 Thread Hangjun Ye
Thanks Jun and Francois.

We used Kafka 0.8.0 previously. We got some weird errors when expanding the
cluster and the expansion couldn't be finished.
Now we use 0.8.1.1; I will give cluster expansion another try sometime.

I read the discussion on that JIRA issue and I agree with the points raised
there.
HDFS has also improved a lot since then, and many issues (e.g., the SPOF)
have been resolved.

We have a team that builds and provides the storage/computing platform for
our company, and we already provide a Hadoop cluster.
If Kafka had an option to store data on HDFS, we would just need to allocate
some space quota for it on our cluster (and increase it on demand), which
might reduce our operational cost a lot.

Another (and maybe more aggressive) thought is about deployment. Jun
has a good point: "HDFS only provides data redundancy, but not
computational redundancy". If Kafka could be deployed on YARN, it could
offload some computational resource management to YARN, and we wouldn't
have to allocate machines physically. Kafka would still need to take care
of load balancing and partition assignment among brokers by itself.
Many computational frameworks like Spark/Samza have such an option, and
it's a big attraction for us.

Best,
Hangjun


2014-05-20 21:00 GMT+08:00 François Langelier :

> Take a look at Camus 
>
>
>
> François Langelier
> Étudiant en génie Logiciel - École de Technologie
> Supérieure
> Capitaine Club Capra 
> VP-Communication - CS Games  2014
> Jeux de Génie  2011 à 2014
> Argentier Fraternité du Piranha 
> 2012-2014
> Comité Organisateur Olympiades ÉTS 2012
> Compétition Québécoise d'Ingénierie 2012 - Compétition Senior
>
>
> On 19 May 2014 05:28, Hangjun Ye  wrote:
>
> > Hi there,
> >
> > I recently started to use Kafka for our data analysis pipeline and it
> > works very well.
> >
> > One problem for us so far is expanding our cluster when we need more
> > storage space.
> > Kafka provides some scripts to help do this, but the process wasn't
> > smooth.
> >
> > To make it work perfectly, it seems Kafka would need to do some jobs
> > that a distributed file system has already done.
> > So I'm just wondering if there are any thoughts on making Kafka work
> > on top of HDFS? Maybe make the Kafka storage engine pluggable, with
> > HDFS as one option?
> >
> > The pros might be that HDFS already handles storage management
> > (replication, corrupted disks/machines, migration, load balancing,
> > etc.) very well and would free Kafka and its users from that burden;
> > the cons might be performance degradation.
> > As Kafka does very well on performance, even with some degree of
> > degradation it would probably still be competitive in most situations.
> >
> > Best,
> > --
> > Hangjun Ye
> >
>



-- 
Hangjun Ye
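

For concreteness, the "pluggable storage engine" idea from the thread could
amount to an SPI along the lines of the sketch below. This is purely
illustrative: none of these interfaces exist in Kafka, and a real SPI would
also have to cover segments, indexes, retention, and replication hand-off.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Purely hypothetical SPI: a local-disk, HDFS-backed, or in-memory engine
// could each implement this and be selected via configuration.
interface LogStorage {
    // Appends a record to the partition's log and returns its offset.
    long append(String topicPartition, byte[] record);

    // Reads up to maxRecords records starting at fromOffset.
    List<byte[]> read(String topicPartition, long fromOffset, int maxRecords);
}

// Toy in-memory engine standing in for the default file-based one.
class InMemoryLogStorage implements LogStorage {
    private final Map<String, List<byte[]>> logs = new HashMap<>();

    public synchronized long append(String tp, byte[] record) {
        List<byte[]> log = logs.computeIfAbsent(tp, k -> new ArrayList<>());
        log.add(record);
        return log.size() - 1;  // offsets are dense and start at 0
    }

    public synchronized List<byte[]> read(String tp, long fromOffset,
                                          int maxRecords) {
        List<byte[]> log = logs.getOrDefault(tp, Collections.emptyList());
        int from = (int) Math.min(fromOffset, log.size());
        int to = Math.min(from + maxRecords, log.size());
        return new ArrayList<>(log.subList(from, to));
    }
}
```

An HDFS-backed implementation would trade the page-cache-friendly
sequential local I/O that Kafka relies on for HDFS's managed replication,
which is the performance concern raised in the thread.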


Re: Make kafka storage engine pluggable and provide a HDFS plugin?

2014-05-20 Thread Steve Morin
Hangjun,
  Would having Kafka in YARN be a big architectural change from where
it is now?  From what I have seen, on most typical setups you want machines
optimized for Kafka, not just Kafka layered on top of HDFS.
-Steve


On Tue, May 20, 2014 at 8:37 PM, Hangjun Ye  wrote:



[jira] [Resolved] (KAFKA-1328) Add new consumer APIs

2014-05-20 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede resolved KAFKA-1328.
--

Resolution: Fixed

Pushed the javadoc updates as well.

> Add new consumer APIs
> -
>
> Key: KAFKA-1328
> URL: https://issues.apache.org/jira/browse/KAFKA-1328
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.9.0
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
> Attachments: KAFKA-1328.patch, KAFKA-1328_2014-04-10_17:13:24.patch, 
> KAFKA-1328_2014-04-10_18:30:48.patch, KAFKA-1328_2014-04-11_10:54:19.patch, 
> KAFKA-1328_2014-04-11_11:16:44.patch, KAFKA-1328_2014-04-12_18:30:22.patch, 
> KAFKA-1328_2014-04-12_19:12:12.patch, KAFKA-1328_2014-05-05_11:35:07.patch, 
> KAFKA-1328_2014-05-05_11:35:41.patch, KAFKA-1328_2014-05-09_17:18:55.patch, 
> KAFKA-1328_2014-05-16_11:46:02.patch, KAFKA-1328_2014-05-20_15:55:01.patch, 
> KAFKA-1328_2014-05-20_16:34:37.patch
>
>
> New consumer API discussion is here - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201402.mbox/%3CCAOG_4QYBHwyi0xN=hl1fpnrtkvfjzx14ujfntft3nn_mw3+...@mail.gmail.com%3E
> This JIRA includes reviewing and checking in the new consumer APIs





Re: Make kafka storage engine pluggable and provide a HDFS plugin?

2014-05-20 Thread Hangjun Ye
Hi Steve,

Yes, what I want is for Kafka to not have to care about physical machines
(as an option).

Best,
Hangjun

2014-05-21 11:46 GMT+08:00 Steve Morin :




-- 
Hangjun Ye