Re: [PR] MINOR: cleanup KStream JavaDocs (4/N) - stream-table-inner-join [kafka]

via GitHub Thu, 30 Jan 2025 11:14:31 -0800


lucasbru commented on code in PR #18721:
URL: https://github.com/apache/kafka/pull/18721#discussion_r1936150222



##########
streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:
##########
@@ -2097,276 +2103,115 @@ <VO, VR> KStream<K, VR> outerJoin(final KStream<K, VO> otherStream,
      * <td>&lt;K1:ValueJoiner(C,b)&gt;</td>
      * </tr>
      * </table>
-     * Both input streams (or to be more precise, their underlying source topics) need to have the same number of
-     * partitions.
-     * If this is not the case, you would need to call {@link #repartition(Repartitioned)} for this {@code KStream}
-     * before doing the join, specifying the same number of partitions via {@link Repartitioned} parameter as the given
+     *
+     * By default, {@code KStream} records are processed by performing a lookup for matching records in the
+     * <em>current</em> (i.e., processing time) internal {@link KTable} state.
+     * This default implementation does not handle out-of-order records in either input of the join well.
+     * See {@link #join(KTable, ValueJoiner, Joined)} on how to configure a stream-table join to handle out-of-order
+     * data.
+     *
+     * <p>{@code KStream} and {@link KTable} (or to be more precise, their underlying source topics) need to have the
+     * same number of partitions (cf. {@link #join(GlobalKTable, KeyValueMapper, ValueJoiner)}).
+     * If this is not the case (and if not auto-repartitioning happens for the {@code KStream}, see further below),
+     * you would need to call {@link #repartition(Repartitioned)} for this {@code KStream} before doing the join,
+     * specifying the same number of partitions via {@link Repartitioned} parameter as the given {@link KTable}.
+     * Furthermore, {@code KStream} and {@link KTable} need to be co-partitioned on the join key
+     * (i.e., use the same partitioner).
+     * Note: Kafka Streams cannot verify the used partitioning strategy, so it is the user's responsibility to ensure
+     * that the same partitioner is used for both inputs for the join.

Review Comment:
   ```suggestion
        * that the same partitioner is used for both inputs of the join.
   ```
   This avoids the chain of consecutive "for" phrases ("for both inputs for the join").



##########
streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:
##########
@@ -2097,276 +2103,115 @@ <VO, VR> KStream<K, VR> outerJoin(final KStream<K, VO> otherStream,
      * <td>&lt;K1:ValueJoiner(C,b)&gt;</td>
      * </tr>
      * </table>
-     * Both input streams (or to be more precise, their underlying source topics) need to have the same number of
-     * partitions.
-     * If this is not the case, you would need to call {@link #repartition(Repartitioned)} for this {@code KStream}
-     * before doing the join, specifying the same number of partitions via {@link Repartitioned} parameter as the given
+     *
+     * By default, {@code KStream} records are processed by performing a lookup for matching records in the
+     * <em>current</em> (i.e., processing time) internal {@link KTable} state.
+     * This default implementation does not handle out-of-order records in either input of the join well.
+     * See {@link #join(KTable, ValueJoiner, Joined)} on how to configure a stream-table join to handle out-of-order
+     * data.
+     *
+     * <p>{@code KStream} and {@link KTable} (or to be more precise, their underlying source topics) need to have the
+     * same number of partitions (cf. {@link #join(GlobalKTable, KeyValueMapper, ValueJoiner)}).
+     * If this is not the case (and if not auto-repartitioning happens for the {@code KStream}, see further below),

Review Comment:
   ```suggestion
        * If this is not the case (and if no auto-repartitioning happens for the {@code KStream}, see further below),
   ```



##########
streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:
##########
@@ -2097,276 +2103,115 @@ <VO, VR> KStream<K, VR> outerJoin(final KStream<K, VO> otherStream,
      * <td>&lt;K1:ValueJoiner(C,b)&gt;</td>
      * </tr>
      * </table>
-     * Both input streams (or to be more precise, their underlying source topics) need to have the same number of
-     * partitions.
-     * If this is not the case, you would need to call {@link #repartition(Repartitioned)} for this {@code KStream}
-     * before doing the join, specifying the same number of partitions via {@link Repartitioned} parameter as the given
+     *
+     * By default, {@code KStream} records are processed by performing a lookup for matching records in the
+     * <em>current</em> (i.e., processing time) internal {@link KTable} state.
+     * This default implementation does not handle out-of-order records in either input of the join well.
+     * See {@link #join(KTable, ValueJoiner, Joined)} on how to configure a stream-table join to handle out-of-order
+     * data.
+     *
+     * <p>{@code KStream} and {@link KTable} (or to be more precise, their underlying source topics) need to have the
+     * same number of partitions (cf. {@link #join(GlobalKTable, KeyValueMapper, ValueJoiner)}).
+     * If this is not the case (and if not auto-repartitioning happens for the {@code KStream}, see further below),
+     * you would need to call {@link #repartition(Repartitioned)} for this {@code KStream} before doing the join,
+     * specifying the same number of partitions via {@link Repartitioned} parameter as the given {@link KTable}.
+     * Furthermore, {@code KStream} and {@link KTable} need to be co-partitioned on the join key
+     * (i.e., use the same partitioner).
+     * Note: Kafka Streams cannot verify the used partitioning strategy, so it is the user's responsibility to ensure

Review Comment:
   ```suggestion
        * Note: Kafka Streams cannot verify the used partitioner, so it is the user's responsibility to ensure
   ```
   We can get by without introducing a new concept of "partitioning strategies".
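
For readers of the JavaDoc under review, the co-partitioning requirement can be illustrated with a minimal, self-contained sketch. It does not use Kafka Streams at all: `CoPartitionSketch`, `partitionFor`, and `misroutedKeys` are hypothetical names, and the hash is only a stand-in for a real partitioner (Kafka's default partitioner hashes the serialized key bytes, not `String.hashCode()`). The point is just that the same key maps to the same partition number on both sides only if both the partitioner and the partition count match.

```java
import java.util.ArrayList;
import java.util.List;

final class CoPartitionSketch {

    // Hypothetical stand-in partitioner (an assumption for illustration only):
    // maps a key to a partition via its hash code modulo the partition count.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Returns the keys that would land in different partition numbers on the
    // stream side and the table side, i.e. keys a stream-table join would miss.
    static List<String> misroutedKeys(List<String> keys, int streamPartitions, int tablePartitions) {
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            if (partitionFor(key, streamPartitions) != partitionFor(key, tablePartitions)) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d", "e", "f");
        // Same partitioner, same partition count: co-partitioned.
        System.out.println("4 vs 4 partitions, misrouted: " + misroutedKeys(keys, 4, 4));
        // Same partitioner, different partition counts: not co-partitioned.
        System.out.println("4 vs 6 partitions, misrouted: " + misroutedKeys(keys, 4, 6));
    }
}
```

With equal partition counts no key is misrouted. With 4 versus 6 partitions some keys land in different partition numbers, so a stream task would look them up in a table partition that does not hold them; this is the situation that calling `repartition(Repartitioned)` before the join is meant to prevent.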



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
