Create KIP permission

2020-10-03 Thread Javier Freire Riobo
Hi,

My UID is javier.freire. I would like to create a KIP to add to Kafka Streams
the ability to convert a changelog stream into a record stream whose values
are computed by comparing the old and new value of each record.

These are the changes:

https://github.com/javierfreire/kafka/commit/d32169f06452388800ceb2a9e1ef86d1921d1345
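
For context, this kind of old/new comparison can already be approximated with
the current public API by remembering the previous value per key in a state
store. The sketch below is only an illustration of the idea (the topic names,
the store name, and the diff function are made up here), not the API proposed
in the commit above:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.kstream.ValueTransformerWithKeySupplier;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class OldNewValueDiffSketch {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // State store remembering the last value seen for each key.
        builder.addStateStore(Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("previous-values"),
                Serdes.String(), Serdes.Long()));

        KStream<String, Long> changelog = builder.stream(
                "counts-changelog", Consumed.with(Serdes.String(), Serdes.Long()));

        // For every update, compute the output value from the old and the new
        // value of the record (here simply new - old).
        ValueTransformerWithKeySupplier<String, Long, Long> diff =
                () -> new ValueTransformerWithKey<String, Long, Long>() {
                    private KeyValueStore<String, Long> store;

                    @SuppressWarnings("unchecked")
                    @Override
                    public void init(ProcessorContext context) {
                        store = (KeyValueStore<String, Long>) context.getStateStore("previous-values");
                    }

                    @Override
                    public Long transform(String key, Long newValue) {
                        Long oldValue = store.get(key);
                        store.put(key, newValue);
                        return (newValue == null ? 0L : newValue)
                                - (oldValue == null ? 0L : oldValue);
                    }

                    @Override
                    public void close() { }
                };

        changelog.transformValues(diff, "previous-values")
                 .to("count-deltas", Produced.with(Serdes.String(), Serdes.Long()));
    }
}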

Thank you


[jira] [Resolved] (KAFKA-10569) Running aggregate queries on KSQL client side is getting to ERROR Shutdown broker because all log dirs in ...

2020-10-03 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-10569.
-
Resolution: Invalid

> Running aggregate queries on KSQL client side is getting to ERROR Shutdown 
> broker because all log dirs in ...
> -
>
> Key: KAFKA-10569
> URL: https://issues.apache.org/jira/browse/KAFKA-10569
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 2.5.0
> Environment: local
>Reporter: Petre Gordan
>Priority: Critical
> Attachments: KSQLDBServerSideErrors.txt, KafkaClusterLogs.txt, 
> ProductsOrders.txt, ZiikeeperSideLog.txt, kafka-server-start.bat, ksql, 
> ksql-server-start, ksql-server.properties, schema-registry.properties, 
> server.properties, zookeeper-server-start.bat, zookeeper.properties
>
>
> Working on Windows 10 with confluent-5.5.0-2.12.zip and kafka_2.12-2.5.0. I'm 
> running locally:
>  * in PowerShell, ZooKeeper with: *bin\windows\zookeeper-server-start.bat 
> config\zookeeper.properties*
>  * in PowerShell, the Kafka server with: 
> *bin\windows\kafka-server-start.bat config\server.properties*
>  * in bash (with Ubuntu), the ksqlDB server with: 
> sudo bin/ksql-server-start etc/ksqldb/ksql-server.properties
>  * in bash (with Ubuntu), the ksql client with: 
> sudo bin/ksql http://0.0.0.0:8088
> After all of these are up, I start practicing with Kafka: I create tables and 
> streams, make inserts, and everything is fine. I can run small queries like 
> select * from products emit changes; etc.
> All is good up to this step.
> Whenever I run any kind of aggregate query, the results are shown for a 
> while, but in the end, after I press Ctrl+C to terminate it and run another 
> query, everything goes down.
> For example, see the attached .sql script: after I run that script, the 
> products table and the orders stream are created and populated successfully.
> After that, if I run this query:
> select ProductRowKey, count(ProductRowKey) from orders group by ProductRowKey 
> emit changes;
> I can see the results, all good, but in the end, if I press Ctrl + C, 
> then everything goes down.
>  
> Looking into the logs, in time order the main warnings and errors raised are:
>  * first this is raised: 
> Query terminated due to exception:org.eclipse.jetty.io.EofException 
> (io.confluent.ksql.rest.server.resources.streaming.QueryStreamWriter:95)
>  * then this: INFO stream-client [_confluent-ksql-default_transient State 
> transition from RUNNING to PENDING_SHUTDOWN 
> (org.apache.kafka.streams.KafkaStreams:285)
>  * then these:
> INFO stream-thread [_confluent-ksql-default_transient Informed to shut down 
> (org.apache.kafka.streams.processor.internals.StreamThread:1116)
>  State transition from RUNNING to PENDING_SHUTDOWN 
> (org.apache.kafka.streams.processor.internals.StreamThread:221)
>  Shutdown complete 
> (org.apache.kafka.streams.processor.internals.StreamThread:1150)
>  * then this:
> INFO stream-thread [qtp2032891036-47] Deleting obsolete state directory 0_0 
> for task 0_0 as 1ms has elapsed
>  * then this:
> WARN Could not clean up the schema registry for query: 
> _confluent-ksql-default_transient_
>  * then this:
> WARN [Producer clientId=producer-1] Connection to node 0 
> (localhost/127.0.0.1:9092) could not be established. Broker may not be 
> available. (org.apache.kafka.clients.NetworkClient:763)
> (all of the above are from the ksqlDB server side logs)
>  * finally, this on the Kafka cluster side:
> ERROR Shutdown broker because all log dirs in  have failed 
> (kafka.log.LogManager)
>  
> And after that everything is down. Please see all the attached files for the 
> full details.
>  
> Please help me with this.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-406: GlobalStreamThread should honor custom reset policy

2020-10-03 Thread Navinder Brar

Thanks a lot, Matthias, for the detailed feedback. I tend to agree with
changing the state machine itself if required. I think at the end of the day
InvalidOffsetException is a rare event and is not as frequent as rebalancing,
so pausing all tasks once in a while should be ok from a processing
standpoint.

I was also wondering whether, instead of adding the RESTORING state between
REBALANCING and RUNNING, we could add it before REBALANCING. Whenever an
application starts, there is no need for active/replica tasks to be present
for us to build the global stores. We can restore the global stores first and
then trigger a rebalance to get the tasks assigned. This might help shield
users from changing what they listen to currently (which is REBALANCING ->
RUNNING). So we would go RESTORING -> REBALANCING -> RUNNING. The only
drawback might be that replicas would also be paused while we restore the
global stores, but as Matthias said we would want to give complete bandwidth
to restoring the global stores in such a case, and considering it is a rare
event this should be ok. On the plus side, this would not lead to any race
condition and we would not need to change the behavior of any stores. But it
also means that this RESTORING state is only for global stores, like the
GLOBAL_RESTORING state we discussed before :) as regular tasks still restore
inside REBALANCING.
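
For illustration, here is a minimal sketch of what an application-level state
listener would observe on startup under that ordering. The RESTORING value is
hypothetical (it does not exist in KafkaStreams.State today); only the
listener API itself is real:

import org.apache.kafka.streams.KafkaStreams;

public class StartupStateLogger {

    public static void attach(KafkaStreams streams) {
        // Log every client-level state transition. With the ordering proposed
        // above, the startup sequence would be (hypothetical RESTORING state):
        //   CREATED -> RESTORING -> REBALANCING -> RUNNING
        // instead of today's CREATED -> REBALANCING -> RUNNING, so existing
        // listeners that wait for REBALANCING -> RUNNING keep working.
        streams.setStateListener((newState, oldState) ->
                System.out.println("state transition: " + oldState + " -> " + newState));
    }
}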

@John, @Sophie do you think this would work?

Regards,
Navinder




On Wednesday, 30 September, 2020, 09:39:07 pm IST, Matthias J. Sax 
 wrote:  
 
I guess we need to have some cleanup mechanism for this case anyway,
because the global thread can enter RESTORING state at any point in
time, and thus, even if we set a flag to pause processing on the
StreamThreads, we are subject to a race condition.

Besides that, on a high level I am fine with either "busy waiting" (i.e.,
just lock the global store and retry) or setting a flag. However, there
are some trade-offs to consider:

As we need a cleanup mechanism anyway, it might be ok to just use a
single mechanism. -- We should consider the impact with EOS though, as we
might need to wipe out the stores of regular tasks for this case. Thus,
setting a flag might actually help to prevent repeatedly wiping the
store on retries... On the other hand, we plan to avoid wiping the
store in case of error for EOS anyway, and if we get this improvement,
we might not need the flag.

For the client state machine: I would actually prefer to have a
RESTORING state and I would also prefer to pause _all_ tasks. This might
imply that we want a flag. In the past, we allowed interleaving restore
and processing in StreamThread (for regular tasks), which slowed down
restoring, and we changed it back to not process any tasks until all
tasks are restored. Of course, in our case we have two different
threads (not a single one). However, the network is still shared, so it
might be desirable to give the full network bandwidth to the global
consumer to restore as fast as possible (maybe an improvement we could
add to `StreamThreads` too, if we have multiple threads)? And as a side
effect, it does not muddy the waters about what each client state means.

Thus, overall, I tend to prefer a flag on `StreamThread` as it seems to
provide a cleaner end-to-end solution (and we avoid the dependency on
improving EOS state management).
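
Just to make the flag idea concrete, a rough sketch (the class and method
names are illustrative only, not the actual StreamThread internals):

import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of a pause flag; not the real StreamThread internals.
class PauseFlagSketch {

    // Set by the global thread when it (re)enters restoration and cleared
    // once the global stores are fully restored again.
    private final AtomicBoolean globalRestoreInProgress = new AtomicBoolean(false);

    // Called by each StreamThread in its main loop.
    void maybeProcess() {
        if (globalRestoreInProgress.get()) {
            // Skip processing this iteration so the global consumer gets the
            // full network bandwidth; polling/heartbeating would continue.
            return;
        }
        // ... process records for the assigned active tasks as usual ...
    }

    void onGlobalRestoreStart()  { globalRestoreInProgress.set(true); }
    void onGlobalRestoreFinish() { globalRestoreInProgress.set(false); }
}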

Btw: I am not sure if we actually need to preserve compatibility for the
state machine? To me, it seems not to be a strict contract, and I would
personally be ok with just changing it.


-Matthias


On 9/22/20 11:08 PM, Navinder Brar wrote:
> Thanks a lot, John, for these suggestions. @Matthias, can you share your 
> thoughts on the last two comments made in this chain?
> 
> Thanks, Navinder
> 
>    On Monday, 14 September, 2020, 09:03:32 pm IST, John Roesler 
> wrote:  
>  
>  Hi Navinder,
> 
> Thanks for the reply.
> 
> I wasn't thinking of an _exponential_ backoff, but
> otherwise, yes, that was the basic idea. Note, the mechanism
> would be similar (if not the same) to what Matthias is
> implementing for KIP-572:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-572%3A+Improve+timeouts+and+retries+in+Kafka+Streams
> 
> Regarding whether we'd stay in RUNNING during global
> restoration or not, I can see your point. It seems like we
> have three choices with how we set the state during global
> restoration:
> 1. stay in RUNNING: Users might get confused, since
> processing could get stopped for some tasks. On the other
> hand, processing for tasks not blocked by the global
> restoration could proceed (if we adopt the other idea), so
> maybe it still makes sense.
> 2. transition to REBALANCING: Users might get confused,
> since there is no actual rebalance. However, the current
> state for Kafka Streams during state restoration is actually
> REBALANCING, so it seems people already should understand
> that REBALANCING really means REBALANCING|RESTORING. This
> choice would