Re: Kafka S3 Connector: Sort by a field within a partition

2021-04-29 Thread Eric Beabes
Yes, of course, if 'sort by within a partition' is not available in the Connector we will start without it BUT was wondering if it is - OR - if someone has a better idea.

Re: Kafka S3 Connector: Sort by a field within a partition

2021-04-29 Thread Mich Talebzadeh
Well, you will need to set up a pilot, test these scenarios and come up with a Minimum Viable Product (MVP). It will be difficult to give accurate answers but would have thought that you could have written your processed (enriched) dataframe to a database table partitioned by hourly rate and then

Re: Kafka S3 Connector: Sort by a field within a partition

2021-04-29 Thread Eric Beabes
Correct. The question you've asked seems to be the one we're looking for an answer to. If we set the processingTime to 60 minutes that will require tons of memory, right? What happens if the batch fails? Reprocess for the same hour? Not sure if this is the right approach. That's why we're thinking we wi

Re: Kafka S3 Connector: Sort by a field within a partition

2021-04-29 Thread Mich Talebzadeh
Let's say that you have your readStream created in SSS with socket format. Now you want to process these files/messages result = streamingDataFrame.select( \ . writeStream. \ outputMode('append'). \ o

Re: Kafka S3 Connector: Sort by a field within a partition

2021-04-29 Thread Eric Beabes
Source (devices) are sending messages to AWS SQS (Not Kafka). Each message contains the path of the file on S3. (We have no control on the source. They won't change the way it's being done.) SSS will be listening to the SQS queue. We are thinking SSS will read each SQS message, get the file locati

Re: Kafka S3 Connector: Sort by a field within a partition

2021-04-29 Thread Mich Talebzadeh
Ok, thanks for the info. One question I forgot to ask: at what interval is the source sending messages to Kafka to be processed inside SSS? For example, are these market data etc.? HTH

Re: Kafka S3 Connector: Sort by a field within a partition

2021-04-29 Thread Eric Beabes
We're thinking Kafka will allow us to scale to billions of messages in a day. That's the promise of Kafka, right? No other reason really. Main goal is to "batch" the messages per hour, create file(s) on S3 which are sorted by device_id so that we can do more aggregations which can later be sliced &

Re: Kafka S3 Connector: Sort by a field within a partition

2021-04-29 Thread Mich Talebzadeh
Hi Eric, On your second point "Is there a better way to do this" You are going to use Spark Structured Streaming (SSS) to clean and enrich the data and then push the messages to Kafka. I assume you will be using foreachBatch in this case. What purpose is there for Kafka to receive the enriched d

Kafka S3 Connector: Sort by a field within a partition

2021-04-29 Thread Eric Beabes
We’ve a use case where lots of messages will come in via AWS SQS from various devices. We’re thinking of reading these messages using Spark Structured Streaming, cleaning them up as needed & saving each message on Kafka. Later we’re thinking of using Kafka S3 Connector to push them to S3 on an hour

kafka upgrade 2.0 - 2.7

2021-04-29 Thread David Ballano Fernandez
Hi All, I am planning to do a rolling upgrade of our Kafka cluster from 2.0 to Kafka 2.7. I wanted to make sure that my assumptions about client compatibility are correct. After reading some documentation, I understood that after Kafka broker 0.10.1 any Java client should be supported and also al

Re: What's so special about 2,8,9,15,56,72 error codes?

2021-04-29 Thread Israel Ekpo
Hi Nikita, The errors may be caused by the clients, but they are detected and announced by the server (broker). It is not always a server issue; the error codes sometimes indicate issues caused by the clients. However, it is the server that detects and notifies the clients of those issues when ex

Re: Mirror Maker 2: Incoming messages on source and target kafka cluster mismatch after mirroring

2021-04-29 Thread fighter
We migrated data from a 7-node source cluster to a 9-node target cluster. The MM source connector producer does not use any compression; as I mentioned earlier, it uses the default producer. We use ByteArrayConverter for both producer key and value, with idempotency = true. Is there a chance the JMX metrics can go wrong

Re: Changing Replication Factor

2021-04-29 Thread Bruno Cadonna
Hi Marcus, 1. If you change REPLICATION_FACTOR_CONFIG without resetting the application (or deleting the changelog and repartition topics) and redeploy the Streams application, the replication factor of the internal topics will not change. The replication factor will only change for new deplo