Hi Chris I updated KIP document https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex ) about discussion and changes before,
and commited some code according to your review. https://github.com/apache/kafka/pull/12219/commits/1541a538eaff676441f125c400a7807d17f2e138 I understand you mean the simplicity is more important with SMT ( because it is Simple Message Trasform.. ) so I removed about "message" structured format input support and also move GroupRegexValidator to inner private class. thanks, always whsoul 2022년 6월 14일 (화) 오전 11:30, Chris Egerton <fearthecel...@gmail.com>님이 작성: > Hi whsoul, > > Would you mind updating the KIP document ( > > https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex > ) > with all of these changes? We tend to discuss and vote on what's included > in the Wiki as our source of truth, as opposed to pull requests. > > RE 4: > > it seems the chained transform with extractField SMT + plainText value > ParseStructByRegex make the same result with struct value > ParseSturctByRegex, but it will drop collector meta data during > extractField ( I think.. ) > This is exactly right--using ExtractField + ParseStructByRegex will replace > the record key/value with the parsed struct, and drop everything else. Do > you think it'd increase the implementation complexity significantly to add > support for users to specify a field to operate on, which would preserve > the rest of the record key/value as-is? I'm not so sure that everyone is > going to want to drop all other metadata from their messages, but if > there's something that makes this particularly difficult to implement (as > compared to the other SMTs that are already included out of the box with > Kafka Connect), then we could probably leave it out for now. > > RE 6: > It's a tricky distinction, but what I mean is that, although we might add a > class named GroupRegexValidator and use that class in our SMT library, > unless it's part of the public interface we're trying to change, it's an > implementation detail and we don't have to call it out in the KIP. The > advantage to leaving it out is that it makes the KIP more concise and > therefore easier to review, you can choose to rename, remove, decompose, > etc. the class in your PR without having to worry about sticking to the > plan in the KIP that everyone reviewed and voted on. It's a minor detail > though, I'm noting it here more because it may be useful when writing > future KIPs than because it's necessary to adhere to strictly in this one. > > Cheers, > > Chris > > On Fri, Jun 10, 2022 at 8:13 AM gyejun choi <whsou...@gmail.com> wrote: > > > Hi Chris, > > > > I applied some code fix according your second reviews. > > > > > https://github.com/apache/kafka/pull/12219/commits/f673ea2eae0d907502e44c0ecd53b616386627bf > > > > > > 1. [applied] changed name as ParseStructByRegex > > > > 2. [applied] throw DataException, when a line that the SMT sees doesn't > > match the regex... > > originally, it will be skipped if no data match with regex, > > but change code to throw DataException according to your review > > > > 3. [already applied] > > I already delete code about desc ":{TYPE}" with commit below > > > > > https://github.com/apache/kafka/pull/12219/commits/534d995b3e6371c37443eb72eee03884cb23c85d > > > > 4. [need discuss] > > In my use case about log data collection, > > I configured the pipeline below > > nginx => filebeat => kafka => kafka connect es connector => es > > > > filebeat ( or most other log collector ) usually send value as struct ( > not > > plaintext ) with collector meta data, > > and the key name as "message" ( in case filebeat ) > > > > I think there are more use cases log message wrapped struct value than > > plain text value, > > and it seems the chained transform with extractField SMT + plainText > value > > ParseStructByRegex make the same result with struct value > > ParseSturctByRegex, > > but it will drop collector meta data during extractField ( I think.. ) > > and also in almost case, users will use ParseStructByRegex SMT with > > extractField SMT > > > > 5. [applied] throw DataException, when there are difference between group > > size and mapping names size > > > > > > 6. [ question ] > > you mean, use RegexValidator class already exist? > > without group "(.*)" pattern check, it will not provide early detection > > about regex config mistake, > > If you think it is enough as runtime DataException detection? > > > > always thanks. > > > > whsoul > > > > 2022년 6월 8일 (수) 오전 9:46, Chris Egerton <fearthecel...@gmail.com>님이 작성: > > > > > Hi whsoul, > > > > > > Thanks for the updates. I have a few more thoughts but this is looking > > > pretty good: > > > > > > 1. The name "ToStructByRegexTransform" is a little unwieldy. What do > you > > > think about something shorter like ParseStruct, ParseRegex, or > > > ParseStructByRegex? > > > > > > 2. What happens if a line that the SMT sees doesn't match the regex > > > supplied by the user? Will it throw an exception, silently skip the > > message > > > and return it as-is, allow the user to configure it to do either, > > something > > > else entirely? (I'd personally lean towards just throwing an exception > > > since users can configure regexes to be lenient via the optional > > > quantifier, i.e. "?") > > > > > > 3. The description for the "regex" property still includes the "( with > > > :{TYPE} )" snippet; should that be removed? > > > > > > 4. Is it worth adding support to this SMT to operate on an individual > > field > > > of a message? I.e., you specify a "field" of "log_line" and the SMT > > expects > > > to see a struct or a map with a field/key of "log_line" and parses that > > > instead of the entire message. If so, it might be worth specifying that > > > this property would follow any precedents set by KIP-821 [1] so that > > nested > > > fields could be accessed instead of just top-level fields. > > > > > > 5. What happens if the user specifies more names in the "mapping" > > property > > > than there are groups in the "regex" property? > > > > > > 6. (Nit) I don't think the GroupRegexValidator class needs to be called > > out > > > as part of the changes to public interface if it's just going to be > used > > by > > > this transform. > > > > > > [1] - > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures > > > > > > Cheers, > > > > > > Chris > > > > > > On Tue, Jun 7, 2022 at 4:47 AM gyejun choi <whsou...@gmail.com> wrote: > > > > > > > Hi Chris, > > > > > > > > Thank you for your positive feed back, > > > > And sorry about late reply ( I didn’t recognize your reply email… TT > ) > > > > > > > > And I update KIP and PR with your review > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex > > > > > > > > 1. typecast function removed > > > > 2. struct.field removed > > > > 3. Proposed Interface changed > > > > > > > > > > > > I will wait for you second feed back > > > > > > > > Thanks~ > > > > > > > > On 2021/07/15 15:57:14 Chris Egerton wrote: > > > > > Hi whsoul, > > > > > > > > > > Thanks for the KIP. The two use cases you identified seem quite > > > > appealing, > > > > > especially the first one: parsing structured log messages. > > > > > > > > > > Here are my initial thoughts on the proposed design: > > > > > > > > > > 1. I wonder if it's necessary to include support for type casting > > with > > > > this > > > > > SMT. We already have a Cast SMT ( > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/Cast.java > > > > ) > > > > > that can parse multiple fields of a structured record value with > > > > differing > > > > > types. Would it be enough for your new SMT to only produce string > > > values > > > > > for its structured data, and then allow users to perform casting > > logic > > > > > using the Cast SMT afterward? > > > > > > > > > > 2. It seems like the "struct.field" property is similar; based on > the > > > > > examples, it looks like when the SMT is configured with a value for > > > that > > > > > property, it will first pull out a field from a structured record > > value > > > > > (for example, it would pull out the value " > > > > > https://kafka.apache.org/documentation/#connect" from a map of > > > {"url": " > > > > > https://kafka.apache.org/documentation/#connect"}), then parse > that > > > > field's > > > > > value, and replace the entire record value (or key) with the result > > of > > > > the > > > > > parsing stage. It seems like this could be accomplished using the > > > > > ExtractField SMT ( > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/ExtractField.java > > > > ) > > > > > as a preliminary step before passing it to your new SMT. Is this > > > correct? > > > > > And if so, could we simplify the interface for your SMT by removing > > the > > > > > "struct.field" property in favor of the existing ExtractField SMT? > > > > > > > > > > 3. (Nit) I believe the property names in the table beneath the > > > "Proposed > > > > > Interfaces" section should have the "transforms.RegexTransform." > > prefix > > > > > removed, since users will be able to configure the SMT with any > name, > > > not > > > > > just "RegexTransform". This keeps in line with similar KIPs such as > > > > > KIP-437, which added a new property to the MaskField SMT ( > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-437%3A+Custom+replacement+for+MaskField+SMT > > > > > ). > > > > > > > > > > Cheers, > > > > > > > > > > Chris > > > > > > > > > > On Thu, Jul 15, 2021 at 7:56 AM gyejun choi <wh...@gmail.com> > wrote: > > > > > > > > > > > is there someone who any feed back about this? > > > > > > > > > > > > 2020년 10월 23일 (금) 오후 2:56, gyejun choi <wh...@gmail.com>님이 작성: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > I've opened KIP-678 which is intended to provide a new SMT in > > Kafka > > > > > > Connect. > > > > > > > I'd be grateful for any > > > > > > > feedback: > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex== > > > > > > > > > > > > > > thanks, > > > > > > > > > > > > > > whsoul > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >