[
https://issues.apache.org/jira/browse/NIFI-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218401#comment-16218401
]
David Doran commented on NIFI-4510:
-----------------------------------
Actually, it could be argued that the result I labelled "as expected" above is
incorrect.
There are three schemas involved here:
(1) Record Reader schema
(2) Record Writer schema
(3) ValidateRecord schema to test against
In my example, reading CSV and writing JSON, the "NotALong" records should fail
validation against (3) and be routed to the invalid channel. However, since the
same schema is used everywhere, they should also fail validation on the writer
(2). That isn't happening: they are being written as JSON with, e.g.,
"ShouldBeLong" : "NotALong1". I'm not sure what the expected behaviour is in
this situation.
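To illustrate the asymmetry (this is a Python analogue, not NiFi code): a JSON
writer can serialize a record without coercing values to the schema's declared
types, whereas a writer that must encode "ShouldBeLong" as a long has to
convert the string first, and that conversion is exactly what fails.

```python
import json

record = {"Key": "TheKey", "ShouldBeLong": "NotALong1"}

# A lenient JSON writer serializes the record without consulting the
# schema's "long" type, so the bad value slips through as a string:
as_json = json.dumps(record)

# A writer that must encode the field as a long has to convert first,
# which raises (the Python analogue of Java's NumberFormatException):
error = None
try:
    int(record["ShouldBeLong"])
except ValueError as e:
    error = str(e)
```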
I suspect the confusion arises because my expectation of this processor is
somewhat different. I'd want just two schemas in play: (1) Reader and (2)
Writer. Validation would then ask: can the record be written using (2)?
Records passing that validation are output on the 'valid' channel using (2).
Those that fail are output on the 'invalid' channel using (1), i.e. records
that cannot be converted are emitted unchanged on the 'invalid' channel.
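The proposed behaviour could be sketched like this (a minimal, hypothetical
Python model, with the writer schema reduced to a dict of Python types; it is
not how NiFi implements ValidateRecord): a record is valid iff every field can
be coerced to the writer schema, valid records are emitted coerced, and
invalid records are emitted unchanged.

```python
# Toy stand-in for a writer (Avro) schema: field name -> required type.
WRITER_SCHEMA = {"Key": str, "ShouldBeLong": int}

def try_coerce(record, schema):
    """Return the record coerced to the schema, or None if any field fails."""
    out = {}
    for field, typ in schema.items():
        try:
            out[field] = typ(record[field])
        except (KeyError, ValueError):
            return None
    return out

def validate(records, schema):
    """Split records: coercible ones go to 'valid' (coerced), rest to 'invalid'."""
    valid, invalid = [], []
    for rec in records:
        coerced = try_coerce(rec, schema)
        if coerced is not None:
            valid.append(coerced)    # emitted using the writer schema (2)
        else:
            invalid.append(rec)      # emitted unchanged, reader schema (1)
    return valid, invalid
```

With the sample data, `TheKey,123` would be routed to 'valid' with
ShouldBeLong coerced to a long, and `TheKey,NotALong1` would be routed to
'invalid' untouched; crucially, no conversion error escapes to abort the
whole session.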
> ValidateRecord does not work properly with AvroRecordSetWriter
> --------------------------------------------------------------
>
> Key: NIFI-4510
> URL: https://issues.apache.org/jira/browse/NIFI-4510
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.4.0
> Environment: Hortonworks HDF Sandbox with inbuilt NiFi 1.2 disabled,
> and NiFi 1.4 downloaded & running
> Reporter: David Doran
> Attachments: ValidateRecordTest.xml
>
>
> When using CSVReader and JsonRecordSetWriter, the ValidateRecord processor
> works as expected: Valid records are emitted as a flowfile on the valid
> queue, invalid ones on the invalid queue.
> However, when using CSVReader and AvroRecordSetWriter, the presence of an
> invalid record causes the ValidateRecord processor to fail: Nothing is
> emitted on any of the downstream connectors (failure, invalid or valid).
> Instead the session is rolled back and the input file is left in the upstream
> queue.
> Here's the simple schema I've been using:
> {
>   "type": "record",
>   "name": "test",
>   "fields": [
>     { "name": "Key", "type": "string" },
>     { "name": "ShouldBeLong", "type": "long" }
>   ]
> }
> And here's some sample CSV data:
> TheKey,123
> TheKey,456
> TheKey,NotALong1
> TheKey,NotALong2
> TheKey,NotALong3
> TheKey,321
> TheKey,654
> Using CSVReader->JsonRecordSetWriter results in a flowfile in the valid path:
> [ {
> "Key" : "TheKey",
> "ShouldBeLong" : 123
> }, {
> "Key" : "TheKey",
> "ShouldBeLong" : 456
> }, {
> "Key" : "TheKey",
> "ShouldBeLong" : 321
> }, {
> "Key" : "TheKey",
> "ShouldBeLong" : 654
> } ]
> and in invalid path:
> [ {
> "Key" : "TheKey",
> "ShouldBeLong" : "NotALong1"
> }, {
> "Key" : "TheKey",
> "ShouldBeLong" : "NotALong2"
> }, {
> "Key" : "TheKey",
> "ShouldBeLong" : "NotALong3"
> } ]
> … as expected.
> With CSVReader->AvroRecordSetWriter, the ValidateRecord processor repeatedly
> logs error bulletins (because the rolled-back session keeps being retried)
> and the incoming flowfile remains in the input queue:
> 22:40:22 UTC ERROR 015f100a-3b6f-1638-43d1-143f4ca4a816
> ValidateRecord[id=015f100a-3b6f-1638-43d1-143f4ca4a816]
> ValidateRecord[id=015f100a-3b6f-1638-43d1-143f4ca4a816] failed to process due
> to java.lang.NumberFormatException: For input string: "NotALong1"; rolling
> back session: For input string: "NotALong1"
>
> 22:40:22 UTC ERROR 015f100a-3b6f-1638-43d1-143f4ca4a816
> ValidateRecord[id=015f100a-3b6f-1638-43d1-143f4ca4a816]
> ValidateRecord[id=015f100a-3b6f-1638-43d1-143f4ca4a816] failed to process
> session due to java.lang.NumberFormatException: For input string:
> "NotALong1": For input string: "NotALong1"
> Thanks,
> Dave.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)