[ 
https://issues.apache.org/jira/browse/AVRO-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15480549#comment-15480549
 ] 

John McClean commented on AVRO-1912:
------------------------------------

I've uploaded a patch. It's not large, but it's somewhat involved. I'll 
describe the problem and my proposed solution.

When decoding, the various 'decodeX' functions start by calling 
parser_.advance. Here's an example.

{code}
template <typename P>
void JsonDecoder<P>::decodeString(string& value)
{
    parser_.advance(Symbol::sString);
    expect(JsonParser::tkString);
    value = in_.stringValue();
}
{code}

The call to 'advance' skips over fields that are present in the writer schema 
but missing from the reader schema, but only those _that occur before the 
field to be decoded_. If the deleted field is the last one in the record, no 
later 'decodeX' call advances past it. So in this example, 'arrayNext' is 
called with the missing fields still undecoded.

{code}
template <typename P>
size_t JsonDecoder<P>::arrayNext()
{
    parser_.processImplicitActions();
    if (in_.peek() == JsonParser::tkArrayEnd) {
        in_.advance();
        parser_.popRepeater();
        parser_.advance(Symbol::sArrayEnd);
        return 0;
    }
    parser_.setRepeatCount(1);
    return 1;
}
{code}
Here, arrayNext doesn't know to skip through the missing fields. It assumes 
the next thing will be 'arrayEnd'.

My fix is to modify processImplicitActions so that it also advances through 
the fields to be skipped. In other words, it now advances through the skipped 
fields greedily. 

It may be a good idea to rename processImplicitActions, as it now does more 
than that.
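
To make the failure mode concrete, here is a minimal toy model of it. This is 
only a sketch: Kind, Sym, ToyParser and makeItemParser are hypothetical names 
invented for illustration, and none of it is the actual Avro parser or the 
contents of the patch. It assumes the pending symbols for one array item are 
"field A, skip B, array end".

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy model only. Kind, Sym and ToyParser are hypothetical names and do not
// correspond to Avro's actual parser classes.
enum class Kind { Field, SkipField, ArrayEnd };

struct Sym {
    Kind kind;
    std::string name;
};

struct ToyParser {
    std::deque<Sym> pending;           // symbols still to process, front first
    std::vector<std::string> skipped;  // writer-only fields consumed so far

    // Greedily consume every writer-only field at the front of the queue.
    void drainSkips() {
        while (!pending.empty() && pending.front().kind == Kind::SkipField) {
            skipped.push_back(pending.front().name);
            pending.pop_front();
        }
    }

    // Like the 'advance' described above: skip writer-only fields that come
    // before the wanted symbol, then consume the wanted symbol itself.
    bool advance(Kind wanted) {
        drainSkips();
        if (pending.empty() || pending.front().kind != wanted) {
            return false;
        }
        pending.pop_front();
        return true;
    }

    // Array-end check without the fix: inspects the front symbol as-is.
    bool atArrayEndNaive() const {
        return !pending.empty() && pending.front().kind == Kind::ArrayEnd;
    }

    // Array-end check with the fix: drain skipped fields first.
    bool atArrayEndGreedy() {
        drainSkips();
        return atArrayEndNaive();
    }
};

// One array item whose writer record is {A, B} and whose reader record is {A}.
ToyParser makeItemParser() {
    ToyParser p;
    p.pending.push_back(Sym{Kind::Field, "A"});
    p.pending.push_back(Sym{Kind::SkipField, "B"});
    p.pending.push_back(Sym{Kind::ArrayEnd, ""});
    return p;
}
```

In this model, once advance(Kind::Field) has consumed A, atArrayEndNaive() 
returns false because B is still pending, whereas atArrayEndGreedy() drains B 
first and then sees the array end.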

*Testing*

I added a test for the specific case described in the bug. I also modified the 
tests so that the 'ResolvingDecoder' runs in the json case as well as the 
binary case. Doing so exposed two failing tests in the json case. They seem to 
fail because the json code has a problem adding items to an array in chunks 
rather than all at once. (I moved those two tests so that they still run in 
the binary case but not in the json case.)



> C++ Resolving Decoding doesn't work if element removed from record in array.
> ----------------------------------------------------------------------------
>
>                 Key: AVRO-1912
>                 URL: https://issues.apache.org/jira/browse/AVRO-1912
>             Project: Avro
>          Issue Type: Bug
>            Reporter: John McClean
>         Attachments: AVRO-1912.patch
>
>
> Writer schema:
> {code}
> { 
>     "type": "record",
>     "name": "TestRecord",
>     "fields": [
>         {
>             "name": "array",
>             "type": {
>                 "type": "array",
>                 "items": {
>                     "name": "item",
>                     "type": "record",
>                     "fields": [
>                         { "name": "A", "type": "string" },
>                         { "name": "B", "type": "string", "default": "foo" }
>                     ]
>                 }
>             }
>         }
>     ] 
> }
> {code}
> Reader schema:
> {code}
> { 
>     "type": "record",
>     "name": "TestRecord",
>     "fields": [
>         {
>             "name": "array",
>             "type": {
>                 "type": "array",
>                 "items": {
>                     "name": "item",
>                     "type": "record",
>                     "fields": [
>                         { "name": "A", "type": "string" }
>                     ]
>                 }
>             }
>         }
>     ] 
> }
> {code}
> Data is:      
> {code}
> {
>   "array": [
>     {
>       "A": "",
>       "B": ""
>     }
>   ]
> }
> {code}
> The following code fails with an exception “Expected: Repeater got String”. 
> The equivalent java code works fine on the same schema and data.
> {code}
> auto decoder = avro::resolvingDecoder(writerSchema,
>                                       readerSchema,
>                                       avro::jsonDecoder(writerSchema));
> stringstream ss = loadData();
> auto_ptr<avro::InputStream> in = avro::istreamInputStream(ss);
> decoder->init(*in);
> auto record = reader::TestRecord();
> decode(*decoder, record);
> {code}
> I stepped through the code and what seems to be happening is that the code is 
> treating “A” and “B” as distinct elements in the array, as if the array had 
> two elements rather than one. 
> I'm not sure how to go about fixing this. Any pointers would be appreciated. 
> (I don't think it's my C++ test code. It works fine if the record above isn't 
> in an array.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
