Hello!  I just created a JIRA for this as an improvement :D
https://issues.apache.org/jira/browse/AVRO-2689

To check evolution, we'd probably want to specify the reader schema in
the GenericDatumReader created here:
https://github.com/apache/avro/blob/master/lang/java/tools/src/main/java/org/apache/avro/tool/DataFileReadTool.java#L75

The writer schema is set automatically when the DataFileStream is
created.  If we want to use a reader schema different from the one found
in the file, we can set it by calling reader.setExpected(readerSchema)
just after the DataFileStream is created.
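
A rough sketch of the change (untested; readerSchemaJson stands in for
whatever new command-line option we add, and everything else is already
imported by DataFileReadTool):

    Schema readerSchema = new Schema.Parser().parse(readerSchemaJson);

    GenericDatumReader<Object> reader = new GenericDatumReader<>();
    try (DataFileStream<Object> streamReader = new DataFileStream<>(inStream, reader)) {
      // DataFileStream has just set the writer schema from the file header;
      // overriding the expected schema makes the usual resolution rules apply.
      reader.setExpected(readerSchema);

      // Emit JSON according to the reader schema rather than the file's schema.
      DatumWriter<Object> writer = new GenericDatumWriter<>(readerSchema);
      JsonEncoder encoder = EncoderFactory.get().jsonEncoder(readerSchema, out, pretty);
      ...
    }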

I think it's a pretty good idea -- it feels like we're seeing more
questions about schema evolution these days, so that would be a neat
way for a user to test (or to create reproducible scenarios for bug
reports).  If you're interested, feel free to take the JIRA!  I'd be
happy to help out.

Ryan


On Fri, Jan 17, 2020 at 2:22 PM roger peppe <rogpe...@gmail.com> wrote:
>
> On Thu, 16 Jan 2020 at 17:21, Ryan Skraba <r...@skraba.com> wrote:
>>
>> didn't find anything currently in the avro-tools that uses both
>> reader and writer schemas while deserializing data...  It should be a
>> pretty easy feature to add as an option to the DataFileReadTool
>> (a.k.a. tojson)!
>
>
> Thanks for that suggestion. I've been delving into that code a bit and trying 
> to understand what's going on.
>
> At the heart of it is this code:
>
>     GenericDatumReader<Object> reader = new GenericDatumReader<>();
>     try (DataFileStream<Object> streamReader = new DataFileStream<>(inStream, reader)) {
>       Schema schema = streamReader.getSchema();
>       DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
>       JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out, pretty);
>
> I'm trying to work out where the best place to put the specific reader schema 
> (taken from a command line flag) might be.
>
> Would it be best to do it when creating the DatumReader (it looks like there 
> might be a way to create that with a generic writer schema and a specific 
> reader schema, although I can't quite see how to do that atm), or when 
> creating the DatumWriter?
> Or perhaps there's a better way?
>
> Thanks for any guidance.
>
>    cheers,
>     rog.
>>
>>
>> You are correct about running ./build.sh dist in the java directory --
>> it fails with JDK 11 (likely fixable:
>> https://issues.apache.org/jira/browse/MJAVADOC-562).
>>
>> You should probably do a simple mvn clean install instead and find the
>> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
>> should work with JDK11 without any problem (well-tested in the build).
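>>
>> Roughly, from a checkout (the -DskipTests flag is optional, just to
>> keep a local build quick; the exact snapshot version may differ):
>>
>>     cd lang/java
>>     mvn clean install -DskipTests
>>     java -jar tools/target/avro-tools-1.10.0-SNAPSHOT.jar tojson x.out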
>>
>> Best regards, Ryan
>>
>>
>>
>> On Thu, Jan 16, 2020 at 5:49 PM roger peppe <rogpe...@gmail.com> wrote:
>> >
>> > Update: I tried running `build.sh dist` in `lang/java` and it failed (at 
>> > least, it looks like a failure message) after downloading a load of Maven 
>> > deps with the following errors: 
>> > https://gist.github.com/rogpeppe/df05d993254dc5082253a5ef5027e965
>> >
>> > Any hints on what I should do to build the avro-tools jar?
>> >
>> >   cheers,
>> >     rog.
>> >
>> > On Thu, 16 Jan 2020 at 16:45, roger peppe <rogpe...@gmail.com> wrote:
>> >>
>> >>
>> >> On Thu, 16 Jan 2020 at 13:57, Ryan Skraba <r...@skraba.com> wrote:
>> >>>
>> >>> Hello!  Is it because you are using brew to install avro-tools?  I'm
>> >>> not entirely familiar with how it packages the command, but using a
>> >>> direct bash-like solution instead might solve this problem of mixing
>> >>> stdout and stderr.  This could be the simplest (and right) solution
>> >>> for piping.
>> >>
>> >>
>> >> No, I downloaded the jar and am directly running it with "java -jar 
>> >> ~/other/avro-tools-1.9.1.jar".
>> >> I'm using Ubuntu Linux 18.04 FWIW - the binary comes from Debian package 
>> >> openjdk-11-jre-headless.
>> >>
>> >> I'm going to try compiling avro-tools myself to investigate but I'm a 
>> >> total Java ignoramus - wish me luck!
>> >>
>> >>>
>> >>> alias avrotoolx='java -jar
>> >>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar'
>> >>> avrotoolx tojson x.out 2> /dev/null
>> >>>
>> >>> (As Fokko mentioned, the 2> /dev/null isn't even necessary -- the
>> >>> warnings and logs should not be piped along with the normal content.)
>> >>>
>> >>> Otherwise, IIRC, there is no way to disable the first illegal
>> >>> reflective access warning when running in Java 9+, but you can "fix"
>> >>> these module errors, and deactivate the NativeCodeLoader logs with an
>> >>> explicit log4j.properties:
>> >>>
>> >>> java -Dlog4j.configuration=file:///tmp/log4j.properties --add-opens
>> >>> java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar
>> >>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar
>> >>> tojson x.out
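>> >>>
>> >>> Something along these lines in /tmp/log4j.properties should quiet
>> >>> that particular logger (standard log4j 1.x syntax, which is what the
>> >>> tools jar bundles; untested here, so treat it as a sketch):
>> >>>
>> >>>     # Send everything to stderr at WARN, but silence the
>> >>>     # "unable to load native-hadoop library" message specifically.
>> >>>     log4j.rootLogger=WARN, stderr
>> >>>     log4j.appender.stderr=org.apache.log4j.ConsoleAppender
>> >>>     log4j.appender.stderr.Target=System.err
>> >>>     log4j.appender.stderr.layout=org.apache.log4j.SimpleLayout
>> >>>     log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR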
>> >>
>> >>
>> >> Thanks for that suggestion! I'm afraid I'm not familiar with log4j 
>> >> properties files though. What do I need to put in /tmp/log4j.properties 
>> >> to make this work?
>> >>
>> >>> None of that is particularly satisfactory, but it could be a
>> >>> workaround for your immediate use.
>> >>
>> >>
>> >> Yeah, not ideal, because if something goes wrong, stdout will be 
>> >> corrupted, but at least some noise should go away :)
>> >>
>> >>> I'd also like to see a more unified experience with the CLI tool for
>> >>> documentation and usage.  The current state requires a bit of Avro
>> >>> expertise to use, but it has some functions that would be pretty
>> >>> useful for a user working with Avro data.  I raised
>> >>> https://issues.apache.org/jira/browse/AVRO-2688 as an improvement.
>> >>>
>> >>> In my opinion, a schema compatibility tool would be a useful and
>> >>> welcome feature!
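>> >>>
>> >>> The check itself already exists in the Java library as
>> >>> org.apache.avro.SchemaCompatibility, so a CLI command would mostly be
>> >>> a thin wrapper around something like this (the schema file names here
>> >>> are just for illustration):
>> >>>
>> >>>     Schema reader = new Schema.Parser().parse(new File("reader.avsc"));
>> >>>     Schema writer = new Schema.Parser().parse(new File("writer.avsc"));
>> >>>     SchemaCompatibility.SchemaPairCompatibility result =
>> >>>         SchemaCompatibility.checkReaderWriterCompatibility(reader, writer);
>> >>>     System.out.println(result.getType());  // COMPATIBLE or INCOMPATIBLE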
>> >>
>> >>
>> >> That would indeed be nice, but in the meantime, is there really nothing 
>> >> in the avro-tools commands that uses a chosen schema to read a data file 
>> >> written with some other schema? That would give me what I'm after 
>> >> currently.
>> >>
>> >> Thanks again for the helpful response.
>> >>
>> >>    cheers,
>> >>      rog.
>> >>
>> >>>
>> >>> Best regards, Ryan
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Jan 16, 2020 at 12:25 PM roger peppe <rogpe...@gmail.com> wrote:
>> >>> >
>> >>> > Hi Fokko,
>> >>> >
>> >>> > Thanks for your swift response!
>> >>> >
>> >>> > Stdout and stderr definitely seem to be merged on this platform at 
>> >>> > least. Here's a sample:
>> >>> >
>> >>> > % avrotool random --count 1 --schema '"int"'  x.out
>> >>> > % avrotool tojson x.out > x.json
>> >>> > % cat x.json
>> >>> > 125140891
>> >>> > WARNING: An illegal reflective access operation has occurred
>> >>> > WARNING: Illegal reflective access by 
>> >>> > org.apache.hadoop.security.authentication.util.KerberosUtil 
>> >>> > (file:/home/rog/other/avro-tools-1.9.1.jar) to method 
>> >>> > sun.security.krb5.Config.getInstance()
>> >>> > WARNING: Please consider reporting this to the maintainers of 
>> >>> > org.apache.hadoop.security.authentication.util.KerberosUtil
>> >>> > WARNING: Use --illegal-access=warn to enable warnings of further 
>> >>> > illegal reflective access operations
>> >>> > WARNING: All illegal access operations will be denied in a future 
>> >>> > release
>> >>> > 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load 
>> >>> > native-hadoop library for your platform... using builtin-java classes 
>> >>> > where applicable
>> >>> > %
>> >>> >
>> >>> > I've just verified that it's not a problem with the java executable 
>> >>> > itself (I ran a program that printed to System.err and the text 
>> >>> > correctly goes to the standard error).
>> >>> >
>> >>> > > Regarding the documentation, the CLI itself contains info on all the 
>> >>> > > available commands. Also, there are excellent online resources: 
>> >>> > > https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
>> >>> > >  Is there anything specific that you're missing?
>> >>> >
>> >>> > There's the single line summary produced for each command by running 
>> >>> > "avro-tools" with no arguments, but that's not as much info as I'd 
>> >>> > ideally like. For example, it often doesn't say what file format is 
>> >>> > being written or read. For some commands, the purpose is not very 
>> >>> > clear.
>> >>> >
>> >>> > For example the description of the recodec command is "Alters the 
>> >>> > codec of a data file". It doesn't describe how it alters it or how one 
>> >>> > might configure the alteration parameters. I managed to get some usage 
>> >>> > help by passing it more than two parameters (specifying "--help" gives 
>> >>> > an exception), but that doesn't provide much more info:
>> >>> >
>> >>> > % avro-tools recodec a b c
>> >>> > Expected at most an input file and output file.
>> >>> > Option             Description
>> >>> > ------             -----------
>> >>> > --codec <String>   Compression codec (default: null)
>> >>> > --level <Integer>  Compression level (only applies to deflate and xz) 
>> >>> > (default:
>> >>> >                      -1)
>> >>> >
>> >>> > For the record, I'm wondering whether it might be possible to get avrotool to 
>> >>> > tell me if one schema is compatible with another so that I can check 
>> >>> > hypotheses about schema-checking in practice without having to write 
>> >>> > Java code.
>> >>> >
>> >>> >   cheers,
>> >>> >     rog.
>> >>> >
>> >>> >
>> >>> > On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko <fo...@driesprong.frl> 
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi Rog,
>> >>> >>
>> >>> >> This is actually a warning produced by the Hadoop library that we're 
>> >>> >> using. Please note that this isn't part of stdout:
>> >>> >>
>> >>> >> $ find /tmp/tmp
>> >>> >> /tmp/tmp
>> >>> >> /tmp/tmp/._SUCCESS.crc
>> >>> >> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>> >>> >> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
>> >>> >> /tmp/tmp/_SUCCESS
>> >>> >>
>> >>> >> $ avro-tools tojson 
>> >>> >> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>> >>> >> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load 
>> >>> >> native-hadoop library for your platform... using builtin-java classes 
>> >>> >> where applicable
>> >>> >> {"line_of_text":{"string":"Hello"}}
>> >>> >> {"line_of_text":{"string":"World"}}
>> >>> >>
>> >>> >> $ avro-tools tojson 
>> >>> >> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro > 
>> >>> >> /tmp/tmp/data.json
>> >>> >> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load 
>> >>> >> native-hadoop library for your platform... using builtin-java classes 
>> >>> >> where applicable
>> >>> >>
>> >>> >> $ cat /tmp/tmp/data.json
>> >>> >> {"line_of_text":{"string":"Hello"}}
>> >>> >> {"line_of_text":{"string":"World"}}
>> >>> >>
>> >>> >> So when you pipe the data, it doesn't include the warnings.
>> >>> >>
>> >>> >> Regarding the documentation, the CLI itself contains info on all the 
>> >>> >> available commands. Also, there are excellent online resources: 
>> >>> >> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
>> >>> >>  Is there anything specific that you're missing?
>> >>> >>
>> >>> >> Hope this helps.
>> >>> >>
>> >>> >> Cheers, Fokko
>> >>> >>
>> >>> >> Op do 16 jan. 2020 om 09:30 schreef roger peppe <rogpe...@gmail.com>:
>> >>> >>>
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> I've been trying to use avro-tools to verify Avro implementations, 
>> >>> >>> and I've come across an issue. Perhaps someone here might be able to 
>> >>> >>> help?
>> >>> >>>
>> >>> >>> When I run avro-tools with some subcommands, it prints a bunch of 
>> >>> >>> warnings (see below) to the standard output. Does anyone know a way 
>> >>> >>> to disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and 
>> >>> >>> avro-tools 1.9.1.
>> >>> >>>
>> >>> >>> The warnings are somewhat annoying because they can corrupt output 
>> >>> >>> of tools that print to the standard output, such as recodec.
>> >>> >>>
>> >>> >>> Aside: is there any documentation for the commands in avro-tools? 
>> >>> >>> Some seem to have some command-line help (though unfortunately there 
>> >>> >>> doesn't seem to be a standard way of showing it), but that help 
>> >>> >>> often doesn't describe what the command actually does.
>> >>> >>>
>> >>> >>> Here's the output that I see:
>> >>> >>>
>> >>> >>> WARNING: An illegal reflective access operation has occurred
>> >>> >>> WARNING: Illegal reflective access by 
>> >>> >>> org.apache.hadoop.security.authentication.util.KerberosUtil 
>> >>> >>> (file:/home/rog/other/avro-tools-1.9.1.jar) to method 
>> >>> >>> sun.security.krb5.Config.getInstance()
>> >>> >>> WARNING: Please consider reporting this to the maintainers of 
>> >>> >>> org.apache.hadoop.security.authentication.util.KerberosUtil
>> >>> >>> WARNING: Use --illegal-access=warn to enable warnings of further 
>> >>> >>> illegal reflective access operations
>> >>> >>> WARNING: All illegal access operations will be denied in a future 
>> >>> >>> release
>> >>> >>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load 
>> >>> >>> native-hadoop library for your platform... using builtin-java 
>> >>> >>> classes where applicable
>> >>> >>>
>> >>> >>>   cheers,
>> >>> >>>     rog.
>> >>> >>>
