recursive types

2019-12-04 Thread roger peppe
Hi, My apologies in advance if this topic has been well discussed before - the mailing list search tool appears to be broken (the link points to the expired domain name "search-hadoop.com"). I'm trying to understand about recursive types in Avro, given that the specification says about names

defaults for complex types (was Re: recursive types)

2019-12-05 Thread roger peppe
} ] }, "default": ["A", "B", "C"] } ] } This seems like it should be valid according to the spec, because default value encodings don't encode the type name in enums, unlike in the JSON encoding, bu

Re: defaults for complex types (was Re: recursive types)

2019-12-06 Thread roger peppe
On Fri, 6 Dec 2019 at 10:38, Ryan Skraba wrote: > Hello! I had a Java unit test ready to go (looking at default values > for complex types for AVRO-2636), so just reporting back (the easy > work!): > Thanks for the responses! > 1. In Java, the schema above is parsed without error, but when >

Re: defaults for complex types (was Re: recursive types)

2019-12-06 Thread roger peppe
On Fri, 6 Dec 2019 at 13:49, Lee Hambley wrote: > Rog, > > I alluded to it previously, but I really think you should send a PR to > improve the docs. This knowledge was hard-won for you, and the authors of > Avro are quite responsive at the moment after a couple of years (the > black-winter of 1.

Re: Resolving a possible specification inconsistency pertaining to the doc attribute

2019-12-09 Thread roger peppe
Somewhat relevant, here is a CUE schema for Avro schemas that I wrote a little while ago that can be used to check Avro schema compliance to a degree (if you haven't heard of CUE, there's a b

Re: Resolving a possible specification inconsistency pertaining to the doc attribute

2019-12-10 Thread roger peppe
alidating schema substantially less useful, because it can't report misspelled fields, but the schema should nonetheless allow it given that the spec does. Is there code in the avro project that is manipulating schemas and > stripping metadata silently? I would consider that a bug. For &

records with without fields?

2019-12-13 Thread roger peppe
Hi, The specification doesn't seem to make it entirely clear whether it's allowable for a record to contain no fields (a zero-length array for the fields member). I've found at least one implementation that complains about a record with an empty fields array, and I'm wondering if this is a bug. A

Re: records with without fields?

2019-12-13 Thread roger peppe
hen the need materializes. > Could you expand a little on that latter point, please? I'm not sure I understand what you're saying. A concrete example might help. cheers, rog. > > > On December 13, 2019, at 9:25 AM, roger peppe wrote: > > > Hi, > > The spe

Re: records with without fields?

2019-12-13 Thread roger peppe
. Otherwise, you run the risk of requiring > non-interchangeable re-identification if you need required, non-default, > fields when the need materializes. > > > > > > > > On December 13, 2019, at 9:25 AM, roger peppe > wrote: > > > > > > Hi, > >

Re: records with without fields?

2019-12-13 Thread roger peppe
On Fri, 13 Dec 2019 at 20:45, Jonah H. Harris wrote: > On Fri, Dec 13, 2019 at 11:56 AM Ryan Skraba wrote: > >> I think the spec is OK with it. We've even used it in the Java API > > > Hmm. I understood it as fields were required. Though, I could see how it’s > written could also mean zero. We

Re: records with without fields?

2019-12-13 Thread roger peppe
>> non-interchangeable re-identification if you need required, non-default, >>> fields when the need materializes. >>> >> >> Could you expand a little on that latter point, please? I'm not sure I >> understand what you're saying. >>

Re: records with without fields?

2019-12-14 Thread roger peppe
his for the file metadata field. cheers, rog. > On Fri, Dec 13, 2019 at 6:28 PM roger peppe wrote: > >> >> >> On Fri, 13 Dec 2019 at 23:08, Vance Duncan wrote: >> >>> Sorry about that. I was assuming some kind of name-based schema registry >

name-agnostic schema resolution (a.k.a. structural subtyping?)

2019-12-18 Thread roger peppe
Hi, Background: I've been contemplating the proposed Avro format in the CloudEvent specification , which defines standard metadata for events. It defines a very generic format for an event that allows storage of almost any data. It se

Re: name-agnostic schema resolution (a.k.a. structural subtyping?)

2019-12-19 Thread roger peppe
n with explicitly typed > metadata fields and names as well!) > Thanks again for your feedback. I'll try making a proposal for a different CloudEvent format, and try to get some implementations to relax their rules a bit. cheers, rog. All my best, Ryan > > On Wed, Dec 18, 2019

Re: name-agnostic schema resolution (a.k.a. structural subtyping?)

2019-12-20 Thread roger peppe
as/jaxrs-spf4j-demo/wiki/AvroReferences */ > string schema; > > bytes data; > > } > > this way a system that is interested in the metadata does not even have to > deserialize the payload…. > > hope it helps. > > —Z > > > On Dec 18, 2019, at 11:49 AM, roger p

Re: name-agnostic schema resolution (a.k.a. structural subtyping?)

2019-12-20 Thread roger peppe
e the schema of the data, for efficiency, you can use a > schema id + schema repo, or something like > https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroReferences */ > string schema; > > bytes data; > > } > > this way a system that is interested in the metadata does not

Re: name-agnostic schema resolution (a.k.a. structural subtyping?)

2019-12-28 Thread roger peppe
semantics within the Avro format. (You might notice that I omitted some fields which are arguably redundant when one knows the writer's schema, eg. data content type and data schema). cheers, rog. > On Wed, Dec 18, 2019 at 11:49 AM roger peppe wrote: > >> Hi, >

More idiomatic JSON encoding for unions

2020-01-06 Thread roger peppe
Hi, The JSON encoding in the specification includes an explicit type name for all kinds of object other than null. This means that a JSON-encoded Avro value with a union is very rarely directly compatible with normal JSON formats. For
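As a rough illustration of the point above, the sketch below shows how the specification's JSON encoding wraps a non-null union value in a single-key object named after the branch type. The helper name avro_json_union is made up for this example and is not part of any Avro library; the field name line_of_text is borrowed from the avro-tools output quoted later in this listing.

```python
import json


def avro_json_union(value, branch_name):
    """Spec-style JSON encoding of a union value.

    null is written as a bare JSON null; any other value is wrapped in a
    one-member object whose key is the branch's type name.
    (Hypothetical helper, for illustration only.)
    """
    return None if value is None else {branch_name: value}


# A record with one field of type ["null", "string"].
record = {"line_of_text": avro_json_union("World", "string")}
empty = {"line_of_text": avro_json_union(None, "null")}

print(json.dumps(record))
print(json.dumps(empty))
```

A plain JSON producer would normally emit "World" directly for that field, which is why the wrapped form is rarely compatible with existing JSON formats.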

Re: More idiomatic JSON encoding for unions

2020-01-06 Thread roger peppe
n/json using > the encoder suggested as implementation in the jira. > > Somebody needs to find the time do the work to integrate this... > > --Z > > > > > On Monday, January 6, 2020, 12:36:44 PM EST, roger peppe < > rogpe...@gmail.com> wrote: > > > Hi, >

Re: More idiomatic JSON encoding for unions

2020-01-14 Thread roger peppe
>>> >>> I would be a great fan of this as well. This also bothered me. The >>> tricky part here is to see when to release this because it will break the >>> existing JSON structure. We could make this configurable as well. >>> >>> Cheers, Fok

Re: More idiomatic JSON encoding for unions

2020-01-14 Thread roger peppe
On Tue, 14 Jan 2020 at 19:26, Zoltan Farkas wrote: > Makes sense, > > We have to agree on the scope of this implementation. > > Right now the implementation I have in java, handles only the: > > union {null, [some type]} situation. > > Are we ok with this for a start? > I'm not sure that it's wor

Re: More idiomatic JSON encoding for unions

2020-01-15 Thread roger peppe
lues as strings. On Tue, 14 Jan 2020 at 21:57, roger peppe wrote: > On Tue, 14 Jan 2020 at 19:26, Zoltan Farkas wrote: > >> Makes sense, >> >> We have to agree on the scope of this implementation. >> >> Right now the implementation I have in java, handles on

Re: More idiomatic JSON encoding for unions

2020-01-15 Thread roger peppe
On Wed, 15 Jan 2020 at 16:27, Zoltan Farkas wrote: > See comments in-line below: > > On Jan 15, 2020, at 3:42 AM, roger peppe wrote: > > Oops, I left arrays out! Two other thoughts: > > >- I wonder if it might be worth hedging bets about logical types. It >wo

avro-tools illegal reflective access warnings

2020-01-16 Thread roger peppe
Hi, I've been trying to use avro-tools to verify Avro implementations, and I've come across an issue. Perhaps someone here might be able to help? When I run avro-tools with some subcommands, it prints a bunch of warnings (see below) to the standard output. Does anyone know a way to disable this?

Re: More idiomatic JSON encoding for unions

2020-01-16 Thread roger peppe
d haven't seen any response yet. cheers, rog. > > —Z > > On Jan 15, 2020, at 12:30 PM, roger peppe wrote: > > On Wed, 15 Jan 2020 at 16:27, Zoltan Farkas wrote: > >> See comments in-line below: >> >> On Jan 15, 2020, at 3:42 AM, roger peppe

Re: avro-tools illegal reflective access warnings

2020-01-16 Thread roger peppe
"}} > {"line_of_text":{"string":"World"}} > > So when you pipe the data, it doesn't include the warnings. > > Regarding the documentation, the CLI itself contains info on all the > available commands. Also, there are excellent online resource

Re: avro-tools illegal reflective access warnings

2020-01-16 Thread roger peppe
, but in the meantime, is there really nothing in the avro-tools commands that uses a chosen schema to read a data file written with some other schema? That would give me what I'm after currently. Thanks again for the helpful response. cheers, rog. > Best regards, Ryan > >

Re: avro-tools illegal reflective access warnings

2020-01-16 Thread roger peppe
-tools jar? cheers, rog. On Thu, 16 Jan 2020 at 16:45, roger peppe wrote: > > On Thu, 16 Jan 2020 at 13:57, Ryan Skraba wrote: > >> Hello! Is it because you are using brew to install avro-tools? I'm >> not entirely familiar with how it packages the command, but u

Re: avro-tools illegal reflective access warnings

2020-01-16 Thread roger peppe
On Thu, 16 Jan 2020 at 17:21, Ryan Skraba wrote: > Hello! For a simple, silent log4j, I use: > > $ cat /tmp/log4j.properties > log4j.rootLogger=off > Apparently passing those flags has sorted my stdin/stderr issue as well as suppressing the warnings. I wonder what was going on there. Thanks ver

Re: More idiomatic JSON encoding for unions

2020-01-16 Thread roger peppe
On Thu, 16 Jan 2020, 18:59 Zoltan Farkas, wrote: > answers inline > > On Jan 16, 2020, at 5:51 AM, roger peppe wrote: > > On Wed, 15 Jan 2020 at 18:51, Zoltan Farkas wrote: > >> What I mean with timestamp-micros, is that it is currently restricted to >> being bou

Re: avro-tools illegal reflective access warnings

2020-01-17 Thread roger peppe
to build avro-tools alone - I have a suspicion that "Apache Avro Maven Service Archetype" isn't a hard requirement for that. cheers, rog. > Cheers, Fokko > > > Op do 16 jan. 2020 om 18:48 schreef roger peppe : > >> On Thu, 16 Jan 2020 at 17:21, Ryan S

Re: avro-tools illegal reflective access warnings

2020-01-17 Thread roger peppe
On Fri, 17 Jan 2020 at 11:45, roger peppe wrote: > > > On Fri, 17 Jan 2020 at 09:17, Driesprong, Fokko > wrote: > >> Hi Roger, >> >> We also have Java11 in our CI, but it might be that there are still some >> issues with it. I haven't battletested Avr

Re: avro-tools illegal reflective access warnings

2020-01-17 Thread roger peppe
mvn clean install instead and find the > jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar. That > should work with JDK11 without any problem (well-tested in the build). > > Best regards, Ryan > > > > On Thu, Jan 16, 2020 at 5:49 PM roger peppe wrote: > > > &

Re: avro-tools illegal reflective access warnings

2020-01-17 Thread roger peppe
ava:82) at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:72) at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:99) at org.apache.avro.tool.Main.run(Main.java:66) at org.apache.avro.tool.Main.main(Main.java:55) % I am a bit clueless when it comes to interpret

Re: avro-tools illegal reflective access warnings

2020-01-21 Thread roger peppe
n the header of the file. The B field is not there the > record, so the reader field is not compatible, so it won't work. I'll check > if we can come up with a more meaningful exception. > > Cheers, Fokko > > > > Op vr 17 jan. 2020 om 17:02 schreef roger peppe : >

Re: defaults for complex types (was Re: recursive types)

2020-03-23 Thread roger peppe
On Sun, 22 Mar 2020 at 09:09, Andy Le wrote: > Hi Roger, > > Instead of trying to modify the spec, is it easier for us to discard > schemas with such ambiguity? > > That certainly sounds like a reasonable approach to me. How would you word the definition of ambiguity for this purpose?

Re: defaults for complex types (was Re: recursive types)

2020-03-23 Thread roger peppe
s as for the JSON encoding, but I appreciate that backward-compatibility concerns would make that difficult or impossible to do. > On 2020/03/23 09:44:45, roger peppe wrote: > > On Sun, 22 Mar 2020 at 09:09, Andy Le wrote: > > > > > Hi Roger, > > > > > > Instead

Re: defaults for complex types (was Re: recursive types)

2020-03-24 Thread roger peppe
and writing down some > dis-ambiguity rules. Suggested rule above for enums is one of them. It > would be great if you can provide me other ones. > > To me, using rules is the most affordable way to keep compatibilities. > > If you care, please check my fork https://github.com/anhldbk/avro

Re: Avro Spec: encoding unions

2020-03-30 Thread roger peppe
AIUI, longs are encoded exactly the same as ints, so there should only be a problem if your union has more than 2147483647 members, which seems unlikely to me in practice :) On Mon, 30 Mar 2020 at 09:00, Andy Le wrote: > Hey Nandor, > > Here what I see: > - Java/Perl/Python use int values to enco
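To make the "longs are encoded exactly the same as ints" remark concrete, here is a minimal sketch of the variable-length zig-zag encoding that the Avro binary format uses for both types. The function name zigzag_varint is an assumption for this example, not an API from any Avro implementation.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed integer (64-bit range) as a zig-zag varint.

    Hypothetical helper for illustration. The same byte sequence results
    whether the value is declared as an Avro int or an Avro long.
    """
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes become small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)  # final group, continuation bit clear
    return bytes(out)


# Union branch indexes are small non-negative numbers, so they fit in one byte.
print(zigzag_varint(0).hex())           # branch 0
print(zigzag_varint(1).hex())           # branch 1
print(zigzag_varint(2147483647).hex())  # largest 32-bit int
```

Because the bytes for a given value are identical either way, a reader that treats the branch index as a long can decode what a writer produced as an int, up to the 32-bit limit mentioned above.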

Re: Is there a way to skip field decoding without materializing the data?

2020-04-03 Thread roger peppe
If you're using a custom codec, this is potentially possible if you're reading only a single record, not a sequence of records. I've been considering implementing this as an optimisation. However, I don't think it's possible to skip entirely when reading a sequence of records because there's no rec

schema resolution vs logical types

2020-04-07 Thread roger peppe
Hi, I'm just contemplating an implementation of the decimal logical type, and I'm a bit confused by the specification around this. On the one hand the specification says : If the Parsing Canonical Forms of two diff

Re: schema resolution vs logical types

2020-04-08 Thread roger peppe
On Tue, 7 Apr 2020 at 17:57, Doug Cutting wrote: > On Tue, Apr 7, 2020 at 4:03 AM roger peppe wrote: > >> On the one hand the specification says >> <https://avro.apache.org/docs/1.9.1/spec.html#Parsing+Canonical+Form+for+Schemas> >> : >> >> If th

does anyone use IDL ?

2020-06-19 Thread roger peppe
Hi, Because the syntax is so much more convenient, we thought it would be a good idea to switch from using the raw JSON AVSC format to using IDL (avdl files). Almost immediately, I discovered three quite significant bugs: 1. you can't use default values for record-typed fields: https://issues

Re: AVRO Best Practices for Sparse object storage

2020-06-26 Thread roger peppe
Assuming each field is represented as a union {null, string}, 70 null fields would take about 70 bytes (one byte for the discriminator for each union). One way to reduce that overhead might be to put a bunch of the fields that are very commonly null into a possibly-null sub-record. That way you'd n
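The size estimate above can be checked with a small sketch of the binary encoding. Everything here is illustrative: the helper names are made up, and the split into 10 top-level fields plus one nullable sub-record holding the other 60 is a hypothetical layout, not one taken from the thread.

```python
def varint(n: int) -> bytes:
    """Zig-zag varint, as used for Avro ints, longs and union branch indexes."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def nullable_string(value):
    """Encode a value of type union {null, string}."""
    if value is None:
        return varint(0)  # branch 0 (null); null itself has no payload
    data = value.encode("utf-8")
    return varint(1) + varint(len(data)) + data  # branch 1, length, bytes


def nullable_record(fields):
    """Encode union {null, record} where the record holds nullable strings."""
    if fields is None:
        return varint(0)  # the whole sub-record is absent: one byte
    return varint(1) + b"".join(nullable_string(f) for f in fields)


# Flat layout: 70 nullable string fields, all null.
flat = b"".join(nullable_string(None) for _ in range(70))

# Grouped layout (hypothetical split): 10 fields stay at the top level,
# 60 commonly-null fields move into one nullable sub-record.
grouped = b"".join(nullable_string(None) for _ in range(10)) + nullable_record(None)

print(len(flat), len(grouped))
```

The saving only applies when every field in the sub-record is null; as soon as one of them is set, the sub-record costs its own discriminator byte plus one byte per remaining null field.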