It's probably too messy to go into a patch at this point. I just put the code up on a fork:
https://github.com/laserson/avro/tree/perf

Phil, perhaps we could sit down at some point and go through it briefly?

On Mon, Apr 29, 2013 at 10:56 AM, Philip Zeyliger <[email protected]> wrote:

> Hi Uri,
>
> Once you post to the JIRA, I'd be happy to review it.
>
> -- Philip
>
>
> On Mon, Apr 29, 2013 at 9:22 AM, Doug Cutting <[email protected]> wrote:
>
> > Uri,
> >
> > This sounds awesome!  Is the API compatible with the existing API?  If
> > it's incompatible and cannot easily be made compatible then perhaps we
> > can add it as the 'new' API and deprecate the old one.  Regardless,
> > please file an issue in Jira (issues.apache.org/jira/browse/AVRO) and
> > attach your patch there.
> >
> > Thanks,
> >
> > Doug
> >
> > On Sun, Apr 28, 2013 at 10:24 PM, Uri Laserson <[email protected]>
> > wrote:
> > > Hi all,
> > >
> > > I rewrote some of the python code to read avro files.  I was able to
> > > achieve a ~3x speedup over the current impl, and can probably do better
> > > if it was cleaned up more.  The main changes are:
> > > * Eliminated the object-oriented nature of the reader.  It's just
> > > functions now.  Presumably this can be changed back, but it didn't
> > > really seem like there was any reason for it.
> > > * Given a reader and writer schema, it precomputes as much helpful info
> > > as it can upfront and caches this in a dictionary that the read
> > > functions use
> > > * The code is compiled with Cython for speedup.
> > >
> > > How can this be used to improve the current python api?  Let me know
> > > how I can be helpful...
> > >
> > > Uri
> > >
> > > --
> > > Uri Laserson, PhD
> > > Data Scientist, Cloudera
> > > Twitter/GitHub: @laserson
> > > +1 617 910 0447
> > > [email protected]
> >
>

--
Uri Laserson, PhD
Data Scientist, Cloudera
Twitter/GitHub: @laserson
+1 617 910 0447
[email protected]
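
P.S. For anyone who wants the gist of the "precompute and cache" idea from the quoted message without opening the fork, here is a rough sketch in plain Python. It is not taken from the fork and does not use the avro package: the names `_PLAN_CACHE`, `build_plan`, and `read_record` are made up for illustration, schemas are plain dicts, and only flat records with int/long/string fields are handled (no defaults, type promotion, unions, or nested types). The idea is to resolve the writer schema against the reader schema once, store the result in a dictionary, and have the per-record read function just walk that precomputed list.

```python
import io
import json

# Hypothetical cache: (writer schema JSON, reader schema JSON) -> read plan.
_PLAN_CACHE = {}


def read_long(buf):
    """Decode one zigzag, variable-length integer (Avro int/long encoding)."""
    shift, accum = 0, 0
    while True:
        byte = buf.read(1)[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)


def read_string(buf):
    """Decode a string: a long byte count followed by UTF-8 data."""
    length = read_long(buf)
    return buf.read(length).decode("utf-8")


PRIMITIVE_READERS = {"int": read_long, "long": read_long, "string": read_string}


def build_plan(writer_schema, reader_schema):
    """Resolve the two schemas once and cache the per-field read steps."""
    key = (json.dumps(writer_schema, sort_keys=True),
           json.dumps(reader_schema, sort_keys=True))
    plan = _PLAN_CACHE.get(key)
    if plan is None:
        wanted = {f["name"] for f in reader_schema["fields"]}
        # One (name, decode function, keep?) step per writer field, in
        # writer order, because that is the order the bytes were written in.
        plan = [(f["name"], PRIMITIVE_READERS[f["type"]], f["name"] in wanted)
                for f in writer_schema["fields"]]
        _PLAN_CACHE[key] = plan
    return plan


def read_record(buf, plan):
    """Read one record; fields the reader does not want are decoded and dropped."""
    record = {}
    for name, reader, keep in plan:
        value = reader(buf)
        if keep:
            record[name] = value
    return record


writer = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}]}
reader = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}]}

plan = build_plan(writer, reader)
# Hand-encoded record: id=1, name="ab", age=-1.
print(read_record(io.BytesIO(b"\x02\x04ab\x01"), plan))
```

The per-record loop does no schema lookups or type dispatch of its own, which is the kind of hot path that would presumably benefit most from being compiled with Cython.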
