Re: 3x faster python reader

Uri Laserson Mon, 29 Apr 2013 11:56:14 -0700

It's probably too messy to go into a patch at this point.  I just put the
code up on a fork:


https://github.com/laserson/avro/tree/perf

Phil, perhaps we could sit down at some point and go through it briefly?


On Mon, Apr 29, 2013 at 10:56 AM, Philip Zeyliger <[email protected]> wrote:

> Hi Uri,
>
> Once you post to the JIRA, I'd be happy to review it.
>
> -- Philip
>
>
> On Mon, Apr 29, 2013 at 9:22 AM, Doug Cutting <[email protected]> wrote:
>
> > Uri,
> >
> > This sounds awesome!  Is the API compatible with the existing API?  If
> > it's incompatible and cannot easily be made compatible then perhaps we
> > can add it as the 'new' API and deprecate the old one.  Regardless,
> > please file an issue in Jira (issues.apache.org/jira/browse/AVRO) and
> > attach your patch there.
> >
> > Thanks,
> >
> > Doug
> >
> > On Sun, Apr 28, 2013 at 10:24 PM, Uri Laserson <[email protected]>
> > wrote:
> > > Hi all,
> > >
> > > > I rewrote some of the python code to read avro files.  I was able to
> > > > achieve a ~3x speedup over the current impl, and can probably do better
> > > > if it was cleaned up more.  The main changes are:
> > > > * Eliminated the object-oriented nature of the reader.  It's just
> > > > functions now.  Presumably this can be changed back, but it didn't
> > > > really seem like there was any reason for it.
> > > > * Given a reader and writer schema, it precomputes as much helpful info
> > > > as it can upfront and caches this in a dictionary that the read
> > > > functions use
> > > > * The code is compiled with Cython for speedup.
> > > >
> > > > How can this be used to improve the current python api?  Let me know
> > > > how I can be helpful...
> > >
> > > Uri
> > >
> > > --
> > > Uri Laserson, PhD
> > > Data Scientist, Cloudera
> > > Twitter/GitHub: @laserson
> > > +1 617 910 0447
> > > [email protected]
> >
>
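
For anyone skimming the thread, the "precompute and cache in a dictionary"
idea from the quoted message can be sketched roughly as below.  This is only
an illustration under stated assumptions, not the code on the fork: every name
here (build_plan, read_record, PRIMITIVE_READERS, write_long, write_string) is
hypothetical, and it only handles flat records whose fields are "long", "int"
or "string".  Real Avro schema resolution also covers type promotion, unions,
nested types and so on.

```python
import io

def read_long(buf):
    """Decode one Avro int/long (zigzag varint) from a binary stream."""
    shift, accum = 0, 0
    while True:
        byte = buf.read(1)[0]
        accum |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)

def read_string(buf):
    """Decode an Avro string: a long byte count followed by UTF-8 bytes."""
    return buf.read(read_long(buf)).decode("utf-8")

# Hypothetical lookup table: plain functions instead of reader objects.
PRIMITIVE_READERS = {"int": read_long, "long": read_long, "string": read_string}

def build_plan(writer_schema, reader_schema):
    """Work out once, per schema pair, what to do with each writer field."""
    reader_names = {f["name"] for f in reader_schema["fields"]}
    writer_names = {f["name"] for f in writer_schema["fields"]}
    # One (name, decode function, keep?) step per field, in writer order.
    steps = [(f["name"], PRIMITIVE_READERS[f["type"]], f["name"] in reader_names)
             for f in writer_schema["fields"]]
    # Reader-only fields are filled from their declared defaults.
    defaults = {f["name"]: f["default"] for f in reader_schema["fields"]
                if f["name"] not in writer_names and "default" in f}
    return {"steps": steps, "defaults": defaults}

def read_record(buf, plan):
    """Per-record hot loop: no schema inspection, just follow the plan."""
    record = dict(plan["defaults"])
    for name, decode, keep in plan["steps"]:
        value = decode(buf)      # always consume the bytes the writer wrote
        if keep:                 # drop fields the reader schema does not want
            record[name] = value
    return record

def write_long(n):
    """Test helper: encode a 64-bit long as an Avro zigzag varint."""
    n = (n << 1) ^ (n >> 63)
    out = bytearray()
    while n & ~0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def write_string(s):
    """Test helper: encode a string as length-prefixed UTF-8."""
    raw = s.encode("utf-8")
    return write_long(len(raw)) + raw
```

The point of the split is that build_plan runs once per file, while
read_record runs once per record and only does dictionary and tuple access,
which is also the kind of loop that tends to benefit from Cython compilation.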



-- 
Uri Laserson, PhD
Data Scientist, Cloudera
Twitter/GitHub: @laserson
+1 617 910 0447
[email protected]
