Welcome to the community, Bob. On Tue, Sep 22, 2020 at 12:27 PM Bob Tinsman <bobt...@pacbell.net> wrote:
> I'd like to introduce myself, because I've had an interest in Arrow for a
> long time and now I have a chance to help out. Up until now, I haven't
> really contributed much in open source, although I've been an avid
> consumer, so I'd like to change that!
>
> My main areas of work have been performance optimization, Java, databases
> (mostly relational), and optimizing/refactoring architecture, but I also
> have some C/C++ background, and I'm a quick learner of new languages.
>
> The reason I'm so interested in Arrow is that I've already created two
> in-memory columnar dataset implementations for two different companies,
> so I'm a believer in the power of this model, although I came to it from
> a different perspective. I was just watching this discussion with Wes and
> Jacques: "Starting Apache Arrow", in which our CTO Jacques Nadeau sat
> down for a fireside chat with Wes McKinney, discussing the past, present,
> and future...
>
> Wes lays out two phases of Arrow:
> - Phase one: Arrow used as a common format
> - Phase two: Arrow used for actual calculation
> Because I was working on my own, I skipped to phase two.
>
> I worked for an online marketing survey company called MarketTools in the
> early 00's. Survey results were stored in SQL Server, and we had to
> implement crosstabs on the data; for example, if you wanted to see
> answers to survey questions broken down by age, gender, income range, etc.
>
> The original implementation would generate some pretty hairy SQL, which
> got pretty slow if there were a lot of questions on the crosstab. I
> thought, "Why are we asking the DB to run multiple queries on the same
> data when we could pull it into memory once, then do aggregate
> calculations there?" That managed to produce a 5x speedup in running the
> crosstabs.
>
> In my most recent company, I created a new in-memory dataset
> implementation as the basis for an interactive data analysis tool.
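The idea Bob describes — fetch the rows from the database once, then compute every crosstab aggregate in memory — can be sketched in a few lines. A minimal illustration, assuming hypothetical survey rows (these field names and data are made up, not from MarketTools' actual system):

```python
from collections import Counter

# Hypothetical survey rows, fetched from the database exactly once.
rows = [
    {"age_range": "18-24", "gender": "F", "answer": "yes"},
    {"age_range": "18-24", "gender": "M", "answer": "no"},
    {"age_range": "25-34", "gender": "F", "answer": "yes"},
    {"age_range": "25-34", "gender": "F", "answer": "no"},
]

def crosstab(rows, row_key, col_key):
    """Count responses broken down by two attributes, entirely in memory.

    Any number of crosstabs can be computed from the same `rows` without
    issuing another query, which is where the speedup comes from.
    """
    return Counter((r[row_key], r[col_key]) for r in rows)

# Break answers down by age range; other breakdowns reuse the same rows.
by_age = crosstab(rows, "age_range", "answer")
by_gender = crosstab(rows, "gender", "answer")
```

Each additional breakdown (age, gender, income range, ...) is just another in-memory pass, rather than another round trip to SQL Server.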
> Again I was working with mostly relational databases. I was able to push
> the scalability of the in-memory columns a lot more using dictionaries.
> I also developed a hybrid engine combining SQL generation and in-memory
> calculation, sort of like what Spark is doing. If I had known about
> Arrow, I would definitely have used it, but it wasn't around yet. You
> guys have accomplished a lot; congrats on your 1.0.0 release, by the way!
>
> I'm starting out by grokking all the source and docs, and looking at
> JIRA issues that I could potentially work on, but I'm looking forward to
> helping out however I can.
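The dictionary trick Bob mentions is the same idea behind Arrow's dictionary-encoded arrays: store each distinct value once and keep only small integer indices per row, which shrinks wide string columns dramatically. A toy sketch of the encoding step (illustrative only; Arrow's `DictionaryArray` provides this natively):

```python
def dict_encode(values):
    """Dictionary-encode a column: distinct values once, integer indices per row."""
    dictionary = []   # distinct values, in first-seen order
    index_of = {}     # value -> position in `dictionary`
    indices = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices

column = ["CA", "NY", "CA", "CA", "TX", "NY"]
dictionary, indices = dict_encode(column)
# dictionary == ["CA", "NY", "TX"], indices == [0, 1, 0, 0, 2, 1]
```

With low-cardinality columns the index array is far smaller than the raw values, and aggregations can group by the integer indices directly.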