Re: [Pharo-users] Fwd: Re: 11/05/17 - Tabular Data Structures for Data Analysis - Oleksandr Zaytsev

p...@highoctane.be Tue, 16 May 2017 10:45:42 -0700

We may also use Discord and do something "somewhat live"

Phil


On Tue, May 16, 2017 at 7:23 PM, <serge.stinckw...@gmail.com> wrote:

> I was asking Philippe, but I hope to see you at ESUG too!
>
> Sent from my iPhone
>
> On 16 May 2017 at 19:02, Oleksandr Zaytsev <olk.zayt...@gmail.com>
> wrote:
>
> I would love to, but to go to Lille from my country I would need a visa,
> which is not that easy to acquire.
> So maybe I will come to PharoDays 2018.
> And I will definitely try to come to the ESUG Conference in September.
>
> Oleks
>
> On Tue, May 16, 2017 at 7:26 PM, <serge.stinckw...@gmail.com> wrote:
>
>>
>>
>> Sent from my iPhone
>>
>> On 11 May 2017 at 11:43, "p...@highoctane.be" <p...@highoctane.be>
>> wrote:
>>
>> ---------- Forwarded message ----------
>> From: "p...@highoctane.be" <p...@highoctane.be>
>> Date: 11 May 2017 10:54
>> Subject: Re: 11/05/17 - Tabular Data Structures for Data Analysis -
>> Oleksandr Zaytsev
>> To: "Nick Papoylias" <npapoyl...@gmail.com>
>> Cc:
>>
>>
>>
>> On Thu, May 11, 2017 at 10:20 AM, Nick Papoylias <npapoyl...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, May 11, 2017 at 5:24 AM, Oleksandr Zaytsev <
>>> olk.zayt...@gmail.com> wrote:
>>>
>>>>
>>>> *A. Work done*
>>>>
>>>>    - Downloaded the threaded VM as suggested by Esteban Lorenzano to
>>>>    make Iceberg work. And it does! I have successfully pushed my
>>>>    NeuralNetwork code to GitHub:
>>>>    https://github.com/olekscode/MLNeuralNetwork
>>>>    - Joined the PolyMath organization on GitHub
>>>>    - Created a repository for the TabularDataset project,
>>>>    https://github.com/PolyMathOrg/TabularDataset, as part of the
>>>>    PolyMath organization on GitHub
>>>>    - Fixed PolyMath issue #25 and made a PR
>>>>    - Read an article from the Wolfram Mathematica documentation about
>>>>    Dataset. It was one of the reading suggestions sent to me by Nick
>>>>    Papoylias
>>>>
>>>>
>>>> *B. Next steps*
>>>>
>>>>    - Fix more issues of PolyMath, using Iceberg. I have to get used to
>>>>    it by the time the coding phase starts
>>>>    - Read the rest of Nick Papoylias's suggestions
>>>>
>>>>
>>>> *C. Help needed*
>>>>
>>>>    - The Dataset in Wolfram, as well as Pandas in Python, has a very
>>>>    advanced indexing system. Smalltalk has its own conventions for
>>>>    indexing, so I think it would be great if I got familiar with them.
>>>>    Could you suggest some reading on this topic (what are the indexing
>>>>    conventions in Smalltalk?).
>>>>    For example, in Wolfram, I can write *dataset[[-1]]* to extract the
>>>>    last row. But in Pharo indexes cannot be negative. In Pharo I would
>>>>    say *dataset last*. But what about *dataset[[-5]]*?
>>>>
>>> This would be a good exercise for you ;) In Pharo you can easily add
>>> negative indexing yourself.
>>>
>>> *Hint:* You know the index of the last element, since this is the size
>>> of the collection, so... ;)
>>>
>> No need for changes, this exists already.
>>
>> Use atWrap: index put: value and atWrap: index with negative indexes.
>> 'hello' atWrap: -2
>>
>> There is a specific version for Array using a primitive.
>> #(10 20 30 40) atWrap: -1
>>
>> atWrap: 0 gives you the last item.
>> atWrap: -1 gives 30.
>>
>> This is different from languages with 0-based indexing.
>>
>> The interesting thing about atWrap: is that it uses modulo internally, so
>> you do not need to care about that.
>>
>> ($/ split: 'abc/def/ghi/jkl') atWrap: -1
>> --> 'ghi'
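
To make the modulo arithmetic concrete, here is a small Python model of the wrap-around lookup described above. It is only a sketch: `at_wrap` is a hypothetical helper name, and the formula `(index - 1) mod size` is an assumption chosen because it reproduces the results quoted in this message for 1-based collections.

```python
def at_wrap(items, index):
    """Return the element at a 1-based index that wraps around both ends.

    Hypothetical helper that models the behaviour described for atWrap:.
    """
    if len(items) == 0:
        raise IndexError("cannot index an empty collection")
    # Shift to a 0-based offset and wrap it with modulo. Python's % always
    # yields a non-negative result when the divisor is positive.
    return items[(index - 1) % len(items)]


numbers = [10, 20, 30, 40]
print(at_wrap(numbers, 1))    # first element: 10
print(at_wrap(numbers, 0))    # wraps to the last element: 40
print(at_wrap(numbers, -1))   # one before the last: 30
print(at_wrap("abc/def/ghi/jkl".split("/"), -1))  # ghi
```

Because index 1 is the first element, index 0 is the position "just before the first", which wraps to the last one. That is why -1 lands on 30 here, whereas in Python's own 0-based negative indexing `numbers[-1]` is 40.
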
>>
>> The Matrix class has a fairly rich API, but it is highly inefficient: it
>> makes copies all the time, etc. It would be nice to have some kind of
>> futures or copy-on-write mechanism in there.
>>
>> I miss cbind and rbind. These are useful. I have some half-baked, super
>> inefficient implementations of these things for Matrix.
>>
>> https://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html
>>
>> The ability to name columns is also nice to have.
>>
>> In R one does:
>>
>> df <- data.frame(c(1,2,3))
>> df <- cbind(df, c(4,5,6))
>> df <- cbind(df, c(7,8,9))
>> names(df) <- c("C1", "C2", "C3")
>>
>> The names can be retrieved with:
>>
>> names(df)
>>
>> A Smalltalkish style would be welcome.
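
One possible shape for such an API, sketched in Python so that it can be run as is. Everything here is hypothetical: the class `Table` and its methods `cbind`, `names` and `rename` are illustrative names, not an existing Pharo or PolyMath API. The point is only that column binding can return the receiver, so calls chain the way cascaded messages would.

```python
class Table:
    """Tiny column-oriented table (hypothetical, for illustration only)."""

    def __init__(self):
        self._names = []      # column names, in insertion order
        self._columns = []    # one list of values per column

    def cbind(self, values, name=None):
        """Append a column and return self so that calls can be chained."""
        values = list(values)
        if self._columns and len(values) != len(self._columns[0]):
            raise ValueError("column length does not match existing rows")
        # Default names follow the C1, C2, ... pattern of the R example.
        self._names.append(name or f"C{len(self._names) + 1}")
        self._columns.append(values)
        return self

    def names(self):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._names)

    def rename(self, new_names):
        new_names = list(new_names)
        if len(new_names) != len(self._names):
            raise ValueError("need exactly one name per column")
        self._names = new_names
        return self


df = Table().cbind([1, 2, 3]).cbind([4, 5, 6]).cbind([7, 8, 9])
print(df.names())                          # ['C1', 'C2', 'C3']
print(df.rename(["a", "b", "c"]).names())  # ['a', 'b', 'c']
```
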
>>
>>
>>
>>
>> Interesting! Are you coming to PharoDays? We can talk about that if we
>> find time.
>>
>> Maybe looking at the Voyage queries can be helpful.
>>
>> Phil
>>
>>
>>
>>> Try adding an extension method to OrderedCollection or
>>> SequenceableCollection.
>>>
>>> If the Pharo by Example chapter or the MOOC is not enough, read the
>>> source itself in the core to see how basic methods are implemented
>>> (it is less scary than it sounds).
>>>
>>> You can also try Chapters 9, 10, 11 of the blue book (some API changes
>>> may apply):
>>>
>>> http://sdmeta.gforge.inria.fr/FreeBooks/BlueBook/Bluebook.pdf
>>>
>>>
>>>>    - Or what is the best way of implementing this index:
>>>>    *dataset[["name"]]* (extracts a named row), *dataset[[1]]*
>>>>    (extracts the first row)? Should I create two separate messages:
>>>>    *dataset rowNamed: 'name'* and *dataset rowAt: 1*?
>>>>
>> rowNamed:
>> rowAt:
>>
>> Yes, looks like it.
>>
>> But if we want to model things like R data frames, for example, this has
>> to be seen as a vectorized operation, so you can use row slices, column
>> slices, and logical indexes.
>>
>> Check this out:
>>
>> http://www.r-tutor.com/r-introduction/data-frame/data-frame-row-slice
>> https://www.r-bloggers.com/working-with-data-frames/
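
For readers who have not used R data frames, the three kinds of vectorized access mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a proposed design: the function names `row_slice`, `column_slice` and `logical_index` are hypothetical, rows are plain dictionaries, and row positions are 1-based and inclusive to match R and Smalltalk.

```python
def row_slice(rows, first, last):
    """Rows first..last, 1-based and inclusive."""
    if first < 1:
        raise IndexError("row positions start at 1")
    return rows[first - 1:last]


def column_slice(rows, names):
    """Keep only the named columns of every row."""
    return [{name: row[name] for name in names} for row in rows]


def logical_index(rows, mask):
    """Keep the rows whose entry in the boolean mask is True."""
    if len(mask) != len(rows):
        raise ValueError("mask must have one entry per row")
    return [row for row, keep in zip(rows, mask) if keep]


data = [
    {"name": "a", "x": 1, "y": 10},
    {"name": "b", "x": 2, "y": 20},
    {"name": "c", "x": 3, "y": 30},
]

print(row_slice(data, 2, 3))              # rows b and c
print(column_slice(data, ["name", "y"]))  # drops column x
# A logical index is usually computed from the data itself.
mask = [row["x"] > 1 for row in data]
print(logical_index(data, mask))          # rows b and c
```
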
>>
>>
>>
>>> The internal representation of your data-structure can be anything at
>>> the moment, *as long as you encapsulate it.*
>>>
>>> (i.e. it can be nested OrderedCollections with metadata mapping column
>>> names to indexes, or a dictionary of collections, etc.).
>>>
>>> *If you don't expose it to the user* (i.e. return it from the public API,
>>> or expect knowledge of it in argument passing),
>>> we can easily change it later. So *first make it work, and we optimize
>>> later ;)*
>>>
>>> For your case it will be a little bit trickier because *you also have
>>> the notions of a) rows and b) columns*, which
>>> are exposed to the user. So *you would need to create abstractions* for
>>> these too.
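
The encapsulation advice above can be illustrated with a minimal Python sketch. All names are hypothetical (`Dataset`, `Row`, `add_row`, `row_at`, `row_named`), and the storage shown, parallel lists, is just one arbitrary choice. What matters is that callers only ever receive `Row` objects, so the storage could be swapped later without changing the public interface.

```python
class Row:
    """Read-only view of one row (hypothetical abstraction)."""

    def __init__(self, column_names, values):
        # Tuples are copies, so the dataset's storage is never handed out.
        self._column_names = tuple(column_names)
        self._values = tuple(values)

    def at(self, column_name):
        return self._values[self._column_names.index(column_name)]

    def values(self):
        return list(self._values)


class Dataset:
    """Table whose storage is private; only Row objects are returned."""

    def __init__(self, column_names):
        self._column_names = list(column_names)
        self._row_names = []   # storage detail: parallel lists today,
        self._rows = []        # could become a dictionary of columns later

    def add_row(self, name, values):
        values = list(values)
        if len(values) != len(self._column_names):
            raise ValueError("need exactly one value per column")
        if name in self._row_names:
            raise ValueError(f"duplicate row name: {name!r}")
        self._row_names.append(name)
        self._rows.append(values)

    def row_at(self, index):
        """Row by 1-based position, in the spirit of rowAt:."""
        if not 1 <= index <= len(self._rows):
            raise IndexError(f"row index out of range: {index}")
        return Row(self._column_names, self._rows[index - 1])

    def row_named(self, name):
        """Row by name, in the spirit of rowNamed:."""
        if name not in self._row_names:
            raise KeyError(name)
        return self.row_at(self._row_names.index(name) + 1)


ds = Dataset(["x", "y"])
ds.add_row("first", [1, 10])
ds.add_row("second", [2, 20])
print(ds.row_at(1).values())           # [1, 10]
print(ds.row_named("second").at("y"))  # 20
```

Two separate accessors keep each one simple. A single accessor that dispatches on the argument type would also work, but it hides the difference between positional and named lookup.
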
>>>
>>> Cheers,
>>>
>>> Nick
>>>
>>>>
>>>>
>>>>
>>>> If someone else is having problems with Iceberg on Linux, try
>>>> downloading the threaded VM:
>>>>
>>>> wget -O- get.pharo.org/vmT60 | bash
>>>>
>>>> And use the SSH (not HTTPS) remote URL.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Pharo Google Summer of Code" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to pharo-gsoc+unsubscr...@googlegroups.com.
>>>> To post to this group, send email to pharo-g...@googlegroups.com.
>>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>>> gid/pharo-gsoc/CAEp0Uzu-8fw3dA6ezVoj-QptvLcB8cWPHvZ1tfLg1Ce8
>>>> qkTqfQ%40mail.gmail.com
>>>> <https://groups.google.com/d/msgid/pharo-gsoc/CAEp0Uzu-8fw3dA6ezVoj-QptvLcB8cWPHvZ1tfLg1Ce8qkTqfQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>
>>
>
