Write some tests and ask for good implementations. Crazy implementors like Henrik can probably beat us all :)
On Wed, May 17, 2017 at 7:55 PM, Stephane Ducasse <stepharo.s...@gmail.com> wrote:

> I'm interested in helping with such new "containers".
> Maybe we should proceed this way:
>
> On Tue, May 16, 2017 at 7:44 PM, p...@highoctane.be <p...@highoctane.be> wrote:
>
>> We may also use Discord and do something "somewhat live".
>>
>> Phil
>>
>> On Tue, May 16, 2017 at 7:23 PM, <serge.stinckw...@gmail.com> wrote:
>>
>>> I was asking Philippe, but I hope to see you at ESUG too!
>>>
>>> Sent from my iPhone
>>>
>>> On 16 May 2017 at 19:02, Oleksandr Zaytsev <olk.zayt...@gmail.com> wrote:
>>>
>>> I would love to, but to go to Lille from my country I would need a visa,
>>> which is not that easy to acquire.
>>> So maybe I will come to PharoDays 2018.
>>> And I will definitely try to come to the ESUG Conference in September.
>>>
>>> Oleks
>>>
>>> On Tue, May 16, 2017 at 7:26 PM, <serge.stinckw...@gmail.com> wrote:
>>>
>>>> Sent from my iPhone
>>>>
>>>> On 11 May 2017 at 11:43, "p...@highoctane.be" <p...@highoctane.be> wrote:
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: "p...@highoctane.be" <p...@highoctane.be>
>>>> Date: 11 May 2017 10:54
>>>> Subject: Re: 11/05/17 - Tabular Data Structures for Data Analysis - Oleksandr Zaytsev
>>>> To: "Nick Papoylias" <npapoyl...@gmail.com>
>>>> Cc:
>>>>
>>>> On Thu, May 11, 2017 at 10:20 AM, Nick Papoylias <npapoyl...@gmail.com> wrote:
>>>>
>>>>> On Thu, May 11, 2017 at 5:24 AM, Oleksandr Zaytsev <olk.zayt...@gmail.com> wrote:
>>>>>
>>>>>> *A. Work done*
>>>>>>
>>>>>> - Downloaded the threaded VM as suggested by Esteban Lorenzano to
>>>>>>   make Iceberg work. And it does!
>>>>>>   I have successfully pushed my NeuralNetwork code to GitHub:
>>>>>>   https://github.com/olekscode/MLNeuralNetwork
>>>>>> - Joined the PolyMath organization on GitHub
>>>>>> - Created a repository for the TabularDataset project,
>>>>>>   https://github.com/PolyMathOrg/TabularDataset, as a part of the
>>>>>>   PolyMath organization on GitHub
>>>>>> - Fixed PolyMath issue #25 and made a PR
>>>>>> - Read an article from the Wolfram Mathematica documentation
>>>>>>   regarding Dataset. It was one of the reading suggestions sent to me
>>>>>>   by Nick Papoylias
>>>>>>
>>>>>> *B. Next steps*
>>>>>>
>>>>>> - Fix more PolyMath issues, using Iceberg. I have to get used
>>>>>>   to it by the time the coding phase starts
>>>>>> - Read the rest of Nick Papoylias's suggestions
>>>>>>
>>>>>> *C. Help needed*
>>>>>>
>>>>>> - The Dataset in Wolfram, as well as Pandas in Python, has a very
>>>>>>   advanced indexing system. Smalltalk has its own special conventions
>>>>>>   for indexing, so I think that it would be great if I got familiar with
>>>>>>   them. Could you suggest some reading on this topic (what are the
>>>>>>   indexing conventions in Smalltalk?).
>>>>>>   For example, in Wolfram, I can write *dataset[[-1]]* to extract
>>>>>>   the last row. But in Pharo indexes cannot be negative. In Pharo I
>>>>>>   would say *dataset last*. But how about *dataset[[-5]]*?
>>>>>>
>>>>> This would be a good exercise for you ;) In Pharo you can easily add
>>>>> negative indexing yourself.
>>>>>
>>>>> *Hint:* You know the index of the last element, since this is the
>>>>> size of the collection, so... ;)
>>>>>
>>>> No need for changes, this exists already.
>>>>
>>>> Use atWrap: index put: value and atWrap: index with negative indexes.
>>>> 'hello' atWrap: -2
>>>>
>>>> There is a specific version for Array using a primitive.
>>>> #[ 10 20 30 40 ] atWrap: -1
>>>>
>>>> atWrap: 0 gives you the last item.
>>>> atWrap: -1 gives 30
>>>>
>>>> This is different from 0-based index languages.
>>>>
>>>> The interesting thing about atWrap: is that it uses modulo internally, so
>>>> you do not need to care about that.
>>>>
>>>> ($/ split: 'abc/def/ghi/jkl') atWrap: -1
>>>> --> 'ghi'
>>>>
>>>> The Matrix class has a bunch of things API-wise, but the class is highly
>>>> inefficient, doing copies all the time etc. It would be nice to have some
>>>> kind of futures/copy-on-write style things in there.
>>>>
>>>> I miss cbind and rbind. These are useful. I have some half-baked, super
>>>> inefficient implementations of these things for Matrix.
>>>>
>>>> https://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html
>>>>
>>>> The ability to name columns is also nice to have.
>>>>
>>>> In R one does:
>>>>
>>>> df <- data.frame(c(1,2,3))
>>>> df <- cbind(df, c(4,5,6))
>>>> names(df) <- c("C1", "C2")
>>>>
>>>> The names can be retrieved with:
>>>>
>>>> names(df)
>>>>
>>>> A Smalltalkish style would be welcome.
>>>>
>>>> Interesting! Are you coming to PharoDays? We can talk about that if
>>>> we find time.
>>>>
>>>> Maybe looking at the Voyage queries can be helpful.
>>>>
>>>> Phil
>>>>
>>>>> Try adding an extension method to Ordered or SequenceableCollection.
>>>>>
>>>>> If the Pharo by Example chapter or the MOOC is not enough, read the
>>>>> source itself in the core to see how basic methods are implemented (it is
>>>>> less scary than it sounds).
>>>>>
>>>>> You can also try Chapters 9, 10, 11 of the Blue Book (some API changes
>>>>> may apply):
>>>>>
>>>>> http://sdmeta.gforge.inria.fr/FreeBooks/BlueBook/Bluebook.pdf
>>>>>
>>>>>> - Or what is the best way of implementing this index:
>>>>>>   *dataset[["name"]]* (extracts a named row), *dataset[[1]]*
>>>>>>   (extracts the first row)? Should I create two separate messages:
>>>>>>   *dataset rowNamed: 'name'* and *dataset rowAt: 1*?
>>>>>>
>>>> rowNamed:
>>>> rowAt:
>>>>
>>>> Yes, looks like it.
>>>>
>>>> But if we want to model things like R data frames, for example, this has
>>>> to be seen as a vectorized operation, so that you can use row slices, column
>>>> slices, and logical indexes.
>>>>
>>>> Check this out:
>>>>
>>>> http://www.r-tutor.com/r-introduction/data-frame/data-frame-row-slice
>>>> https://www.r-bloggers.com/working-with-data-frames/
>>>>
>>>>> The internal representation of your data structure can be anything at
>>>>> the moment, *as long as you encapsulate it.*
>>>>>
>>>>> (i.e. it can be nested OrderedCollections with metadata mapping
>>>>> column names to indexes, or a dictionary of collections, etc.)
>>>>>
>>>>> *If you don't expose it to the user* (i.e. return it from the public
>>>>> API, or expect knowledge of it in argument passing),
>>>>> we can easily change it later. So *first make it work, and we
>>>>> optimize later ;)*
>>>>>
>>>>> For your case it will be a little bit trickier because *you also have
>>>>> the notions of a) rows and b) columns*, which
>>>>> are exposed to the user. So *you would need to create abstractions*
>>>>> for these too.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Nick
>>>>>
>>>>>> If someone else is having problems with Iceberg on Linux, try
>>>>>> downloading the threaded VM:
>>>>>>
>>>>>> wget -O- get.pharo.org/vmT60 | bash
>>>>>>
>>>>>> And use an SSH (not HTTPS) remote URL.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Pharo Google Summer of Code" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to pharo-gsoc+unsubscr...@googlegroups.com.
>>>>>> To post to this group, send email to pharo-g...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/pharo-gsoc/CAEp0Uzu-8fw3dA6ezVoj-QptvLcB8cWPHvZ1tfLg1Ce8qkTqfQ%40mail.gmail.com
>>>>>> For more options, visit https://groups.google.com/d/optout.
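
The wrap-around indexing discussed in the thread can be tried in a Pharo Playground. The snippet below is only a minimal sketch and is not part of anyone's message: it relies on `atWrap:`, `at:` and `size` on a literal Array, and the variable name `items` is purely illustrative. The last two expressions follow Nick's hint for Wolfram-style negative indexes (where -1 means the last element), assuming the index passed in is negative.

```smalltalk
"Evaluate each expression with 'Print it'; the variable name is illustrative."
| items |
items := #(10 20 30 40).

"atWrap: is 1-based and wraps with modulo, so 0 lands on the last element."
items atWrap: 1.     "10"
items atWrap: 0.     "40"
items atWrap: -1.    "30"
items atWrap: 5.     "10, wraps past the end"

"Wolfram/Pandas-style negative index, where -1 means the last element:
 add the (negative) index to size + 1, as suggested by the hint about size."
items at: items size + 1 + -1.   "40"
items at: items size + 1 + -3.   "20"
```

Note the off-by-one between the two conventions: `atWrap: -1` answers the second-to-last element, whereas `dataset[[-1]]` in Wolfram is the last row, so a dataset API would have to pick one convention and document it.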