---------- Forwarded message ----------
From: "p...@highoctane.be" <p...@highoctane.be>
Date: 11 May 2017 10:54
Subject: Re: 11/05/17 - Tabular Data Structures for Data Analysis - Oleksandr Zaytsev
To: "Nick Papoylias" <npapoyl...@gmail.com>
Cc:

On Thu, May 11, 2017 at 10:20 AM, Nick Papoylias <npapoyl...@gmail.com> wrote:
>
> On Thu, May 11, 2017 at 5:24 AM, Oleksandr Zaytsev <olk.zayt...@gmail.com> wrote:
>
>> *A. Work done*
>>
>>    - Downloaded the threaded VM as suggested by Esteban Lorenzano to
>>    make Iceberg work. And it does! I have successfully pushed my
>>    NeuralNetwork code to GitHub: https://github.com/olekscode/MLNeuralNetwork
>>    - Joined the PolyMath organization on GitHub
>>    - Created a repository for the TabularDataset project
>>    https://github.com/PolyMathOrg/TabularDataset as a part of the PolyMath
>>    organization on GitHub
>>    - Fixed a PolyMath issue #25 and made a PR
>>    - Read an article from the Wolfram Mathematica documentation regarding
>>    Dataset. It was one of the reading suggestions sent to me by Nick
>>    Papoylias
>>
>> *B. Next steps*
>>
>>    - Fix more issues of PolyMath, using Iceberg. I have to get used to
>>    it by the time the coding phase starts
>>    - Read the rest of Nick Papoylias's suggestions
>>
>> *C. Help needed*
>>
>>    - The Dataset in Wolfram, as well as Pandas in Python, has a very
>>    advanced indexing system. Smalltalk has its own special conventions for
>>    indexing, so I think that it would be great if I got familiar with them.
>>    Could you suggest some reading on this topic (what are the indexing
>>    conventions in Smalltalk?).
>>    For example, in Wolfram, I can write *dataset[[-1]]* to extract the
>>    last row. But in Pharo indexes cannot be negative. In Pharo I would say
>>    *dataset last*. But how about *dataset[[-5]]*?
>
> This would be a good exercise for you ;) In Pharo you can easily add
> negative indexing yourself.
>
> *Hint:* You know the index of the last element, since this is the size of
> the collection, so... ;)

No need for changes, this exists already. Use atWrap: index put: value and atWrap: with negative indexes.

'hello' atWrap: -2

There is a specific version for Array using a primitive.

#(10 20 30 40) atWrap: -1

atWrap: 0 gives you the last item; atWrap: -1 gives 30, the item before it. This is different from 0-based index languages such as Python, where -1 is the last item.

The interesting thing about atWrap: is that it uses modulo internally, so you do not need to care about that.

($/ split: 'abc/def/ghi/jkl') atWrap: -1 --> 'ghi'

The Matrix class has a bunch of things API-wise, but the class is highly inefficient, doing copies all the time, etc. It would be nice to have some kind of futures/copy-on-write style things in there.

I miss cbind and rbind. These are useful. I have some half-baked, super inefficient implementations of these things for Matrix.

https://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html

The ability to name columns is also nice to have.

In R one does:

    df <- data.frame(c(1,2,3))
    df <- cbind(df, c(4,5,6))
    names(df) <- c("C1", "C2")

The names can be retrieved with:

    names(df)

A Smalltalkish style would be welcome. Maybe looking at the Voyage queries can be helpful.

Phil

> Try adding an extension method to OrderedCollection or SequenceableCollection.
>
> If the Pharo by Example chapter or the MOOC is not enough, read the source
> itself in the core, to see how basic methods are implemented (it is less
> scary than it sounds).
>
> You can also try Chapters 9, 10, 11 of the Blue Book (some API changes may
> apply):
>
> http://sdmeta.gforge.inria.fr/FreeBooks/BlueBook/Bluebook.pdf
>
>>    - Or what is the best way of implementing this index:
>>    *dataset[["name"]]* (extracts a named row), *dataset[[1]]* (extracts
>>    the first row)? Should I create two separate messages: *dataset
>>    rowNamed: 'name'* and *dataset rowAt: 1*?

rowNamed: and rowAt:, yes, it looks like it. But if we want to model things like R data frames, for example, this has to be seen as a vectorized operation, so that you can use row slices, column slices, and logical indexes.

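The indexing ideas above can be tried in a Playground. This is only a sketch under stated assumptions, not an existing API: fromEnd is a hypothetical helper block that maps a Wolfram-style negative index onto atWrap:, and the rows are plain Arrays with made-up values rather than a real dataset class.

```smalltalk
"Made-up sample rows; a real dataset class would hide this representation."
| rows fromEnd lastRow fifthFromEnd slice firstColumn mask picked |
rows := #( #('a' 10) #('b' 20) #('c' 30) #('d' 40) #('e' 50) ).

"atWrap: 0 is the last element, so Wolfram's [[-k]] maps to atWrap: 1 - k.
 fromEnd is a hypothetical helper, not a Pharo method."
fromEnd := [ :collection :k | collection atWrap: 1 - k ].
lastRow := fromEnd value: rows value: 1.        "#('e' 50), like dataset[[-1]]"
fifthFromEnd := fromEnd value: rows value: 5.   "#('a' 10), like dataset[[-5]]"

"Row slice: rows 2 to 3."
slice := rows copyFrom: 2 to: 3.                "#(#('b' 20) #('c' 30))"

"Column slice: the first element of every row."
firstColumn := rows collect: [ :row | row first ].

"Logical index: build a Boolean mask, then keep the rows where it is true."
mask := rows collect: [ :row | row last > 25 ].
picked := ((1 to: rows size) select: [ :i | mask at: i ])
	collect: [ :i | rows at: i ].               "rows 'c', 'd' and 'e'"
```

A dataset class could wrap these expressions behind its own selectors, so that callers never depend on the rows being Arrays.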
Check this out:

http://www.r-tutor.com/r-introduction/data-frame/data-frame-row-slice
https://www.r-bloggers.com/working-with-data-frames/

> The internal representation of your data-structure can be anything at the
> moment, *as long as you encapsulate it.*
>
> (i.e. it can be nested OrderedCollections with meta-data for column-names to
> indexes, or a dictionary of collections, etc.)
>
> *If you don't expose it to the user* (i.e. return it from the public API,
> or expect knowledge of it in argument passing),
> we can easily change it later. So *first make it work, and we optimize
> later ;)*
>
> For your case it will be a little bit trickier because *you also have the
> notions of a) rows and b) columns*, which
> are exposed to the user. So *you would need to create abstractions* for
> these too.
>
> Cheers,
>
> Nick
>
>> If someone else is having problems with Iceberg on Linux, try downloading
>> the threaded VM:
>>
>> wget -O- get.pharo.org/vmT60 | bash
>>
>> And use the SSH (not HTTPS) remote URL.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Pharo Google Summer of Code" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to pharo-gsoc+unsubscr...@googlegroups.com.
>> To post to this group, send email to pharo-g...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/pharo-gsoc/CAEp0Uzu-8fw3dA6ezVoj-QptvLcB8cWPHvZ1tfLg1Ce8qkTqfQ%40mail.gmail.com
>> For more options, visit https://groups.google.com/d/optout.
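
Nick's advice about encapsulation could look roughly like the sketch below. Everything in it is hypothetical: the class name SketchDataset, its package, and the selectors rowNames:rows:, setRowNames:rows:, rowAt: and rowNamed: are made up for illustration and are not the TabularDataset API. Rows are stored as an OrderedCollection of Arrays, but callers only ever receive copies, so the storage could be swapped later. The method definitions use the usual "Class >> selector" notation and would be entered in the System Browser; the last part is for a Playground.

```smalltalk
"Hypothetical class, for illustration only."
Object subclass: #SketchDataset
	instanceVariableNames: 'rowNames rows'
	classVariableNames: ''
	package: 'Sketch-Dataset'

SketchDataset class >> rowNames: names rows: someRows
	^ self new setRowNames: names rows: someRows

SketchDataset >> setRowNames: names rows: someRows
	"Copy the arguments so callers cannot mutate the internal state afterwards."
	rowNames := names asOrderedCollection.
	rows := (someRows collect: [ :each | each copy ]) asOrderedCollection

SketchDataset >> rowAt: anIndex
	"Answer a copy, so the internal row never escapes."
	^ (rows at: anIndex) copy

SketchDataset >> rowNamed: aName
	"indexOf: answers 0 for an unknown name, so rowAt: then signals an error."
	^ self rowAt: (rowNames indexOf: aName)

"Playground usage, with made-up values:"
| ds |
ds := SketchDataset rowNames: #('first' 'second') rows: #( #(1 2) #(3 4) ).
ds rowAt: 1.            "#(1 2)"
ds rowNamed: 'second'.  "#(3 4)"
```

Because rowAt: and rowNamed: are the only way in, replacing the OrderedCollection with, say, a dictionary of columns would not change any calling code.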