I am sorry, I must have misunderstood the purpose of this thread. I
read "Even if you have a vague idea, you can contribute." and tried to
give a couple of vague ideas.

I did not really mean that I would be able, or have the time, to mentor such a project.

2015-02-18 11:01 GMT+01:00 Sven Van Caekenberghe <s...@stfx.eu>:
> OK, try making a proposal then: http://gsoc.pharo.org has the instructions
> and the current list. You probably know more about data science than I do.
>
>> On 18 Feb 2015, at 10:53, Andrea Ferretti <ferrettiand...@gmail.com> wrote:
>>
>> I am sorry if the previous messages came off as too harsh. The Neo
>> tools are perfectly fine for their intended use.
>>
>> What I was trying to say is that a good idea for a SoC project would
>> be to develop a framework for data analysis that would be useful for
>> data scientists, and in particular this would include something to
>> import unstructured data more freely.
>>
>> 2015-02-18 10:39 GMT+01:00 Sven Van Caekenberghe <s...@stfx.eu>:
>>> Well, you are certainly free to contribute.
>>>
>>> Heuristic interpretation of data could be useful, but it looks like an
>>> addition on top; the core library should be fast and efficient.
>>>
>>>> On 18 Feb 2015, at 10:35, Andrea Ferretti <ferrettiand...@gmail.com> wrote:
>>>>
>>>> For an example of what I am talking about, see
>>>>
>>>> http://pandas.pydata.org/pandas-docs/version/0.15.2/io.html#csv-text-files
>>>>
>>>> I agree that this is definitely too many options, but it gets the job
>>>> done for quick and dirty exploration.
>>>>
>>>> The fact is that working with a dump of a table from your db, whose
>>>> content you know, requires different tools than exploring the latest
>>>> open data that your local municipality has put online, using yet
>>>> another messy format.
>>>>
>>>> Enterprise programmers deal more often with the former, data
>>>> scientists with the latter, and I think there is room for both kinds of
>>>> tools.
>>>>
>>>> 2015-02-18 10:26 GMT+01:00 Andrea Ferretti <ferrettiand...@gmail.com>:
>>>>> Thank you Sven. I think this should be emphasized and prominent on the
>>>>> home page*. Still, libraries such as pandas are even more lenient,
>>>>> doing things such as:
>>>>>
>>>>> - autodetecting which fields are numeric in CSV files
>>>>> - allowing you to fill in missing data based on statistics (for instance,
>>>>> you can say: where the field `age` is missing, use the average age)
>>>>>
>>>>> Probably there is room for something built on top of Neo.
>>>>>
>>>>>
>>>>> * By the way, I suggest that the documentation on Neo could benefit
>>>>> from a reorganization. Right now, the first topic in the NeoJSON
>>>>> paper introduces JSON itself. I would argue that everyone who tries
>>>>> to use the library already knows what JSON is. Still, there is no
>>>>> example of how to read JSON from a file in the whole document.
>>>>>
>>>>> 2015-02-18 10:12 GMT+01:00 Sven Van Caekenberghe <s...@stfx.eu>:
>>>>>>
>>>>>>> On 18 Feb 2015, at 09:52, Andrea Ferretti <ferrettiand...@gmail.com> 
>>>>>>> wrote:
>>>>>>>
>>>>>>> Also, these tasks
>>>>>>> often involve consuming data from various sources, such as CSV and
>>>>>>> JSON files. NeoCSV and NeoJSON are still a little too rigid for the
>>>>>>> task: libraries like pandas let you just feed in a CSV file and try to
>>>>>>> make heads or tails of the content without having to define much of
>>>>>>> a schema beforehand.
>>>>>>
>>>>>> Both NeoCSV and NeoJSON can operate in two ways: (1) without the
>>>>>> definition of any schemas or (2) with the definition of schemas and
>>>>>> mappings. The quick and dirty explore style is most certainly possible.
>>>>>>
>>>>>> 'my-data.csv' asFileReference readStreamDo: [ :in |
>>>>>>   (NeoCSVReader on: in) upToEnd ].
>>>>>>
>>>>>> => an array of arrays
>>>>>>
>>>>>> 'my-data.json' asFileReference readStreamDo: [ :in |
>>>>>>   (NeoJSONReader on: in) next ].
>>>>>>
>>>>>> => objects structured using dictionaries and arrays
>>>>>>
>>>>>> Sven
>>>>>>
>>>>>>
>>>>
>>>
>>>
>>
>
>
