Re: [Laboratory Toolkit] proposing a new Apache Commons component

Valentin Waeselynck Tue, 17 Dec 2013 16:04:13 -0800

Hello to all,

As you asked, I have restructured the Laboratory Toolkit as a Maven project
and added some code samples that show its use cases. (Sorry for the delay;
I've had a rough couple of weeks.)


In the code samples, you will find the following examples:
    - accounting: the simplest example, in the field of corporate finance.
It's an application that takes as input the accounting documents of a
company (the Balance Sheet and the Income Statement) and calculates from these
a variety of financial quantities, such as the Net Income, and profitability
ratios, such as the Return On Equity.

    - integer: a more mathematical example, in the fields of arithmetic and
algebra. The base data is simply a positive integer n; the application computes
things like the set of divisors of n, then more advanced algebraic objects
such as the ring of integers modulo n, its canonical Chinese Remainder Theorem
factorization and the isomorphism between the two, and finally a set of
generators of its group of invertible elements.

    - search-engine: implements a basic search engine on a corpus of
documents, using classical ranking functions such as BM25 or TF-IDF in a Vector
Space Model. This one is closer to real-world use; it is directly inspired by an
"Introduction to Big Data" class I just took.

    - text: accompanies the HTML guide in the repository; it is not meant to be
read on its own.
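To make the search-engine sample a little more concrete, here is a minimal Java sketch of TF-IDF ranking. It is not the code from the repository: the class name `TfIdf`, the raw-count term frequency and the weighting idf(t) = ln(N / df(t)) are assumptions, chosen as one common variant among several (BM25 builds on a similar idea but adds term-frequency saturation and document-length normalization).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Sketch only: one common TF-IDF variant, not the repository's search-engine code.
final class TfIdf {
    private final List<List<String>> docs;                          // each document as a list of tokens
    private final Map<String, Integer> docFreq = new HashMap<>();   // term -> number of documents containing it

    TfIdf(List<List<String>> docs) {
        this.docs = docs;
        for (List<String> doc : docs) {
            // Count each distinct term at most once per document.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
    }

    // idf(t) = ln(N / df(t)); terms absent from the corpus get weight 0.
    double idf(String term) {
        int df = docFreq.getOrDefault(term, 0);
        return df == 0 ? 0.0 : Math.log((double) docs.size() / df);
    }

    // score(q, d) = sum over query terms t of tf(t, d) * idf(t), tf being a raw count.
    double score(List<String> query, int docIndex) {
        List<String> doc = docs.get(docIndex);
        double score = 0.0;
        for (String term : query) {
            long tf = doc.stream().filter(term::equals).count();
            score += tf * idf(term);
        }
        return score;
    }
}
```

For instance, with the three documents [apache, commons, lang], [apache, maven] and [laboratory, toolkit], the query [commons] scores ln 3 for the first document and 0 for the others, while [apache] scores only ln 1.5 for the first document because that term appears in two of the three documents.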
 
Of course, these are only toy examples, and I have no ambition of
replacing software that already does this very well; but I hope they show
clearly enough how generic this API is and where it could be applied.

In practice, this API originated in an application I was developing that
generates advertisement snippets from HTML product pages; there I used this
toolkit extensively to extract and rank keywords from the HTML document, but
I can't show you that code. My experience is that the toolkit makes it much
easier to take a sequence of formulas written on paper and implement them
one by one.

In my opinion, the main features of this API are:
    - making some algorithms easier to develop by expressing their concepts in 
terms of analyses and results (as an analogy, think of how the Executor API 
lets you describe concurrent algorithms in terms of tasks and executors)
    - built-in support for the Intercept, Cache and Invoke pattern
    - enforcing a modular architecture without hindering communication
between the modules (i.e. the Analysis objects)
    - through the use of Laboratory objects, emulating a new scope that is an 
alternative to class scope or method scope.
    - separating the concern of declaring how the steps of an algorithm are
computed from that of externally requesting their results.
    - encouraging the exploration of a space of strategies and parameters for 
the algorithms, by concentrating all these parameters in one place (the 
Equipment object).
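As a rough illustration of expressing an algorithm as analyses and results, with the Intercept, Cache and Invoke pattern built in, here is a minimal Java sketch. It is hypothetical: `Analysis`, `Lab`, `get`, `param` and `Accounting` are illustrative names, not the Laboratory Toolkit's actual API, and the "equipment" is reduced to a plain map of input values. Each analysis is computed at most once per lab, and an analysis obtains the results it depends on by requesting them from the lab rather than by calling other analyses directly. Net income is deliberately simplified to revenue minus expenses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is NOT the Laboratory Toolkit's API.
// An analysis computes one result, possibly from other analyses' results.
interface Analysis<T> {
    T compute(Lab lab);
}

// A "laboratory": holds the inputs/parameters and caches every analysis result.
final class Lab {
    private final Map<String, Double> equipment;                 // input data and tunable parameters
    private final Map<Analysis<?>, Object> cache = new HashMap<>();
    private int computations = 0;                                // number of cache misses, for illustration

    Lab(Map<String, Double> equipment) {
        this.equipment = equipment;
    }

    double param(String name) {
        return equipment.get(name);
    }

    // Intercept the request, return the cached result if present, otherwise invoke and cache.
    @SuppressWarnings("unchecked")
    <T> T get(Analysis<T> analysis) {
        if (cache.containsKey(analysis)) {
            return (T) cache.get(analysis);
        }
        computations++;
        T result = analysis.compute(this);   // may recursively call get(...) for its dependencies
        cache.put(analysis, result);
        return result;
    }

    int computations() {
        return computations;
    }
}

// Two analyses from the accounting sample, written as hypothetical constants.
final class Accounting {
    static final Analysis<Double> NET_INCOME =
            lab -> lab.param("revenue") - lab.param("expenses");   // simplified net income
    static final Analysis<Double> RETURN_ON_EQUITY =
            lab -> lab.get(NET_INCOME) / lab.param("equity");
}
```

With revenue 1000, expenses 800 and equity 2000, requesting `RETURN_ON_EQUITY` computes `NET_INCOME` (200) along the way and yields 0.1; a later request for `NET_INCOME` is served from the cache. A plain `containsKey`/`put` is used instead of `HashMap.computeIfAbsent` because an analysis may recursively request other analyses, and modifying a `HashMap` from inside the `computeIfAbsent` mapping function is not allowed (Java 9 and later throw `ConcurrentModificationException`). The sketch is also not thread-safe.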

I hope you'll like it, and I'm always eager for feedback!

With best wishes,


Valentin WAESELYNCK
Third-year student at École Polytechnique
valentin.waesely...@polytechnique.edu
+33 6 80 84 99 54




On Friday, 6 December 2013 at 14:21, Benedikt Ritter <brit...@apache.org> wrote:
 
2013/12/5 Christian Grobmeier <grobme...@gmail.com>

> On 5 Dec 2013, at 13:44, Valentin Waeselynck wrote:
>
>> Should I keep replying to the whole ML about this, or only to you?
>>
>
> Keep the mailing list in the loop. There might be others interested in this.
> In addition, the ML documents the history, which is why we always use it.


Thanks for chiming in on this, Christian!

Valentin: Before you invest a lot of work in getting Maven and some tests in
place, let us start with the example code, so that people can decide if
your project fits into Commons.

Benedikt


>
>> Best regards,
>>
>>
>> Valentin WAESELYNCK
>> Third-year student at École Polytechnique
>> valentin.waesely...@polytechnique.edu
>> +33 6 80 84 99 54
>>
>>
>>
>>
>> On Thursday, 5 December 2013 at 08:53, Benedikt Ritter <brit...@apache.org>
>> wrote:
>>
>> Hello Valentin,
>>
>> welcome to the ML. Good to hear that you've decided to join the open source
>> movement.
>>
>> First of all, it would really help if you could elaborate on some use cases
>> for your library. You're talking about building algorithms. What kind of
>> algorithms can be built with Laboratory Toolkit? Can you give some code
>> examples (just create some gists on GitHub that show the use of
>> Laboratory Toolkit)?
>>
>> There is an important requirement for any code to be incorporated into the
>> Apache code base:
>> - the intellectual property (IP) of the code has to be owned completely by
>> the contributor. You said that you built the Laboratory Toolkit for a
>> research project. Are you sure that you own the code? Or is it the result
>> of your work and thus owned by your employer?
>>
>> At Commons we have some additional requirements:
>> - There should be a group of people who are willing to maintain the code
>> - Commons components should in general not depend on any other libraries
>> - Commons uses maven as the main build tool, so there should be a maven
>> build available
>> - The code should have a good test coverage
>>
>> You have to figure out the IP issue on your own first.
>> After that, if the community decides to accept this contribution, we can
>> work on the commons requirements.
>>
>> Best regards and thank you,
>> Benedikt
>>
>>
>>
>> 2013/12/4 Valentin Waeselynck <valentinwaesely...@yahoo.fr>
>>
>>> Hello to all,
>>>
>>> As part of a small research project (which combined techniques of
>>> text mining, machine learning and natural language generation, not that
>>> it's really relevant), I have come to design a small Java SE library,
>>> which I am for the moment calling the Laboratory Toolkit, for developing
>>> our algorithms in a comfortable and flexible manner.
>>>
>>> I have found it to be quite generic and reusable, not tied to any
>>> application domain, while still being rather accessible and small enough
>>> to be understood easily. Therefore, I would like to propose it as a new
>>> Apache Commons component. I would be very grateful if one of you could
>>> tell me what steps I should follow for that purpose.
>>>
>>> I have uploaded it to GitHub:
>>> https://github.com/vvvvalvalval/Laboratory-Toolkit.git. There you will
>>> find the sources, the Javadoc, and a small guide I have started to write
>>> for it (also attached to this mail).
>>>
>>> Of course, I am very open to feedback and criticism on your part. The
>>> last thing I want is to publish an immature or useless component; nor do I
>>> take a positive answer from you for granted.
>>>
>>> If I have failed to follow the proper procedure to propose a new candidate
>>> component, it is not on purpose, and I apologize in advance.
>>>
>>> Whatever your reply, and since I have the chance, I would also like to
>>> congratulate you on all your work. The Apache Commons components have
>>> really been lifesavers to me on many occasions.
>>>
>>> With best wishes,
>>>
>>> Valentin WAESELYNCK
>>> Third-year student at École Polytechnique
>>> valentin.waesely...@polytechnique.edu
>>> +33 6 80 84 99 54
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>
>>>
>>
>>
>> --
>> http://people.apache.org/~britter/
>> http://www.systemoutprintln.de/
>> http://twitter.com/BenediktRitter
>> http://github.com/britter
>>
>
>
> ---
> http://www.grobmeier.de
> @grobmeier
> GPG: 0xA5CC90DB



-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter
