The student application opened, so I'm going to submit my proposal.
Does anyone have suggestions or requests?

Cheers,
Matteo

Il giorno lunedì 3 marzo 2014 15:36:43 UTC+1, Matteo Ceccarello ha scritto:
>
> Hello everybody,
>
> I’m Matteo Ceccarello, a PhD student in Computer Engineering from 
> university of
> Padova, Italy. Recently, I’ve started working with Clojure, a language 
> that I
> find both powerful and fun. I’ve participated to GSoC 2013 and 2012 with
> JPF <http://babelfish.arc.nasa.gov/trac/jpf/wiki>. I’m interested in
> participating to GSoC 2014 with Clojure, in order to gain confidence with
> clojure under the guidance of a mentor while contributing to the 
> community. I’ll
> describe what I’d like to do below.
> Motivation 
>
> Currently, alongside my research on parallel graph algorithms, I’ve started
> writing a web crawler in Clojure. The reasons I want to do this in clojure 
> are
> many:
>
>    - The web crawling software we are using
>    (Heritrix <https://crawler.archive.org>) performs blocking IO. I think 
>    that in
>    big crawls this is a showstopper, since to achieve a good throughput 
>    one must
>    use many Java threads (in the order of 500) that end up spending most 
>    of their
>    time performing blocking waits. 
>    - Clojure has the wonderful core.async library, with all the magic 
>    that go
>    blocks and channels bring. 
>    - The http-kit library makes possible to perform async requests to web
>    servers in a very simple way. 
>    - Clojure has an awesome support for concurrency. 
>
> What I want to achieve eventually is a concurrent asynchronous web crawler,
> without a single blocking call.
>
> Since I’m new to the language, before starting to get some real work done, 
> I’ve
> worked on a few small projects, to get a feel of the language. What I 
> learned is
> that everything you put in a ref or atom must be immutable. A fundamental
> component of a web crawler is a data structure that tells you if you have
> already visited a URL. Bloom filters are a common choice for the 
> implementation
> of this component. Since this bloom filter will be put in a ref, I want 
> it to
> be immutable and, for efficiency reasons, persistent. After some research 
> on the
> web, I found out that an immutable persistent implementation of the bloom 
> filter
> is still missing from the Clojure ecosystem.
>
> Since it seems that Clojure is missing a persistent implementation of many
> probabilistic data structures, I came up with the following idea.
> Persistent probabilistic data structures for Clojure 
>
> Clojure seems to miss persistent implementations of many useful 
> probabilistic
> data structures, in particular:
>
>    - Bloom filters <http://webhdd.ru/library/files/10.1.1.127.9672.pdf> 
>    - Counting Bloom 
> filters<http://webhdd.ru/library/files/10.1.1.127.9672.pdf> 
>    - Compact approximators <http://arxiv.org/pdf/cs.DS/0306046.pdf> 
>    - HyperLogLog 
> counters<http://www.dmtcs.org/dmtcs-ojs/index.php/proceedings/article/viewPDFInterstitial/dmAH0110/2100>
>  
>    - Count-min 
> sketches<http://twiki.di.uniroma1.it/pub/Ing_algo/WebHome/p14_Cormode_JAl_05.pdf>
>  
>
> Of course one can use one of the various implementations of these data 
> structures for
> Java, however, being these implementations mutable, they cannot be used
> in idiomatic concurrent clojure code (as for my understanding of idiomatic
> concurrent clojure code).
>
> What I propose to realize is an optimized persistent implementation of 
> these
> libraries. Hence I plan to explore different paths using
> benchmarks <https://github.com/hugoduncan/criterium> as a guide,
> for instance to decide whether it’s more convenient to use standard 
> clojure vectors
> to represent bit vectors or to provide a custom persistent
> bit vector implementation.
>
> Since I like the feeling of having a static type checker that
> prevents common errors, the library will be annotated with
> Typed Clojure <http://typedclojure.org/>. Moreover I find interesting the 
> idea
> of using tests as machine checkable documentation, and
> midje-doc <http://docs.caudate.me/lein-midje-doc/> seems the right tool 
> for the job.
>
> Is there someone interested in mentoring me with this project?
>
> Yours sincerely
> Matteo
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to