Re: codeq: add support for importing from github, importing tags & branches

Dan Burkert Sun, 19 May 2013 19:55:25 -0700

On Sunday, May 19, 2013 7:08:19 PM UTC-4, Rich Morin wrote:

> On May 18, 2013, at 16:36, Dan Burkert wrote: 
>


>   What mechanisms are you using to manage and run Codeq? 
>
 
Right now I don't do a whole lot with Codeq (beyond working with its 
internals).  I've had the idea (like a lot of people, I think) of pulling 
in the corpus of Clojure projects on GitHub and putting a nice web 
interface in front of it.  Support for analyzing repositories directly 
from GitHub makes this significantly easier, since you then don't have to 
store the repositories anywhere.  As for putting Storm in front of codeq, 
I think it would be possible, but probably overkill.  I don't have any 
direct experience with Storm, but I am aware of its use cases, and I have 
a lot of Hadoop experience.  I just don't think the data set is nearly big 
enough to warrant distributing the work.  It would probably only take a 
few days for a codeq instance to crawl GitHub and import the Clojure 
projects that 99% of potential users would want to see.  Once the initial 
import is done, taking care of updates would be relatively easy.  I would 
ballpark that there aren't more than 1000 commits to important Clojure 
projects on GitHub daily.  I may be way off-base though.
 

>   What support would you want in a production release?
>
 
#1 - A better analyzer.  If codeq could analyze down to the s-expression 
level and determine what function is being called in each expression it 
would open up a world of opportunities.  Many smart people have discussed 
this, and I'm not sure I really have anything to add about the feasibility 
or how it could be done.

#2 - Building on #1, determine not only what function is being called in 
each expression (i.e., the namespace and symbol), but also what git 
repository it came from and what commit in that repository.  Obviously this 
requires, at a minimum, the ability to parse the dependency information 
from the project.clj.  It would probably also require Clojars integration 
to determine the git repository and commit from the version.
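
As a rough illustration of the "parse the dependency information" step, here 
is a minimal Python sketch that scans the text of a project.clj for 
[group/artifact "version"] vectors.  It is a regex scan, not a real Clojure 
reader, so it will also pick up plugin coordinates and will miss anything 
computed at read time.  The function name and the example/lib dependency 
are made up for illustration.

```python
import re
from typing import List, Tuple

# Matches vectors of the form [group/artifact "version"] or [artifact "version"].
# Assumption: coordinates are written literally, which is common but not required.
_COORDINATE = re.compile(
    r'\[\s*([A-Za-z0-9_.\-]+(?:/[A-Za-z0-9_.\-]+)?)\s+"([^"]+)"'
)


def coordinates(project_clj: str) -> List[Tuple[str, str]]:
    """Return (artifact, version) pairs found in the text of a project.clj."""
    return _COORDINATE.findall(project_clj)


# Hypothetical project file; example/lib is not a real library.
SAMPLE = '''
(defproject example "0.1.0"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [example/lib "0.2.0"]])
'''

for artifact, version in coordinates(SAMPLE):
    print(artifact, version)
```

Mapping each (artifact, version) pair back to a git repository and commit is 
the part that would still need something like Clojars integration.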


>   What other import facilities would you like to have? 
>

When codeq was first released there were thoughts of replacing the 
shelling-out behavior with a faster way to read git repository data.  That 
should be significantly easier now with the repository protocol.  I'm not 
going to tackle it myself because I don't need local imports to be faster, 
but it's an open problem.

I think there is a lot of room to pull in more metadata about repositories, 
and that could be very useful for certain use cases.  For example, this 
commit in my codeq fork
<https://github.com/danburkert/codeq/commit/3ec970b0e446fe9abc4e57da6fcdd546bc7c0f86>
adds a parent attribute to repository entities, which is a pointer to where 
the repository was forked from.  This is easy to get through the GitHub 
API; for local repos it uses the "upstream" remote, as that seems to be 
somewhat standard (at least as standard as treating the "origin" remote as 
the URI for the repo).  Obviously not all repositories are forks, so not 
all repositories will have parents.  If a repo does have a parent which is 
not already imported, the import fails (this could be changed though).
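
To make the parent lookup concrete, here is a small Python sketch (not the 
code from the commit).  It assumes you already have the parsed JSON for a 
repository from the GitHub API, where forks carry a "fork" flag and a 
"parent" object with a "full_name" field, or a map of remote names to URLs 
for a local repo.  The function names and the "github.com/owner/name" 
return format are my own choices for illustration.

```python
from typing import Dict, Optional


def parent_from_api(repo: dict) -> Optional[str]:
    """Return 'github.com/<owner>/<name>' for a fork's parent, else None.

    `repo` is the parsed JSON body describing one repository.
    """
    if not repo.get("fork"):
        return None  # not a fork, so there is no parent
    parent = repo.get("parent")
    if not parent or "full_name" not in parent:
        return None  # fork flag set but no parent details in this payload
    return "github.com/" + parent["full_name"]


def parent_from_remotes(remotes: Dict[str, str]) -> Optional[str]:
    """For a local clone, treat the 'upstream' remote (if any) as the parent."""
    return remotes.get("upstream")


# Hypothetical payload trimmed to the fields used above.
fork = {"full_name": "danburkert/codeq", "fork": True,
        "parent": {"full_name": "Datomic/codeq"}}
print(parent_from_api(fork))
```

A caller that wants the import to succeed regardless could import the parent 
first whenever one of these returns a value that is not yet in the database.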

Finally, I forgot to mention in my original email that you should not 
import projects into an already existing codeq database with the patches, 
because I changed the format of repository URIs.  Instead of using the raw 
address of the "origin" remote, e.g. 
"https://github.com/Datomic/codeq.git", I use a transformed version, 
"github.com/Datomic/codeq".  I feel this is a better way of doing it, 
because the following are all valid git URIs for the same repository, and 
mixing them could result in importing a project multiple times:

https://github.com/Datomic/codeq.git
https://github.com/Datomic/codeq
g...@github.com:Datomic/codeq.git
g...@github.com:Datomic/codeq
git://github.com/Datomic/codeq.git
git://github.com/Datomic/codeq

With my patch all of these URIs are transformed to 
"github.com/Datomic/codeq", so the same repo cannot be imported more than 
once under different URIs.
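
The transformation can be sketched in a few lines of Python.  This is an 
illustration of the idea, not the code in the patch: the function name is 
made up, and it assumes the SSH variants use the usual scp-style 
user@host:path form.

```python
import re

# scheme://[user@]host[:port]/path
_URL_STYLE = re.compile(
    r"^[A-Za-z][A-Za-z0-9+.\-]*://(?:[^@/]+@)?([^/:]+)(?::\d+)?/(.+)$"
)
# [user@]host:path  (scp-style, as used for SSH remotes)
_SCP_STYLE = re.compile(r"^(?:[^@/]+@)?([^:/]+):(?!//)(.+)$")


def normalize_repo_uri(uri: str) -> str:
    """Reduce a git remote URI to 'host/owner/name'."""
    uri = uri.strip()
    match = _URL_STYLE.match(uri) or _SCP_STYLE.match(uri)
    if match is None:
        raise ValueError("unrecognized git URI: %r" % uri)
    host, path = match.group(1).lower(), match.group(2).rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]  # drop the optional .git suffix
    return host + "/" + path


VARIANTS = [
    "https://github.com/Datomic/codeq.git",
    "https://github.com/Datomic/codeq",
    "git@github.com:Datomic/codeq.git",
    "git@github.com:Datomic/codeq",
    "git://github.com/Datomic/codeq.git",
    "git://github.com/Datomic/codeq",
]

# Every variant should collapse to the same key.
print({normalize_repo_uri(u) for u in VARIANTS})
```

Using the normalized string as the repository's identity is what keeps the 
different spellings from creating separate entities.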

-- Dan

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en