On Sunday, May 19, 2013 7:08:19 PM UTC-4, Rich Morin wrote:
> On May 18, 2013, at 16:36, Dan Burkert wrote:
>
> What mechanisms are you using to manage and run Codeq?

Right now I don't do a whole lot with Codeq (beyond working with its internals). I've had the idea (like a lot of people, I think) of pulling in the corpus of Clojure projects on GitHub and putting a nice web interface in front of it. Having support for analyzing repositories directly from GitHub makes this significantly easier, since you then don't have to actually store the repositories anywhere.

As far as putting Storm in front of codeq, I think it would be possible, but probably overkill. I don't have any direct experience with Storm, but I am aware of its use cases, and I have a lot of Hadoop experience. I just don't think the data set is nearly big enough to warrant the distribution. It would probably only take a few days for a codeq instance to crawl GitHub and import the Clojure projects that 99% of potential users would want to see. Once the initial import is done, taking care of updates would be relatively easy. I would ballpark that there aren't more than 1000 commits to important Clojure projects on GitHub daily. I may be way off-base though.

> What support would you want in a production release?

#1 - A better analyzer. If codeq could analyze down to the s-expression level and determine what function is being called in each expression, it would open up a world of opportunities. Many smart people have discussed this, and I'm not sure I really have anything to add about the feasibility or how it could be done.

#2 - Building on #1, determine not only what function is being called in each expression (i.e., the namespace and symbol), but also what git repository it came from and what commit in that repository. Obviously this requires, at a minimum, the ability to parse the dependency information from the project.clj. It would probably also require Clojars integration to determine the git repository and commit from the version.

> What other import facilities would you like to have?
When codeq was first released there were thoughts of replacing the shelling-out behavior with a faster way to read git repository data. That should be significantly easier now with the repository protocol. I'm not going to tackle it myself because I don't need local imports to be faster, but it's an open problem.

I think there is a lot of room to pull in more metadata about repositories, and that could be very useful for certain use cases. For example, this commit <https://github.com/danburkert/codeq/commit/3ec970b0e446fe9abc4e57da6fcdd546bc7c0f86> in my codeq fork adds a parent attribute to repository entities, which is a pointer to where the repository was forked from. This is easy to get through the GitHub API; for local repos it uses the "upstream" remote, as that seems to be somewhat standard (at least as standard as treating the "origin" remote as the URI for the repo). Obviously not all repositories are forks, so not all repositories will have parents. If a repo does have a parent which is not already imported, the import fails (this could be changed though).

Finally, I forgot to mention in my original email that you should not use the patches to import projects into an already existing codeq database, because I changed the format of repository URIs. Instead of using the raw address of the "origin" remote, e.g., "https://github.com/Datomic/codeq.git", I use a transformed version, "github.com/Datomic/codeq". I feel this is a better way of doing it, because the following are all valid git URIs, and mixing them could result in importing the same project multiple times:

https://github.com/Datomic/codeq.git
https://github.com/Datomic/codeq
g...@github.com:Datomic/codeq.git
g...@github.com:Datomic/codeq
git://github.com/Datomic/codeq.git
git://github.com/Datomic/codeq

With my patch all of these URIs are transformed to "github.com/Datomic/codeq", so it is impossible to import a repo more than once.
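To make the URI transformation concrete, here is a minimal sketch in Python. It is not the code from the patch, just one way the same idea could be written. The function name normalize_git_uri and both regular expressions are hypothetical, and the SSH form is assumed to be the usual git@github.com:owner/repo shape.

```python
import re

# Hypothetical helper, not taken from the codeq patch.
# Matches a leading scheme such as "https://" or "git://".
_SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*://")
# Matches scp-like remotes such as "git@github.com:Datomic/codeq.git".
_SCP_LIKE = re.compile(r"^(?:[^@/]+@)?([^:/]+):(?!/)(.+)$")


def normalize_git_uri(uri: str) -> str:
    """Reduce equivalent remote URIs to a single "host/owner/repo" key."""
    uri = uri.strip()
    scheme = _SCHEME.match(uri)
    if scheme:
        # Drop the scheme, then any "user@" prefix on the host.
        host, _, path = uri[scheme.end():].partition("/")
        host = host.rsplit("@", 1)[-1]
        key = f"{host}/{path}"
    else:
        scp = _SCP_LIKE.match(uri)
        # For scp-like remotes, turn "host:path" into "host/path".
        key = f"{scp.group(1)}/{scp.group(2)}" if scp else uri
    key = key.rstrip("/")
    if key.endswith(".git"):
        key = key[: -len(".git")]
    return key


variants = [
    "https://github.com/Datomic/codeq.git",
    "https://github.com/Datomic/codeq",
    "git@github.com:Datomic/codeq.git",  # assumed SSH form
    "git@github.com:Datomic/codeq",
    "git://github.com/Datomic/codeq.git",
    "git://github.com/Datomic/codeq",
]
# All six spellings collapse to one key, so a lookup by key finds the
# already-imported repository regardless of which remote form was used.
assert {normalize_git_uri(v) for v in variants} == {"github.com/Datomic/codeq"}
```

Keying repositories on the normalized string rather than on the raw remote is what rules out duplicate imports. The sketch leaves the host's case untouched and does not handle ports or local paths.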
-- Dan

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en