[ 
https://issues.apache.org/jira/browse/SOLR-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073843#comment-15073843
 ] 

Dennis Gove commented on SOLR-8176:
-----------------------------------

I've been thinking about this a little bit and one thing I keep coming back to 
is that there are different kinds of graph traversals and I think our model 
should take that into account. There are lots of types but I think the two 
major categories are node traversing graphs and edge traversing graphs. 

h3. Node Traversing Graphs
These are graphs where you have some set of root nodes and you want to find 
connected nodes with some set of criteria. For example, given a collection of 
geographic locations (city, county, state, country) with fields "id", "type", 
"parentId", "name" find all cities in NY. As a hiccup the data is not 
completely normalized and some cities have their county listed as their parent 
while some have their state listed as their parent. Ie, you do not know how 
many nodes are between any given city and any given state.
{code}
graph(
  geography,
  root(q="type=state AND name:ny", fl="id"),
  leaf(q="type=city", fl="id,parentId,name"),
  edge("id=parentId")
)
{code}
In this example you're starting with a set of nodes in the geography 
collection, all which have some relationship to each other. You select your 
starting (root) nodes as all states named "ny" (there could be more than one). 
You then define what constitutes an ending (leaf) node as all cities. And 
finally, you say that all edges where nodeA.id == nodeB.parentId should be 
followed.

This traversal can be implemented as a relatively simple iterative search 
following the form
{code}
frontier := search for all root nodes
leaves := empty list

while frontier is not empty
  frontierIds := list of ids of all nodes in frontier list
  leaves :append: search for all nodes whose parentId is in frontierIds and 
matches the leaf filter
  frontier := search for all nodes whose parentId is in frontierIds and does 
not match the leaf filter

{code}
In each iteration the leaves list can grow and the frontier list is replaced 
with the next set of nodes to consider. In the end you have a list of all leaf 
nodes which in some way connect to the original root nodes following the 
defined edge. Note that for simplicity I've left a couple of things out, 
including checking for already traversed nodes to avoid loops. Also, the leaf 
nodes are not added to the frontier but they can be. This would be useful in a 
situation where leaves are connected to leaves.

> Model distributed graph traversals with Streaming Expressions
> -------------------------------------------------------------
>
>                 Key: SOLR-8176
>                 URL: https://issues.apache.org/jira/browse/SOLR-8176
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, SolrCloud, SolrJ
>    Affects Versions: Trunk
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>              Labels: Graph
>             Fix For: Trunk
>
>
> I think it would be useful to model a few *distributed graph traversal* use 
> cases with Solr's *Streaming Expression* language. This ticket will explore 
> different approaches with a goal of implementing two or three common graph 
> traversal use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to