Re: Spark or Tachyon: capture data lineage

2015-01-02 Thread Sven Krasser
Agreed with Jerry. Aside from Tachyon, seeing this for general debugging would be very helpful. Haoyuan, is the feature you are referring to related to https://issues.apache.org/jira/browse/SPARK-975? In the interim, I've found the "toDebugString()" method useful (but it renders execution as a t

Re: Spark or Tachyon: capture data lineage

2015-01-02 Thread Haoyuan Li
Jerry, Great question. Spark and Tachyon capture lineage information at different granularities. We are working on an integration between Spark and Tachyon for this and hope to have it ready for release soon. Best, Haoyuan On Fri, Jan 2, 2015 at 12:24 PM, Jerry Lam wrote: > Hi Spark developers,

Spark or Tachyon: capture data lineage

2015-01-02 Thread Jerry Lam
Hi Spark developers, I was thinking it would be nice to extract the data lineage information from a data processing pipeline. I assume that Spark/Tachyon keeps this information somewhere. For instance, a data processing pipeline uses data sources A and B to produce C. C is then used by another proce
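
To make the scenario in the original question concrete (sources A and B produce C, and C feeds a later step), here is a minimal sketch of lineage modelled as a directed graph in plain Python. It is only an illustration of the idea, not how Spark or Tachyon actually store lineage; the `LineageGraph` class, its methods, and the downstream dataset name "D" are hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical helper: records which inputs each dataset was derived from."""

    def __init__(self):
        # Maps a dataset name to the set of datasets it was computed from.
        self._parents = defaultdict(set)

    def record(self, output, inputs):
        """Note that `output` was produced from the given `inputs`."""
        self._parents[output].update(inputs)

    def upstream(self, dataset):
        """Return every dataset that `dataset` depends on, directly or transitively."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = LineageGraph()
graph.record("C", ["A", "B"])  # A and B are used to produce C
graph.record("D", ["C"])       # "D" is an assumed name for the later step that consumes C

print(sorted(graph.upstream("D")))  # → ['A', 'B', 'C']
```

Walking the graph upward from any dataset answers "which sources did this come from?", which is the kind of question lineage capture is meant to answer.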