jsai28 opened a new issue, #15483:
URL: https://github.com/apache/datafusion/issues/15483

   Would there be any interest in building a data quality framework like [Great 
Expectations](https://github.com/great-expectations/great_expectations)
 or [Deequ](https://github.com/awslabs/deequ) (built on Spark), but in Rust 
using DataFusion? As far as I am aware, nothing like this exists in Rust, let 
alone built on DataFusion.
   
   The idea is a Rust-based tool for specifying unit-test-like checks on 
your data. Users would specify tests (called expectations in Great 
Expectations), and DataFusion would be used for the underlying metric 
computation. Something like this:
   
   ```
   fn main() {
       let validator = create_validator("example.csv");

       validator.is_not_null("id"); // specify column for null check
       validator.min_value("price", 0); // specify column and minimum value
       validator.validate();
   }
   ```
   
   Which could return an output like this:
   
   ```
   ✅ id: Passed (All values not null)
   ❌ price: Failed (2 values below 0)
   ```
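
One way the metric computation could map onto DataFusion is to compile each expectation into a SQL query that counts violating rows, which a SQL engine such as DataFusion could then execute. The following is only a minimal sketch under that assumption: `Expectation`, `violation_sql`, `report`, and the table name `data` are hypothetical names chosen for illustration, not existing DataFusion APIs, and the violation counts in `main` are hard-coded rather than computed.

```rust
/// Hypothetical expectation types (illustrative only, not part of DataFusion).
#[derive(Debug, Clone, PartialEq)]
enum Expectation {
    NotNull { column: String },
    MinValue { column: String, min: i64 },
}

impl Expectation {
    /// Build a SQL query that counts the rows violating this expectation.
    fn violation_sql(&self, table: &str) -> String {
        match self {
            Expectation::NotNull { column } => format!(
                "SELECT COUNT(*) FROM {} WHERE \"{}\" IS NULL",
                table, column
            ),
            Expectation::MinValue { column, min } => format!(
                "SELECT COUNT(*) FROM {} WHERE \"{}\" < {}",
                table, column, min
            ),
        }
    }

    /// Turn a violation count into a one-line report entry.
    fn report(&self, violations: u64) -> String {
        match self {
            Expectation::NotNull { column } => {
                if violations == 0 {
                    format!("✅ {}: Passed (All values not null)", column)
                } else {
                    format!("❌ {}: Failed ({} null values)", column, violations)
                }
            }
            Expectation::MinValue { column, min } => {
                if violations == 0 {
                    format!("✅ {}: Passed (All values at least {})", column, min)
                } else {
                    format!("❌ {}: Failed ({} values below {})", column, violations, min)
                }
            }
        }
    }
}

fn main() {
    let checks = vec![
        Expectation::NotNull { column: "id".to_string() },
        Expectation::MinValue { column: "price".to_string(), min: 0 },
    ];
    // Placeholder counts; in a real tool these would come from running
    // each generated query against the registered table.
    let assumed_counts = [0u64, 2u64];

    for (check, count) in checks.iter().zip(assumed_counts.iter()) {
        println!("{}", check.violation_sql("data"));
        println!("{}", check.report(*count));
    }
}
```

Keeping query generation separate from reporting means the execution step in between can be swapped out, for example for a DataFusion session that registers the CSV file as a table and runs each query.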
   
   It would be a fairly niche tool that could be part of a larger data 
pipeline. I was thinking it could be a good project to work on for GSoC. What 
do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

