Re: [Spark-Core] Spark Dry Run

2021-10-04 Thread Ali Behjati
Hey Ramiro, Thank you for your detailed answer. We also have a similar framework that does the same, and I have seen very good results with it. However, pipelines built as normal Spark apps need changes to adapt to a framework, which takes a lot of effort. This is why I'm suggesting adding it to Spark core to

Re: [Spark-Core] Spark Dry Run

2021-10-04 Thread Ramiro Laso
Hello Ali! I've implemented a dry run in my data pipeline using a schema repository. My pipeline takes a "dataset descriptor", which is a JSON document describing the dataset you want to build, loads some "entities", applies some transformations, and then writes the final dataset. Is in the "dataset descrip
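A minimal sketch of the schema-repository idea described above, in plain Python with no Spark dependency. It is not Ramiro's implementation. The descriptor layout, the SCHEMA_REPOSITORY contents, the two supported operations, and every name below are assumptions made for illustration. The point is only that a descriptor can be checked against known schemas without loading or writing any data.

```python
from typing import Dict

Schema = Dict[str, str]  # column name -> type name


class DryRunError(Exception):
    """Raised when a descriptor cannot be resolved against the known schemas."""


# Hypothetical schema repository: entity name -> schema (assumed contents).
SCHEMA_REPOSITORY: Dict[str, Schema] = {
    "orders": {"order_id": "long", "customer_id": "long", "amount": "double"},
    "customers": {"customer_id": "long", "country": "string"},
}


def dry_run(descriptor: dict) -> Schema:
    """Walk the descriptor's steps using schemas only and return the output schema.

    No data is read or written; only column names and types are checked.
    """
    source = descriptor["source"]
    if source not in SCHEMA_REPOSITORY:
        raise DryRunError(f"unknown entity: {source}")
    schema = dict(SCHEMA_REPOSITORY[source])

    for i, step in enumerate(descriptor["steps"]):
        op = step["op"]
        if op == "select":
            missing = [c for c in step["columns"] if c not in schema]
            if missing:
                raise DryRunError(f"step {i}: unknown columns {missing}")
            schema = {c: schema[c] for c in step["columns"]}
        elif op == "join":
            other = SCHEMA_REPOSITORY.get(step["entity"])
            if other is None:
                raise DryRunError(f"step {i}: unknown entity {step['entity']}")
            key = step["on"]
            if key not in schema or key not in other:
                raise DryRunError(f"step {i}: join key {key!r} missing on one side")
            if schema[key] != other[key]:
                raise DryRunError(f"step {i}: join key {key!r} has mismatched types")
            # Keep existing columns, then add the new ones from the joined entity.
            schema = {**schema, **{c: t for c, t in other.items() if c not in schema}}
        else:
            raise DryRunError(f"step {i}: unsupported op {op!r}")
    return schema


# Example descriptor (assumed layout): join two entities, then project columns.
descriptor = {
    "source": "orders",
    "steps": [
        {"op": "join", "entity": "customers", "on": "customer_id"},
        {"op": "select", "columns": ["order_id", "country"]},
    ],
}
result_schema = dry_run(descriptor)
```

A misspelled column name in a "select" step would raise DryRunError at validation time, before any job is submitted.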

Re: [Spark-Core] Spark Dry Run

2021-09-30 Thread Mich Talebzadeh
Ok thanks. What is your experience of VS Code (in terms of capabilities), as it is becoming a standard tool available in cloud workspaces like Amazon workspace? Mich

Re: [Spark-Core] Spark Dry Run

2021-09-30 Thread Ali Behjati
Nothing specific in mind. Any IDE that is open to plugins (e.g. VS Code and JetBrains) can use it to validate execution plans in the background and mark syntax errors based on the result. On Thu, Sep 30, 2021 at 4:40 PM Mich Talebzadeh wrote: > What IDEs do you have in mind?

Re: [Spark-Core] Spark Dry Run

2021-09-30 Thread Mich Talebzadeh
What IDEs do you have in mind?

Re: [Spark-Core] Spark Dry Run

2021-09-30 Thread Ali Behjati
Yeah, it doesn't remove the need for testing on sample data. It would be more of a syntax check than a test. I have seen syntax errors occur a lot. Maybe once we have dry-run, we will be able to build some automation around basic syntax checking for IDEs too. On Thu, Sep 30, 2021 at 4:

Re: [Spark-Core] Spark Dry Run

2021-09-30 Thread Sean Owen
If testing, wouldn't you actually want to execute things, even if at a small scale, on a sample of data? On Thu, Sep 30, 2021 at 9:07 AM Ali Behjati wrote: > Hey everyone, > > > By dry run I mean ability to validate the execution plan but not executing > it within the code. I was wondering wheth