Hi Mich! I think you can combine the good/rejected handling into a single method that internally:
- Creates the good and rejected DFs, given an input DF and the rules/predicates to apply to it.
- Creates a third DF containing the good rows plus the rejected rows with the bad columns nulled out.
- Appends/inserts the two DFs into their respective Hive good/exception tables.
- Returns a tuple, either (goodDf, exceptionsDf, combinedDf) or maybe just (combinedDf, exceptionsDf).

On Sat, 2 May 2020 at 06:00, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> I have a Spark Scala program created and compiled with Maven. It works
> fine. It basically does the following:
>
> 1. Reads an XML file from an HDFS location
> 2. Creates a DF on top of what it reads
> 3. Creates a new DF with some columns renamed etc.
> 4. Creates a new DF for rejected rows (incorrect value for a column)
> 5. Puts rejected data into the Hive exception table
> 6. Puts valid rows into the Hive main table
> 7. Nullifies the invalid rows by setting the invalid column to NULL
>    and puts the rows into the main Hive table
>
> These are currently performed in one method. Ideally I want to break this
> down as follows:
>
> 1. A method to read the XML file, create a DF, and create a new DF on
>    top of the previous DF
> 2. A method to create a DF on top of rejected rows using t
> 3. A method to put invalid rows into the exception table using a tmp
>    table
> 4. A method to put the correct rows into the main table, again using a
>    tmp table
>
> I was wondering if this is the correct approach?
>
> Thanks,
>
> Dr Mich Talebzadeh
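
In case it helps, here is a minimal sketch of the combined method described at the top of this reply. It is only an illustration under some assumptions: the names RowValidation, SplitResult, split and load are made up, each rule is assumed to be a pair of (column name, predicate that is true when the value is valid), a predicate that evaluates to NULL is treated as a failure, and both Hive tables are assumed to exist already with the same column order as the DF. It also assumes the main table receives the combined DF (good rows plus the nulled-out rejected rows), which is how I read steps 6 and 7 of your mail.

```scala
import org.apache.spark.sql.{Column, DataFrame, SaveMode}
import org.apache.spark.sql.functions.{coalesce, col, lit, when}

object RowValidation {
  // Hypothetical rule shape: column name -> predicate that is true for a valid value.
  type Rules = Seq[(String, Column)]

  final case class SplitResult(good: DataFrame, exceptions: DataFrame, combined: DataFrame)

  // Pure transformation: no I/O, so it can be unit tested with a local SparkSession.
  def split(df: DataFrame, rules: Rules): SplitResult = {
    require(rules.nonEmpty, "at least one rule is required")

    // Merge multiple rules on the same column and treat a NULL predicate result as invalid.
    val perColumn: Map[String, Column] = rules
      .groupBy(_._1)
      .map { case (name, rs) => name -> coalesce(rs.map(_._2).reduce(_ && _), lit(false)) }

    val allValid = perColumn.values.reduce(_ && _)

    val good = df.filter(allValid)        // rows passing every rule
    val exceptions = df.filter(!allValid) // rows failing at least one rule

    // Every row is kept; a column with a failing rule becomes NULL (when without otherwise).
    // A single select evaluates all predicates against the original values.
    val combined = df.select(df.columns.map { c =>
      perColumn.get(c).map(ok => when(ok, col(c)).as(c)).getOrElse(col(c))
    }: _*)

    SplitResult(good, exceptions, combined)
  }

  // Side-effecting step: append to existing Hive tables (insertInto matches columns by position).
  def load(result: SplitResult, mainTable: String, exceptionTable: String): Unit = {
    result.exceptions.write.mode(SaveMode.Append).insertInto(exceptionTable)
    result.combined.write.mode(SaveMode.Append).insertInto(mainTable)
  }
}
```

Usage would look something like RowValidation.split(renamedDf, Seq("price" -> (col("price") > 0))) followed by RowValidation.load(result, "main_table", "exception_table"), where renamedDf, price and the table names are placeholders. Keeping split free of writes means the rules can be checked in a test without touching Hive, and the XML-reading step can stay in its own method as you planned.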