On a second note, with regard to Spark and reads/writes: as I understand it,
unit tests are not meant to test database connections. That should be done
in integration tests, to check that all the parts work together. Unit tests
are just meant to test the functional logic, not Spark's ability to read
from a database.
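
For illustration, a minimal sketch of the kind of test I mean, using a local
SparkSession and in-memory test data instead of any real database (the object,
column and value names here are made up purely for the example):

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.upper

    object TransformLogic {
      // The pure functional logic under test: no HDFS, no Hive, no JDBC.
      def normalise(df: DataFrame): DataFrame =
        df.withColumn("name", upper(df("name")))
    }

    // Inside a test (e.g. ScalaTest), build the input entirely in memory.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("unit-test")
      .getOrCreate()
    import spark.implicits._

    val input  = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
    val result = TransformLogic.normalise(input)

    // Assert on the logic itself, not on Spark's ability to reach a database.
    assert(result.collect().map(_.getString(1)).toSet == Set("ALICE", "BOB"))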

I would have thought that if testing the specific connectivity through a
third-party tool is required (in my case, reading an XML file using the
Databricks spark-xml jar), then this should be done through the
Read-Evaluate-Print-Loop (REPL) environment of Spark shell, by writing some
code to quickly establish whether the API successfully reads from the XML
file.
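
Something along these lines in spark-shell, for instance (the package version,
file path and rowTag element name below are placeholders; rowTag is the
spark-xml option naming the XML element treated as one row):

    $ spark-shell --packages com.databricks:spark-xml_2.11:0.9.0

    scala> val df = spark.read.
             format("com.databricks.spark.xml").
             option("rowTag", "record").        // placeholder element name
             load("hdfs:///tmp/test.xml")       // placeholder path

    scala> df.printSchema()                     // confirm the schema was inferred
    scala> df.show(5, truncate = false)         // eyeball a few parsed rows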

Does this assertion sound correct?

Thanks,

Mich



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw





*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 20 May 2020 at 11:58, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Hi,
>
> I have a Spark job that reads an XML file from HDFS, processes it and ports
> the data to Hive tables, one good table and one exception table.
>
> The code itself works fine. I need to create a unit test with Mockito
> <https://www.vogella.com/tutorials/Mockito/article.html> for it. A unit
> test should test functionality in isolation. Side effects from other
> classes or the system should be eliminated for a unit test, if possible. So
> basically there are three classes:
>
>
>    1. Class A reads the XML file and creates DF1 from it, plus DF2 on top
>    of DF1. Test data for the XML file is already created.
>    2. Class B reads DF2 and posts the correct data, through a TempView and
>    Spark SQL, to the underlying Hive table (a simplified sketch of this
>    write path follows the list).
>    3. Class C reads DF2 and posts the exception data, again through a
>    TempView and Spark SQL, to the underlying Hive exception table.
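>
> For concreteness, the write path in Classes B and C looks roughly like this
> (the table, view and column names below are simplified stand-ins for the
> real ones):
>
>    // Class B: valid rows go to the good table; Class C is the same shape
>    // but with the inverse filter and the exception table.
>    df2.filter("valid = true").createOrReplaceTempView("tmp_good")
>    spark.sql("INSERT INTO TABLE default.good_table SELECT * FROM tmp_good")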
>
> I would like to know, for cases covering tests for Class B and Class C,
> what Mockito pattern needs to be used.
>
> Thanks,
>
> Mich
>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
