Re: [QUESTION] Legal dependency with Oracle JDBC driver

2024-01-30 Thread Mich Talebzadeh
Hi Alex,
Well, that is just Justin's opinion vis-à-vis this matter; it differs from
mine. Bottom line: you can always refer the question to Oracle or a copyright
expert and see what they suggest.

HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 29 Jan 2024 at 22:05, Alex Porcelli  wrote:

> Hi Mich,
>
> Thank you for the prompt response.
>
> Looks like Justin Mclean has a slightly different perspective on
> Oracle's license, as you can see in [3].
>
>
> On Mon, Jan 29, 2024 at 4:17 PM Mich Talebzadeh 
> wrote:
>
>> Hi,
>>
>> This is not an official response and should not be taken as an
>> official view. It is my own opinion.
>>
>> Looking at reference [1], I can see a host of inclusions of other JDBC
>> vendors' drivers, such as IBM DB2 and MS SQL Server.
>>
>> With regard to link [2], it has been closed for 3+ years, and it is
>> assumed that these references are taken as a "convenience". There is no
>> implication that JDBC drivers are included in these releases, modified or
>> unmodified.
>> Oracle provides multiple JDBC drivers, such as ojdbc5.jar, ojdbc6.jar,
>> ojdbc7.jar, ojdbc11.jar and so forth, free to download and use within the
>> license (you need a valid Oracle login).
>>
>> This is what it says with regard to the license:
>>
>> "Governed by the No-clickthrough FDHUT license"
>>
>> I glanced through the license and did not find anything that would
>> contravene the Spark references in [1], i.e. spark/sql/core/pom.xml.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>> On Mon, 29 Jan 2024 at 16:16, Alex Porcelli  wrote:
>>
>>> Hi Spark Devs,
>>>
>>> I'm reaching out to understand how you managed to include the Oracle
>>> JDBC driver as one of your dependencies [1]. According to legal tickets
>>> [2][3], this is considered a Category X dependency and is not allowed.
>>>
>>> (I'm part of the Apache KIE podling; we are struggling with the same
>>> dependency, and it was pointed out that you may have a solution to
>>> share.)
>>>
>>> [1] - https://github.com/apache/spark/blob/master/sql/core/pom.xml#L187
>>> [2] - https://issues.apache.org/jira/browse/LEGAL-526
>>> [3] - https://issues.apache.org/jira/browse/LEGAL-663
>>>
>>> Regards,
>>>
>>> Alex
>>> Apache KIE
>>>


Spark 3.5.1

2024-01-30 Thread Santosh Pingale
Hey there

The Spark 3.5 branch has accumulated 199 commits, including quite a few
bug fixes related to correctness. Are there any plans to release 3.5.1?

Kind regards
Santosh


Extracting Input and Output Partitions in Spark

2024-01-30 Thread Aditya Sohoni
Hello Spark Devs!


We are from Uber's Spark team.


Our ETL jobs use Spark to read from and write to Hive datasets stored in HDFS.
The freshness of the partition being written depends on the freshness of the
data in the input partition(s). We monitor this freshness score so that
partitions in our critical tables always have fresh data.


We are looking for a helper function or utility built into the Spark engine
through which we can programmatically get the list of partitions read and
written by an execution.


We looked for this in the plan, and our initial study of the code did not
point us to any such method. We have been depending on indirect sources such
as storage audit logs, HMS audit logs, etc., which we find difficult to use
and scale.


However, the Spark code does contain the list of partitions read and
written. The files below hold the partition data for the given file format:

1. Input partitions: HiveTableScanExec.scala (Text format)

2. Input partitions: DataSourceScanExec.scala (Parquet/Hudi/ORC)

3. Output partitions: InsertIntoHiveTable.scala (Text format)

4. Output partitions: InsertIntoHadoopFsRelationCommand.scala (Parquet/Hudi/ORC)
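

For illustration, a minimal sketch of pulling the pruned input partitions out
of a plan might look like the following. It assumes Spark's internal
FileSourceScanExec API (selectedPartitions), which is not a stable interface
and may be private or differ across versions; the table and column names are
placeholders:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.FileSourceScanExec

val spark = SparkSession.builder().getOrCreate()

// Placeholder query over a partitioned table.
val df = spark.table("db.events").where("ds = '2024-01-30'")

// Walk the physical plan and collect the pruned partition directories
// from every file-based scan node.
val inputPartitions: Seq[String] = df.queryExecution.executedPlan.collect {
  case scan: FileSourceScanExec =>
    val partCols = scan.relation.partitionSchema.fieldNames.mkString(", ")
    // PartitionDirectory.values is an InternalRow of partition values.
    scan.selectedPartitions.map(pd => s"($partCols) = ${pd.values}").toSeq
}.flatten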


We did come up with some code that gathers this info in a programmatically
friendly way: we maintain this information in the plan and wrap the plan with
convenience classes and methods that extract the partition details.
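

This is not our actual implementation, but as a hypothetical sketch of one
way such an interface could be hooked in per query, a standard
QueryExecutionListener could inspect the same internal nodes listed above
(class name and the println sink are made up for illustration):

import org.apache.spark.sql.execution.{FileSourceScanExec, QueryExecution}
import org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
import org.apache.spark.sql.util.QueryExecutionListener

// Hypothetical listener: records partitions read and (statically) written
// for every successful query. Dynamic partition values are only known
// after the write completes and are not captured here.
class PartitionAuditListener extends QueryExecutionListener {

  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    val inputs = qe.executedPlan.collect {
      case scan: FileSourceScanExec =>
        scan.selectedPartitions.map(_.values.toString).toSeq
    }.flatten

    // Write commands appear in the logical plan; staticPartitions holds
    // the statically specified partition spec of the insert.
    val outputs = qe.optimizedPlan.collect {
      case cmd: InsertIntoHadoopFsRelationCommand => cmd.staticPartitions
    }

    // Replace println with a real metrics/lineage sink.
    println(s"query=$funcName readPartitions=$inputs wrotePartitions=$outputs")
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
}

// Registration (spark is the active SparkSession):
// spark.listenerManager.register(new PartitionAuditListener)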


We felt that such a programmatic interface could serve other purposes as
well, such as surfacing in the Spark History Server (SHS) a new set of
statistics that can aid troubleshooting.


I wanted to ask the dev community: is there already something implemented in
Spark that can solve this requirement? If not, we would love to share how we
have implemented it and contribute it to the community.


Regards,

Aditya Sohoni