Difference in behavior for Spark 3.0 vs Spark 3.1 "create database "

2022-01-10 Thread Pralabh Kumar
Hi Spark Team

When creating a database with Spark on Hive:

1) On Spark 3.0, spark.sql("create database test location '/user/hive'")
creates the database location on HDFS, as expected.

2) On Spark 3.1, the same command creates the database on the local file
system by default. I have to prefix the location with hdfs to create the
database on HDFS.

Why does the behavior differ? Can you please point me to the JIRA that
caused this change?

Note: spark.sql.warehouse.dir and hive.metastore.warehouse.dir both have
their default values (neither is explicitly set).
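To illustrate why the hdfs prefix makes a difference, here is a minimal conceptual sketch in Python (not Spark's actual code path): a location without a URI scheme is resolved against some default filesystem, while a location that already names a scheme is used as-is. The helper qualify_location and the namenode address are hypothetical; which default filesystem Spark 3.1 applies in this case is exactly what this thread is asking about.

```python
from urllib.parse import urlparse


def qualify_location(location: str, default_fs: str) -> str:
    """Hypothetical helper: attach a default filesystem to a scheme-less path.

    A location that already carries a scheme (hdfs://, file:, s3a://, ...)
    is returned unchanged; otherwise default_fs is prepended.
    """
    if urlparse(location).scheme:
        # Fully qualified URI: nothing is left for the default to decide.
        return location
    # Scheme-less path: the result depends entirely on default_fs.
    return default_fs.rstrip("/") + "/" + location.lstrip("/")


print(qualify_location("/user/hive", "hdfs://namenode:8020"))  # hdfs://namenode:8020/user/hive
print(qualify_location("/user/hive", "file:///"))              # file:/user/hive
print(qualify_location("hdfs:///user/hive", "file:///"))       # hdfs:///user/hive
```

The same scheme-less path lands on different filesystems depending on the default, whereas an explicit hdfs:///user/hive location is unaffected by it.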

Regards
Pralabh Kumar


Re: Apache Spark Jenkins Infra 2022

2022-01-10 Thread Dongjoon Hyun
Thank you, Yikun, Shane, and DB.

Dongjoon

On Sun, Jan 9, 2022 at 10:20 PM DB Tsai  wrote:

> Thank you, Dongjoon for driving the build infra.
>
> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>
> On Jan 9, 2022, at 6:38 PM, shane knapp ☠  wrote:
>
>
> apache spark jenkins lives on!
>
> @dongjoon, let me know if there's anything you need.  nice work, as
> always.  :)
>
> shane
>
> On Sat, Jan 8, 2022 at 7:40 PM Yikun Jiang  wrote:
>
>> @Dongjoon Hyun, thanks for your work on “Apache
>> Spark Jenkins Infra 2022”. I think this work is very important and useful
>> for CI jobs that GitHub Actions cannot support yet.
>>
>> Regards,
>> Yikun
>>
>>
>> On Sun, Jan 9, 2022 at 07:11, Dongjoon Hyun wrote:
>>
>>> Happy New Year!
>>>
>>> After we sunset our legacy Jenkins Infra on December 23rd, 2021,
>>> our test coverage combinations had many gaps.
>>>
>>> As of today, January 8th, 2022, the following test coverage has been
>>> restored or newly added as a starting point. Although this is a pilot and
>>> a small step forward, we will continue to build and improve our test
>>> coverage for the community.
>>>
>>> *## Maven Test Coverage*
>>> Although Apache Spark supports both Maven and SBT for building and testing,
>>> Maven is our official standard for building Apache Spark distributions.
>>> Since GitHub Actions has been covering only Maven builds, the new Jenkins
>>> infrastructure restores Maven building and testing.
>>>
>>> *## Java 17 on Apple Silicon Coverage*
>>> Since there is no publicly available CI option for testing on Apple
>>> Silicon machines, the new Jenkins infrastructure runs Java/Scala/Python/R
>>> tests on Apple Silicon.
>>> Please note that Java 17 is the first Java release to support Apple
>>> Silicon natively (JEP 391: macOS/AArch64 Port, supporting Apple M1-based
>>> machines).
>>>
>>> 
>>>
>>> This is *maintained by the Apache Spark PMC*.
>>> There are more details to discuss before we expose this infrastructure
>>> to the public.
>>>
>>> I want to give my heartfelt thanks to the generous donor and the
>>> ASF Fundraising Team for making this happen.
>>>
>>> Thanks,
>>> Dongjoon
>>>
>>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>
>
>


[VOTE] Release Spark 3.2.1 (RC1)

2022-01-10 Thread huaxin gao
Please vote on releasing the following candidate as Apache Spark version
3.2.1.

The vote is open until Jan. 13th at 12 PM PST (8 PM UTC) and passes if a
majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.1
[ ] -1 Do not release this package because ...
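The pass rule stated above can be sketched as a small Python predicate. This is an illustration under assumptions, not an official tally tool: "majority" is read as more binding (PMC) +1 votes than binding -1 votes, the minimum is read as at least three binding +1 votes, and non-binding votes are ignored.

```python
def release_vote_passes(binding_plus_ones: int, binding_minus_ones: int) -> bool:
    """Sketch of the release-vote pass rule (interpretation is an assumption).

    Passes when there are at least three binding +1 votes and the binding
    +1 votes outnumber the binding -1 votes.
    """
    return binding_plus_ones >= 3 and binding_plus_ones > binding_minus_ones


print(release_vote_passes(3, 0))  # True
print(release_vote_passes(2, 0))  # False: fewer than three +1 votes
print(release_vote_passes(4, 4))  # False: +1 votes are not a majority
```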

To learn more about Apache Spark, please see http://spark.apache.org/

There are currently no issues targeting 3.2.1 (try project = SPARK AND
"Target Version/s" = "3.2.1" AND status in (Open, Reopened, "In Progress"))
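The JQL query above can be turned into a search link with the Python standard library. The issue-navigator path /jira/issues/?jql= is an assumption about the ASF Jira instance; the query itself is copied verbatim from this email.

```python
from urllib.parse import urlencode

# JQL from the vote email, unchanged.
JQL = ('project = SPARK AND "Target Version/s" = "3.2.1" '
       'AND status in (Open, Reopened, "In Progress")')

# Assumption: the standard Jira issue-navigator URL on issues.apache.org.
# urlencode percent-encodes quotes, '=', '/', and parentheses, and turns spaces into '+'.
search_url = "https://issues.apache.org/jira/issues/?" + urlencode({"jql": JQL})
print(search_url)
```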

The tag to be voted on is v3.2.1-rc1 (commit
2b0ee226f8dd17b278ad11139e62464433191653):
https://github.com/apache/spark/tree/v3.2.1-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1395/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-docs/

The list of bug fixes going into 3.2.1 can be found at the following URL:
https://s.apache.org/7tzik

This release is using the release script of the tag v3.2.1-rc1.

FAQ

=========================
How can I help test this release?
=========================

If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the
current RC, and see if anything important breaks. In Java/Scala, you can
add the staging repository to your project's resolvers and test with the
RC (make sure to clean up the artifact cache before and after so you
don't end up building with an out-of-date RC going forward).

=========================
What should happen to JIRA tickets still targeting 3.2.1?
=========================

The current list of open tickets targeted at 3.2.1 can be found by
searching https://issues.apache.org/jira/projects/SPARK for "Target
Version/s" = 3.2.1.

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Please retarget everything else to an
appropriate release.

=========================
But my bug isn't fixed?
=========================

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is a regression that has not been
correctly targeted, please ping me or a committer to help target the
issue.


Re: Difference in behavior for Spark 3.0 vs Spark 3.1 "create database "

2022-01-10 Thread Pablo Langa Blanco
Hi Pralabh,

If it helps, it is probably related to this change
https://github.com/apache/spark/pull/28527

Regards

On Mon, Jan 10, 2022 at 10:42 AM Pralabh Kumar wrote:
