Hi all,

As the source code cleanup work is in progress, I would like to start a
discussion thread on our license header strategy.

Our project includes code from Greenplum and PostgreSQL, so we need to
handle the license headers carefully. I believe we all hope to align with
ASF policies while keeping the traditional code conventions for better
engineering practice.

/*
Here are some related tasks for reference. Feedback is welcome via GitHub:
* For NOTICE and LICENSE files, see PR #812[1] (not ready for review).
* For renaming the old brand in files to align with the Apache brand, see PR
#731[2] (ready for review).
* For GitHub workflows regarding brand name checks, see PR #787[3] (ready
for review).

We must complete the cleanup work before creating any releases, as requested
by the ASF (https://incubator.apache.org/guides/transitioning_asf.html).
*/

For this discussion, I referred to the `ASF Source header policy`[4] page
and practices from other ASF projects such as HAWQ[5] and Pekko[6]. I hope we
can reach consensus on the following proposal points before taking action.
Suggestions and feedback are welcome, especially from our mentors and
members familiar with ASF rules.

## Proposed Cases

I divided the scenarios into four cases:

- Case 1: Completely original, new files created by the Cloudberry
community.
- Case 2: Modifications/additions to the 3rd-party source files with
existing copyright headers.
- Case 3: Modifications/additions to the 3rd-party source files without
copyright headers.
- Case 4: No modifications/additions to the 3rd-party (PG/GP) source files.

I will share my thoughts on each case below.

## Case 1: New, Original Files

For completely original, new files (created either before or after the
donation to ASF), we can use the standard Apache License header.

Standard version (using C as an example):
```
/*-------------------------------------------------------------------------
 *
 * example.c (optional)
 *  used as one example of how to add a license header (optional)
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 *
 * IDENTIFICATION (optional)
 *  src/timezone/zic.c (optional)
 *
 *-------------------------------------------------------------------------
 */
```

Since we adopt the PostgreSQL coding conventions and align with its style,
the `file name`, `description`, `IDENTIFICATION` and `file path` fields are
included as optional fields. They are not required, but are encouraged.
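To make the optional fields concrete, here is a minimal sketch of a generator for the standard header. The function name `standard_header`, its parameters, and the rule width are hypothetical choices for illustration only, not an existing project tool; the license text is copied from the standard version above.

```python
# Hypothetical helper: builds the standard header shown above.
# The rule width (73 dashes) is an assumption about the template layout.
RULE = "-" * 73

# License body copied from the standard version; "" marks an empty " *" line.
APACHE_LICENSE_LINES = [
    "Licensed to the Apache Software Foundation (ASF) under one",
    "or more contributor license agreements.  See the NOTICE file",
    "distributed with this work for additional information",
    "regarding copyright ownership.  The ASF licenses this file",
    "to you under the Apache License, Version 2.0 (the",
    '"License"); you may not use this file except in compliance',
    "with the License.  You may obtain a copy of the License at",
    "",
    "  http://www.apache.org/licenses/LICENSE-2.0",
    "",
    "Unless required by applicable law or agreed to in writing,",
    "software distributed under the License is distributed on an",
    '"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY',
    "KIND, either express or implied.  See the License for the",
    "specific language governing permissions and limitations",
    "under the License.",
]


def standard_header(file_name=None, description=None, file_path=None):
    """Return the header text; every argument is an optional field."""
    lines = ["/*" + RULE, " *"]
    if file_name:
        lines.append(f" * {file_name}")
        if description:
            lines.append(f" *  {description}")
        lines.append(" *")
    for text in APACHE_LICENSE_LINES:
        lines.append(f" * {text}" if text else " *")
    lines.append(" *")
    if file_path:
        lines += [" * IDENTIFICATION", f" *  {file_path}", " *"]
    lines += [" *" + RULE, " */"]
    return "\n".join(lines) + "\n"
```

Calling `standard_header()` with no arguments yields only the license text between the two rules, which matches the idea that the extra fields are encouraged but not required.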

## Case 2: Modifications/Additions to Files with Existing Copyright Headers

When we make modifications/additions to existing 3rd-party (PG/GP) source
files that have copyright headers, we must respect the "do not" rules listed
below. Instead of the standard header, we can use a short version of the
Apache License header.

/* Reference: ASF Rules [7]
1. Do not modify or remove any copyright notices or licenses within
third-party works.
2. Do not add the standard Apache License header to the top of third-party
source files.
3. Minor modifications/additions to third-party source files should
typically be licensed under the same terms as the rest of the third-party
source for convenience.
4. The project's PMC should deal with major modifications/additions to
third-party source files on a case-by-case basis.
*/

Short version 1 (using C as an example):
```
/*-------------------------------------------------------------------------
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * license agreements; and to You under the Apache License, version 2.0:
 *
 *   https://www.apache.org/licenses/LICENSE-2.0
 *
 * This file is part of the Apache Cloudberry project.
 * -------------------------------------------------------------------------
 */
```

When we want to add the license header in bulk to large numbers of 3rd-party
files, we can insert the short version directly into the existing file
header. This approach is acceptable, though not elegant. Here is one example
of how to insert the short header version into an existing file with legacy
copyright info:

```
/*-------------------------------------------------------------------------
 *
 * index.c
 *  code to create and destroy POSTGRES index relations
 *
 * -------------------------------------------------------------------------
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * license agreements; and to You under the Apache License, version 2.0:
 *
 *   https://www.apache.org/licenses/LICENSE-2.0
 *
 * This file is part of the Apache Cloudberry project.
 * -------------------------------------------------------------------------
 *
 * Portions Copyright (c) 2006-2009, Greenplum inc
 * Portions Copyright (c) 2012-Present VMware, Inc. or its affiliates.
 * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *  src/backend/catalog/index.c
 *
 *-------------------------------------------------------------------------
 */
```
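For the bulk case, a minimal sketch of the insertion step might look like the following. It assumes the file starts with a block comment and that the legacy notices are comment lines containing the word "Copyright"; the function name `insert_short_header` is hypothetical, and the sketch does not modify or remove any existing notice.

```python
# Hypothetical sketch: place short version 1 just before the first legacy
# copyright line of the leading block comment, as in the index.c example.
RULE_LINE = " * " + "-" * 73

SHORT_HEADER = [
    RULE_LINE,
    " * Licensed to the Apache Software Foundation (ASF) under one or more",
    " * license agreements; and to You under the Apache License, version 2.0:",
    " *",
    " *   https://www.apache.org/licenses/LICENSE-2.0",
    " *",
    " * This file is part of the Apache Cloudberry project.",
    RULE_LINE,
    " *",
]


def insert_short_header(source: str) -> str:
    """Return source with the short header inserted, or unchanged."""
    if "Licensed to the Apache Software Foundation" in source:
        return source  # already carries an ASF header; keep it idempotent
    if not source.lstrip().startswith("/*"):
        return source  # assumption: only files opening with a block comment
    lines = source.split("\n")
    for i, line in enumerate(lines):
        # Only match continuation lines (" * ...") so the inserted text
        # always stays inside the existing comment.
        if line.lstrip().startswith("*") and "Copyright" in line:
            return "\n".join(lines[:i] + SHORT_HEADER + lines[i:])
        if "*/" in line:
            break  # leading comment ended without a copyright notice
    return source
```

Files whose leading comment has no copyright notice are returned unchanged here, since they fall under Case 3 rather than Case 2.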

We need to distinguish minor from major modifications/additions, and that
needs a rule of thumb. I suggest that if the modifications/additions exceed
5 lines (including comments), we call it a major modification. For a small
file, we should also check whether the modifications/additions make up a
significant percentage of the file, even when fewer than 5 lines changed (e.g.,
https://github.com/apache/cloudberry/blob/f34ae7241633e4672c7a0bfb6d5e9f5be72f8619/src/include/storage/copydir.h,
this file has only 5 lines of effective code, so changing one line has a
major effect on it).

Why 5 lines? It's just an empirical value: a basic function statement can be
written in 5 lines.
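As a sketch of how this rule of thumb could be applied mechanically: the 5-line threshold comes from the proposal above, while the 20% ratio is purely a hypothetical placeholder (the proposal does not name a percentage), chosen so that the copydir.h example of 1 changed line out of 5 counts as major. The function name is also hypothetical.

```python
# Threshold proposed above: more than 5 changed lines is a major change.
MAJOR_LINE_THRESHOLD = 5
# Hypothetical value: the proposal only says "a significant percentage".
MAJOR_RATIO_THRESHOLD = 0.2


def classify_change(changed_lines: int, effective_lines: int) -> str:
    """Classify a modification as "none", "minor" or "major".

    changed_lines: lines added or modified, including comments.
    effective_lines: lines of effective code in the file.
    """
    if changed_lines <= 0:
        return "none"
    if changed_lines > MAJOR_LINE_THRESHOLD:
        return "major"
    # Small files: a few lines can still be a large share of the file.
    if effective_lines > 0 and changed_lines / effective_lines >= MAJOR_RATIO_THRESHOLD:
        return "major"
    return "minor"
```

A result from a check like this would only be a starting point; borderline files would still need a human decision.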

## Case 3: Modifications/Additions to 3rd-Party Source Files Without Copyright Headers

If we have modifications/additions to the existing 3rd-party (PG/GP) source
files, but they don't have a copyright header, we can add the following
short version to the files.

Short version 2 (using C as an example):
```
/*-------------------------------------------------------------------------
 *
 * example.c (optional)
 *  used as one example of how to add a license header (optional)
 *
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * license agreements; and to You under the Apache License, version 2.0:
 *
 *   https://www.apache.org/licenses/LICENSE-2.0
 *
 * This file is part of the Apache Cloudberry project.
 *
 * IDENTIFICATION (optional)
 *  src/timezone/zic.c (optional)
 *
 * -------------------------------------------------------------------------
 */
```

## Case 4: No Modifications/Additions to 3rd-Party (PG/GP) Source Files

If we have no modifications/additions to the existing 3rd-party (PG/GP)
source files, there are no actions to be taken.
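
The four cases can be summarized as a small decision function. This is only a sketch of the proposal as written in this email, not a settled policy; the enum and function names are hypothetical, and it leaves out the minor/major distinction discussed under Case 2.

```python
from enum import Enum


class HeaderAction(Enum):
    STANDARD = "add the standard Apache License header"      # Case 1
    SHORT_V1 = "insert short version 1 into the header"      # Case 2
    SHORT_V2 = "add short version 2 as a new header"         # Case 3
    NONE = "no action"                                       # Case 4


def header_action(is_original: bool, modified: bool,
                  has_copyright_header: bool) -> HeaderAction:
    """Map a file's situation to the header action proposed above."""
    if is_original:
        # New file created by the Cloudberry community.
        return HeaderAction.STANDARD
    if not modified:
        # Untouched 3rd-party (PG/GP) file.
        return HeaderAction.NONE
    if has_copyright_header:
        # Modified 3rd-party file that keeps its existing notices.
        return HeaderAction.SHORT_V1
    return HeaderAction.SHORT_V2
```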

[1] https://github.com/apache/cloudberry/pull/812
[2] https://github.com/apache/cloudberry/pull/731
[3] https://github.com/apache/cloudberry/pull/787
[4] https://www.apache.org/legal/src-headers.html#header-text
[5] https://github.com/apache/hawq/blob/master/src/include/utils/uri.h
[6] https://github.com/apache/pekko/blob/main/cluster-typed/src/main/protobuf/ClusterMessages.proto
[7] https://www.apache.org/legal/src-headers.html#3party

I know many of our community members are on holiday and not yet back to
their usual work routine. There’s no rush to close this discussion, so let’s
keep it open until most people are back.

Best,
Dianjin Wang