Re: [DISCUSS] Modify ThreadPools.newWorkerPool to avoid unnecessary Shutdown Hook registration

2024-09-17 Thread Péter Váry
Bumping this thread a bit.

Cleaning up the pool in non-static cases should be the responsibility of the
user. If they want a pool that is closed by a hook when the JVM exits, they
should explicitly "say" so, for example by calling "newExitingWorkerPool".
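
Roughly, the intended split would look like this (just a sketch:
newExitingWorkerPool is the proposed name, not an existing method, and I
assume the current newWorkerPool signature stays unchanged):

    import java.util.concurrent.ExecutorService;
    import org.apache.iceberg.util.ThreadPools;

    class PoolLifecycleSketch {
      void demo() {
        // Proposed default: no shutdown hook is registered; the caller owns
        // the lifecycle and closes the pool explicitly.
        ExecutorService shortLived = ThreadPools.newWorkerPool("short-lived", 4);
        try {
          shortLived.submit(() -> System.out.println("short-lived work"));
        } finally {
          shortLived.shutdown(); // closed within the app's lifecycle; no hooks pile up
        }

        // Proposed explicit variant: wraps the pool so a JVM shutdown hook
        // closes it on exit, matching today's newWorkerPool behaviour.
        ExecutorService longLived = ThreadPools.newExitingWorkerPool("long-lived", 4);
        longLived.submit(() -> System.out.println("long-lived work"));
      }
    }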

This is a behaviour change in the API, so I think we need feedback from the
community before proceeding with it.
What are your thoughts?

Thanks,
Peter

冯佳捷 wrote (on Fri, 13 Sept 2024 at 17:16):

> Hi all,
>
> During the investigation of a metaspace memory leak issue in Flink
> IcebergSource ( https://github.com/apache/iceberg/pull/11073 ), a
> discussion with @pvary revealed that *ThreadPools.newWorkerPool*
> currently registers a Shutdown Hook via ExitingExecutorService for all
> created thread pools. While this ensures graceful shutdown of the pools
> when the JVM exits, it might lead to unnecessary Shutdown Hook
> accumulation, especially when the pool is explicitly closed within the
> application's lifecycle.
>
> I propose to *modify ThreadPools.newWorkerPool to not register a Shutdown
> Hook by default*. This would prevent potential issues where developers
> might unintentionally register numerous Shutdown Hooks when using
> ThreadPools.newWorkerPool for short-lived thread pools.
> To retain the existing functionality for long-lived thread pools that
> require a Shutdown Hook, I suggest introducing a new, more descriptive
> function, such as *newExitingWorkerPool*. This function would explicitly
> create thread pools that are registered with a Shutdown Hook.
>
> *This change might impact users who rely on the implicit Shutdown Hook
> registration provided by the current ThreadPools.newWorkerPool
> implementation.*
> I would like to gather feedback from the community on this proposed
> change, especially regarding potential compatibility concerns.
>
> Best regards,
> Feng Jiajie
>
>


[DISCUSS] Column to Column filtering

2024-09-17 Thread Baldwin, Jennifer
I’m starting a thread to discuss a feature for comparisons using column 
references on the left and right side of an expression, wherever Iceberg 
supports column-reference-to-literal comparisons.  The use case we want to 
support is filtering on date columns within a single table.  For instance:

select * from travel_table
where expected_date > travel_date;

select * from travel_table
where payment_date <> due_date;
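
For comparison, here is a hypothetical sketch of how the first filter might
surface in Iceberg's Java expression API. The two-reference overload shown
below does not exist today and is exactly the kind of thing this proposal
would add (names are illustrative):

    import org.apache.iceberg.expressions.Expression;
    import org.apache.iceberg.expressions.Expressions;

    class ColumnToColumnSketch {
      // Supported today: a column reference compared to a literal value.
      Expression columnToLiteral =
          Expressions.greaterThan("expected_date", "2024-09-17");

      // Hypothetical column-to-column form, mirroring:
      //   where expected_date > travel_date
      Expression columnToColumn =
          Expressions.greaterThan(
              Expressions.ref("expected_date"), Expressions.ref("travel_date"));
    }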


The changes will impact row and scan file filtering.  Impacted jars are 
iceberg-api, iceberg-core, iceberg-orc and iceberg-parquet.

Is this a feature the Iceberg community would be willing to accept?

Here is a link to a draft PR with the current changes. Thanks.
https://github.com/apache/iceberg/pull/11152



Re: [DISCUSS] Improving Position Deletes in V3

2024-09-17 Thread Bryan Keller
Thanks for the doc, Anton. I reviewed it and it looks good to me. Let me know 
if I can help with anything; it is an area of interest for me.

-Bryan


> On Aug 21, 2024, at 2:28 PM, Anton Okolnychyi wrote:
> 
> Hey folks,
> 
> As discussed during the sync, I've been working on a proposal to improve the 
> handling of position deletes in V3. It builds on lessons learned from 
> deploying the current approach at scale and addresses all unresolved 
> questions from past community discussions and proposals around this topic.
> 
> In particular, the proposal attempts to address the following shortcomings we 
> observe today:
> - Choosing between fewer delete files on disk or targeted deletes.
> - Dependence on external maintenance for consistent write and read performance.
> - Writing and reading overhead as in-memory and on-disk representations differ.
> 
> Please take a look at the doc [1] and let me know what you think. Any 
> feedback is highly appreciated!
> 
> - Anton
> 
> [1] - 
> https://docs.google.com/document/d/18Bqhr-vnzFfQk1S4AgRISkA_5_m5m32Nnc2Cw0zn2XM
> 
> 



Re: Hive 4 integration to store table on S3 and ADLS gen2

2024-09-17 Thread Ayush Saxena
Hi Somesh,

> While doing so, we see the following exception:
> hadoop fs -ls s3a://somesh.qa.bucket/



This has nothing to do with Hive as such. You have configured the Hadoop S3
client wrong and are missing configs; your hadoop ls command itself is
failing, so there is no Hive involved here. You need to set up the FileSystem
correctly...

This is a Hadoop problem; maybe reading the hadoop-aws doc [1] will help. If
you still face issues, you should bug the Hadoop mailing lists, not Hive.
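
If it helps, here is a minimal sketch for sanity-checking the S3A setup
programmatically, outside Hive. The bucket, endpoint, and key placeholders
are taken from the configuration quoted below; hadoop-aws and a matching
aws-java-sdk-bundle still need to be on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3AListCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.s3a.endpoint", "s3.us-west-2.amazonaws.com");
        conf.set("fs.s3a.access.key", "{Access_Key_Value}"); // placeholder
        conf.set("fs.s3a.secret.key", "{Secret_Key_Value}"); // placeholder

        // Programmatic equivalent of `hadoop fs -ls s3a://somesh.qa.bucket/`.
        Path root = new Path("s3a://somesh.qa.bucket/");
        try (FileSystem fs = root.getFileSystem(conf)) {
          for (FileStatus status : fs.listStatus(root)) {
            System.out.println(status.getPath());
          }
        }
      }
    }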

-Ayush

[1]
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html

On Wed, 18 Sept 2024 at 11:12, Awasthi, Somesh wrote:

> Hi Team,
>
>
>
> I want to set up Hive 4 standalone to store tables on S3 and ADLS Gen2 as
> storage.
>
>
>
> Could you please help me with the proper steps and configurations required
> for this.
>
>
>
> We are facing multiple issues with this, so please help me here ASAP.
>
>
>
> *What we tried.*
>
>
>
> I am trying to configure AWS S3 with the Hadoop and Hive setup.
>
> While doing so, we see the following exception:
>
> hadoop fs -ls s3a://somesh.qa.bucket/
>
> Fatal internal error java.lang.RuntimeException:
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>
> *To resolve this, I added hadoop-aws-3.3.6.jar and
> aws-java-sdk-bundle-1.12.770.jar to the Hadoop classpath, i.e. under
> /usr/local/hadoop/share/hadoop/common/lib.*
>
> *And added the S3-related configuration to the core-site.xml file under the
> /usr/local/hadoop/etc/hadoop directory:*
>
> fs.default.name = s3a://somesh.qa.bucket
> fs.s3a.impl = org.apache.hadoop.fs.s3a.S3AFileSystem
> fs.s3a.endpoint = s3.us-west-2.amazonaws.com
> fs.s3a.access.key = {Access_Key_Value}
> fs.s3a.secret.key = {Secret_Key_Value}
> fs.s3a.path.style.access = false
>
> Now when we try hadoop fs -ls s3a://somesh.qa.bucket/
>
> we observe the following exception:
>
> 2024-08-22 13:50:11,294 INFO impl.MetricsConfig: Loaded properties from
> hadoop-metrics2.properties
> 2024-08-22 13:50:11,376 INFO impl.MetricsSystemImpl: Scheduled Metric
> snapshot period at 10 second(s).
> 2024-08-22 13:50:11,376 INFO impl.MetricsSystemImpl: s3a-file-system
> metrics system started
> 2024-08-22 13:50:11,434 WARN util.VersionInfoUtils: The AWS SDK for Java
> 1.x entered maintenance mode starting July 31, 2024 and will reach end of
> support on December 31, 2025. For more information, see
> https://aws.amazon.com/blogs/developer/the-aws-sdk-for-java-1-x-is-in-maintenance-mode-effective-july-31-2024/
> You can print where on the file system the AWS SDK for Java 1.x core
> runtime is located by setting the AWS_JAVA_V1_PRINT_LOCATION environment
> variable or aws.java.v1.printLocation system property to 'true'.
> This message can be disabled by setting the
> AWS_JAVA_V1_DISABLE_DEPRECATION_ANNOUNCEMENT environment variable or
> aws.java.v1.disableDeprecationAnnouncement system property to 'true'.
> The AWS SDK for Java 1.x is being used here:
> at java.lang.Thread.getStackTrace(Thread.java:1564)
> at
> com.amazonaws.util.VersionInfoUtils.printDeprecationAnnouncement(VersionInfoUtils.java:81)
> at com.amazonaws.util.VersionInfoUtils.(VersionInfoUtils.java:59)
> at com.amazonaws.internal.EC2ResourceFetcher.(EC2ResourceFetcher.java:44)
> at
> com.amazonaws.auth.InstanceMetadataServiceCredentialsFetcher.(InstanceMetadataServiceCredentialsFetcher.java:38)
> at
> com.amazonaws.auth.InstanceProfileCredentialsProvider.(InstanceProfileCredentialsProvider.java:111)
> at
> com.amazonaws.auth.InstanceProfileCredentialsProvider.(InstanceProfileCredentialsProvider.java:91)
> at
> com.amazonaws.auth.InstanceProfileCredentialsProvider.(InstanceProfileCredentialsProvider.java:75)
> at
> com.amazonaws.auth.InstanceProfileCredentialsProvider.(InstanceProfileCredentialsProvider.java:58)
> at
> com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper.initializeProvider(EC2ContainerCredentialsProviderWrapper.java:66)
> at
> com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper.(EC2ContainerCredentialsProviderWrapper.java:55)
> at
> org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider.(IAMInstanceCredentialsProvider.java:53)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProvider(S3AUtils.java:727)
> at
> org.apache.hadoop.fs.s3a.S3AUtils.buildAWSProviderList(S3AUtils.java:659)
> at
> org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:585)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:959)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:586)