Abhinav Kumar created HDFS-16156:
------------------------------------

             Summary: HADOOP-AWS with Spark on Kubernetes (EKS)
                 Key: HDFS-16156
                 URL: https://issues.apache.org/jira/browse/HDFS-16156
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.2.0
            Reporter: Abhinav Kumar
I am trying to read Parquet data stored in S3 via Spark on EKS, using hadoop-aws 3.2.0. There are 112 partitions (each around 130 MB) for a particular month. The data is read, but very, very slowly; only a small amount of data is actually fetched, and the executors keep logging the following:

21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: read on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@454de3d3
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: lazySeek on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@3776ef6c
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: read on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@3602676a
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin
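For context, the read path is essentially just a plain partitioned-Parquet read; here is a minimal PySpark sketch of it (the bucket, path, and app name are placeholders, since the real ones are redacted above). The one setting worth calling out is fs.s3a.experimental.input.fadvise, which the configuration below leaves at "normal"; the Hadoop S3A performance documentation recommends "random" for columnar formats such as Parquet and ORC, since Parquet readers seek backwards constantly (footer first, then individual column chunks).

    # Minimal sketch of the read path (placeholder names, not the real job).
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("s3a-parquet-read-sketch")
        # The config dump below has this at "normal"; the S3A docs suggest
        # "random" for columnar formats, which uses bounded ranged GETs so
        # that backward seeks do not abort a long open stream.
        .config("spark.hadoop.fs.s3a.experimental.input.fadvise", "random")
        .getOrCreate()
    )

    # Placeholder bucket/prefix standing in for the redacted table location.
    df = spark.read.parquet("s3a://example-bucket/table_fact_mtd_c/ptn_val_txt=20200229/")
    print(df.count())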
Here is the Spark configuration for hadoop-aws:

spark.hadoop.fs.s3a.assumed.role.sts.endpoint: https://sts.amazonaws.com
spark.hadoop.fs.s3a.assumed.role.sts.endpoint.region: us-east-1
spark.hadoop.fs.s3a.attempts.maximum: 20
spark.hadoop.fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
spark.hadoop.fs.s3a.block.size: 128M
spark.hadoop.fs.s3a.connection.establish.timeout: 50000
spark.hadoop.fs.s3a.connection.maximum: 50
spark.hadoop.fs.s3a.connection.ssl.enabled: true
spark.hadoop.fs.s3a.connection.timeout: 2000000
spark.hadoop.fs.s3a.endpoint: s3.us-east-1.amazonaws.com
spark.hadoop.fs.s3a.etag.checksum.enabled: false
spark.hadoop.fs.s3a.experimental.input.fadvise: normal
spark.hadoop.fs.s3a.fast.buffer.size: 1048576
spark.hadoop.fs.s3a.fast.upload: true
spark.hadoop.fs.s3a.fast.upload.active.blocks: 8
spark.hadoop.fs.s3a.fast.upload.buffer: bytebuffer
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.list.version: 2
spark.hadoop.fs.s3a.max.total.tasks: 30
spark.hadoop.fs.s3a.metadatastore.authoritative: false
spark.hadoop.fs.s3a.metadatastore.impl: org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
spark.hadoop.fs.s3a.multiobjectdelete.enable: true
spark.hadoop.fs.s3a.multipart.purge: true
spark.hadoop.fs.s3a.multipart.purge.age: 86400
spark.hadoop.fs.s3a.multipart.size: 32M
spark.hadoop.fs.s3a.multipart.threshold: 64M
spark.hadoop.fs.s3a.paging.maximum: 5000
spark.hadoop.fs.s3a.readahead.range: 65536
spark.hadoop.fs.s3a.retry.interval: 500ms
spark.hadoop.fs.s3a.retry.limit: 20
spark.hadoop.fs.s3a.retry.throttle.interval: 500ms
spark.hadoop.fs.s3a.retry.throttle.limit: 20
spark.hadoop.fs.s3a.s3.client.factory.impl: org.apache.hadoop.fs.s3a.DefaultS3ClientFactory
spark.hadoop.fs.s3a.s3guard.ddb.background.sleep: 25
spark.hadoop.fs.s3a.s3guard.ddb.max.retries: 20
spark.hadoop.fs.s3a.s3guard.ddb.region: us-east-1
spark.hadoop.fs.s3a.s3guard.ddb.table: s3-data-guard-master
spark.hadoop.fs.s3a.s3guard.ddb.table.capacity.read: 500
spark.hadoop.fs.s3a.s3guard.ddb.table.capacity.write: 100
spark.hadoop.fs.s3a.s3guard.ddb.table.create: true
spark.hadoop.fs.s3a.s3guard.ddb.throttle.retry.interval: 1s
spark.hadoop.fs.s3a.socket.recv.buffer: 8388608
spark.hadoop.fs.s3a.socket.send.buffer: 8388608
spark.hadoop.fs.s3a.threads.keepalivetime: 60
spark.hadoop.fs.s3a.threads.max: 50

Not sure if you need it, but here is the rest of the Spark configuration as well:

spark.app.id: spark-b97cb651f3f14c6cb3197079376a74c7
spark.app.startTime: 1628476986471
spark.blockManager.port: 0
spark.broadcast.compress: true
spark.checkpoint.compress: true
spark.cleaner.periodicGC.interval: 2min
spark.cleaner.referenceTracking: true
spark.cleaner.referenceTracking.blocking: true
spark.cleaner.referenceTracking.blocking.shuffle: true
spark.cleaner.referenceTracking.cleanCheckpoints: true
spark.cores.max: 5
spark.driver.bindAddress: 28.132.124.86
spark.driver.blockManager.port: 0
spark.driver.cores: 5
spark.driver.extraJavaOptions: -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
spark.driver.host: xxx-xxxx-xxx-8be6777b28caacc7-driver-svc.default.svc
spark.driver.maxResultSize: 10008m
spark.driver.memory: 10008m
spark.driver.memoryOverhead: 384m
spark.driver.port: 7078
spark.driver.rpc.io.clientThreads: 5
spark.driver.rpc.io.serverThreads: 5
spark.driver.rpc.netty.dispatcher.numThreads: 5
spark.driver.shuffle.io.clientThreads: 5
spark.driver.shuffle.io.serverThreads: 5
spark.dynamicAllocation.cachedExecutorIdleTimeout: 600s
spark.dynamicAllocation.enabled: false
spark.dynamicAllocation.executorAllocationRatio: 1.0
spark.dynamicAllocation.executorIdleTimeout: 60s
spark.dynamicAllocation.initialExecutors: 1
spark.dynamicAllocation.maxExecutors: 2147483647
spark.dynamicAllocation.minExecutors: 1
spark.dynamicAllocation.schedulerBacklogTimeout: 1s
spark.dynamicAllocation.shuffleTracking.enabled: true
spark.dynamicAllocation.shuffleTracking.timeout: 600s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout: 1s
spark.eventLog.dir: /opt/efs/spark
spark.eventLog.enabled: true
spark.eventLog.logStageExecutorMetrics: false
spark.excludeOnFailure.enabled: true
spark.executor.cores: 5
spark.executor.extraJavaOptions: -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
spark.executor.id: driver
spark.executor.instances: 22
spark.executor.logs.rolling.enableCompression: false
spark.executor.logs.rolling.maxRetainedFiles: 5
spark.executor.logs.rolling.maxSize: 10m
spark.executor.logs.rolling.strategy: size
spark.executor.memory: 10008m
spark.executor.memoryOverhead: 384m
spark.executor.processTreeMetrics.enabled: false
spark.executor.rpc.io.clientThreads: 5
spark.executor.rpc.io.serverThreads: 5
spark.executor.rpc.netty.dispatcher.numThreads: 5
spark.executor.shuffle.io.clientThreads: 5
spark.executor.shuffle.io.serverThreads: 5
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version: 2
spark.history.fs.driverlog.cleaner.enabled: true
spark.history.fs.driverlog.cleaner.maxAge: 2d
spark.history.fs.logDirectory: /opt/efs/spark
spark.history.ui.port: 4040
spark.io.compression.codec: org.apache.spark.io.SnappyCompressionCodec
spark.io.compression.snappy.blockSize: 32k
spark.jars: local:///opt/spark/examples/xxx.jar,local:///opt/spark/examples/yyy.jar
spark.kryo.referenceTracking: false
spark.kryo.registrationRequired: false
spark.kryo.unsafe: true
spark.kryoserializer.buffer: 8m
spark.kryoserializer.buffer.max: 1024m
spark.kubernetes.allocation.batch.delay: 1s
spark.kubernetes.allocation.batch.size: 5
spark.kubernetes.allocation.executor.timeout: 600s
spark.kubernetes.appKillPodDeletionGracePeriod: 5s
spark.kubernetes.authenticate.driver.serviceAccountName: spark
spark.kubernetes.configMap.maxSize: 1572864
spark.kubernetes.container.image: xxx/xxx:latest
spark.kubernetes.container.image.pullPolicy: Always
spark.kubernetes.driver.connectionTimeout: 10000
spark.kubernetes.driver.limit.cores: 8
spark.kubernetes.driver.master: https://asdkadalksjdas.gr7.us-east-1.eks.amazonaws.com:443
spark.kubernetes.driver.pod.name: xxx-ddd-rrrr-8be6777b28caacc7-driver
spark.kubernetes.driver.request.cores: 5
spark.kubernetes.driver.requestTimeout: 10000
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.path: /opt/efs/spark
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.readOnly: false
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.subPath: spark
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.options.claimName: efs-pvc
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.options.storageClass: manual
spark.kubernetes.dynamicAllocation.deleteGracePeriod: 5s
spark.kubernetes.executor.apiPollingInterval: 60s
spark.kubernetes.executor.checkAllContainers: true
spark.kubernetes.executor.deleteOnTermination: false
spark.kubernetes.executor.eventProcessingInterval: 5s
spark.kubernetes.executor.limit.cores: 8
spark.kubernetes.executor.missingPodDetectDelta: 30s
spark.kubernetes.executor.podNamePrefix: uscb-exec
spark.kubernetes.executor.request.cores: 5
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.path: /opt/efs/spark
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.readOnly: false
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.subPath: spark
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.options.claimName: efs-pvc
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.options.storageClass: manual
spark.kubernetes.local.dirs.tmpfs: false
spark.kubernetes.memoryOverheadFactor: 0.1
spark.kubernetes.namespace: default
spark.kubernetes.report.interval: 5s
spark.kubernetes.resource.type: java
spark.kubernetes.submission.connectionTimeout: 10000
spark.kubernetes.submission.requestTimeout: 10000
spark.kubernetes.submission.waitAppCompletion: true
spark.kubernetes.submitInDriver: true
spark.local.dir: /tmp
spark.locality.wait: 3s
spark.locality.wait.node: 3s
spark.locality.wait.process: 3s
spark.locality.wait.rack: 3s
spark.master: k8s://https://NKSLODISNJSKSJSKKLS.gr7.us-east-1.eks.amazonaws.com:443
spark.memory.fraction: 0.6
spark.memory.offHeap.enabled: false
spark.memory.storageFraction: 0.5
spark.network.io.preferDirectBufs: true
spark.network.maxRemoteBlockSizeFetchToMem: 200m
spark.network.timeout: 120s
spark.port.maxRetries: 16
spark.rdd.compress: false
spark.reducer.maxBlocksInFlightPerAddress: 2147483647
spark.reducer.maxReqsInFlight: 2147483647
spark.reducer.maxSizeInFlight: 48m
spark.repl.local.jars: local:///opt/spark/examples/asdasdasd.jar
spark.rpc.askTimeout: 120s
spark.rpc.io.backLog: 256
spark.rpc.io.clientThreads: 5
spark.rpc.io.serverThreads: 5
spark.rpc.lookupTimeout: 120s
spark.rpc.message.maxSize: 128
spark.rpc.netty.dispatcher.numThreads: 5
spark.rpc.numRetries: 3
spark.rpc.retry.wait: 3s
spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout: 120s
spark.scheduler.listenerbus.eventqueue.appStatus.capacity: 10000
spark.scheduler.listenerbus.eventqueue.capacity: 10000
spark.scheduler.listenerbus.eventqueue.eventLog.capacity: 10000
spark.scheduler.listenerbus.eventqueue.executorManagement.capacity: 10000
spark.scheduler.listenerbus.eventqueue.shared.capacity: 10000
spark.scheduler.maxRegisteredResourcesWaitingTime: 30s
spark.scheduler.minRegisteredResourcesRatio: 0.8
spark.scheduler.mode: FIFO
spark.scheduler.resource.profileMergeConflicts: false
spark.scheduler.revive.interval: 1s
spark.serializer: org.apache.spark.serializer.KryoSerializer
spark.serializer.objectStreamReset: 100
spark.shuffle.accurateBlockThreshold: 104857600
spark.shuffle.compress: true
spark.shuffle.file.buffer: 128m
spark.shuffle.io.backLog: -1
spark.shuffle.io.maxRetries: 3
spark.shuffle.io.numConnectionsPerPeer: 4
spark.shuffle.io.preferDirectBufs: true
spark.shuffle.io.retryWait: 5s
spark.shuffle.maxChunksBeingTransferred: 9223372036854775807
spark.shuffle.registration.maxAttempts: 3
spark.shuffle.registration.timeout: 200
spark.shuffle.service.enabled: false
spark.shuffle.service.index.cache.size: 100m
spark.shuffle.service.port: 7737
spark.shuffle.sort.bypassMergeThreshold: 200
spark.shuffle.spill.compress: true
spark.speculation: false
spark.speculation.interval: 5s
spark.speculation.multiplier: 1.5
spark.speculation.quantile: 0.75
spark.speculation.task.duration.threshold: 10s
spark.sql.adaptive.coalescePartitions.enabled: true
spark.sql.adaptive.enabled: true
spark.sql.adaptive.fetchShuffleBlocksInBatch: true
spark.sql.adaptive.forceApply: false
spark.sql.adaptive.localShuffleReader.enabled: true
spark.sql.adaptive.logLevel: debug
spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin: 0
spark.sql.adaptive.skewJoin.enabled: true
spark.sql.adaptive.skewJoin.skewedPartitionFactor: 5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInByte: 256MB
spark.sql.addPartitionInBatch.size: 100
spark.sql.analyzer.failAmbiguousSelfJoin: true
spark.sql.analyzer.maxIterations: 100
spark.sql.ansi.enabled: false
spark.sql.autoBroadcastJoinThreshold: 10MB
spark.sql.avro.filterPushdown.enabled: true
spark.sql.broadcastExchange.maxThreadThreshold: 128
spark.sql.bucketing.coalesceBucketsInJoin.enabled: false
spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio: 4
spark.sql.cache.serializer: org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer
spark.sql.cartesianProductExec.buffer.in.memory.threshold: 4096
spark.sql.caseSensitive: false
spark.sql.catalogImplementation: in-memory
spark.sql.cbo.enabled: false
spark.sql.cbo.joinReorder.card.weight: 0
spark.sql.cbo.joinReorder.dp.star.filter: false
spark.sql.cbo.joinReorder.dp.threshold: 12
spark.sql.cbo.joinReorder.enabled: false
spark.sql.cbo.planStats.enabled: false
spark.sql.cbo.starJoinFTRatio: 0
spark.sql.cbo.starSchemaDetection: false
spark.sql.codegen.aggregate.fastHashMap.capacityBit: 16
spark.sql.codegen.aggregate.map.twolevel.enabled: true
spark.sql.codegen.aggregate.map.vectorized.enable: false
spark.sql.codegen.aggregate.splitAggregateFunc.enabled: true
spark.sql.codegen.cache.maxEntries: 100
spark.sql.codegen.comments: false
spark.sql.codegen.fallback: true
spark.sql.codegen.hugeMethodLimit: 65535
spark.sql.codegen.logging.maxLines: 1000
spark.sql.codegen.maxFields: 100
spark.sql.codegen.methodSplitThreshold: 1024
spark.sql.codegen.splitConsumeFuncByOperator: true
spark.sql.codegen.useIdInClassName: true
spark.sql.codegen.wholeStage: true
spark.sql.columnVector.offheap.enabled: false
spark.sql.constraintPropagation.enabled: true
spark.sql.crossJoin.enabled: true
spark.sql.csv.filterPushdown.enabled: true
spark.sql.csv.parser.columnPruning.enabled: true
spark.sql.datetime.java8API.enabled: false
spark.sql.debug: false
spark.sql.debug.maxToStringFields: 25
spark.sql.decimalOperations.allowPrecisionLoss: true
spark.sql.event.truncate.length: 2147483647
spark.sql.exchange.reuse: true
spark.sql.execution.arrow.enabled: false
spark.sql.execution.arrow.fallback.enabled: true
spark.sql.execution.arrow.maxRecordsPerBatch: 10000
spark.sql.execution.arrow.sparkr.enabled: false
spark.sql.execution.broadcastHashJoin.outputPartitioningExpandLimit: 8
spark.sql.execution.fastFailOnFileFormatOutput: false
spark.sql.execution.pandas.convertToArrowArraySafely: false
spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled: false
spark.sql.execution.rangeExchange.sampleSizePerPartition: 100
spark.sql.execution.removeRedundantProjects: true
spark.sql.execution.removeRedundantSorts: true
spark.sql.execution.reuseSubquery: true
spark.sql.execution.sortBeforeRepartition: true
spark.sql.execution.useObjectHashAggregateExec: true
spark.sql.files.ignoreCorruptFiles: false
spark.sql.files.ignoreMissingFiles: false
spark.sql.files.maxPartitionBytes: 128MB
spark.sql.files.maxRecordsPerFile: 0
spark.sql.filesourceTableRelationCacheSize: 1000
spark.sql.function.concatBinaryAsString: false
spark.sql.function.eltOutputAsString: false
spark.sql.globalTempDatabase: global_temp
spark.sql.groupByAliases: true
spark.sql.groupByOrdinal: true
spark.sql.hive.advancedPartitionPredicatePushdown.enabled: true
spark.sql.hive.convertCTAS: false
spark.sql.hive.gatherFastStats: true
spark.sql.hive.manageFilesourcePartitions: true
spark.sql.hive.metastorePartitionPruning: true
spark.sql.hive.metastorePartitionPruningInSetThreshold: 1000
spark.sql.hive.verifyPartitionPath: false
spark.sql.inMemoryColumnarStorage.batchSize: 10000
spark.sql.inMemoryColumnarStorage.compressed: true
spark.sql.inMemoryColumnarStorage.enableVectorizedReader: true
spark.sql.inMemoryColumnarStorage.partitionPruning: true
spark.sql.inMemoryTableScanStatistics.enable: false
spark.sql.join.preferSortMergeJoin: true
spark.sql.json.filterPushdown.enabled: true
spark.sql.jsonGenerator.ignoreNullFields: true
spark.sql.legacy.addSingleFileInAddFile: false
spark.sql.legacy.allowHashOnMapType: false
spark.sql.legacy.allowNegativeScaleOfDecimal: false
spark.sql.legacy.allowParameterlessCount: false
spark.sql.legacy.allowUntypedScalaUDF: false
spark.sql.legacy.bucketedTableScan.outputOrdering: false
spark.sql.legacy.castComplexTypesToString.enabled: false
spark.sql.legacy.charVarcharAsString: false
spark.sql.legacy.createEmptyCollectionUsingStringType: false
spark.sql.legacy.createHiveTableByDefault: true
spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue: false
spark.sql.legacy.doLooseUpcast: false
spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName: true
spark.sql.legacy.exponentLiteralAsDecimal.enabled: false
spark.sql.legacy.extraOptionsBehavior.enabled: false
spark.sql.legacy.followThreeValuedLogicInArrayExists: true
spark.sql.legacy.fromDayTimeString.enabled: false
spark.sql.legacy.integerGroupingId: false
spark.sql.legacy.json.allowEmptyString.enabled: false
spark.sql.legacy.keepCommandOutputSchema: false
spark.sql.legacy.literal.pickMinimumPrecision: true
spark.sql.legacy.notReserveProperties: false
spark.sql.legacy.parseNullPartitionSpecAsStringLiteral: false
spark.sql.legacy.parser.havingWithoutGroupByAsWhere: false
spark.sql.legacy.pathOptionBehavior.enabled: false
spark.sql.legacy.sessionInitWithConfigDefaults: false
spark.sql.legacy.setCommandRejectsSparkCoreConfs: true
spark.sql.legacy.setopsPrecedence.enabled: false
spark.sql.legacy.sizeOfNull: true
spark.sql.legacy.statisticalAggregate: false
spark.sql.legacy.storeAnalyzedPlanForView: false
spark.sql.legacy.typeCoercion.datetimeToString.enabled: false
spark.sql.legacy.useCurrentConfigsForView: false
spark.sql.limit.scaleUpFactor: 4
spark.sql.maxMetadataStringLength: 100
spark.sql.metadataCacheTTLSeconds: -1
spark.sql.objectHashAggregate.sortBased.fallbackThreshold: 128
spark.sql.optimizeNullAwareAntiJoin: true
spark.sql.optimizer.disableHints: false
spark.sql.optimizer.dynamicPartitionPruning.enabled: true
spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio: 0
spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly: true
spark.sql.optimizer.dynamicPartitionPruning.useStats: true
spark.sql.optimizer.enableJsonExpressionOptimization: true
spark.sql.optimizer.expression.nestedPruning.enabled: true
spark.sql.optimizer.inSetConversionThreshold: 10
spark.sql.optimizer.inSetSwitchThreshold: 400
spark.sql.optimizer.maxIterations: 100
spark.sql.optimizer.metadataOnly: false
spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources: parquet,orc
spark.sql.optimizer.nestedSchemaPruning.enabled: true
spark.sql.optimizer.replaceExceptWithFilter: true
spark.sql.optimizer.serializer.nestedSchemaPruning.enabled: true
spark.sql.orderByOrdinal: true
spark.sql.parquet.binaryAsString: false
spark.sql.parquet.columnarReaderBatchSize: 4096
spark.sql.parquet.compression.codec: snappy
spark.sql.parquet.enableVectorizedReader: true
spark.sql.parquet.filterPushdown: true
spark.sql.parquet.filterPushdown.date: true
spark.sql.parquet.filterPushdown.decimal: true
spark.sql.parquet.filterPushdown.string.startsWith: true
spark.sql.parquet.filterPushdown.timestamp: true
spark.sql.parquet.int96AsTimestamp: true
spark.sql.parquet.int96TimestampConversion: false
spark.sql.parquet.mergeSchema: false
spark.sql.parquet.output.committer.class: org.apache.parquet.hadoop.ParquetOutputCommitter
spark.sql.parquet.pushdown.inFilterThreshold: 10
spark.sql.parquet.recordLevelFilter.enabled: false
spark.sql.parquet.respectSummaryFiles: false
spark.sql.parquet.writeLegacyFormat: false
spark.sql.parser.escapedStringLiterals: false
spark.sql.parser.quotedRegexColumnNames: false
spark.sql.pivotMaxValues: 10000
spark.sql.planChangeLog.level: trace
spark.sql.pyspark.jvmStacktrace.enabled: false
spark.sql.repl.eagerEval.enabled: false
spark.sql.repl.eagerEval.maxNumRows: 20
spark.sql.repl.eagerEval.truncate: 20
spark.sql.retainGroupColumns: true
spark.sql.runSQLOnFiles: true
spark.sql.scriptTransformation.exitTimeoutInSeconds: 5s
spark.sql.selfJoinAutoResolveAmbiguity: true
spark.sql.shuffle.partitions: 200
spark.sql.sort.enableRadixSort: true
spark.sql.sources.binaryFile.maxLength: 2147483647
spark.sql.sources.bucketing.autoBucketedScan.enabled: true
spark.sql.sources.bucketing.enabled: true
spark.sql.sources.bucketing.maxBuckets: 100000
spark.sql.sources.commitProtocolClass: org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
spark.sql.sources.default: parquet
spark.sql.sources.fileCompressionFactor: 1
spark.sql.sources.ignoreDataLocality: false
spark.sql.sources.parallelPartitionDiscovery.parallelism: 10000
spark.sql.sources.parallelPartitionDiscovery.threshold: 32
spark.sql.sources.partitionColumnTypeInference.enabled: true
spark.sql.sources.validatePartitionColumns: true
spark.sql.statistics.fallBackToHdfs: false
spark.sql.statistics.histogram.enabled: false
spark.sql.statistics.histogram.numBins: 254
spark.sql.statistics.ndv.maxError: 0
spark.sql.statistics.parallelFileListingInStatsComputation.enabled: true
spark.sql.statistics.percentile.accuracy: 10000
spark.sql.statistics.size.autoUpdate.enabled: false
spark.sql.streaming.continuous.epochBacklogQueueSize: 10000
spark.sql.streaming.continuous.executorPollIntervalMs: 100
spark.sql.streaming.continuous.executorQueueSize: 1024
spark.sql.streaming.metricsEnabled: true
spark.sql.subexpressionElimination.cache.maxEntries: 100
spark.sql.subexpressionElimination.enabled: true
spark.sql.subquery.maxThreadThreshold: 16
spark.sql.thriftServer.incrementalCollect: false
spark.sql.thriftServer.queryTimeout: 20s
spark.sql.thriftserver.ui.retainedSessions: 200
spark.sql.thriftserver.ui.retainedStatements: 200
spark.sql.truncateTable.ignorePermissionAcl.enabled: false
spark.sql.ui.explainMode: formatted
spark.sql.ui.retainedExecutions: 500
spark.sql.variable.substitute: true
spark.sql.view.maxNestedViewDepth: 100
spark.sql.warehouse.dir: file:/opt/spark/work-dir/spark-warehouse
spark.sql.windowExec.buffer.in.memory.threshold: 4096
spark.stage.maxConsecutiveAttempts: 4
spark.storage.replication.proactive: true
spark.submit.deployMode: cluster
spark.submit.pyFiles:
spark.task.cpus: 1
spark.task.maxFailures: 4
spark.task.reaper.enabled: true
spark.task.reaper.killTimeout: -1
spark.task.reaper.pollingInterval: 20s
spark.task.reaper.threadDump: true

Any quick help will be greatly appreciated.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)