Abhinav Kumar created HDFS-16156:
------------------------------------

             Summary: HADOOP-AWS with Spark on Kubernetes (EKS)
                 Key: HDFS-16156
                 URL: https://issues.apache.org/jira/browse/HDFS-16156
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.2.0
            Reporter: Abhinav Kumar
I am trying to read Parquet data stored in S3 via Spark on EKS, using hadoop-aws 3.2.0. There are 112 partitions (each around 130 MB) for a particular month. The data is read, but very, very slowly; only a small amount of data is actually fetched, and the executors keep logging the following:

21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: read on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@454de3d3
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: lazySeek on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@3776ef6c
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: read on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@3602676a
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin
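For context, the read path is essentially just a plain partitioned-Parquet read; here is a minimal PySpark sketch of it (the bucket, path, and app name are placeholders, since the real ones are redacted above). The one setting worth calling out is fs.s3a.experimental.input.fadvise, which the configuration below leaves at "normal"; the Hadoop S3A performance documentation recommends "random" for columnar formats such as Parquet and ORC, since Parquet readers seek backwards constantly (footer first, then individual column chunks).

    # Minimal sketch of the read path (placeholder names, not the real job).
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("s3a-parquet-read-sketch")
        # The config dump below has this at "normal"; the S3A docs suggest
        # "random" for columnar formats, which uses bounded ranged GETs so
        # that backward seeks do not abort a long open stream.
        .config("spark.hadoop.fs.s3a.experimental.input.fadvise", "random")
        .getOrCreate()
    )

    # Placeholder bucket/prefix standing in for the redacted table location.
    df = spark.read.parquet("s3a://example-bucket/table_fact_mtd_c/ptn_val_txt=20200229/")
    print(df.count())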
Here is the Spark configuration for hadoop-aws:

spark.hadoop.fs.s3a.assumed.role.sts.endpoint: https://sts.amazonaws.com
spark.hadoop.fs.s3a.assumed.role.sts.endpoint.region: us-east-1
spark.hadoop.fs.s3a.attempts.maximum: 20
spark.hadoop.fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
spark.hadoop.fs.s3a.block.size: 128M
spark.hadoop.fs.s3a.connection.establish.timeout: 50000
spark.hadoop.fs.s3a.connection.maximum: 50
spark.hadoop.fs.s3a.connection.ssl.enabled: true
spark.hadoop.fs.s3a.connection.timeout: 2000000
spark.hadoop.fs.s3a.endpoint: s3.us-east-1.amazonaws.com
spark.hadoop.fs.s3a.etag.checksum.enabled: false
spark.hadoop.fs.s3a.experimental.input.fadvise: normal
spark.hadoop.fs.s3a.fast.buffer.size: 1048576
spark.hadoop.fs.s3a.fast.upload: true
spark.hadoop.fs.s3a.fast.upload.active.blocks: 8
spark.hadoop.fs.s3a.fast.upload.buffer: bytebuffer
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.list.version: 2
spark.hadoop.fs.s3a.max.total.tasks: 30
spark.hadoop.fs.s3a.metadatastore.authoritative: false
spark.hadoop.fs.s3a.metadatastore.impl: org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
spark.hadoop.fs.s3a.multiobjectdelete.enable: true
spark.hadoop.fs.s3a.multipart.purge: true
spark.hadoop.fs.s3a.multipart.purge.age: 86400
spark.hadoop.fs.s3a.multipart.size: 32M
spark.hadoop.fs.s3a.multipart.threshold: 64M
spark.hadoop.fs.s3a.paging.maximum: 5000
spark.hadoop.fs.s3a.readahead.range: 65536
spark.hadoop.fs.s3a.retry.interval: 500ms
spark.hadoop.fs.s3a.retry.limit: 20
spark.hadoop.fs.s3a.retry.throttle.interval: 500ms
spark.hadoop.fs.s3a.retry.throttle.limit: 20
spark.hadoop.fs.s3a.s3.client.factory.impl: org.apache.hadoop.fs.s3a.DefaultS3ClientFactory
spark.hadoop.fs.s3a.s3guard.ddb.background.sleep: 25
spark.hadoop.fs.s3a.s3guard.ddb.max.retries: 20
spark.hadoop.fs.s3a.s3guard.ddb.region: us-east-1
spark.hadoop.fs.s3a.s3guard.ddb.table: s3-data-guard-master
spark.hadoop.fs.s3a.s3guard.ddb.table.capacity.read: 500
spark.hadoop.fs.s3a.s3guard.ddb.table.capacity.write: 100
spark.hadoop.fs.s3a.s3guard.ddb.table.create: true
spark.hadoop.fs.s3a.s3guard.ddb.throttle.retry.interval: 1s
spark.hadoop.fs.s3a.socket.recv.buffer: 8388608
spark.hadoop.fs.s3a.socket.send.buffer: 8388608
spark.hadoop.fs.s3a.threads.keepalivetime: 60
spark.hadoop.fs.s3a.threads.max: 50

Not sure if you need it, but here is the rest of the Spark configuration as well:

spark.app.id: spark-b97cb651f3f14c6cb3197079376a74c7
spark.app.startTime: 1628476986471
spark.blockManager.port: 0
spark.broadcast.compress: true
spark.checkpoint.compress: true
spark.cleaner.periodicGC.interval: 2min
spark.cleaner.referenceTracking: true
spark.cleaner.referenceTracking.blocking: true
spark.cleaner.referenceTracking.blocking.shuffle: true
spark.cleaner.referenceTracking.cleanCheckpoints: true
spark.cores.max: 5
spark.driver.bindAddress: 28.132.124.86
spark.driver.blockManager.port: 0
spark.driver.cores: 5
spark.driver.extraJavaOptions: -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
spark.driver.host: xxx-xxxx-xxx-8be6777b28caacc7-driver-svc.default.svc
spark.driver.maxResultSize: 10008m
spark.driver.memory: 10008m
spark.driver.memoryOverhead: 384m
spark.driver.port: 7078
spark.driver.rpc.io.clientThreads: 5
spark.driver.rpc.io.serverThreads: 5
spark.driver.rpc.netty.dispatcher.numThreads: 5
spark.driver.shuffle.io.clientThreads: 5
spark.driver.shuffle.io.serverThreads: 5
spark.dynamicAllocation.cachedExecutorIdleTimeout: 600s
spark.dynamicAllocation.enabled: false
spark.dynamicAllocation.executorAllocationRatio: 1.0
spark.dynamicAllocation.executorIdleTimeout: 60s
spark.dynamicAllocation.initialExecutors: 1
spark.dynamicAllocation.maxExecutors: 2147483647
spark.dynamicAllocation.minExecutors: 1
spark.dynamicAllocation.schedulerBacklogTimeout: 1s
spark.dynamicAllocation.shuffleTracking.enabled: true
spark.dynamicAllocation.shuffleTracking.timeout: 600s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout: 1s
spark.eventLog.dir: /opt/efs/spark
spark.eventLog.enabled: true
spark.eventLog.logStageExecutorMetrics: false
spark.excludeOnFailure.enabled: true
spark.executor.cores: 5
spark.executor.extraJavaOptions: -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
spark.executor.id: driver
spark.executor.instances: 22
spark.executor.logs.rolling.enableCompression: false
spark.executor.logs.rolling.maxRetainedFiles: 5
spark.executor.logs.rolling.maxSize: 10m
spark.executor.logs.rolling.strategy: size
spark.executor.memory: 10008m
spark.executor.memoryOverhead: 384m
spark.executor.processTreeMetrics.enabled: false
spark.executor.rpc.io.clientThreads: 5
spark.executor.rpc.io.serverThreads: 5
spark.executor.rpc.netty.dispatcher.numThreads: 5
spark.executor.shuffle.io.clientThreads: 5
spark.executor.shuffle.io.serverThreads: 5
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version: 2
spark.history.fs.driverlog.cleaner.enabled: true
spark.history.fs.driverlog.cleaner.maxAge: 2d
spark.history.fs.logDirectory: /opt/efs/spark
spark.history.ui.port: 4040
spark.io.compression.codec: org.apache.spark.io.SnappyCompressionCodec
spark.io.compression.snappy.blockSize: 32k
spark.jars: local:///opt/spark/examples/xxx.jar,local:///opt/spark/examples/yyy.jar
spark.kryo.referenceTracking: false
spark.kryo.registrationRequired: false
spark.kryo.unsafe: true
spark.kryoserializer.buffer: 8m
spark.kryoserializer.buffer.max: 1024m
spark.kubernetes.allocation.batch.delay: 1s
spark.kubernetes.allocation.batch.size: 5
spark.kubernetes.allocation.executor.timeout: 600s
spark.kubernetes.appKillPodDeletionGracePeriod: 5s
spark.kubernetes.authenticate.driver.serviceAccountName: spark
spark.kubernetes.configMap.maxSize: 1572864
spark.kubernetes.container.image: xxx/xxx:latest
spark.kubernetes.container.image.pullPolicy: Always
spark.kubernetes.driver.connectionTimeout: 10000
spark.kubernetes.driver.limit.cores: 8
spark.kubernetes.driver.master: https://asdkadalksjdas.gr7.us-east-1.eks.amazonaws.com:443
spark.kubernetes.driver.pod.name: xxx-ddd-rrrr-8be6777b28caacc7-driver
spark.kubernetes.driver.request.cores: 5
spark.kubernetes.driver.requestTimeout: 10000
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.path: /opt/efs/spark
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.readOnly: false
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.subPath: spark
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.options.claimName: efs-pvc
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.options.storageClass: manual
spark.kubernetes.dynamicAllocation.deleteGracePeriod: 5s
spark.kubernetes.executor.apiPollingInterval: 60s
spark.kubernetes.executor.checkAllContainers: true
spark.kubernetes.executor.deleteOnTermination: false
spark.kubernetes.executor.eventProcessingInterval: 5s
spark.kubernetes.executor.limit.cores: 8
spark.kubernetes.executor.missingPodDetectDelta: 30s
spark.kubernetes.executor.podNamePrefix: uscb-exec
spark.kubernetes.executor.request.cores: 5
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.path: /opt/efs/spark
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.readOnly: false
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.subPath: spark
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.options.claimName: efs-pvc
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.options.storageClass: manual
spark.kubernetes.local.dirs.tmpfs: false
spark.kubernetes.memoryOverheadFactor: 0.1
spark.kubernetes.namespace: default
spark.kubernetes.report.interval: 5s
spark.kubernetes.resource.type: java
spark.kubernetes.submission.connectionTimeout: 10000
spark.kubernetes.submission.requestTimeout: 10000
spark.kubernetes.submission.waitAppCompletion: true
spark.kubernetes.submitInDriver: true
spark.local.dir: /tmp
spark.locality.wait: 3s
spark.locality.wait.node: 3s
spark.locality.wait.process: 3s
spark.locality.wait.rack: 3s
spark.master: k8s://https://NKSLODISNJSKSJSKKLS.gr7.us-east-1.eks.amazonaws.com:443
spark.memory.fraction: 0.6
spark.memory.offHeap.enabled: false
spark.memory.storageFraction: 0.5
spark.network.io.preferDirectBufs: true
spark.network.maxRemoteBlockSizeFetchToMem: 200m
spark.network.timeout: 120s
spark.port.maxRetries: 16
spark.rdd.compress: false
spark.reducer.maxBlocksInFlightPerAddress: 2147483647
spark.reducer.maxReqsInFlight: 2147483647
spark.reducer.maxSizeInFlight: 48m
spark.repl.local.jars: local:///opt/spark/examples/asdasdasd.jar
spark.rpc.askTimeout: 120s
spark.rpc.io.backLog: 256
spark.rpc.io.clientThreads: 5
spark.rpc.io.serverThreads: 5
spark.rpc.lookupTimeout: 120s
spark.rpc.message.maxSize: 128
spark.rpc.netty.dispatcher.numThreads: 5
spark.rpc.numRetries: 3
spark.rpc.retry.wait: 3s
spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout: 120s
spark.scheduler.listenerbus.eventqueue.appStatus.capacity: 10000
spark.scheduler.listenerbus.eventqueue.capacity: 10000
spark.scheduler.listenerbus.eventqueue.eventLog.capacity: 10000
spark.scheduler.listenerbus.eventqueue.executorManagement.capacity: 10000
spark.scheduler.listenerbus.eventqueue.shared.capacity: 10000
spark.scheduler.maxRegisteredResourcesWaitingTime: 30s
spark.scheduler.minRegisteredResourcesRatio: 0.8
spark.scheduler.mode: FIFO
spark.scheduler.resource.profileMergeConflicts: false
spark.scheduler.revive.interval: 1s
spark.serializer: org.apache.spark.serializer.KryoSerializer
spark.serializer.objectStreamReset: 100
spark.shuffle.accurateBlockThreshold: 104857600
spark.shuffle.compress: true
spark.shuffle.file.buffer: 128m
spark.shuffle.io.backLog: -1
spark.shuffle.io.maxRetries: 3
spark.shuffle.io.numConnectionsPerPeer: 4
spark.shuffle.io.preferDirectBufs: true
spark.shuffle.io.retryWait: 5s
spark.shuffle.maxChunksBeingTransferred: 9223372036854775807
spark.shuffle.registration.maxAttempts: 3
spark.shuffle.registration.timeout: 200
spark.shuffle.service.enabled: false
spark.shuffle.service.index.cache.size: 100m
spark.shuffle.service.port: 7737
spark.shuffle.sort.bypassMergeThreshold: 200
spark.shuffle.spill.compress: true
spark.speculation: false
spark.speculation.interval: 5s
spark.speculation.multiplier: 1.5
spark.speculation.quantile: 0.75
spark.speculation.task.duration.threshold: 10s
spark.sql.adaptive.coalescePartitions.enabled: true
spark.sql.adaptive.enabled: true
spark.sql.adaptive.fetchShuffleBlocksInBatch: true
spark.sql.adaptive.forceApply: false
spark.sql.adaptive.localShuffleReader.enabled: true
spark.sql.adaptive.logLevel: debug
spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin: 0
spark.sql.adaptive.skewJoin.enabled: true
spark.sql.adaptive.skewJoin.skewedPartitionFactor: 5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInByte: 256MB
spark.sql.addPartitionInBatch.size: 100
spark.sql.analyzer.failAmbiguousSelfJoin: true
spark.sql.analyzer.maxIterations: 100
spark.sql.ansi.enabled: false
spark.sql.autoBroadcastJoinThreshold: 10MB
spark.sql.avro.filterPushdown.enabled: true
spark.sql.broadcastExchange.maxThreadThreshold: 128
spark.sql.bucketing.coalesceBucketsInJoin.enabled: false
spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio: 4
spark.sql.cache.serializer: org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer
spark.sql.cartesianProductExec.buffer.in.memory.threshold: 4096
spark.sql.caseSensitive: false
spark.sql.catalogImplementation: in-memory
spark.sql.cbo.enabled: false
spark.sql.cbo.joinReorder.card.weight: 0
spark.sql.cbo.joinReorder.dp.star.filter: false
spark.sql.cbo.joinReorder.dp.threshold: 12
spark.sql.cbo.joinReorder.enabled: false
spark.sql.cbo.planStats.enabled: false
spark.sql.cbo.starJoinFTRatio: 0
spark.sql.cbo.starSchemaDetection: false
spark.sql.codegen.aggregate.fastHashMap.capacityBit: 16
spark.sql.codegen.aggregate.map.twolevel.enabled: true
spark.sql.codegen.aggregate.map.vectorized.enable: false
spark.sql.codegen.aggregate.splitAggregateFunc.enabled: true
spark.sql.codegen.cache.maxEntries: 100
spark.sql.codegen.comments: false
spark.sql.codegen.fallback: true
spark.sql.codegen.hugeMethodLimit: 65535
spark.sql.codegen.logging.maxLines: 1000
spark.sql.codegen.maxFields: 100
spark.sql.codegen.methodSplitThreshold: 1024
spark.sql.codegen.splitConsumeFuncByOperator: true
spark.sql.codegen.useIdInClassName: true
spark.sql.codegen.wholeStage: true
spark.sql.columnVector.offheap.enabled: false
spark.sql.constraintPropagation.enabled: true
spark.sql.crossJoin.enabled: true
spark.sql.csv.filterPushdown.enabled: true
spark.sql.csv.parser.columnPruning.enabled: true
spark.sql.datetime.java8API.enabled: false
spark.sql.debug: false
spark.sql.debug.maxToStringFields: 25
spark.sql.decimalOperations.allowPrecisionLoss: true
spark.sql.event.truncate.length: 2147483647
spark.sql.exchange.reuse: true
spark.sql.execution.arrow.enabled: false
spark.sql.execution.arrow.fallback.enabled: true
spark.sql.execution.arrow.maxRecordsPerBatch: 10000
spark.sql.execution.arrow.sparkr.enabled: false
spark.sql.execution.broadcastHashJoin.outputPartitioningExpandLimit: 8
spark.sql.execution.fastFailOnFileFormatOutput: false
spark.sql.execution.pandas.convertToArrowArraySafely: false
spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled: false
spark.sql.execution.rangeExchange.sampleSizePerPartition: 100
spark.sql.execution.removeRedundantProjects: true
spark.sql.execution.removeRedundantSorts: true
spark.sql.execution.reuseSubquery: true
spark.sql.execution.sortBeforeRepartition: true
spark.sql.execution.useObjectHashAggregateExec: true
spark.sql.files.ignoreCorruptFiles: false
spark.sql.files.ignoreMissingFiles: false
spark.sql.files.maxPartitionBytes: 128MB
spark.sql.files.maxRecordsPerFile: 0
spark.sql.filesourceTableRelationCacheSize: 1000
spark.sql.function.concatBinaryAsString: false
spark.sql.function.eltOutputAsString: false
spark.sql.globalTempDatabase: global_temp
spark.sql.groupByAliases: true
spark.sql.groupByOrdinal: true
spark.sql.hive.advancedPartitionPredicatePushdown.enabled: true
spark.sql.hive.convertCTAS: false
spark.sql.hive.gatherFastStats: true
spark.sql.hive.manageFilesourcePartitions: true
spark.sql.hive.metastorePartitionPruning: true
spark.sql.hive.metastorePartitionPruningInSetThreshold: 1000
spark.sql.hive.verifyPartitionPath: false
spark.sql.inMemoryColumnarStorage.batchSize: 10000
spark.sql.inMemoryColumnarStorage.compressed: true
spark.sql.inMemoryColumnarStorage.enableVectorizedReader: true
spark.sql.inMemoryColumnarStorage.partitionPruning: true
spark.sql.inMemoryTableScanStatistics.enable: false
spark.sql.join.preferSortMergeJoin: true
spark.sql.json.filterPushdown.enabled: true
spark.sql.jsonGenerator.ignoreNullFields: true
spark.sql.legacy.addSingleFileInAddFile: false
spark.sql.legacy.allowHashOnMapType: false
spark.sql.legacy.allowNegativeScaleOfDecimal: false
spark.sql.legacy.allowParameterlessCount: false
spark.sql.legacy.allowUntypedScalaUDF: false
spark.sql.legacy.bucketedTableScan.outputOrdering: false
spark.sql.legacy.castComplexTypesToString.enabled: false
spark.sql.legacy.charVarcharAsString: false
spark.sql.legacy.createEmptyCollectionUsingStringType: false
spark.sql.legacy.createHiveTableByDefault: true
spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue: false
spark.sql.legacy.doLooseUpcast: false
spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName: true
spark.sql.legacy.exponentLiteralAsDecimal.enabled: false
spark.sql.legacy.extraOptionsBehavior.enabled: false
spark.sql.legacy.followThreeValuedLogicInArrayExists: true
spark.sql.legacy.fromDayTimeString.enabled: false
spark.sql.legacy.integerGroupingId: false
spark.sql.legacy.json.allowEmptyString.enabled: false
spark.sql.legacy.keepCommandOutputSchema: false
spark.sql.legacy.literal.pickMinimumPrecision: true
spark.sql.legacy.notReserveProperties: false
spark.sql.legacy.parseNullPartitionSpecAsStringLiteral: false
spark.sql.legacy.parser.havingWithoutGroupByAsWhere: false
spark.sql.legacy.pathOptionBehavior.enabled: false
spark.sql.legacy.sessionInitWithConfigDefaults: false
spark.sql.legacy.setCommandRejectsSparkCoreConfs: true
spark.sql.legacy.setopsPrecedence.enabled: false
spark.sql.legacy.sizeOfNull: true
spark.sql.legacy.statisticalAggregate: false
spark.sql.legacy.storeAnalyzedPlanForView: false
spark.sql.legacy.typeCoercion.datetimeToString.enabled: false
spark.sql.legacy.useCurrentConfigsForView: false
spark.sql.limit.scaleUpFactor: 4
spark.sql.maxMetadataStringLength: 100
spark.sql.metadataCacheTTLSeconds: -1
spark.sql.objectHashAggregate.sortBased.fallbackThreshold: 128
spark.sql.optimizeNullAwareAntiJoin: true
spark.sql.optimizer.disableHints: false
spark.sql.optimizer.dynamicPartitionPruning.enabled: true
spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio: 0
spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly: true
spark.sql.optimizer.dynamicPartitionPruning.useStats: true
spark.sql.optimizer.enableJsonExpressionOptimization: true
spark.sql.optimizer.expression.nestedPruning.enabled: true
spark.sql.optimizer.inSetConversionThreshold: 10
spark.sql.optimizer.inSetSwitchThreshold: 400
spark.sql.optimizer.maxIterations: 100
spark.sql.optimizer.metadataOnly: false
spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources: parquet,orc
spark.sql.optimizer.nestedSchemaPruning.enabled: true
spark.sql.optimizer.replaceExceptWithFilter: true
spark.sql.optimizer.serializer.nestedSchemaPruning.enabled: true
spark.sql.orderByOrdinal: true
spark.sql.parquet.binaryAsString: false
spark.sql.parquet.columnarReaderBatchSize: 4096
spark.sql.parquet.compression.codec: snappy
spark.sql.parquet.enableVectorizedReader: true
spark.sql.parquet.filterPushdown: true
spark.sql.parquet.filterPushdown.date: true
spark.sql.parquet.filterPushdown.decimal: true
spark.sql.parquet.filterPushdown.string.startsWith: true
spark.sql.parquet.filterPushdown.timestamp: true
spark.sql.parquet.int96AsTimestamp: true
spark.sql.parquet.int96TimestampConversion: false
spark.sql.parquet.mergeSchema: false
spark.sql.parquet.output.committer.class: org.apache.parquet.hadoop.ParquetOutputCommitter
spark.sql.parquet.pushdown.inFilterThreshold: 10
spark.sql.parquet.recordLevelFilter.enabled: false
spark.sql.parquet.respectSummaryFiles: false
spark.sql.parquet.writeLegacyFormat: false
spark.sql.parser.escapedStringLiterals: false
spark.sql.parser.quotedRegexColumnNames: false
spark.sql.pivotMaxValues: 10000
spark.sql.planChangeLog.level: trace
spark.sql.pyspark.jvmStacktrace.enabled: false
spark.sql.repl.eagerEval.enabled: false
spark.sql.repl.eagerEval.maxNumRows: 20
spark.sql.repl.eagerEval.truncate: 20
spark.sql.retainGroupColumns: true
spark.sql.runSQLOnFiles: true
spark.sql.scriptTransformation.exitTimeoutInSeconds: 5s
spark.sql.selfJoinAutoResolveAmbiguity: true
spark.sql.shuffle.partitions: 200
spark.sql.sort.enableRadixSort: true
spark.sql.sources.binaryFile.maxLength: 2147483647
spark.sql.sources.bucketing.autoBucketedScan.enabled: true
spark.sql.sources.bucketing.enabled: true
spark.sql.sources.bucketing.maxBuckets: 100000
spark.sql.sources.commitProtocolClass: org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
spark.sql.sources.default: parquet
spark.sql.sources.fileCompressionFactor: 1
spark.sql.sources.ignoreDataLocality: false
spark.sql.sources.parallelPartitionDiscovery.parallelism: 10000
spark.sql.sources.parallelPartitionDiscovery.threshold: 32
spark.sql.sources.partitionColumnTypeInference.enabled: true
spark.sql.sources.validatePartitionColumns: true
spark.sql.statistics.fallBackToHdfs: false
spark.sql.statistics.histogram.enabled: false
spark.sql.statistics.histogram.numBins: 254
spark.sql.statistics.ndv.maxError: 0
spark.sql.statistics.parallelFileListingInStatsComputation.enabled: true
spark.sql.statistics.percentile.accuracy: 10000
spark.sql.statistics.size.autoUpdate.enabled: false
spark.sql.streaming.continuous.epochBacklogQueueSize: 10000
spark.sql.streaming.continuous.executorPollIntervalMs: 100
spark.sql.streaming.continuous.executorQueueSize: 1024
spark.sql.streaming.metricsEnabled: true
spark.sql.subexpressionElimination.cache.maxEntries: 100
spark.sql.subexpressionElimination.enabled: true
spark.sql.subquery.maxThreadThreshold: 16
spark.sql.thriftServer.incrementalCollect: false
spark.sql.thriftServer.queryTimeout: 20s
spark.sql.thriftserver.ui.retainedSessions: 200
spark.sql.thriftserver.ui.retainedStatements: 200
spark.sql.truncateTable.ignorePermissionAcl.enabled: false
spark.sql.ui.explainMode: formatted
spark.sql.ui.retainedExecutions: 500
spark.sql.variable.substitute: true
spark.sql.view.maxNestedViewDepth: 100
spark.sql.warehouse.dir: file:/opt/spark/work-dir/spark-warehouse
spark.sql.windowExec.buffer.in.memory.threshold: 4096
spark.stage.maxConsecutiveAttempts: 4
spark.storage.replication.proactive: true
spark.submit.deployMode: cluster
spark.submit.pyFiles:
spark.task.cpus: 1
spark.task.maxFailures: 4
spark.task.reaper.enabled: true
spark.task.reaper.killTimeout: -1
spark.task.reaper.pollingInterval: 20s
spark.task.reaper.threadDump: true

Any quick help will be greatly appreciated.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)