Re: [Spark Core]: Adding support for size based partition coalescing

2021-04-01 Thread German Schiavon
Hi! Have you tried spark.sql.files.maxRecordsPerFile? As a workaround you could estimate how many rows add up to roughly 128 MB and then set that number in that property. Best On Thu, 1 Apr 2021 at 00:38, mhawes wrote: > Okay from looking closer at some of the code, I'm not sure that what I'm > asking f
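
A minimal sketch (not part of the original thread) of the suggested workaround: cap the number of records written per file so each output file lands near a target size. The 1,000,000-row figure and the output path are illustrative assumptions, not values from the thread.

    // Hypothetical example: suppose ~1,000,000 rows of this dataset come to ~128 MB.
    spark.conf.set("spark.sql.files.maxRecordsPerFile", 1000000L)
    // Each file written below is then capped at that many records.
    df.write.mode("overwrite").parquet("/tmp/output")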

Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-10 Thread German Schiavon
all files problem on checkpoint, you > wouldn't need to tune the value. I guess that's why the configuration is > marked as "internal" meaning just some admins need to know about such > configuration. > > On Wed, Mar 10, 2021 at 3:58 AM German Schiavon > wrote: > >

Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-09 Thread German Schiavon
Software Engineer > > Databricks, Inc. > > > On Tue, Mar 9, 2021 at 3:27 PM German Schiavon > wrote: > >> Hello all, >> >> I wanted to ask if this property is still active? I can't find it in the >> doc https://spark.apache.org/docs/latest/config

Property spark.sql.streaming.minBatchesToRetain

2021-03-09 Thread German Schiavon
Hello all, I wanted to ask if this property is still active? I can't find it in the doc https://spark.apache.org/docs/latest/configuration.html or anywhere in the code (only in tests). If so, should we remove it? val MIN_BATCHES_TO_RETAIN = buildConf("spark.sql.streaming.minBatchesToRetain") .i
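
A sketch (not from the original message) of how an internal streaming config of this kind is typically declared in SQLConf; the doc string and default value below are illustrative assumptions, not the exact definition from the Spark source.

    // buildConf is SQLConf's internal config builder; values here are assumptions.
    val MIN_BATCHES_TO_RETAIN = buildConf("spark.sql.streaming.minBatchesToRetain")
      .internal()
      .doc("Minimum number of batches that must be retained and made recoverable.")
      .intConf
      .createWithDefault(100)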

Re: Unpersist return type

2020-10-22 Thread German Schiavon
s ignored. > This does seem to compile; are you sure? what error? may not be related to > that, quite. > > > On Thu, Oct 22, 2020 at 5:40 AM German Schiavon > wrote: > >> Hello! >> >> I'd like to ask if there is any reason to return *type *when calling

Unpersist return type

2020-10-22 Thread German Schiavon
Hello! I'd like to ask if there is any reason to return *this.type* when calling *dataframe.unpersist*:

    def unpersist(blocking: Boolean): this.type = {
      sparkSession.sharedState.cacheManager.uncacheQuery(
        sparkSession, logicalPlan, cascade = false, blocking)
      this
    }

Just pointing it out because
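
A small usage sketch (not from the thread) of what returning this.type enables: the call can be chained on the same Dataset. The input path is a made-up placeholder.

    // Because unpersist returns this.type, calls can be chained fluently.
    val df = spark.read.parquet("/tmp/some_table").cache()   // placeholder path
    df.count()                                                // materializes the cache
    val n = df.unpersist(blocking = false).count()            // chained on the returned Dataset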

Re: [FYI] Kubernetes GA Preparation (SPARK-33005)

2020-10-03 Thread German Schiavon
Hi! I just ran into this same issue while testing k8s in local mode https://issues.apache.org/jira/browse/SPARK-31800 Note that the title shouldn't be "*Unable to disable Kerberos when submitting jobs to Kubernetes" *(based on the comments) but rather something more related to the spark.kubernetes.fil

Re: [DISCUSS] "latestFirst" option and metadata growing issue in File stream source

2020-07-30 Thread German Schiavon
Hi Jungtaek, I have a question: aren't both approaches compatible? The way I see it, it would be interesting to have a retention period to delete old files and/or the possibility of indicating an offset (timestamp). It would be very "similar" to how we do it with Kafka. WDYT? On Thu, 30 Jul
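
A minimal sketch (not part of the original message) of today's file stream source options, for context. The format, schema, path, and per-trigger cap are illustrative assumptions; the Kafka line shows the existing timestamp-based starting offset used as the analogy, while a comparable option for the file source is only the idea under discussion here.

    import org.apache.spark.sql.types._

    val inputSchema = new StructType().add("id", LongType).add("payload", StringType)

    val fileStream = spark.readStream
      .format("json")
      .schema(inputSchema)                 // file sources need an explicit schema
      .option("latestFirst", "true")       // process newest files first
      .option("maxFilesPerTrigger", 100)   // cap files per micro-batch
      .load("/data/incoming")              // placeholder path

    // The Kafka source already supports starting from a timestamp, e.g.:
    //   .option("startingOffsetsByTimestamp", """{"events": {"0": 1596067200000}}""")
    // The suggestion above is something comparable (plus retention) for the file source.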