Hi!
Have you tried spark.sql.files.maxRecordsPerFile?
As a workaround you could estimate how many rows add up to roughly 128 MB and
then set that number in that property.
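A minimal sketch of that workaround, assuming spark is your SparkSession and
df is the DataFrame being written (500000 is purely a placeholder for
whatever your 128 MB estimate works out to):

// Cap how many records are written into any single output file;
// the default of 0 means no limit.
spark.conf.set("spark.sql.files.maxRecordsPerFile", "500000")
df.write.parquet("/tmp/output") // hypothetical output path

The same cap can also be applied to a single write via
df.write.option("maxRecordsPerFile", 500000).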
Best
On Thu, 1 Apr 2021 at 00:38, mhawes wrote:
> Okay, from looking closer at some of the code, I'm not sure that what I'm
> asking f[...]
>
> [...] small files problem on checkpoint, you
> wouldn't need to tune the value. I guess that's why the configuration is
> marked as "internal", meaning just some admins need to know about such
> configuration.
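As an aside, a hedged illustration of what tuning that internal value would
look like in practice (300 is an arbitrary number; the default is 100):

// Internal config: how many recent micro-batches Structured Streaming keeps
// recoverable in the checkpoint before cleaning them up.
spark.conf.set("spark.sql.streaming.minBatchesToRetain", "300")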
>
> On Wed, Mar 10, 2021 at 3:58 AM German Schiavon
> wrote:
>
> On Tue, Mar 9, 2021 at 3:27 PM German Schiavon
> wrote:
Hello all,
I wanted to ask if this property is still active? I can't find it in the
docs (https://spark.apache.org/docs/latest/configuration.html) or anywhere in
the code (only in tests).
If so, should we remove it?
val MIN_BATCHES_TO_RETAIN = buildConf("spark.sql.streaming.minBatchesToRetain")
  .internal()
  .doc("The minimum number of batches that must be retained and made recoverable.")
  .intConf
  .createWithDefault(100)
> This does seem to compile; are you sure? What error do you see? It may not
> be related to that at all.
>
>
> On Thu, Oct 22, 2020 at 5:40 AM German Schiavon
> wrote:
Hello!
I'd like to ask if there is any reason to return *this.type* when calling
*dataframe.unpersist*:
def unpersist(blocking: Boolean): this.type = {
  sparkSession.sharedState.cacheManager.uncacheQuery(
    sparkSession, logicalPlan, cascade = false, blocking)
  this
}
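For what it's worth, one plausible reason is fluent chaining: because the
method returns this.type rather than Unit, the call can sit in the middle of
a chain. A minimal sketch, assuming df is any previously cached Dataset:

import org.apache.spark.sql.functions.col

// unpersist returns the same Dataset, so further transformations can be
// chained directly instead of needing a separate statement.
val total = df.unpersist(blocking = false)
  .filter(col("value") > 0)
  .count()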
Just pointing it out because [...]
Hi!
I just ran into this same issue while testing k8s in local mode:
https://issues.apache.org/jira/browse/SPARK-31800
Note that the title shouldn't be *"Unable to disable Kerberos when
submitting jobs to Kubernetes"* (based on the comments) but something more
related to the spark.kubernetes.file.upload.path property.
Hi Jungtaek,
I have a question: aren't both approaches compatible?
The way I see it, it would be interesting to have a retention period to
delete old files and/or the possibility of indicating an offset
(timestamp). It would be very "similar" to how we do it with Kafka.
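For reference, this is the Kafka behaviour being compared against;
startingOffsetsByTimestamp is the existing Kafka-source option (broker, topic
and timestamp below are placeholders), and a file-source equivalent is what's
being proposed here:

// Kafka source: start each partition at the first offset whose record
// timestamp is >= the given epoch-millis value (requires the
// spark-sql-kafka-0-10 connector on the classpath).
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .option("startingOffsetsByTimestamp", """{"events":{"0":1596067200000}}""")
  .load()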
WDYT?
On Thu, 30 Jul