Can you reduce maxFilesPerTrigger further and see if the OOM still persists? If
it does, then the problem may be somewhere else.
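
For reference, a minimal sketch of stepping that option down on a file-source
stream (the schema, paths, and app name below are hypothetical, not from the
original job):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("oom-bisect").getOrCreate()

    // File streaming sources require an explicit schema.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("name", StringType)))

    val query = spark.readStream
      .schema(schema)
      .option("maxFilesPerTrigger", "1") // step down 10 -> 5 -> 1 while watching heap
      .json("/data/extracted")           // hypothetical input directory
      .writeStream
      .format("console")
      .start()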
> On Jul 19, 2020, at 5:37 AM, Jungtaek Lim wrote:
>
> Please provide logs and a heap dump file for the OOM case - otherwise no one
> can say what the cause is.
>
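
(If it helps: a common way to capture such a dump automatically is the pair of
standard JVM flags below, set in spark-defaults.conf or via --conf; the dump
path is hypothetical.)

    spark.driver.extraJavaOptions    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
    spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp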
> Add [...]
[...] data can be loaded?
> It should be simple: just open the notebook and see why the exact code you
> have given does not work and shows only 11 records.
>
>
> Regards,
> Gourav Sengupta
>
> On Tue, Jun 30, 2020 at 4:15 PM Sanjeev Mishra wrote:
>
>> Hi Gourav Sengupta [...]
>
> On Tue, Jun 30, 2020 at 1:42 PM Sanjeev Mishra wrote:
> There are a total of 11 files in the tar. You will have to untar it to get to
> the actual files (.json.gz).
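
(For context: Spark's JSON source decompresses .gz files transparently but
cannot look inside a .tar archive, so the reader has to point at the extracted
files. A minimal sketch, assuming a spark-shell session and a hypothetical
extraction directory:)

    val df = spark.read.json("/data/extracted/*.json.gz")
    println(s"Count: ${df.count()}")  // the full extracted sample should give 33447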
>
> No, I am getting
>
> Count: 33447
>
> Hi Sanjeev,
> that just gives 11 records from the sample that you have loaded to the JIRA
> ticket, is that correct?
>
>
> Regards,
> Gourav Sengupta
>
> On Tue, Jun 30, 2020 at 1:25 PM Sanjeev Mishra wrote:
> There is no [...]
>
> [...] the Databricks engineers will find an answer or bug fix soon.
>
> -- ND
>
> On 6/29/20 12:27 PM, Sanjeev Mishra wrote:
>> The tar file that I have attached has a bunch of .json.gz files, and these
>> are the files being processed. Each line is a self-contained JSON, as shown [...]
>
> [...] the JSON files there (or samples or code which generates JSON
> files)?
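
A hedged sketch of the kind of generator being asked for here: it writes many
gzipped files with one self-contained JSON object per line. The field names,
row count, and output path are made up to mirror the attached sample:

    // Synthesize 33447 rows and write them as 11 gzipped JSON files.
    spark.range(0L, 33447L)
      .selectExpr("id", "concat('name_', id) AS name")
      .repartition(11)                  // 11 output files, like the tar
      .write
      .option("compression", "gzip")
      .mode("overwrite")
      .json("/tmp/sample-json")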
>
> Maxim Gekk
> Software Engineer
> Databricks, Inc.
>
>
> On Mon, Jun 29, 2020 at 6:12 PM Sanjeev Mishra wrote:
>
>> It has read everything. As you can notice from the timing [...]
>
> Are you sure your Spark 2.4 cluster had indeed read anything? Looks like the
> Input size field is empty under 2.4.
>
> -- ND
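
A cheap way to check what a reader actually resolved: inputFiles is
metadata-only, while count() forces the real scan that should then show up as
Input size in the UI. The path below is hypothetical:

    val df = spark.read.json("/data/extracted")
    println(s"files resolved: ${df.inputFiles.length}")
    println(s"rows read: ${df.count()}")  // full scan; Input size should be non-zero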
> On 6/27/20 7:58 PM, Sanjeev Mishra wrote:
>
>
> I have a large amount of JSON files that Spark 2.4 can read in 36 seconds, but
> Spark 3.0 takes almost 33 minutes [...]
>
> Regards,
> Gourav
>
> On Sun, Jun 28, 2020 at 12:58 AM Sanjeev Mishra wrote:
>
>>
>> I have a large amount of JSON files that Spark 2.4 can read in 36 seconds,
>> but Spark 3.0 takes almost 33 minutes to read the same. On closer analysis,
>> it looks like Spark 3.0 is choosing a different DAG than Spark 2.4. Does
>> anyone have any idea what is going on? Is there any configuration problem
>> with Spark 3.0?
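
Two things that may help narrow this down, sketched with hypothetical paths and
fields: comparing the physical plans the two versions choose, and supplying an
explicit schema so the read skips schema inference (a common cause of large
JSON read-time gaps, though not confirmed to be the cause here):

    import org.apache.spark.sql.types._

    // Compare the plans Spark 2.4 and 3.0 each choose for the same read.
    spark.read.json("/data/extracted").explain(true)

    // Provide the schema up front so the read skips the inference pass.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("name", StringType)))
    val df = spark.read.schema(schema).json("/data/extracted")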
Hi all,
I have a huge number of JSON files that Spark 2.4 can easily finish reading,
but Spark 3.0.0 never completes. I am running both Spark 2 and Spark 3 on Mac [...]
You can use the catalog APIs; see the following:
https://stackoverflow.com/questions/54268845/how-to-check-the-number-of-partitions-of-a-spark-dataframe-without-incurring-the/54270537
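
A small sketch of that route (the table and DataFrame names are hypothetical):
for a Hive-partitioned table the partition values come straight from the
metastore, and a DataFrame's partition count is metadata-only as well.

    // Partition values served from the metastore, not a data scan.
    spark.sql("SHOW PARTITIONS my_db.my_table").show(false)

    // Partition count of an existing DataFrame df, without triggering a job.
    val numParts = df.rdd.getNumPartitions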
On Thu, Jun 25, 2020 at 6:19 AM Tzahi File wrote:
> I don't want to query with a distinct on the partitioned columns, the [...]