FYI, we are using Spark 2.2.0. Should the fix be present in this Spark
version? I wanted to check before opening a JIRA ticket.
Regards,
Dhrubajyoti Hati.
On Thu, Apr 23, 2020 at 10:12 AM Wenchen Fan wrote:
> This looks like a bug that path filter doesn't work for hive table …
Just wondering if any one could help me out on this.
Thank you!
Regards,
Dhrubajyoti Hati.
On Wed, Apr 22, 2020 at 7:15 PM Dhrubajyoti Hati
wrote:
> Hi,
>
> Is there any way to discard files starting with a dot (.) or ending with .tmp
> in the hive partition while reading from …
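Until that is addressed, one possible workaround is to list the partition directory yourself and hand Spark only the files you want. A minimal sketch, assuming the partition path is known and the data is Parquet (both assumptions), and using the internal-but-commonly-used py4j handles to reach the Hadoop FileSystem API; the table and path below are made up:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("skip-tmp-files").getOrCreate()

    # Hypothetical partition directory of the hive table.
    partition_dir = "hdfs:///warehouse/my_db.db/my_table/dt=2020-04-22"

    jvm = spark._jvm
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    fs = jvm.org.apache.hadoop.fs.FileSystem.get(hadoop_conf)

    statuses = fs.listStatus(jvm.org.apache.hadoop.fs.Path(partition_dir))
    good_paths = [
        s.getPath().toString()
        for s in statuses
        if not s.getPath().getName().startswith(".")
        and not s.getPath().getName().endswith(".tmp")
    ]

    # Note: reading the files directly bypasses the Hive metastore, so the
    # partition column (dt) is not added automatically.
    df = spark.read.parquet(*good_paths)
    df.show(5)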
I was wondering if anyone could help with this question.
On Fri, 20 Sep, 2019, 11:52 AM Dhrubajyoti Hati,
wrote:
> Hi,
>
> I have a question regarding passing a dictionary from the driver to executors
> in Spark on YARN. This dictionary is needed in a UDF. I am using pyspark.
>
> …
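A broadcast variable is the usual way to ship a read-only, driver-side dictionary to every executor for use in a UDF. A minimal sketch; the dictionary contents, column name, and app name are made up:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("broadcast-dict-udf").getOrCreate()

    # Hypothetical driver-side dictionary.
    country_names = {"IN": "India", "US": "United States"}

    # Broadcast it once; each executor caches a single read-only copy.
    bc_countries = spark.sparkContext.broadcast(country_names)

    # The UDF reads the broadcast value on the executors.
    lookup_country = udf(lambda code: bc_countries.value.get(code, "unknown"),
                         StringType())

    df = spark.createDataFrame([("IN",), ("US",), ("BR",)], ["code"])
    df.withColumn("country", lookup_country("code")).show()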
Also the performance remains identical when running the same script from
the jupyter terminal instead of a normal terminal. In the script the spark
context is created by a command like:

spark = SparkSession \
    .builder \
    ..
    ..
    .getOrCreate()
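For reference, a fully spelled-out builder of that shape might look like the following; the app name and master are placeholders, and the elided ".." lines in the original presumably hold similar .appName()/.config() calls:

    from pyspark.sql import SparkSession

    # Hypothetical complete version of the elided builder above.
    spark = SparkSession \
        .builder \
        .appName("my-python-job") \
        .master("yarn") \
        .getOrCreate()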
On Wed, Sep 11, 2019 at 10:28 PM Dhrubajyoti Hati
wrote:
>
… are you creating the Spark Session in jupyter?
>
>
> On Wed, Sep 11, 2019 at 7:33 PM Dhrubajyoti Hati
> wrote:
>
>> But would it be the case for multiple tasks running on the same worker,
>> and also both the tasks are running in client mode, so …
… eight
> minutes.
>
> On Wed, Sep 11, 2019 at 3:17 AM Dhrubajyoti Hati
> wrote:
>
>> Hi,
>>
>> I just ran the same script in a shell in jupyter notebook and found the
>> performance to be similar. So I can confirm this is because of the libraries
>> used in jupyter …
Regards,
Dhrubajyoti Hati.
Mob No: 9886428028/9652029028
On Wed, Sep 11, 2019 at 9:45 AM Dhrubajyoti Hati
wrote:
> Just checked from where the script is submitted, i.e. wrt the Driver: the
> python envs are different. The Jupyter one is running within a virtual
> environment which is Python …
… but in any case: are they
>> both running against the same spark cluster with the same configuration
>> parameters, especially executor memory and number of workers?
>>
>> On Tue, Sep 10, 2019 at 8:05 PM Dhrubajyoti Hati <
>> dhruba.w...@gmail.com> wrote:
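One way to answer both questions (same python environment? same executor settings?) is to print the driver's interpreter path and the executor-related conf values in each environment and diff the output. A small sketch, meant to be run from both the jupyter notebook and the plain script:

    import sys
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    # Which python interpreter the driver is actually using.
    print("driver python:", sys.executable)

    # Executor-related settings in effect for this application.
    conf = sc.getConf()
    for key in ("spark.executor.memory",
                "spark.executor.instances",
                "spark.executor.cores",
                "spark.master"):
        print(key, "=", conf.get(key, "<not set>"))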
>
> On Tue, Sep 10, 2019 at 8:05 PM Dhrubajyoti Hati <
> dhruba.w...@gmail.com> wrote:
>
>> No, I checked for that, hence I wrote "brand new" jupyter notebook. Also
>> the times taken by the two are 30 mins and ~3 hrs as I am reading a 500 gig
>> co…
… tasks for each.
>
> On Tue, Sep 10, 2019 at 2:33 PM Dhrubajyoti Hati
> wrote:
>
>> Hi,
>>
>> I am facing a weird behaviour while running a python script. Here is what
>> the code looks like mostly:
>>
>> def fn1(ip):
>>     some code ...
… usually
> also requires more memory for the executor, but fewer executors. Similarly
> the executor instances might be too many and they may not have enough heap.
> You can also increase the memory of the executor.
>
> On Mon, Jul 29, 2019 at 8:22 AM Dhrubajyoti Hati wrote:
>
> Hi,
>
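The sizing advice quoted above (fewer but larger executors versus many small ones) maps onto a handful of configuration properties. A hedged sketch with placeholder values, not recommendations:

    from pyspark.sql import SparkSession

    # Placeholder numbers: tune executor count, cores, and heap together so the
    # total fits the cluster; bigger heaps generally mean fewer executors.
    # (On Spark 2.2, the overhead key is spark.yarn.executor.memoryOverhead;
    # spark.executor.memoryOverhead exists from Spark 2.3 onward.)
    spark = SparkSession \
        .builder \
        .appName("executor-sizing-example") \
        .config("spark.executor.instances", "10") \
        .config("spark.executor.cores", "4") \
        .config("spark.executor.memory", "8g") \
        .config("spark.executor.memoryOverhead", "1g") \
        .getOrCreate()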
Hi,
We were running Logistic Regression in Spark 2.2.X and then tried to see
how it does in Spark 2.3.X. Now we are facing an issue while running a
Logistic Regression model in Spark 2.3.X on top of YARN (GCP Dataproc). In
the treeAggregate method it takes a huge amount of time due to very high GC
activity …
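Not a diagnosis of the 2.3.X slowdown itself, but for reference: the treeAggregate used by the ML LogisticRegression can be tuned through its aggregationDepth parameter, which shifts aggregation work off the driver, and executor heap can be raised as in the earlier sketch. A minimal example with placeholder column names and path:

    from pyspark.ml.classification import LogisticRegression
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("lr-aggregation-depth").getOrCreate()

    # Hypothetical training data with a "features" vector column and a "label".
    train_df = spark.read.parquet("hdfs:///path/to/training_data")  # placeholder

    # aggregationDepth controls the depth of the treeAggregate used when
    # computing gradients; a deeper tree adds stages but reduces the amount of
    # data merged in the final step.
    lr = LogisticRegression(featuresCol="features",
                            labelCol="label",
                            maxIter=100,
                            aggregationDepth=4)  # default is 2

    model = lr.fit(train_df)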