Re: Enabling fully disaggregated shuffle on Spark

2019-11-20 Thread Aniket Mokashi
…individual objects cheaply. Right now, that’s only possible at the stream level. (There are hacks around this, but this would enable more idiomatic use in efficient shuffle implementations.) Have serializers indicate whether they are deterministic. This provides much of …

Fwd: Check

2019-09-27 Thread Aniket Khandelwal
Hi all, I was stuck on a problem I faced recently. The problem statement is: an Event bean consists of eventId, eventTag, text, …. We need to run a Spark job that aggregates on the eventTag column and picks the top K1 of them. Additionally, for each eventTag we need the list of eventIds (first K2 …
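The aggregation described above can be sketched with plain Scala collections standing in for the RDD operations (the Event fields come from the mail; the function name and sample data are assumptions, and on a cluster the groupBy would be a reduceByKey/top over an RDD):

```scala
case class Event(eventId: String, eventTag: String, text: String)

// For each tag: (tag, event count, first k2 eventIds); keep the top k1 tags.
def topTags(events: Seq[Event], k1: Int, k2: Int): Seq[(String, Int, Seq[String])] =
  events
    .groupBy(_.eventTag)                          // aggregate by the eventTag column
    .toSeq
    .map { case (tag, es) => (tag, es.size, es.map(_.eventId).take(k2)) }
    .sortBy { case (_, count, _) => -count }      // rank tags by frequency
    .take(k1)                                     // keep only the top K1 tags

val events = Seq(
  Event("e1", "sports", "…"), Event("e2", "sports", "…"),
  Event("e3", "news", "…"),   Event("e4", "sports", "…"))
val result = topTags(events, k1 = 1, k2 = 2)
// result: Seq(("sports", 3, Seq("e1", "e2")))
```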

Re: OutOfMemoryError on parquet SnappyDecompressor

2016-11-21 Thread Aniket
Thanks Ryan. I am running into this rather rare issue. For now, I have moved away from parquet, but I will create a bug in JIRA if I am able to produce code that easily reproduces this. Thanks, Aniket On Mon, Nov 21, 2016, 3:24 PM Ryan Blue [via Apache Spark Developers List] …

Re: OutOfMemoryError on parquet SnappyDecompressor

2016-11-20 Thread Aniket
Was anyone able to find a solution or recommended conf for this? I am running into the same "java.lang.OutOfMemoryError: Direct buffer memory", but during snappy compression. Thanks, Aniket On Tue, Sep 23, 2014 at 7:04 PM Aaron Davidson [via Apache Spark Developers List] wrote: …
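For context, the direct-buffer error above is capped by the JVM's `-XX:MaxDirectMemorySize`, so one commonly suggested mitigation (a sketch, not a confirmed fix for this particular report; the 2g value is an assumption to tune) is to raise that ceiling in spark-defaults.conf:

```
# Raise the JVM direct-memory ceiling on executors and the driver.
# 2g is an assumed starting point, not a recommendation.
spark.executor.extraJavaOptions  -XX:MaxDirectMemorySize=2g
spark.driver.extraJavaOptions    -XX:MaxDirectMemorySize=2g
```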

Re: RDD API patterns

2015-09-16 Thread Aniket
…painful and I share the pain :) Thanks, Aniket On Tue, Sep 15, 2015, 5:06 AM sim [via Apache Spark Developers List] wrote: I'd like to get some feedback on an API design issue pertaining to RDDs. The design goal to avoid RDD nesting …

Re: Data source API | sizeInBytes should be to *Scan

2015-02-11 Thread Aniket Bhatnagar
Circling back on this. Did you get a chance to re-look at this? Thanks, Aniket On Sun, Feb 8, 2015, 2:53 AM Aniket Bhatnagar wrote: Thanks for looking into this. If this is true, isn't this an issue today? The default implementation of sizeInBytes is 1 + broadcast thresh…

Re: Data source API | sizeInBytes should be to *Scan

2015-02-08 Thread Aniket Bhatnagar
…be more accurate than Catalyst's prediction. Therefore, if it's not a fundamental change in Catalyst, I would think this makes sense. Thanks, Aniket On Sat, Feb 7, 2015, 4:50 AM Reynold Xin wrote: We thought about this today after seeing this email. I actually built a patch fo…

Data source API | sizeInBytes should be to *Scan

2015-02-06 Thread Aniket Bhatnagar
…large relation broadcastable. Thoughts? Aniket

Re: Data source API | Support for dynamic schema

2015-01-29 Thread Aniket Bhatnagar
Thanks Reynold and Cheng. It does seem quite a bit of heavy lifting to have a schema per row. For now I will settle for taking the union schema of all the schema versions and complaining about any incompatibilities :-) Looking forward to doing great things with the API! Thanks, Aniket On Thu Jan 29 2015 …
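The union-schema workaround mentioned here can be sketched by modelling each schema version as a field-name-to-type map (the function name, the string-typed model, and the failure mode are all assumptions for illustration):

```scala
// Merge several schema versions into one union schema, failing loudly when
// the same field appears with incompatible types across versions.
def unionSchema(versions: Seq[Map[String, String]]): Map[String, String] =
  versions.flatten.groupBy(_._1).map { case (field, entries) =>
    val types = entries.map(_._2).distinct
    require(types.size == 1, s"incompatible types for field '$field': $types")
    field -> types.head
  }
```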

Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Aniket
Hi Patrick, I am wondering if this version will address issues around certain artifacts not getting published in 1.2, which are preventing people from migrating to 1.2. One such issue is https://issues.apache.org/jira/browse/SPARK-5144 Thanks, Aniket On Wed Jan 28 2015 at 15:39:43 Patrick Wendell [via …

Data source API | Support for dynamic schema

2015-01-28 Thread Aniket Bhatnagar
…schema upfront? Thanks, Aniket

Re: SciSpark: NASA AIST14 proposal

2015-01-14 Thread Aniket
Hi Chris, this is super cool. I was wondering if this will be an open source project so that people can contribute or reuse it? Thanks, Aniket On Thu Jan 15 2015 at 07:39:29 Mattmann, Chris A (3980) [via Apache Spark Developers List] wrote: Hi Spark Devs, Just wanted to FYI t…

Re: YARN | SPARK-5164 | Submitting jobs from windows to linux YARN

2015-01-12 Thread Aniket Bhatnagar
Ohh right, it is. I will mark my defect as a duplicate and cross-check my notes against the fixes in the pull request. Thanks for pointing it out, Zsolt :) On Mon, Jan 12, 2015, 7:42 PM Zsolt Tóth wrote: Hi Aniket, I think this is a duplicate of SPARK-1825, isn't it? Zsolt

YARN | SPARK-5164 | Submitting jobs from windows to linux YARN

2015-01-12 Thread Aniket Bhatnagar
…would be a great help for Windows users (like me). Thanks, Aniket

Discussion | SparkContext 's setJobGroup and clearJobGroup should return a new instance of SparkContext

2015-01-12 Thread Aniket Bhatnagar
…safely. I am also happy with mutating the original SparkContext, just not with breaking backward compatibility, as long as the returned SparkContext is not affected by set/unset of job groups on the original SparkContext. Thoughts please? Thanks, Aniket

Re: Dependency hell in Spark applications

2014-09-22 Thread Aniket Bhatnagar
…upgrading httpclient? (or jets3t?) 2014-09-11 19:09 GMT+09:00 Aniket Bhatnagar: Thanks everyone for weighing in on this. I had backported the kinesis module from master to spark 1.0.2, so just to confirm I am not missing anything, I did a dependenc…

Re: spark 1.1.0 (w/ hadoop 2.4) vs aws java sdk 1.7.2

2014-09-19 Thread Aniket
Looks like the same issue as http://mail-archives.apache.org/mod_mbox/spark-dev/201409.mbox/%3ccajob8btdxks-7-spjj5jmnw0xsnrjwdpcqqtjht1hun6j4z...@mail.gmail.com%3E On Sep 20, 2014 11:09 AM, "tian zhang [via Apache Spark Developers List]" wrote: Hi, …

Re: Dependency hell in Spark applications

2014-09-11 Thread Aniket Bhatnagar
…d deal with some of these issues, but I don't think it works. On Sep 4, 2014 9:01 AM, "Felix Garcia Borrego" wrote: Hi, I run into the same issue and apart from the ideas Aniket said, I on…

Dependency hell in Spark applications

2014-09-04 Thread Aniket Bhatnagar
…end user. My personal preference is OSGi (or at least some support for OSGi), but I would love to hear what Spark devs are thinking in terms of resolving the problem. Thanks, Aniket

Kinesis streaming integration in upcoming 1.1

2014-08-21 Thread Aniket Bhatnagar
…instances, which makes sense. Maybe the API should provide the ability to specify parallelism and default to numShards? I can submit pull requests for some of the above items, provided the community agrees and nobody else is working on it. Thanks, Aniket

Re: RFC: Supporting the Scala drop Method for Spark RDDs

2014-07-21 Thread Aniket
I too would like this feature. Erik's post makes sense. However, shouldn't the RDD also repartition itself after drop to make effective use of cluster resources? On Jul 21, 2014 8:58 PM, "Andrew Ash [via Apache Spark Developers List]" wrote: Personally …
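The drop semantics under discussion can be sketched with nested collections standing in for partitions (a sketch of the proposal as I read it, not its actual implementation; all names are assumptions): drop needs a first pass to learn partition sizes, then trims each partition by its cumulative offset.

```scala
// Drop the first n elements across ordered "partitions" in one pass,
// after a counting pass that computes each partition's start offset.
def dropAcross[T](partitions: Seq[Seq[T]], n: Int): Seq[Seq[T]] = {
  val sizes = partitions.map(_.size)        // the counting pass
  val offsets = sizes.scanLeft(0)(_ + _)    // cumulative start index per partition
  partitions.zip(offsets).map { case (part, start) =>
    part.drop(math.max(0, n - start))       // drop only what falls in this partition
  }
}
```

Note how the leading partitions come back empty, which is exactly why a follow-up repartition may be needed to rebalance work across the cluster.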

Why does spark REPL not embed scala REPL?

2014-05-30 Thread Aniket
My apologies in advance if this is not a dev mailing list topic. I am working on a small project to provide a web interface to the Spark REPL. The interface will allow people to use the Spark REPL and perform exploratory analysis on their data. I already have a Play application running that provides the web interface …