> …individual objects
> cheaply. Right now, that’s only possible at the stream level. (There are
> hacks around this, but this would enable more idiomatic use in efficient
> shuffle implementations.)
>
>
> Have serializers indicate whether they are deterministic. This provides
> much of …
Hi all,
I was stuck on a problem recently. The problem statement is as follows:
An Event bean consists of eventId, eventTag, text, …
We need to run a Spark job that aggregates on the eventTag column and picks
the top K1 tags.
Additionally, for each eventTag, we need the list of eventIds (first K2 …
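One way to approach this, assuming the goal is "top K1 tags by count, keeping
the first K2 eventIds seen per tag" (a sketch only; the Event case class and
names are illustrative, not from the original post):

import org.apache.spark.rdd.RDD

case class Event(eventId: String, eventTag: String, text: String)

// Top k1 tags by count, keeping at most the first k2 eventIds per tag.
def topTags(events: RDD[Event], k1: Int, k2: Int): Array[(String, (Long, List[String]))] =
  events
    .map(e => (e.eventTag, e.eventId))
    .aggregateByKey((0L, List.empty[String]))(
      // within a partition: count every event, keep the id only while under k2
      { case ((cnt, ids), id) => (cnt + 1, if (ids.size < k2) ids :+ id else ids) },
      // across partitions: sum counts and cap the merged id list at k2
      { case ((c1, ids1), (c2, ids2)) => (c1 + c2, (ids1 ++ ids2).take(k2)) })
    .sortBy(_._2._1, ascending = false)
    .take(k1)

Note that "first K2" here reflects partition order, not a global order; a
strict ordering would need a sort on something like a timestamp first.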
Thanks Ryan. I am running into this rather rare issue. For now, I have
moved away from Parquet, but I will create a bug in JIRA if I am able to
produce code that easily reproduces this.
Thanks,
Aniket
On Mon, Nov 21, 2016, 3:24 PM Ryan Blue [via Apache Spark Developers List] <
ml-node+s1001551n19
Was anyone able to find a solution or recommended conf for this? I am running
into the same "java.lang.OutOfMemoryError: Direct buffer memory" but during
snappy compression.
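One mitigation that is sometimes suggested for direct-buffer OOMs (an
assumption on my part, not a verified fix for this particular trace) is
raising the executors' direct memory cap, since both snappy and the network
layer allocate direct buffers:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // -XX:MaxDirectMemorySize caps off-heap direct buffers; the 1g value
  // is purely illustrative, size it to the workload
  .set("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=1g")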
Thanks,
Aniket
On Tue, Sep 23, 2014 at 7:04 PM Aaron Davidson [via Apache Spark Developers
List] wrote:
>
…painful and I share the pain :)
Thanks,
Aniket
On Tue, Sep 15, 2015, 5:06 AM sim [via Apache Spark Developers List] <
ml-node+s1001551n14116...@n3.nabble.com> wrote:
> I'd like to get some feedback on an API design issue pertaining to RDDs.
>
> The design goal is to avoid RDD nesting …
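For readers without the full thread: RDD operations can only be invoked on
the driver, so an RDD referenced inside another RDD's closure fails at
runtime. A minimal illustration (the names are mine, not from the thread):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("nesting-demo").setMaster("local[*]"))
val rdd1 = sc.parallelize(Seq(1, 2, 3))
val rdd2 = sc.parallelize(Seq(2, 3, 4))

// Fails at runtime: rdd2 cannot be used inside rdd1's closure
//   rdd1.map(x => rdd2.filter(_ == x).count())

// Typical restructuring: key both sides and join instead
val joined = rdd1.keyBy(identity).join(rdd2.keyBy(identity))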
Circling back on this. Did you get a chance to take another look at it?
Thanks,
Aniket
On Sun, Feb 8, 2015, 2:53 AM Aniket Bhatnagar
wrote:
> Thanks for looking into this. If this is true, isn't this an issue today?
> The default implementation of sizeInBytes is 1 + the broadcast threshold.
…be more accurate than Catalyst's prediction.
Therefore, if it's not a fundamental change in Catalyst, I would think this
makes sense.
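For context, the statistic under discussion is the sizeInBytes estimate a
data source relation reports to Catalyst, which drives the broadcast
decision. A sketch of a source supplying its own estimate (the relation
class and the number are illustrative):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.BaseRelation
import org.apache.spark.sql.types.StructType

class MyRelation(override val sqlContext: SQLContext) extends BaseRelation {
  override def schema: StructType = StructType(Nil) // a real source returns its schema

  // The default is autoBroadcastJoinThreshold + 1 (i.e. "never broadcast");
  // a source that knows its data size can report something more accurate.
  override def sizeInBytes: Long = 128L * 1024 * 1024
}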
Thanks,
Aniket
On Sat, Feb 7, 2015, 4:50 AM Reynold Xin wrote:
> We thought about this today after seeing this email. I actually built a
> patch fo…
…large relation
broadcastable. Thoughts?
Aniket
Thanks Reynold and Cheng. It does seem like quite a bit of heavy lifting to
have a schema per row. For now, I will settle for a union schema of all the
schema versions and complain about any incompatibilities :-)
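A rough sketch of that union-schema step (a hand-rolled merge; it assumes
same-named fields must have identical types and fails fast otherwise):

import org.apache.spark.sql.types.{StructField, StructType}
import scala.collection.mutable

def unionSchema(versions: Seq[StructType]): StructType = {
  val merged = mutable.LinkedHashMap.empty[String, StructField]
  for (schema <- versions; field <- schema.fields) {
    merged.get(field.name) match {
      case Some(existing) if existing.dataType != field.dataType =>
        // "complain about any incompatibilities"
        sys.error(s"Incompatible types for ${field.name}: " +
          s"${existing.dataType} vs ${field.dataType}")
      case Some(_) => // same name, same type: nothing to do
      case None =>
        // fields missing from older versions must be nullable
        merged(field.name) = field.copy(nullable = true)
    }
  }
  StructType(merged.values.toSeq)
}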
Looking forward to doing great things with the API!
Thanks,
Aniket
On Thu Jan 29 2015
Hi Patrick,
I am wondering if this version will address issues around certain artifacts
not getting published in 1.2, which are blocking people from migrating to
1.2. One such issue is https://issues.apache.org/jira/browse/SPARK-5144
Thanks,
Aniket
On Wed Jan 28 2015 at 15:39:43 Patrick Wendell [via
…schema upfront?
Thanks,
Aniket
Hi Chris
This is super cool. I was wondering whether this will be an open source
project so that people can contribute to it or reuse it?
Thanks,
Aniket
On Thu Jan 15 2015 at 07:39:29 Mattmann, Chris A (3980) [via Apache Spark
Developers List] wrote:
> Hi Spark Devs,
>
> Just wanted to FYI t…
Ohh right, it is. I will mark my defect as a duplicate and cross-check my
notes with the fixes in the pull request. Thanks for pointing it out, Zsolt :)
On Mon, Jan 12, 2015, 7:42 PM Zsolt Tóth wrote:
> Hi Aniket,
>
> I think this is a duplicate of SPARK-1825, isn't it?
>
> Zsolt
…would be a great help
for Windows users (like me).
Thanks,
Aniket
…safely. I am also happy mutating the original SparkContext just to avoid
breaking backward compatibility, as long as the returned SparkContext is
not affected by set/unset of job groups on the original SparkContext.
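For context, these are the existing SparkContext job-group calls in
question (the group id below is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("job-groups").setMaster("local[*]"))
sc.setJobGroup("nightly-etl", "nightly ETL jobs", interruptOnCancel = true)
sc.parallelize(1 to 100).count() // runs under the group above
sc.cancelJobGroup("nightly-etl") // cancels every job in the group
sc.clearJobGroup()               // subsequent jobs are ungrouped

The point is that set/unset calls like these on one context should not leak
into the other.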
Thoughts please?
Thanks,
Aniket
…upgrading httpclient? (or jets3t?)
>
> 2014-09-11 19:09 GMT+09:00 Aniket Bhatnagar :
>
>> Thanks everyone for weighing in on this.
>>
>> I had backported the kinesis module from master to Spark 1.0.2, so just to
>> confirm I am not missing anything, I did a dependency …
Looks like the same issue as
http://mail-archives.apache.org/mod_mbox/spark-dev/201409.mbox/%3ccajob8btdxks-7-spjj5jmnw0xsnrjwdpcqqtjht1hun6j4z...@mail.gmail.com%3E
On Sep 20, 2014 11:09 AM, "tian zhang [via Apache Spark Developers List]" <
ml-node+s1001551n8481...@n3.nabble.com> wrote:
>
>
> Hi,
>> > …would deal with
>> > some of these issues, but I don't think it works.
>> > On Sep 4, 2014 9:01 AM, "Felix Garcia Borrego"
>> wrote:
>> >
>> > > Hi,
>> > > I ran into the same issue and, apart from the ideas Aniket said, I on…
…end user.
My personal preference is OSGi (or at least some support for OSGi), but I
would love to hear what Spark devs are thinking in terms of resolving the
problem.
Thanks,
Aniket
…instances,
which makes sense. Maybe the API should provide the ability to specify
parallelism and default it to numShards?
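The pattern I have in mind is roughly one receiver per unit of parallelism,
unioned into a single stream (a sketch only; the exact
KinesisUtils.createStream signature varies across Spark versions, and names
like numShards and checkpointInterval below are placeholders):

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kinesis.KinesisUtils

// numShards receivers by default, overridable by the caller
val parallelism = numShards
val streams = (0 until parallelism).map { _ =>
  KinesisUtils.createStream(ssc, appName, streamName, endpointUrl, regionName,
    InitialPositionInStream.LATEST, checkpointInterval, StorageLevel.MEMORY_AND_DISK_2)
}
val events = ssc.union(streams)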
I can submit pull requests for some of the above items, provided the
community agrees and nobody else is working on it.
Thanks,
Aniket
I too would like this feature. Erik's post makes sense. However, shouldn't
the RDD also repartition itself after drop to make effective use of
cluster resources?
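Until such a drop exists, a workaround sketch (zipWithIndex launches an
extra job to compute per-partition offsets, so this is not free):

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Drop the first n elements in the RDD's current order, then rebalance
// across the original number of partitions.
def drop[T: ClassTag](rdd: RDD[T], n: Long): RDD[T] =
  rdd.zipWithIndex()
     .filter { case (_, idx) => idx >= n }
     .map(_._1)
     .repartition(rdd.partitions.length)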
On Jul 21, 2014 8:58 PM, "Andrew Ash [via Apache Spark Developers List]" <
ml-node+s1001551n7434...@n3.nabble.com> wrote:
> Personally
My apologies in advance if this is not a dev mailing list topic. I am working
on a small project to provide a web interface to the Spark REPL. The
interface will allow people to use the Spark REPL and perform exploratory
analysis on the data.
I already have a Play application running that provides a web interface …