Re: mllib + SQL

2018-09-01 Thread Hemant Bhanawat
its > distributed execution environment. SQL-only analysts would struggle to be > effective with SQL-only access to Spark. > > On Fri, Aug 31, 2018 at 5:05 AM Hemant Bhanawat > wrote: > >> We allow our users to interact with spark cluster using SQL queries only. >> Tha

Re: mllib + SQL

2018-08-31 Thread Hemant Bhanawat
BTW, I can contribute if there is already an effort going on somewhere. On Fri, Aug 31, 2018 at 3:35 PM Hemant Bhanawat wrote: > We allow our users to interact with spark cluster using SQL queries only. > That's easy for them. MLLib does not have SQL extensions and we cannot > ex

Re: mllib + SQL

2018-08-31 Thread Hemant Bhanawat
ng, this is certainly the best place to start. > > See here: https://spark.apache.org/docs/latest/ml-guide.html > > > best, > wb > > > > On Thu, Aug 30, 2018 at 1:45 AM Hemant Bhanawat > wrote: > >> Is there a plan to support SQL extensions for mllib?

mllib + SQL

2018-08-29 Thread Hemant Bhanawat
Is there a plan to support SQL extensions for mllib? Or is there an effort already underway? Any information is appreciated. Thanks in advance. Hemant

Re: Sorting on a streaming dataframe

2018-05-01 Thread Hemant Bhanawat
! > > On Fri, Apr 27, 2018 at 3:59 AM Hemant Bhanawat > wrote: > >> I see. >> >> monotonically_increasing_id on streaming dataFrames will be really >> helpful to me and I believe to many more users. Adding this functionality >> in Spark would b

Re: Sorting on a streaming dataframe

2018-04-27 Thread Hemant Bhanawat
spec. However, from my experience with Spark, there are many good reasons >> why this requirement is not supported ;) >> >> Best, >> >> Chayapan (A) >> >> >> On Apr 24, 2018, at 2:18 PM, Hemant Bhanawat >> wrote: >> >> Thanks Chris. The

Re: Sorting on a streaming dataframe

2018-04-24 Thread Hemant Bhanawat
> all. For example, if you are using Kafka, a proper partitioning scheme and > message offsets may be “good enough”. > ------ > *From:* Hemant Bhanawat > *Sent:* Thursday, April 12, 2018 11:42:59 PM > *To:* Reynold Xin > *Cc:* dev > *Subject:* Re: Sorti

Re: Sorting on a streaming dataframe

2018-04-12 Thread Hemant Bhanawat
the dataframe so that the records always get the same snapshot id. On Fri, Apr 13, 2018 at 11:43 AM, Reynold Xin wrote: > Can you describe your use case more? > > On Thu, Apr 12, 2018 at 11:12 PM Hemant Bhanawat > wrote: > >> Hi Guys, >> >> Why is sorting on s

Sorting on a streaming dataframe

2018-04-12 Thread Hemant Bhanawat
Hi Guys, Why is sorting on streaming dataframes not supported (unless it is in complete mode)? My downstream needs me to sort the streaming dataframe. Hemant

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2 read path

2017-09-10 Thread Hemant Bhanawat
BTW, aggregate push-down support is desirable and should be considered as an enhancement going forward. Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811> www.snappydata.io On Sun, Sep 10, 2017 at 8:45 PM, vaquar khan wrote: > +1 > > Regards, > Vaquar khan >

Re: CSV Reader with row numbers

2016-09-22 Thread Hemant Bhanawat
API documentation. https://spark.apache.org/docs/1.6.1/api/scala/index.html#org.apache.spark.rdd.RDD Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811> www.snappydata.io On Thu, Sep 15, 2016 at 4:28 AM, Akshay Sachdeva wrote: > Environment: > Apache Spark 1.6.2 &

Memory usage by Spark jobs

2016-09-21 Thread Hemant Bhanawat
processing a specific data size of let's say parquet data? Also, has someone investigated memory usage for the individual SQL operators like Filter, group by, order by, Exchange etc.? Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811> www.snappydata.io

Re: Executor shutdown hooks?

2016-04-06 Thread Hemant Bhanawat
exit thread will wait for a certain period of time before the executor jvm exits to allow proper cleanups of the tasks. Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811> www.snappydata.io On Thu, Apr 7, 2016 at 6:08 AM, Reynold Xin wrote: > > On Wed, Apr 6, 201

Re: how about a custom coalesce() policy?

2016-04-02 Thread Hemant Bhanawat
correcting email id for Nezih Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811> www.snappydata.io On Sun, Apr 3, 2016 at 11:09 AM, Hemant Bhanawat wrote: > Hi Nezih, > > Can you share JIRA and PR numbers? > > This partial de-coupling of data partitioni

Re: how about a custom coalesce() policy?

2016-04-02 Thread Hemant Bhanawat
Hi Nezih, Can you share JIRA and PR numbers? This partial de-coupling of data partitioning strategy and spark parallelism would be a useful feature for any data store. Hemant Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811> www.snappydata.io On Fri, Apr 1, 2016 at

Re: taking an n number of rows from an RDD starting from an index

2015-09-01 Thread Hemant Bhanawat
I think rdd.toLocalIterator is what you want. But it will keep one partition's data in memory. On Wed, Sep 2, 2015 at 10:05 AM, Niranda Perera wrote: > Hi all, > > I have a large set of data which would not fit into the memory. So, I want > to take n number of data from the RDD given a particular

Re: Spark Streaming - Design considerations/Knobs

2015-05-21 Thread Hemant Bhanawat
o remember about Spark Streaming. > > > On Wed, May 20, 2015 at 3:40 AM, Hemant Bhanawat > wrote: > >> Hi, >> >> I have compiled a list (from online sources) of knobs/design >> considerations that need to be taken care of by applications running on >> spark

Spark Streaming - Design considerations/Knobs

2015-05-20 Thread Hemant Bhanawat
Hi, I have compiled a list (from online sources) of knobs/design considerations that need to be taken care of by applications running on spark streaming. Is my understanding correct? Any other important design consideration that I should take care of? - A DStream is associated with a single