Re: Dynamic Resource allocation in Spark Streaming

2017-12-03 Thread Sourav Mazumder
can confirm > spark.dynamicAllocation.enabled is enough. > > Best Regards > Richard > > From: Sourav Mazumder > Date: Sunday, December 3, 2017 at 12:31 PM > To: user > Subject: Dynamic Resource allocation in Spark Streaming > > Hi, > > I see the following jira is reso

Dynamic Resource allocation in Spark Streaming

2017-12-03 Thread Sourav Mazumder
Hi, I see the following jira is resolved in Spark 2.0 https://issues.apache.org/jira/browse/SPARK-12133 which is supposed to support Dynamic Resource Allocation in Spark Streaming. I also see the JIRA https://issues.apache.org/jira/browse/SPARK-22008 which is about fixing number of executor relate
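
A minimal sketch of enabling the streaming-specific dynamic allocation that SPARK-12133 added; the spark.streaming.dynamicAllocation.* keys come from that change, while the values and application jar below are illustrative assumptions (per SPARK-12133 the streaming variant expects the core spark.dynamicAllocation.enabled to be off):

  spark-submit \
    --conf spark.dynamicAllocation.enabled=false \
    --conf spark.streaming.dynamicAllocation.enabled=true \
    --conf spark.streaming.dynamicAllocation.minExecutors=2 \
    --conf spark.streaming.dynamicAllocation.maxExecutors=10 \
    --class com.example.StreamingApp streaming-app.jar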

Re: Custom Data Source for getting data from Rest based services

2017-11-27 Thread Sourav Mazumder
It would be great if you could elaborate on the bulk provisioning use case. Regards, Sourav On Sun, Nov 26, 2017 at 11:53 PM, shankar.roy wrote: > This would be a useful feature. > We can leverage it while doing bulk provisioning. > > > > > -- > Sent from: http://apache-spark-user-list.1001560.n3

Custom Data Source for getting data from Rest based services

2017-11-21 Thread Sourav Mazumder
same. Here is the link to the repo - https://github.com/sourav-mazumder/Data-Science-Extensions/tree/master/spark-datasource-rest The interface goes like this: Inputs - REST API endpoint URL, input data in a temporary Spark table (the name of the table has to be passed), type of method (Get, Post
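
A hedged usage sketch pieced together from the interface description above; the format name and the option keys ("url", "input", "method") are assumptions based on this thread, not verified against the repo:

  // the input rows (e.g. query parameters) go into a temp table first
  inputDF.registerTempTable("rest_inputs")

  val results = sqlContext.read
    .format("rest")                                 // assumed short format name
    .option("url", "https://api.example.com/v1/lookup")
    .option("input", "rest_inputs")                 // name of the temp table above
    .option("method", "GET")
    .load()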

Monitoring ongoing Spark Job when run in Yarn Cluster mode

2017-03-13 Thread Sourav Mazumder
Hi, Is there a way to monitor an ongoing Spark job when running in YARN cluster mode? My understanding is that in YARN cluster mode the Spark monitoring UI for the ongoing job is not available on port 4040. So is there an alternative? Regards, Sourav
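
For reference, the usual route: in yarn-cluster mode the live UI runs alongside the application master and is reached through the ResourceManager's web proxy rather than port 4040 on the submitting machine. A sketch with a placeholder application id:

  yarn application -list                           # find the running application id
  yarn application -status application_1489000000000_0001
  # the "Tracking-URL" field in the output is the proxied link to the live Spark UI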

Problem in accessing swebhdfs

2016-09-04 Thread Sourav Mazumder
Hi, When I try to access a swebhdfs URI I get the following error. In my Hadoop cluster webhdfs is enabled. Also, I can access the same resource using the webhdfs API from an HTTP client with SSL. Any idea what is going wrong? Regards, Sourav java.io.IOException: Unexpected HTTP response: code=404 !=

Creating RDD using swebhdfs with truststore

2016-09-03 Thread Sourav Mazumder
Hi, I am trying to create an RDD by using swebhdfs to a remote Hadoop cluster which is protected by Knox and uses SSL. The code looks like this - sc.textFile("swebhdfs:/host:port/gateway/default/webhdfs/v1/").count. I'm passing the truststore and truststorepassword through extra Java options while
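
A sketch of the pattern, assuming the standard JSSE system properties; the paths and password are placeholders, and the truststore file must also be available on the executor hosts (e.g. shipped with --files):

  spark-shell \
    --files /local/path/truststore.jks \
    --conf "spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=/local/path/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit" \
    --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=truststore.jks -Djavax.net.ssl.trustStorePassword=changeit"

  // then, inside the shell:
  sc.textFile("swebhdfs://host:port/gateway/default/webhdfs/v1/some/dir").count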

Re: Reliability of JMS Custom Receiver in Spark Streaming JMS

2016-05-12 Thread Sourav Mazumder
Any inputs on this issue? Regards, Sourav On Tue, May 10, 2016 at 6:17 PM, Sourav Mazumder < sourav.mazumde...@gmail.com> wrote: > Hi, > > Need to get a bit more understanding of the reliability aspects of the Custom > Receivers in the context of the code in spark-streaming-jms &g

Reliability of JMS Custom Receiver in Spark Streaming JMS

2016-05-10 Thread Sourav Mazumder
Hi, Need to get a bit more understanding of the reliability aspects of Custom Receivers in the context of the code in spark-streaming-jms https://github.com/mattf/spark-streaming-jms. Based on the documentation in http://spark.apache.org/docs/latest/streaming-custom-receivers.html#receiver-reliabil

Error in spark-xml

2016-04-30 Thread Sourav Mazumder
Hi, Looks like there is a problem in spark-xml if the XML has multiple attributes with no child element. For example, say the XML has a nested object as below bk_113 bk_114 Now if I create a DataFrame starting with rowtag bkval and then do a select on that DataFrame it gives
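
For context, a minimal spark-xml read with an explicit row tag, using the documented com.databricks.spark.xml format; the file name is a placeholder:

  val df = sqlContext.read
    .format("com.databricks.spark.xml")
    .option("rowTag", "bkval")        // the row tag mentioned above
    .load("books.xml")
  df.printSchema()                    // check how attribute-only elements were inferred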

Re: Spark 2.0 forthcoming features

2016-04-25 Thread Sourav Mazumder
iPhone > Pardon the dumb thumb typos :) > > On Apr 20, 2016, at 10:15 AM, Michael Malak < > michaelma...@yahoo.com.INVALID > wrote: > > > http://go.databricks.com/apache-spark-2.0-presented-by-databricks-co-founder-reynold-xin > > > > > --

Spark 2.0 forthcoming features

2016-04-20 Thread Sourav Mazumder
Hi All, Is there somewhere we can get an idea of the upcoming features in Spark 2.0? I got a list for Spark ML from here https://issues.apache.org/jira/browse/SPARK-12626. Are there other links where I can find similar enhancements planned for Spark SQL, Spark Core, Spark Streaming, GraphX, etc.? Thanks

SSL support for Spark Thrift Server

2016-03-04 Thread Sourav Mazumder
Hi All, While starting the Spark Thrift Server I don't see any option to start it with SSL support. Is that support currently there? Regards, Sourav
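
If the Thrift Server honors the standard HiveServer2 settings from hive-site.xml (an assumption worth verifying for the Spark version in use), SSL would be configured roughly like this, with placeholder paths:

  <property><name>hive.server2.use.SSL</name><value>true</value></property>
  <property><name>hive.server2.keystore.path</name><value>/path/to/keystore.jks</value></property>
  <property><name>hive.server2.keystore.password</name><value>changeit</value></property>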

Spark with SAS

2016-02-03 Thread Sourav Mazumder
Hi, Is anyone aware of any work going on for integrating Spark with SAS for executing queries in Spark? For example, calling Spark jobs from SAS using Spark SQL through Spark SQL's JDBC/ODBC library. Regards, Sourav

Re: Databricks Cloud vs AWS EMR

2016-01-28 Thread Sourav Mazumder
You can also try out IBM's Spark as a service on IBM Bluemix. There you'll get all the required features for security, multitenancy, notebooks, and integration with other big data services. You can try it out for free too. Regards, Sourav On Thu, Jan 28, 2016 at 2:10 PM, Rakesh Soni wrote: > At its co

Re: Reply: how to use sparkR or spark MLlib to load csv file on hdfs then calculate covariance

2015-12-29 Thread Sourav Mazumder
Alternatively you can also try the ML library from System ML ( http://systemml.apache.org/) for covariance computation on Spark. Regards, Sourav On Mon, Dec 28, 2015 at 11:29 PM, Sun, Rui wrote: > Spark does not support computing cov matrix now. But there is a PR for > it. Maybe you can try it

Use of rdd.zipWithUniqueId() in DStream

2015-12-13 Thread Sourav Mazumder
Hi All, I'm trying to use the zipWithUniqueId() function of RDD via the transform function of a DStream. It does generate a unique id, always starting from 0 and in sequence. However, I'm not sure whether this is reliable behavior that is always guaranteed to generate sequence numbers starting from 0. Can an
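
For what it's worth: zipWithUniqueId() assigns ids of the form k, k+n, k+2n, ... within the kth of n partitions, so the ids are unique but not guaranteed to be a consecutive 0-based sequence; zipWithIndex() is the one that guarantees consecutive indices from 0, at the cost of an extra job. A minimal sketch of the transform pattern, with dstream as a placeholder:

  val indexed = dstream.transform { rdd =>
    rdd.zipWithIndex()   // consecutive 0-based ids, recomputed for every batch RDD
  }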

Fwd: Window function in Spark SQL

2015-12-11 Thread Sourav Mazumder
duce in my environment - might want to copy that to the Spark user list. Sorry! On Dec 11, 2015, at 1:37 PM, Sourav Mazumder wrote: Hi Ross, Thanks for your answer. In 1.5.x, whenever I try to create a HiveContext from SparkContext I get the following error. Please note that I'm not running

Window function in Spark SQL

2015-12-11 Thread Sourav Mazumder
Hi, The Spark SQL documentation says that it complies with Hive 1.2.1 APIs and supports window functions. I'm using Spark 1.5.0. However, when I try to execute something like the query below, I get an error: val lol5 = sqlContext.sql("select ky, lead(ky, 5, 0) over (order by ky rows 5 following) from lolt") ja
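
In 1.5.x the window functions come from the Hive integration, so the query needs a HiveContext rather than the plain SQLContext; also note that lead/lag take an offset argument and do not accept a rows frame, which may itself be part of the error. A sketch under those assumptions, keeping the table name from the thread:

  val hc = new org.apache.spark.sql.hive.HiveContext(sc)
  val lol5 = hc.sql("select ky, lead(ky, 5, 0) over (order by ky) from lolt")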

HTTP Source for Spark Streaming

2015-12-09 Thread Sourav Mazumder
Hi All, Currently, is there a way to connect to an HTTP server and get data as a DStream at a given frequency? Or does one have to write one's own utility for that? Regards, Sourav
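
As far as the built-in sources go there is no HTTP one, so the usual approach is a small custom receiver. A hedged sketch of a polling receiver, with the URL and interval as placeholders:

  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.streaming.receiver.Receiver
  import scala.io.Source

  // Polls the given URL on a fixed interval and stores each response body as one record.
  class HttpPollingReceiver(url: String, intervalMs: Long)
      extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

    def onStart(): Unit = {
      new Thread("http-polling-receiver") {
        override def run(): Unit = {
          while (!isStopped()) {
            try store(Source.fromURL(url).mkString)
            catch { case e: Exception => reportError("poll failed", e) }
            Thread.sleep(intervalMs)
          }
        }
      }.start()
    }

    def onStop(): Unit = ()  // the polling loop exits once isStopped() returns true
  }

  // usage: val lines = ssc.receiverStream(new HttpPollingReceiver("http://host/feed", 10000))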

java.util.NoSuchElementException: key not found error

2015-10-21 Thread Sourav Mazumder
In 1.5.0 if I use randomSplit on a data frame I get this error. Here is the code snippet - val splitData = merged.randomSplit(Array(70,30)) val trainData = splitData(0).persist() val testData = splitData(1) trainData.registerTempTable("trn") %sql select * from trn The exception goes like this
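
A sketch of a common workaround, assuming the error comes from the two splits being recomputed from unmaterialized data: persist the parent before splitting (randomSplit normalizes the weights, so Array(70, 30) and Array(0.7, 0.3) are equivalent):

  merged.persist()   // pin the parent so both splits are computed from the same data
  val Array(trainData, testData) = merged.randomSplit(Array(0.7, 0.3), seed = 42)
  trainData.registerTempTable("trn")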

SQL Context error in 1.5.1 - any work around ?

2015-10-15 Thread Sourav Mazumder
I keep on getting this error whenever I'm starting spark-shell: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx--. I cannot work with this if I need to do anything with sqlContext, as that does not get created. I could see that a bug is raised for this
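
The usual fix is widening the permissions on the scratch dir itself; a sketch, assuming a Linux setup (on Windows the equivalent is winutils.exe chmod 777 \tmp\hive):

  hdfs dfs -chmod -R 777 /tmp/hive    # if the scratch dir is on HDFS
  chmod -R 777 /tmp/hive              # if it is on the local filesystem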

Sensitivity analysis using Spark MlLib

2015-10-14 Thread Sourav Mazumder
Is there any algorithm implemented in Spark MLlib which supports parameter sensitivity analysis? After the model is created using a training data set, the model should be able to tell, among the various features used, which are the ones most important (from the perspective of their contribution t

Support for Hive Storage Handler in Spark SQL and core Spark

2015-08-28 Thread Sourav Mazumder
Hi, I have data written in HDFS using a custom storage handler of Hive. Can I access that data in Spark using Spark SQL? For example, can I write Spark SQL to access the data from a Hive table in HDFS which was created as - CREATE TABLE custom_table_1(key int, value string) STORED BY 'org.apac
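
For what it's worth, a hedged sketch: Spark reads Hive tables through HiveContext, but tables declared with STORED BY depend on the handler's jars being on Spark's classpath, and not every handler works outside Hive's own execution engine, so this needs testing per handler:

  // start with: spark-shell --jars /path/to/custom-storage-handler.jar   (placeholder jar)
  val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
  hiveContext.sql("SELECT key, value FROM custom_table_1").show()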

Random Forest in MLLib

2015-07-06 Thread Sourav Mazumder
Hi, Is there a way to get variable importance for a RandomForest model created using MLlib? That way one can know, among multiple features, which are the ones contributing the most to the dependent variable. Regards, Sourav
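
The RDD-based mllib.tree API does not expose importances, but the DataFrame-based spark.ml random forest gained a featureImportances vector around Spark 1.5; a hedged sketch assuming a DataFrame named training with the default label/features columns:

  import org.apache.spark.ml.classification.RandomForestClassifier

  val rf = new RandomForestClassifier().setNumTrees(50)
  val model = rf.fit(training)              // training: DataFrame with "label", "features"
  println(model.featureImportances)         // one weight per feature, summing to 1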

How to create a LabeledPoint RDD from a Data Frame

2015-07-06 Thread Sourav Mazumder
Hi, I have a DataFrame which I want to use for creating a RandomForest model using MLlib. The RandomForest model needs an RDD of LabeledPoints. Wondering how I can convert the DataFrame to an RDD of LabeledPoint. Regards, Sourav
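
A minimal sketch of the usual conversion, assuming the first column is the (numeric) label and the remaining columns are numeric features:

  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.regression.LabeledPoint

  val labeledPoints = df.map { row =>
    val features = Vectors.dense((1 until row.length).map(row.getDouble).toArray)
    LabeledPoint(row.getDouble(0), features)
  }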

Re: Calling MLLib from SparkR

2015-07-01 Thread Sourav Mazumder
/jira/browse/SPARK-6805 > > On Wed, Jul 1, 2015 at 4:23 PM, Sourav Mazumder < > sourav.mazumde...@gmail.com> wrote: > >> Hi, >> >> Does Spark 1.4 support calling MLLib directly from SparkR ? >> >> If not, is there any work around, any example available somewhere ? >> >> Regards, >> Sourav >> > >

Calling MLLib from SparkR

2015-07-01 Thread Sourav Mazumder
Hi, Does Spark 1.4 support calling MLlib directly from SparkR? If not, is there any workaround, or any example available somewhere? Regards, Sourav

Re: sparkR could not find function "textFile"

2015-07-01 Thread Sourav Mazumder
4577954. BTW we added a new option to > sparkR.init to pass in packages and that should be a part of 1.5 > > Shivaram > > On Wed, Jul 1, 2015 at 10:03 AM, Sourav Mazumder < > sourav.mazumde...@gmail.com> wrote: > >> Hi, >> >> Piggybacking on this discus

Re: sparkR could not find function "textFile"

2015-07-01 Thread Sourav Mazumder
Hi, Piggybacking on this discussion. I'm trying to achieve the same, reading a csv file, from RStudio. Where I'm stuck is how to supply an additional package from RStudio to sparkR.init(), as sparkR.init() does not provide an option to specify additional packages. I tried the following code from RStud

Passing name of package in sparkR.init()

2015-07-01 Thread Sourav Mazumder
Hi, What is the right way to pass a package name in sparkR.init()? I can successfully pass the package name if I'm using the sparkR shell, by using --packages while invoking sparkR. However, if I'm trying to use sparkR from RStudio and need to pass a package name in sparkR.init(), not sure how to do th

Issues in reading a CSV file from local file system using spark-shell

2015-06-30 Thread Sourav Mazumder
Hi, I'm running Spark 1.4.0 without Hadoop. I'm using the binary spark-1.4.0-bin-hadoop2.6. I start the spark-shell as: spark-shell --master local[2] --packages com.databricks:spark-csv_2.11:1.1.0 --executor-memory 2G --conf spark.local.dir="C:/Users/Sourav". Then I run: val df = sqlContext
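
For comparison, a minimal read via the spark-csv package loaded above, with an explicit file:// URI so the path is not resolved against HDFS (the path is a placeholder):

  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("file:///C:/Users/Sourav/data.csv")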

Re: Running Spark 1.4.1 without Hadoop

2015-06-29 Thread Sourav Mazumder
but the parameter passed > to spark-shell is "--packages com.databricks:spark-csv_2.11:1.1.0". > > On Mon, Jun 29, 2015 at 2:59 PM, Sourav Mazumder < > sourav.mazumde...@gmail.com> wrote: > >> HI Jey, >> >> Not much of luck.

Re: Running Spark 1.4.1 without Hadoop

2015-06-29 Thread Sourav Mazumder
e for Spark 1.4 with > Scala 2.11 should be "com.databricks:spark-csv_2.11:1.1.0". > > -Jey > > On Mon, Jun 29, 2015 at 12:23 PM, Sourav Mazumder < > sourav.mazumde...@gmail.com> wrote: > >> Hi Jey, >> >> Thanks for your inputs. >> >> Probabl

Re: Running Spark 1.4.1 without Hadoop

2015-06-29 Thread Sourav Mazumder
f course you will not be able to >> use any hadoop inputformat etc. out of the box. >> >> ** I am assuming its a learning question :) For production, I would >> suggest build it from source. >> >> If you are using python and need some help, please drop me a not

Running Spark 1.4.1 without Hadoop

2015-06-29 Thread Sourav Mazumder
Hi, I'm trying to run Spark without Hadoop, where the data would be read and written to local disk. For this I have a few questions - 1. Which download do I need to use? In the download options I don't see any binary download which does not need Hadoop. Is the only way to do this to download the sourc

Support for Windowing and Analytics functions in Spark SQL

2015-06-22 Thread Sourav Mazumder
Hi, Though the documentation does not explicitly mention support for Windowing and Analytics functions in Spark SQL, it looks like they are not supported. I tried running a query like Select Lead(<column>, 1) over (Partition By <column> order by <column>) from <table> and I got an error saying that this feature is unsupported. I tried
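
Window functions did land in Spark 1.4 (SPARK-1442), but only through HiveContext; the plain SQLContext reports them as unsupported. A sketch of the equivalent DataFrame API under that assumption, with placeholder column names:

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions.lead

  val w = Window.partitionBy("dept").orderBy("ts")
  df.select(df("id"), lead("id", 1).over(w).as("next_id"))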

Re: Spark SQL with Thrift Server is very very slow and finally failing

2015-06-10 Thread Sourav Mazumder
| > | Code Generation: false | > | == RDD == | > +-+ > > On 6/10/15 1:28 PM, Sourav Mazumder wrote: > >From log file I no

Re: Spark SQL with Thrift Server is very very slow and finally failing

2015-06-09 Thread Sourav Mazumder
e tables, one with 100 MB data (around 1 M rows) and another with 20 KB data (around 100 rows), why is an executor consuming so much memory? Even if I increase the memory to 20 GB, the same failure happens. Regards, Sourav On Tue, Jun 9, 2015 at 12:58 PM, Sourav Mazumder < sourav.mazumde..

Spark SQL with Thrift Server is very very slow and finally failing

2015-06-08 Thread Sourav Mazumder
Hi, I am trying to run a SQL from a JDBC driver using Spark's Thrift Server. I'm doing a join between a Hive table of size around 100 GB and another Hive table with 10 KB, with a filter on a particular column. The query takes more than 45 minutes and then I get ExecutorLostFailure. That is becaus
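
One angle worth checking, given a 10 KB dimension table: whether the small side is being broadcast at all. Spark decides this from table statistics, so a sketch of the usual remedy (the ANALYZE statement is standard HiveQL that Spark's HiveContext accepts; the threshold value shown is the default and the table name a placeholder):

  ANALYZE TABLE small_table COMPUTE STATISTICS noscan;
  SET spark.sql.autoBroadcastJoinThreshold=10485760;  -- 10 MB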

Re: Spark SQL is not able to connect to hive metastore

2015-05-16 Thread Sourav Mazumder
Hi Ayan, Thanks for your response. In my case the constraint is that I have to use Hive 0.14 for some other use cases. I believe the incompatibility is at the thrift server level (the HiveServer2 which comes with Hive). If I use the Hive 0.13 HiveServer2 on the same node as the Spark master, should that
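
From Spark 1.4 onward there are also the spark.sql.hive.metastore.version / spark.sql.hive.metastore.jars settings, which let Spark talk to a metastore version different from the Hive classes it was built with; whether 0.14 is an accepted value depends on the exact Spark release, so treat this as a sketch:

  spark-shell \
    --conf spark.sql.hive.metastore.version=0.14.0 \
    --conf spark.sql.hive.metastore.jars=maven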