can confirm
> spark.dynamicAllocation.enabled is enough.
>
> Best Regards
> Richard
>
> From: Sourav Mazumder
> Date: Sunday, December 3, 2017 at 12:31 PM
> To: user
> Subject: Dynamic Resource allocation in Spark Streaming
>
> Hi,
>
> I see the following jira is reso
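A sketch of how that setting is typically passed via spark-submit (on YARN,
dynamic allocation also needs the external shuffle service; the min/max
values and jar name below are just placeholders):

  spark-submit \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=1 \
    --conf spark.dynamicAllocation.maxExecutors=20 \
    your-streaming-app.jar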
Hi,
I see the following JIRA is resolved in Spark 2.0:
https://issues.apache.org/jira/browse/SPARK-12133, which is supposed to
support Dynamic Resource Allocation in Spark Streaming.
I also see the JIRA https://issues.apache.org/jira/browse/SPARK-22008, which
is about fixing the number of executors relate
It would be great if you could elaborate on the bulk provisioning use case.
Regards,
Sourav
On Sun, Nov 26, 2017 at 11:53 PM, shankar.roy wrote:
> This would be a useful feature.
> We can leverage it while doing bulk provisioning.
>
>
>
>
same. Here
is the link to the repo -
https://github.com/sourav-mazumder/Data-Science-Extensions/tree/master/spark-datasource-rest
The interface goes like this :
- Inputs : REST API endpoint URL, input Data in a Temporary Spark Table -
the name of the table has to be passed, type of method (Get, Post
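Based on that interface description, usage would presumably look something
like the sketch below; the format name and the option keys ("url", "input",
"method") are assumptions, not verified against the repo:

  // Hypothetical sketch: option names are assumed from the description above.
  val queries = spark.createDataFrame(Seq(("q1", "spark"), ("q2", "streaming")))
    .toDF("id", "q")
  queries.createOrReplaceTempView("rest_input")   // temporary table passed by name

  val restDF = spark.read
    .format("rest")                               // assumed data source short name
    .option("url", "https://api.example.com/search")
    .option("input", "rest_input")
    .option("method", "GET")
    .load()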
Hi,
Is there a way to monitor an ongoing Spark job when running in YARN cluster
mode?
In my understanding, in YARN cluster mode the Spark monitoring UI for the
ongoing job is not available on port 4040. So is there an alternative?
Regards,
Sourav
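One common approach, assuming a standard YARN setup: the ResourceManager
proxies the running driver's UI, so the application's tracking URL leads to
the usual Spark UI pages.

  # List running applications and note the Tracking-URL column
  yarn application -list -appStates RUNNING
  # The tracking URL typically looks like
  #   http://<resourcemanager-host>:8088/proxy/<application_id>/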
Hi,
When I try to access a swebhdfs URI I get the following error.
In my Hadoop cluster webhdfs is enabled.
I can also access the same resource using the webhdfs API from an HTTP
client with SSL.
Any idea what is going wrong?
Regards,
Sourav
java.io.IOException: Unexpected HTTP response: code=404 !=
Hi,
I am trying to create an RDD using swebhdfs against a remote Hadoop cluster
which is protected by Knox and uses SSL.
The code looks like this -
sc.textFile("swebhdfs://host:port/gateway/default/webhdfs/v1/").count.
I'm passing the truststore and truststore password through extra Java
options while
Any inputs on this issue ?
Regards,
Sourav
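For reference, a sketch of passing the truststore to both the driver and the
executors (paths and password are placeholders):

  spark-shell \
    --conf "spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=/path/to/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit" \
    --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/path/to/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit"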
On Tue, May 10, 2016 at 6:17 PM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:
> Hi,
>
> Need to get bit more understanding of reliability aspects of the Custom
> Receivers in the context of the code in spark-streaming-jms
&g
Hi,
I need to get a bit more understanding of the reliability aspects of the Custom
Receivers in the context of the code in spark-streaming-jms
https://github.com/mattf/spark-streaming-jms.
Based on the documentation in
http://spark.apache.org/docs/latest/streaming-custom-receivers.html#receiver-reliabil
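For context, the reliable-receiver pattern from that doc looks roughly like
the sketch below (JMS specifics omitted; the acknowledgement call is a
placeholder):

  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.streaming.receiver.Receiver
  import scala.collection.mutable.ArrayBuffer

  // Minimal sketch of a reliable receiver: store() is called with a block of
  // records, and the source is acknowledged only after store() returns.
  class SketchReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
    def onStart(): Unit = {
      new Thread("sketch-receiver") { override def run(): Unit = receive() }.start()
    }
    def onStop(): Unit = {}

    private def receive(): Unit = {
      while (!isStopped()) {
        val batch = ArrayBuffer[String]()   // fill from the source, e.g. JMS messages
        // ... pull messages into batch ...
        if (batch.nonEmpty) {
          store(batch)                      // blocks until the block is stored
          // acknowledgeSource(batch)       // placeholder: ack only after store() returns
        }
      }
    }
  }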
Hi,
Looks like there is a problem in spark-xml if the XML has multiple
attributes with no child element.
For example, say the XML has a nested object as below:
bk_113
bk_114
Now if I create a DataFrame starting with rowTag bkval and then do a select
on that DataFrame, it gives
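For context, loading with spark-xml usually looks like this sketch (the path
is a placeholder):

  // Sketch: reading XML with spark-xml using rowTag (Spark 1.x style API)
  val books = sqlContext.read
    .format("com.databricks.spark.xml")
    .option("rowTag", "bkval")
    .load("/path/to/books.xml")
  books.printSchema()
  books.select("*").show()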
Sent from my iPhone
> Pardon the dumb thumb typos :)
>
> On Apr 20, 2016, at 10:15 AM, Michael Malak <
> michaelma...@yahoo.com.INVALID > wrote:
>
>
> http://go.databricks.com/apache-spark-2.0-presented-by-databricks-co-founder-reynold-xin
>
>
>
>
> --
Hi All,
Is there somewhere we can get an idea of the upcoming features in Spark 2.0?
I got a list for Spark ML from here:
https://issues.apache.org/jira/browse/SPARK-12626.
Are there other links where I can see similar enhancements planned for Spark
SQL, Spark Core, Spark Streaming, GraphX, etc.?
Thanks
Hi All,
While starting the Spark Thrift Server I don't see any option to start it
with SSL support.
Is that support currently available?
Regards,
Sourav
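A sketch, assuming the Thrift Server honors HiveServer2's SSL properties via
hive-site.xml or --hiveconf (property names are taken from HiveServer2;
paths and password are placeholders):

  ./sbin/start-thriftserver.sh \
    --hiveconf hive.server2.use.SSL=true \
    --hiveconf hive.server2.keystore.path=/path/to/keystore.jks \
    --hiveconf hive.server2.keystore.password=changeit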
Hi,
Is anyone aware of any work going on for integrating Spark with SAS for
executing queries in Spark?
For example, calling Spark jobs from SAS using Spark SQL, through Spark
SQL's JDBC/ODBC library.
Regards,
Sourav
You can also try out IBM's Spark as a service on IBM Bluemix. There you'll
get all the required features for security, multitenancy, notebooks, and
integration with other big data services. You can try it out for free, too.
Regards,
Sourav
On Thu, Jan 28, 2016 at 2:10 PM, Rakesh Soni wrote:
> At its co
Alternatively you can also try the ML library from System ML (
http://systemml.apache.org/) for covariance computation on Spark.
Regards,
Sourav
On Mon, Dec 28, 2015 at 11:29 PM, Sun, Rui wrote:
> Spark does not support computing cov matrix now. But there is a PR for
> it. Maybe you can try it
Hi All,
I'm trying to use the zipWithUniqueId() function of RDD via the transform
function of DStream. It does generate unique IDs, always starting from 0 and
in sequence.
However, I'm not sure whether this is reliable behavior that is always
guaranteed to generate sequence numbers starting from 0.
Can an
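For comparison, zipWithIndex() (rather than zipWithUniqueId()) is the one
documented to give consecutive indices starting at 0; a minimal sketch
inside transform, where dstream is assumed to be an existing DStream:

  // zipWithUniqueId() guarantees uniqueness but not consecutive values;
  // zipWithIndex() gives 0, 1, 2, ... per RDD (at the cost of an extra job).
  val indexed = dstream.transform { rdd =>
    rdd.zipWithIndex()   // (element, index), indices starting from 0
  }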
duce in my
environment - might want to copy that to the Spark user list. Sorry!
On Dec 11, 2015, at 1:37 PM, Sourav Mazumder
wrote:
Hi Ross,
Thanks for your answer.
In 1.5.x, whenever I try to create a HiveContext from SparkContext I get the
following error. Please note that I'm not running
Hi,
The Spark SQL documentation says that it complies with Hive 1.2.1 APIs and
supports window functions. I'm using Spark 1.5.0.
However, when I try to execute something like the query below I get an error:
val lol5 = sqlContext.sql("select ky, lead(ky, 5, 0) over (order by ky rows
5 following) from lolt")
ja
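One likely cause, sketched below: in Spark 1.5, window functions such as
lead() are only available through HiveContext, not the plain SQLContext
(assuming the table lolt is registered there):

  // Sketch: use HiveContext for window functions in Spark 1.5
  val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
  val lol5 = hiveContext.sql(
    "select ky, lead(ky, 5, 0) over (order by ky) from lolt")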
Hi All,
Currently, is there a way to connect to an HTTP server and get data as a
DStream at a given frequency?
Or does one have to write one's own utility for this?
Regards,
Sourav
In 1.5.0, if I use randomSplit on a DataFrame I get this error.
Here is the code snippet -
val splitData = merged.randomSplit(Array(70,30))
val trainData = splitData(0).persist()
val testData = splitData(1)
trainData.registerTempTable("trn")
%sql select * from trn
The exception goes like this
I keep on getting this error whenever I'm starting spark-shell: The root
scratch dir: /tmp/hive on HDFS should be writable. Current permissions are:
rwx------.
I cannot work with this if I need to do anything with sqlContext as that
does not get created.
I could see that a bug is raised for this
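The usual workaround is to open up that directory's permissions; a sketch,
assuming the path is on HDFS (on Windows with a local filesystem, winutils
is used instead):

  hdfs dfs -chmod -R 777 /tmp/hive
  # On Windows:  winutils.exe chmod 777 \tmp\hive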
Is there any algorithm implemented in Spark MLlib which supports parameter
sensitivity analysis?
After the model is created using a training data set, the model should be
able to tell, among the various features used, which ones are the most
important (from the perspective of their contribution t
Hi,
I have data written in HDFS using a custom storage handler of Hive. Can I
access that data in Spark using Spark SQL?
For example, can I write Spark SQL to access the data from a Hive table in
HDFS which was created as -
CREATE TABLE custom_table_1(key int, value string)
STORED BY 'org.apac
Hi,
Is there a way to get variable importance for a RandomForest model created
using MLlib? This way one can know, among multiple features, which ones
contribute the most to the dependent variable.
Regards,
Sourav
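One option, sketched below: the DataFrame-based spark.ml random forest
exposes featureImportances (the older mllib RandomForestModel does not);
column names and the training DataFrame are placeholders:

  import org.apache.spark.ml.classification.RandomForestClassifier

  // Sketch: fit a spark.ml random forest and read per-feature importance weights
  val rf = new RandomForestClassifier()
    .setLabelCol("label")
    .setFeaturesCol("features")
  val model = rf.fit(trainingDF)        // trainingDF has "label" and "features" columns
  println(model.featureImportances)     // a Vector with one weight per feature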
Hi,
I have a DataFrame which I want to use for creating a RandomForest model
using MLlib.
The RandomForest model needs an RDD of LabeledPoint.
I'm wondering how to convert the DataFrame to an RDD[LabeledPoint].
Regards,
Sourav
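A minimal sketch of that conversion, assuming the label is in the first
column and the remaining columns are numeric features:

  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.regression.LabeledPoint

  // Sketch: DataFrame -> RDD[LabeledPoint]; column positions are placeholders
  val labeledRDD = df.map { row =>
    val label = row.getDouble(0)
    val features = (1 until row.length).map(i => row.getDouble(i)).toArray
    LabeledPoint(label, Vectors.dense(features))
  }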
/jira/browse/SPARK-6805
>
> On Wed, Jul 1, 2015 at 4:23 PM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
>> Hi,
>>
>> Does Spark 1.4 support calling MLLib directly from SparkR ?
>>
>> If not, is there any work around, any example available somewhere ?
>>
>> Regards,
>> Sourav
>>
>
>
Hi,
Does Spark 1.4 support calling MLLib directly from SparkR ?
If not, is there any work around, any example available somewhere ?
Regards,
Sourav
4577954. BTW we added a new option to
> sparkR.init to pass in packages and that should be a part of 1.5
>
> Shivaram
>
> On Wed, Jul 1, 2015 at 10:03 AM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
>> Hi,
>>
>> Piggybacking on this discus
Hi,
Piggybacking on this discussion.
I'm trying to achieve the same, reading a CSV file, from RStudio. Where I'm
stuck is how to supply an additional package from RStudio to sparkR.init(),
as sparkR.init() does not provide an option to specify additional packages.
I tried the following code from RStud
Hi,
What is the right way to pass package name in sparkR.init() ?
I can successfully pass the package name if I'm using the sparkR shell, by
using --packages while invoking sparkR.
However, if I'm trying to use SparkR from RStudio and need to pass a
package name in sparkR.init(), I'm not sure how to do th
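Two approaches I'm aware of, sketched in R (the package coordinates are just
the example from this thread): in Spark 1.5+ sparkR.init gained a
sparkPackages argument, and on 1.4 the SPARKR_SUBMIT_ARGS environment
variable can be set before init:

  library(SparkR)

  # Spark 1.5+: pass packages directly
  sc <- sparkR.init(master = "local[2]",
                    sparkPackages = "com.databricks:spark-csv_2.11:1.1.0")

  # Spark 1.4 workaround from RStudio: set the submit args before sparkR.init
  Sys.setenv("SPARKR_SUBMIT_ARGS" =
             '"--packages" "com.databricks:spark-csv_2.11:1.1.0" "sparkr-shell"')
  sc <- sparkR.init(master = "local[2]")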
Hi,
I'm running Spark 1.4.0 without Hadoop. I'm using the binary
spark-1.4.0-bin-hadoop2.6.
I start the spark-shell as :
spark-shell --master local[2] --packages
com.databricks:spark-csv_2.11:1.1.0 --executor-memory 2G --conf
spark.local.dir="C:/Users/Sourav".
Then I run :
val df =
sqlContext
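For reference, the usual spark-csv read looks like this sketch (the path is
a placeholder):

  // Sketch: reading a CSV file via the spark-csv package
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("C:/Users/Sourav/data.csv")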
but the parameter passed
> to spark-shell is "--packages com.databricks:spark-csv_2.11:1.1.0".
>
> On Mon, Jun 29, 2015 at 2:59 PM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
>> HI Jey,
>>
>> Not much of luck.
e for Spark 1.4 with
> Scala 2.11 should be "com.databricks:spark-csv_2.11:1.1.0".
>
> -Jey
>
> On Mon, Jun 29, 2015 at 12:23 PM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
>> Hi Jey,
>>
>> Thanks for your inputs.
>>
>> Probabl
f course you will not be able to
>> use any hadoop inputformat etc. out of the box.
>>
>> ** I am assuming its a learning question :) For production, I would
>> suggest build it from source.
>>
>> If you are using python and need some help, please drop me a not
Hi,
I'm trying to run Spark without Hadoop, where the data would be read from
and written to local disk.
For this I have a few questions:
1. Which download do I need to use? In the download options I don't see any
binary download which does not need Hadoop. Is the only way to do this to
download the sourc
Hi,
Though the documentation does not explicitly mention support for windowing
and analytics functions in Spark SQL, it looks like they are not supported.
I tried running a query like Select Lead(<column>, 1) over (Partition
By <column> order by <column>) from <table> and I got an error
saying that this feature is unsupported.
I tried
|
> | Code Generation: false |
> | == RDD == |
> +-+
>
> On 6/10/15 1:28 PM, Sourav Mazumder wrote:
>
> From log file I no
e tables, one with 100 MB of data (around
1 M rows) and another with 20 KB of data (around 100 rows), why is an
executor consuming so much memory? Even if I increase the memory to 20 GB,
the same failure happens.
Regards,
Sourav
On Tue, Jun 9, 2015 at 12:58 PM, Sourav Mazumder <
sourav.mazumde..
Hi,
I am trying to run a SQL form a JDBC driver using Spark's Thrift Server.
I'm doing a join between a Hive table of size around 100 GB and another
Hive table of 10 KB, with a filter on a particular column.
The query takes more than 45 minutes and then I get an ExecutorLostFailure.
That is becaus
Hi Ayan,
Thanks for your response.
In my case the constraint is that I have to use Hive 0.14 for some other
use cases.
I believe the incompatibility is at the Thrift server level (the HiveServer2
which comes with Hive). If I use the Hive 0.13 HiveServer2 on the same node
as the Spark master, should that