Is it only the Update statement, or do queries in general not work? And can
you paste your code so far?
We use stored procedures (MS SQL, though) from Spark all the time with
different db client libraries and have never had any issue.
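For illustration only, a minimal sketch of calling a stored procedure from the executors with plain JDBC inside foreachPartition; the connection URL, procedure name, and columns are made-up placeholders, and `df` is assumed to be an existing DataFrame:
```
import java.sql.DriverManager
import org.apache.spark.sql.Row

// Hypothetical connection details and procedure name.
val jdbcUrl = "jdbc:sqlserver://dbhost:1433;databaseName=hr"

df.foreachPartition { (rows: Iterator[Row]) =>
  // One connection per partition, opened on the executor, not the driver.
  val conn = DriverManager.getConnection(jdbcUrl, "user", "password")
  val stmt = conn.prepareCall("{call update_employee(?, ?)}")
  try {
    rows.foreach { row =>
      stmt.setLong(1, row.getAs[Long]("emp_id"))
      stmt.setString(2, row.getAs[String]("new_title"))
      stmt.execute()
    }
  } finally {
    stmt.close()
    conn.close()
  }
}
```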
On 21 July 2017 at 03:19, Cassa L wrote:
> Hi,
> I want to use Spark to
Hi Marcelo,
Thanks for looking into it. I have opened a jira for this:
https://issues.apache.org/jira/browse/SPARK-21494
And yes, it works fine with internal shuffle service. But for our system we
have external shuffle/dynamic allocation configured by default. We wanted
to try switching from the
Also, things seem to work with all your settings if you disable use of
the shuffle service (which also means no dynamic allocation), if that
helps you make progress in what you wanted to do.
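For reference, these are the two settings involved (spark-defaults.conf syntax); dynamic allocation has to be turned off once the external shuffle service is disabled, since it depends on it:
```
spark.shuffle.service.enabled    false
spark.dynamicAllocation.enabled  false
```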
On Thu, Jul 20, 2017 at 4:25 PM, Marcelo Vanzin wrote:
> Hmm... I tried this with the new shuffle service
Hmm... I tried this with the new shuffle service (I generally have an
old one running) and also see failures. I also noticed some odd things
in your logs that I'm also seeing in mine, but it's better to track
these in a bug instead of e-mail.
Please file a bug and attach your logs there; I'll take a look.
Hi
As Mark said, scheduler mode works within an application, i.e. within a Spark
session and Spark context. This is also clear if you think about where you set
the configuration: in a SparkConf, which is used to build the context.
If you are using YARN as the resource manager, however, you can configure YARN
with the fair scheduler
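To make the within-application part concrete, a minimal sketch, assuming a fairscheduler.xml with an "etl" pool already exists (the path, app name, and pool name are placeholders):
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("fair-scheduling-demo")
  .config("spark.scheduler.mode", "FAIR")                                   // set on the conf that builds the context
  .config("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")  // optional pool definitions
  .getOrCreate()

// Jobs submitted from this thread are assigned to the "etl" pool.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "etl")
```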
The fair scheduler doesn't have anything to do with reallocating resources
across Applications.
https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-across-applications
https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
On Thu, Jul 20, 2017 at
Mark, Thanks for the response.
Let me rephrase my statements.
"I am submitting a Spark application(*Application*#A) with scheduler.mode
as FAIR and dynamicallocation=true and it got all the available executors.
In the meantime, submitting another Spark Application (*Application* # B)
with the sc
First, Executors are not allocated to Jobs, but rather to Applications. If
you run multiple Jobs within a single Application, then each of the Tasks
associated with Stages of those Jobs has the potential to run on any of the
Application's Executors. Second, once a Task starts running on an Executor
Hello All,
We have a cluster with 50 executors, each with 4 cores, so we can use a
maximum of 200 cores.
I am submitting a Spark application (JOB A) with scheduler.mode as FAIR and
dynamicAllocation=true, and it got all the available executors.
In the meantime, I submit another Spark Application (JOB B
Hi,
I want to use Spark to parallelize some update operations on an Oracle
database. However, I could not find a way to issue UPDATE statements (UPDATE
Employee WHERE ???), use transactions, or call stored procedures from
Spark/JDBC.
Has anyone had this use case before, and how did you solve it?
Thanks,
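One common workaround (not a built-in Spark API) is to run the DML yourself from the executors with plain JDBC inside foreachPartition, batching the statements and committing per partition. A rough sketch, with an invented table, columns, and connection details; `updatesDF` is assumed to be a DataFrame holding the new values:
```
import java.sql.DriverManager
import org.apache.spark.sql.Row

val url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"   // placeholder connection string

updatesDF.coalesce(8).foreachPartition { (rows: Iterator[Row]) =>   // coalesce limits concurrent DB connections
  val conn = DriverManager.getConnection(url, "user", "password")
  conn.setAutoCommit(false)                        // one transaction per partition
  val stmt = conn.prepareStatement(
    "UPDATE employee SET salary = ? WHERE emp_id = ?")
  try {
    rows.foreach { r =>
      stmt.setDouble(1, r.getAs[Double]("salary"))
      stmt.setLong(2, r.getAs[Long]("emp_id"))
      stmt.addBatch()
    }
    stmt.executeBatch()
    conn.commit()
  } catch {
    case e: Exception => conn.rollback(); throw e
  } finally {
    stmt.close()
    conn.close()
  }
}
```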
Hi,
Just a quick clarification question: from what I understand, blocks in a batch
together form a single RDD which is partitioned (usually using the
HashPartitioner) across multiple tasks. First, is this correct? Second, the
partitioner is called every single time a new task is created. Is the
Thanks, Vadim. But I am looking for an API in Dataset, DataFrame,
DataFrameWriter, etc. The way you suggested can be done via a query like
spark.sql(""" ALTER TABLE `table` ADD PARTITION (partcol=1) LOCATION
'/path/to/your/dataset' """), and before that I write it to the specified
location first.
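For what it's worth, a sketch of that two-step workflow, assuming an existing SparkSession `spark` and DataFrame `df` (paths and names are the same placeholders as above):
```
// Write the data for the new partition to its own directory first...
df.write.mode("overwrite").parquet("/path/to/your/dataset/partcol=1")

// ...then register that directory as a static partition of the table.
spark.sql("""
  ALTER TABLE `table` ADD IF NOT EXISTS PARTITION (partcol=1)
  LOCATION '/path/to/your/dataset/partcol=1'
""")
```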
This should work:
```
ALTER TABLE `table` ADD PARTITION (partcol=1) LOCATION
'/path/to/your/dataset'
```
On Wed, Jul 19, 2017 at 6:13 PM, ctang wrote:
> I wonder if there are any easy ways (or APIs) to insert a dataframe (or
> DataSet), which does not contain the partition columns, as a static
>
On Thu, Jul 20, 2017 at 7:51 PM, ayan guha wrote:
> It depends on your need. There are clear instructions around how to run
> mvn with specific Hive and Hadoop bindings. However, if you are starting
> out, I suggest you use the prebuilt ones.
>
Hi Ayan,
I am setting up Apache Spark with Cassandra
It depends on your need. There are clear instructions around how to run mvn
with specific Hive and Hadoop bindings. However, if you are starting out, I
suggest you use the prebuilt ones.
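For example, a typical from-source build with Hive support and an explicit Hadoop profile looks roughly like this (check the "Building Spark" page for the profiles that match your version):
```
./build/mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean package
```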
On Fri, 21 Jul 2017 at 12:17 am, Kaushal Shriyan
wrote:
> On Thu, Jul 20, 2017 at 7:42 PM, ayan guha wrote:
>
On Thu, Jul 20, 2017 at 7:42 PM, ayan guha wrote:
> You should download a pre-built version. What you have is the source
> code; you need to build it to generate the jar files.
>
>
Hi Ayan,
Can you please help me understand how to build it to generate the jar files?
Regards,
Kaushal
You should download a pre-built version. What you have is the source code;
you need to build it to generate the jar files.
On Thu, 20 Jul 2017 at 10:35 pm, Kaushal Shriyan
wrote:
> Hi,
>
> I have downloaded spark-2.2.0.tgz on CentOS 7.x and when I invoke
> /opt/spark-2.2.0/sbin/start-master.
Hello All,
Our Spark applications are designed to process HDFS files (Hive external
tables).
We recently modified the Hive file size by setting the following parameters to
ensure that files have an average size of 512 MB:
set hive.merge.mapfiles=true
set hive.merge.mapredfiles=true
se
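For context, a typical complete set of merge settings for a ~512 MB target looks like the sketch below; the two size values are assumptions on my part, not taken from the original message:
```
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.size.per.task=536870912;       -- target size (512 MB) for merged files
set hive.merge.smallfiles.avgsize=268435456;  -- merge when the average file size falls below this
```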
Hi,
I have downloaded spark-2.2.0.tgz on CentOS 7.x and when I invoke
/opt/spark-2.2.0/sbin/start-master.sh, I get
*Failed to find Spark jars directory
> (/opt/spark-2.2.0/assembly/target/scala-2.10/jars). You need to build
> Spark with the target "package" before running this program.*
I am
Has anyone faced the same kind of issue with Spark 2.0.1?
On Thu, Jul 20, 2017 at 2:08 PM, Chetan Khatri
wrote:
> Hello All,
> I am facing an issue with storing a DataFrame to a Hive table with
> partitioning; without partitioning it works fine.
>
> *Spark 2.0.1*
>
> finalDF.write.mode(SaveMode.Overwrite).pa
Hi
As the documentation says:
spark.python.worker.memory
Amount of memory to use per python worker process during aggregation, in
the same format as JVM memory strings (e.g. 512m, 2g). If the memory used
during aggregation goes above this amount, it will spill the data into
disks.
I search the con
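For completeness, this is an ordinary Spark conf, so it can be set at submit time; the value and script name below are just placeholders:
```
spark-submit --conf spark.python.worker.memory=1g my_job.py
```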
I am unable to register Solr Cloud as a data source in Spark 2.1.0.
Following the documentation at
https://github.com/lucidworks/spark-solr#import-jar-file-via-spark-shell, I
have used the 3.0.0.beta3 version.
The system path is displaying the added jar as
spark://172.31.208.1:55730/jars/spark-s
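For comparison, the README linked above reads Solr roughly like the sketch below once the jar is on the classpath; the zkhost and collection values here are placeholders:
```
val options = Map(
  "zkhost" -> "zk1:2181,zk2:2181/solr",   // ZooKeeper ensemble of the SolrCloud cluster
  "collection" -> "my_collection"
)
val solrDF = spark.read.format("solr").options(options).load()
solrDF.printSchema()
```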
Hello All,
I am facing an issue with storing a DataFrame to a Hive table with
partitioning; without partitioning it works fine.
*Spark 2.0.1*
finalDF.write.mode(SaveMode.Overwrite).partitionBy("week_end_date").saveAsTable(OUTPUT_TABLE.get)
and added the configuration below as well:
spark.sqlContext.setConf("h
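For reference, a sketch of the full partitioned write path with the Hive dynamic-partition settings that are commonly paired with it; whether these match the truncated setConf call above, or resolve the Spark 2.0.1 behaviour, is an assumption on my part:
```
import org.apache.spark.sql.SaveMode

spark.sqlContext.setConf("hive.exec.dynamic.partition", "true")
spark.sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

finalDF.write
  .mode(SaveMode.Overwrite)
  .partitionBy("week_end_date")
  .saveAsTable("output_db.output_table")   // placeholder for OUTPUT_TABLE.get
```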
weightCol sets the weight for each individual row of data (training
example). It does not set the initial coefficients.
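To make that concrete, a minimal sketch with Spark ML's LogisticRegression; the column names and trainingDF are illustrative:
```
import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression()
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setWeightCol("example_weight")   // per-row (per-example) weight column, not initial coefficients

val model = lr.fit(trainingDF)      // trainingDF must contain the "example_weight" column
```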
On Thu, 20 Jul 2017 at 10:22 Aseem Bansal wrote:
> Hi
>
> I had asked about this somewhere else too and was told that the weightCol
> method does that.
>
> On Thu, Jul 20, 2017 at 1
Hi
I had asked about this somewhere else too and was told that the weightCol
method does that.
On Thu, Jul 20, 2017 at 12:50 PM, Nick Pentreath
wrote:
> Currently it's not supported, but is on the roadmap: see
> https://issues.apache.org/jira/browse/SPARK-13025
>
> The most recent attempt is to star
Currently it's not supported, but is on the roadmap: see
https://issues.apache.org/jira/browse/SPARK-13025
The most recent attempt is to start with simple linear regression, as here:
https://issues.apache.org/jira/browse/SPARK-21386
On Thu, 20 Jul 2017 at 08:36 Aseem Bansal wrote:
> We were abl