Is it only the Update statement, or do queries in general not work? And can
you paste your code so far?
We use stored procedures (MS SQL, though) from Spark all the time with
different db client libraries and have never had any issue.
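For illustration only, a minimal sketch of calling a stored procedure from the executors with plain JDBC inside foreachPartition; the connection URL, procedure name, and columns are made-up placeholders, and `df` is assumed to be an existing DataFrame:
```
import java.sql.DriverManager
import org.apache.spark.sql.Row

// Hypothetical connection details and procedure name.
val jdbcUrl = "jdbc:sqlserver://dbhost:1433;databaseName=hr"

df.foreachPartition { (rows: Iterator[Row]) =>
  // One connection per partition, opened on the executor, not the driver.
  val conn = DriverManager.getConnection(jdbcUrl, "user", "password")
  val stmt = conn.prepareCall("{call update_employee(?, ?)}")
  try {
    rows.foreach { row =>
      stmt.setLong(1, row.getAs[Long]("emp_id"))
      stmt.setString(2, row.getAs[String]("new_title"))
      stmt.execute()
    }
  } finally {
    stmt.close()
    conn.close()
  }
}
```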
On 21 July 2017 at 03:19, Cassa L wrote:
> Hi,
> I want to use Spark to
Hi Marcelo,
Thanks for looking into it. I have opened a jira for this:
https://issues.apache.org/jira/browse/SPARK-21494
And yes, it works fine with internal shuffle service. But for our system we
have external shuffle/dynamic allocation configured by default. We wanted
to try switching from the
Also, things seem to work with all your settings if you disable use of
the shuffle service (which also means no dynamic allocation), if that
helps you make progress in what you wanted to do.
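For reference, these are the two settings involved (spark-defaults.conf syntax); dynamic allocation has to be turned off once the external shuffle service is disabled, since it depends on it:
```
spark.shuffle.service.enabled    false
spark.dynamicAllocation.enabled  false
```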
On Thu, Jul 20, 2017 at 4:25 PM, Marcelo Vanzin wrote:
> Hmm... I tried this with the new shuffle service
Hmm... I tried this with the new shuffle service (I generally have an
old one running) and also see failures. I also noticed some odd things
in your logs that I'm also seeing in mine, but it's better to track
these in a bug instead of e-mail.
Please file a bug and attach your logs there; I'll take a look.
Hi
As Mark said, scheduler mode works within an application, i.e. within a Spark
session and Spark context. This is also clear if you think about where you set
the configuration: in a SparkConf, which is used to build the context.
If you are using YARN as the resource manager, however, you can configure YARN
with the fair scheduler
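To make the within-application part concrete, a minimal sketch, assuming a fairscheduler.xml with an "etl" pool already exists (the path, app name, and pool name are placeholders):
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("fair-scheduling-demo")
  .config("spark.scheduler.mode", "FAIR")                                   // set on the conf that builds the context
  .config("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")  // optional pool definitions
  .getOrCreate()

// Jobs submitted from this thread are assigned to the "etl" pool.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "etl")
```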
The fair scheduler doesn't have anything to do with reallocating resources
across Applications.
https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-across-applications
https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
On Thu, Jul 20, 2017 at
Mark, Thanks for the response.
Let me rephrase my statements.
"I am submitting a Spark application(*Application*#A) with scheduler.mode
as FAIR and dynamicallocation=true and it got all the available executors.
In the meantime, submitting another Spark Application (*Application* # B)
with the sc
First, Executors are not allocated to Jobs, but rather to Applications. If
you run multiple Jobs within a single Application, then each of the Tasks
associated with Stages of those Jobs has the potential to run on any of the
Application's Executors. Second, once a Task starts running on an Executor
Hello All,
We have a cluster with 50 executors, each with 4 cores, so we can use a
maximum of 200 cores.
I am submitting a Spark application (JOB A) with scheduler.mode as FAIR and
dynamicAllocation=true, and it got all the available executors.
In the meantime, I submit another Spark Application (JOB B
Hi,
I want to use Spark to parallelize some update operations on an Oracle
database. However, I could not find a way to issue UPDATE statements (UPDATE
Employee WHERE ???), use transactions, or call stored procedures from
Spark/JDBC.
Has anyone had this use case before, and how did you solve it?
Thanks,
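One common workaround (not a built-in Spark API) is to run the DML yourself from the executors with plain JDBC inside foreachPartition, batching the statements and committing per partition. A rough sketch, with an invented table, columns, and connection details; `updatesDF` is assumed to be a DataFrame holding the new values:
```
import java.sql.DriverManager
import org.apache.spark.sql.Row

val url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"   // placeholder connection string

updatesDF.coalesce(8).foreachPartition { (rows: Iterator[Row]) =>   // coalesce limits concurrent DB connections
  val conn = DriverManager.getConnection(url, "user", "password")
  conn.setAutoCommit(false)                        // one transaction per partition
  val stmt = conn.prepareStatement(
    "UPDATE employee SET salary = ? WHERE emp_id = ?")
  try {
    rows.foreach { r =>
      stmt.setDouble(1, r.getAs[Double]("salary"))
      stmt.setLong(2, r.getAs[Long]("emp_id"))
      stmt.addBatch()
    }
    stmt.executeBatch()
    conn.commit()
  } catch {
    case e: Exception => conn.rollback(); throw e
  } finally {
    stmt.close()
    conn.close()
  }
}
```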
Hi,
Just a quick clarification question: from what I understand, blocks in a batch
together form a single RDD which is partitioned (usually using the
HashPartitioner) across multiple tasks. First, is this correct? Second, the
partitioner is called every single time a new task is created. Is the
Thanks, Vadim. But I am looking for an API in Dataset, DataFrame,
DataFrameWriter, etc. The way you suggested can be done via a query like
spark.sql(""" ALTER TABLE `table` ADD PARTITION (partcol=1) LOCATION
'/path/to/your/dataset' """), and before that I write it to the specified
location first.
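For what it's worth, a sketch of that two-step workflow, assuming an existing SparkSession `spark` and DataFrame `df` (paths and names are the same placeholders as above):
```
// Write the data for the new partition to its own directory first...
df.write.mode("overwrite").parquet("/path/to/your/dataset/partcol=1")

// ...then register that directory as a static partition of the table.
spark.sql("""
  ALTER TABLE `table` ADD IF NOT EXISTS PARTITION (partcol=1)
  LOCATION '/path/to/your/dataset/partcol=1'
""")
```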
This should work:
```
ALTER TABLE `table` ADD PARTITION (partcol=1) LOCATION
'/path/to/your/dataset'
```
On Wed, Jul 19, 2017 at 6:13 PM, ctang wrote:
> I wonder if there are any easy ways (or APIs) to insert a dataframe (or
> DataSet), which does not contain the partition columns, as a static
>
On Thu, Jul 20, 2017 at 7:51 PM, ayan guha wrote:
> It depends on your need. There are clear instructions around how to run
> mvn with specific Hive and Hadoop bindings. However, if you are starting
> out, I suggest you use the prebuilt ones.
>
Hi Ayan,
I am setting up Apache Spark with Cassandra
It depends on your need. There are clear instructions around how to run mvn
with specific Hive and Hadoop bindings. However, if you are starting out, I
suggest you use the prebuilt ones.
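For example, a typical from-source build with Hive support and an explicit Hadoop profile looks roughly like this (check the "Building Spark" page for the profiles that match your version):
```
./build/mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean package
```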
On Fri, 21 Jul 2017 at 12:17 am, Kaushal Shriyan
wrote:
> On Thu, Jul 20, 2017 at 7:42 PM, ayan guha wrote:
>
On Thu, Jul 20, 2017 at 7:42 PM, ayan guha wrote:
> You should download a pre-built version. What you have is the source
> code; you need to build it to generate the jar files.
>
>
Hi Ayan,
Can you please help me understand how to build it to generate the jar files?
Regards,
Kaushal
You should download a pre-built version. What you have is the source code;
you need to build it to generate the jar files.
On Thu, 20 Jul 2017 at 10:35 pm, Kaushal Shriyan
wrote:
> Hi,
>
> I have downloaded spark-2.2.0.tgz on CentOS 7.x and when I invoke
> /opt/spark-2.2.0/sbin/start-master.
Hello All,
Our Spark applications are designed to process HDFS files (Hive external
tables).
We recently modified the Hive file size by setting the following parameters to
ensure that files have an average size of 512 MB:
set hive.merge.mapfiles=true
set hive.merge.mapredfiles=true
se
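For context, a typical complete set of merge settings for a ~512 MB target looks like the sketch below; the two size values are assumptions on my part, not taken from the original message:
```
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.size.per.task=536870912;       -- target size (512 MB) for merged files
set hive.merge.smallfiles.avgsize=268435456;  -- merge when the average file size falls below this
```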
Hi,
I have downloaded spark-2.2.0.tgz on CentOS 7.x and when I invoke
/opt/spark-2.2.0/sbin/start-master.sh, I get
*Failed to find Spark jars directory
> (/opt/spark-2.2.0/assembly/target/scala-2.10/jars). You need to build
> Spark with the target "package" before running this program.*
I am
Has anyone faced the same kind of issue with Spark 2.0.1?
On Thu, Jul 20, 2017 at 2:08 PM, Chetan Khatri
wrote:
> Hello All,
> I am facing an issue with storing a DataFrame to a Hive table with
> partitioning; without partitioning it works fine.
>
> *Spark 2.0.1*
>
> finalDF.write.mode(SaveMode.Overwrite).pa
Hi
As the documentation says:
spark.python.worker.memory
Amount of memory to use per python worker process during aggregation, in
the same format as JVM memory strings (e.g. 512m, 2g). If the memory used
during aggregation goes above this amount, it will spill the data into
disks.
I search the con
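For completeness, this is an ordinary Spark conf, so it can be set at submit time; the value and script name below are just placeholders:
```
spark-submit --conf spark.python.worker.memory=1g my_job.py
```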
I am unable to register Solr Cloud as a data source in Spark 2.1.0.
Following the documentation at
https://github.com/lucidworks/spark-solr#import-jar-file-via-spark-shell, I
have used the 3.0.0.beta3 version.
The system path is displaying the added jar as
spark://172.31.208.1:55730/jars/spark-s
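For comparison, the README linked above reads Solr roughly like the sketch below once the jar is on the classpath; the zkhost and collection values here are placeholders:
```
val options = Map(
  "zkhost" -> "zk1:2181,zk2:2181/solr",   // ZooKeeper ensemble of the SolrCloud cluster
  "collection" -> "my_collection"
)
val solrDF = spark.read.format("solr").options(options).load()
solrDF.printSchema()
```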
Hello All,
I am facing an issue with storing a DataFrame to a Hive table with
partitioning; without partitioning it works fine.
*Spark 2.0.1*
finalDF.write.mode(SaveMode.Overwrite).partitionBy("week_end_date").saveAsTable(OUTPUT_TABLE.get)
and added the configuration below as well:
spark.sqlContext.setConf("h
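For reference, a sketch of the full partitioned write path with the Hive dynamic-partition settings that are commonly paired with it; whether these match the truncated setConf call above, or resolve the Spark 2.0.1 behaviour, is an assumption on my part:
```
import org.apache.spark.sql.SaveMode

spark.sqlContext.setConf("hive.exec.dynamic.partition", "true")
spark.sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

finalDF.write
  .mode(SaveMode.Overwrite)
  .partitionBy("week_end_date")
  .saveAsTable("output_db.output_table")   // placeholder for OUTPUT_TABLE.get
```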
weightCol sets the weight for each individual row of data (training
example). It does not set the initial coefficients.
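To make that concrete, a minimal sketch with Spark ML's LogisticRegression; the column names and trainingDF are illustrative:
```
import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression()
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setWeightCol("example_weight")   // per-row (per-example) weight column, not initial coefficients

val model = lr.fit(trainingDF)      // trainingDF must contain the "example_weight" column
```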
On Thu, 20 Jul 2017 at 10:22 Aseem Bansal wrote:
> Hi
>
> I had asked about this somewhere else too and was told that the weightCol
> method does that.
>
> On Thu, Jul 20, 2017 at 1
Hi
I had asked about this somewhere else too and was told that the weightCol
method does that.
On Thu, Jul 20, 2017 at 12:50 PM, Nick Pentreath
wrote:
> Currently it's not supported, but is on the roadmap: see
> https://issues.apache.org/jira/browse/SPARK-13025
>
> The most recent attempt is to star
Currently it's not supported, but is on the roadmap: see
https://issues.apache.org/jira/browse/SPARK-13025
The most recent attempt is to start with simple linear regression, as here:
https://issues.apache.org/jira/browse/SPARK-21386
On Thu, 20 Jul 2017 at 08:36 Aseem Bansal wrote:
> We were abl