Hi,
I want to know: when I create a Dataset by reading files from HDFS in Spark SQL,
like Dataset<Row> user = spark.read().format("json").load(filePath), what
determines the number of partitions of the Dataset?
And what if filePath is a directory instead of a single file?
Why can't we get the partitions n
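For reference, a minimal sketch (not from the thread, assuming Spark 2.x and an illustrative HDFS path) of how to check the partition count after a file-based read; for file sources it is driven by the input file sizes and settings such as spark.sql.files.maxPartitionBytes:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partition-count").getOrCreate()

// Illustrative path; it can be a single file or a directory on HDFS.
val users = spark.read.format("json").load("hdfs:///data/users/")

// For file-based sources Spark splits the input by file size (governed by
// spark.sql.files.maxPartitionBytes and related settings), so the resulting
// partition count can be inspected on the underlying RDD:
println(s"partitions = ${users.rdd.getNumPartitions}")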
That is very useful~~ :-)
On Fri, May 18, 2018 at 11:56 AM, ShaoFeng Shi wrote:
> Hello, Kylin and Spark users,
>
> A doc was newly added to the Apache Kylin website on how to use Kylin as a
> data source in Spark;
> This can help users who want to use Spark to analyze the aggregated
> Cube d
Hi guys,
What's the best way to create a feature column with Weight of Evidence
calculated for categorical columns against a target column (both binary and
multi-class)?
Any insight?
Thanks,
Aakash.
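For reference, a minimal sketch (not from the thread) of one way to compute Weight of Evidence for a categorical column against a binary 0/1 target using DataFrame aggregations; the column names and the smoothing constant are assumptions, and a multi-class target would typically be handled one-vs-rest per class:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// WoE(c) = ln( P(category = c | label = 1) / P(category = c | label = 0) )
def weightOfEvidence(df: DataFrame, categoryCol: String, labelCol: String): DataFrame = {
  val eps = 0.5  // simple smoothing to avoid log(0) / division by zero (assumed constant)

  val totals = df.agg(
    sum(col(labelCol)).as("totalPos"),
    sum(lit(1) - col(labelCol)).as("totalNeg"))

  df.groupBy(col(categoryCol))
    .agg(
      sum(col(labelCol)).as("pos"),
      sum(lit(1) - col(labelCol)).as("neg"))
    .crossJoin(totals)
    .withColumn("woe",
      log(((col("pos") + eps) / col("totalPos")) /
          ((col("neg") + eps) / col("totalNeg"))))
    .select(col(categoryCol), col("woe"))
}

The resulting per-category WoE table can then be joined back onto the original DataFrame to produce the feature column.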
Hi,
please try to reduce the default heap size on the machine you use to submit
applications.
For example:
export _JAVA_OPTIONS="-Xmx512M"
The submitter, which is also a JVM, does not need to reserve a lot of memory.
Wei
Hi all,
My company just approved for some of us to go to Spark Summit in SF
this year. Unfortunately, the day-long workshops on Monday are now sold out.
We are considering what we might do instead.
Have others done the half-day certification course before? Is it worth
considering? Does it cover
I already gave my recommendation in my very first reply to this thread...
On Fri, May 25, 2018 at 10:23 AM, raksja wrote:
> ok, when to use what?
> do you have any recommendation?
ok, when to use what?
do you have any recommendation?
On Fri, May 25, 2018 at 10:18 AM, raksja wrote:
> InProcessLauncher would just start a subprocess as you mentioned earlier.
No. As the name says, it runs things in the same process.
--
Marcelo
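For reference, a minimal sketch (not from the thread) of launching an application from within the current JVM using org.apache.spark.launcher.InProcessLauncher (available since Spark 2.3); the jar path, main class, and deploy mode here are assumptions:

import org.apache.spark.launcher.{InProcessLauncher, SparkAppHandle}

// Submit from within this JVM instead of forking a spark-submit subprocess.
val handle: SparkAppHandle = new InProcessLauncher()
  .setAppResource("/path/to/app.jar")        // illustrative path
  .setMainClass("com.example.MyApp")         // illustrative class name
  .setMaster("yarn")
  .setDeployMode("cluster")
  .startApplication(new SparkAppHandle.Listener {
    // simple listener that just logs state transitions
    override def stateChanged(h: SparkAppHandle): Unit =
      println(s"state: ${h.getState}")
    override def infoChanged(h: SparkAppHandle): Unit = ()
  })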
When you say Spark uses it, did you mean this:
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala?
InProcessLauncher would just start a subprocess, as you mentioned earlier.
How about this, does this make a REST API call to yar
That's what Spark uses.
On Fri, May 25, 2018 at 10:09 AM, raksja wrote:
> thanks for the reply.
>
> Have you tried submitting a Spark job directly to YARN using YarnClient?
> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html
>
> Not sure whether it's performant and scalable?
Thanks for the reply.
Have you tried submitting a Spark job directly to YARN using YarnClient?
https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html
Not sure whether it's performant and scalable?
I am not sure about Redshift, but I know the target table is not partitioned.
But we should be able to just insert into a non-partitioned remote table from 12
clients concurrently, right?
Even if, let's say, Redshift doesn't allow concurrent writes, then the Spark driver
will detect this and coordinatin
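For reference, a minimal sketch (not from the thread) of how a DataFrame is typically written to a remote database over JDBC; the number of output partitions controls how many concurrent connections the executors open. Here df is the DataFrame being written, and the credentials, URL, table name, and partition count are placeholders (a suitable JDBC driver is assumed to be on the classpath):

import java.util.Properties

val props = new Properties()
props.setProperty("user", "dbuser")        // illustrative credentials
props.setProperty("password", "secret")

// Each of the 12 output partitions is written by a separate task, so the
// target database sees up to 12 concurrent connections.
df.repartition(12)
  .write
  .mode("append")
  .option("batchsize", "10000")            // rows per JDBC batch insert
  .jdbc("jdbc:redshift://example-host:5439/dev", "public.target_table", props)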
Can your database receive the writes concurrently? I.e., do you make sure that
each executor writes into a different partition on the database side?
> On 25. May 2018, at 16:42, Yong Zhang wrote:
>
> Spark version 2.2.0
>
>
> We are trying to write a DataFrame to a remote relational database (AWS
Ajay, you can use Sqoop if you want to ingest data into HDFS. This is a POC where
the customer wants to prove that Spark ETL would be faster than C#-based raw
SQL statements. That's all. There are no timestamp-based columns in the source
tables to make it an incremental load.
On Thu, May 24, 2018 at 1:08 AM, ayan
Hi Jacek,
This is exactly what I'm looking for. Thanks!!
Also thanks for the link. I just noticed that I can unfold the link for
trigger and see the examples in Java and Scala - a great
help for a newcomer :-)
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spa
Hi Peter,
> Basically I need to find a way to set the batch interval in (b), similar
> to (a) below.
That's the trigger method on DataStreamWriter.
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.streaming.DataStreamWriter
import org.apache.spark.sql.streaming.Trigger
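For reference, a minimal sketch (not from the original reply) of setting a processing-time trigger on a streaming query; the socket source and console sink are purely illustrative:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("trigger-sketch").getOrCreate()

// Illustrative source; any streaming source behaves the same way.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// trigger() plays the role of the batch interval in Structured Streaming:
// here a new micro-batch starts every 10 seconds.
val query = lines.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()

query.awaitTermination()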