Are you using Spark 2.3 or above?
See the documentation:
https://spark.apache.org/docs/latest/running-on-kubernetes.html
It looks like you do not need:
--conf spark.kubernetes.driver.podTemplateFile='/spark-pod-template.yaml' \
--conf spark.kubernetes.executor.podTemplateFile='/spark-pod-template.yaml'
To get a node-local read from Spark to Cassandra, one has to use a read
consistency level of LOCAL_ONE. For some use cases, this is not an
option. For example, if you need a read consistency level of
LOCAL_QUORUM, as many use cases demand, then you are not going to get a
node-local read.
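For illustration, a minimal sketch using the DataStax Spark Cassandra
Connector; the host, keyspace, and table names are made up, and
spark.cassandra.input.consistency.level is the connector's read-consistency
setting:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// LOCAL_ONE makes node-local reads possible; LOCAL_QUORUM does not
val conf = new SparkConf()
  .setAppName("cassandra-read")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.input.consistency.level", "LOCAL_ONE")
val sc = new SparkContext(conf)
val rows = sc.cassandraTable("my_keyspace", "my_table") // made-up keyspace/table
println(rows.count())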
Hi unk1102,
Try adding more memory to your nodes. Are you running Spark in the cloud?
If so, increase the memory on your servers.
Do you have default parallelism set (spark.default.parallelism)? If so,
unset it, and let Spark decide how many partitions to allocate.
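As a sketch (the memory value is just an example): leaving
spark.default.parallelism unset lets Spark derive partition counts from the
input, e.g., from the number of HDFS blocks.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("memory-example")
  .set("spark.executor.memory", "8g") // example value; size it to your nodes
// Note: spark.default.parallelism is deliberately left unset
val sc = new SparkContext(conf)
println(sc.defaultParallelism) // what Spark chose on its own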
You can also try refactoring
Hi Ajay,
Are you trying to save to your local file system or to HDFS?
// This would save to HDFS under "/user/hadoop/counter"
counter.saveAsTextFile("/user/hadoop/counter");
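If you want the local file system instead, spell out the URI scheme
(example paths; note that file:// output lands on each worker's local disk,
not just the driver's):
// Explicit schemes (example paths):
counter.saveAsTextFile("hdfs:///user/hadoop/counter");   // HDFS
counter.saveAsTextFile("file:///tmp/counter");           // local disk of each worker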
David
On Sun, Aug 30, 2015 at 11:21 AM, Ajay Chander wrote:
> Hi Everyone,
>
> Recently we have installed spark on yarn
This is likely due to data skew. If you are using key-value pairs, one key
has far more records than the other keys. Do you have any groupBy
operations?
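A quick way to check is to count records per key and look at the heaviest
keys. A minimal sketch (the sample data is a stand-in for your data set):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("skew-check"))
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("a", 3), ("b", 1))) // stand-in
// reduceByKey aggregates map-side, so counting per key is cheap
val countsPerKey = pairs.mapValues(_ => 1L).reduceByKey(_ + _)
countsPerKey.sortBy(_._2, ascending = false).take(10).foreach(println)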
David
On Tue, Jul 14, 2015 at 9:43 AM, shahid wrote:
> hi
>
> I have a 10 node cluster. I loaded the data onto hdfs, so the no. of
> partitions
You can certainly query over 4 TB of data with Spark. However, you will
get an answer in minutes or hours, not in milliseconds or seconds. OLTP
databases are used for web applications, and typically return responses in
milliseconds. Analytic databases tend to operate on large data sets, and
return results in minutes or hours.
I am having the same problem reading JSON. There does not seem to be a way
of selecting a field that has a space, such as "Executor Info" from the
Spark event logs. I suggest that we open a JIRA ticket to address this
issue.
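For illustration only (the path is a placeholder, and sc is an existing
SparkContext): backtick-quoting is the standard workaround one would try in
Spark SQL, though this thread is about cases where it does not help:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val logs = sqlContext.read.json("/tmp/spark-events/app-logs.json") // placeholder path
logs.registerTempTable("logs")
sqlContext.sql("SELECT `Executor Info` FROM logs").show()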
On Jun 2, 2015 10:08 AM, "ayan guha" wrote:
> I would think the easiest way would be
Does anyone know in which version of Spark there will be support for
ORC files via spark.sql.hive? Will it be in 1.4?
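For reference, the kind of usage in question would look roughly like this
(a sketch, assuming ORC comes in as a data source through HiveContext; the
path is a placeholder and sc is an existing SparkContext):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// Read ORC files through the Hive-backed data source
val events = hiveContext.read.format("orc").load("/data/events.orc")
events.printSchema()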
David
rame.
>> >
>> > So people.toDF.registerTempTable("people") should work
>> >
>> > On Sat, Mar 14, 2015 at 5:33 PM, David Mitchell <
>> jdavi
I am pleased with the release of the DataFrame API. However, I started
playing with it, and neither of the two main examples in the documentation
works: http://spark.apache.org/docs/1.3.0/sql-programming-guide.html
Specifically:
- Inferring the Schema Using Reflection (sketched below)
- Programmatically Specifying the Schema
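For concreteness, a sketch of the reflection-based example from that guide,
using the sample file the docs reference (sc is an existing SparkContext):

import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Load a text file, convert each line to a Person, then to a DataFrame
val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))
  .toDF()
people.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19").show()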