Re: Num of executors and cores

2016-07-26 Thread Mail.com
, > Jacek Laskowski > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > >> On Tue, Jul 26, 2016 at 2:39 PM, Mail.com wrote: >> More of jars and files and app

Re: Num of executors and cores

2016-07-26 Thread Mail.com
com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > >> On Tue, Jul 26, 2016 at 2:18 AM, Mail.com wrote: >> Hi All, >> >> I have a directory which has 12 files. I want to rea

Re: spark context stop vs close

2016-07-25 Thread Mail.com
ow me at https://twitter.com/jaceklaskowski >> >> >>> On Sat, Jul 23, 2016 at 3:11 PM, Mail.com wrote: >>> Hi All, >>> >>> Where should we us spark context stop vs close. Should we stop the context >>> first and then close. >>

Num of executors and cores

2016-07-25 Thread Mail.com
Hi All, I have a directory which has 12 files. I want to read each file whole, so I am reading them as wholeTextFiles(dirpath, numPartitions). I run spark-submit with --num-executors 12 --executor-cores 1 and numPartitions 12. However, when I run the job I see that the stage which reads the direct
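
A minimal Java sketch of that kind of read, assuming a hypothetical HDFS path. Note that the numPartitions argument to wholeTextFiles is only a hint (whole files are packed into splits by size), so an explicit repartition after the read is one way to spread the 12 files across 12 single-core executors:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class WholeTextFilesExample {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("whole-text-files"));

        // minPartitions is only a hint: Spark packs whole files into splits by size,
        // so the stage that reads the directory may end up with fewer tasks than requested.
        JavaPairRDD<String, String> files = sc.wholeTextFiles("hdfs:///data/xml-dir", 12);

        // Repartitioning after the read is one way to spread the downstream work
        // across all 12 executors, at the cost of a shuffle.
        JavaPairRDD<String, String> spread = files.repartition(12);

        System.out.println("partitions = " + spread.partitions().size());
        sc.stop();
      }
    }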

spark context stop vs close

2016-07-23 Thread Mail.com
Hi All, Where should we use spark context stop vs close? Should we stop the context first and then close it? Are there general guidelines around this? When I stop and later try to close I get an "RPC already closed" error. Thanks, Pradeep --
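
For reference, a minimal Java sketch under the assumption (true for the 1.x Java API as far as I know) that JavaSparkContext implements java.io.Closeable and close() simply delegates to stop(), so calling either one once should be enough:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class StopVsClose {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("stop-vs-close");
        // JavaSparkContext implements java.io.Closeable, so try-with-resources
        // calls close() automatically; close() delegates to stop().
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
          jsc.parallelize(java.util.Arrays.asList(1, 2, 3)).count();
        }
        // Equivalent explicit form: jsc.stop(); calling both stop() and close()
        // is unnecessary - one of them is enough.
      }
    }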

Re: How to connect HBase and Spark using Python?

2016-07-22 Thread Mail.com
Hbase Spark module will be available with Hbase 2.0. Is that out yet? > On Jul 22, 2016, at 8:50 PM, Def_Os wrote: > > So it appears it should be possible to use HBase's new hbase-spark module, if > you follow this pattern: > https://hbase.apache.org/book.html#_sparksql_dataframes > > Unfortuna

Spark Streaming - Direct Approach

2016-07-11 Thread Mail.com
Hi All, Can someone please confirm whether the direct approach for reading Kafka in streaming is still experimental or whether it can be used in production. I see the documentation and a talk from TD suggesting the advantages of the approach, but the docs state it is an "experimental" feature. Please suggest. Than
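
For context, a minimal Java sketch of the direct approach with the 0.8 connector (spark-streaming-kafka); the broker list and topic name below are hypothetical:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class DirectStreamExample {
      public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("direct-kafka");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "broker1:9092,broker2:9092"); // hypothetical brokers
        Set<String> topics = new HashSet<>(Arrays.asList("events"));          // hypothetical topic

        // Direct (receiver-less) stream: one RDD partition per Kafka partition.
        JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
            jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
            kafkaParams, topics);

        stream.count().print();

        jssc.start();
        jssc.awaitTermination();
      }
    }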

Running streaming applications in Production environment

2016-06-14 Thread Mail.com
Hi All, Can you please advise on best practices for running streaming jobs in production that read from Kafka? How do we trigger them - through a start script? - and what are the best ways to monitor that the application is running and send an alert when it is down, etc.? Thanks, Pradeep --

Re: Kafka connection logs in Spark

2016-05-26 Thread Mail.com
fka 0.10 > >> On Wed, May 25, 2016 at 9:41 PM, Mail.com wrote: >> Hi All, >> >> I am connecting Spark 1.6 streaming to Kafka 0.8.2 with Kerberos. I ran >> spark streaming in debug mode, but do not see any log saying it connected to >> Kafka or topic etc

Kafka connection logs in Spark

2016-05-25 Thread Mail.com
Hi All, I am connecting Spark 1.6 streaming to Kafka 0.8.2 with Kerberos. I ran spark streaming in debug mode, but I do not see any log saying it connected to Kafka or the topic, etc. How could I enable that? My spark streaming job runs but no messages are fetched from the RDD. Please suggest. Th
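
One way to surface the Kafka consumer logs, sketched in Java with the log4j 1.x API that Spark 1.x bundles. The same effect is normally achieved with log4j.properties entries such as log4j.logger.kafka=DEBUG shipped to the driver and executors; the snippet below is just the programmatic equivalent for the driver JVM:

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class KafkaLogLevels {
      public static void main(String[] args) {
        // Raise log verbosity for the Kafka 0.8 consumer classes;
        // "kafka" covers kafka.consumer, kafka.client, etc.
        Logger.getLogger("kafka").setLevel(Level.DEBUG);
        Logger.getLogger("org.apache.spark.streaming.kafka").setLevel(Level.DEBUG);
        // ... build the StreamingContext and start the job after this ...
      }
    }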

Re: rpc.RpcTimeoutException: Futures timed out after [120 seconds]

2016-05-20 Thread Mail.com
Yes. Sent from my iPhone > On May 20, 2016, at 10:11 AM, Sahil Sareen wrote: > > I'm not sure if this happens on small files or big ones as I have a mix of > them always. > Did you see this only for big files? > >> On Fri, May 20, 2016 at 7:36 PM, Mail.com wrot

Re: rpc.RpcTimeoutException: Futures timed out after [120 seconds]

2016-05-20 Thread Mail.com
Hi Sahil, I have seen this with high GC time. Do you ever get this error with small volume files? Pradeep > On May 20, 2016, at 9:32 AM, Sahil Sareen wrote: > > Hey all > > I'm using Spark-1.6.1 and occasionally seeing executors lost and hurting my > application performance due to these erro
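
A hedged sketch of the usual mitigation, raising the network/RPC timeouts in the SparkConf so long GC pauses are less likely to surface as "Futures timed out after [120 seconds]"; the values below are hypothetical examples, not recommendations:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class TimeoutConf {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("timeout-tuning")
            .set("spark.network.timeout", "300s")           // default 120s
            .set("spark.executor.heartbeatInterval", "30s"); // keep well below the timeout
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job body ...
        sc.stop();
      }
    }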

Re: KafkaUtils.createDirectStream Not Fetching Messages with Confluent Serializers as Value Decoder.

2016-05-19 Thread Mail.com
opic name, KafkaUtils doesn't > fetch any messages. So, check you have specified the topic name correctly. > > ~Muthu > ____ > From: Mail.com [pradeep.mi...@mail.com] > Sent: Monday, May 16, 2016 9:33 PM > To: Ramaswamy, Muthuraman

Re: Filter out the elements from xml file in Spark

2016-05-19 Thread Mail.com
Hi Yogesh, Can you try a map operation and extract what you need with whatever parser you are using? You could also look at the spark-xml package. Thanks, Pradeep > On May 19, 2016, at 4:39 AM, Yogesh Vyas wrote: > > Hi, > I had xml files which I am reading through textFileStream, and then > filtering out
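
A minimal Java sketch of the spark-xml route, assuming the com.databricks:spark-xml package is on the classpath; the row tag, column name and path are hypothetical:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class XmlFilterExample {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("xml-filter"));
        SQLContext sqlContext = new SQLContext(sc);

        // "record" and "status" are hypothetical; use the row tag and fields of your XML.
        DataFrame df = sqlContext.read()
            .format("com.databricks.spark.xml")
            .option("rowTag", "record")
            .load("hdfs:///data/input.xml");

        // Filter out the elements you do not need with a plain DataFrame filter.
        DataFrame filtered = df.filter(df.col("status").equalTo("ACTIVE"));
        filtered.show();

        sc.stop();
      }
    }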

Re: KafkaUtils.createDirectStream Not Fetching Messages with Confluent Serializers as Value Decoder.

2016-05-18 Thread Mail.com
Adding back users. > On May 18, 2016, at 11:49 AM, Mail.com wrote: > > Hi Uladzimir, > > I run is as below. > > Spark-submit --class com.test --num-executors 4 --executor-cores 5 --queue > Dev --master yarn-client --driver-memory 512M --executor-memory 512M test.ja

Re: KafkaUtils.createDirectStream Not Fetching Messages with Confluent Serializers as Value Decoder.

2016-05-16 Thread Mail.com
Hi Muthu, Are you on Spark 1.4.1 and Kafka 0.8.2? I have a similar issue even for simple string messages. The console producer and consumer work fine, but Spark always returns an empty RDD. I am using the receiver-based approach. Thanks, Pradeep > On May 16, 2016, at 8:19 PM, Ramaswamy, Muthuraman > w
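
For comparison, a minimal Java sketch of the receiver-based approach with the 0.8 connector; the ZooKeeper quorum, consumer group and topic are hypothetical:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class ReceiverStreamExample {
      public static void main(String[] args) throws Exception {
        JavaStreamingContext jssc =
            new JavaStreamingContext(new SparkConf().setAppName("receiver-kafka"), Durations.seconds(10));

        Map<String, Integer> topics = new HashMap<>();
        topics.put("events", 1); // topic -> number of receiver threads (hypothetical topic)

        JavaPairReceiverInputDStream<String, String> stream =
            KafkaUtils.createStream(jssc, "zk1:2181,zk2:2181", "my-consumer-group", topics);

        // An always-empty count here usually means the receiver is not actually
        // consuming (wrong ZK quorum, topic name, or too few cores for the receiver).
        stream.count().print();

        jssc.start();
        jssc.awaitTermination();
      }
    }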

Re: Executors and Cores

2016-05-15 Thread Mail.com
rdpress.com > > >> On 15 May 2016 at 13:19, Mail.com wrote: >> Hi , >> >> I have seen multiple videos on spark tuning which shows how to determine # >> cores, #executors and memory size of the job. >> >> In all that I have seen, it seems each job h

Executors and Cores

2016-05-15 Thread Mail.com
Hi, I have seen multiple videos on spark tuning which show how to determine the number of cores, executors and memory size for a job. In all that I have seen, it seems each job has to be given the max resources allowed in the cluster. How do we factor in input size as well? I am processing a 1gb compr

Spark 1.4.1 + Kafka 0.8.2 with Kerberos

2016-05-13 Thread Mail.com
Hi All, I am trying to get Spark 1.4.1 (Java) to work with Kafka 0.8.2 in a Kerberos-enabled cluster (HDP 2.3.2). Is there any document I can refer to? Thanks, Pradeep

Re: XML Processing using Spark SQL

2016-05-12 Thread Mail.com
Hi Arun, Could you try using StAX or JAXB? Thanks, Pradeep > On May 12, 2016, at 8:35 PM, Hyukjin Kwon wrote: > > Hi Arunkumar, > > > I guess your records are self-closing ones. > > There is an issue open here, https://github.com/databricks/spark-xml/issues/92 > > This is about XmlInputFor
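
A small StAX sketch of the kind of per-record parsing being suggested; the element name is hypothetical, and in a Spark job a helper like this would typically be called inside a map over the XML record strings:

    import java.io.StringReader;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class StaxSketch {
      // Pull a single (hypothetical) "id" element value out of one XML record.
      static String extractId(String xml) throws Exception {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
          while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                && "id".equals(reader.getLocalName())) {
              return reader.getElementText();
            }
          }
          return null;
        } finally {
          reader.close();
        }
      }

      public static void main(String[] args) throws Exception {
        System.out.println(extractId("<record><id>42</id><name>x</name></record>")); // prints 42
      }
    }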

Re: Spark-csv- partitionBy

2016-05-10 Thread Mail.com
use coalesce(1) (the spelling is wrong) and after that > do the partitions. > > Regards, > Gourav > >> On Mon, May 9, 2016 at 7:12 PM, Mail.com wrote: >> Hi, >> >> I have to write tab delimited file and need to have one directory for each >> unique value

Spark-csv- partitionBy

2016-05-09 Thread Mail.com
Hi, I have to write a tab-delimited file and need to have one directory for each unique value of a column. I tried using spark-csv with partitionBy and it seems it is not supported. Is there any other option available for doing this? Regards, Pradeep
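
One possible workaround, sketched in Java: collect the distinct values of the partition column and write one tab-delimited output directory per value with spark-csv. The column name, source and output paths are hypothetical, and each loop iteration launches a separate job, so this only makes sense for a modest number of distinct values:

    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SQLContext;

    public class WritePerValue {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("tsv-per-value"));
        SQLContext sqlContext = new SQLContext(sc);

        DataFrame df = sqlContext.read().parquet("hdfs:///data/input"); // hypothetical source

        // One tab-delimited directory per distinct value of the (hypothetical) "country" column.
        List<Row> values = df.select("country").distinct().collectAsList();
        for (Row r : values) {
          String v = r.getString(0);
          df.filter(df.col("country").equalTo(v))
            .write()
            .format("com.databricks.spark.csv")
            .option("delimiter", "\t")
            .save("hdfs:///data/out/country=" + v);
        }
        sc.stop();
      }
    }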

Re: Error in spark-xml

2016-05-01 Thread Mail.com
Can you try creating your own schema and using it to read the XML? I had a similar issue but resolved it with a custom schema, specifying each attribute in it. Pradeep > On May 1, 2016, at 9:45 AM, Hyukjin Kwon wrote: > > To be more clear, > > If you set the rowTag as "
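
A minimal Java sketch of reading with an explicit schema instead of inference; the field names, row tag and attribute prefix below are hypothetical and depend on the spark-xml version/options:

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class XmlWithSchema {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("xml-schema"));
        SQLContext sqlContext = new SQLContext(sc);

        // Hypothetical fields; XML attributes are exposed by spark-xml with a
        // prefix ("@" or "_") depending on the attributePrefix option/version.
        StructType schema = DataTypes.createStructType(Arrays.asList(
            DataTypes.createStructField("_id", DataTypes.StringType, true),
            DataTypes.createStructField("name", DataTypes.StringType, true),
            DataTypes.createStructField("amount", DataTypes.DoubleType, true)));

        DataFrame df = sqlContext.read()
            .format("com.databricks.spark.xml")
            .option("rowTag", "record")   // hypothetical row tag
            .schema(schema)               // skip inference, use the declared schema
            .load("hdfs:///data/input.xml");

        df.printSchema();
        sc.stop();
      }
    }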

Re: JavaSparkContext.wholeTextFiles read directory

2016-04-26 Thread Mail.com
is designed for that. > > - Harjit >> On Apr 26, 2016, at 7:19 PM, Mail.com wrote: >> >> >> Hi All, >> I am reading entire directory of gz XML files with wholeTextFiles. >> >> I understand as it is gz and with wholeTextFiles the individual files are

JavaSparkContext.wholeTextFiles read directory

2016-04-26 Thread Mail.com
Hi All, I am reading an entire directory of gz XML files with wholeTextFiles. I understand that because the files are gzipped, and with wholeTextFiles, the individual files are not splittable, but why is the entire directory read by one executor in a single task? I have provided the number of executors as the number of files in that

Create tab separated file from a dataframe spark 1.4 with Java

2016-04-21 Thread Mail.com
Hi, I have a dataframe and need to write it to a tab-separated file using Spark 1.4 and Java. Can someone please suggest? Thanks, Pradeep
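
A minimal Java sketch of one way to do this without any extra packages: map each Row to a tab-joined string and save it as text. The helper and path names are arbitrary:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;

    public class TsvWriter {
      // Turn each Row into one tab-separated line and write with saveAsTextFile.
      public static void writeTsv(DataFrame df, String outputPath) {
        JavaRDD<String> lines = df.javaRDD().map(new Function<Row, String>() {
          @Override
          public String call(Row row) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < row.length(); i++) {
              if (i > 0) sb.append('\t');
              sb.append(row.isNullAt(i) ? "" : row.get(i));
            }
            return sb.toString();
          }
        });
        lines.saveAsTextFile(outputPath);
      }
    }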

Re: spark on yarn

2016-04-20 Thread Mail.com
I get an error with a message that states the max number of cores allowed. > On Apr 20, 2016, at 11:21 AM, Shushant Arora > wrote: > > I am running a spark application on a yarn cluster. > > Say I have available vcores in the cluster as 100. And I start the spark application > with --num-executors 20

Re: Parse XML using java spark

2016-04-18 Thread Mail.com
You might look at using JAXB or StAX. If it is simple enough, use the DataFrame's auto-generated schema. Pradeep > On Apr 18, 2016, at 6:37 PM, Jinan Alhajjaj wrote: > > Thank you for your help. > I would like to parse the XML file using Java, not Scala. Can you please > provide me with an example
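
A small JAXB sketch of the kind of Java-side parsing being suggested; the record class and element names are hypothetical, and in a Spark job the unmarshalling would typically happen inside a map over XML record strings:

    import java.io.StringReader;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.annotation.XmlRootElement;

    public class JaxbSketch {
      // Hypothetical record type matching <record><id>..</id><name>..</name></record>.
      @XmlRootElement(name = "record")
      public static class Record {
        public int id;
        public String name;
      }

      public static void main(String[] args) throws Exception {
        Unmarshaller u = JAXBContext.newInstance(Record.class).createUnmarshaller();
        // Unmarshal one XML record string into the annotated Java class.
        Record r = (Record) u.unmarshal(
            new StringReader("<record><id>7</id><name>abc</name></record>"));
        System.out.println(r.id + " " + r.name); // prints: 7 abc
      }
    }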