Re: Submitting Spark application through code

2014-11-02 Thread Marius Soutier
Just a wild guess, but I had to exclude "javax.servlet.servlet-api" from my Hadoop dependencies to run a SparkContext. In your build.sbt: "org.apache.hadoop" % "hadoop-common" % "..." exclude("javax.servlet", "servlet-api"), "org.apache.hadoop" % "hadoop-hdfs" % "..." exclude("javax.servlet",
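
A fuller version of that build.sbt fragment might look like the following sketch; the version numbers are placeholders, not taken from the thread:

    // build.sbt -- exclude the old servlet-api that clashes with Spark's Jetty
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core"    % "1.1.0",
      "org.apache.hadoop" %  "hadoop-common" % "2.4.0" exclude("javax.servlet", "servlet-api"),
      "org.apache.hadoop" %  "hadoop-hdfs"   % "2.4.0" exclude("javax.servlet", "servlet-api")
    )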

Spark on Yarn probably trying to load all the data to RAM

2014-11-02 Thread jan.zikes
Hi, I am using Spark on Yarn, particularly Spark in Python. I am trying to run: myrdd = sc.textFile("s3n://mybucket/files/*/*/*.json") myrdd.getNumPartitions() Unfortunately it seems that Spark tries to load everything into RAM, or at least after a while of running this everything slows down and t
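
For reference, textFile itself is lazy, so counting partitions alone should not pull data into RAM; a Scala sketch of the equivalent call (the bucket path and partition hint are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("count-partitions"))
    // textFile is lazy; the optional minPartitions hint controls the split count
    val myrdd = sc.textFile("s3n://mybucket/files/*/*/*.json", 1000)
    println(myrdd.partitions.length) // computed from the file listing, not the data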

Re: Spark speed performance

2014-11-02 Thread jan.zikes
Thank you, I would expect it to work as you write, but I am probably experiencing it working the other way. But now it seems that Spark is generally trying to fit everything into RAM. I run Spark on YARN and I have wrapped this into another question: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: Spark SQL : how to find element where a field is in a given set

2014-11-02 Thread Rishi Yadav
Did you create SQLContext? On Sat, Nov 1, 2014 at 7:51 PM, abhinav chowdary wrote: > I have the same requirement of passing a list of values to an in clause; when I am > trying to do it > > I am getting the below error > > scala> val longList = Seq[Expression]("a", "b") > :11: error: type mismatch; > found :
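
The usual fix for that type mismatch was to wrap the strings in Literal so they become Catalyst Expressions; a sketch against the 1.1-era DSL (the table and column names are made up, and the Symbol-to-attribute conversion is assumed to come from the SQLContext's implicit imports):

    import org.apache.spark.sql.catalyst.expressions.{Expression, In, Literal}

    // Plain strings are not Expressions; Literal(...) lifts them into the tree
    val longList = Seq[Expression](Literal("a"), Literal("b"))

    // 'name is resolved against the schema by the DSL's implicit conversions
    val matching = people.where(In('name, longList))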

Re: OOM with groupBy + saveAsTextFile

2014-11-02 Thread Bharath Ravi Kumar
Thanks for responding. This is what I initially suspected, and hence asked why the library needed to construct the entire value buffer on a single host before writing it out. The stacktrace appeared to suggest that user code is not constructing the large buffer. I'm simply calling groupBy and saveA

Re: properties file on a spark cluster

2014-11-02 Thread Akhil Das
The problem here is that when you run a Spark program in cluster mode, it will look for the file on the worker machine. The best approach would be to put the file in HDFS and use it instead of a local path. Another approach would be to create the same file in the same path on all worker machines and hopefull
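
A minimal sketch of the HDFS approach, loading the properties once on the driver (the path and helper are illustrative, not from the thread):

    import java.util.Properties
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.SparkContext

    def loadProps(sc: SparkContext, path: String): Properties = {
      val fs = FileSystem.get(sc.hadoopConfiguration)
      val in = fs.open(new Path(path))       // works for hdfs:// paths
      val props = new Properties()
      try props.load(in) finally in.close()
      props
    }

    // e.g. val props = loadProps(sc, "hdfs:///config/app.properties")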

Re: ExecutorLostFailure (executor lost)

2014-11-02 Thread Akhil Das
You can check the worker logs for more accurate information (they are found under the work directory inside the Spark directory). I used to hit this issue with: - Too many open files: increasing the ulimit would solve this issue - Akka connection timeout/framesize: setting the following while creat
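
Those Akka settings were typically applied on the SparkConf before the context is created; a sketch with placeholder values (the keys are the 1.x-era ones):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("my-app")
      .set("spark.akka.frameSize", "128") // MB; raise if tasks ship large results
      .set("spark.akka.timeout", "300")   // seconds; tolerate slow, loaded nodes
    val sc = new SparkContext(conf)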

Re: Cannot instantiate hive context

2014-11-02 Thread Akhil Das
Adding the libthrift jar to the classpath would resolve this issue. Thanks Best Regards On Sat, Nov 1, 2014 at 12:34 AM, Pala M Muthaia wrote: > Hi, > > I am trying to load hive datasets using HiveContext, in spark shell. Sp

Re: hadoop_conf_dir when running spark on yarn

2014-11-02 Thread Akhil Das
You can set HADOOP_CONF_DIR inside the spark-env.sh file. Thanks Best Regards On Sat, Nov 1, 2014 at 4:14 AM, ameyc wrote: > How do I set up hadoop_conf_dir correctly when I'm running my spark job on > Yarn? My Yarn environment has the correct hadoop_conf_dir settings by the > configuration that
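
Concretely, that is a single line in conf/spark-env.sh (the path below is just an example):

    export HADOOP_CONF_DIR=/etc/hadoop/conf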

Re: OOM with groupBy + saveAsTextFile

2014-11-02 Thread Sean Owen
saveAsText means "save every element of the RDD as one line of text". It works like TextOutputFormat in Hadoop MapReduce since that's what it uses. So you are causing it to create one big string out of each Iterable this way. On Sun, Nov 2, 2014 at 4:48 PM, Bharath Ravi Kumar wrote: > Thanks for
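
One way to avoid materializing each group as a single giant line is to skip the grouping entirely when the goal is one line of text per element; a sketch (the key and value types, and the output path, are illustrative):

    // Instead of rdd.groupByKey().saveAsTextFile(out), which renders the whole
    // Iterable per key as one enormous string, emit one line per pair:
    rdd.sortByKey()                       // optional: keeps a key's lines adjacent
       .map { case (k, v) => s"$k\t$v" }  // no single record grows with group size
       .saveAsTextFile("hdfs:///out/grouped")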

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-11-02 Thread ashu
Hi, sorry to bump this old thread. What is the state now? Is this problem solved? How does Spark handle categorical data now? Regards, Ashutosh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Prediction-using-Classification-with-text-attributes-in-Apac

Re: Prediction using Classification with text attributes in Apache Spark MLLib

2014-11-02 Thread Xiangrui Meng
This operation requires two transformers: 1) Indexer, which maps string features into categorical features; 2) OneHotEncoder, which flattens categorical features into binary features. We are working on the new dataset implementation, so we can easily express those transformations. Sorry for the late reply! If
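
Until those transformers landed, the two steps could be done by hand; a rough sketch that indexes and one-hot encodes a single string column (the helper below is not an MLlib API of the time, just an illustration):

    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.rdd.RDD

    def oneHot(col: RDD[String]): RDD[Vector] = {
      // 1) Indexer: map each distinct string to a categorical index
      val index: Map[String, Int] = col.distinct().collect().zipWithIndex.toMap
      // 2) OneHotEncoder: flatten each index into a sparse binary vector
      col.map(s => Vectors.sparse(index.size, Array(index(s)), Array(1.0)))
    }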

Spark Master Web UI showing "0 cores" in Completed Applications

2014-11-02 Thread Justin Yip
Hello, I have a question about the "Completed Applications" table on the Spark Master web UI page. For the column "Cores", it used to show the number of cores used in the application. However, after I added a line "sparkContext.stop()" at the end of my Spark app, it shows "0 cores". My application

How do I kill a job submitted with spark-submit

2014-11-02 Thread Steve Lewis
I see the job in the web interface but don't know how to kill it
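
Two era-appropriate options, depending on the deploy mode (the application and driver IDs below are placeholders):

    # On YARN, kill by application ID (from the RM UI or `yarn application -list`):
    yarn application -kill application_1414000000000_0001

    # Standalone cluster mode, kill by driver ID from the Master web UI:
    ./bin/spark-class org.apache.spark.deploy.Client kill spark://master:7077 driver-20141102120000-0000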

Re: hadoop_conf_dir when running spark on yarn

2014-11-02 Thread Amey Chaugule
I thought that only applied when you're trying to run a job using spark-submit or in the shell... On Sun, Nov 2, 2014 at 8:47 AM, Akhil Das wrote: > You can set HADOOP_CONF_DIR inside the spark-env.sh file > > Thanks > Best Regards > > On Sat, Nov 1, 2014 at 4:14 AM, ameyc wrote: > >> How do i

Do Spark executors restrict native heap vs JVM heap?

2014-11-02 Thread Paul Wais
Thanks Sean! My novice understanding is that the 'native heap' is the address space not allocated to the JVM heap, but I wanted to check to see if I'm missing something. It turned out my issue was actual memory pressure on the executor machine. There was space for the JVM heap but not mu

Spark SQL takes unexpected time

2014-11-02 Thread Shailesh Birari
Hello, I have written a Spark SQL application which reads data from HDFS and queries it. The data size is around 2GB (30 million records). The schema and query I am running are as below. The query takes around 5+ seconds to execute. I tried adding rdd.persist(StorageLevel.MEMORY_AND
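
For repeated SQL scans, the 1.1-era API also had a table-level columnar cache that usually beats persisting the raw row RDD; a sketch assuming an existing SparkContext sc (the table name, input RDD, and query are placeholders):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD       // implicit RDD[case class] -> SchemaRDD

    recordsRdd.registerTempTable("records") // recordsRdd: an RDD of case classes
    sqlContext.cacheTable("records")        // columnar in-memory cache

    // The first query pays the scan-and-cache cost; repeats should be faster
    val counts = sqlContext.sql("SELECT key, COUNT(*) FROM records GROUP BY key")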

Re: Does SparkSQL work with custom defined SerDe?

2014-11-02 Thread Chirag Aggarwal
Did https://issues.apache.org/jira/browse/SPARK-3807 fix the issue you were seeing? If yes, then please note that it shall be part of 1.1.1 and 1.2. Chirag From: Chen Song <chen.song...@gmail.com> Date: Wednesday, 15 October 2014 4:03 AM To: "user@spark.apache.org

Spark cluster stability

2014-11-02 Thread jatinpreet
Hi, I am running a small 6-node Spark cluster for testing purposes. Recently, one of the nodes' disks was filled up by temporary files and there was no space left. Due to this my Spark jobs started failing even though on the Spark Web UI the node was shown as 'Alive'. Once I logged o

Re: Do Spark executors restrict native heap vs JVM heap?

2014-11-02 Thread Sean Owen
Yes, that's correct to my understanding and the probable explanation of your issue. There are no additional limits or differences from how the JVM works here. On Nov 3, 2014 4:40 AM, "Paul Wais" wrote: > Thanks Sean! My novice understanding is that the 'native heap' is the > address space not all
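
On YARN, the knob for memory outside the JVM heap at the time was the executor memory overhead; a hedged sketch of the relevant settings (the values are placeholders):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "4g")                // JVM heap per executor
      .set("spark.yarn.executor.memoryOverhead", "1024") // MB left for native use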

Re: Spark cluster stability

2014-11-02 Thread Akhil Das
You can enable monitoring (e.g. Nagios) with alerts to tackle these kinds of issues. Thanks Best Regards On Mon, Nov 3, 2014 at 10:55 AM, jatinpreet wrote: > Hi, > > I am running a small 6 node spark cluster for testing purposes. Recently, > one of the node's physical memory was filled up by temporar

Parquet files are only 6-20MB in size?

2014-11-02 Thread ag007
Hi there, I have a pySpark job that simply takes a tab-separated CSV and outputs it to a Parquet file. The code is based on the SQL write parquet example. (Using a different inferred schema, only 35 columns.) The input files range from 100MB to 12GB. I have tried different block s
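
One Parquet part-file is written per partition, so a common fix was to coalesce before saving; a Scala sketch assuming the 1.1-era SchemaRDD API and an existing schemaRdd (the partition count and path are placeholders):

    // Fewer, larger partitions yield fewer, larger part-files; aim for
    // partitions roughly the size of an HDFS block.
    val fewer = schemaRdd.coalesce(8)
    fewer.saveAsParquetFile("hdfs:///out/data.parquet")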

GraphX: extracting the path

2014-11-02 Thread dizzy5112
Hi all, just wondering if there is a way to extract paths in GraphX. For example, if I have the graph attached I would like to return results along the lines of: 101 -> 103, 101 -> 104 -> 108, 102 -> 105, 102 -> 106 -> 107
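
For a graph that small, one hedged sketch is to collect the edges to the driver and walk them depth-first from the roots; nothing below is a GraphX built-in, and graph is assumed to be an existing Graph:

    // Collect (src, dst) pairs into an adjacency map -- fine for small graphs only
    val adj: Map[Long, Seq[Long]] =
      graph.edges.map(e => (e.srcId, e.dstId)).collect()
        .groupBy(_._1).mapValues(_.map(_._2).toSeq).toMap

    def paths(v: Long): Seq[List[Long]] = adj.get(v) match {
      case None | Some(Seq()) => Seq(List(v))   // leaf: the path ends here
      case Some(children)     => children.flatMap(c => paths(c).map(v :: _))
    }

    // (paths(101L) ++ paths(102L)).map(_.mkString(" -> ")).foreach(println)
    // prints lines like: 101 -> 103 and 101 -> 104 -> 108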