Hi Thanks I tried that. But got this error. Again OOM. I am not sure what to do now. For spark.driver.maxResultSize i kept 2g. Rest I did as mentioned above. 16Gb for driver and 2g for executor. I have 16Gb mac. Please help. I am very delayed on my work because of this and not able to move ahead. My dataset is 56K rows and 8k columns mostly sparse. The column names are though long strings.
----------------------------------------------------- Py4JJavaError Traceback (most recent call last) <ipython-input-8-8c71bfcdfd03> in <module>() ----> 1 recommender_ct.show() /Users/i854319/spark/python/pyspark/sql/dataframe.pyc in show(self, n, truncate) 255 +---+-----+ 256 """ --> 257 print(self._jdf.showString(n, truncate)) 258 259 def __repr__(self): /Users/i854319/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args) 811 answer = self.gateway_client.send_command(command) 812 return_value = get_return_value( --> 813 answer, self.gateway_client, self.target_id, self.name) 814 815 for temp_arg in temp_args: /Users/i854319/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw) 43 def deco(*a, **kw): 44 try: ---> 45 return f(*a, **kw) 46 except py4j.protocol.Py4JJavaError as e: 47 s = e.java_exception.toString() /Users/i854319/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 306 raise Py4JJavaError( 307 "An error occurred while calling {0}{1}{2}.\n". --> 308 format(target_id, ".", name), value) 309 else: 310 raise Py4JError( Py4JJavaError: An error occurred while calling o40.showString. : java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) at java.lang.StringBuilder.append(StringBuilder.java:136) at scala.StringContext.standardInterpolator(StringContext.scala:123) at scala.StringContext.s(StringContext.scala:90) at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52) at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086) at org.apache.spark.sql.DataFrame.org $apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498) at org.apache.spark.sql.DataFrame.org $apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505) at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375) at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374) at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099) at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374) at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456) at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:170) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) at py4j.Gateway.invoke(Gateway.java:259) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:209) at java.lang.Thread.run(Thread.java:745) ᐧ On Wed, Sep 7, 2016 at 10:52 AM, neil90 [via Apache Spark User List] < ml-node+s1001560n27673...@n3.nabble.com> wrote: > If your in local mode just allocate all your memory you want to use to > your Driver(that acts as the executor in local mode) don't even bother > changing the executor memory. So your new settings should look like this... > > spark.driver.memory 16g > spark.driver.maxResultSize 2g > spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value > > You might need to change your spark.driver.maxResultSize settings if you > plan on doing a collect on the entire rdd/dataframe. > > ------------------------------ > If you reply to this email, your message will be added to the discussion > below: > http://apache-spark-user-list.1001560.n3.nabble.com/Spark- > Java-Heap-Error-tp27669p27673.html > To unsubscribe from Spark Java Heap Error, click here > <http://apache-spark-user-list.1001560.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=27669&code=dHIubWFuaXNoQGdtYWlsLmNvbXwyNzY2OXwtNjcyNzMzNjcz> > . > NAML > <http://apache-spark-user-list.1001560.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml> > -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Java-Heap-Error-tp27669p27686.html Sent from the Apache Spark User List mailing list archive at Nabble.com.