I am experiencing significant logging spam when running PySpark in IPython
Notebook.
Exhibit A: http://i.imgur.com/BDP0R2U.png
I have already tried the advice from:
http://apache-spark-user-list.1001560.n3.nabble.com/Disable-all-spark-logging-td1960.html
as well as
http://stackoverflow.com/ques
I have changed the log4j appender to a rolling file, or removed it entirely, all
with the same results.
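One runtime workaround I have seen suggested (just a sketch, assuming sc is the
SparkContext that the profile already creates) is to raise the log4j level through
the py4j gateway from inside the notebook:

    # Quiet the root logger and Spark's own loggers in the driver JVM
    log4j = sc._jvm.org.apache.log4j
    log4j.LogManager.getRootLogger().setLevel(log4j.Level.WARN)
    log4j.LogManager.getLogger("org.apache.spark").setLevel(log4j.Level.WARN)

That only affects the driver JVM for the current session, though, so I would still
like the log4j.properties route to work.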
On Wed, Oct 1, 2014 at 1:49 PM, Davies Liu wrote:
> On Tue, Sep 30, 2014 at 10:14 PM, Rick Richardson wrote:
> > I am experiencing significant logging spam when running PySpark in
> > IPython Notebook
>
> If you want to reduce the logging in the console, you should change
> /opt/spark-1.1.0/conf/log4j.properties
>
> log4j.rootCategory=WARN, console
> log4j.logger.org.apache.spark=WARN
>
>
> On Wed, Oct 1, 2014 at 11:49 AM, Rick Richardson
> wrote:
> > Thanks for your reply. Un
ory 1g --executor-cores 1" ipython notebook
--profile=pyspark
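For context, the pyspark profile just carries the usual PySpark bootstrap in its
startup directory. Roughly this, as a sketch; the file name and the py4j zip
version are assumptions rather than exact:

    # ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py (hypothetical name)
    import os
    import sys

    spark_home = os.environ.get('SPARK_HOME')
    if not spark_home:
        raise ValueError('SPARK_HOME is not set')
    sys.path.insert(0, os.path.join(spark_home, 'python'))
    # The py4j zip name varies by Spark build
    sys.path.insert(0, os.path.join(spark_home, 'python', 'lib', 'py4j-0.8.2.1-src.zip'))
    # Runs the normal PySpark shell start-up, which creates sc
    execfile(os.path.join(spark_home, 'python', 'pyspark', 'shell.py'))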
On Wed, Oct 1, 2014 at 3:41 PM, Rick Richardson wrote:
> I was starting PySpark as a profile within IPython Notebook as per:
>
> http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/
>
>
Out of curiosity, how do you actually launch pyspark in your set-up?
On Wed, Oct 1, 2014 at 3:44 PM, Rick Richardson wrote:
> Here is the other relevant bit of my set-up:
> MASTER=spark://sparkmaster:7077
> IPYTHON_OPTS="notebook --pylab inline --ip=0.0.0.0"
> CASSA
I removed all of the jars from the classpath and it began to use the
SPARK_HOME/conf/log4j.properties.
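A quick way to confirm which log4j.properties actually wins on the classpath is to
ask the driver JVM directly through the py4j gateway (a sketch, assuming sc is live):

    # Returns the first log4j.properties visible to the system classloader
    url = sc._jvm.java.lang.ClassLoader.getSystemResource("log4j.properties")
    print(url.toString() if url is not None else "not found on classpath")

After dropping the extra jars, this should point at SPARK_HOME/conf/log4j.properties.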
On Wed, Oct 1, 2014 at 3:46 PM, Rick Richardson wrote:
> Out of curiosity, how do you actually launch pyspark in your set-up?
>
> On Wed, Oct 1, 2014 at 3:44 PM, Rick Richardson wrote:
>
Spark's API definitely covers everything a relational database can do. It will
probably outperform a relational star schema if your entire *working* data set
fits into RAM on your cluster. It will still perform quite well if most of the
data fits and some has to spill over to disk.
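Concretely, that working-set behaviour comes down to the storage level you choose
when caching. A rough sketch (the path and filter below are made up):

    from pyspark import StorageLevel

    # Hypothetical event log; partitions that don't fit in RAM spill to disk
    events = sc.textFile("hdfs:///path/to/events")
    working = events.filter(lambda line: "purchase" in line)
    working.persist(StorageLevel.MEMORY_AND_DISK)
    print(working.count())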
>>
>> For the moment, it looks like we should store these events in SQL. When
>> appropriate, we will do analysis with relational queries. Or, when
>> appropriate we will extract data into working sets in Spark.
>>
>> I imagine this is a pretty common use case for Spark.
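For the step where you extract working sets into Spark, something along these
lines is roughly what I would expect; the driver, connection string, and schema
here are entirely made up for illustration:

    import psycopg2  # assumes the events live in Postgres

    conn = psycopg2.connect("dbname=events host=dbhost")  # hypothetical DSN
    cur = conn.cursor()
    cur.execute("SELECT user_id, event_type FROM events LIMIT 100000")
    rows = cur.fetchall()
    conn.close()

    # Hand the bounded working set to Spark for the heavier aggregation
    rdd = sc.parallelize(rows)
    counts = rdd.map(lambda r: (r[1], 1)).reduceByKey(lambda a, b: a + b)
    print(counts.collect())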