PySpark API signature mismatch compared to Scala

2016-03-10 Thread Li Ming Tsai
Hi, Looking at 1.6.0, I see at least the following mismatch: 1. DStream mapWithState, thought declared experimental is not found in Python 2. Dstream updateStateByKey, missing initialRDD, partitioner Is Python API falling behind? Thanks! -

[Streaming] Batch interval and bulk export

2016-03-09 Thread Li Ming Tsai
Hi, I am doing a few basic operation like map -> reduceByKey -> filter, which is very similar to world count and I'm saving the result where the count > threshold. Currently the batch window is every 10s, but I would like to save the results to redshift at a lower frequency instead of every

[Kinesis] multiple KinesisRecordProcessor threads.

2016-03-04 Thread Li Ming Tsai
Hi, @chris @tdas Referring to the latest integration documentation, it states the following: "A single Kinesis input DStream can read from multiple shards of a Kinesis stream by creating multiple KinesisRecordProcessor threads." But looking at the API and the example, each time we call Ki

Re: off-heap certain operations

2016-02-16 Thread Li Ming Tsai
Hi Sean, > Personally, I would leave this off. Is this not production ready, thus we should disable it? Thanks, Liming From: Sean Owen Sent: Saturday, February 13, 2016 2:18 AM To: Ovidiu-Cristian MARCU Cc: Ted Yu; Sea; user@spark.apache.org Subject: R

Turning on logging for internal Spark logs

2016-02-09 Thread Li Ming Tsai
Hi, I have the default conf/log4j.properties: log4j.rootCategory=INFO, console log4j.appender.console=org.apache.log4j.ConsoleAppender log4j.appender.console.target=System.err log4j.appender.console.layout=org.apache.log4j.PatternLayout log4j.appender.console.layout.ConversionPattern=%d{yy/

Re: Slowness in Kmeans calculating fastSquaredDistance

2016-02-09 Thread Li Ming Tsai
Liming From: Li Ming Tsai Sent: Sunday, February 7, 2016 10:03 AM To: user@spark.apache.org Subject: Re: Slowness in Kmeans calculating fastSquaredDistance Hi, I did more investigation and found out that BLAS.scala is calling the native reference architecture (f

Re: Slowness in Kmeans calculating fastSquaredDistance

2016-02-06 Thread Li Ming Tsai
/mllib/linalg/BLAS.scala#L126 private def dot(x: DenseVector, y: DenseVector): Double = { val n = x.size f2jBLAS.ddot(n, x.values, 1, y.values, 1) } Maybe Xiangrui can comment on this? From: Li Ming Tsai Sent: Friday, February 5, 2016 10:56 AM To

Add Singapore meetup

2016-02-04 Thread Li Ming Tsai
Hi, Realised that Singapore has not been added. Please add http://www.meetup.com/Spark-Singapore/ Thanks!

Slowness in Kmeans calculating fastSquaredDistance

2016-02-04 Thread Li Ming Tsai
Hi, I'm using INTEL MKL on Spark 1.6.0 which I built myself with the -Pnetlib-lgpl flag. I am using spark local[4] mode and I run it like this: # export LD_LIBRARY_PATH=/opt/intel/lib/intel64:/opt/intel/mkl/lib/intel64 # bin/spark-shell ... I have also added the following to /opt/intel/mkl/li