Spark development with IntelliJ

2015-01-08 Thread Jakub Dubovsky
Hi devs,

  I'd like to ask if anybody has experience with using IntelliJ 14 to step
into Spark code. Whatever I try, I get a compilation error:

Error:scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/
paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar

  The project is set up following Patrick's instructions [1] and packaged with mvn
-DskipTests clean install. Compilation works fine. Then I just created a
breakpoint in test code and ran debug, which produced the error above.

  Thanks for any hints

  Jakub

[1] https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+
Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA


Re: Spark development with IntelliJ

2015-01-08 Thread Petar Zecevic


This helped me:

http://stackoverflow.com/questions/26995023/errorscalac-bad-option-p-intellij-idea


On 8.1.2015. 11:00, Jakub Dubovsky wrote:

Hi devs,

   I'd like to ask if anybody has experience with using intellij 14 to step
into spark code. Whatever I try I get compilation error:

Error:scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/
paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar

   Project is set up by Patrick's instruction [1] and packaged by mvn -
DskipTests clean install. Compilation works fine. Then I just created
breakpoint in test code and run debug with the error.

   Thanks for any hints

   Jakub

[1] https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+
Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spark development with IntelliJ

2015-01-08 Thread Sean Owen
Yeah, I hit this too. IntelliJ picks this up from the build but then
it can't run its own scalac with this plugin added.

Go to Preferences > Build, Execution, Deployment > Scala Compiler and
clear the "Additional compiler options" field. It will work then
although the option will come back when the project reimports.

Right now I don't know of a better fix.

There's another recent open question about updating IntelliJ docs:
https://issues.apache.org/jira/browse/SPARK-5136  Should this stuff go
in the site docs, or the wiki? I suppose I vote for the wiki, and for making the site
docs point to it. I'd be happy to make wiki edits if I can get
permission, or propose this text along with other new text on the
JIRA.

On Thu, Jan 8, 2015 at 10:00 AM, Jakub Dubovsky
 wrote:
> Hi devs,
>
>   I'd like to ask if anybody has experience with using intellij 14 to step
> into spark code. Whatever I try I get compilation error:
>
> Error:scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/
> paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar
>
>   Project is set up by Patrick's instruction [1] and packaged by mvn -
> DskipTests clean install. Compilation works fine. Then I just created
> breakpoint in test code and run debug with the error.
>
>   Thanks for any hints
>
>   Jakub
>
> [1] https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+
> Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spark development with IntelliJ

2015-01-08 Thread Jakub Dubovsky
Thanks, that helped.

I vote for the wiki as well. More fine-grained documentation should be on the wiki
and linked to.

Jakub


-- Original message --
From: Sean Owen
To: Jakub Dubovsky
Date: 8. 1. 2015 11:29:22
Subject: Re: Spark development with IntelliJ

"Yeah, I hit this too. IntelliJ picks this up from the build but then
it can't run its own scalac with this plugin added.

Go to Preferences > Build, Execution, Deployment > Scala Compiler and
clear the "Additional compiler options" field. It will work then
although the option will come back when the project reimports.

Right now I don't know of a better fix.

There's another recent open question about updating IntelliJ docs:
https://issues.apache.org/jira/browse/SPARK-5136 Should this stuff go
in the site docs, or wiki? I vote for wiki I suppose and make the site
docs point to the wiki. I'd be happy to make wiki edits if I can get
permission, or propose this text along with other new text on the
JIRA.

On Thu, Jan 8, 2015 at 10:00 AM, Jakub Dubovsky
 wrote:
> Hi devs,
>
> I'd like to ask if anybody has experience with using intellij 14 to step
> into spark code. Whatever I try I get compilation error:
>
> Error:scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/
> paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar
>
> Project is set up by Patrick's instruction [1] and packaged by mvn -
> DskipTests clean install. Compilation works fine. Then I just created
> breakpoint in test code and run debug with the error.
>
> Thanks for any hints
>
> Jakub
>
> [1] https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+
> Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org"

Results of tests

2015-01-08 Thread Tony Reix
Hi,
I'm checking that Spark works fine on a new environment (PPC64 hardware).
I've found some issues with versions 1.1.0, 1.1.1, and 1.2.0, even when
running on Ubuntu on x86_64 with the Oracle JVM. I'd like to know where I can find
the results of Spark's tests for each version, in order to have a reference to
compare my results with. I cannot find them on the Spark website.
Thx
Tony



Re: Results of tests

2015-01-08 Thread Ted Yu
Please take a look at https://amplab.cs.berkeley.edu/jenkins/view/Spark/

On Thu, Jan 8, 2015 at 5:40 AM, Tony Reix  wrote:

> Hi,
> I'm checking that Spark works fine on a new environment (PPC64 hardware).
> I've found some issues, with versions 1.1.0, 1.1.1, and 1.2.0, even when
> running on Ubuntu on x86_64 with Oracle JVM. I'd like to know where I can
> find the results of the tests of Spark, for each version and for the
> different versions, in order to have a reference to compare my results
> with. I cannot find them on Spark web-site.
> Thx
> Tony
>
>


K-Means And Class Tags

2015-01-08 Thread Devl Devel
Hi All,

I'm trying a simple K-Means example as per the website:

val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

but I'm trying to write a Java based validation method first so that
missing values are omitted or replaced with 0.

public RDD<Vector> prepareKMeans(JavaRDD<String> data) {
    JavaRDD<Vector> words = data.flatMap(new FlatMapFunction<String, Vector>() {
        public Iterable<Vector> call(String s) {
            String[] split = s.split(",");
            ArrayList<Vector> add = new ArrayList<Vector>();
            if (split.length != 2) {
                add.add(Vectors.dense(0, 0));
            } else {
                add.add(Vectors.dense(Double.parseDouble(split[0]),
                        Double.parseDouble(split[1])));
            }

            return add;
        }
    });

    return words.rdd();
}

When I then call this from Scala:

val parsedData=dc.prepareKMeans(data);
val p=parsedData.collect();

I get Exception in thread "main" java.lang.ClassCastException:
[Ljava.lang.Object; cannot be cast to
[Lorg.apache.spark.mllib.linalg.Vector;

Why is the class tag Object rather than Vector?

1) How do I get this working correctly using the Java validation example
above or
2) How can I modify val parsedData = data.map(s =>
Vectors.dense(s.split(',').map(_.toDouble))) so that lines whose split size is < 2
are ignored? or
3) Is there a better way to do input validation first?

Using Spark and MLlib:
libraryDependencies += "org.apache.spark" % "spark-core_2.10" %  "1.2.0"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"

Many thanks in advance
Dev


RE: Results of tests

2015-01-08 Thread Tony Reix
Thanks !

I've been able to see that there are 3745 tests for version 1.2.0 with the
Hadoop 2.4 profile.
However, on my side the maximum number of tests I've seen is 3485... About 300 tests
are missing on my side.
Which Maven options were used to produce the report file used for
building this page:
 
https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/
  ? (I'm not authorized to look at the "configuration" part)

Thx !

Tony


From: Ted Yu [yuzhih...@gmail.com]
Sent: Thursday, 8 January 2015 16:11
To: Tony Reix
Cc: dev@spark.apache.org
Subject: Re: Results of tests

Please take a look at https://amplab.cs.berkeley.edu/jenkins/view/Spark/

On Thu, Jan 8, 2015 at 5:40 AM, Tony Reix <tony.r...@bull.net> wrote:
Hi,
I'm checking that Spark works fine on a new environment (PPC64 hardware).
I've found some issues, with versions 1.1.0, 1.1.1, and 1.2.0, even when 
running on Ubuntu on x86_64 with Oracle JVM. I'd like to know where I can find 
the results of the tests of Spark, for each version and for the different 
versions, in order to have a reference to compare my results with. I cannot 
find them on Spark web-site.
Thx
Tony




Re: K-Means And Class Tags

2015-01-08 Thread Yana Kadiyska
How about

data.map(s => s.split(",")).filter(_.length > 1)
  .map(good_entry => Vectors.dense(good_entry(0).toDouble, good_entry(1).toDouble))

(full disclosure, I didn't actually run this). But after the first map you
should have an RDD[Array[String]]; then you'd discard everything shorter
than 2 and convert the rest to dense vectors. In fact, if you're
expecting length exactly 2, you might want to filter on == 2...


On Thu, Jan 8, 2015 at 10:58 AM, Devl Devel 
wrote:

> Hi All,
>
> I'm trying a simple K-Means example as per the website:
>
> val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))
>
> but I'm trying to write a Java based validation method first so that
> missing values are omitted or replaced with 0.
>
> public RDD prepareKMeans(JavaRDD data) {
> JavaRDD words = data.flatMap(new FlatMapFunction Vector>() {
> public Iterable call(String s) {
> String[] split = s.split(",");
> ArrayList add = new ArrayList();
> if (split.length != 2) {
> add.add(Vectors.dense(0, 0));
> } else
> {
> add.add(Vectors.dense(Double.parseDouble(split[0]),
>Double.parseDouble(split[1])));
> }
>
> return add;
> }
> });
>
> return words.rdd();
> }
>
> When I then call from scala:
>
> val parsedData=dc.prepareKMeans(data);
> val p=parsedData.collect();
>
> I get Exception in thread "main" java.lang.ClassCastException:
> [Ljava.lang.Object; cannot be cast to
> [Lorg.apache.spark.mllib.linalg.Vector;
>
> Why is the class tag is object rather than vector?
>
> 1) How do I get this working correctly using the Java validation example
> above or
> 2) How can I modify val parsedData = data.map(s =>
> Vectors.dense(s.split(',').map(_.toDouble))) so that when s.split size <2 I
> ignore the line? or
> 3) Is there a better way to do input validation first?
>
> Using spark and mlib:
> libraryDependencies += "org.apache.spark" % "spark-core_2.10" %  "1.2.0"
> libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"
>
> Many thanks in advance
> Dev
>


Re: Registering custom metrics

2015-01-08 Thread Enno Shioji
FYI I found this approach by Ooyala.

/** Instrumentation for Spark based on accumulators.
  *
  * Usage:
  * val instrumentation = new SparkInstrumentation("example.metrics")
  * val numReqs = sc.accumulator(0L)
  * instrumentation.source.registerDailyAccumulator(numReqs, "numReqs")
  * instrumentation.register()
  *
  * Will create and report the following metrics:
  * - Gauge with total number of requests (daily)
  * - Meter with rate of requests
  *
  * @param prefix prefix for all metrics that will be reported by this
Instrumentation
  */

https://gist.github.com/ibuenros/9b94736c2bad2f4b8e23
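
For completeness, here is a rough (untested) sketch of the custom Source approach Gerard
describes below; the class and metric names are made up:

package org.apache.spark.metrics.source

import com.codahale.metrics.MetricRegistry

// Source is Spark-private, so (as Gerard notes) the class must live under an
// org.apache.spark package; the class and metric names here are made up.
class MyStreamingSource extends Source {
  override val sourceName = "MyStreamingSource"
  override val metricRegistry = new MetricRegistry()

  // Application code on the driver can update this, e.g. bytesProduced.mark(n).
  val bytesProduced = metricRegistry.meter(MetricRegistry.name("bytesProduced"))
}

// Registration on the driver, from code that also sits under org.apache.spark:
//   SparkEnv.get.metricsSystem.registerSource(new MyStreamingSource)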

On Mon, Jan 5, 2015 at 2:56 PM, Enno Shioji  wrote:

> Hi Gerard,
>
> Thanks for the answer! I had a good look at it, but I couldn't figure out
> whether one can use that to emit metrics from your application code.
>
> Suppose I wanted to monitor the rate of bytes I produce, like so:
>
> stream
> .map { input =>
>   val bytes = produce(input)
>   // metricRegistry.meter("some.metrics").mark(bytes.length)
>   bytes
> }
> .saveAsTextFile("text")
>
> Is there a way to achieve this with the MetricSystem?
>
>
>
> On Mon, Jan 5, 2015 at 10:24 AM, Gerard Maas 
> wrote:
>
>> Hi,
>>
>> Yes, I managed to create and register custom metrics by creating an
>> implementation of org.apache.spark.metrics.source.Source and
>> registering it with the metrics subsystem.
>> Source is [Spark] private, so you need to create it under a org.apache.spark
>> package. In my case, I'm dealing with Spark Streaming metrics, and I
>> created my CustomStreamingSource under org.apache.spark.streaming as I
>> also needed access to some [Streaming] private components.
>>
>> Then, you register your new metric Source on the Spark's metric system,
>> like so:
>>
>> SparkEnv.get.metricsSystem.registerSource(customStreamingSource)
>>
>> And it will get reported to the metrics sinks active on your system. By
>> default, you can access them through the metrics endpoint:
>> http://:/metrics/json
>>
>> I hope this helps.
>>
>> -kr, Gerard.
>>
>>
>>
>>
>>
>>
>> On Tue, Dec 30, 2014 at 3:32 PM, eshioji  wrote:
>>
>>> Hi,
>>>
>>> Did you find a way to do this / working on this?
>>> Am trying to find a way to do this as well, but haven't been able to
>>> find a
>>> way.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-developers-list.1001551.n3.nabble.com/Registering-custom-metrics-tp9030p9968.html
>>> Sent from the Apache Spark Developers List mailing list archive at
>>> Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>>
>>
>


Re: Spark on teradata?

2015-01-08 Thread xhudik
I don't think this makes sense. The Teradata database is a standard RDBMS (even if
parallel), while Spark is used for non-relational workloads.
What could make sense is to deploy Spark on Teradata Aster. Aster is a
database cluster that can call external programs via the STREAM operator.
That way a Spark/Scala app can be called to process some data. The
deployment itself should be easy; the potential benefit is hard to say...


hope this helps, Tomas



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-on-teradata-tp10025p10042.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Results of tests

2015-01-08 Thread Ted Yu
Here it is:

[centos] $ 
/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.0.5/bin/mvn
-DHADOOP_PROFILE=hadoop-2.4 -Dlabel=centos -DskipTests -Phadoop-2.4
-Pyarn -Phive clean package


You can find the above in
https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/consoleFull


Cheers


On Thu, Jan 8, 2015 at 8:05 AM, Tony Reix  wrote:

>  Thanks !
>
> I've been able to see that there are 3745 tests for version 1.2.0 with
> profile Hadoop 2.4  .
> However, on my side, the maximum tests I've seen are 3485... About 300
> tests are missing on my side.
> Which Maven option has been used for producing the report file used for
> building the page:
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/
>   ? (I'm not authorized to look at the "configuration" part)
>
> Thx !
>
> Tony
>
>  --
> *From:* Ted Yu [yuzhih...@gmail.com]
> *Sent:* Thursday, 8 January 2015 16:11
> *To:* Tony Reix
> *Cc:* dev@spark.apache.org
> *Subject:* Re: Results of tests
>
>   Please take a look at https://amplab.cs.berkeley.edu/jenkins/view/Spark/
>
> On Thu, Jan 8, 2015 at 5:40 AM, Tony Reix  wrote:
>
>> Hi,
>> I'm checking that Spark works fine on a new environment (PPC64 hardware).
>> I've found some issues, with versions 1.1.0, 1.1.1, and 1.2.0, even when
>> running on Ubuntu on x86_64 with Oracle JVM. I'd like to know where I can
>> find the results of the tests of Spark, for each version and for the
>> different versions, in order to have a reference to compare my results
>> with. I cannot find them on Spark web-site.
>> Thx
>> Tony
>>
>>
>


Re: Registering custom metrics

2015-01-08 Thread Gerard Maas
Very interesting approach. Thanks for sharing it!

On Thu, Jan 8, 2015 at 5:30 PM, Enno Shioji  wrote:

> FYI I found this approach by Ooyala.
>
> /** Instrumentation for Spark based on accumulators.
>   *
>   * Usage:
>   * val instrumentation = new SparkInstrumentation("example.metrics")
>   * val numReqs = sc.accumulator(0L)
>   * instrumentation.source.registerDailyAccumulator(numReqs, "numReqs")
>   * instrumentation.register()
>   *
>   * Will create and report the following metrics:
>   * - Gauge with total number of requests (daily)
>   * - Meter with rate of requests
>   *
>   * @param prefix prefix for all metrics that will be reported by this 
> Instrumentation
>   */
>
> https://gist.github.com/ibuenros/9b94736c2bad2f4b8e23
>
> On Mon, Jan 5, 2015 at 2:56 PM, Enno Shioji  wrote:
>
>> Hi Gerard,
>>
>> Thanks for the answer! I had a good look at it, but I couldn't figure out
>> whether one can use that to emit metrics from your application code.
>>
>> Suppose I wanted to monitor the rate of bytes I produce, like so:
>>
>> stream
>> .map { input =>
>>   val bytes = produce(input)
>>   // metricRegistry.meter("some.metrics").mark(bytes.length)
>>   bytes
>> }
>> .saveAsTextFile("text")
>>
>> Is there a way to achieve this with the MetricSystem?
>>
>>
>>
>> On Mon, Jan 5, 2015 at 10:24 AM, Gerard Maas 
>> wrote:
>>
>>> Hi,
>>>
>>> Yes, I managed to create a register custom metrics by creating an
>>>  implementation  of org.apache.spark.metrics.source.Source and
>>> registering it to the metrics subsystem.
>>> Source is [Spark] private, so you need to create it under a org.apache.spark
>>> package. In my case, I'm dealing with Spark Streaming metrics, and I
>>> created my CustomStreamingSource under org.apache.spark.streaming as I
>>> also needed access to some [Streaming] private components.
>>>
>>> Then, you register your new metric Source on the Spark's metric system,
>>> like so:
>>>
>>> SparkEnv.get.metricsSystem.registerSource(customStreamingSource)
>>>
>>> And it will get reported to the metrics Sync active on your system. By
>>> default, you can access them through the metric endpoint:
>>> http://:/metrics/json
>>>
>>> I hope this helps.
>>>
>>> -kr, Gerard.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Dec 30, 2014 at 3:32 PM, eshioji  wrote:
>>>
 Hi,

 Did you find a way to do this / working on this?
 Am trying to find a way to do this as well, but haven't been able to
 find a
 way.



 --
 View this message in context:
 http://apache-spark-developers-list.1001551.n3.nabble.com/Registering-custom-metrics-tp9030p9968.html
 Sent from the Apache Spark Developers List mailing list archive at
 Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


>>>
>>
>


Re: Maintainer for Mesos

2015-01-08 Thread RJ Nowling
Hi Andrew,

Patrick Wendell and Andrew Or have committed previous patches related to
Mesos. Maybe they would be good committers to look at it?

RJ

On Mon, Jan 5, 2015 at 6:40 PM, Andrew Ash  wrote:

> Hi Spark devs,
>
> I'm interested in having a committer look at a PR [1] for Mesos, but
> there's not an entry for Mesos in the maintainers specialties on the wiki
> [2].  Which Spark committers have expertise in the Mesos features?
>
> Thanks!
> Andrew
>
>
> [1] https://github.com/apache/spark/pull/3074
> [2]
>
> https://cwiki.apache.org/confluence/display/SPARK/Committers#Committers-ReviewProcessandMaintainers
>


Re: Spark on teradata?

2015-01-08 Thread Reynold Xin
It depends on your use case. If the use case is to extract a small amount of
data out of Teradata, then you can use the JdbcRDD, and soon a JDBC input
source based on the new Spark SQL external data source API.
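
For illustration, a rough (untested) sketch of what that could look like; the driver class,
URL, credentials, table, and bounds are placeholders, and the query needs two '?' markers
that JdbcRDD fills in with the partition bounds:

import java.sql.{DriverManager, ResultSet}

import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

// Pull a bounded slice of a table over JDBC; driver class, URL, credentials,
// table, and bounds below are placeholders.
def teradataSlice(sc: SparkContext): JdbcRDD[(Int, String)] =
  new JdbcRDD(
    sc,
    () => {
      Class.forName("com.teradata.jdbc.TeraDriver")  // assumed Teradata JDBC driver name
      DriverManager.getConnection("jdbc:teradata://host/db", "user", "pass")
    },
    "SELECT id, name FROM some_table WHERE id >= ? AND id <= ?",
    1L, 100000L, 10,  // lower bound, upper bound, number of partitions
    (rs: ResultSet) => (rs.getInt(1), rs.getString(2)))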



On Wed, Jan 7, 2015 at 7:14 AM, gen tang  wrote:

> Hi,
>
> I have a stupid question:
> Is it possible to use Spark on a Teradata data warehouse? I read
> some news on the internet which says yes. However, I didn't find any examples
> of this.
>
> Thanks in advance.
>
> Cheers
> Gen
>
>


Re: K-Means And Class Tags

2015-01-08 Thread devl.development
Thanks for the suggestion. Can anyone offer any advice on the
ClassCastException when going from Java to Scala? Why does calling JavaRDD.rdd() and
then collect() result in this exception?



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-tp10038p10047.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: K-Means And Class Tags

2015-01-08 Thread Devl Devel
Thanks for the suggestion. Can anyone offer any advice on the
ClassCastException when going from Java to Scala? Why does calling JavaRDD.rdd() and then
collect() result in this exception?

On Thu, Jan 8, 2015 at 4:13 PM, Yana Kadiyska 
wrote:

> How about
>
> data.map(s=>s.split(",")).filter(_.length>1).map(good_entry=>Vectors.dense((Double.parseDouble(good_entry[0]),
> Double.parseDouble(good_entry[1]))
> ​
> (full disclosure, I didn't actually run this). But after the first map you
> should have an RDD[Array[String]], then you'd discard everything shorter
> than 2, and convert the rest to dense vectors?...In fact if you're
> expecting length exactly 2 might want to filter ==2...
>
>
> On Thu, Jan 8, 2015 at 10:58 AM, Devl Devel 
> wrote:
>
>> Hi All,
>>
>> I'm trying a simple K-Means example as per the website:
>>
>> val parsedData = data.map(s =>
>> Vectors.dense(s.split(',').map(_.toDouble)))
>>
>> but I'm trying to write a Java based validation method first so that
>> missing values are omitted or replaced with 0.
>>
>> public RDD prepareKMeans(JavaRDD data) {
>> JavaRDD words = data.flatMap(new FlatMapFunction> Vector>() {
>> public Iterable call(String s) {
>> String[] split = s.split(",");
>> ArrayList add = new ArrayList();
>> if (split.length != 2) {
>> add.add(Vectors.dense(0, 0));
>> } else
>> {
>> add.add(Vectors.dense(Double.parseDouble(split[0]),
>>Double.parseDouble(split[1])));
>> }
>>
>> return add;
>> }
>> });
>>
>> return words.rdd();
>> }
>>
>> When I then call from scala:
>>
>> val parsedData=dc.prepareKMeans(data);
>> val p=parsedData.collect();
>>
>> I get Exception in thread "main" java.lang.ClassCastException:
>> [Ljava.lang.Object; cannot be cast to
>> [Lorg.apache.spark.mllib.linalg.Vector;
>>
>> Why is the class tag is object rather than vector?
>>
>> 1) How do I get this working correctly using the Java validation example
>> above or
>> 2) How can I modify val parsedData = data.map(s =>
>> Vectors.dense(s.split(',').map(_.toDouble))) so that when s.split size <2
>> I
>> ignore the line? or
>> 3) Is there a better way to do input validation first?
>>
>> Using spark and mlib:
>> libraryDependencies += "org.apache.spark" % "spark-core_2.10" %  "1.2.0"
>> libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"
>>
>> Many thanks in advance
>> Dev
>>
>
>


Re: K-Means And Class Tags

2015-01-08 Thread Joseph Bradley
I believe you're running into an erasure issue which we found in
DecisionTree too.  Check out:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala#L134

That retags RDDs which were created from Java to prevent the exception
you're running into.
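
For reference, a rough (untested) sketch of applying the same trick from application
code; RDD.retag appears to be Spark-private, so an identity mapPartitions carrying the
right ClassTag achieves roughly the same effect:

import scala.reflect.ClassTag

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Re-establish the element ClassTag on an RDD built from Java code, where the
// tag is effectively ClassTag[Object]; an identity mapPartitions with the
// proper implicit ClassTag is roughly what retag does internally.
def retagged(rdd: RDD[Vector]): RDD[Vector] =
  rdd.mapPartitions(iter => iter, preservesPartitioning = true)(implicitly[ClassTag[Vector]])

// val parsedData = retagged(dc.prepareKMeans(data))
// parsedData.collect()  // should no longer throw the ClassCastException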

Hope this helps!
Joseph

On Thu, Jan 8, 2015 at 12:48 PM, Devl Devel 
wrote:

> Thanks for the suggestion, can anyone offer any advice on the ClassCast
> Exception going from Java to Scala? Why does JavaRDD.rdd() and then a
> collect() result in this exception?
>
> On Thu, Jan 8, 2015 at 4:13 PM, Yana Kadiyska 
> wrote:
>
> > How about
> >
> >
> data.map(s=>s.split(",")).filter(_.length>1).map(good_entry=>Vectors.dense((Double.parseDouble(good_entry[0]),
> > Double.parseDouble(good_entry[1]))
> > ​
> > (full disclosure, I didn't actually run this). But after the first map
> you
> > should have an RDD[Array[String]], then you'd discard everything shorter
> > than 2, and convert the rest to dense vectors?...In fact if you're
> > expecting length exactly 2 might want to filter ==2...
> >
> >
> > On Thu, Jan 8, 2015 at 10:58 AM, Devl Devel 
> > wrote:
> >
> >> Hi All,
> >>
> >> I'm trying a simple K-Means example as per the website:
> >>
> >> val parsedData = data.map(s =>
> >> Vectors.dense(s.split(',').map(_.toDouble)))
> >>
> >> but I'm trying to write a Java based validation method first so that
> >> missing values are omitted or replaced with 0.
> >>
> >> public RDD prepareKMeans(JavaRDD data) {
> >> JavaRDD words = data.flatMap(new FlatMapFunction >> Vector>() {
> >> public Iterable call(String s) {
> >> String[] split = s.split(",");
> >> ArrayList add = new ArrayList();
> >> if (split.length != 2) {
> >> add.add(Vectors.dense(0, 0));
> >> } else
> >> {
> >> add.add(Vectors.dense(Double.parseDouble(split[0]),
> >>Double.parseDouble(split[1])));
> >> }
> >>
> >> return add;
> >> }
> >> });
> >>
> >> return words.rdd();
> >> }
> >>
> >> When I then call from scala:
> >>
> >> val parsedData=dc.prepareKMeans(data);
> >> val p=parsedData.collect();
> >>
> >> I get Exception in thread "main" java.lang.ClassCastException:
> >> [Ljava.lang.Object; cannot be cast to
> >> [Lorg.apache.spark.mllib.linalg.Vector;
> >>
> >> Why is the class tag is object rather than vector?
> >>
> >> 1) How do I get this working correctly using the Java validation example
> >> above or
> >> 2) How can I modify val parsedData = data.map(s =>
> >> Vectors.dense(s.split(',').map(_.toDouble))) so that when s.split size
> <2
> >> I
> >> ignore the line? or
> >> 3) Is there a better way to do input validation first?
> >>
> >> Using spark and mlib:
> >> libraryDependencies += "org.apache.spark" % "spark-core_2.10" %  "1.2.0"
> >> libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"
> >>
> >> Many thanks in advance
> >> Dev
> >>
> >
> >
>


Re: Spark development with IntelliJ

2015-01-08 Thread Bill Bejeck
I was having the same issue and that helped. But now I get the following
compilation error when trying to run a test from within IntelliJ (v14):

/Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
Error:(308, 109) polymorphic expression cannot be instantiated to expected
type;
 found   : [T(in method
apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
 required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in
method functionToUdfBuilder)]
  implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]):
ScalaUdfBuilder[T] = ScalaUdfBuilder(func)

Any thoughts?

^

On Thu, Jan 8, 2015 at 6:33 AM, Jakub Dubovsky <
spark.dubovsky.ja...@seznam.cz> wrote:

> Thanks that helped.
>
> I vote for wiki as well. More fine graned documentation should be on wiki
> and linked,
>
> Jakub
>
>
> -- Original message --
> From: Sean Owen
> To: Jakub Dubovsky
> Date: 8. 1. 2015 11:29:22
> Subject: Re: Spark development with IntelliJ
>
> "Yeah, I hit this too. IntelliJ picks this up from the build but then
> it can't run its own scalac with this plugin added.
>
> Go to Preferences > Build, Execution, Deployment > Scala Compiler and
> clear the "Additional compiler options" field. It will work then
> although the option will come back when the project reimports.
>
> Right now I don't know of a better fix.
>
> There's another recent open question about updating IntelliJ docs:
> https://issues.apache.org/jira/browse/SPARK-5136 Should this stuff go
> in the site docs, or wiki? I vote for wiki I suppose and make the site
> docs point to the wiki. I'd be happy to make wiki edits if I can get
> permission, or propose this text along with other new text on the
> JIRA.
>
> On Thu, Jan 8, 2015 at 10:00 AM, Jakub Dubovsky
>  wrote:
> > Hi devs,
> >
> > I'd like to ask if anybody has experience with using intellij 14 to step
> > into spark code. Whatever I try I get compilation error:
> >
> > Error:scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/
> > paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar
> >
> > Project is set up by Patrick's instruction [1] and packaged by mvn -
> > DskipTests clean install. Compilation works fine. Then I just created
> > breakpoint in test code and run debug with the error.
> >
> > Thanks for any hints
> >
> > Jakub
> >
> > [1] https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+
> > Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org"
>


PR #3872

2015-01-08 Thread Bill Bejeck
Could one of the admins take a look at PR 3872 (JIRA 3299), submitted on 1/1?


Re: Spark development with IntelliJ

2015-01-08 Thread Sean Owen
I remember seeing this too, but it seemed to be transient. Try
compiling again. In my case I recall that IJ was still reimporting
some modules when I tried to build. I don't see this error in general.

On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck  wrote:
> I was having the same issue and that helped.  But now I get the following
> compilation error when trying to run a test from within Intellij (v 14)
>
> /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
> Error:(308, 109) polymorphic expression cannot be instantiated to expected
> type;
>  found   : [T(in method
> apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
>  required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in
> method functionToUdfBuilder)]
>   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]):
> ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
>
> Any thoughts?
>
> ^

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spark development with IntelliJ

2015-01-08 Thread Nicholas Chammas
Side question: Should this section in the wiki link to Useful Developer Tools?

On Thu Jan 08 2015 at 6:19:55 PM Sean Owen  wrote:

> I remember seeing this too, but it seemed to be transient. Try
> compiling again. In my case I recall that IJ was still reimporting
> some modules when I tried to build. I don't see this error in general.
>
> On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck  wrote:
> > I was having the same issue and that helped.  But now I get the following
> > compilation error when trying to run a test from within Intellij (v 14)
> >
> > /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/
> catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
> > Error:(308, 109) polymorphic expression cannot be instantiated to
> expected
> > type;
> >  found   : [T(in method
> > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method
> apply)]
> >  required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(
> in
> > method functionToUdfBuilder)]
> >   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]):
> > ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
> >
> > Any thoughts?
> >
> > ^
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Spark development with IntelliJ

2015-01-08 Thread Bill Bejeck
That worked, thx

On Thu, Jan 8, 2015 at 6:17 PM, Sean Owen  wrote:

> I remember seeing this too, but it seemed to be transient. Try
> compiling again. In my case I recall that IJ was still reimporting
> some modules when I tried to build. I don't see this error in general.
>
> On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck  wrote:
> > I was having the same issue and that helped.  But now I get the following
> > compilation error when trying to run a test from within Intellij (v 14)
> >
> >
> /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
> > Error:(308, 109) polymorphic expression cannot be instantiated to
> expected
> > type;
> >  found   : [T(in method
> > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method
> apply)]
> >  required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in
> > method functionToUdfBuilder)]
> >   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]):
> > ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
> >
> > Any thoughts?
> >
> > ^
>


missing document of several messages in actor-based receiver?

2015-01-08 Thread Nan Zhu
Hi, TD and other streaming developers,

When I looked at the implementation of the actor-based receiver
(ActorReceiver.scala), I found that there are several messages which are not
mentioned in the documentation:

case props: Props =>
val worker = context.actorOf(props)
logInfo("Started receiver worker at:" + worker.path)
sender ! worker

case (props: Props, name: String) =>
val worker = context.actorOf(props, name)
logInfo("Started receiver worker at:" + worker.path)
sender ! worker

case _: PossiblyHarmful => hiccups.incrementAndGet()

case _: Statistics =>
val workers = context.children
sender ! Statistics(n.get, workers.size, hiccups.get, workers.mkString("\n"))

Is this hidden intentionally, is the documentation incomplete, or did I miss something?
Also, are the handlers of these messages "buggy"? E.g. when we start a new worker,
we don't increase n (the counter of children), and n and hiccups are unnecessarily
typed as AtomicInteger?

Best,

--  
Nan Zhu
http://codingcat.me



[ANNOUNCE] Apache Science and Healthcare Track @ApacheCon NA 2015

2015-01-08 Thread Lewis John Mcgibbney
Hi Folks,

Apologies for cross posting :(

As some of you may already know, @ApacheCon NA 2015 is happening in Austin,
TX April 13th-16th.

This email is specifically written to attract all folks interested in
Science and Healthcare... this is an official call to arms! I am aware that
there are many Science and Healthcare-type people also lingering in the
Apache Semantic Web communities so this one is for all of you folks as well.

Over a number of years the Science track has been emerging as an attractive
and exciting, at times mind-blowing, non-traditional track running alongside
the resident HTTP Server, Big Data, etc. tracks. The Semantic Web Track is
another such emerging track which has proved popular. This year we want to
really get the message out there about how much Apache technology is
actually being used in Science and Healthcare. This is not *only* aimed at
attracting members of the communities below

but also at potentially attracting a brand new breed of conference
participants to ApacheCon  and
the Foundation, e.g. scientists who love Apache. We are looking for
exciting, invigorating, obscure, half-baked, funky, academic, practical and
impractical stories, use cases, experiments and down right successes alike
from within the Science domain. The only thing they need to have in common
is that they consume, contribute towards, advocate, disseminate or even
commercialize Apache technology within the Scientific domain and would be
relevant to that audience. It is fully open to interest whether this track
should be combined with the proposed *healthcare track*... if there is interest in
doing this then we can rename this track to Science and Healthcare. In essence
one could argue that they are one and the same, however I digress :)

What I would like those of you who are interested to do is simply to
check out the scope and intent of the Apache in Science content curation,
which is currently ongoing, and to potentially register your interest.

https://wiki.apache.org/apachecon/ACNA2015ContentCommittee#Apache_in_Science

I would love to see the Science and Healthcare track be THE BIGGEST track
@ApacheCon, and although we have some way to go, I'm sure many previous
track participants will tell you this is not to be missed.

We are looking for content from a wide variety of Scientific use cases all
related to Apache technology.
Thanks in advance and I look forward to seeing you in Austin.
Lewis

-- 
*Lewis*


Re: Spark development with IntelliJ

2015-01-08 Thread Patrick Wendell
Nick - yes. Do you mind moving it? I should have put it in the
"Contributing to Spark" page.

On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas
 wrote:
> Side question: Should this section
> 
> in
> the wiki link to Useful Developer Tools
> ?
>
> On Thu Jan 08 2015 at 6:19:55 PM Sean Owen  wrote:
>
>> I remember seeing this too, but it seemed to be transient. Try
>> compiling again. In my case I recall that IJ was still reimporting
>> some modules when I tried to build. I don't see this error in general.
>>
>> On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck  wrote:
>> > I was having the same issue and that helped.  But now I get the following
>> > compilation error when trying to run a test from within Intellij (v 14)
>> >
>> > /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/
>> catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
>> > Error:(308, 109) polymorphic expression cannot be instantiated to
>> expected
>> > type;
>> >  found   : [T(in method
>> > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method
>> apply)]
>> >  required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(
>> in
>> > method functionToUdfBuilder)]
>> >   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]):
>> > ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
>> >
>> > Any thoughts?
>> >
>> > ^
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spark development with IntelliJ

2015-01-08 Thread Patrick Wendell
Actually I went ahead and did it.

On Thu, Jan 8, 2015 at 10:25 PM, Patrick Wendell  wrote:
> Nick - yes. Do you mind moving it? I should have put it in the
> "Contributing to Spark" page.
>
> On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas
>  wrote:
>> Side question: Should this section
>> 
>> in
>> the wiki link to Useful Developer Tools
>> ?
>>
>> On Thu Jan 08 2015 at 6:19:55 PM Sean Owen  wrote:
>>
>>> I remember seeing this too, but it seemed to be transient. Try
>>> compiling again. In my case I recall that IJ was still reimporting
>>> some modules when I tried to build. I don't see this error in general.
>>>
>>> On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck  wrote:
>>> > I was having the same issue and that helped.  But now I get the following
>>> > compilation error when trying to run a test from within Intellij (v 14)
>>> >
>>> > /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/
>>> catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
>>> > Error:(308, 109) polymorphic expression cannot be instantiated to
>>> expected
>>> > type;
>>> >  found   : [T(in method
>>> > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method
>>> apply)]
>>> >  required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(
>>> in
>>> > method functionToUdfBuilder)]
>>> >   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]):
>>> > ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
>>> >
>>> > Any thoughts?
>>> >
>>> > ^
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org