BlockManager WARNINGS and ERRORS

2016-03-27 Thread salexln
Hi all,

I started testing my code (https://github.com/salexln/FinalProject_FCM) 
with the latest Spark available in GitHub, 
and when I run it I get the following errors:

*scala> val clusters = FuzzyCMeans.train(parsedData, 2, 20, 2.0)*

16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_35_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_35_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 ERROR Executor: 2 block locks were not released by TID =
32:
[rdd_8_0, rdd_35_0]
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_35_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_35_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 ERROR Executor: 2 block locks were not released by TID =
35:
[rdd_8_0, rdd_35_0]
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_35_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_35_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 ERROR Executor: 2 block locks were not released by TID =
38:
[rdd_8_0, rdd_35_0]
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_35_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_35_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 ERROR Executor: 2 block locks were not released by TID =
41:
[rdd_8_0, rdd_35_0]
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_35_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_35_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 ERROR Executor: 2 block locks were not released by TID =
44:
[rdd_8_0, rdd_35_0]
16/03/27 22:24:10 WARN BlockManager: Block rdd_8_0 already exists on this
machine; not re-adding it
16/03/27 22:24:10 WARN BlockManager: Block rdd_35_0 already exists on this
machine; not re-adding it

I did not get these previously; is this something new?





--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/BlockManager-WARNINGS-and-ERRORS-tp16878.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: BlockManager WARNINGS and ERRORS

2016-03-27 Thread Ted Yu
The warning was added by:

SPARK-12757 Add block-level read/write locks to BlockManager
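
For anyone puzzled by the messages themselves: with block-level locking, a
block is cached at most once per executor. A second attempt to cache the
same block only logs the "already exists" warning, and at the end of each
task the executor reports any block locks the task failed to release. A toy
Python model of that bookkeeping (the class and method names are invented
for illustration; this is not Spark's actual API):

```python
import threading

class ToyBlockManager:
    """Toy model of put-if-absent block caching with per-task locks.
    Invented names; illustrates the log messages, not Spark internals."""

    def __init__(self):
        self._blocks = {}
        self._held_locks = {}   # task id -> set of block ids still held
        self._mutex = threading.Lock()
        self.warnings = []

    def put_if_absent(self, task_id, block_id, value):
        """Cache a block; if it is already cached, log the warning and
        keep the existing copy instead of re-adding it. Either way the
        task ends up holding a lock on the block."""
        with self._mutex:
            if block_id in self._blocks:
                self.warnings.append(
                    "WARN Block %s already exists on this machine; "
                    "not re-adding it" % block_id)
            else:
                self._blocks[block_id] = value
            self._held_locks.setdefault(task_id, set()).add(block_id)

    def finish_task(self, task_id):
        """At task end, return any locks the task forgot to release --
        the source of the 'block locks were not released' ERROR lines."""
        with self._mutex:
            return sorted(self._held_locks.pop(task_id, set()))

bm = ToyBlockManager()
bm.put_if_absent(32, "rdd_8_0", b"partition bytes")
bm.put_if_absent(32, "rdd_8_0", b"partition bytes")   # duplicate put -> WARN
bm.put_if_absent(32, "rdd_35_0", b"partition bytes")
leaked = bm.finish_task(32)                           # nothing was released
if leaked:
    print("ERROR %d block locks were not released by TID = 32: %s"
          % (len(leaked), leaked))
```

Under this reading, the WARN lines are benign (the block is simply reused),
while the lock-leak ERROR points at code paths that never release what they
acquired.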

On Sun, Mar 27, 2016 at 12:24 PM, salexln  wrote:

> [quoted message and log output snipped]


Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-27 Thread Nicholas Chammas
Pingity-ping-pong since this is still a problem.

On Thu, Mar 24, 2016 at 4:08 PM Michael Armbrust 
wrote:

> Patrick is investigating.
>
> On Thu, Mar 24, 2016 at 7:25 AM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> Just checking in on this again as the builds on S3 are still broken. :/
>>
>> Could it have something to do with us moving release-build.sh?
>>
>> On Mon, Mar 21, 2016 at 1:43 PM Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> Is someone going to retry fixing these packages? It's still a problem.
>>>
>>> Also, it would be good to understand why this is happening.
>>>
>>> On Fri, Mar 18, 2016 at 6:49 PM Jakob Odersky  wrote:
>>>
 I just realized you're using a different download site. Sorry for the
 confusion, the link I get for a direct download of Spark 1.6.1 /
 Hadoop 2.6 is
 http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1-bin-hadoop2.6.tgz

 On Fri, Mar 18, 2016 at 3:20 PM, Nicholas Chammas
  wrote:
 > I just retried the Spark 1.6.1 / Hadoop 2.6 download and got a
 corrupt ZIP
 > file.
 >
 > Jakob, are you sure the ZIP unpacks correctly for you? Is it the same
 Spark
 > 1.6.1/Hadoop 2.6 package you had a success with?
 >
 > On Fri, Mar 18, 2016 at 6:11 PM Jakob Odersky 
 wrote:
 >>
 >> I just experienced the issue, however retrying the download a second
 >> time worked. Could it be that there is some load balancer/cache in
 >> front of the archive and some nodes still serve the corrupt packages?
 >>
 >> On Fri, Mar 18, 2016 at 8:00 AM, Nicholas Chammas
 >>  wrote:
 >> > I'm seeing the same. :(
 >> >
 >> > On Fri, Mar 18, 2016 at 10:57 AM Ted Yu 
 wrote:
 >> >>
 >> >> I tried again this morning :
 >> >>
 >> >> $ wget
 >> >>
 >> >>
 https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.6.tgz
 >> >> --2016-03-18 07:55:30--
 >> >>
 >> >>
 https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.6.tgz
 >> >> Resolving s3.amazonaws.com... 54.231.19.163
 >> >> ...
 >> >> $ tar zxf spark-1.6.1-bin-hadoop2.6.tgz
 >> >>
 >> >> gzip: stdin: unexpected end of file
 >> >> tar: Unexpected EOF in archive
 >> >> tar: Unexpected EOF in archive
 >> >> tar: Error is not recoverable: exiting now
 >> >>
 >> >> On Thu, Mar 17, 2016 at 8:57 AM, Michael Armbrust
 >> >> 
 >> >> wrote:
 >> >>>
 >> >>> Patrick reuploaded the artifacts, so it should be fixed now.
 >> >>>
 >> >>> On Mar 16, 2016 5:48 PM, "Nicholas Chammas"
 >> >>> 
 >> >>> wrote:
 >> 
 >>  Looks like the other packages may also be corrupt. I’m getting
 the
 >>  same
 >>  error for the Spark 1.6.1 / Hadoop 2.4 package.
 >> 
 >> 
 >> 
 >> 
 https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.4.tgz
 >> 
 >>  Nick
 >> 
 >> 
 >>  On Wed, Mar 16, 2016 at 8:28 PM Ted Yu 
 wrote:
 >> >
 >> > On Linux, I got:
 >> >
 >> > $ tar zxf spark-1.6.1-bin-hadoop2.6.tgz
 >> >
 >> > gzip: stdin: unexpected end of file
 >> > tar: Unexpected EOF in archive
 >> > tar: Unexpected EOF in archive
 >> > tar: Error is not recoverable: exiting now
 >> >
 >> > On Wed, Mar 16, 2016 at 5:15 PM, Nicholas Chammas
 >> >  wrote:
 >> >>
 >> >>
 >> >>
 >> >>
 https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.6.tgz
 >> >>
 >> >> Does anyone else have trouble unzipping this? How did this
 happen?
 >> >>
 >> >> What I get is:
 >> >>
 >> >> $ gzip -t spark-1.6.1-bin-hadoop2.6.tgz
 >> >> gzip: spark-1.6.1-bin-hadoop2.6.tgz: unexpected end of file
 >> >> gzip: spark-1.6.1-bin-hadoop2.6.tgz: uncompress failed
 >> >>
 >> >> Seems like a strange type of problem to come across.
 >> >>
 >> >> Nick
 >> >
 >> >
 >> >>
 >> >

>>>
>
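
Since the failure mode here is a truncated download, it is worth verifying
the archive before unpacking, which is essentially what `gzip -t` and the
failing `tar` runs above are doing. A small Python sketch of the same check
(the file names are made up for the demo):

```python
import gzip
import zlib

def gzip_is_intact(path, chunk_size=1 << 16):
    """Stream-decompress the whole file, like `gzip -t`; a download that
    was cut off mid-transfer fails before the end-of-stream marker."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        return False

# Simulate the failure mode: a complete archive vs. a partial download.
payload = gzip.compress(b"spark release tarball contents\n" * 1000)
with open("good.tgz", "wb") as f:
    f.write(payload)
with open("truncated.tgz", "wb") as f:
    f.write(payload[: len(payload) // 2])   # cut off halfway

print(gzip_is_intact("good.tgz"))       # True
print(gzip_is_intact("truncated.tgz"))  # False
```

Where checksum files are published alongside the release artifacts,
comparing the download against them is an even stronger check than just
decompressing it.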


Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-27 Thread Maciej Szymkiewicz
Hi Jacek,

In this context, don't you think it would be useful if at least some
traits from org.apache.spark.ml.param.shared.sharedParams were
public? HasInputCol(s) and HasOutputCol, for example. These are useful
pretty much every time you create a custom Transformer.
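
For context, the kind of reuse being asked for looks roughly like the
sketch below. It is written in Python with invented names, since the real
traits in org.apache.spark.ml.param.shared are private to Spark: each
shared "trait" contributes one param plus its getter/setter, and a custom
Transformer mixes in only the traits it needs.

```python
class Params:
    """Minimal param holder; stands in for Spark's Params base class."""
    def __init__(self):
        self._params = {}

    def set(self, name, value):
        self._params[name] = value
        return self            # returning self enables setter chaining

    def get(self, name):
        return self._params[name]

class HasInputCol(Params):
    """Shared trait: contributes an inputCol param with getter/setter."""
    def set_input_col(self, value):
        return self.set("inputCol", value)

    @property
    def input_col(self):
        return self.get("inputCol")

class HasOutputCol(Params):
    """Shared trait: contributes an outputCol param with getter/setter."""
    def set_output_col(self, value):
        return self.set("outputCol", value)

    @property
    def output_col(self):
        return self.get("outputCol")

class UpperCaseTransformer(HasInputCol, HasOutputCol):
    """Custom transformer: copies inputCol to outputCol, upper-cased.
    Rows are plain dicts here, standing in for DataFrame rows."""
    def transform(self, rows):
        return [dict(r, **{self.output_col: r[self.input_col].upper()})
                for r in rows]

t = UpperCaseTransformer().set_input_col("text").set_output_col("text_up")
out = t.transform([{"text": "spark"}])
print(out)  # [{'text': 'spark', 'text_up': 'SPARK'}]
```

Without public shared traits, every custom Transformer has to redeclare the
inputCol/outputCol boilerplate itself, which is the pain point above.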

-- 
Pozdrawiam,
Maciej Szymkiewicz


On 03/26/2016 10:26 AM, Jacek Laskowski wrote:
> Hi Joseph,
>
> Thanks for the response. I'm one who doesn't understand all the
> hype/need for Machine Learning...yet and through Spark ML(lib) glasses
> I'm looking at ML space. In the meantime I've got few assignments (in
> a project with Spark and Scala) that have required quite extensive
> dataset manipulation.
>
> It was when I sank into using DataFrame/Dataset for data
> manipulation instead of RDDs (I remember talking to Brian about how
> RDD is an "assembly" language compared to the higher-level concept of
> DataFrames with Catalyst and other optimizations). After a few days
> with DataFrames I learnt he was so right! (Sorry Brian, it took me
> longer to understand your point.)
>
> I started using DataFrames in far more places than one could ever
> accept :-) I was so...carried away with DataFrames (esp. show vs
> foreach(println), and UDFs via the udf() function).
>
> And then I moved to the Pipeline API and discovered Transformers,
> and PipelineStage, which can create pipelines of DataFrame manipulation.
> They read so well that I'm pretty sure people would love using them
> more often, but...they belong to MLlib, so they are part of the ML space
> (which not many devs have tackled yet). I applied the approach of using
> withColumn to get a better debugging experience (if I ever need it). I
> learnt it after watching your presentation about the Pipeline API.
> It was so helpful in my RDD/DataFrame space.
>
> So, to promote a more extensive use of Pipelines, PipelineStages, and
> Transformers, I was thinking about moving that part to the SQL/DataFrame
> API, where they really belong. If not, I think people might miss the
> beauty of the very fine and so helpful Transformers.
>
> Transformers are *not* an ML thing -- they are a DataFrame thing and
> should be where they really belong (for their greater adoption).
>
> What do you think?
>
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Sat, Mar 26, 2016 at 3:23 AM, Joseph Bradley  wrote:
>> There have been some comments about using Pipelines outside of ML, but I
>> have not yet seen a real need for it.  If a user does want to use Pipelines
>> for non-ML tasks, they still can use Transformers + PipelineModels.  Will
>> that work?
>>
>> On Fri, Mar 25, 2016 at 8:05 AM, Jacek Laskowski  wrote:
>>> Hi,
>>>
>>> After a few weeks with spark.ml, I came to the conclusion that the
>>> Transformer concept from the Pipeline API (spark.ml/MLlib) should be part
>>> of DataFrame (SQL), where it fits better. Are there any plans to
>>> migrate the Transformer API (ML) to DataFrame (SQL)?
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>
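
The non-ML use of Pipelines discussed above boils down to composing column
transformations so that each stage sees the previous stage's output. A
language-agnostic sketch in Python (rows are plain dicts standing in for
DataFrame rows; all names are invented, not Spark's API):

```python
class Transformer:
    def transform(self, rows):
        raise NotImplementedError

class WithColumn(Transformer):
    """Pipeline stage mimicking DataFrame.withColumn: adds one column
    computed from each row."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def transform(self, rows):
        return [dict(r, **{self.name: self.fn(r)}) for r in rows]

class Pipeline(Transformer):
    """Runs stages in order; each stage sees the previous output."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows

pipeline = Pipeline([
    WithColumn("total", lambda r: r["price"] * r["qty"]),
    WithColumn("expensive", lambda r: r["total"] > 100),  # uses stage 1
])
result = pipeline.transform([{"price": 30, "qty": 5}])
print(result)  # [{'price': 30, 'qty': 5, 'total': 150, 'expensive': True}]
```

Nothing in this composition is ML-specific, which is the gist of the
argument for surfacing it closer to the SQL/DataFrame API.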






Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-27 Thread Jacek Laskowski
Hi,

I've never developed a custom Transformer (or UnaryTransformer in particular),
but I'd be in favor of it if that's the case.

Jacek
28.03.2016 6:54 AM "Maciej Szymkiewicz" wrote:

> [quoted message snipped]