Run spark unit test on Windows 7

2014-07-02 Thread Konstantin Kudryavtsev
1", "in2", "in3")) etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop Assert.assertTrue(true) } finally { if(sc != null) sc.stop() } } Why is it trying to access hadoop at all? and how can I fix it? Thank you in advance Thank you, Konstantin Kudryavtsev

Re: Run spark unit test on Windows 7

2014-07-02 Thread Konstantin Kudryavtsev
Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or wrote: > Hi Konstantin, > > We use Hadoop as a library in a few places in Spark. I wonder why the path > includes "null" though. > > Could you provide the full stack trace? > > Andrew

NullPointerException on ExternalAppendOnlyMap

2014-07-02 Thread Konstantin Kudryavtsev
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Do you have any idea what this is? How can I debug this issue or perhaps access another log? Thank you, Konstantin Kudryavtsev

Re: Run spark unit test on Windows 7

2014-07-02 Thread Konstantin Kudryavtsev
/please-read-if-experiencing-job-failures?forum=hdinsight 2) put this file into d:\winutil\bin 3) add to my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\") After that, the test runs. Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee wrote:
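
For reference, a sketch of the workaround described above, assuming winutils.exe has already been placed in d:\winutil\bin; the property has to be set before the SparkContext (and hence the Hadoop libraries) is initialised:

    import org.apache.spark.{SparkConf, SparkContext}

    object WindowsTestSetup {
      def createLocalContext(): SparkContext = {
        // Point Hadoop at the directory containing bin\winutils.exe so it stops
        // resolving a "null" home directory on Windows.
        System.setProperty("hadoop.home.dir", "d:\\winutil\\")
        new SparkContext(new SparkConf().setMaster("local[2]").setAppName("win-test"))
      }
    }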

Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-04 Thread Konstantin Kudryavtsev
ion's main class (required) ...bla-bla-bla Any ideas? How can I make it work? Thank you, Konstantin Kudryavtsev

Spark 1.0 failed on HDP 2.0 with absurd exception

2014-07-05 Thread Konstantin Kudryavtsev
ers (Default: 1) --worker-memory MEM Memory per Worker (e.g. 1000M, 2G) (Default: 1G) Seems like the old Spark notation. Any ideas? Thank you, Konstantin Kudryavtsev

[no subject]

2014-07-05 Thread Konstantin Kudryavtsev
and how can it be fixed? Thank you, Konstantin Kudryavtsev

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Konstantin Kudryavtsev
Hello, thanks for your message... I'm confused: Hortonworks suggests installing the Spark RPM on each node, but the Spark main page says that YARN is enough and I don't need to install it... What's the difference? Sent from my HTC On Jul 6, 2014 8:34 PM, "vs" wrote: > Konstantin, > > HWRK provides a Tech Prev

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Konstantin Kudryavtsev
with Hadoop Spark can run on Hadoop 2's YARN cluster manager, and can read any existing Hadoop data. If you have a Hadoop 2 cluster, you can run Spark without any installation needed." And this is confusing for me... do I need the RPM installation or not?... Thank you, Konstantin Kudry

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread Konstantin Kudryavtsev
rk on yarn (do I need RPM installations, or only to build Spark on the edge node?) Thank you, Konstantin Kudryavtsev On Mon, Jul 7, 2014 at 4:34 AM, Robert James wrote: > I can say from my experience that getting Spark to work with Hadoop 2 > is not for the beginner; after solving one probl

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread Konstantin Kudryavtsev
Thank you, Konstantin Kudryavtsev On Mon, Jul 7, 2014 at 1:57 PM, Krishna Sankar wrote: > Konstantin, > 1. You need to install the Hadoop RPMs on all nodes. If it is Hadoop 2, the nodes would have HDFS & YARN. > 2. Then you need to install Spark on all nodes. I hav

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread Konstantin Kudryavtsev
Hi Chester, Thank you very much, it is clear now - just two different ways to support Spark on a cluster. Thank you, Konstantin Kudryavtsev On Mon, Jul 7, 2014 at 3:22 PM, Chester @work wrote: > In Yarn cluster mode, you can either have spark on all the cluster nodes > or supply the spa

Control number of tasks per stage

2014-07-07 Thread Konstantin Kudryavtsev
Hi all, is there any way to control the number of tasks per stage? Currently I see a situation where only 2 tasks are created per stage and each of them is very slow, while at the same time the cluster has a huge number of unused nodes. Thank you, Konstantin Kudryavtsev
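
For reference, a sketch of the two usual knobs (paths and partition counts are illustrative): request more input partitions at read time, or repartition an existing RDD before the slow stage:

    import org.apache.spark.SparkContext

    def widenParallelism(sc: SparkContext): Unit = {
      // Ask for at least 64 input partitions instead of accepting the default.
      val lines = sc.textFile("hdfs:///data/input", 64)

      // Or explicitly reshuffle an existing RDD into more partitions,
      // so the next stage runs 64 tasks rather than 2.
      val wide = lines.repartition(64)
      println(wide.partitions.length)
    }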

java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Konstantin Kudryavtsev
using Spark 1.0. In map I create a new object each time; as I understand it, I can't reuse an object the way I would in MapReduce development? I wonder if you could point me to how it is possible to avoid the GC overhead... thank you in advance Thank you, Konstantin Kudryavtsev
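
One common mitigation, shown as a hedged sketch (the date-parsing example is illustrative): move per-record setup into mapPartitions so heavyweight helpers are allocated once per partition rather than once per record; the emitted records themselves still cannot be reused the way Hadoop Writables are, because Spark may keep references to them:

    import java.text.SimpleDateFormat
    import org.apache.spark.rdd.RDD

    def parseDates(lines: RDD[String]): RDD[Long] = {
      lines.mapPartitions { iter =>
        // Built once per partition instead of once per record.
        val fmt = new SimpleDateFormat("yyyy-MM-dd")
        iter.map(line => fmt.parse(line).getTime)
      }
    }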

how to convert RDD to PairRDDFunctions ?

2014-07-08 Thread Konstantin Kudryavtsev
Thank you, Konstantin Kudryavtsev
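
For reference, a minimal sketch of the usual answer for the 1.x API: make the RDD an RDD of key/value tuples and import SparkContext._ so the implicit conversion to PairRDDFunctions is in scope:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // brings rddToPairRDDFunctions into scope

    def wordCount(sc: SparkContext): Unit = {
      val pairs = sc.parallelize(Seq("a", "b", "a")).map(word => (word, 1))
      val counts = pairs.reduceByKey(_ + _)  // a PairRDDFunctions method, via the implicit
      counts.collect().foreach(println)
    }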

Filtering data during the read

2014-07-09 Thread Konstantin Kudryavtsev
is applying. Is there any way to apply filtering during the read step, and not put all objects into memory? Thank you, Konstantin Kudryavtsev
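
A sketch of the point usually made in reply (path and predicate are illustrative): textFile and filter are lazy transformations, so the predicate is applied while records stream in and the unfiltered data set is never materialised in memory:

    import org.apache.spark.SparkContext

    def readFiltered(sc: SparkContext): Unit = {
      val interesting = sc.textFile("hdfs:///data/events")
        .filter(line => line.contains("ERROR"))   // evaluated record by record during the read
      println(interesting.count())
    }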

Spark scheduling with Capacity scheduler

2014-07-17 Thread Konstantin Kudryavtsev
advance Thank you, Konstantin Kudryavtsev

Ports required for running spark

2014-07-31 Thread Konstantin Kudryavtsev
in advance, Konstantin Kudryavtsev

Re: Ports required for running spark

2014-07-31 Thread Konstantin Kudryavtsev
Hi Larry, I'm afraid this is standalone mode; I'm interested in YARN. Also, I don't see the troublesome port 33007, which I believe is related to Akka. Thank you, Konstantin Kudryavtsev On Thu, Jul 31, 2014 at 1:11 PM, Larry Xiao wrote: > Hi Konstantin, > > I think yo
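
For reference, a hedged sketch of pinning some of the otherwise randomly chosen ports via SparkConf; the exact set of port properties differs between Spark releases and deploy modes, so treat the property names as version-dependent:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("fixed-ports")
      .set("spark.driver.port", "33007")   // driver RPC (Akka) port; the number is illustrative
      .set("spark.ui.port", "4040")        // web UI port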

Re: Ports required for running spark

2014-07-31 Thread Konstantin Kudryavtsev
> On Thu, Jul 31, 2014 at 6:17 PM, Konstantin Kudryavtsev < > kudryavtsev.konstan...@gmail.com> wrote: > >> Hi Larry, >> >> I'm afraid this is standalone mode, I'm interested in YARN >> >> Also, I don't see port-in-trouble 33007 which i

reduceByKey to get all associated values

2014-08-07 Thread Konstantin Kudryavtsev
sort it in a particular way and apply some business logic. Thank you in advance, Konstantin Kudryavtsev
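
A sketch of the usual alternatives (key/value types are illustrative): groupByKey returns all values per key directly, while aggregateByKey builds the per-key collection with map-side combining before the sort and business logic step:

    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    def collectValuesPerKey(pairs: RDD[(String, Int)]): Unit = {
      // Simplest form: every value for a key ends up in one Iterable.
      val grouped: RDD[(String, Iterable[Int])] = pairs.groupByKey()

      // Build the collection explicitly, then sort it and apply business logic.
      val sortedPerKey = pairs
        .aggregateByKey(List.empty[Int])(
          (acc, v) => v :: acc,     // add a value within a partition
          (a, b) => a ::: b         // merge partial lists across partitions
        )
        .mapValues(_.sorted)
    }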

Re: Spark output compression on HDFS

2014-04-03 Thread Konstantin Kudryavtsev
:21: error: type mismatch; found: Class[org.apache.spark.io.SnappyCompressionCodec] required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]] counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec]) and it doesn't work even for Gzip: counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec]) :21: error: type mismatch; found: Class[org.apache.hadoop.io.compress.GzipCodec] required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]] counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec]) Could you please suggest a solution? Also, I didn't find how it is possible to specify compression parameters (i.e. the compression type for Snappy). I wondered if you could share code snippets for writing/reading an RDD with compression? Thank you in advance, Konstantin Kudryavtsev
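
For reference, a sketch of what the type errors above point at: saveAsSequenceFile expects an Option of a Hadoop CompressionCodec class, so the codec has to be wrapped in Some(...) and taken from org.apache.hadoop.io.compress (the Spark-internal SnappyCompressionCodec configures shuffle/broadcast compression, not output files):

    import org.apache.hadoop.io.compress.{GzipCodec, SnappyCodec}
    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    def saveCompressed(counts: RDD[(String, Int)], output: String): Unit = {
      counts.saveAsSequenceFile(output, Some(classOf[GzipCodec]))
      // Or, if the native Snappy libraries are available on the cluster:
      // counts.saveAsSequenceFile(output + "-snappy", Some(classOf[SnappyCodec]))
    }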

Re: how to save RDD partitions in different folders?

2014-04-04 Thread Konstantin Kudryavtsev
Hi Evan, Could you please provide a code snippet? Because it is not clear to me: in Hadoop you need to use the addNamedOutput method, and I'm stuck on how to use it from Spark. Thank you, Konstantin Kudryavtsev On Fri, Apr 4, 2014 at 5:27 PM, Evan Sparks wrote: > Have a look at MultipleOu
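
For reference, a hedged sketch of the Spark-side pattern usually suggested in place of MultipleOutputs.addNamedOutput: save through saveAsHadoopFile with a MultipleTextOutputFormat subclass so each record's key chooses its sub-folder (class and path names are illustrative):

    import org.apache.hadoop.io.NullWritable
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    class KeyBasedOutput extends MultipleTextOutputFormat[Any, Any] {
      // Writes to <outputDir>/<key>/part-00000 and so on.
      override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
        key.toString + "/" + name
      // Don't repeat the key inside the written file.
      override def generateActualKey(key: Any, value: Any): Any =
        NullWritable.get()
    }

    def saveByKey(records: RDD[(String, String)], outputDir: String): Unit = {
      records.saveAsHadoopFile(outputDir, classOf[String], classOf[String], classOf[KeyBasedOutput])
    }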

Re: Spark output compression on HDFS

2014-04-04 Thread Konstantin Kudryavtsev
Can anybody suggest how to change the compression type (Record, Block) for Snappy? If it is possible, of course. Thank you in advance. Thank you, Konstantin Kudryavtsev On Thu, Apr 3, 2014 at 10:28 PM, Konstantin Kudryavtsev < kudryavtsev.konstan...@gmail.com> wrote: > Thanks all, it works
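
A hedged sketch of one way to switch between RECORD and BLOCK compression, assuming the standard Hadoop property mapred.output.compression.type is honoured by the SequenceFile output path in use:

    import org.apache.hadoop.io.compress.SnappyCodec
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    def saveBlockCompressed(sc: SparkContext, output: String): Unit = {
      // "BLOCK" compresses batches of records together; "RECORD" compresses each record separately.
      sc.hadoopConfiguration.set("mapred.output.compression.type", "BLOCK")
      val counts = sc.parallelize(Seq(("a", 1), ("b", 2)))
      counts.saveAsSequenceFile(output, Some(classOf[SnappyCodec]))
    }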

Re: is it possible to initiate Spark jobs from Oozie?

2014-04-10 Thread Konstantin Kudryavtsev
I believe you need to write a custom action or use the java action. On Apr 10, 2014 12:11 AM, "Segerlind, Nathan L" < nathan.l.segerl...@intel.com> wrote: > Howdy. > > Is it possible to initiate Spark jobs from Oozie (presumably as a java > action)? If so, are there known limitations to this? An
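
For reference, a hedged sketch of the java-action approach: a plain main class that an Oozie java action could invoke, which then creates its own SparkContext (the class name, arguments, and job body are illustrative, not a tested Oozie workflow):

    import org.apache.spark.{SparkConf, SparkContext}

    object OozieLaunchedJob {
      def main(args: Array[String]): Unit = {
        val Array(input, output) = args   // passed via the java action's <arg> elements
        val sc = new SparkContext(new SparkConf().setAppName("oozie-launched-job"))
        try {
          sc.textFile(input).filter(_.nonEmpty).saveAsTextFile(output)
        } finally {
          sc.stop()
        }
      }
    }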

Re: Pig on Spark

2014-04-10 Thread Konstantin Kudryavtsev
Hi Mayur, I wonder if you could share your findings in some way (GitHub, blog post, etc.). I guess your experience would be very interesting/useful for many people. Sent from Lenovo YogaTablet On Apr 8, 2014 8:48 PM, "Mayur Rustagi" wrote: > Hi Ankit, > Thanx for all the work on Pig. > Finally g

unsubscribe

2014-05-05 Thread Konstantin Kudryavtsev
unsubscribe Thank you, Konstantin Kudryavtsev