Re: Spark Version upgrade issue: Exception in thread "main" java.lang.NoSuchMethodError

2015-08-29 Thread Raghavendra Pandey
Looks like your application and Spark's Jackson package are at different versions. Raghav On Aug 28, 2015 4:01 PM, "Manohar753" wrote: > Hi Team, > I upgraded Spark from an older version to 1.4.1; after the Maven build I tried to run > my > simple application, but it failed, giving the below stacktrace. > > Excep
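A common way to confirm such a mismatch is to inspect the dependency tree and then align or exclude the conflicting Jackson artifacts in the pom. A command-line sketch (the group filter is the usual Jackson coordinate; adjust to what your tree actually shows):

```shell
# See which Jackson versions Maven resolves for your application:
mvn dependency:tree -Dincludes=com.fasterxml.jackson.core

# If your app pulls in a different Jackson than the Spark 1.4.1 runtime
# provides, exclude your transitive copy in pom.xml or pin the version
# that matches the cluster's Spark distribution.
```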

Re: Spark MLLIB multiclass classification

2015-08-29 Thread Feynman Liang
I think the spark.ml logistic regression currently only supports 0/1 labels. If you need multiclass, I would suggest looking at the spark.ml decision trees. If you don't care too much about pipelines, then you could use the spark.mllib logistic regression after featurizing. On Sat, Aug 29, 20

Re: Spark MLLIB multiclass classification

2015-08-29 Thread Zsombor Egyed
Thank you, I saw this before, but it is "just" a binary classification, so how can I extend this to multiclass classification? Simply add different labels? e.g.: new LabeledDocument(0L, "a b c d e spark", 1.0), new LabeledDocument(1L, "b d", 0.0), new LabeledDocument(2L, "hadoop f g h", 2.0)
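For the archive: yes, in spark.mllib each class simply gets its own numeric label (0.0, 1.0, 2.0, ...), and the trainer is told how many classes exist, e.g. via LogisticRegressionWithLBFGS().setNumClasses(k). A minimal pure-Scala sketch of the label-encoding step (the class names and texts below are made up for illustration):

```scala
// Map each distinct class name to a numeric label 0.0, 1.0, 2.0, ...
// These (label, text) pairs would then be featurized and wrapped in
// LabeledPoint before training a multiclass model such as
// new LogisticRegressionWithLBFGS().setNumClasses(labelIndex.size).
val docs = Seq(
  ("spark",  "a b c d e spark"),
  ("plain",  "b d"),
  ("hadoop", "hadoop f g h")
)

val labelIndex: Map[String, Double] =
  docs.map(_._1).distinct.zipWithIndex
    .map { case (cls, i) => cls -> i.toDouble }.toMap

val labeled: Seq[(Double, String)] =
  docs.map { case (cls, text) => (labelIndex(cls), text) }

println(labeled)
```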

Re: How to remove worker node but let it finish first?

2015-08-29 Thread Romi Kuntsman
Is it only available in Mesos? I'm using a Spark standalone cluster, is there anything like it there? On Fri, Aug 28, 2015 at 8:51 AM Akhil Das wrote: > You can create a custom mesos framework for your requirement, to get you > started you can check this out > http://mesos.apache.org/documentation

Re: How to generate spark assembly (jar file) using Intellij

2015-08-29 Thread Feynman Liang
Have you tried `build/sbt assembly`? On Sat, Aug 29, 2015 at 9:03 PM, Muler wrote: > Hi guys, > > I can successfully build Spark using Intellij, but I'm not able to > locate/generate the spark assembly (jar file) in the assembly/target directory. > How do I generate one? I have attached the screensho
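For reference, a typical invocation and the location the jar lands in (the Scala version directory and jar name vary with the Spark/Hadoop versions you build against, so treat the paths below as examples):

```shell
# From the Spark source root:
build/sbt assembly

# The assembly jar is written under assembly/target/scala-<version>/,
# e.g. for a 1.4.x build:
ls assembly/target/scala-2.10/spark-assembly-*.jar
```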

Re: Spark MLLIB multiclass classification

2015-08-29 Thread Feynman Liang
I would check out the Pipeline code example On Sat, Aug 29, 2015 at 9:23 PM, Zsombor Egyed wrote: > Hi! > > I want to implement a multiclass classification for documents. > So I have different kinds of text files, and I want t

Spark MLLIB multiclass classification

2015-08-29 Thread Zsombor Egyed
Hi! I want to implement a multiclass classification for documents. So I have different kinds of text files, and I want to classify them with Spark MLlib in Java. Do you have any code examples? Thanks! -- Egyed Zsombor Junior Big Data Engineer Mobile: +36 70 320 65 81 | Twitter:@sta

Re: Is there a way to store RDD and load it with its original format?

2015-08-29 Thread Akhil Das
You can do rdd.saveAsObjectFile for storing, and sc.objectFile for reading. Thanks Best Regards On Thu, Aug 27, 2015 at 9:29 PM, wrote: > Hi, > > Any way to store/load RDDs keeping their original object instead of string? > > I am having trouble with parquet (there is always some e
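saveAsObjectFile works because it writes Java-serialized objects, so arbitrary (serializable) types round-trip with their original class. The same serialize/deserialize cycle can be sketched in plain Scala without Spark (the Record class is made up; this is the mechanism, not Spark's actual file format, which wraps the bytes in a SequenceFile):

```scala
import java.io._

case class Record(id: Int, score: Double)

val original = Seq(Record(1, 0.5), Record(2, 0.9))

// Serialize the collection with Java serialization, roughly what
// saveAsObjectFile does per partition...
val bytes = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(original)
  oos.close()
  bos.toByteArray
}

// ...and read it back with its original type, as sc.objectFile does.
val restored = {
  val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
  ois.readObject().asInstanceOf[Seq[Record]]
}

println(restored)
```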

Re: commit DB Transaction for each partition

2015-08-29 Thread Akhil Das
What problem are you having? You will have to trigger an action at the end to execute this piece of code, like: rdd.mapPartitions(partitionOfRecords => { DBConnectionInit() val results = partitionOfRecords.map(..) DBConnection.commit() results }).count() Thanks Best Regards On Thu,
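The underlying point is that transformations are lazy: without an action such as count(), the mapPartitions body (and hence the commit) never runs. The same laziness is visible with a plain Scala Iterator, where the DB calls are replaced here by a made-up side-effect counter:

```scala
var sideEffects = 0

val it = Iterator(1, 2, 3).map { x =>
  sideEffects += 1 // stands in for the per-record DB work
  x * 2
}

// Iterator.map is lazy -- nothing has executed yet, just like an
// RDD transformation with no action behind it:
assert(sideEffects == 0)

// Consuming the iterator (the analogue of calling .count()) runs the body:
val n = it.size
println(s"consumed $n records, side effects: $sideEffects")
```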

Re: spark-submit issue

2015-08-29 Thread Akhil Das
Did you try putting a sc.stop at the end of your pipeline? Thanks Best Regards On Thu, Aug 27, 2015 at 6:41 PM, pranay wrote: > I have a java program that does this - (using Spark 1.3.1 ) Create a > command > string that uses "spark-submit" in it ( with my Class file etc ), and i > store this s

Re: Setting number of CORES from inside the Topology (JAVA code )

2015-08-29 Thread Akhil Das
When you set .setMaster to local[4], it means that you are allocating 4 threads on your local machine. You can change it to local[1] to run it on a single thread. If you are submitting the job to a standalone spark cluster and you want to limit the # of cores for your job, then you can do it like
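The message is cut off in the archive; for a standalone cluster the usual knob is the total-core cap, settable on the command line or in the defaults file (the value 4 is just an example):

```shell
# spark-submit flag (standalone mode):
spark-submit --total-executor-cores 4 ...

# or in conf/spark-defaults.conf:
# spark.cores.max  4
```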

Spark shell and StackOverFlowError

2015-08-29 Thread ashrowty
I am running the Spark shell (1.2.1) in local mode and I have a simple RDD[(String,String,Double)] with about 10,000 objects in it. I get a StackOverFlowError each time I try to run the following code (the code itself is just representative of other logic where I need to pass in a variable). I trie

Re: Spark-on-YARN LOCAL_DIRS location

2015-08-29 Thread Akhil Das
Yes, you can set SPARK_LOCAL_DIRS in spark-env.sh or spark.local.dir in the spark-defaults.conf file, and then it would use this location for the shuffle writes etc. Thanks Best Regards On Wed, Aug 26, 2015 at 6:56 PM, wrote: > Hi, > > > > I am having issues with /tmp space filling up during
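Concretely, either of these (the path is an example) moves shuffle spill off /tmp. One caveat worth noting for this thread's YARN case: when running on YARN, the NodeManager's yarn.nodemanager.local-dirs setting takes precedence over spark.local.dir for the containers' scratch space.

```shell
# conf/spark-env.sh
export SPARK_LOCAL_DIRS=/data/spark-tmp

# or conf/spark-defaults.conf:
# spark.local.dir  /data/spark-tmp
```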

Re: Alternative to Large Broadcast Variables

2015-08-29 Thread Hemminger Jeff
Thanks for the recommendations. I had been focused on solving the problem "within Spark" but a distributed database sounds like a better solution. Jeff On Sat, Aug 29, 2015 at 11:47 PM, Ted Yu wrote: > Not sure if the race condition you mentioned is related to Cassandra's > data consistency mod

Re: How to avoid shuffle errors for a large join ?

2015-08-29 Thread Reynold Xin
Can you try 1.5? This should work much, much better in 1.5 out of the box. For 1.4, I think you'd want to turn on sort-merge-join, which is off by default. However, the sort-merge join in 1.4 can still trigger a lot of garbage, making it slower. SMJ performance is probably 5x - 1000x better in 1.5
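For the archive, the 1.4-era switch being described appears to be the planner flag below (off by default in 1.4, as the message says; verify the exact key against your version's configuration docs):

```shell
# conf/spark-defaults.conf (Spark 1.4.x):
# spark.sql.planner.sortMergeJoin  true
```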

Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs

2015-08-29 Thread timothy22000
I am doing some memory tuning on my Spark job on YARN and I notice different settings would give different results and affect the outcome of the Spark job run. However, I am confused and do not understand completely why it happens and would appreciate if someone can provide me with some guidance an

Re: Scala: Overload method by its class type

2015-08-29 Thread Akhil Das
This is more of a scala related question, have a look at the case classes in scala http://www.scala-lang.org/old/node/107 Thanks Best Regards On Tue, Aug 25, 2015 at 6:55 PM, wrote: > Hi all, > > I have SomeClass[TYPE] { def some_method(args: fixed_type_args): TYPE } > > And on runtime, I creat

Re: Invalid environment variable name when submitting job from windows

2015-08-29 Thread Akhil Das
I think you have to use the keyword "set" to set an environment variable in Windows. Check the section "Setting environment variables" at http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/ntcmds_shelloverview.mspx?mfr=true Thanks Best Regards On Tue, Aug 25, 2015 at 1
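For reference, on cmd.exe the equivalents of the Unix export are set (current session) and setx (persisted); the variable name and value below are examples:

```
REM current cmd session only:
set SPARK_HOME=C:\spark

REM persist for future sessions:
setx SPARK_HOME C:\spark
```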

Re: Build k-NN graph for large dataset

2015-08-29 Thread Maruf Aytekin
Yes, you need to use dimensionality reduction and/or locality-sensitive hashing to reduce the number of pairs to compare. There is also an LSH implementation for collections of vectors I have just published here: https://github.com/marufaytekin/lsh-spark. The implementation is based on this paper: http://www.cs.
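For readers unfamiliar with the technique: random-hyperplane LSH (the cosine-similarity family) maps each vector to a short bit signature, so only vectors whose signatures collide need to be compared, shrinking the O(n^2) pair space. A self-contained sketch, not taken from the linked project; dimension, bit count and seed are arbitrary:

```scala
import scala.util.Random

// Each random hyperplane contributes one signature bit:
// the sign of the dot product with the vector.
def hyperplanes(dim: Int, bits: Int, seed: Long): Array[Array[Double]] = {
  val rnd = new Random(seed)
  Array.fill(bits)(Array.fill(dim)(rnd.nextGaussian()))
}

def signature(v: Array[Double], planes: Array[Array[Double]]): String =
  planes.map { p =>
    val dot = v.indices.map(i => v(i) * p(i)).sum
    if (dot >= 0) '1' else '0'
  }.mkString

val planes = hyperplanes(dim = 3, bits = 8, seed = 42L)
val sigA = signature(Array(1.0, 0.9, 1.1), planes)
val sigB = signature(Array(1.0, 1.0, 1.0), planes) // nearly parallel to A

// Similar vectors tend to share signature bits; candidate k-NN pairs
// are those hashing to the same bucket.
println(s"A=$sigA B=$sigB")
```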

Re: How to set environment of worker applications

2015-08-29 Thread Jan Algermissen
Finally, I found the solution: on the spark context you can set spark.executorEnv.[EnvironmentVariableName] and these will be available in the environment of the executors This is in fact documented, but somehow I missed it. https://spark.apache.org/docs/latest/configuration.html#runtime-enviro
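For the archive, the setting looks like this (the variable name and value are examples):

```shell
# conf/spark-defaults.conf
spark.executorEnv.MY_VAR  some-value

# or programmatically on the SparkConf:
# conf.set("spark.executorEnv.MY_VAR", "some-value")
```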

Re: Alternative to Large Broadcast Variables

2015-08-29 Thread Raghavendra Pandey
We are using Cassandra for a similar kind of problem and it works well... You need to take care of the race condition between updating the store and looking up the store... On Aug 29, 2015 1:31 AM, "Ted Yu" wrote: > +1 on Jason's suggestion. > > bq. this large variable is broadcast many times during th

Re: Array Out OF Bound Exception

2015-08-29 Thread Raghavendra Pandey
So either you have an empty line at the end, or when you use String.split you don't specify -1 as the second parameter... On Aug 29, 2015 1:18 PM, "Akhil Das" wrote: > I suspect in the last scenario you are having an empty new line at the > last line. If you put a try..catch you'd definitely know. > > Thanks >
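The String.split pitfall is worth spelling out: by default Java's split drops trailing empty fields, so a line ending in delimiters yields fewer columns than expected and later indexing throws ArrayIndexOutOfBoundsException. Passing -1 as the limit keeps them:

```scala
val line = "a,b,,"

val dropped = line.split(",")     // trailing empty fields removed
val kept    = line.split(",", -1) // all fields preserved

println(dropped.length) // 2
println(kept.length)    // 4
```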

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Roberto Congiu
It depends, if HDFS is running under windows, FUSE won't work, but if HDFS is on a linux VM, Box, or cluster, then you can have the linux box/vm mount HDFS through FUSE and at the same time export its mount point on samba. At that point, your windows machine can just connect to the samba share. R.

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Dino Fancellu
I'm using Windows. Are you saying it works with Windows? Dino. On 29 August 2015 at 09:04, Akhil Das wrote: > You can also mount HDFS through the NFS gateway and access it, I think. > > Thanks > Best Regards > > On Tue, Aug 25, 2015 at 3:43 AM, Dino Fancellu wrote: >> >> http://hortonworks.com/blo

Apache Spark Suitable JDBC Driver not found

2015-08-29 Thread shawon
I am using Apache Spark for analyzing query logs. I already faced some difficulties setting up Spark. Now I am using a standalone cluster to process queries. First I used the example code in Java to count words, which worked fine. But when I try to connect it to a MySQL serv
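The usual fix for "No suitable driver" is to ship the MySQL connector jar explicitly so it is visible to both the driver and the executors. A command sketch; the class name, jar version and paths are illustrative:

```shell
spark-submit \
  --class my.QueryLogApp \
  --jars /path/to/mysql-connector-java-5.1.36.jar \
  --driver-class-path /path/to/mysql-connector-java-5.1.36.jar \
  my-app.jar
```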

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Roberto Congiu
If HDFS is on a linux VM, you could also mount it with FUSE and export it with samba 2015-08-29 2:26 GMT-07:00 Ted Yu : > See > https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html > > FYI > > On Sat, Aug 29, 2015 at 1:04 AM, Akhil Das > wrote: > >> You can a

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Ted Yu
See https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html FYI On Sat, Aug 29, 2015 at 1:04 AM, Akhil Das wrote: > You can also mount HDFS through the NFS gateway and access it, I think. > > Thanks > Best Regards > > On Tue, Aug 25, 2015 at 3:43 AM, Dino Fancellu

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Akhil Das
You can also mount HDFS through the NFS gateway and access it, I think. Thanks Best Regards On Tue, Aug 25, 2015 at 3:43 AM, Dino Fancellu wrote: > http://hortonworks.com/blog/windows-explorer-experience-hdfs/ > > Seemed to exist, now no sign. > > Anything similar to tie HDFS into windows explorer

Re: Array Out OF Bound Exception

2015-08-29 Thread Akhil Das
I suspect in the last scenario you are having an empty new line at the last line. If you put a try..catch you'd definitely know. Thanks Best Regards On Tue, Aug 25, 2015 at 2:53 AM, Michael Armbrust wrote: > This top line here is indicating that the exception is being throw from > your code (i.
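A defensive pattern that makes the empty-trailing-line hypothesis easy to confirm: parse each line inside scala.util.Try and keep only the successes. The three-column integer format below is made up for illustration:

```scala
import scala.util.Try

val lines = Seq("1,2,3", "4,5,6", "") // note the empty last line

val parsed = lines.flatMap { l =>
  Try {
    val cols = l.split(",", -1)
    (cols(0).toInt, cols(1).toInt, cols(2).toInt)
  }.toOption // a failed parse (e.g. the empty line) becomes None
}

println(parsed) // the empty line is silently skipped
```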

Re: History server is not receiving any event

2015-08-29 Thread Akhil Das
Are you starting your history server? ./sbin/start-history-server.sh You can read more here http://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact Thanks Best Regards On Tue, Aug 25, 2015 at 1:07 AM, b.bhavesh wrote: > Hi, > > I am working on streaming application. > I t
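"Not receiving any event" usually also means event logging was never enabled on the application side; the history server only replays logs that applications write. A typical pairing of settings (the log directory is an example):

```shell
# conf/spark-defaults.conf -- written by the application:
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs:///spark-events

# the history server reads the same location:
spark.history.fs.logDirectory  hdfs:///spark-events

# then:
./sbin/start-history-server.sh
```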