JavaRDD with custom class?

2016-04-12 Thread Daniel Valdivia
Hi, I'm moving some code from Scala to Java and I just hit a wall where I'm trying to move an RDD with a custom data structure to Java, but I'm not able to do so: Scala Code: case class IncodentDoc(system_id: String, category: String, terms: Seq[String]) var incTup = inc_filtered.map(rec
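
A minimal sketch of the pattern being ported (the input record type and its field names are invented for illustration). The property a Java port has to reproduce is that the custom class is serializable, which Scala case classes are by default; on the Java side the equivalent is a top-level class implementing java.io.Serializable:

    import org.apache.spark.{SparkConf, SparkContext}

    // RawRec and its fields are hypothetical stand-ins for the poster's input records
    case class RawRec(systemId: String, category: String, terms: Seq[String])
    case class IncodentDoc(system_id: String, category: String, terms: Seq[String])

    val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[*]"))
    val inc_filtered = sc.parallelize(Seq(RawRec("s1", "cat", Seq("a", "b"))))
    // a Java port would do the same map with a named top-level Serializable class
    val incTup = inc_filtered.map(rec => IncodentDoc(rec.systemId, rec.category, rec.terms))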

How to deal with same class mismatch?

2016-02-01 Thread Daniel Valdivia
Hi, I'm having a couple of issues. I'm experiencing a known issue in the spark-shell where I'm getting a type mismatch for what should be the right class: :82: error: type mismatch; found : org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.a
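
The repeated package prefix in that error is a known spark-shell printing artifact; the usual underlying cause is that each shell line compiles into its own wrapper, so a class defined on one line is not the "same" class when reused later. One common workaround (a sketch, not the only fix) is to define the class and the code that uses it in a single :paste block:

    scala> :paste
    // Entering paste mode (ctrl-D to finish)

    case class Doc(id: String, terms: Seq[String])
    val docs = sc.parallelize(Seq(Doc("a", Seq("x", "y")))).map(_.id)

    // press ctrl-D; both definitions now live in one compilation unit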

Set Hadoop User in Spark Shell

2016-01-14 Thread Daniel Valdivia
Hi, I'm trying to set the value of a Hadoop parameter within spark-shell, and System.setProperty("HADOOP_USER_NAME", "hadoop") seems not to be doing the trick. Does anyone know how I can set the hadoop.job.ugi parameter from within spark-shell ? Cheers -
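
A sketch of two approaches, assuming the goal is to act as the "hadoop" user; System.setProperty only helps if it runs before the Hadoop client classes initialize, which is too late inside an already-running shell:

    // mutate the Hadoop configuration of the shell's existing SparkContext
    sc.hadoopConfiguration.set("hadoop.job.ugi", "hadoop")

    // alternatively, export HADOOP_USER_NAME=hadoop in the environment before
    // launching spark-shell, so the JVM starts with the property already set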

Re: Put all elements of RDD into array

2016-01-11 Thread Daniel Valdivia
our val. >> >> In case you ever want to append values iteratively, search for how to use >> scala "ArrayBuffer"s. Also, keep in mind that RDDs have a foreach method, so >> no need to call collect followed by foreach. >> >> regards, >> --Jakob >> &
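
A sketch of the advice in this reply; closures passed to rdd.foreach run on the executors and mutate copies, which is why appending from inside foreach leaves the driver-side collection empty:

    import scala.collection.mutable.ArrayBuffer

    val rdd = sc.parallelize(Seq(1.0, 2.0, 3.0))

    // wrong: buf is serialized to the executors, the driver's buffer stays empty
    val buf = ArrayBuffer[Double]()
    rdd.foreach(x => buf += x)

    // right: collect() materializes the RDD on the driver as a fresh Array
    val values: Array[Double] = rdd.collect()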

Put all elements of RDD into array

2016-01-11 Thread Daniel Valdivia
Hello, I'm trying to put all the values in a pair RDD into an array (or list) for later storage, however even though I'm collecting the data and then pushing it to the array, the array size after the run is 0. Any idea on what I'm missing? Thanks in advance scala> val tpdist: Array[Array[Double]] = Ar

Re: Monitor Job on Yarn

2016-01-04 Thread Daniel Valdivia
ning-on-yarn.html> > > Note spark.yarn.historyServer.address > FYI > > On Mon, Jan 4, 2016 at 2:49 PM, Daniel Valdivia <mailto:h...@danielvaldivia.com>> wrote: > Hello everyone, happy new year, > > I submitted an app to yarn, however I'm unable to

Monitor Job on Yarn

2016-01-04 Thread Daniel Valdivia
Hello everyone, happy new year, I submitted an app to yarn, however I'm unable to monitor its progress on the driver node, not on :8080 or :4040 as documented. When submitting in standalone mode I could monitor it, however it seems like that's not the case right now. I submitted my app this way: s
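
In yarn mode the driver UI is not served from the submitting host's :8080/:4040; the ResourceManager proxies it instead. A sketch of the usual way to find it (the application id below is hypothetical):

    # list applications and note the tracking URL column
    yarn application -list

    # aggregated logs for a finished application
    yarn logs -applicationId application_1451000000000_0001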

Re: Can't submit job to standalone cluster

2015-12-29 Thread Daniel Valdivia
. driver is > NOT a thread in ApplicationMaster; use --packages to submit a jar > > > On Tuesday, December 29, 2015 1:54 PM, Andrew Or > wrote: > > > Hi Greg, > > It's actually intentional for standalone cluster mode to not upload jars. One > of the rea

Can't submit job to standalone cluster

2015-12-28 Thread Daniel Valdivia
Hi, I'm trying to submit a job to a small Spark cluster running in standalone mode, however it seems like the jar file I'm submitting to the cluster is "not found" by the worker nodes. I might have understood wrong, but I thought the driver node would send this jar file to the worker nodes, o
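
As the reply above notes, standalone cluster mode intentionally does not upload the application jar to the node chosen to run the driver, so the jar path must resolve on every node. A sketch (master URL, class name, and paths are hypothetical):

    spark-submit \
      --master spark://master:7077 \
      --deploy-mode cluster \
      --class com.example.MyApp \
      hdfs:///apps/myapp.jar   # or an identical local path pre-copied to every node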

Re: Missing dependencies when submitting scala app

2015-12-23 Thread Daniel Valdivia
pecify org.json4s.jackson in your sbt dependency but with a > different version ? > > On Wed, Dec 23, 2015 at 6:15 AM, Daniel Valdivia > wrote: > >> Hi, >> >> I'm trying to figure out how to bundle dependendies with a scala >> application, so far

Missing dependencies when submitting scala app

2015-12-22 Thread Daniel Valdivia
Hi, I'm trying to figure out how to bundle dependencies with a Scala application. So far my code was tested successfully on the spark-shell, however now that I'm trying to run it as a standalone application, which I'm compiling with sbt, it is yielding the error: java.lang.NoSuchMethodError: org.
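
A NoSuchMethodError on an org.json4s class at runtime usually means the version the app was compiled against differs from the one already on Spark's classpath. A build.sbt sketch (the version numbers are assumptions; match them to the Spark release actually deployed):

    // mark Spark as "provided" and pin json4s to the version that Spark release
    // bundles, then build a fat jar (e.g. with the sbt-assembly plugin)
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"     % "1.5.2" % "provided",
      "org.json4s"       %% "json4s-jackson" % "3.2.10"  // version shipped with Spark 1.5.x
    )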

Access row column by field name

2015-12-16 Thread Daniel Valdivia
Hi, I'm processing the JSON I have in a text file using DataFrames, however right now I'm trying to figure out a way to access a certain value within the rows of my DataFrame if I only know the field name and not the respective field position in the schema. I noticed that row.schema and row.d
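
Row does support name-based access; a sketch assuming df is the DataFrame read from the JSON text file and "category" is a field in its schema:

    // fetch a value directly by field name, no positional index needed
    val categories = df.rdd.map(row => row.getAs[String]("category"))

    // row.fieldIndex("category") returns the position when an index is required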

Scala VS Java VS Python

2015-12-16 Thread Daniel Valdivia
Hello, This is more of a "survey" question for the community, you can reply to me directly so we don't flood the mailing list. I'm having a hard time learning Spark using Python since the API seems to be slightly incomplete, so I'm looking at my options to start doing all my apps in either Sca

Re: PairRDD(K, L) to multiple files by key serializing each value in L before

2015-12-16 Thread Daniel Valdivia
iterate through the values of this key,value pair > >for ele in line[1]: > > 4. Write every ele into the file created. > 5. Close the file. > > Do you think this works? > > Thanks > Abhishek S > > > Thank you! > > With Regards, > Abhis

PairRDD(K, L) to multiple files by key serializing each value in L before

2015-12-15 Thread Daniel Valdivia
Hello everyone, I have a PairRDD with a set of keys and a list of values, where each value in the list is a JSON document which I already loaded at the beginning of my Spark app. How can I iterate over each value of the list in my pair RDD to transform it to a string, then save the whole content of the key to a file? on
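
A sketch of one common approach, assuming the distinct key set is small enough to collect on the driver and a hypothetical toJsonString helper that renders one list element as a JSON string:

    // pairRdd: RDD[(String, Seq[SomeValue])] -- names are illustrative
    val keys = pairRdd.keys.distinct().collect()
    for (k <- keys) {
      pairRdd.filter { case (key, _) => key == k }
             .flatMap { case (_, values) => values }  // each element of the list
             .map(toJsonString)                       // hypothetical serializer
             .saveAsTextFile("output/" + k)           // one directory per key
    }

Each saveAsTextFile produces a directory of part files, and this scans the RDD once per key; for many keys, a custom MultipleTextOutputFormat is the usual way to write all keys in a single pass.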