notebook connecting Spark On Yarn

2017-02-15 Thread Sachin Aggarwal
am allowing in my cluster, so that if a new kernel starts it will have at least one container for its master. It can also be dynamic, priority based: if there are no containers left, YARN can preempt some containers and provide them to the new request. -- Thanks & Regards Sachin Aggarwal 7760502772
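For reference, a minimal sketch of the kind of per-kernel configuration this describes, assuming Spark on YARN with dynamic allocation and the external shuffle service enabled; the app name and executor cap are illustrative, not values from the thread:

    import org.apache.spark.SparkConf

    // Keep each notebook kernel's footprint small so YARN can always grant the
    // application master container, and let executors scale up and down on demand.
    val conf = new SparkConf()
      .setMaster("yarn")
      .setAppName("notebook-kernel")                        // hypothetical app name
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "0")
      .set("spark.dynamicAllocation.maxExecutors", "10")    // illustrative cap
      .set("spark.shuffle.service.enabled", "true")         // needed for dynamic allocation on YARN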

Re: Spark Streaming with Redis

2016-05-24 Thread Sachin Aggarwal
problem. > > Thanks in advance. > > Cheers, > Pari > -- Thanks & Regards Sachin Aggarwal 7760502772
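The snippet above is cut off, but for context: a common way to push DStream output into Redis is to open one connection per partition inside foreachRDD. The sketch below assumes the Jedis client and a hypothetical host, port, and key layout; it is not the code from the original thread.

    import org.apache.spark.streaming.dstream.DStream
    import redis.clients.jedis.Jedis

    // Write (word, count) pairs into a Redis hash, one connection per partition
    // so the non-serializable client is never shipped from the driver.
    def saveToRedis(counts: DStream[(String, Long)]): Unit = {
      counts.foreachRDD { rdd =>
        rdd.foreachPartition { partition =>
          val jedis = new Jedis("localhost", 6379)         // hypothetical Redis endpoint
          partition.foreach { case (word, count) =>
            jedis.hset("wordcounts", word, count.toString) // hypothetical key layout
          }
          jedis.close()
        }
      }
    }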

Re: Re: Re: How to change output mode to Update

2016-05-17 Thread Sachin Aggarwal
Sorry, my mistake, I gave the wrong id. Here is the correct one: https://issues.apache.org/jira/browse/SPARK-15183 On Wed, May 18, 2016 at 11:19 AM, Todd wrote: > Hi Sachin, > Could you please give the URL of jira-15146? Thanks! > At 2016-05-18 13:33:47, "S

Re: Re: How to change output mode to Update

2016-05-17 Thread Sachin Aggarwal
Hi, there is some code I have added in jira-15146, please have a look at it; I have not finished it. You can use the same code in your example for now. On 18-May-2016 10:46 AM, "Saisai Shao" wrote: > .mode(SaveMode.Overwrite) > From my understanding, mode is not supported in continuous query.
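For readers landing on this thread later: in the structured streaming API that shipped in later Spark 2.x releases, the update semantics asked about here are selected through DataStreamWriter.outputMode rather than SaveMode. A minimal sketch with a hypothetical socket source and console sink:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("update-mode-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical source: lines of text arriving on a local socket.
    val words = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]

    val counts = words.groupBy("value").count()

    // Update mode emits only the rows whose aggregate changed in the last trigger.
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .start()

    query.awaitTermination()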

Re: Spark structured streaming is Micro batch?

2016-05-06 Thread Sachin Aggarwal
rts processing as and when the data appears. It no longer seems like >> micro batch processing. Will Spark structured streaming be event-based >> processing? >> -- >> Regards, >> Madhukara Phatak >> http://datamantra.io/ > -- > Thanks > Deepak > www.bigdatabig.com > www.keosha.net -- Thanks & Regards Sachin Aggarwal 7760502772
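As background for the question: in the structured streaming API that was eventually released, execution is micro-batch by default and the cadence is controlled by a trigger; Spark 2.3 later added an experimental continuous trigger. A small sketch using the built-in rate source (the API names are standard, the intervals are illustrative):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.Trigger

    val spark = SparkSession.builder().appName("trigger-sketch").getOrCreate()

    // The built-in "rate" source just generates rows; enough to demonstrate triggers.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    // Micro-batch execution (the default), firing a batch every 10 seconds.
    val query = events.writeStream
      .format("console")
      .trigger(Trigger.ProcessingTime("10 seconds"))
      .start()

    // Spark 2.3+ also offers an experimental low-latency mode:
    //   .trigger(Trigger.Continuous("1 second"))

    query.awaitTermination()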

Re: error: value toDS is not a member of Seq[Int] SQL

2016-04-27 Thread Sachin Aggarwal
2, 3).toDS() ds: org.apache.spark.sql.Dataset[Int] = [value: int] scala> ds.map(_ + 1).collect() // Returns: Array(2, 3, 4) res0: Array[Int] = Array(2, 3, 4) On Wed, Apr 27, 2016 at 4:01 PM, shengshanzhang wrote: > 1.6.1 > > On 27 Apr 2016, at 18:28, Sachin Aggarwal wrote: > > what is

Re: error: value toDS is not a member of Seq[Int] SQL

2016-04-27 Thread Sachin Aggarwal
y and how to fix this? > > scala> val ds = Seq(1, 2, 3).toDS() > :35: error: value toDS is not a member of Seq[Int] >val ds = Seq(1, 2, 3).toDS() > > > > Thank you a lot! > -- Thanks & Regards Sachin Aggarwal 7760502772
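For anyone hitting the same error: toDS() is added to Seq by the SQLContext implicits, so it only resolves after those are imported. A sketch of the fix in Spark 1.6 terms (matching the version in the thread), assuming a plain program or shell where the implicits were not already imported:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("toDS-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._    // brings toDS()/toDF() into scope for Seq

    val ds = Seq(1, 2, 3).toDS()
    ds.map(_ + 1).collect()          // Array(2, 3, 4)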

Re: Need Streaming output to single HDFS File

2016-04-12 Thread Sachin Aggarwal
output to a single file. dstream.saveAsTextFiles() is creating > files in different folders. Is there a way to write to a single folder? Or, > if written to different folders, how do I merge them? > Thanks, > Padma Ch -- Thanks & Regards Sachin Aggarwal 7760502772
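The question is truncated above, but the usual answers are either to coalesce each batch down to one part file, or to merge the per-batch folders afterwards (e.g. with Hadoop's FileUtil.copyMerge). A sketch of the first option; the output path is hypothetical:

    import org.apache.spark.streaming.dstream.DStream

    // Each batch still gets its own folder (saveAsTextFile refuses to append),
    // but coalesce(1) ensures that folder contains a single part file.
    def saveOneFilePerBatch(lines: DStream[String]): Unit = {
      lines.foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty()) {
          rdd.coalesce(1).saveAsTextFile(s"/tmp/stream-out/batch-${time.milliseconds}")
        }
      }
    }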

Re: multiple splits fails

2016-04-05 Thread Sachin Aggarwal
http://talebzadehmich.wordpress.com On 5 April 2016 at 09:06, Sachin Aggarwal wrote: > Hey, I have changed your example itself; try this, it should work in the terminal: > val result = lin

Re: multiple splits fails

2016-04-05 Thread Sachin Aggarwal

Re: multiple splits fails

2016-04-04 Thread Sachin Aggarwal
3. Filter is expecting a Boolean result, so you can merge your contains filters into one with an AND (&&) statement. 4. Correct. Each character in split() is used as a divider.

Eliran Bivas

*From:* Mich Talebzadeh *Sent:* Apr 3, 2016 15:06 *To:* Eliran Bivas *Cc:* user @spark *Subject:* Re: multiple splits fails

> Hi Eliran,
>
> Many thanks for your input on this. I thought about what I was trying to achieve so I rewrote the logic as follows:
>
> 1. Read the text file in
> 2. Filter out empty lines (well, not really needed here)
> 3. Search for lines that contain "ASE 15" and further have the sentence "UPDATE INDEX STATISTICS" in the said line
> 4. Split the text by "\t" and ","
> 5. Print the outcome
>
> This was what I did with your suggestions included:
>
> val f = sc.textFile("/tmp/ASE15UpgradeGuide.txt")
> f.cache()
> f.filter(_.length > 0).filter(_ contains("ASE 15")).filter(_ contains("UPDATE INDEX STATISTICS")).flatMap(line => line.split("\t,")).map(word => (word, 1)).reduceByKey(_ + _).collect.foreach(println)
>
> A couple of questions if I may:
>
> 1. I take it that "_" refers to the content of the file read in by default?
> 2. _.length > 0 basically filters out blank lines (not really needed here)?
> 3. Are multiple filters needed for each contains condition?
> 4. Does split("\t,") split the text by carriage return AND ","?
>
> Regards,
> Dr Mich Talebzadeh
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
> On 3 April 2016 at 12:35, Eliran Bivas wrote:
>
>> Hi Mich,
>>
>> A few comments:
>>
>> When doing .filter(_ > "") you are actually doing a lexicographic comparison and not filtering for empty lines (which could be achieved with _.nonEmpty or _.length > 0). I think that filtering with _.contains should be sufficient and the first filter can be omitted.
>>
>> As for line => line.split("\t").split(","): you have to do a second map or (since split() requires a regex as input) .split("\t,"). The problem is that your first split() call will generate an Array and then your second call will result in an error, e.g.
>>
>> val lines: Array[String] = line.split("\t")
>> lines.split(",") // Compilation error - no method split() exists for Array
>>
>> So either go with map(_.split("\t")).map(_.split(",")) or map(_.split("\t,")).
>>
>> Hope that helps.
>>
>> Eliran Bivas
>> Data Team | iguaz.io
>>
>> On 3 Apr 2016, at 13:31, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am not sure this is the correct approach.
>>>
>>> Read a text file in:
>>>
>>> val f = sc.textFile("/tmp/ASE15UpgradeGuide.txt")
>>>
>>> Now I want to get rid of empty lines and filter only the lines that contain "ASE15":
>>>
>>> f.filter(_ > "").filter(_ contains("ASE15"))
>>>
>>> The above works but I am not sure whether I need the two filter transformations above. Can it be done in one?
>>>
>>> Now I want to map the above filter to lines with carriage return and split them by ",":
>>>
>>> f.filter(_ > "").filter(_ contains("ASE15")).map(line => (line.split("\t")))
>>> res88: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[131] at map at <console>:30
>>>
>>> Now I want to split the output by ",":
>>>
>>> scala> f.filter(_ > "").filter(_ contains("ASE15")).map(line => (line.split("\t").split(",")))
>>> <console>:30: error: value split is not a member of Array[String]
>>>
>>> Any advice will be appreciated.
>>>
>>> Thanks,
>>> Dr Mich Talebzadeh
>>> http://talebzadehmich.wordpress.com

-- Thanks & Regards Sachin Aggarwal 7760502772
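Pulling the thread together, here is a compact sketch of the final pipeline, assuming the spark-shell sc as in the thread and the same file path. One correction worth flagging: String.split takes a regex, so "\t," only matches a literal tab followed by a comma; to split on tab OR comma, use the character class "[\t,]".

    val f = sc.textFile("/tmp/ASE15UpgradeGuide.txt")
    f.cache()

    // A single filter with && replaces the chain of filters; flatMap splits each
    // matching line into words on tab or comma.
    val counts = f
      .filter(line => line.contains("ASE 15") && line.contains("UPDATE INDEX STATISTICS"))
      .flatMap(_.split("[\t,]"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)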

Re: Spark Streaming - graceful shutdown when stream has no more data

2016-03-01 Thread Sachin Aggarwal
> conf = new SparkConf().setMaster(master).setAppName("StreamingLogEnhanced")
> // Create a StreamingContext with an n second batch size
> val ssc = new StreamingContext(conf, Seconds(10))
> // Create a DStream from all the input on port
> val log = Logger.getLogger(getClass.getName)
>
> sys.ShutdownHookThread {
>   log.info("Gracefully stopping Spark Streaming Application")
>   ssc.stop(true, true)
>   log.info("Application stopped")
> }
> val lines = ssc.socketTextStream("localhost", )
> // Create a count of log hits by ip
> var ipCounts = countByIp(lines)
> ipCounts.print()
>
> // start our streaming context and wait for it to "finish"
> ssc.start()
> // Wait for 600 seconds then exit
> ssc.awaitTermination(1*600)
> ssc.stop()
> }
>
> def countByIp(lines: DStream[String]) = {
>   val parser = new AccessLogParser
>   val accessLogDStream = lines.map(line => parser.parseRecord(line))
>   val ipDStream = accessLogDStream.map(entry => (entry.get.clientIpAddress, 1))
>   ipDStream.reduceByKey((x, y) => x + y)
> }
> }
>
> Thanks for any suggestions in advance.

-- Thanks & Regards Sachin Aggarwal 7760502772
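An alternative to registering a shutdown hook by hand, sketched below: Spark Streaming exposes the spark.streaming.stopGracefullyOnShutdown property, which makes Spark's own shutdown hook stop the context gracefully on SIGTERM. The host, port, and batch interval here are placeholders.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("graceful-shutdown-sketch")
      .set("spark.streaming.stopGracefullyOnShutdown", "true")  // finish in-flight batches on shutdown

    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)          // placeholder source
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()   // no manual sys.ShutdownHookThread needed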

Explanation for parent.slideDuration in ReducedWindowedDStream

2016-02-18 Thread Sachin Aggarwal
logDebug("# old RDDs = " + oldRDDs.size)
// Get the RDDs of the reduced values in "new time steps"
val newRDDs = reducedStream.slice(previousWindow.endTime + parent.slideDuration, currentWindow.endTime)
logDebug("# new RDDs = " + newRDDs.size)
-- Thanks & Regards Sachin Aggarwal 7760502772
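For context on what ReducedWindowedDStream is doing there: oldRDDs and newRDDs are the batches leaving and entering the window, offset by parent.slideDuration because DStream.slice includes both endpoints; they back the incremental (inverse-function) form of reduceByKeyAndWindow. A small usage sketch with illustrative durations and a placeholder socket source:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("windowed-counts-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))   // parent.slideDuration is the 2 s batch interval
    ssc.checkpoint("/tmp/checkpoint")                  // required for the inverse-reduce form

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
    val counts = words.map((_, 1)).reduceByKeyAndWindow(
      (a: Int, b: Int) => a + b,   // fold in counts from new batches entering the window
      (a: Int, b: Int) => a - b,   // subtract counts from old batches leaving the window
      Seconds(10),                 // window duration
      Seconds(2))                  // slide duration
    counts.print()

    ssc.start()
    ssc.awaitTermination()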

Re: PairDStreamFunctions.mapWithState fails in case timeout is set without updating State[S]

2016-02-04 Thread Sachin Aggarwal
I am sorry for spam, I replied in wrong thread sleepy head :-( On Fri, Feb 5, 2016 at 1:15 AM, Sachin Aggarwal wrote: > > http://coenraets.org/blog/2011/11/set-up-an-amazon-ec2-instance-with-tomcat-and-mysql-5-minutes-tutorial/ > > The default Tomcat server uses port 8080. You need

Re: PairDStreamFunctions.mapWithState fails in case timeout is set without updating State[S]

2016-02-04 Thread Sachin Aggarwal
AWS Management Console, select Security Groups (left navigation bar), select the quick-start group, the Inbound tab and add port 8080. Make sure you click “Add Rule” and then “Apply Rule Changes”. On Fri, Feb 5, 2016 at 1:14 AM, Sachin Aggarwal wrote: > i think we need to add port >

Re: PairDStreamFunctions.mapWithState fails in case timeout is set without updating State[S]

2016-02-04 Thread Sachin Aggarwal
> }
>
> def onStop() {
>   // There is nothing much to do as the thread calling receive()
>   // is designed to stop by itself once isStopped() returns false
> }
>
> /** Create a socket connection and receive data until receiver is stopped */
> private def receive() {
>   while(!isStopped()) {
>     store("I am a dummy source " + Random.nextInt(10))
>     Thread.sleep((1000.toDouble / ratePerSec).toInt)
>   }
> }
> }
> {code}
>
> The given issue resides in *MapWithStateRDDRecord.updateRecordWithData*, starting at line 55, in the following code block:
>
> {code}
> dataIterator.foreach { case (key, value) =>
>   wrappedState.wrap(newStateMap.get(key))
>   val returned = mappingFunction(batchTime, key, Some(value), wrappedState)
>   if (wrappedState.isRemoved) {
>     newStateMap.remove(key)
>   } else if (wrappedState.isUpdated || timeoutThresholdTime.isDefined) /* <--- problem is here */ {
>     newStateMap.put(key, wrappedState.get(), batchTime.milliseconds)
>   }
>   mappedData ++= returned
> }
> {code}
>
> In case the stream has a timeout set, but the state wasn't set at all, the "else-if" will still follow through because the timeout is defined but "wrappedState" is empty and wasn't set.
>
> If it is mandatory to update state for each entry of *mapWithState*, then this code should throw a better exception than "NoSuchElementException", which doesn't really say anything to the developer.
>
> I haven't provided a fix myself because I'm not familiar with the Spark implementation, but it seems there needs to either be an extra check on whether the state is set, or, as previously stated, a better exception message.
>
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/PairDStreamFunctions-mapWithState-fails-in-case-timeout-is-set-without-updating-State-S-tp26147.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> Best Regards,
> Yuval Itzchakov.

-- Thanks & Regards Sachin Aggarwal 7760502772
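A sketch of a mapping function that avoids the NoSuchElementException described above: when a timeout is configured, guard every read of State and always call update() on the non-timing-out path, so the wrapped state is never read while empty. The function name, key/value types, and the 30 second timeout are illustrative:

    import org.apache.spark.streaming.{Seconds, State, StateSpec, Time}

    // Running count per key; safe to use together with StateSpec.timeout().
    def trackCount(batchTime: Time,
                   key: String,
                   value: Option[Int],
                   state: State[Long]): Option[(String, Long)] = {
      if (state.isTimingOut()) {
        // The key is being evicted by the timeout; the state may not be updated here.
        Some((key, state.getOption().getOrElse(0L)))
      } else {
        val newCount = state.getOption().getOrElse(0L) + value.getOrElse(0)
        state.update(newCount)   // always update, so the internal wrappedState.get() is never empty
        Some((key, newCount))
      }
    }

    val spec = StateSpec.function(trackCount _).timeout(Seconds(30))
    // usage: pairDStream.mapWithState(spec), where pairDStream: DStream[(String, Int)]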

Explanation for info shown in UI

2016-01-28 Thread Sachin Aggarwal
StreamingWordCount.scala:54 | submitted 2016/01/28 02:51:00 | duration 47 ms | stages 1/1 (1 skipped) | tasks 4/4 (3 skipped)
219 | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | submitted 2016/01/28 02:51:00 | duration 48 ms | stages 2/2 | tasks 4/4
-- Thanks & Regards Sachin Aggarwal 7760502772

Re: Column Aliases are Ignored in callUDF while using struct()

2015-12-03 Thread Sachin Aggarwal
Hi, has anyone faced this error? Is there any workaround to this issue? Thanks. On Thu, Dec 3, 2015 at 4:28 PM, Sahil Sareen wrote: > Attaching the JIRA as well for completeness: > https://issues.apache.org/jira/browse/SPARK-12117 > On Thu, Dec 3, 2015 at 4:13 PM, Sac

Column Aliases are Ignored in callUDF while using struct()

2015-12-03 Thread Sachin Aggarwal
Hi All, need help guys, I need a workaround for this situation. *Case where this works:* val TestDoc1 = sqlContext.createDataFrame(Seq(("sachin aggarwal", "1"), ("Rishabh", "2"))).toDF("myText", "id") TestDoc1.select(callUDF("
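Until SPARK-12117 is addressed, one way to sidestep the lost aliases, sketched below, is to not depend on the struct's field names at all and read the fields by position inside the UDF. This assumes a spark-shell style session where sqlContext and its implicits are available; the UDF body and output column name are hypothetical:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.{struct, udf}
    import sqlContext.implicits._

    val testDoc = sqlContext.createDataFrame(Seq(
      ("sachin aggarwal", "1"),
      ("Rishabh", "2"))).toDF("myText", "id")

    // Access struct fields by position, so it does not matter that the aliases
    // given inside struct() are ignored.
    val combine = udf((r: Row) => s"${r.getString(1)}:${r.getString(0)}")

    testDoc.select(combine(struct($"myText", $"id")).as("combined")).show()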