am allowing in my cluster, so that if a new kernel starts it will get at
least one container for its master. It can be dynamic, priority based: if
there are no containers left, YARN can preempt some containers and provide
them to the new requests.
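
For reference, a minimal sketch of the Spark-side settings that play into this
(an illustration, not from the thread; it assumes Spark 1.5+ on YARN with the
external shuffle service running on the NodeManagers). With dynamic allocation,
idle executors are handed back to YARN so the scheduler has containers for new
application masters; YARN-side preemption itself is configured in the Fair or
Capacity scheduler.

val conf = new org.apache.spark.SparkConf()
  .setAppName("shared-cluster-app")
  .set("spark.dynamicAllocation.enabled", "true")      // release idle executors back to YARN
  .set("spark.shuffle.service.enabled", "true")        // required for dynamic allocation
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "20")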
--
Thanks & Regards
Sachin Aggarwal
7760502772
problem.
>
> Thanks in advance.
>
> Cheers,
> Pari
>
--
Thanks & Regards
Sachin Aggarwal
7760502772
Sorry, my mistake, I gave the wrong id.
Here is the correct one:
https://issues.apache.org/jira/browse/SPARK-15183
On Wed, May 18, 2016 at 11:19 AM, Todd wrote:
> Hi Sachin,
>
> Could you please give the url of jira-15146? Thanks!
>
>
>
>
>
> At 2016-05-18 13:33:47, "S
Hi, there is some code I have added in jira-15146, please have a look at it.
I have not finished it. You can use the same code in your example as of now.
On 18-May-2016 10:46 AM, "Saisai Shao" wrote:
> > .mode(SaveMode.Overwrite)
>
> From my understanding mode is not supported in continuous query.
>
>
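For context, a sketch of the streaming write path as it ended up in the Spark 2.0
API, where outputMode takes the place of the batch SaveMode (the format, paths and
the streamingDf name are illustrative, not from the thread):

val query = streamingDf.writeStream
  .outputMode("append")                               // instead of .mode(SaveMode.Overwrite)
  .format("parquet")
  .option("checkpointLocation", "/tmp/checkpoints")
  .option("path", "/tmp/out")
  .start()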
>> starts processing as and when the data appears. It no longer seems like
>> micro-batch processing. Will Spark Structured Streaming be event-based
>> processing?
>>
>> --
>> Regards,
>> Madhukara Phatak
>> http://datamantra.io/
>>
>
>
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>
--
Thanks & Regards
Sachin Aggarwal
7760502772
scala> val ds = Seq(1, 2, 3).toDS()
ds: org.apache.spark.sql.Dataset[Int] = [value: int]
scala> ds.map(_ + 1).collect() // Returns: Array(2, 3, 4)
res0: Array[Int] = Array(2, 3, 4)
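
(For anyone hitting the error quoted below: the usual cause of "value toDS is not
a member of Seq[Int]" is a missing implicits import. A minimal sketch for Spark
1.6, assuming an existing SparkContext named sc:)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._                         // brings toDS/toDF into scope

val ds = Seq(1, 2, 3).toDS()
ds.map(_ + 1).collect()                               // Array(2, 3, 4)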
On Wed, Apr 27, 2016 at 4:01 PM, shengshanzhang
wrote:
> 1.6.1
>
> 在 2016年4月27日,下午6:28,Sachin Aggarwal 写道:
>
> what is
y and how to fix this?
>
> scala> val ds = Seq(1, 2, 3).toDS()
> <console>:35: error: value toDS is not a member of Seq[Int]
>val ds = Seq(1, 2, 3).toDS()
>
>
>
> Thank you a lot!
>
--
Thanks & Regards
Sachin Aggarwal
7760502772
output to a single file. dstream.saveAsTextFiles() is creating
> files in different folders. Is there a way to write to a single folder ? or
> else if written to different folders, how do I merge them ?
> Thanks,
> Padma Ch
>
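A sketch of one common approach (illustrative, assuming a DStream[String] named
dstream and an output directory of your choosing): coalesce each batch RDD to a
single partition so every batch writes one part file; merging across batches
still needs a follow-up step such as hadoop fs -getmerge.

dstream.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    // one part file per batch; each batch still gets its own directory
    rdd.coalesce(1).saveAsTextFile(s"/tmp/streaming-out/batch-${time.milliseconds}")
  }
}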
--
Thanks & Regards
Sachin Aggarwal
7760502772
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 5 April 2016 at 09:06, Sachin Aggarwal
> wrote:
>
>> Hey,
>>
>> I have changed your example itself; try this, it should work in the terminal:
>>
>> val result = lin
Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
>>>>>>>>>> 3. Filter is expecting a Boolean result. So you can merge your
>>>>>>>>>> contains filters to one with AND (&&) statement.
>>>>>>>>>> 4. Correct. Each character in split() is used as a divider.
>>>>>>>>>>
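(A sketch of point 3 above, merging the two contains filters into one; it assumes
the RDD[String] is named f, as in the code further down:)

val matched = f.filter(line => line.contains("ASE 15") && line.contains("UPDATE INDEX STATISTICS"))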
>>>>>>>>>> Eliran Bivas
>>>>>>>>>>
>>>>>>>>>> *From:* Mich Talebzadeh
>>>>>>>>>> *Sent:* Apr 3, 2016 15:06
>>>>>>>>>> *To:* Eliran Bivas
>>>>>>>>>> *Cc:* user @spark
>>>>>>>>>> *Subject:* Re: multiple splits fails
>>>>>>>>>>
>>>>>>>>>> Hi Eliran,
>>>>>>>>>>
>>>>>>>>>> Many thanks for your input on this.
>>>>>>>>>>
>>>>>>>>>> I thought about what I was trying to achieve so I rewrote the
>>>>>>>>>> logic as follows:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>1. Read the text file in
>>>>>>>>>>2. Filter out empty lines (well not really needed here)
>>>>>>>>>>3. Search for lines that contain "ASE 15" and further have
>>>>>>>>>>sentence "UPDATE INDEX STATISTICS" in the said line
>>>>>>>>>>4. Split the text by "\t" and ","
>>>>>>>>>>5. Print the outcome
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This was what I did with your suggestions included
>>>>>>>>>>
>>>>>>>>>> val f = sc.textFile("/tmp/ASE15UpgradeGuide.txt")
>>>>>>>>>> f.cache()
>>>>>>>>>> f.filter(_.length > 0).filter(_ contains("ASE 15")).filter(_
>>>>>>>>>> contains("UPDATE INDEX STATISTICS")).flatMap(line =>
>>>>>>>>>> line.split("\t,")).map(word => (word, 1)).reduceByKey(_ +
>>>>>>>>>> _).collect.foreach(println)
>>>>>>>>>>
>>>>>>>>>>
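(One note on the split call above: String.split takes a regular expression, so
"\t," only matches a tab immediately followed by a comma. To split on tab OR
comma, a character class is needed; a small sketch:)

"a\tb,c".split("\t,")      // Array("a\tb,c") -- the pattern never matches, nothing is split
"a\tb,c".split("[\t,]")    // Array("a", "b", "c") -- tab or comma as the divider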
>>>>>>>>>> Couple of questions if I may
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>1. I take that "_" refers to content of the file read in by
>>>>>>>>>>default?
>>>>>>>>>>2. _.length > 0 basically filters out blank lines (not really
>>>>>>>>>>needed here)
>>>>>>>>>>3. Multiple filters are needed for each *contains* logic
>>>>>>>>>>4. split"\t," splits the filter by carriage return AND ,?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> LinkedIn *
>>>>>>>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3 April 2016 at 12:35, Eliran Bivas wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Mich,
>>>>>>>>>>>
>>>>>>>>>>> Few comments:
>>>>>>>>>>>
>>>>>>>>>>> When doing .filter(_ > “”) you’re actually doing a lexicographic
>>>>>>>>>>> comparison and not filtering for empty lines (which could be
>>>>>>>>>>> achieved with
>>>>>>>>>>> _.nonEmpty or _.length > 0).
>>>>>>>>>>> I think that filtering with _.contains should be sufficient and
>>>>>>>>>>> the first filter can be omitted.
>>>>>>>>>>>
>>>>>>>>>>> As for line => line.split(“\t”).split(“,”):
>>>>>>>>>>> You have to do a second map or (since split() requires a regex
>>>>>>>>>>> as input) .split(“\t,”).
>>>>>>>>>>> The problem is that your first split() call will generate an
>>>>>>>>>>> Array and then your second call will result in an error.
>>>>>>>>>>> e.g.
>>>>>>>>>>>
>>>>>>>>>>> val lines: Array[String] = line.split(“\t”)
>>>>>>>>>>> lines.split(“,”) // Compilation error - no method split() exists
>>>>>>>>>>> for Array
>>>>>>>>>>>
>>>>>>>>>>> So either go with map(_.split(“\t”)).map(_.split(“,”)) or
>>>>>>>>>>> map(_.split(“\t,”))
>>>>>>>>>>>
>>>>>>>>>>> Hope that helps.
>>>>>>>>>>>
>>>>>>>>>>> *Eliran Bivas*
>>>>>>>>>>> Data Team | iguaz.io
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3 Apr 2016, at 13:31, Mich Talebzadeh <
>>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am not sure this is the correct approach
>>>>>>>>>>>
>>>>>>>>>>> Read a text file in
>>>>>>>>>>>
>>>>>>>>>>> val f = sc.textFile("/tmp/ASE15UpgradeGuide.txt")
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Now I want to get rid of empty lines and filter only the lines
>>>>>>>>>>> that contain "ASE15"
>>>>>>>>>>>
>>>>>>>>>>> f.filter(_ > "").filter(_ contains("ASE15")).
>>>>>>>>>>>
>>>>>>>>>>> The above works but I am not sure whether I need two filter
>>>>>>>>>>> transformation above? Can it be done in one?
>>>>>>>>>>>
>>>>>>>>>>> Now I want to map the above filter to lines with carriage return
>>>>>>>>>>> and split them by ","
>>>>>>>>>>>
>>>>>>>>>>> f.filter(_ > "").filter(_ contains("ASE15")).map(line =>
>>>>>>>>>>> (line.split("\t")))
>>>>>>>>>>> res88: org.apache.spark.rdd.RDD[Array[String]] =
>>>>>>>>>>> MapPartitionsRDD[131] at map at <console>:30
>>>>>>>>>>>
>>>>>>>>>>> Now I want to split the output by ","
>>>>>>>>>>>
>>>>>>>>>>> scala> f.filter(_ > "").filter(_ contains("ASE15")).map(line =>
>>>>>>>>>>> (line.split("\t").split(",")))
>>>>>>>>>>> <console>:30: error: value split is not a member of Array[String]
>>>>>>>>>>> f.filter(_ > "").filter(_
>>>>>>>>>>> contains("ASE15")).map(line => (line.split("\t").split(",")))
>>>>>>>>>>>
>>>>>>>>>>> ^
>>>>>>>>>>> Any advice will be appreciated
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> LinkedIn *
>>>>>>>>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
Thanks & Regards
Sachin Aggarwal
7760502772
> > val conf = new SparkConf().setMaster(master).setAppName("StreamingLogEnhanced")
> >
> > // Create a StreamingContext with an n second batch size
> > val ssc = new StreamingContext(conf, Seconds(10))
> >
> > // Create a DStream from all the input on port
> > val log = Logger.getLogger(getClass.getName)
> >
> > sys.ShutdownHookThread {
> >   log.info("Gracefully stopping Spark Streaming Application")
> >   ssc.stop(true, true)
> >   log.info("Application stopped")
> > }
> >
> > val lines = ssc.socketTextStream("localhost", )
> >
> > // Create a count of log hits by ip
> > var ipCounts = countByIp(lines)
> > ipCounts.print()
> >
> > // start our streaming context and wait for it to "finish"
> > ssc.start()
> >
> > // Wait for 600 seconds then exit
> > ssc.awaitTermination(1*600)
> > ssc.stop()
> > }
> >
> > def countByIp(lines: DStream[String]) = {
> >   val parser = new AccessLogParser
> >   val accessLogDStream = lines.map(line => parser.parseRecord(line))
> >   val ipDStream = accessLogDStream.map(entry => (entry.get.clientIpAddress, 1))
> >   ipDStream.reduceByKey((x, y) => x + y)
> > }
> >
> > }
> >
> > Thanks for any suggestions in advance.
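(One note on the shutdown-hook part of the code above: since Spark 1.4 there is a
built-in setting that performs the graceful stop for you; a sketch, assuming
Spark 1.4+:)

val conf = new SparkConf()
  .setMaster(master)
  .setAppName("StreamingLogEnhanced")
  .set("spark.streaming.stopGracefullyOnShutdown", "true")   // replaces the manual sys.ShutdownHookThread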
> >
>
>
--
Thanks & Regards
Sachin Aggarwal
7760502772
logDebug("# old RDDs = " + oldRDDs.size)
// Get the RDDs of the reduced values in "new time steps"
val newRDDs =
reducedStream.slice(previousWindow.endTime +
parent.slideDuration, currentWindow.endTime)
logDebug("# new RDDs = " + newRDDs.size)
--
Thanks & Regards
Sachin Aggarwal
7760502772
I am sorry for the spam, I replied in the wrong thread, sleepy head :-(
On Fri, Feb 5, 2016 at 1:15 AM, Sachin Aggarwal
wrote:
>
> http://coenraets.org/blog/2011/11/set-up-an-amazon-ec2-instance-with-tomcat-and-mysql-5-minutes-tutorial/
>
> The default Tomcat server uses port 8080. You need
In the AWS Management Console, select
Security Groups (left navigation bar), select the quick-start group, open the
Inbound tab, and add port 8080. Make sure you click “Add Rule” and then
“Apply Rule Changes”.
On Fri, Feb 5, 2016 at 1:14 AM, Sachin Aggarwal
wrote:
> I think we need to add port
>
>>>>> }
>>>>>
>>>>> def onStop() {
>>>>> // There is nothing much to do as the thread calling receive()
>>>>> // is designed to stop by itself if isStopped() returns false
>>>>> }
>>>>>
>>>>> /** Create a socket connection and receive data until receiver is
>>>>> stopped
>>>>> */
>>>>> private def receive() {
>>>>> while(!isStopped()) {
>>>>> store("I am a dummy source " + Random.nextInt(10))
>>>>> Thread.sleep((1000.toDouble / ratePerSec).toInt)
>>>>> }
>>>>> }
>>>>> }
>>>>> {code}
>>>>>
>>>>> The given issue resides in the following
>>>>> *MapWithStateRDDRecord.updateRecordWithData*, starting line 55, in the
>>>>> following code block:
>>>>>
>>>>> {code}
>>>>> dataIterator.foreach { case (key, value) =>
>>>>> wrappedState.wrap(newStateMap.get(key))
>>>>> val returned = mappingFunction(batchTime, key, Some(value),
>>>>> wrappedState)
>>>>> if (wrappedState.isRemoved) {
>>>>> newStateMap.remove(key)
>>>>> } else if (wrappedState.isUpdated ||
>>>>> timeoutThresholdTime.isDefined)
>>>>> /* <--- problem is here */ {
>>>>> newStateMap.put(key, wrappedState.get(),
>>>>> batchTime.milliseconds)
>>>>> }
>>>>> mappedData ++= returned
>>>>> }
>>>>> {code}
>>>>>
>>>>> In case the stream has a timeout set, but the state wasn't set at all,
>>>>> the
>>>>> "else-if" will still follow through because the timeout is defined but
>>>>> "wrappedState" is empty and wasn't set.
>>>>>
>>>>> If it is mandatory to update state for each entry of *mapWithState*,
>>>>> then
>>>>> this code should throw a better exception than
>>>>> "NoSuchElementException",
>>>>> which doesn't really say anything to the developer.
>>>>>
>>>>> I haven't provided a fix myself because I'm not familiar with the Spark
>>>>> implementation, but it seems there needs to either be an extra check
>>>>> whether the state is set, or, as previously stated, a better exception
>>>>> message.
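
(A possible workaround sketch until this is fixed, not taken from the report:
make the mapping function always write the state when a timeout is configured,
so wrappedState.get() is defined. The key/value/state types here are illustrative.)

import org.apache.spark.streaming.{Seconds, State, StateSpec, Time}

val spec = StateSpec.function(
  (batchTime: Time, key: String, value: Option[Int], state: State[Int]) => {
    val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
    if (!state.isTimingOut()) {
      state.update(sum)        // always set the state so the put() in updateRecordWithData succeeds
    }
    Some((key, sum))
  }).timeout(Seconds(60))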
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Yuval Itzchakov.
>>>
>>
>>
--
Thanks & Regards
Sachin Aggarwal
7760502772
print at StreamingWordCount.scala:54   2016/01/28 02:51:00   47 ms   1/1 (1 skipped)   4/4 (3 skipped)
219   Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54   2016/01/28 02:51:00   48 ms   2/2   4/4
--
Thanks & Regards
Sachin Aggarwal
7760502772
Hi,
Has anyone faced this error? Is there any workaround for this issue?
Thanks
On Thu, Dec 3, 2015 at 4:28 PM, Sahil Sareen wrote:
> Attaching the JIRA as well for completeness:
> https://issues.apache.org/jira/browse/SPARK-12117
>
> On Thu, Dec 3, 2015 at 4:13 PM, Sac
Hi All,
Need some help, guys; I need a workaround for this situation.
*case where this works:*
val TestDoc1 = sqlContext.createDataFrame(Seq(("sachin aggarwal", "1"),
("Rishabh", "2"))).toDF("myText", "id")
TestDoc1.select(callUDF("