Phrase Search using Apache Spark on a huge amount of text in files

2019-05-28 Thread Sandeep Giri
on this. This is at a very early stage, somewhat hacky, and would probably require more testing. Regards, Sandeep Giri, www.CloudxLab.com

Does Spark show the logical or physical plan when executing a job on the YARN cluster

2018-05-20 Thread giri ar
Hi, good day. Could you please let me know whether we can see the Spark logical or physical plan while running a Spark job on the YARN cluster (e.g., the number of stages)? Thanks in advance. Thanks, Giri
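
A minimal sketch of one way to surface the plans, assuming Spark 2.x and a hypothetical registered table named events (neither is from the thread):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("explain-sketch").getOrCreate()
    val df = spark.sql("SELECT count(*) FROM events")

    // explain(true) prints the parsed, analyzed, and optimized logical plans
    // plus the physical plan to the driver log; on YARN, retrieve it with
    // `yarn logs -applicationId <appId>`. Stage counts are visible in the
    // Spark UI (or the History Server after the job finishes).
    df.explain(true)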

Re: Not able to pass 3rd-party jars to Mesos executors

2016-05-11 Thread Giri P
passed to spark-submit should be > URIs reachable by Mesos slaves, as the Spark driver doesn’t automatically > upload local jars. http://spark.apache.org/docs/latest/running-on-mesos.html > On Wed, May 11, 2016 at 10:

Re: Not able to pass 3rd-party jars to Mesos executors

2016-05-11 Thread Giri P
I'm not using Docker. On Wed, May 11, 2016 at 8:47 AM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote: > By any chance, are you using Docker to execute? > On 11 May 2016 21:16, "Raghavendra Pandey" wrote: >> On 11 May 2016 02:13, "gpatcham" wrote: >>> Hi All,

Re: using Spark context in map function: Task not serializable error

2016-01-20 Thread Giri P
method1 looks like this: reRDD.map(row => method1(sc, row)).saveAsTextFile(outputDir), where reRDD holds userIds. def method1(sc: SparkContext, userId: String) = { sc.cassandraTable("Keyspace", "Table2").where("userid = ?", userId) ...do something return "Test" } On Wed, Jan 20, 2016 at 11:00 AM, Shixiong(Rya
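
For reference, a sketch of the usual fix, assuming the Spark Cassandra Connector and that Table2 is partition-keyed by userid (names are adapted from the thread, not verified against it). SparkContext lives on the driver and is not serializable, so it cannot be captured inside map(); the connector's joinWithCassandraTable performs the per-key lookup on the executors instead:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    val sc = new SparkContext(new SparkConf().setAppName("cassandra-join-sketch"))
    val reRDD = sc.parallelize(Seq("user1", "user2")) // stand-in for the real userId RDD

    // Key each element, then join it against the Cassandra table on the
    // executors; no SparkContext is referenced inside any closure.
    val joined = reRDD
      .map(Tuple1(_))
      .joinWithCassandraTable("keyspace", "table2")
    joined.saveAsTextFile("outputDir")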

Re: using Spark context in map function: Task not serializable error

2016-01-18 Thread Giri P
> Doesn't seem to be good practice. > On Mon, Jan 18, 2016 at 1:27 PM, Giri P wrote: >> Can we use @transient ? >> On Mon, Jan 18, 2016 at 12:44 PM, Giri P wrote: >>> I'm using spark cassandra connector to do this and the wa

Re: using Spark context in map function: Task not serializable error

2016-01-18 Thread Giri P
Can we use @transient ? On Mon, Jan 18, 2016 at 12:44 PM, Giri P wrote: > I'm using spark cassandra connector to do this and the way we access > cassandra table is > > sc.cassandraTable("keySpace", "tableName") > > Thanks > Giri > > On Mon, J
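
For what it's worth, a sketch (illustrative, not from the thread) of what @transient buys here: it keeps a field out of the serialized closure, but that field is then null after deserialization on the executor, so it does not make SparkContext usable inside map():

    import org.apache.spark.SparkContext
    import com.datastax.spark.connector._

    class Lookup(@transient private val sc: SparkContext) extends Serializable {
      // Fine on the driver: sc is simply never shipped with the object.
      def driverSideCount(): Long =
        sc.cassandraTable("keyspace", "table2").count()
      // Calling anything on sc from inside rdd.map { ... } would throw a
      // NullPointerException, because the @transient field is null on executors.
    }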

Re: using Spark context in map function: Task not serializable error

2016-01-18 Thread Giri P
I'm using the Spark Cassandra Connector to do this, and the way we access the Cassandra table is sc.cassandraTable("keySpace", "tableName"). Thanks, Giri. On Mon, Jan 18, 2016 at 12:37 PM, Ted Yu wrote: > Can you pass the properties which are needed for accessing Cassa

Re: Maintaining overall cumulative data in Spark Streaming

2015-10-30 Thread Sandeep Giri
How do we reset the aggregated statistics to null? Regards, Sandeep Giri, +1 347 781 4573 (US), +91-953-899-8962 (IN), www.KnowBigData.com, Phone: +1-253-397-1945 (Office)

RE: Maintaining overall cumulative data in Spark Streaming

2015-10-29 Thread Sandeep Giri
Yes, updateStateByKey worked, though there are some more complications. On Oct 30, 2015 8:27 AM, "skaarthik oss" wrote: > Did you consider the UpdateStateByKey operation? > From: Sandeep Giri [mailto:sand...@knowbigdata.com] > Sent: Thursday, October 29, 201
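
A minimal sketch of that approach (the stream source, batch interval, and checkpoint path are placeholders). Returning None from the update function drops a key's state entirely, which is one answer to the "reset to null" question above:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("cumulative-sketch"), Seconds(10))
    ssc.checkpoint("/tmp/checkpoint") // stateful operations require a checkpoint dir

    val counts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map(word => (word, 1L))

    // Fold each batch's counts into the running total that Spark keeps
    // between batches; return None instead to reset (drop) a key's state.
    val cumulative = counts.updateStateByKey[Long] { (batch: Seq[Long], state: Option[Long]) =>
      Some(state.getOrElse(0L) + batch.sum)
    }
    cumulative.print()

    ssc.start()
    ssc.awaitTermination()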

Maintaining overall cumulative data in Spark Streaming

2015-10-29 Thread Sandeep Giri
StreamRDD with an aggregated count and keep doing a fullOuterJoin, but it didn't work. It seems like the StreamRDD gets reset. Kindly help. Regards, Sandeep Giri

Re: SPARK SQL Error

2015-10-15 Thread Giri
Main$1(SparkSubmit.scala:166) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Thanks & Regards, Giri.

Re: MongoDB and Spark

2015-09-11 Thread Sandeep Giri
I think it should be possible by loading the collections as RDDs and then doing a union on them. Regards, Sandeep Giri, +1 347 781 4573 (US), +91-953-899-8962 (IN), www.KnowBigData.com, Phone: +1-253-397-1945 (Office)
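
A sketch of that idea, assuming the mongo-hadoop connector, an existing SparkContext sc, and placeholder URIs. The thread asks for a single Configuration object; this sketch uses one per collection, since the connector reads the source from its mongo.input.uri setting:

    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import org.bson.BSONObject
    import com.mongodb.hadoop.MongoInputFormat

    def loadCollection(sc: SparkContext, uri: String): RDD[(Object, BSONObject)] = {
      val cfg = new Configuration()
      cfg.set("mongo.input.uri", uri) // e.g. mongodb://host:27017/db.collection
      sc.newAPIHadoopRDD(cfg, classOf[MongoInputFormat], classOf[Object], classOf[BSONObject])
    }

    // Load each collection as its own RDD, then union them into one.
    val a = loadCollection(sc, "mongodb://localhost:27017/mydb.collectionA")
    val b = loadCollection(sc, "mongodb://localhost:27017/mydb.collectionB")
    val both = a.union(b)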

Re: MongoDB and Spark

2015-09-11 Thread Sandeep Giri
Use map-reduce. On Fri, Sep 11, 2015, 14:32 Mishra, Abhishek wrote: > Hello, > Is there any way to query multiple collections from MongoDB using Spark > and Java? And I want to create only one Configuration object. Please help > if anyone has something regarding this. > Thank Y

Re: query avro hive table in spark sql

2015-08-28 Thread Giri P
Any idea what's causing this error? 15/08/28 21:03:03 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 9.0 (TID 20, dtord01hdw0228p.dc.dotomi.net): java.lang.RuntimeException: cannot find field message_campaign_id from [0:error_error_error_error_error_error_error, 1:cannot_determine_schema, 2:c

Re: query avro hive table in spark sql

2015-08-27 Thread Giri P
> From: gpatc...@gmail.com > To: java8...@hotmail.com > CC: mich...@databricks.com; user@spark.apache.org > > Can we run Hive queries using spark-avro? > In our case it's not just reading the Avro file; we have a view in Hive which > is based on multiple tables. > On Thu, A

Re: query avro hive table in spark sql

2015-08-27 Thread Giri P
Can we run Hive queries using spark-avro? In our case it's not just reading the Avro file; we have a view in Hive which is based on multiple tables. On Thu, Aug 27, 2015 at 9:41 AM, Giri P wrote: > We are using Hive 1.1. > I was able to fix the below error when I used the right version spark
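
A sketch of the HiveContext route for the Spark 1.x era this thread is from (the view name is a placeholder). Going through HiveContext queries the view via the Hive metastore, so the Avro SerDe resolution stays on Hive's side rather than spark-avro's:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("hive-view-sketch"))
    val hiveContext = new HiveContext(sc)

    // Queries the Hive view (and the Avro-backed tables underneath it)
    // through the metastore instead of reading the Avro files directly.
    val df = hiveContext.sql("SELECT * FROM my_view LIMIT 10")
    df.show()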

Re: query avro hive table in spark sql

2015-08-27 Thread Giri P
ext to run queries in our application. Any idea if this issue might be because of querying across different schema versions of the data? Thanks, Giri. On Thu, Aug 27, 2015 at 5:39 AM, java8964 wrote: > What version of Hive are you using? And did you compile against the right > version of Hive when yo

Re: Spark Interview Questions

2015-08-18 Thread Sandeep Giri
Thank you, all. I have updated it to a slightly better version. Regards, Sandeep Giri, +1 347 781 4573 (US), +91-953-899-8962 (IN), www.KnowBigData.com, Phone: +1-253-397-1945 (Office)

Re: Spark Interview Questions

2015-08-17 Thread Sandeep Giri
This statement is from Spark's website itself. Regards, Sandeep Giri, +1 347 781 4573 (US), +91-953-899-8962 (IN), www.KnowBigData.com, Phone: +1-253-397-1945 (Office)

Re: Spark Interview Questions

2015-07-30 Thread Sandeep Giri
I have prepared some interview questions: http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-1 http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-2 Please provide your feedback. On Wed, Jul 29, 2015, 23:43 Pedro Rodriguez wrote: > You might look at the edx

Re: Data Processing speed SQL Vs SPARK

2015-07-13 Thread Sandeep Giri
Even for 2 lakh (200,000) records, MySQL will be better. Regards, Sandeep Giri, +1-253-397-1945 (US), +91-953-899-8962 (IN), www.KnowBigData.com

Re: resource allocation spark on yarn

2014-12-12 Thread Giri P
But on Spark 0.9 we don't have these options: --num-executors (controls how many executors will be allocated), --executor-memory (RAM for each executor), --executor-cores (CPU cores for each executor). On Fri, Dec 12, 2014 at 12:27 PM, Sameer Farooqui wrote: > Hi, > FYI - There are no Worker JVMs
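
For reference, a sketch of the spark-submit-era (Spark 1.x+ on YARN) equivalents expressed as configuration properties; as the thread notes, these do not exist on 0.9:

    import org.apache.spark.SparkConf

    // YARN equivalents of the flags discussed above (Spark 1.x+ only).
    val conf = new SparkConf()
      .set("spark.executor.instances", "10") // --num-executors
      .set("spark.executor.memory", "4g")    // --executor-memory
      .set("spark.executor.cores", "2")      // --executor-cores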