unsubscribe

2024-05-10 Thread J UDAY KIRAN
unsubscribe

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-20 Thread Kiran Kumar Dusi
+1 On Thu, 21 Mar 2024 at 7:46 AM, Farshid Ashouri wrote: > +1 > > On Mon, 18 Mar 2024, 11:00 Mich Talebzadeh, > wrote: > >> Some of you may be aware that Databricks community Home | Databricks >> have just launched a knowledge sharing hub. I thought it would be a >> good idea for the Apache Sp

Unsubscribe

2023-11-07 Thread Kiran Kumar Dusi
Unsubscribe

Autoscaling in Spark

2023-10-10 Thread Kiran Biswal
Hello Experts Is there any true auto scaling option for spark? The dynamic auto scaling works only for batch. Any guidelines on spark streaming autoscaling and how that will be tied to any cluster level autoscaling solutions? Thanks
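For later readers, a sketch of the settings involved (the property names are real Spark settings; whether the DStream flag counts as "true" autoscaling is exactly the open question here). Cluster-level autoscalers on YARN or Kubernetes then react to the executor requests these settings generate:

    import org.apache.spark.SparkConf

    // Core dynamic allocation: scales executors on pending tasks
    // (covers batch and, to a degree, Structured Streaming).
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "20")
      // Spark 3.0+: track shuffle data instead of requiring the
      // external shuffle service.
      .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")
      // Separate DStream-only flag driven by batch processing time
      // (assumption: present and functional in your build).
      .set("spark.streaming.dynamicAllocation.enabled", "true")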

Driver throws exception every few hours

2022-09-19 Thread Kiran Biswal
Hello Experts Seeing below exceptions thrown by the spark driver every few hours. Using spark 3.3.0. com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:392) Caused by: com.fasterxml.jackson.databind.JsonMappingException: timeout (through reference chain: io

Structured streaming with protobuf proto3 schema registry

2022-06-06 Thread Kiran Biswal
Structured streaming consuming data from kafka topic + uses schema registry -> convert to spark data frame. Thanks Kiran

Re: protobuf data as input to spark streaming

2022-05-30 Thread Kiran Biswal
Hello Stelios, friendly reminder if you could share any sample code/repo Are you using a schema registry? Thanks Kiran On Fri, Apr 8, 2022 at 4:37 PM Kiran Biswal wrote: > Hello Stelios > > Just a gentle follow up if you can share any sample code/repo > > Regards > Kiran

Re: protobuf data as input to spark streaming

2022-04-08 Thread Kiran Biswal
Hello Stelios Just a gentle follow up if you can share any sample code/repo Regards Kiran On Wed, Apr 6, 2022 at 3:19 PM Kiran Biswal wrote: > Hello Stelios > > Preferred language would have been Scala or pyspark but if Java is proven > I am open to using it > > Any s

Re: protobuf data as input to spark streaming

2022-04-06 Thread Kiran Biswal
Hello Stelios Preferred language would have been Scala or pyspark but if Java is proven I am open to using it Any sample reference or example code link? How are you handling the protobuf to spark dataframe conversion (serialization/deserialization)? Thanks Kiran On Wed, Apr 6, 2022, 2:38 PM

protobuf data as input to spark streaming

2022-04-05 Thread Kiran Biswal
Hello Experts Has anyone used protobuf (proto3) encoded data (from kafka) as input source and been able to do spark structured streaming? I would appreciate if you can share any sample code/example Regards Kiran >
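For later readers: Spark 3.4.0 added built-in protobuf support, which did not exist at the time of this thread; a minimal sketch (topic, message name, and descriptor path are placeholders):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.protobuf.functions.from_protobuf

    val spark = SparkSession.builder.appName("proto-stream").getOrCreate()
    import spark.implicits._

    // Read raw bytes from Kafka (topic name is a placeholder).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Decode the proto3 payload with a compiled descriptor set;
    // message name and .desc path are placeholders.
    val decoded = raw.select(
      from_protobuf($"value", "Event", "/path/to/events.desc").alias("event"))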

Re: Spark DStream application memory leak debugging

2021-09-27 Thread Kiran Biswal
=application_heap_dump.bin 16 bash: jmap: command not found bash-5.1$ jmap bash: jmap: command not found Thanks Kiran On Sat, Sep 25, 2021 at 5:28 AM Sean Owen wrote: > It could be 'normal' - executors won't GC unless they need to. > It could be state in your application, if
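(Side note for later readers: "jmap: command not found" usually just means the container image ships a JRE without the JDK debugging tools; with a full JDK in the image, jmap -dump:live,format=b,file=/tmp/heap.hprof <pid> or jcmd <pid> GC.heap_dump /tmp/heap.hprof produces a heap dump to analyze.)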

Spark DStream application memory leak debugging

2021-09-25 Thread Kiran Biswal
hours until it reaches max allocated memory and then it stays at that value. No matter how high I allocate to the executor this pattern is seen. I suspect a memory leak. Any guidance you may be able to provide as to how to debug will be highly appreciated Thanks in advance Regards Kiran

java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes

2021-08-28 Thread Kiran Biswal
Hello Experts During a join operation, I see this error below (spark 3.0.2) Any suggestions on how to debug? Error: java.lang.AssertionError: assertion failed: Found duplicate rewrite attribute Source code: val dfFilteredFinal=dfFiltered .join(dfScenarioSite, Seq("tid","site"), "left_oute
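A workaround that often clears this assertion (a sketch using the DataFrames named in the post; re-aliasing every column forces fresh attribute IDs on one side of the join):

    import org.apache.spark.sql.functions.col

    // Every Alias gets a new expression ID, which breaks the duplicated
    // lineage that triggers the assertion in some self-join-like plans.
    val right = dfScenarioSite.select(dfScenarioSite.columns.map(c => col(c).as(c)): _*)
    val dfFilteredFinal = dfFiltered.join(right, Seq("tid", "site"), "left_outer")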

Re: class KafkaCluster related errors

2021-06-07 Thread Kiran Biswal
The getConsumerOffsets method internally used KafkaCluster, which is probably deprecated. Do you think I need to mimic the code shown here to get/set offsets rather than use KafkaCluster? https://spark.apache.org/docs/3.0.0-preview/streaming-kafka-0-10-integration.html Thanks Kiran On Mon, Jun 7, 2

class KafkaCluster related errors

2021-06-06 Thread Kiran Biswal
https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/streaming/kafka/KafkaCluster.html just looking for ideas on how to achieve the same functionality in spark 3.0.1. Any thoughts and examples will be highly appreciated. Thanks Kiran
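For later readers: KafkaCluster went away with the 0.8 integration; in spark-streaming-kafka-0-10 the equivalent offset handling, per the integration guide linked above, looks roughly like this:

    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    // `stream` is the InputDStream returned by KafkaUtils.createDirectStream.
    stream.foreachRDD { rdd =>
      // Read the offsets this micro-batch covers.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process the batch ...
      // Commit offsets back to Kafka after successful processing.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }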

unsubscribe

2020-05-12 Thread Kiran B
Thank you, Kiran,

Spark SQL Error

2018-10-25 Thread Sai Kiran Kodukula
Hi all, I am getting the following error message in one of my Spark SQL queries. I realize this may be related to the version of Spark or a configuration change, but I want to know the details and resolution. Thanks spark.sql.codegen.aggregate.map.twolevel.enabled is set to true, but current version of c
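Until on a version where that codegen path supports the aggregate in question, the usual mitigation is to turn the feature off (a sketch; the property is internal and may change between versions):

    // Disable the two-level fast hash map for hash aggregation.
    spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")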

Re: RDD saveAsText and DataFrame write.mode(SaveMode).text(Path) duplicating rows

2017-06-09 Thread Manjunath, Kiran
Can you post your code and sample input? That should help us understand if there is a bug in the code written or with the platform. Regards, Kiran From: "Barona, Ricardo" Date: Friday, June 9, 2017 at 10:47 PM To: "user@spark.apache.org" Subject: RDD saveAsText and

Re: Error while creating tables in Parquet format in 2.0.1 (No plan for InsertIntoTable)

2016-11-06 Thread Kiran Chitturi
> | Result | (no rows selected, 0.156 seconds) > 0: jdbc:hive2://localhost:1> CREATE TABLE test_stored STORED AS PARQUET > LOCATION '/Users/kiran/spark/test5.parquet' AS SELECT * FROM jtest; > Error: java.lan

Error while creating tables in Parquet format in 2.0.1 (No plan for InsertIntoTable)

2016-11-06 Thread Kiran Chitturi
;id" ); > > CREATE TABLE test_stored STORED AS PARQUET LOCATION > '/Users/kiran/spark/test.parquet' AS SELECT * FROM test; but with Spark 2.0.x, the last statement throws this below error > CREATE TABLE test_stored1 STORED AS PARQUET LOCATION '

Re: GenericRowWithSchema cannot be cast to java.lang.Double : UDAF error

2016-11-04 Thread Manjunath, Kiran
scala.collection.mutable.WrappedArray.toArray(WrappedArray.scala:73) at GeometricMean.evaluate(:51) Regards, Kiran From: "Manjunath, Kiran" Date: Saturday, November 5, 2016 at 2:16 AM To: "user@spark.apache.org" Subject: GenericRowWithSchema cannot be cast to java.lang.Doubl

GenericRowWithSchema cannot be cast to java.lang.Double : UDAF error

2016-11-04 Thread Manjunath, Kiran
Exception: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost): java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to java.lang.Double at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:114) Regards, Kiran

spark dataframe rolling window for user define operation

2016-10-29 Thread Manjunath, Kiran
Window.orderBy("c1").rowsBetween(-20, +20) var dfWithAlternate = df.withColumn( "alter", XYZ(df("c2")).over(wSpec1)) Where the XYZ function applies +, -, +, - alternately PS : I have posted the same question at http://stackoverflow.com/questions/40318010/spark-dataframe-rolling-window-user-define-operation Regards, Kiran
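For reference, a runnable sketch of the ±20-row frame (assuming df with columns c1/c2 as in the post); note that only aggregate/window functions can go .over() — an arbitrary XYZ needs a UDAF:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val wSpec1 = Window.orderBy("c1").rowsBetween(-20, 20)
    // Built-in aggregates work over the frame directly; plain UDFs
    // cannot be used with .over(), so custom logic needs a UDAF.
    val dfWithAlternate = df.withColumn("alter", avg(col("c2")).over(wSpec1))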

Re: Any Dynamic Compilation of Scala Query

2016-10-26 Thread Manjunath, Kiran
Hi, Can you elaborate with sample example on why you would want to do so? Ideally there would be a better approach than solving such problems as mentioned below. A sample example would help to understand the problem. Regards, Kiran From: Mahender Sarangam Date: Wednesday, October 26, 2016 at

Spark Streaming Custom Receivers - How to use metadata store API during processing

2016-10-10 Thread Manjunath, Kiran
However, it did go over my head in understanding the code and usage. https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala#L282 Any help is appreciated. Thanks! Regards, Kiran

Re: Spark metrics when running with YARN?

2016-08-30 Thread Vijay Kiran
From the YARN RM UI, find the spark application Id, and in the application details, you can click on the “Tracking URL” which should give you the Spark UI. ./Vijay > On 30 Aug 2016, at 07:53, Otis Gospodnetić wrote: > > Hi, > > When Spark is run on top of YARN, where/how can one get Spark metric

Re: 2.0.0: AnalysisException when reading csv/json files with dots in periods

2016-08-05 Thread Kiran Chitturi
Nevermind, there is already a Jira open for this https://issues.apache.org/jira/browse/SPARK-16698 On Fri, Aug 5, 2016 at 5:33 PM, Kiran Chitturi < kiran.chitt...@lucidworks.com> wrote: > Hi, > > During our upgrade to 2.0.0, we found this issue with one of our failing > tests

2.0.0: Hive metastore uses a different version of derby than the Spark package

2016-08-05 Thread Kiran Chitturi
or someone else. Would it make sense to update so that hive-metastore and the Spark package are on the same derby version? Thanks, -- Kiran Chitturi

2.0.0: AnalysisException when reading csv/json files with dots in periods

2016-08-05 Thread Kiran Chitturi
(QueryExecution.scala:83) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83) > at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2558) > at org.apache.spark.sql.Dataset.head(Dataset.scala:1924) > at org.apache.spark.sql.Dataset.take(Dataset.scala:2139) > ... 48 elided > scala> The same happens for json files too. Is this a known issue in 2.0.0 ? Removing the field with dots from the csv/json file fixes the issue :) Thanks, -- Kiran Chitturi
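In versions where SPARK-16698 is fixed, backtick-quoting is the usual way to reference such a column (a sketch; file path and column name hypothetical):

    import org.apache.spark.sql.functions.col

    // Hypothetical CSV with a header column literally named "a.b".
    val df = spark.read.option("header", "true").csv("/tmp/dots.csv")
    // Backticks keep the dot from being parsed as a struct field path.
    df.select(col("`a.b`")).show()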

Re: 2.0.0 packages for twitter streaming, flume and other connectors

2016-08-03 Thread Kiran Chitturi
moved from Spark, and can be > found at the Apache Bahir project: http://bahir.apache.org/ > > I don't think there's a release for Spark 2.0.0 yet, though (only for > the preview version). > > > On Wed, Aug 3, 2016 at 8:40 PM, Kiran Chitturi > wrote: > > Hi, >

2.0.0 packages for twitter streaming, flume and other connectors

2016-08-03 Thread Kiran Chitturi
missing streaming packages? If so, how can we get someone to release and publish new versions officially? I would like to help in any way possible to get these packages released and published. Thanks, -- Kiran Chitturi

Fwd: PySpark : Filter based on resultant query without additional dataframe

2016-07-25 Thread kiran kumar
+ |US|248| |Europe| 40| +--+---+ >>> sqlsc.sql("Select _1,sum(_3) from t1 group by _1 where _c1 > 200").show() Traceback (most recent call last): File "/ghostcache/kimanjun/spark-1.6.0/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value py4j.protocol.Py4JJav
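Incidentally, the statement itself is malformed SQL: a predicate on an aggregate belongs in HAVING, not in a WHERE placed after GROUP BY — e.g.:

    sqlsc.sql("SELECT _1, SUM(_3) FROM t1 GROUP BY _1 HAVING SUM(_3) > 200").show()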

Spark executor crashes when the tasks are cancelled

2016-04-27 Thread Kiran Chitturi
t; (executor 2 exited caused by one of the running tasks) Reason: Remote RPC > client di Is it possible for executor to die when the jobs in the sparkContext are cancelled ? Apart from https://issues.apache.org/jira/browse/SPARK-14234, I could not find any Jiras that report this error. Sometimes,

Re: Spark sql not pushing down timestamp range queries

2016-04-15 Thread Kiran Chitturi
Thanks Hyukjin for the suggestion. I will take a look at implementing Solr datasource with CatalystScan. ​

Spark sql not pushing down timestamp range queries

2016-04-14 Thread Kiran Chitturi
e ranges, I would like for the timestamp filters to be pushed down to the Solr query. Are there limitations on the type of filters that are passed down with Timestamp types ? Is there something that I should do in my code to fix this ? Thanks, -- Kiran Chitturi

supporting adoc files in spark-packages.org

2016-02-10 Thread Kiran Chitturi
Wondering if spark-packages.org can support AsciiDoc files in addition to README.md files. Thanks, -- Kiran Chitturi

Re: ROSE: Spark + R on the JVM.

2016-01-12 Thread Vijay Kiran
I think it would be this: https://github.com/onetapbeyond/opencpu-spark-executor > On 12 Jan 2016, at 18:32, Corey Nolet wrote: > > David, > > Thank you very much for announcing this! It looks like it could be very > useful. Would you mind providing a link to the github? > > On Tue, Jan 12, 2

Re: Fat jar can't find jdbc

2015-12-22 Thread Vijay Kiran
Can you paste your libraryDependencies from build.sbt ? ./Vijay > On 22 Dec 2015, at 06:12, David Yerrington wrote: > > Hi Everyone, > > I'm building a prototype that fundamentally grabs data from a MySQL instance, > crunches some numbers, and then moves it on down the pipeline. I've been >

Re: Hive on Spark Vs Spark SQL

2015-11-15 Thread kiran lonikar
So does not benefit from Project Tungsten right? On Mon, Nov 16, 2015 at 12:07 PM, Reynold Xin wrote: > It's a completely different path. > > > On Sun, Nov 15, 2015 at 10:37 PM, kiran lonikar wrote: > >> I would like to know if Hive on Spark uses or shares the execut

Hive on Spark Vs Spark SQL

2015-11-15 Thread kiran lonikar
? -Kiran

Fwd: Code generation for GPU

2015-09-03 Thread kiran lonikar
f rows in the ByteBuffer)? Is it through the property spark.sql.inMemoryColumnarStorage.batchSize? Thanks in anticipation, Kiran PS: Other things I found useful were: *Spark DataFrames*: https://www.brighttalk.com/webcast/12891/166495 *Apache Spark 1.5*: https://www.brighttalk.com/webcast/12
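On the batch size question: as far as I can tell, yes — that property controls the rows per in-memory columnar batch (a sketch, 1.x API):

    // Rows per ByteBuffer batch in the in-memory columnar cache
    // (default 10000).
    sqlContext.setConf("spark.sql.inMemoryColumnarStorage.batchSize", "10000")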

Re: Difference between Sort based and Hash based shuffle

2015-08-15 Thread Ravi Kiran
Have a look at this presentation. http://www.slideshare.net/colorant/spark-shuffle-introduction . Can be of help to you. On Sat, Aug 15, 2015 at 1:42 PM, Muhammad Haseeb Javed < 11besemja...@seecs.edu.pk> wrote: > What are the major differences between how Sort based and Hash based > shuffle oper

Re: RDD of RDDs

2015-06-09 Thread kiran lonikar
e soon. > > On Tue, Jun 9, 2015 at 1:34 AM, kiran lonikar wrote: > >> Possibly in future, if and when spark architecture allows workers to >> launch spark jobs (the functions passed to transformation or action APIs of >> RDD), it will be possible to have RDD of RDD.

Re: RDD of RDDs

2015-06-09 Thread kiran lonikar
Possibly in future, if and when spark architecture allows workers to launch spark jobs (the functions passed to transformation or action APIs of RDD), it will be possible to have RDD of RDD. On Tue, Jun 9, 2015 at 1:47 PM, kiran lonikar wrote: > Simillar question was asked before: >

Re: RDD of RDDs

2015-06-09 Thread kiran lonikar
RDD as the code in the worker does not have access to "sc" and cannot launch a spark job. Hope it helps. You need to consider List[RDD] or some other collection. -Kiran On Tue, Jun 9, 2015 at 2:25 AM, ping yan wrote: > Hi, > > > The problem I am looking at is as follo

Re: Optimisation advice for Avro->Parquet merge job

2015-06-08 Thread kiran lonikar
http://tachyon-project.org/) in the run() methods. The second for loop will also have to load from the intermediate parquet files. Then finally save the final dfInput[0] to HDFS. I think this way of parallelizing will force the cluster to utilize all the resources. -Kiran On Mon, Jun 8, 2015

Re: columnar structure of RDDs from Parquet or ORC files

2015-06-08 Thread kiran lonikar
tially the other parameter spark.sql.inMemoryColumnarStorage.compressed will have to be set to false since uncompressing on GPU is not so straightforward (issues of how much data each GPU thread should handle and uncoalesced memory access). -Kiran On Mon, Jun 8, 2015 at 8:25 PM, Cheng Lian wrote:

Re: columnar structure of RDDs from Parquet or ORC files

2015-06-08 Thread kiran lonikar
the forum assuming unionAll is a blocking call and said execution of multiple load and df.unionAll in different threads would benefit performance :) Kiran On 08-Jun-2015 4:37 pm, "Cheng Lian" wrote: > For DataFrame, there are also transformations and actions. And > transformations

Re: Column operation on Spark RDDs.

2015-06-08 Thread kiran lonikar
;)) val dt = dataRDD.zipWithUniqueId.map(_.swap) val newCol1 = dt.map { case (i, x) => (i, x(1)+x(18)) } val newCol2 = newCol1.join(dt).map(x => function(.)) Hope this helps. Kiran On Fri, Jun 5, 2015 at 8:15 AM, Carter wrote: > Hi, I have a RDD with MANY columns (e.g., hundreds), and m
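Cleaned up and made self-contained, the suggested approach looks like this (a sketch; input parsing is assumed, column indices are from the post):

    // Key each row by a unique id, derive the new column, then join back.
    val dataRDD = sc.textFile("data.csv").map(_.split(",").map(_.toDouble))
    val dt      = dataRDD.zipWithUniqueId.map(_.swap)          // (id, row)
    val newCol1 = dt.map { case (i, x) => (i, x(1) + x(18)) }  // (id, derived)
    val newCol2 = newCol1.join(dt)                             // (id, (derived, row))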

Re: Optimisation advice for Avro->Parquet merge job

2015-06-08 Thread kiran lonikar
// union of i and i+n/2 // showing [] only to bring out array access. Replace with dfInput(i) and dfInput(i+stride) in your code dfInput[i] = dfInput[i].unionAll(dfInput[i + stride]) } }); } executor.awaitTermination(0, TimeUnit.SECONDS) } Let
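The idea in the snippet — pairwise unions so the union tree stays balanced rather than left-deep — in compact Scala (a sketch; dfInputs stands for the loaded DataFrames):

    import org.apache.spark.sql.DataFrame

    // Merge DataFrames as a balanced tree of unions (Spark 1.x unionAll)
    // instead of one long left-deep chain.
    def treeUnion(dfs: Seq[DataFrame]): DataFrame =
      if (dfs.size == 1) dfs.head
      else treeUnion(dfs.grouped(2).map(_.reduce(_ unionAll _)).toSeq)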

Re: columnar structure of RDDs from Parquet or ORC files

2015-06-07 Thread kiran lonikar
Thanks for replying twice :) I think I sent this question by email and somehow thought I did not send it, hence created the other one on the web interface. Let's retain this thread since you have provided more details here. Great, it confirms my intuition about DataFrame. It's similar to Shark colu

columnar structure of RDDs from Parquet or ORC files

2015-06-03 Thread kiran lonikar
.cache().map{row => ...}? Is it a logical row which maintains an array of columns and each column in turn is an array of values for batchSize rows? -Kiran

Re: LATERAL VIEW explode issue

2015-05-20 Thread kiran mavatoor
...which would make me guess a different context or a different spark version on the cluster you are submitting to... Sent on the new Sprint Network from my Samsung Galaxy S®4. Original message From: kiran mavatoor Date:05/20/2015 5:57 AM (GMT-05:00) To: User Subject: LATERAL VIEW

LATERAL VIEW explode issue

2015-05-20 Thread kiran mavatoor
Hi, When I use "LATERAL VIEW explode" on the registered temp table in spark shell, it works.  But when I use the same in spark-submit (as jar file) it is not working. its giving error -  "failure: ``union'' expected but identifier VIEW found" sql statement i am using is SELECT id,mapKey FROM loc

example code for current date in spark sql

2015-05-05 Thread kiran mavatoor
Hi, In Hive, I am using unix_timestamp() as 'update_on' to insert the current date in the 'update_on' column of the table. Now I am converting it into spark sql. Please suggest example code to insert the current date and time into a column of the table using spark sql. Cheers, Kiran.
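A sketch of the Spark-side equivalent (Spark 1.5+ functions API; df stands for the target DataFrame):

    import org.apache.spark.sql.functions.unix_timestamp

    // Epoch seconds, matching the Hive unix_timestamp() in the post
    // (current_timestamp() is the alternative for a timestamp type).
    val withUpdateOn = df.withColumn("update_on", unix_timestamp())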

spark sql LEFT OUTER JOIN java.lang.ClassCastException

2015-04-27 Thread kiran mavatoor
Hi There, I am using a spark sql left outer join query. The sql query is scala> val test = sqlContext.sql("SELECT e.departmentID FROM employee e LEFT OUTER JOIN department d ON d.departmentId = e.departmentId").toDF() In spark 1.3.1 it's working fine, but the latest pull gives the below error 1

Re: Hive/Hbase for low latency

2015-02-11 Thread Ravi Kiran
Hi Siddharth, With v 4.3 of Phoenix, you can use the PhoenixInputFormat and OutputFormat classes to pull/push to Phoenix from Spark. HTH Thanks Ravi On Wed, Feb 11, 2015 at 6:59 AM, Ted Yu wrote: > Connectivity to hbase is also available. You can take a look at: > > examples//src/main/p

Re: NullPointerException on reading checkpoint files

2014-06-12 Thread Kiran
I am also seeing similar problem when trying to continue job using saved checkpoint. Can somebody help in solving this problem? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-on-reading-checkpoint-files-tp7306p7507.html Sent from the Ap