+1
On Thu, 21 Mar 2024 at 7:46 AM, Farshid Ashouri
wrote:
> +1
>
> On Mon, 18 Mar 2024, 11:00 Mich Talebzadeh,
> wrote:
>
>> Some of you may be aware that the Databricks community has just
>> launched a knowledge sharing hub. I thought it would be a
>> good idea for the Apache Sp
Hello Experts
Is there any true autoscaling option for Spark? Dynamic autoscaling
works only for batch. Any guidelines on Spark Streaming autoscaling and
how that would tie into any cluster-level autoscaling solutions?
Thanks
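A minimal sketch of the dynamic-allocation settings that usually come up in this context, assuming a recent Spark 3.x build; these scale executors up and down on demand but are not a streaming-rate-aware autoscaler, and all values below are illustrative:

import org.apache.spark.sql.SparkSession

// Hedged sketch: dynamic allocation with shuffle tracking (needed where no
// external shuffle service exists, e.g. on Kubernetes).
val spark = SparkSession.builder()
  .appName("streaming-autoscaling-sketch")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  .getOrCreate()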
Hello Experts
Seeing the below exceptions thrown by the Spark driver every few hours. Using
Spark 3.3.0
com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:392
Caused by: com.fasterxml.jackson.databind.JsonMappingException:
timeout (through reference chain:
io
ctured streaming consuming data from a Kafka topic + uses a schema registry
-> converts to a Spark data frame
Thanks
Kiran
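For context, a minimal sketch of the pipeline described above (structured streaming from Kafka into a DataFrame); the schema-registry deserialization step is left out because the thread does not name a library, and the broker/topic names are hypothetical:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-structured-streaming-sketch").getOrCreate()

// Kafka source; `value` arrives as raw bytes.
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
  .option("subscribe", "input-topic")               // hypothetical topic
  .load()

// Deserialization against the schema registry would happen here before the
// payload becomes a typed DataFrame (library not shown).
val frame = raw.selectExpr("CAST(key AS STRING) AS key", "value")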
Hello Stelios, a friendly reminder: could you share any sample code/repo?
Are you using a schema registry?
Thanks
Kiran
On Fri, Apr 8, 2022 at 4:37 PM Kiran Biswal wrote:
> Hello Stelios
>
> Just a gentle follow up if you can share any sample code/repo
>
> Regards
> Kiran
Hello Stelios
Just a gentle follow up if you can share any sample code/repo
Regards
Kiran
On Wed, Apr 6, 2022 at 3:19 PM Kiran Biswal wrote:
> Hello Stelios
>
> Preferred language would have been Scala or pyspark but if Java is proven
> I am open to using it
>
> Any s
Hello Stelios
The preferred language would have been Scala or PySpark, but if Java is proven I
am open to using it.
Any sample reference or example code link?
How are you handling the protobuf to Spark DataFrame conversion
(serialization/deserialization)?
Thanks
Kiran
On Wed, Apr 6, 2022, 2:38 PM
Hello Experts
Has anyone used protobuf (proto3) encoded data (from kafka) as input source
and been able to do spark structured streaming?
I would appreciate if you can share any sample code/example
Regards
Kiran
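A hedged sketch of one way to do this on newer Spark versions: Spark 3.4+ ships a spark-protobuf module with from_protobuf (it post-dates this thread, so it is offered as an alternative rather than what was available at the time); the message name, topic, and descriptor-file path below are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.protobuf.functions.from_protobuf

// Requires the spark-protobuf module on the classpath (Spark 3.4+).
val spark = SparkSession.builder().appName("proto3-streaming-sketch").getOrCreate()
import spark.implicits._

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events-proto")
  .load()

// Decode the proto3 payload in `value` using a compiled descriptor file.
val parsed = raw.select(
  from_protobuf($"value", "com.example.Event", "/path/to/descriptors.desc").as("event")
)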
>
=application_heap_dump.bin 16
bash: jmap: command not found
bash-5.1$ jmap
bash: jmap: command not found
Thanks
Kiran
On Sat, Sep 25, 2021 at 5:28 AM Sean Owen wrote:
> It could be 'normal' - executors won't GC unless they need to.
> It could be state in your application, if
hours
until it reaches the maximum allocated memory and then stays at that value. No
matter how much memory I allocate to the executor, this pattern is seen. I suspect
a memory leak.
Any guidance you may be able to provide on how to debug this will be highly
appreciated.
Thanks in advance
Regards
Kiran
Hello Experts
During a join operation, I see the error below (Spark 3.0.2).
Any suggestions on how to debug?
Error:
java.lang.AssertionError: assertion failed: Found duplicate rewrite attribute
Source code:
val dfFilteredFinal = dfFiltered
  .join(dfScenarioSite, Seq("tid", "site"), "left_outer")
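One workaround often suggested for this assertion (an assumption here, not a confirmed fix for this exact query) is to rebuild the reused side of the join so its attribute IDs are regenerated:

// Hedged sketch: recreate dfScenarioSite from its RDD and schema so the planner
// sees fresh attribute IDs, then join as before.
val dfScenarioSiteFresh =
  dfScenarioSite.sparkSession.createDataFrame(dfScenarioSite.rdd, dfScenarioSite.schema)

val dfFilteredFinal = dfFiltered
  .join(dfScenarioSiteFresh, Seq("tid", "site"), "left_outer")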
The getConsumerOffsets method internally used KafkaCluster, which is
probably deprecated.
Do You think I need to mimic the code shown here to get/set offsets rather
than use kafkaCluster?
https://spark.apache.org/docs/3.0.0-preview/streaming-kafka-0-10-integration.html
Thanks
Kiran
On Mon, Jun 7, 2
://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/streaming/kafka/KafkaCluster.html
just looking for ideas on how to achieve the same functionality in Spark 3.0.1.
Any thoughts and examples will be highly appreciated.
Thanks
Kiran
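For reference, a minimal sketch in the spirit of the streaming-kafka-0-10 guide linked above, which replaces going through KafkaCluster: read the offset ranges from each batch and commit them back to Kafka. The broker, topic, and group names are hypothetical.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

val conf = new SparkConf().setAppName("offsets-sketch")
val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "offsets-sketch-group",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
)

stream.foreachRDD { rdd =>
  // Offsets covered by this batch.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process the batch, then commit the offsets back to Kafka asynchronously.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}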
Thank you,
Kiran,
Hi all,
I am getting the following error message in one of my Spark SQL queries. I
realize this may be related to the Spark version or a configuration
change, but I want to understand the details and the resolution.
Thanks
spark.sql.codegen.aggregate.map.twolevel.enabled is set to true, but
current version of c
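A hedged sketch of the usual way to get past such a check, assuming the config named in the error message is the one to toggle (behavior may differ across Spark versions):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("twolevel-agg-sketch").getOrCreate()
// Disable the two-level aggregate hash map so the query falls back to the
// single-level code path.
spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")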
Can you post your code and sample input?
That should help us understand if there is a bug in the code written or with
the platform.
Regards,
Kiran
From: "Barona, Ricardo"
Date: Friday, June 9, 2017 at 10:47 PM
To: "user@spark.apache.org"
Subject: RDD saveAsText and
+--------+
> | Result |
> +--------+
> +--------+
> No rows selected (0.156 seconds)
>
0: jdbc:hive2://localhost:1> CREATE TABLE test_stored STORED AS PARQUET
> LOCATION '/Users/kiran/spark/test5.parquet' AS SELECT * FROM jtest;
> Error: java.lan
;id" );
>
> CREATE TABLE test_stored STORED AS PARQUET LOCATION
> '/Users/kiran/spark/test.parquet' AS SELECT * FROM test;
but with Spark 2.0.x, the last statement throws this below error
> CREATE TABLE test_stored1 STORED AS PARQUET LOCATION
'
scala.collection.mutable.WrappedArray.toArray(WrappedArray.scala:73)
at GeometricMean.evaluate(:51)
Regards,
Kiran
From: "Manjunath, Kiran"
Date: Saturday, November 5, 2016 at 2:16 AM
To: "user@spark.apache.org"
Subject: GenericRowWithSchema cannot be cast to java.lang.Doubl
Exception: Job aborted due to stage failure: Task 0 in
stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID
2, localhost): java.lang.ClassCastException:
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast
to java.lang.Double
at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:114)
Regards,
Kiran
val wSpec1 = Window.orderBy("c1").rowsBetween(-20, +20)
var dfWithAlternate = df.withColumn("alter", XYZ(df("c2")).over(wSpec1))
where the XYZ function should apply +, -, +, - alternately
PS : I have posted the same question at
http://stackoverflow.com/questions/40318010/spark-dataframe-rolling-window-user-define-operation
Regards,
Kiran
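A hedged sketch of one way to express that (an interpretation of the intent above, not the poster's code): precompute a +/-1 sign from each row's position, then sum sign * value over the rolling window. Column names follow the fragment above; the input data is synthetic.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("alternating-window-sketch").getOrCreate()
import spark.implicits._

val df = spark.range(0, 100).select($"id".as("c1"), (rand() * 10).as("c2"))

val orderW = Window.orderBy("c1")
val rollW  = Window.orderBy("c1").rowsBetween(-20, 20)

// Odd-positioned rows contribute +c2, even-positioned rows contribute -c2.
val signed = df
  .withColumn("sign", when(row_number().over(orderW) % 2 === 1, lit(1.0)).otherwise(lit(-1.0)))
  .withColumn("alter", sum($"sign" * $"c2").over(rollW))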
Hi,
Can you elaborate with sample example on why you would want to do so?
Ideally there would be a better approach than solving such problems as
mentioned below.
A sample example would help to understand the problem.
Regards,
Kiran
From: Mahender Sarangam
Date: Wednesday, October 26, 2016 at
.
However, understanding the code and its usage went a bit over my head.
https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala#L282
Any help is appreciated.
Thanks!
Regards,
Kiran
From the YARN RM UI, find the Spark application ID, and in the application details
you can click on the "Tracking URL", which should take you to the Spark UI.
./Vijay
> On 30 Aug 2016, at 07:53, Otis Gospodnetić wrote:
>
> Hi,
>
> When Spark is run on top of YARN, where/how can one get Spark metric
Nevermind, there is already a Jira open for this
https://issues.apache.org/jira/browse/SPARK-16698
On Fri, Aug 5, 2016 at 5:33 PM, Kiran Chitturi <
kiran.chitt...@lucidworks.com> wrote:
> Hi,
>
> During our upgrade to 2.0.0, we found this issue with one of our failing
> tests
or someone else. Would it make sense to update so
that hive-metastore and the Spark package are on the same Derby version?
Thanks,
--
Kiran Chitturi
(QueryExecution.scala:83)
> at
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
> at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2558)
> at org.apache.spark.sql.Dataset.head(Dataset.scala:1924)
> at org.apache.spark.sql.Dataset.take(Dataset.scala:2139)
> ... 48 elided
> scala>
The same happens for JSON files too. Is this a known issue in 2.0.0?
Removing the field with dots from the CSV/JSON file fixes the issue :)
Thanks,
--
Kiran Chitturi
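For reference, a hedged sketch of how a dotted column name is normally referenced (file path and column name are hypothetical; in 2.0.0 this path hit the bug above): quote the name with backticks so the dots are not parsed as struct-field access.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("dotted-column-sketch").getOrCreate()

val df = spark.read.option("header", "true").csv("/tmp/sample.csv")
// Backticks keep `a.b.c` as one column name instead of nested field access.
df.select("`a.b.c`").show()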
oved from Spark, and can be
> found at the Apache Bahir project: http://bahir.apache.org/
>
> I don't think there's a release for Spark 2.0.0 yet, though (only for
> the preview version).
>
>
> On Wed, Aug 3, 2016 at 8:40 PM, Kiran Chitturi
> wrote:
> > Hi,
>
ssing streaming packages?
If so, how can we get someone to release and publish new versions
officially?
I would like to help in any way possible to get these packages released and
published.
Thanks,
--
Kiran Chitturi
+------+---+
|    US|248|
|Europe| 40|
+------+---+
>>> sqlsc.sql("Select _1,sum(_3) from t1 group by _1 where _c1 > 200").show()
Traceback (most recent call last):
File
"/ghostcache/kimanjun/spark-1.6.0/python/lib/py4j-0.9-src.zip/py4j/protocol.py",
line 308, in get_return_value
py4j.protocol.Py4JJav
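A hedged guess at the cause of the Py4J error above: Spark SQL does not accept a WHERE clause after GROUP BY. Filtering before grouping (WHERE) or on the aggregate (HAVING) parses cleanly; the column names below follow the fragment above and are assumptions about the intent.

// Filter rows before grouping ...
sqlsc.sql("SELECT _1, SUM(_3) FROM t1 WHERE _2 > 200 GROUP BY _1").show()
// ... or filter groups on the aggregated value.
sqlsc.sql("SELECT _1, SUM(_3) FROM t1 GROUP BY _1 HAVING SUM(_3) > 200").show()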
t; (executor 2 exited caused by one of the running tasks) Reason: Remote RPC
> client di
Is it possible for an executor to die when the jobs in the SparkContext are
cancelled? Apart from https://issues.apache.org/jira/browse/SPARK-14234, I
could not find any JIRAs that report this error.
Sometimes,
Thanks Hyukjin for the suggestion. I will take a look at implementing Solr
datasource with CatalystScan.
e ranges, I would like the timestamp filters to
be pushed down to the Solr query.
Are there limitations on the types of filters that are passed down with
Timestamp types?
Is there something that I should do in my code to fix this?
Thanks,
--
Kiran Chitturi
g if spark-packages.org can support AsciiDoc files in addition to
README.md files.
Thanks,
--
Kiran Chitturi
I think it would be this: https://github.com/onetapbeyond/opencpu-spark-executor
> On 12 Jan 2016, at 18:32, Corey Nolet wrote:
>
> David,
>
> Thank you very much for announcing this! It looks like it could be very
> useful. Would you mind providing a link to the github?
>
> On Tue, Jan 12, 2
Can you paste your libraryDependencies from build.sbt ?
./Vijay
> On 22 Dec 2015, at 06:12, David Yerrington wrote:
>
> Hi Everyone,
>
> I'm building a prototype that fundamentally grabs data from a MySQL instance,
> crunches some numbers, and then moves it on down the pipeline. I've been
>
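A hedged sketch of the kind of libraryDependencies such a MySQL-to-Spark prototype typically carries (versions are illustrative assumptions roughly contemporary with this thread, not taken from the poster's build):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.5.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.5.2" % "provided",
  "mysql"            %  "mysql-connector-java" % "5.1.38"
)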
So it does not benefit from Project Tungsten, right?
On Mon, Nov 16, 2015 at 12:07 PM, Reynold Xin wrote:
> It's a completely different path.
>
>
> On Sun, Nov 15, 2015 at 10:37 PM, kiran lonikar wrote:
>
>> I would like to know if Hive on Spark uses or shares the execut
?
-Kiran
f rows in the ByteBuffer)? Is
it through the property spark.sql.inMemoryColumnarStorage.batchSize?
Thanks in anticipation,
Kiran
PS:
Other things I found useful were:
Spark DataFrames: https://www.brighttalk.com/webcast/12891/166495
Apache Spark 1.5: https://www.brighttalk.com/webcast/12
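For reference, a hedged sketch of setting the batch size in question with the modern configuration API (the value is illustrative; the default is 10000 rows per batch):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("columnar-batch-size-sketch").getOrCreate()
// Rows per in-memory columnar batch, and whether the batches are compressed.
spark.conf.set("spark.sql.inMemoryColumnarStorage.batchSize", "20000")
spark.conf.set("spark.sql.inMemoryColumnarStorage.compressed", "true")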
Have a look at this presentation.
http://www.slideshare.net/colorant/spark-shuffle-introduction . Can be of
help to you.
On Sat, Aug 15, 2015 at 1:42 PM, Muhammad Haseeb Javed <
11besemja...@seecs.edu.pk> wrote:
> What are the major differences between how Sort based and Hash based
> shuffle oper
e soon.
>
> On Tue, Jun 9, 2015 at 1:34 AM, kiran lonikar wrote:
>
>> Possibly in future, if and when spark architecture allows workers to
>> launch spark jobs (the functions passed to transformation or action APIs of
>> RDD), it will be possible to have RDD of RDD.
Possibly in the future, if and when the Spark architecture allows workers to launch
Spark jobs (the functions passed to the transformation or action APIs of RDD),
it will be possible to have an RDD of RDDs.
On Tue, Jun 9, 2015 at 1:47 PM, kiran lonikar wrote:
> Similar question was asked before:
>
DD, as the code in the worker does not
have access to "sc" and cannot launch a Spark job.
Hope it helps. You need to consider List[RDD] or some other collection.
-Kiran
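A hedged sketch of the List[RDD] alternative mentioned above: keep a driver-side collection of RDDs rather than an RDD of RDDs, since workers cannot use the SparkContext. The element types and the split key are assumptions for illustration.

import org.apache.spark.rdd.RDD

// Split one pair RDD into a driver-side List of per-key RDDs.
def splitByKey(data: RDD[(String, Int)]): List[RDD[Int]] = {
  val keys = data.keys.distinct().collect().toList
  keys.map(k => data.filter(_._1 == k).values)
}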
On Tue, Jun 9, 2015 at 2:25 AM, ping yan wrote:
> Hi,
>
>
> The problem I am looking at is as follo
http://tachyon-project.org/) in the run() methods. The second for loop
will also have to load from the intermediate parquet files. Then finally
save the final dfInput[0] to the HDFS.
I think this way of parallelizing will force the cluster to utilize all
the resources.
-Kiran
On Mon, Jun 8, 2015
tially the
other parameter spark.sql.inMemoryColumnarStorage.compressed will have to
be set to false since uncompressing on GPU is not so straightforward
(issues of how much data each GPU thread should handle and uncoalesced
memory access).
-Kiran
On Mon, Jun 8, 2015 at 8:25 PM, Cheng Lian wrote:
the forum assuming unionAll is a blocking call, and
said that executing multiple load and df.unionAll calls in different threads would
benefit performance :)
Kiran
On 08-Jun-2015 4:37 pm, "Cheng Lian" wrote:
> For DataFrame, there are also transformations and actions. And
> transformations
val dt = dataRDD.zipWithUniqueId.map(_.swap)
val newCol1 = dt.map { case (i, x) => (i, x(1) + x(18)) }
val newCol2 = newCol1.join(dt).map(x => function(...))
Hope this helps.
Kiran
On Fri, Jun 5, 2015 at 8:15 AM, Carter wrote:
> Hi, I have a RDD with MANY columns (e.g., hundreds), and m
// union of i and i+n/2
// showing [] only to bring out array access. Replace with
// dfInput(i) and dfInput(i+stride) in your code
dfInput[i] = dfInput[i].unionAll(dfInput[i + stride])
}
});
}
executor.awaitTermination(0, TimeUnit.SECONDS)
}
Let
Thanks for replying twice :) I think I sent this question by email and
somehow thought I had not sent it, hence created the other one on the web
interface. Let's retain this thread since you have provided more details
here.
Great, it confirms my intuition about DataFrame. It's similar to Shark
colu
.cache().map{row => ...}?
Is it a logical row which maintains an array of columns and each column in
turn is an array of values for batchSize rows?
-Kiran
...Which
would make me guess a different context or a different Spark version on the cluster
you are submitting to...
Original message From: kiran mavatoor Date:05/20/2015 5:57 AM
(GMT-05:00) To: User Subject: LATERAL VIEW
Hi,
When I use "LATERAL VIEW explode" on the registered temp table in the spark shell,
it works. But when I use the same in spark-submit (as a jar file) it is not
working; it gives the error - "failure: ``union'' expected but identifier VIEW
found"
sql statement i am using is
SELECT id,mapKey FROM loc
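A hedged sketch of the usual suggestion for this parser error in Spark 1.x (an assumption about the cause, not a confirmed diagnosis): the plain SQLContext parser did not understand LATERAL VIEW, while the spark-shell's default context (a HiveContext when built with Hive) did, so the submitted jar should build a HiveContext. Table and column names are illustrative.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("lateral-view-sketch"))
val hiveContext = new HiveContext(sc)

// The HiveQL parser understands LATERAL VIEW explode(...).
val df = hiveContext.sql(
  "SELECT id, mapKey FROM loc LATERAL VIEW explode(map_col) m AS mapKey, mapValue"
)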
Hi,
In Hive, I am using unix_timestamp() as 'update_on' to insert the current date into
the 'update_on' column of the table. Now I am converting it into Spark SQL. Please
suggest example code to insert the current date and time into a column of the table
using Spark SQL.
Cheers, Kiran.
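A hedged sketch using current Spark APIs (table and column names are hypothetical); both unix_timestamp() and current_timestamp() are available as Spark SQL functions:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_timestamp

val spark = SparkSession.builder().appName("update-on-sketch").enableHiveSupport().getOrCreate()

// SQL side: populate update_on with the current epoch seconds, as in the Hive query.
spark.sql(
  "INSERT INTO TABLE target_table SELECT id, name, unix_timestamp() AS update_on FROM source_table"
)

// DataFrame side: add the current date and time as a timestamp column.
val withUpdateOn = spark.table("source_table").withColumn("update_on", current_timestamp())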
Hi There,
I am using a Spark SQL left outer join query.
The sql query is
scala> val test = sqlContext.sql("SELECT e.departmentID FROM employee e LEFT
OUTER JOIN department d ON d.departmentId = e.departmentId").toDF()
In Spark 1.3.1 it's working fine, but the latest pull gives the below error
1
Hi Siddharth,
With v 4.3 of Phoenix, you can use the PhoenixInputFormat and
OutputFormat classes to pull/push to Phoenix from Spark.
HTH
Thanks
Ravi
On Wed, Feb 11, 2015 at 6:59 AM, Ted Yu wrote:
> Connectivity to HBase is also available. You can take a look at:
>
> examples//src/main/p
I am also seeing similar problem when trying to continue job using saved
checkpoint. Can somebody help in solving this problem?