...this helps,
Enrico
On 25.10.22 at 21:54, Tanin Na Nakorn wrote:
Hi All,
Our data job is very complex (e.g. 100+ joins), and we have switched from
RDD to Dataset recently.
We've found that the unit test takes much longer. We profiled it and have
found that it's the planning phase that is slow, not execution.
I wonder if anyone has encountered this issue before.
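Not from this thread, but a quick way to confirm where the time goes is to force plan materialization separately from execution through the public query-execution API. A minimal sketch (the helper name is invented):

import org.apache.spark.sql.DataFrame

// Hypothetical helper: times analysis/optimization/physical planning separately from execution.
// Assumes `df` is the final Dataset produced by the job under test.
def profilePlanningVsExecution(df: DataFrame): Unit = {
  val t0 = System.nanoTime()
  df.queryExecution.executedPlan   // forces analysis, optimization and physical planning
  val t1 = System.nanoTime()
  df.count()                       // forces actual execution
  val t2 = System.nanoTime()
  println(s"planning:  ${(t1 - t0) / 1e6} ms")
  println(s"execution: ${(t2 - t1) / 1e6} ms")
}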
Are you using IvyVPN, which may cause this problem? If the VPN software silently changes the network URL, you should avoid using it.
Regards.
On Wed, Dec 22, 2021 at 1:48 AM Pralabh Kumar wrote:
> Hi Spark Team
>
> I am building Spark inside a VPN, but the unit test case below is failing.
You would have to make it available? This doesn't seem like a Spark issue.
On Tue, Dec 21, 2021, 10:48 AM Pralabh Kumar wrote:
> Hi Spark Team
>
> I am building Spark inside a VPN, but the unit test case below is failing.
> It points to an Ivy location which cannot be reached
Hi Spark Team,
I am building Spark inside a VPN, but the unit test case below is failing.
It points to an Ivy location which cannot be reached from within the VPN. Any help would be appreciated.
test("SPARK-33084: Add jar support Ivy URI -- default transitive = true") {
  sc = new SparkContext(...
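The test snippet above is cut off. As a sketch of the "make it available" suggestion (not from this thread; the settings-file path and the artifact coordinates are placeholders), Ivy resolution for ivy:// URIs can be routed through an internal mirror reachable inside the VPN via spark.jars.ivySettings:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: point ivy:// resolution at an internal artifact proxy that is reachable
// from inside the VPN. The ivysettings.xml path and its contents are assumptions.
val conf = new SparkConf()
  .setAppName("ivy-behind-vpn")
  .setMaster("local[2]")
  .set("spark.jars.ivySettings", "/etc/spark/ivysettings.xml")
val sc = new SparkContext(conf)

// Example coordinates only; any ivy:// URI would now resolve through the mirror.
sc.addJar("ivy://org.apache.commons:commons-lang3:3.12.0")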
df = spark_session.createDataFrame([['one', 'two']]).toDF(*['first', 'second'])
df.show()
df2 = spark_session.createDataFrame([['one', 'two']]).toDF(*['first', 'second'])
assert df.subtract(df2).count() == 0
On Thu, Nov 19, 2020 at 6:38 AM Sachit Murarka wrote:
Hi Users,
I have to write Unit Test cases for PySpark.
I think pytest-spark and "spark-testing-base" are good test libraries.
Can anyone please provide a full reference for writing test cases in Python using these?
Kind Regards,
Sachit Murarka
...whole-stage codegen can be applied to.

So, in your test case, whole-stage codegen has already been enabled!

FYI, I think that it is a good topic for d...@spark.apache.org.

Kazuaki Ishizaki

From: Koert Kuipers
To: "user@spark.apache.org"
Date: 2017/04/05 05:12
Subject: how do i force unit test to do whole stage codegen
I wrote my own expression with eval and doGenCode, but doGenCode never gets called in tests.
Also, as a test, I ran this in a unit test:
spark.range(10).select('id as 'asId).where('id === 4).explain
according to
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-...
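Not from the thread itself, but one way to check from a test whether a plan actually went through whole-stage codegen is to look for a WholeStageCodegenExec node in the executed plan, or to dump the generated Java source with the debug helpers. A sketch (query and names are illustrative):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.WholeStageCodegenExec
import org.apache.spark.sql.execution.debug._   // adds debugCodegen() on Datasets

val spark = SparkSession.builder().master("local[2]").appName("codegen-check").getOrCreate()
import spark.implicits._

val df = spark.range(10).where($"id" === 4).select($"id".as("asId"))

// Does the physical plan contain a whole-stage-codegen node?
val hasCodegen = df.queryExecution.executedPlan.collect {
  case w: WholeStageCodegenExec => w
}.nonEmpty
assert(hasCodegen, "expected whole-stage codegen to be applied")

// Print the generated Java source for inspection.
df.debugCodegen()

Whether a custom expression participates depends on the operators around it and on the Spark version, so treat this only as a way to inspect what the planner actually did.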
Agreed with the statement in quotes below; whether one wants to do unit tests or not, it is good practice to write code that way. But I think the more painful and tedious task is to mock/emulate all the nodes, such as Spark workers/master/HDFS/input source streams. I wish there were something...
>
> Basically you abstract your transformations to take in a dataframe and
> return one, then you assert on the returned df
>
+1 to this suggestion. This is why we wanted streaming and batch
dataframes to share the same API.
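A minimal sketch of the pattern quoted above (the transformation, column names and data are invented for illustration):

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.upper

// The transformation under test: takes a DataFrame, returns a DataFrame.
def normalizeNames(df: DataFrame): DataFrame =
  df.withColumn("name", upper(df("name")))

// In a test, build a small input frame, apply the function, assert on the result.
val spark = SparkSession.builder().master("local[2]").appName("df-unit-test").getOrCreate()
import spark.implicits._

val input    = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
val expected = Seq((1, "ALICE"), (2, "BOB")).toDF("id", "name")

assert(normalizeNames(input).collect().toSet == expected.collect().toSet)

Because the function only depends on a DataFrame going in and coming out, the same test works whether the data later arrives from a batch source or a stream.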
Basically you abstract your transformations to take in a dataframe and return one, then you assert on the returned df.
Regards,
Sam
On Tue, 7 Mar 2017 at 12:05, kant kodali wrote:
Hi All,
How do I unit test Spark Streaming or Spark in general? How do I test the results of my transformations? Also, more importantly, don't we need to spawn master and worker JVMs, either on one or multiple nodes?
Thanks!
kant
I created two test cases that use FlatSpec with DataFrameSuiteBase, but I got errors when running sbt test. I was able to run each of them separately. My test cases use sqlContext to read files. Here is the exception stack. Judging from the exception, I may need to unregister the RpcEndpoint after each...
Subject: Re: How this unit test passed on master trunk?
From: zzh...@hortonworks.com
To: java8...@hotmail.com; gatorsm...@gmail.com
CC: user@spark.apache.org
Date: Sun, 24 Apr 2016 04:37:11 +
There are multiple records for the DF:
scala> structDF.groupBy($"a").agg(min(st...
...uct(1, 2). Please check how the Ordering is implemented in InterpretedOrdering.
The output itself does not have any ordering. I am not sure why the unit test and the real environment behave differently.
Xiao,
I do see the difference between the unit test and the local cluster run. Do you know the reason...
...).first()
first: org.apache.spark.sql.Row = [1,[1,1]]
BTW
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/715/consoleFull
shows this test passing.
On Fri, Apr 22, 2016 at 11:23 AM, Yong Zhang wrote:
Hi,
I was trying to find out why this unit test can pass in the Spark code,
in https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
for this unit test:
test("Star Expansion - CreateStruct and CreateArray") {
  val structDf = testDa...
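For reference, a small sketch of the kind of struct-ordering comparison being discussed (data and column names are invented, this is not the actual DataFrameSuite test). Structs are compared field by field (see InterpretedOrdering), so min over a struct picks, per group, the row with the smallest first field, then the second, and so on:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{min, struct}

val spark = SparkSession.builder().master("local[2]").appName("struct-min").getOrCreate()
import spark.implicits._

val df = Seq((1, 3, "x"), (1, 1, "y"), (2, 2, "z")).toDF("a", "b", "c")

// min(struct(b, c)) orders the structs field by field within each group of a.
df.groupBy($"a").agg(min(struct($"b", $"c"))).show()
// expected under field-by-field ordering: a=1 -> [1, y], a=2 -> [2, z]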
...spark-testing-base
DataFrame examples are here:
https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala
Thanks,
Silvio

From: Steve Annessa
Date: Thursday, February 4, 2016 at 8:36 PM
To: "user@spark.apache.org"
Subject: Unit test with sqlContext
I'm trying to unit test a function that reads in a JSON file, manipulates the DF and then returns a Scala Map.
The function has the signature:
def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)
I've created a bootstrap spec for Spark jobs that instantiates the SparkContext...
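A minimal bootstrap for this kind of test might look like the sketch below. It is only an illustration: the suite name, the resource path and the placement of the ingest call are assumptions, and ingest itself is the application function described above, not defined here.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class IngestSpec extends FunSuite with BeforeAndAfterAll {
  private var sc: SparkContext = _
  private var sqlContext: SQLContext = _

  override def beforeAll(): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("ingest-spec")
    sc = new SparkContext(conf)
    sqlContext = new SQLContext(sc)
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()   // release the context so other suites can create their own
  }

  test("ingest builds a map from the JSON file") {
    // Call the application's ingest here, for example:
    //   val result = ingest("src/test/resources/sample.json", sc, sqlContext)
    //   assert(result.nonEmpty)
    assert(sqlContext != null)
  }
}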
Try:
mvn test -pl sql -DwildcardSuites=org.apache.spark.sql -Dtest=none
On 12 Nov 2015, at 03:13, weoccc wrote:
Have you tried the following ?
build/sbt "sql/test-only *"
Cheers
On Wed, Nov 11, 2015 at 7:13 PM, weoccc wrote:
Hi,
I am wondering how to run unit tests for a specific Spark component only.
mvn test -DwildcardSuites="org.apache.spark.sql.*" -Dtest=none
The above command doesn't seem to work. I'm using Spark 1.5.
Thanks,
Weide
I'd suggest setting sbt to fork when running tests.
On Wed, Aug 26, 2015 at 10:51 AM, Mike Trienis wrote:
Thanks for your response Yana,
I can increase the MaxPermSize parameter and it will allow me to run the unit test a few more times before I run out of memory.
However, the primary issue is that running the same unit test in the same JVM (multiple times) results in increased memory with each run of the test...
On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis wrote:
Hello,
I am using sbt and created a unit test where I create a `HiveContext`, execute some query and then return. Each time I run the unit test, the JVM increases its memory usage until I get the error:
Internal error when running tests: java.lang.OutOfMemoryError: PermGen space
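A build.sbt sketch of the forking suggestion above. The heap and PermGen values are illustrative, and MaxPermSize only applies to Java 7 and earlier:

// Run tests in a forked JVM so memory held by repeated HiveContext creation is
// reclaimed when the forked JVM exits.
fork in Test := true   // Test / fork := true in newer sbt

// Give the forked test JVM more room (values are illustrative).
javaOptions in Test ++= Seq("-Xmx2g", "-XX:MaxPermSize=512m")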
Yes, it happens repeatedly on my local Jenkins.
Sent from my iPhone
On May 14, 2015, at 18:30, Tathagata Das wrote:
> Do you get this failure repeatedly?
Do you get this failure repeatedly?
On Thu, May 14, 2015 at 12:55 AM, kf wrote:
Hi all, I got the following error when I ran the Spark unit tests via dev/run-tests on the latest "branch-1.4" branch.
The latest commit id:
commit d518c0369fa412567855980c3f0f426cde5c190d
Author: zsxwing
Date: Wed May 13 17:58:29 2015 -0700
error
I'm also getting the same error.
Any ideas?
It's because your tests are running in parallel and you can only have one
context running at a time.
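If the suites live in one sbt project, a one-line sketch of the usual workaround is to serialize test execution so only one SparkContext exists at a time:

// build.sbt: run test suites sequentially so only one SparkContext is active at a time.
parallelExecution in Test := false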
> [info] at ...(Unknown Source)
> [info] at java.lang.ClassLoader.defineClass(Unknown Source)
> [info] at java.security.SecureClassLoader.defineClass(Unknown Source)
> [info] at java.net.URLClassLoader.defineClass(Unknown Source)
> [info] at java.net.URLClassLoader.access$100(Unknown Source)
Hi experts,
I am trying to write unit tests for my Spark application, which fail with a javax.servlet.FilterRegistration error.
I am using CDH 5.3.2 Spark and below is my dependencies list.
val spark = "1.2.0-cdh5.3.2"
val esriGeometryAPI = "1.2"
val csvWriter = "1.0.0"
Hi,
I extended org.apache.spark.streaming.TestSuiteBase for some testing, and I
was able to run this test fine:
test("Sliding window join with 3 second window duration") {
val input1 =
Seq(
Seq("req1"),
Seq("req2", "req3"),
Seq(),
Seq("req4", "req5", "req6"),
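The snippet above is cut off. For reference, a rough sketch of how a TestSuiteBase-style test is usually structured, assuming the spark-streaming test-jar is on the test classpath as the poster describes; the suite name and operation are invented, and the exact testOperation signature can differ between Spark versions:

import org.apache.spark.streaming.TestSuiteBase
import org.apache.spark.streaming.dstream.DStream

class UppercaseSuite extends TestSuiteBase {
  test("uppercases each record in each batch") {
    // Each inner Seq is one batch of scripted input.
    val input = Seq(Seq("req1"), Seq("req2", "req3"), Seq.empty[String])
    val expected = input.map(_.map(_.toUpperCase))
    val operation = (stream: DStream[String]) => stream.map(_.toUpperCase)
    // Runs the operation over the scripted batches and compares batch-by-batch output.
    testOperation(input, operation, expected, useSet = true)
  }
}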
logger.warn("!!! DEBUG !!! target: {}", r.getURI());

String response = r.accept(MediaType.APPLICATION_JSON_TYPE)
    //.header("")
    .get(String.class);
logger.warn("!!! DEBUG !!! Spotlight response: {}", response);
It seems to work when I use spark-submit to submit the application that includes this code.
Funny thing is, now my relevant unit test does not run, complaining about not having enough memory:
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(...) failed...
On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen wrote:
> I'd take a look with 'mvn dependency:tree' on your own code first.
> Maybe you are including JavaEE 6 for example?
>
For reference, my complete pom.xml looks like:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ...>
logger.warn("!!! DEBUG !!! target: {}", target.getUri().toString());

String response =
    target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);

logger.warn("!!! DEBUG !!! Spotlight response: {}", response);

When run inside a unit test as follows:
mvn clean test -Dtest=SpotlightTest#testC...
Best,
Burak
----- Original Message -----
From: "Emre Sevinc"
To: user@spark.apache.org
Sent: Monday, December 8, 2014 2:36:41 AM
Subject: How can I make Spark Streaming count the words in a file in a unit test?
Hello,
I've successfully built a very simple Spark Streaming application...
...on my local Spark, it waits for a file to be written to a given directory, and when I create that file it successfully prints the number of words. I terminate the application by pressing Ctrl+C.
Now I've tried to create a very basic unit test for this functionality, but in the test I was not...
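Not the original poster's code, but one common way to avoid waiting on the file system in such a test is to feed batches through a queue stream and collect the counts into driver memory. A sketch under those assumptions:

import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Drive the word-count logic from an in-memory queue instead of a directory,
// so the test does not have to create files or run until Ctrl+C.
val conf = new SparkConf().setMaster("local[2]").setAppName("wordcount-test")
val ssc = new StreamingContext(conf, Seconds(1))

val queue = mutable.Queue[RDD[String]]()
val lines = ssc.queueStream(queue)

val counts = mutable.ArrayBuffer[Long]()
lines.flatMap(_.split("\\s+")).count().foreachRDD { rdd =>
  counts ++= rdd.collect()   // runs on the driver in local mode
}

ssc.start()
queue += ssc.sparkContext.makeRDD(Seq("the quick brown fox", "jumps over"))
Thread.sleep(3000)           // crude; a real test would poll or use a manual clock
ssc.stop(stopSparkContext = true, stopGracefully = true)

assert(counts.contains(6L))  // 6 words were fed in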
...
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.8.1</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest_2.10</artifactId>
  <version>2.2.1</version>
  <scope>test</scope>
</dependency>
Thank you very much!
Best Regards,
Jiajia
Does it not show the name of the test suite on stdout, showing that it has passed? Can you try writing a small "test" unit test, in the same way as your Kafka unit test, and with print statements on stdout, to see whether it works? I believe it is some configuration issue in Maven, which...
...the test? Are there any other methods that can be used to run this test?
Hi TD,
I encountered a problem when trying to run the KafkaStreamSuite.scala unit test.
I added "scalatest-maven-plugin" to my pom.xml, then ran "mvn test", and got the following error message:
error: object Utils in package util cannot be accessed in package org.apache.spark.util
This helps a lot!!
Thank you very much!
Jiajia
Appropriately timed question! Here is the PR that adds a real unit
test for Kafka stream in Spark Streaming. Maybe this will help!
https://github.com/apache/spark/pull/1751/files
On Mon, Aug 4, 2014 at 6:30 PM, JiajiaJing wrote:
Hello Spark Users,
I have a Spark Streaming program that streams data from Kafka topics and outputs it as Parquet files on HDFS.
Now I want to write a unit test for this program to make sure the output data is correct (i.e. not missing any data from Kafka).
However, I have no idea how to do this.
at junit.framework.TestSuite.runTest(TestSuite.java:232)
at junit.framework.TestSuite.run(TestSuite.java:227)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
...
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at ...
Hi Konstantin,
We use hadoop as a library in a few places in Spark. I wonder why the path includes "null" though.
Could you provide the full stack trace?
Andrew
2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev <kudryavtsev.konstan...@gmail.com>:
> Hi all,
> I'm trying to run some transformation on Spark, it works fine on cluster (YARN, linux machines). However, when I'm trying to run...

... at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Thank you,
Konstantin Kudryavtsev
On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or wrote:
Hi all,
I'm trying to run some transformation on *Spark*; it works fine on a cluster (YARN, Linux machines). However, when I'm trying to run it on a local machine (*Windows 7*) under a unit test, I get errors:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
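The "null" in that path usually comes from an unset hadoop.home.dir. A sketch of the common workaround on Windows (the winutils.exe location is a placeholder, not something given in this thread):

// Set this before the SparkContext or Hadoop filesystem is first touched in the test;
// the directory must contain bin\winutils.exe (path below is a placeholder).
System.setProperty("hadoop.home.dir", "C:\\hadoop")
// Alternatively, set the HADOOP_HOME environment variable for the test JVM.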
...,
Todd
From: Anselme Vignon [mailto:anselme.vig...@flaminem.com]
Sent: Wednesday, June 18, 2014 12:33 AM
To: user@spark.apache.org
Subject: Re: Unit test failure: Address already in use
Hi,
Could your problem come from the fact that you run your tests in parallel?
If you are running Spark in local mode, you cannot have concurrent Spark instances running. This means that...
...(ServerSocketChannelImpl.java:139)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
thanks
Hi,
My unit test is failing (the output does not match the expected output). I would like to print out the value of the output, but rdd.foreach(r => println(r)) does not work from the unit test. How can I print the output or write it to a file or the screen?
thanks.
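The usual reason is that foreach runs inside the executors, so its println output does not land in the test's stdout. A self-contained sketch (data and names are illustrative) of bringing the data back to the driver first:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("print-rdd"))
val rdd = sc.parallelize(Seq(1, 2, 3))

// rdd.foreach(println) may print inside executor threads/JVMs or get interleaved into logs;
// collect() brings the data back to the driver, where println and assertions work.
val values = rdd.collect()
values.foreach(println)
assert(values.toSet == Set(1, 2, 3))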
...) -> Elasticsearch => Spark (map/reduce) -> HBase
2.
Can Spark read data from Elasticsearch? What is the preferred way to do this?
b0c1