// assumes fs is a Hadoop FileSystem, e.g. FileSystem.get(sc.hadoopConfiguration)
val fileExists = fs.exists(new Path(""))
if (fileExists) println("File exists!")
else println("File doesn't exist!")
Not sure whether that will help you or not; just a thought.
-Todd
On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh
wrote:
> Thanks Brandon!
>
> i should have reme
You may want to make sure you include the jar of P4J and your plugins as
part of the following so that both the driver and executors have access.
If HDFS is out then you could
make a common mount point on each of the executor nodes so they have access
to the classes.
- spark-submit --jars /com
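For illustration, the full command might look roughly like this (the class, jar
names, and paths are hypothetical; the point is that --jars ships the jars to
both the driver and the executors):

spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --jars /common/mount/log4j-plugins.jar,/common/mount/my-plugins.jar \
  /common/mount/my-app.jar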
A little late, but have you looked at https://livy.incubator.apache.org/? It
works well for us.
-Todd
On Thu, Mar 28, 2019 at 9:33 PM Jason Nerothin
wrote:
> Meant this one: https://docs.databricks.com/api/latest/jobs.html
>
> On Thu, Mar 28, 2019 at 5:06 PM Pat Ferrel wrote:
>
Hi Tomas,
Have you considered using something like https://www.alluxio.org/ for your
cache? Seems like a possible solution for what you're trying to do.
-Todd
On Tue, Jan 15, 2019 at 11:24 PM 大啊 wrote:
> Hi ,Tomas.
> Thanks for your question give me some prompt.But the best way use
maxRatePerPartition and backpressure.enabled. I thought that maxRate was
not applicable when using back pressure, but may be mistaken.
-Todd
On Thu, Jul 26, 2018 at 8:46 AM Biplob Biswas
wrote:
> Hi Todd,
>
> Thanks for the reply. I have the maxRatePerPartition set as well. Below
>
tch* when the backpressure mechanism is
> enabled.
If you set the maxRatePerPartition and apply the above formula, I believe
you will be able to achieve the results you are looking for.
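For illustration, a minimal sketch of the relevant settings (values are
hypothetical); the approximate cap is maxRatePerPartition * number of Kafka
partitions * batch interval in seconds:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("backpressure-example")
  .setMaster("local[2]")  // or your cluster master
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.kafka.maxRatePerPartition", "1000") // records/sec/partition
// e.g. 1000 records/sec/partition * 10 partitions * 5s batch ≈ 50,000 records per batch
val ssc = new StreamingContext(conf, Seconds(5))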
HTH.
-Todd
On Thu, Jul 26, 2018 at 7:21 AM Biplob Biswas
wrote:
> Did anyone face similar issue
well for us. We did
the extract route originally, but with the native Exasol connector it is
just as performant as the extract.
HTH.
-Todd
On Mon, Jan 30, 2017 at 10:15 PM, Jörn Franke wrote:
> With a lot of data (TB) it is not that good, hence the extraction.
> Otherwise you have to wait
-compatibility
The JIRA, https://datastax-oss.atlassian.net/browse/SPARKC/, does not seem
to show any outstanding issues with regards to 3.0.8 and 2.0 of Spark or
Spark Cassandra Connector.
HTH.
-Todd
On Tue, Sep 20, 2016 at 1:47 AM, muhammet pakyürek
wrote:
>
>
> please tell me the conf
Hi Mich,
Have you looked at Apache Ignite? https://apacheignite-fs.readme.io/docs.
This looks like something that may be what you're looking for:
http://apacheignite.gridgain.org/docs/data-analysis-with-apache-zeppelin
HTH.
-Todd
On Sat, Sep 17, 2016 at 12:53 PM, Mich Talebzadeh wrote
val sparkContext = new SparkContext(sparkConf)
val hiveContext = new HiveContext(streamingContext.sparkContext)
HTH.
-Todd
On Thu, Sep 8, 2016 at 9:11 AM, Mich Talebzadeh
wrote:
> Ok I managed to sort that one out.
>
> This is what I am facing
>
> val spark
Have not tried this, but looks quite useful if one is using Druid:
https://github.com/implydata/pivot - An interactive data exploration UI
for Druid
On Tue, Aug 30, 2016 at 4:10 AM, Alonso Isidoro Roman
wrote:
> Thanks Mitch, i will check it.
>
> Cheers
>
>
> Alonso Isidoro Roman
> [image: htt
Have you looked at spark-packages.org? There are several different HBase
connectors there; not sure if any meet your need or not.
https://spark-packages.org/?q=hbase
HTH,
-Todd
On Tue, Aug 30, 2016 at 5:23 AM, ayan guha wrote:
> You can use rdd level new hadoop format api and pass
-5050-4372-0034'
However, the process doesn’t quit after all. This is critical, because I’d
like to use SparkLauncher to submit such jobs. If my job doesn’t end, jobs
will pile up and fill up the memory. Pls help. :-|
—
BR,
Todd Leo
-cant-find-my-tables-in-spark-sql-using-beeline.html
HTH.
-Todd
On Thu, Jul 21, 2016 at 10:30 AM, Marco Colombo wrote:
> Thanks.
>
> That is just a typo. I'm using on 'spark://10.0.2.15:7077' (standalone).
> Same url used in --master in spark-submit
>
>
&g
You can set the dbtable to this:
.option("dbtable", "(select * from master_schema where TID = '100_0') as t")
HTH,
Todd
On Thu, Jul 21, 2016 at 10:59 AM, sujeet jog wrote:
> I have a table of size 5GB, and want to load selective rows into datafra
Streaming within its checkpoints by default. You can also manage them
yourself if desired. How are you dealing with offsets ?
Can you verify the offsets on the broker:
kafka-run-class.sh kafka.tools.GetOffsetShell --topic <your-topic> \
  --broker-list <broker-host:port> --time -1
-Todd
On Tue, Jun 7, 2016 at 8:17 AM, Dominik
What version of Spark are you using? I do not believe that 1.6.x is
compatible with 0.9.0.1 due to changes in the kafka clients between 0.8.2.2
and 0.9.0.x. See this for more information:
https://issues.apache.org/jira/browse/SPARK-12177
-Todd
On Tue, Jun 7, 2016 at 7:35 AM, Dominik Safaric
There is a JIRA that adds Spark Thrift Server HA; the patch works, but it still
hasn't been merged into the master branch.
At 2016-05-23 20:10:26, "qmzhang" <578967...@qq.com> wrote:
>Dear guys, please help...
>
>In hive,we can enable hiveserver2 high available by using dynamic service
>discove
As far as I know, there would be an Akka version conflict issue when using
Akka as a Spark Streaming source.
At 2016-05-23 21:19:08, "Chaoqiang" wrote:
>I want to know why spark 1.6 use Netty instead of Akka? Is there some
>difficult problems which Akka can not solve, but using Netty can s
I got curious so I tried sbt dependencyTree. Looks like Guava comes into Spark
core from a couple of places.
-Mat
matschaffer.com
On Mon, May 23, 2016 at 2:32 PM, Todd wrote:
Can someone please take a look at my question? I am using spark-shell in local
mode and in yarn-client mode. Spark
Can someone please take a look at my question? I am using spark-shell in local
mode and in yarn-client mode. Spark code uses the Guava library, so Spark should
have Guava in place at runtime.
Thanks.
At 2016-05-23 11:48:58, "Todd" wrote:
Hi,
In the spark code, guava maven dependency scope i
Hi,
In the Spark code, the Guava Maven dependency scope is 'provided'. My question is:
how does Spark depend on Guava at runtime? I looked into
spark-assembly-1.6.1-hadoop2.6.1.jar and didn't find class entries like
com.google.common.base.Preconditions etc...
From the official site http://arrow.apache.org/, Apache Arrow is used for
columnar in-memory storage. I have two quick questions:
1. Does Spark support Apache Arrow?
2. When a DataFrame is cached in memory, the data is saved in a columnar in-memory
format. What is the relationship between this featur
Hi,
I briefly reviewed the Spark code, and it looks like Structured Streaming doesn't
support Kafka as a data source yet?
Perhaps these may be of some use:
https://github.com/mkuthan/example-spark
http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/
https://github.com/holdenk/spark-testing-base
On Wed, May 18, 2016 at 2:14 PM, swetha kasireddy wrote:
> Hi Lars,
>
> Do you have any examples for the methods
eaming("mode() can only be called on non-continuous queries")
this.mode = saveMode
this
}
On Wed, May 18, 2016 at 12:25 PM, Todd wrote:
Thanks Ted.
I didn't try, but I think SaveMode and OutputMode are different things.
Currently, the Spark code contains two output modes, Append an
At 2016-05-18 12:10:11, "Ted Yu" wrote:
Have you tried adding:
.mode(SaveMode.Overwrite)
On Tue, May 17, 2016 at 8:55 PM, Todd wrote:
scala> records.groupBy("name").count().write.trigger(ProcessingTime("30
seconds")).option("checkpointLoca
scala> records.groupBy("name").count().write.trigger(ProcessingTime("30
seconds")).option("checkpointLocation",
"file:///home/hadoop/jsoncheckpoint").startStream("file:///home/hadoop/jsonresult")
org.apache.spark.sql.AnalysisException: Aggregations are not supported on
streaming DataFrames/Datas
Hi,
I am wondering whether Structured Streaming supports Kafka as a data source. I
briefly reviewed the source code (mainly the DataSourceRegister trait) and didn't
find anything for a Kafka data source
If
Thanks.
TH
Dr Mich Talebzadeh
LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 17 May 2016 at 20:02, Michael Armbrust wrote:
In 2.0 you won't be able to do this. The long term vision would be to make
Hi,
We have a requirement to do count(distinct) in a processing batch against all
of the streaming data (e.g., the last 24 hours' data); that is, when we do
count(distinct), we actually want to compute the distinct count against the last
24 hours' data.
Does Structured Streaming support this scenario? Thanks!
Thanks Ted!
At 2016-05-17 16:16:09, "Ted Yu" wrote:
Please take a look at:
[SPARK-13146][SQL] Management API for continuous queries
[SPARK-14555] Second cut of Python API for Structured Streaming
On Mon, May 16, 2016 at 11:46 PM, Todd wrote:
Hi,
Are there code examples
Hi,
Are there code examples about how to use the structured streaming feature?
Thanks.
issue the commit:
if (supportsTransactions) {
  conn.commit()
}
HTH.
-Todd
On Sat, Apr 23, 2016 at 8:57 AM, Andrés Ivaldi wrote:
> Hello, so I executed Profiler and found that implicit isolation was turn
> on by JDBC driver, this is the default behavior of MSSQL JDBC driver, but
> it's p
Have you looked at these:
http://allegro.tech/2015/08/spark-kafka-integration.html
http://mkuthan.github.io/blog/2016/01/29/spark-kafka-integration2/
Full example here:
https://github.com/mkuthan/example-spark-kafka
HTH.
-Todd
On Thu, Apr 21, 2016 at 2:08 PM, Alexander Gallego
wrote
I believe you can adjust it by setting the following:
spark.akka.timeout 100s Communication timeout between Spark nodes.
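A hedged sketch of one way to set it programmatically (the app name and value
are illustrative; it can also go in spark-defaults.conf or be passed via --conf
to spark-submit):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("timeout-example")
  .set("spark.akka.timeout", "100")  // seconds
val sc = new SparkContext(conf)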
HTH.
-Todd
On Thu, Apr 21, 2016 at 9:49 AM, yuemeng (A) wrote:
> When I run a spark application,sometimes I get follow ERROR:
>
> 16/04/21 09:26:45 ERROR Spa
e as complex event processing
> engine.
https://stratio.atlassian.net/wiki/display/DECISION0x9/Home
I have not used it, only read about it but it may be of some interest to
you.
-Todd
On Sun, Apr 17, 2016 at 5:49 PM, Peyman Mohajerian
wrote:
> Microbatching is certainly not a waste of ti
Hi,
I have a long computing chain, when I get the last RDD after a series of
transformation. I have two choices to do with this last RDD
1. Call checkpoint on RDD to materialize it to disk
2. Call RDD.saveXXX to save it to HDFS, and read it back for further processing
I would ask which choice i
/granturing/a09aed4a302a7367be92
HTH.
-Todd
On Sat, Mar 12, 2016 at 6:21 AM, Chris Miller
wrote:
> I'm pretty new to all of this stuff, so bear with me.
>
> Zeppelin isn't really intended for realtime dashboards as far as I know.
> Its reporting features (tables, grap
n / interval))
val counts = eventsStream.map(event => {
(event.timestamp - event.timestamp % interval, event)
}).updateStateByKey[Long](PrintEventCountsByInterval.counter _, new
HashPartitioner(3), initialRDD = initialRDD)
counts.print()
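For reference, PrintEventCountsByInterval.counter above is the poster's own
function; a hypothetical counter with the (Seq[V], Option[S]) => Option[S]
shape this updateStateByKey overload expects could look like:

// Event is assumed to be the poster's own type
def counter(events: Seq[Event], state: Option[Long]): Option[Long] =
  Some(state.getOrElse(0L) + events.size)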
HTH.
-Todd
On Thu, Mar 10, 2016 at 1:35 AM, Zalzber
(KafkaUtils.createDirectStream) or Receiver (KafkaUtils.createStream)?
You may find this discussion of value on SO:
http://stackoverflow.com/questions/28901123/org-apache-spark-shuffle-metadatafetchfailedexception-missing-an-output-locatio
-Todd
On Mon, Mar 7, 2016 at 5:52 PM, Vinti Maheshwari
wrote
-apache-spark/
Not sure if that is of value to you or not.
HTH.
-Todd
On Tue, Mar 1, 2016 at 7:30 PM, Don Drake wrote:
> I'm interested in building a REST service that utilizes a Spark SQL
> Context to return records from a DataFrame (or IndexedRDD?) and even
> add/update records.
D1J7MzYcFo&feature=youtu.be&t=33m19s> at
ZendCon 2014
*->* IPython notebook
<https://www.youtube.com/watch?v=2AX6g0tK-us&feature=youtu.be&t=37m42s> running
the Spark Kernel underneath
HTH.
Todd
On Tue, Mar 1, 2016 at 4:10 AM, Mich Talebzadeh
wrote:
> Thanks Mohanna
hing obvious ?
>
>
> Le dim. 28 févr. 2016 à 19:01, Todd Nist a écrit :
>
>> Define your SparkConfig to set the master:
>>
>> val conf = new SparkConf().setAppName(AppName)
>> .setMaster(SparkMaster)
>> .set()
>>
>> Where Spark
7".
Then when you create the SparkContext, pass the SparkConf to it:
val sparkContext = new SparkContext(conf)
Then use the sparkContext to interact with the Spark master / cluster.
Your program basically becomes the driver.
HTH.
-Todd
On Sun, Feb 28, 2016 at 9:25 AM, mms wrote:
You could use the "withSessionDo" method of the Spark Cassandra Connector to perform
the simple insert:
CassandraConnector(conf).withSessionDo { session => session.execute(....) }
-Todd
On Tue, Feb 16, 2016 at 11:01 AM, Cody Koeninger wrote:
> You could use sc.parallelize... but the off
Did you run Hive on Spark with Spark 1.5 and Hive 1.1?
I think Hive on Spark doesn't support Spark 1.5; there are compatibility issues.
At 2016-01-28 01:51:43, "Ruslan Dautkhanov" wrote:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
There are quite a lot
Hi,
I am able to run a Maven install of the whole Spark project (from GitHub) in my IDEA.
But when I run the SparkPi example, IDEA compiles the code again and the following
exception is thrown.
Has anyone met this problem? Thanks a lot.
Error:scalac:
while compiling:
D:\opensourceprojects\sp
Hi,
I am kind of confused about how data locality is honored when Spark is
running on YARN (client or cluster mode). Can someone please elaborate on this?
Thanks!
5432/db_test?user=username&password=password>")
.option("user", username)
.option("password", pwd)
.option("driver", "org.postgresql.Driver")
.option("dbtable", "schema.table1")
.load().filter('dept_number === $deptN
xtends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[org.joda.time.DateTime], new JodaDateTimeSerializer)
    kryo.register(classOf[org.joda.time.Interval], new JodaIntervalSerializer)
  }
}
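For completeness, a hedged sketch of wiring a registrator like the one above
into the SparkConf (the registrator class name here is hypothetical):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "com.example.MyJodaKryoRegistrator")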
HTH.
-Todd
On Thu, Jan 14, 2016 at 9:28 AM, Spencer, Alex (Santander
)
.option("user", username)
.option("password", pwd)
.option("driver", "driverClassNameHere")
.option("dbtable", query)
.load()
Not sure if that's what you're looking for or not.
HTH.
-Todd
On Mon, Jan 11, 2016 at 3:47 AM, Gaini Raje
Sorry, did not see your update until now.
On Fri, Jan 8, 2016 at 3:52 PM, Todd Nist wrote:
> Hi Yasemin,
>
> What version of Spark are you using? Here is the reference, it is off of
> the DataFrame
> https://spark.apache.org/docs/latest/api/java/index.html#org.apache.spark.sql.D
park.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrame.html>
out
into external storage.
It is the very last method defined there in the api docs.
HTH.
-Todd
On Fri, Jan 8, 2016 at 2:27 PM, Yasemin Kaya wrote:
> Hi,
> There is no write function that Todd mentioned or i
(MYSQL_CONNECTION_URL_WRITE,
"track_on_alarm", connectionProps)
HTH.
-Todd
On Fri, Jan 8, 2016 at 10:53 AM, Ted Yu wrote:
> Which Spark release are you using ?
>
> For case #2, was there any error / clue in the logs ?
>
> Cheers
>
> On Fri, Jan 8, 2016 at 7:36 AM, Yasemin Kaya
: "10.10.5", arch: "x86_64", family: "mac"
On Wed, Jan 6, 2016 at 3:27 PM, Jade Liu wrote:
> Hi, Todd:
>
> Thanks for your suggestion. Yes I did run the
> ./dev/change-scala-version.sh 2.11 script when using scala version 2.11.
>
> I just tried this as
That should read "I think you're missing the --name option". Sorry about
that.
On Wed, Jan 6, 2016 at 3:03 PM, Todd Nist wrote:
> Hi Jade,
>
> I think you "--name" option. The makedistribution should look like this:
>
> ./make-distribution.sh --name h
Tests
HTH.
-Todd
On Wed, Jan 6, 2016 at 2:20 PM, Jade Liu wrote:
> I’ve changed the scala version to 2.10.
>
> With this command:
> build/mvn -X -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean
> package
> Build was successful.
>
> But make a runnable vers
collects(), just to obtain the count of records on the
DStream.
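For illustration, a minimal sketch of counting records per batch without an
Accumulator (assuming a DStream named dstream):

// prints the record count of each batch
dstream.count().print()

// or, if the value is needed on the driver inside foreachRDD:
dstream.foreachRDD { rdd =>
  println(s"records in this batch: ${rdd.count()}")
}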
HTH.
-Todd
On Wed, Dec 16, 2015 at 3:34 PM, Bryan Cutler wrote:
> To follow up with your other issue, if you are just trying to count
> elements in a DStream, you can do that without an Accumulator. foreachRDD
> is meant to
see https://issues.apache.org/jira/browse/SPARK-11043, it is resolved in
1.6.
On Tue, Dec 15, 2015 at 2:28 PM, Younes Naguib <
younes.nag...@tritondigital.com> wrote:
> The one coming with spark 1.5.2.
>
>
>
> y
>
>
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* December-15-15 1:59 PM
Windows DNS and
what it's pointing at. Can you do a
kinit <username>
on that host? It should tell you if it can find the KDC.
Let me know if that's helpful at all.
Todd
On Fri, Dec 11, 2015 at 1:50 PM, Mike Wright wrote:
> As part of our implementation, we are utilizing a ful
this:
val conf = new SparkConf().setAppName(s"YourApp").set("spark.ui.port", "4080")
val sc = new SparkContext(conf)
While there is a rest api to return you information on the application,
http://yourserver:8080/api/v1/applications, it does not return the port
used by t
mark the idle state as being timed out, and call the tracking
* function with State[S].isTimingOut() = true.
*/
def timeout(duration: Duration): this.type
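For illustration, a hedged sketch of how that timeout is typically applied with
mapWithState (Spark 1.6+; the tracking function name is hypothetical):

// trackStateFunc is the user's own state-tracking function
val spec = StateSpec.function(trackStateFunc _).timeout(Minutes(30))
val stateStream = pairDStream.mapWithState(spec)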
-Todd
On Wed, Nov 25, 2015 at 8:00 AM, diplomatic Guru
wrote:
> Hello,
>
> I know how I could clear the old state dependin
public void onBatchSubmitted(StreamingListenerBatchSubmitted batchSubmitted) {
  System.out.println("Start time: " + batchSubmitted.batchInfo().processingStartTime());
}
Sorry for the confusion.
-Todd
On Tue, Nov 24, 2015 at 7:51 PM, Todd Nist wrote:
> Hi Abhi,
>
> You should be able to register a
> org.apache.spark.streaming.sc
/SparkListener.html
.
HTH,
-Todd
On Tue, Nov 24, 2015 at 4:50 PM, Abhishek Anand
wrote:
> Hi ,
>
> I need to get the batch time of the active batches which appears on the UI
> of spark streaming tab,
>
> How can this be achieved in Java ?
>
> BR,
> Abhi
>
Hi,
When I cache the dataframe and run the query,
val df = sqlContext.sql("select name,age from TBL_STUDENT where age = 37")
df.cache()
df.show
println(df.queryExecution)
I got the following execution plan. From the optimized logical plan, I can see
the whole analyzed logical
Hi,
I am starting the Spark Thrift Server with the following script:
./start-thriftserver.sh --master yarn-client --driver-memory 1G
--executor-memory 2G --driver-cores 2 --executor-cores 2 --num-executors 4
--hiveconf hive.server2.thrift.port=10001 --hiveconf
hive.server2.thrift.bind.host=$(hostname
I am launching SparkR with the following script:
./sparkR --driver-memory 12G
and I try to load a local 3G csv file with the following code,
> a=read.transactions("/home/admin/datamining/data.csv",sep="\t",format="single",cols=c(1,2))
but I encounter an error: could not allocate memory (2048 Mb) in C
Hi,
I am trying to build Spark 1.5.1 in my environment, but I encounter the following
error complaining "Required file not found: sbt-interface.jar":
The error message is below and I am building with:
./make-distribution.sh --name spark-1.5.1-bin-2.6.0 --tgz --with-tachyon
-Phadoop-2.6 -Dhadoop.vers
.
FWIW, the environment was an MBP with OS X 10.10.5 and Java:
java version "1.8.0_51"
Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)
-Todd
On Tue, Oct 27, 2015 at 12:17 PM, Ted Yu wrote:
> I used the following co
state, is there a
provided 2.11 tgz available as well? I did not think there was; if there is,
then should the documentation on the download site be changed to reflect
this?
Sorry for the confusion.
-Todd
On Sun, Oct 25, 2015 at 4:07 PM, Sean Owen wrote:
> No, 2.11 artifacts are in fact
Sorry Sean, you are absolutely right, it supports 2.11. All I meant is that there
is no release available as a standard download and that one has to build
it. Thanks for the clarification.
-Todd
On Sunday, October 25, 2015, Sean Owen wrote:
> Hm, why do you say it doesn't support 2.11?
e are some limitations
see this,
http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211,
for what is not supported.
HTH,
-Todd
On Sun, Oct 25, 2015 at 10:56 AM, Bilinmek Istemiyor
wrote:
>
> I am just starting out with Apache Spark. I have zero knowledge about the spark
you attempt to serialize. Increase this if you get a
"buffer limit exceeded" exception inside Kryo.
-Todd
On Fri, Oct 23, 2015 at 6:51 AM, Yifan LI wrote:
> Thanks for your advice, Jem. :)
>
> I will increase the partitioning and see if it helps.
>
> Best,
> Yifan L
From Tableau, you should be able to use the Initial SQL option to support
this:
So in Tableau add the following to the “Initial SQL”
create function myfunc AS 'myclass'
using jar 'hdfs:///path/to/jar';
HTH,
Todd
On Mon, Oct 19, 2015 at 11:22 AM, Deenar Toraskar wrot
Hi Kali,
If you do not mind sending JSON, you could do something like this, using
json4s:
val rows = p.collect() map ( row => TestTable(row.getString(0),
row.getString(1)) )
val json = parse(write(rows))
producer.send(new KeyedMessage[String, String]("trade", writePretty(json)))
// or for eac
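For reference, the json4s calls above assume roughly these imports and an
implicit Formats in scope (a sketch; TestTable and producer are the poster's own):

import org.json4s._
import org.json4s.native.JsonMethods.parse
import org.json4s.native.Serialization.{write, writePretty}

implicit val formats: Formats = DefaultFormats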
Stratio offers a CEP implementation based on Spark Streaming and the Siddhi
CEP engine. I have not used the below, but they may be of some value to
you:
http://stratio.github.io/streaming-cep-engine/
https://github.com/Stratio/streaming-cep-engine
HTH.
-Todd
On Sun, Sep 13, 2015 at 7:49 PM
gt;>
>>
>> On 2015-09-11 15:58, Cheng, Hao wrote:
>>
>> Can you confirm if the query really run in the cluster mode? Not the local
>> mode. Can you print the call stack of the executor when the query is running?
>>
>>
>>
>> BTW: spark.shuffle.
here is no table to show queries and execution
plan information.
At 2015-09-11 14:39:06, "Todd" wrote:
Thanks Hao.
Yes, the performance is still low with SMJ. Let me try the option you suggested.
At 2015-09-11 14:34:46, "Cheng, Hao" wrote:
You mean the performance is stil
/spark-sql? It’s a new feature in Spark 1.5, and it’s true by
default, but we found it probably causes the performance to drop dramatically.
From: Todd [mailto:bit1...@163.com]
Sent: Friday, September 11, 2015 2:17 PM
To: Cheng, Hao
Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.o
t we found it probably causes the performance to drop dramatically.
From: Todd [mailto:bit1...@163.com]
Sent: Friday, September 11, 2015 2:17 PM
To: Cheng, Hao
Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.org
Subject: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared w
the query again?
In our previous testing, it’s about 20% slower for sort merge join. I am not
sure if there is anything else slowing down the performance.
Hao
From: Jesse F Chen [mailto:jfc...@us.ibm.com]
Sent: Friday, September 11, 2015 1:18 PM
To: Michael Armbrust
Cc: Todd; user@spark.
Code Generation: false
At 2015-09-11 02:02:45, "Michael Armbrust" wrote:
I've been running TPC-DS SF=1500 daily on Spark 1.4.1 and Spark 1.5 on S3, so
this is surprising. In my experiments Spark 1.5 is either the same or faster
than 1.4 with only
https://issues.apache.org/jira/browse/SPARK-8360?jql=project%20%3D%20SPARK%20AND%20text%20~%20Streaming
-Todd
On Thu, Sep 10, 2015 at 10:22 AM, Gurvinder Singh <
gurvinder.si...@uninett.no> wrote:
> On 09/10/2015 07:42 AM, Tathagata Das wrote:
> > Rewriting is necessary. Y
Hi,
I am using data generated with
spark-sql-perf (https://github.com/databricks/spark-sql-perf) to test Spark
SQL performance (Spark on YARN, with 10 nodes) with the following code (the
table store_sales is about 90 million records, 6G in size):
val outputDir="hdfs://tmp/spark_perf/scaleFact
Increase the number of executors, :-)
At 2015-08-26 16:57:48, "Ted Yu" wrote:
Mind sharing how you fixed the issue ?
Cheers
On Aug 26, 2015, at 1:50 AM, Todd wrote:
Sorry for the noise, It's my bad...I have worked it out now.
At 2015-08-26 13:20:57, "Todd" wrote:
Sorry for the noise, It's my bad...I have worked it out now.
At 2015-08-26 13:20:57, "Todd" wrote:
I think the answer is no. I only see such messages on the console, and #2 is the
thread stack trace.
What I am thinking is that Spark SQL Perf forks many dsdgen processes to gene
I am using Tachyon in the Spark program below, but I encounter a
BlockNotFoundException.
Does someone know what's wrong, and is there a guide on how to configure
Spark to work with Tachyon? Thanks!
conf.set("spark.externalBlockStore.url", "tachyon://10.18.19.33:19998")
conf.set("spark.ex
e you able to get more detailed error message ?
Thanks
On Aug 25, 2015, at 6:57 PM, Todd wrote:
Thanks Ted Yu.
Following are the error message:
1. The exception that is shown on the UI is :
Exception in thread "Thread-113" Exception in thread "Thread-126" Exception in
(Interpreted frame)
At 2015-08-25 19:32:56, "Ted Yu" wrote:
Looks like you were attaching images to your email which didn't go through.
Consider using third party site for images - or paste error in text.
Cheers
On Tue, Aug 25, 2015 at 4:22 AM, Todd wrote:
Hi,
The
Hi,
The spark-sql-perf suite itself contains benchmark data generation. I am using
spark-shell to run spark-sql-perf to generate the data, with 10G of memory for
both the driver and the executor.
When I increase the scale factor to 30 and run the job, I get the
following error:
When I jstack it to s
Project ['a]
LocalRelation [a#0]
scala> parsedQuery.analyze
res11: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan=Project [a#0]
LocalRelation [a#0]
The #0 after a is a unique identifier (within this JVM) that says where the
data is coming from, even as plans are rearranged due to optimiza
of modules:
https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html
On Tue, Aug 25, 2015 at 12:18 PM, Todd wrote:
I cloned the code from https://github.com/apache/spark to my machine. It can
compile successfully,
But when I run the sparkpi, it throws an
I cloned the code from https://github.com/apache/spark to my machine. It can
compile successfully,
But when I run the SparkPi example, it throws the exception below complaining that
scala.collection.Seq is not found.
I have installed Scala 2.10.4 on my machine, and I use the default profiles:
window,scala2.1
Thanks Chenghao!
At 2015-08-25 13:06:40, "Cheng, Hao" wrote:
Yes, check the source code
under:https://github.com/apache/spark/tree/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst
From: Todd [mailto:bit1...@163.com]
Sent: Tuesday, August 25, 2015 1:01
Hi, are there test cases for the Spark SQL Catalyst, such as tests for the rules
that transform unresolved query plans?
Thanks!
There are many case classes and concepts such as
Attribute/AttributeReference/Expression in Spark SQL.
I would like to ask what Attribute/AttributeReference/Expression mean. Given a SQL
query like select a,b from c, are a and b two Attributes? Is a + b an
Expression?
Looks like I misunderstand it b
Please try DataFrame.toJSON; it will give you an RDD of JSON strings.
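A minimal sketch of that suggestion, assuming the teenagers DataFrame from the
quoted question:

val jsonRdd = teenagers.toJSON          // RDD[String] in Spark 1.x
jsonRdd.take(10).foreach(println)       // each element is one JSON-encoded row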
At 2015-08-21 15:59:43, "smagadi" wrote:
>val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND
>age <= 19")
>
>I need teenagers to be a JSON object rather a simple row .How can we get
>that done ?
>
Hi,
I would like to ask if there are blogs/articles/videos on how to analyse Spark
performance at runtime, e.g. tools that can be used or something related.
o relaunch if
driver runs as a Hadoop Yarn Application?
On Wednesday, 19 August 2015 12:49 PM, Todd wrote:
There is an option for spark-submit (Spark standalone or Mesos with cluster
deploy mode only):
--supervise    If given, restarts the driver on failure.
At 201