Hi,
I have a long computing chain, and I get the last RDD after a series of
transformations. I have two choices for what to do with this last RDD:
1. Call checkpoint on RDD to materialize it to disk
2. Call RDD.saveXXX to save it to HDFS, and read it back for further processing
I would ask which choice is better.
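For illustration, a rough sketch of the two options; the paths and the stand-in for the long transformation chain are placeholders, not from the original thread:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("checkpoint-vs-save"))
sc.setCheckpointDir("hdfs:///tmp/checkpoints")

// stand-in for the long chain of transformations
val lastRdd = sc.textFile("hdfs:///data/input").map(_.toUpperCase)

// Option 1: checkpoint. Truncates the lineage; the RDD is materialized to the
// checkpoint dir the first time an action runs on it. Caching first avoids
// recomputing the whole chain when the checkpoint is written.
lastRdd.cache()
lastRdd.checkpoint()
lastRdd.count()

// Option 2: save to HDFS explicitly and read it back for further processing.
lastRdd.saveAsObjectFile("hdfs:///tmp/last-rdd")
val reloaded = sc.objectFile[String]("hdfs:///tmp/last-rdd")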
Hi,
Are there code examples about how to use the structured streaming feature?
Thanks.
Thanks Ted!
At 2016-05-17 16:16:09, "Ted Yu" wrote:
Please take a look at:
[SPARK-13146][SQL] Management API for continuous queries
[SPARK-14555] Second cut of Python API for Structured Streaming
On Mon, May 16, 2016 at 11:46 PM, Todd wrote:
Hi,
Are there code examples
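For reference, a minimal sketch assuming the structured streaming API as it shipped in Spark 2.0 (SparkSession with readStream/writeStream); the socket host/port are placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("structured-streaming-example").getOrCreate()
import spark.implicits._

// read a stream of lines from a socket source
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// split into words and count them
val words = lines.as[String].flatMap(_.split(" "))
val counts = words.groupBy("value").count()

// print the running counts to the console
val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .start()

query.awaitTermination()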
Hi,
We have a requirement to do count(distinct) in a processing batch against all
the streaming data (e.g., the last 24 hours' data). That is, when we do
count(distinct), we actually want to compute the distinct count against the
last 24 hours' data.
Does structured streaming support this scenario? Thanks!
TH
Dr Mich Talebzadeh
LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 17 May 2016 at 20:02, Michael Armbrust wrote:
In 2.0 you won't be able to do this. The long term vision would be to make
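Exact count(distinct) wasn't supported on streaming DataFrames at the time; for what it's worth, a rough sketch of the shape such a query could take once event-time windows and watermarking landed (Spark 2.1+), using an approximate distinct count. The events stream and its eventTime/userId columns are assumptions:

import org.apache.spark.sql.functions.{approx_count_distinct, col, window}

// sliding 24-hour window, advancing every hour, with a 24-hour watermark on event time
val distinctPerDay = events
  .withWatermark("eventTime", "24 hours")
  .groupBy(window(col("eventTime"), "24 hours", "1 hour"))
  .agg(approx_count_distinct("userId").as("distinct_users"))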
Hi,
I am wondering whether structured streaming supports Kafka as a data source. I
briefly looked through the source code (mainly the parts related to the
DataSourceRegister trait) and didn't find any Kafka data source.
If
Thanks.
scala> records.groupBy("name").count().write.trigger(ProcessingTime("30
seconds")).option("checkpointLocation",
"file:///home/hadoop/jsoncheckpoint").startStream("file:///home/hadoop/jsonresult")
org.apache.spark.sql.AnalysisException: Aggregations are not supported on
streaming DataFrames/Datas
At 2016-05-18 12:10:11, "Ted Yu" wrote:
Have you tried adding:
.mode(SaveMode.Overwrite)
On Tue, May 17, 2016 at 8:55 PM, Todd wrote:
scala> records.groupBy("name").count().write.trigger(ProcessingTime("30
seconds")).option("checkpointLoca
eaming("mode() can only be called on non-continuous queries")
this.mode = saveMode
this
}
On Wed, May 18, 2016 at 12:25 PM, Todd wrote:
Thanks Ted.
I didn't try, but I think SaveMode and OutputMode are different things.
Currently, the Spark code contains two output modes, Append an
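For reference, a rough sketch of the same streaming aggregation against the released 2.0 API: streaming queries go through writeStream rather than write, and an aggregation needs outputMode("complete"). SaveMode applies only to batch writes; OutputMode is the streaming counterpart. The console sink is used here because the file sink only supported append mode in 2.0, and the checkpoint path is the one quoted above:

import org.apache.spark.sql.streaming.ProcessingTime

val query = records.groupBy("name").count()
  .writeStream
  .outputMode("complete")
  .trigger(ProcessingTime("30 seconds"))
  .option("checkpointLocation", "file:///home/hadoop/jsoncheckpoint")
  .format("console")
  .start()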
Hi,
I briefly looked through the Spark code, and it looks like structured streaming
doesn't support Kafka as a data source yet?
From the official site http://arrow.apache.org/, Apache Arrow is used for
Columnar In-Memory storage. I have two quick questions:
1. Does spark support Apache Arrow?
2. When dataframe is cached in memory, the data are saved in columnar in-memory
style. What is the relationship between this featur
Hi,
In the Spark code, the Guava Maven dependency scope is provided; my question is,
how does Spark depend on Guava at runtime? I looked into
spark-assembly-1.6.1-hadoop2.6.1.jar and didn't find class entries like
com.google.common.base.Preconditions etc...
Can someone please take a look at my question? I am using spark-shell in local
mode and yarn-client mode. The Spark code uses the Guava library, so Spark
should have Guava in place at run time.
Thanks.
At 2016-05-23 11:48:58, "Todd" wrote:
Hi,
In the Spark code, the Guava Maven dependency scope i
wrote:
I got curious so I tried sbt dependencyTree. Looks like Guava comes into spark
core from a couple places.
-Mat
matschaffer.com
On Mon, May 23, 2016 at 2:32 PM, Todd wrote:
Can someone please take a look at my question? I am using spark-shell in local
mode and yarn-client mode. Spark
As far as I know, there would be Akka version conflict issues when using
Akka as a Spark Streaming source.
At 2016-05-23 21:19:08, "Chaoqiang" wrote:
>I want to know why spark 1.6 use Netty instead of Akka? Is there some
>difficult problems which Akka can not solve, but using Netty can s
There is a JIRA that adds Spark Thrift Server HA; the patch works, but it still
hasn't been merged into the master branch.
At 2016-05-23 20:10:26, "qmzhang" <578967...@qq.com> wrote:
>Dear guys, please help...
>
>In hive,we can enable hiveserver2 high available by using dynamic service
>discove
Hi,
I am kind of confused about how data locality is honored when Spark is
running on YARN (client or cluster mode). Can someone please elaborate on this?
Thanks!
Hi,
I am able to mvn install the whole Spark project (from GitHub) in my IDEA.
But when I run the SparkPi example, IDEA compiles the code again and the
following exception is thrown.
Has anyone met this problem? Thanks a lot.
Error:scalac:
while compiling:
D:\opensourceprojects\sp
Did you run Hive on Spark with Spark 1.5 and Hive 1.1?
I think Hive on Spark doesn't support Spark 1.5; there are compatibility issues.
At 2016-01-28 01:51:43, "Ruslan Dautkhanov" wrote:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
There are quite a lot
Hi,
I am trying to build Spark 1.5.1 in my environment, but I encounter the following
error complaining "Required file not found: sbt-interface.jar":
The error message is below and I am building with:
./make-distribution.sh --name spark-1.5.1-bin-2.6.0 --tgz --with-tachyon
-Phadoop-2.6 -Dhadoop.vers
I am launching SparkR with the following script:
./sparkR --driver-memory 12G
and I try to load a local 3G CSV file with the following code,
> a=read.transactions("/home/admin/datamining/data.csv",sep="\t",format="single",cols=c(1,2))
but I encounter an error: could not allocate memory (2048 Mb) in C
Hi,
I am starting the Spark Thrift Server with the following script,
./start-thriftserver.sh --master yarn-client --driver-memory 1G
--executor-memory 2G --driver-cores 2 --executor-cores 2 --num-executors 4
--hiveconf hive.server2.thrift.port=10001 --hiveconf
hive.server2.thrift.bind.host=$(hostname
Hi,
When I cache the dataframe and run the query,
val df = sqlContext.sql("select name,age from TBL_STUDENT where age = 37")
df.cache()
df.show
println(df.queryExecution)
I got the following execution plan. From the optimized logical plan, I can see
the whole analyzed logical
Hi,
I got a question about the spark-sql-perf project by Databricks at
https://github.com/databricks/spark-sql-perf/
The Tables.scala
(https://github.com/databricks/spark-sql-perf/blob/master/src/main/scala/com/databricks/spark/sql/perf/bigdata/Tables.scala)
and BigData
(https://github.com/dat
y requires
that you have already created the data/tables. I'll work on updating the README
as the QA period moves forward.
On Thu, Aug 13, 2015 at 6:49 AM, Todd wrote:
Hi,
I got a question about the spark-sql-perf project by Databricks at
https://github.com/databricks/spark-sql-perf/
Hi,
I would ask whether there are slides, blogs, or videos on how Spark SQL is
implemented: the process, or the whole picture of what happens when Spark SQL
executes the code. Thanks!
bricks.com/document/d/1Hc_Ehtr0G8SQUg69cmViZsMi55_Kf3tISD9GPGU5M1Y/edit
FYI
On Thu, Aug 13, 2015 at 8:54 PM, Todd wrote:
Hi,
I would ask whether there are slides, blogs, or videos on how Spark SQL is
implemented: the process, or the whole picture of what happens when Spark SQL
executes the code. Thanks!
Hi,
With the following code snippet, I cached the raw RDD (which is already in memory,
but just for illustration) and its DataFrame.
I thought that the DataFrame cache would take less space than the RDD cache, which
is wrong, because from the UI I see the RDD cache takes 168B while the DataFrame
cache takes 272
otprint of dataframe to be lower when it contains more
information ( RDD + Schema)
On Sat, Aug 15, 2015 at 6:35 PM, Todd wrote:
Hi,
With the following code snippet, I cached the raw RDD (which is already in memory,
but just for illustration) and its DataFrame.
I thought that the DataFrame cache would take
Hi, I have a basic Spark SQL join running in local mode. I checked the UI and
see that two jobs are run. Their DAG graphs are pasted at the end.
I have several questions here:
1. It looks like Job 0 and Job 1 have the same DAG stages, but stage 3 and
stage 4 are skipped. I would ask wha
Hi,
I can't access
http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf.
Could someone check whether it is available and reply with it? Thanks!
One Spark application can have many jobs, e.g., first call rdd.count and then
call rdd.collect.
At 2015-08-18 15:37:14, "Hemant Bhanawat" wrote:
It is still in memory for future rdd transformations and actions.
This is interesting. You mean Spark holds the data in memory between two job
execut
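A tiny sketch of the point above: each action launches its own job, and a cached RDD computed by the first job is reused by the second instead of being recomputed (the input path is a placeholder):

val rdd = sc.textFile("hdfs:///data/input").map(_.length).cache()
rdd.count()    // Job 0: computes the RDD and populates the cache
rdd.collect()  // Job 1: reads the cached partitions, no recomputation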
Take a look at the doc for the method:
/**
* Applies a schema to an RDD of Java Beans.
*
* WARNING: Since there is no guaranteed ordering for fields in a Java Bean,
* SELECT * queries will return the columns in an undefined order.
* @group dataframes
* @since 1.3
Hi,
Following is copied from the Spark event timeline UI. I don't understand why
there are overlaps between tasks?
I think they should run sequentially, one by one, in one executor (there is one
core per executor).
The blue part of each task is the scheduler delay time. Does it mean it is the
d
I think I found the answer.
On the UI, the recorded start time of each task is when it is put into the thread
pool. Then the UI makes sense.
At 2015-08-18 17:40:07, "Todd" wrote:
Hi,
Following is copied from the Spark event timeline UI. I don't understand why
there are overlappin
There is an option for the spark-submit (Spark standalone or Mesos with cluster
deploy mode only)
--supervise If given, restarts the driver on failure.
At 2015-08-19 14:55:39, "Spark Enthusiast" wrote:
Folks,
As I see, the Driver program is a single point of failure. N
I can't find any discussion on whether Spark SQL supports column indexing. If it
does, is there a guide on how to do it? Thanks.
o relaunch if
driver runs as a Hadoop Yarn Application?
On Wednesday, 19 August 2015 12:49 PM, Todd wrote:
There is an option for the spark-submit (Spark standalone or Mesos with cluster
deploy mode only)
--supervise If given, restarts the driver on failure.
At 201
Hi,
I would ask if there are some blogs/articles/videos on how to analyse Spark
performance at runtime, e.g., tools that can be used or something related.
Please try DataFrame.toJSON; it will give you an RDD of JSON strings.
At 2015-08-21 15:59:43, "smagadi" wrote:
>val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND
>age <= 19")
>
>I need teenagers to be a JSON object rather than a simple row. How can we get
>that done?
>
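A short sketch of the suggestion above, assuming Spark 1.x where toJSON returns an RDD[String] (it returns a Dataset[String] in 2.x):

val teenagers = sqlContext.sql(
  "SELECT name FROM people WHERE age >= 13 AND age <= 19")
// each Row becomes one JSON document, e.g. {"name":"Justin"}
val teenagersJson = teenagers.toJSON
teenagersJson.take(5).foreach(println)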
There are many such case classes and concepts, such as
Attribute/AttributeReference/Expression, in Spark SQL.
I would ask what Attribute/AttributeReference/Expression mean. Given a SQL
query like select a,b from c, are a and b two Attributes? Is a + b an
expression?
Looks I misunderstand it b
Hi, Are there test cases for the Spark SQL Catalyst, such as tests for the rules
that transform an unresolved query plan?
Thanks!
Thanks Chenghao!
At 2015-08-25 13:06:40, "Cheng, Hao" wrote:
Yes, check the source code
under:https://github.com/apache/spark/tree/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst
From: Todd [mailto:bit1...@163.com]
Sent: Tuesday, August 25, 2015 1:01
I cloned the code from https://github.com/apache/spark to my machine. It compiles
successfully,
but when I run the SparkPi example, it throws the exception below complaining
that scala.collection.Seq is not found.
I have installed Scala 2.10.4 on my machine, and use the default profiles:
window,scala2.1
of modules:
https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html
On Tue, Aug 25, 2015 at 12:18 PM, Todd wrote:
I cloned the code from https://github.com/apache/spark to my machine. It can
compile successfully,
But when I run the sparkpi, it throws an
oject
['a]
LocalRelation [a#0]
scala> parsedQuery.analyze
res11: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan=Project [a#0]
LocalRelation [a#0]
The #0 after a is a unique identifier (within this JVM) that says where the
data is coming from, even as plans are rearranged due to optimiza
Hi,
The spark-sql-perf project itself contains benchmark data generation. I am using
the Spark shell to run spark-sql-perf to generate the data, with 10G memory for
both the driver and the executor.
When I increase the scale factor to 30 and run the job, I get the
following error:
When I jstack it to s
(Interpreted frame)
At 2015-08-25 19:32:56, "Ted Yu" wrote:
Looks like you were attaching images to your email which didn't go through.
Consider using third party site for images - or paste error in text.
Cheers
On Tue, Aug 25, 2015 at 4:22 AM, Todd wrote:
Hi,
The
e you able to get more detailed error message ?
Thanks
On Aug 25, 2015, at 6:57 PM, Todd wrote:
Thanks Ted Yu.
Following are the error message:
1. The exception that is shown on the UI is :
Exception in thread "Thread-113" Exception in thread "Thread-126" Exception in
I am using Tachyon in the Spark program below, but I encounter a
BlockNotFoundException.
Does someone know what's wrong, and is there a guide on how to configure
Spark to work with Tachyon? Thanks!
conf.set("spark.externalBlockStore.url", "tachyon://10.18.19.33:19998")
conf.set("spark.ex
Sorry for the noise, It's my bad...I have worked it out now.
At 2015-08-26 13:20:57, "Todd" wrote:
I think the answer is no. I only see such messages on the console, and #2 is the
thread stack trace.
I am thinking that Spark SQL Perf forks many dsdgen processes to gene
Increase the number of executors, :-)
At 2015-08-26 16:57:48, "Ted Yu" wrote:
Mind sharing how you fixed the issue ?
Cheers
On Aug 26, 2015, at 1:50 AM, Todd wrote:
Sorry for the noise, It's my bad...I have worked it out now.
At 2015-08-26 13:20:57, "Todd" wrote:
Hi,
I am using data generated with
sparksqlperf(https://github.com/databricks/spark-sql-perf) to test the spark
sql performance (spark on yarn, with 10 nodes) with the following code (The
table store_sales is about 90 million records, 6G in size)
val outputDir="hdfs://tmp/spark_perf/scaleFact
Code Generation: false
At 2015-09-11 02:02:45, "Michael Armbrust" wrote:
I've been running TPC-DS SF=1500 daily on Spark 1.4.1 and Spark 1.5 on S3, so
this is surprising. In my experiments Spark 1.5 is either the same or faster
than 1.4 with only
the query again?
In our previous testing, it’s about 20% slower for sort merge join. I am not
sure if there is anything else slowing down the performance.
Hao
From: Jesse F Chen [mailto:jfc...@us.ibm.com]
Sent: Friday, September 11, 2015 1:18 PM
To: Michael Armbrust
Cc: Todd; user@spark.
t we found it probably causes the performance reduce dramatically.
From: Todd [mailto:bit1...@163.com]
Sent: Friday, September 11, 2015 2:17 PM
To: Cheng, Hao
Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.org
Subject: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared w
/spark-sql? It’s a new feature in Spark 1.5, and it’s true by
default, but we found it probably causes performance to drop dramatically.
From: Todd [mailto:bit1...@163.com]
Sent: Friday, September 11, 2015 2:17 PM
To: Cheng, Hao
Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.o
here is no table to show queries and execution
plan information.
At 2015-09-11 14:39:06, "Todd" wrote:
Thanks Hao.
Yes, it is still low with SMJ. Let me try the option you suggested,
At 2015-09-11 14:34:46, "Cheng, Hao" wrote:
You mean the performance is stil
>>
>>
>> On 2015-09-11 15:58, Cheng, Hao wrote:
>>
>> Can you confirm whether the query really runs in cluster mode, not local
>> mode? Can you print the call stack of the executor when the query is running?
>>
>>
>>
>> BTW: spark.shuffle.
After compiling the Spark 1.2.0 codebase in IntelliJ IDEA and running the LocalPi
example, I got the following slf4j-related issue. Does anyone know how to fix
this? Thanks
Error:scalac: bad symbolic reference. A signature in Logging.class refers to
type Logger
in package org.slf4j which is not av
"Ted Yu" wrote:
Spark depends on slf4j 1.7.5
Please check your classpath and make sure slf4j is included.
Cheers
On Wed, Feb 11, 2015 at 6:20 AM, Todd wrote:
After compiling the Spark 1.2.0 codebase in IntelliJ IDEA and running the LocalPi
example, I got the following slf4j related i
Databricks provides sample code on its website... but I can't find it right now.
At 2015-02-12 00:43:07, "captainfranz" wrote:
>I am confused as to whether avro support was merged into Spark 1.2 or it is
>still an independent library.
>I see some people writing sqlContext.avroFile similarly
Sorry for the noise. I have found it.
At 2015-02-18 23:34:40, "Todd" wrote:
It looks like the log analysis reference app provided by Databricks at
https://github.com/databricks/reference-apps only has a Java API?
I'd like to see a Scala version.
It looks like the log analysis reference app provided by Databricks at
https://github.com/databricks/reference-apps only has a Java API?
I'd like to see a Scala version.
I am a bit new to Spark, except that I have tried simple things like word count
and the examples given in the Spark SQL programming guide.
Now I am investigating the internals of Spark, but I think I am almost lost,
because I cannot grasp the whole picture of what Spark does when it executes the
word
Hi,
I have imported the Spark source code into IntelliJ IDEA as an SBT project. I
tried to do a Maven install in IntelliJ IDEA by clicking Install on the Spark
Project Parent POM (root), but it failed.
I would ask which profiles should be checked. What I want to achieve is starting
Spark in the IDE, and Hadoop
Thanks Sean.
I followed the guide and imported the codebase into IntelliJ IDEA as a Maven
project, with the profiles hadoop2.4 and yarn.
In the Maven project view, I ran Maven Install against the module Spark
Project Parent POM (root). After a pretty long time, all the modules were built
successfully.
But
From the docs,
https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence:
Storage Level: MEMORY_ONLY. Meaning: Store RDD as deserialized Java objects in
the JVM. If the RDD does not fit in memory, some partitions will not be
cached and will be recomputed on the fly each time they're n
utor classpath are
duplicated? This is a newly installed spark-1.3.1-bin-hadoop2.6
standalone cluster just to ensure I had nothing from testing in the way.
I can set the SPARK_CLASSPATH in the $SPARK_HOME/spark-env.sh and it will
pick up the jar and append it fine.
Any suggestions on what is going on here? Seems to just ignore whatever I
have in the spark.executor.extraClassPath. Is there a different way to do
this?
TIA.
-Todd
w.
As for the spark-cassandra-connector 1.3.0-SNAPSHOT, I am building that
from master. Haven't hit any issue with it yet.
-Todd
On Fri, May 22, 2015 at 9:39 PM, Yana Kadiyska
wrote:
> Todd, I don't have any answers for you...other than the file is actually
> named spark-de
There use to be a project, StreamSQL (
https://github.com/thunderain-project/StreamSQL), but it appears a bit
dated and I do not see it in the Spark repo, but may have missed it.
@TD Is this project still active?
I'm not sure what the status is but it may provide some insights on how to
achieve w
:
Broadcast<Integer> cmdLineArg = sc.broadcast(Integer.parseInt(args[12]));
Then just reference the broadcast variable in you workers. It will get
shipped once to all nodes in the cluster and can be referenced by them.
HTH.
-Todd
On Thu, Jun 11, 2015 at 8:23 AM, gaurav sharma
wrote:
> Hi,
>
> I am us
It was released yesterday.
On Friday, June 12, 2015, ayan guha wrote:
> Hi
>
> When is official spark 1.4 release date?
> Best
> Ayan
>
ere;
-Todd
On Mon, Jun 15, 2015 at 10:57 AM, Proust GZ Feng wrote:
> Thanks a lot Akhil, after try some suggestions in the tuning guide, there
> seems no improvement at all.
>
> And below is the job detail when running locally(8cores) which took 3min
> to complete the job, we can
You can get HDP with at least 1.3.1 from Horton:
http://hortonworks.com/hadoop-tutorial/using-apache-spark-technical-preview-with-hdp-2-2/
for your convenience from the dos:
wget -nv
http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.4.4/hdp.repo
-O /etc/yum.repos.d/HDP-TP.repo
You should use:
spark.executor.memory
from the docs <https://spark.apache.org/docs/latest/configuration.html>:
spark.executor.memory (default: 512m): Amount of memory to use per executor process,
in the same format as JVM memory strings (e.g. 512m, 2g).
-Todd
On Thu, Jul 2, 2015 at 3:36 PM, Mulugeta
limitation at this time.
-Todd
On Thu, Jul 2, 2015 at 4:13 PM, Mulugeta Mammo
wrote:
> thanks but my use case requires I specify different start and max heap
> sizes. Looks like spark sets start and max sizes same value.
>
> On Thu, Jul 2, 2015 at 1:08 PM, Todd Nist wrote:
>
utput stream and therefore
materialized.
Change it to a map, foreach or some other form of transform.
HTH
-Todd
On Thu, Jul 9, 2015 at 5:24 PM, Su She wrote:
> Hello All,
>
> I also posted this on the Spark/Datastax thread, but thought it was also
> 50% a spark question (or mostly
conf = new SparkConf(true)
.set("spark.cassandra.connection.host", "127.0.0.1")
HTH
-Todd
On Fri, Jul 10, 2015 at 5:24 AM, Prateek . wrote:
> Hi,
>
>
>
> I am beginner to spark , I want save the word and its count to cassandra
> key
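A short sketch using the spark-cassandra-connector RDD API; the keyspace, table, column names, and input path are assumptions for illustration:

import com.datastax.spark.connector._

val wordCounts = sc.textFile("words.txt")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// writes (word, count) pairs into my_keyspace.word_count(word, count)
wordCounts.saveToCassandra("my_keyspace", "word_count", SomeColumns("word", "count"))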
There are there connector packages listed on spark packages web site:
http://spark-packages.org/?q=hbase
HTH.
-Todd
On Wed, Jul 15, 2015 at 2:46 PM, Shushant Arora
wrote:
> Hi
>
> I have a requirement of writing in hbase table from Spark streaming app
> after some processing.
ranking functions (SQL name / DataFrame function): rank / rank, dense_rank / denseRank,
percent_rank / percentRank, ntile / ntile, row_number / rowNumber
HTH.
-Todd
On Thu, Jul 16, 2015 at 8:10 AM, Lior Chaga wrote:
> Does spark HiveContext support the rank() ... distribute by syntax (as in
> the following article-
> http://www.edwardcapriol
-apache-spark/
Not sure if that is of value to you or not.
HTH.
-Todd
On Tue, Mar 1, 2016 at 7:30 PM, Don Drake wrote:
> I'm interested in building a REST service that utilizes a Spark SQL
> Context to return records from a DataFrame (or IndexedRDD?) and even
> add/update records.
(KafkaUtils.createDirectStream) or Receiver (KafkaUtils.createStream)?
You may find this discussion of value on SO:
http://stackoverflow.com/questions/28901123/org-apache-spark-shuffle-metadatafetchfailedexception-missing-an-output-locatio
-Todd
On Mon, Mar 7, 2016 at 5:52 PM, Vinti Maheshwari
wrote
n / interval))
val counts = eventsStream.map(event => {
(event.timestamp - event.timestamp % interval, event)
}).updateStateByKey[Long](PrintEventCountsByInterval.counter _, new
HashPartitioner(3), initialRDD = initialRDD)
counts.print()
HTH.
-Todd
On Thu, Mar 10, 2016 at 1:35 AM, Zalzber
/granturing/a09aed4a302a7367be92
HTH.
-Todd
On Sat, Mar 12, 2016 at 6:21 AM, Chris Miller
wrote:
> I'm pretty new to all of this stuff, so bare with me.
>
> Zeppelin isn't really intended for realtime dashboards as far as I know.
> Its reporting features (tables, grap
e as complex event processing
> engine.
https://stratio.atlassian.net/wiki/display/DECISION0x9/Home
I have not used it, only read about it but it may be of some interest to
you.
-Todd
On Sun, Apr 17, 2016 at 5:49 PM, Peyman Mohajerian
wrote:
> Microbatching is certainly not a waste of ti
I believe you can adjust it by setting the following:
spark.akka.timeout 100s Communication timeout between Spark nodes.
HTH.
-Todd
On Thu, Apr 21, 2016 at 9:49 AM, yuemeng (A) wrote:
> When I run a spark application,sometimes I get follow ERROR:
>
> 16/04/21 09:26:45 ERROR Spa
Have you looked at these:
http://allegro.tech/2015/08/spark-kafka-integration.html
http://mkuthan.github.io/blog/2016/01/29/spark-kafka-integration2/
Full example here:
https://github.com/mkuthan/example-spark-kafka
HTH.
-Todd
On Thu, Apr 21, 2016 at 2:08 PM, Alexander Gallego
wrote
issue the commit:
if (supportsTransactions) { conn.commit() }
HTH
-Todd
On Sat, Apr 23, 2016 at 8:57 AM, Andrés Ivaldi wrote:
> Hello, so I executed Profiler and found that implicit isolation was turn
> on by JDBC driver, this is the default behavior of MSSQL JDBC driver, but
> it's p
Perhaps these may be of some use:
https://github.com/mkuthan/example-spark
http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/
https://github.com/holdenk/spark-testing-base
On Wed, May 18, 2016 at 2:14 PM, swetha kasireddy wrote:
> Hi Lars,
>
> Do you have any examples for the methods
What version of Spark are you using? I do not believe that 1.6.x is
compatible with 0.9.0.1 due to changes in the kafka clients between 0.8.2.2
and 0.9.0.x. See this for more information:
https://issues.apache.org/jira/browse/SPARK-12177
-Todd
On Tue, Jun 7, 2016 at 7:35 AM, Dominik Safaric
Streaming within its checkpoints by default. You can also manage them
yourself if desired. How are you dealing with offsets ?
Can you verify the offsets on the broker:
kafka-run-class.sh kafka.tools.GetOffsetShell --topic --broker-list
--time -1
-Todd
On Tue, Jun 7, 2016 at 8:17 AM, Dominik
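For reference, a sketch assuming the Spark 1.x direct API against Kafka 0.8.x brokers: the offset ranges of each batch can be read from the RDD and stored wherever offsets are tracked, instead of relying only on checkpoints. The broker, topic, and the ssc StreamingContext are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("mytopic"))

stream.foreachRDD { rdd =>
  // offsetRanges must be read on the direct stream's RDD before any shuffle
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { o =>
    println(s"${o.topic} partition ${o.partition}: ${o.fromOffset} -> ${o.untilOffset}")
  }
}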
You can set the dbtable to this:
.option("dbtable", "(select * from master_schema where 'TID' = '100_0')")
HTH,
Todd
On Thu, Jul 21, 2016 at 10:59 AM, sujeet jog wrote:
> I have a table of size 5GB, and want to load selective rows into datafra
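A fuller sketch of the suggestion above (URL and credentials are placeholders). Most JDBC dialects expect the derived table to carry an alias, and quoting the column name with single quotes would compare against a string literal rather than the column:

val df = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb")
  .option("dbtable", "(select * from master_schema where TID = '100_0') as t")
  .option("user", "dbuser")
  .option("password", "dbpass")
  .load()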
-cant-find-my-tables-in-spark-sql-using-beeline.html
HTH.
-Todd
On Thu, Jul 21, 2016 at 10:30 AM, Marco Colombo wrote:
> Thanks.
>
> That is just a typo. I'm using on 'spark://10.0.2.15:7077' (standalone).
> Same url used in --master in spark-submit
>
>
-5050-4372-0034'
However, the process doesn’t quit after all. This is critical, because I’d
like to use SparkLauncher to submit such jobs. If my job doesn’t end, jobs
will pile up and fill up the memory. Pls help. :-|
—
BR,
Todd Leo
Windows DNS and
what it's pointing at. Can you do a
kinit *username *
on that host? It should tell you if it can find the KDC.
Let me know if that's helpful at all.
Todd
On Fri, Dec 11, 2015 at 1:50 PM, Mike Wright wrote:
> As part of our implementation, we are utilizing a ful
see https://issues.apache.org/jira/browse/SPARK-11043, it is resolved in
1.6.
On Tue, Dec 15, 2015 at 2:28 PM, Younes Naguib <
younes.nag...@tritondigital.com> wrote:
> The one coming with spark 1.5.2.
>
>
>
> y
>
>
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* December-15-15 1:59 PM
collects(), just to obtain the count of records on the
DStream.
HTH.
-Todd
On Wed, Dec 16, 2015 at 3:34 PM, Bryan Cutler wrote:
> To follow up with your other issue, if you are just trying to count
> elements in a DStream, you can do that without an Accumulator. foreachRDD
> is meant to
Tests
HTH.
-Todd
On Wed, Jan 6, 2016 at 2:20 PM, Jade Liu wrote:
> I’ve changed the scala version to 2.10.
>
> With this command:
> build/mvn -X -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean
> package
> Build was successful.
>
> But make a runnable vers
That should read "I think your missing the --name option". Sorry about
that.
On Wed, Jan 6, 2016 at 3:03 PM, Todd Nist wrote:
> Hi Jade,
>
> I think you "--name" option. The makedistribution should look like this:
>
> ./make-distribution.sh --name h