Hi there,
I have read about the two fundamental shared features in Spark
(broadcast variables and accumulators), but this is what I need.
I'm using Spark Streaming in order to get requests from Kafka; these
requests may launch long-running tasks, and I need to control them:
1) Keep them in a
long running tasks (assuming 1 task corresponding to each
> request from Kafka).
>
> TD
>
>
> On Wed, Mar 26, 2014 at 1:19 AM, Bryan Bryan wrote:
>
>> Hi there,
>>
>> I have read about the two fundamental shared features in spark
>> (broadcasting variabl
outer hops) the throughput decreases
significantly, causing job delays.
Is this typical? Have others encountered similar issues? Is there a Kafka
configuration setting that might mitigate this issue?
Regards,
Bryan Jeffrey
Sent from Outlook Mail for Windows 10 phone
while missing data from other partitions.
Regards,
Bryan Jeffrey
Sent from Outlook Mail for Windows 10 phone
From: vivek.meghanat...@wipro.com
Sent: Thursday, December 24, 2015 5:22 AM
To: user@spark.apache.org
Subject: Spark Streaming + Kafka + scala job message read issue
Hi All,
We are
Vivek,
https://spark.apache.org/docs/1.5.2/streaming-kafka-integration.html
The map specifies the per-topic number of partitions (threads) to consume. Is
numThreads below equal to the number of partitions in your topic?
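For illustration, a minimal sketch of wiring the per-topic map so that the
thread count matches the topic's partition count (the quorum, group, topic
name, and partition count below are placeholders, not values from your job):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-partition-check")
val ssc = new StreamingContext(conf, Seconds(10))

val numPartitions = 4 // set this to the actual partition count of the topic
val topicMap = Map("searches" -> numPartitions) // topic -> receiver thread count
val lines = KafkaUtils.createStream(ssc, "zk1:2181", "search-group", topicMap)
  .map(_._2) // keep only the message payload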
Regards,
Bryan Jeffrey
Sent from Outlook Mail for Windows 10 phone
From: vivek.meghanat
problem?
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap // numThreads is 1 here
val searches = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)
  .map(line => parse(line._2).extract[Search])
Regards,
Vivek M
From: Bryan [mailto:bryan.jeff...@gma
message is always reaching Kafka (checked through the console
consumer).
Regards
Vivek
Sent using CloudMagic Email
On Sat, Dec 26, 2015 at 2:42 am, Bryan wrote:
Agreed. I did not see that they were using the same group name.
Sent from Outlook Mail for Windows 10 phone
From: PhuDuc Nguyen
Hello,
Can anyone point me to a good example of updateStateByKey with an initial RDD?
I am seeing a compile time error when following the API.
Regards,
Bryan Jeffrey
(broadcasting the smaller set).
For joining two large datasets, it would seem to be better to repartition both
sets with the same partitioner and then join each partition. Is there a
suggested practice for this problem?
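To make the question concrete, a sketch of what I mean (the paths, key
extraction, and partition count below are placeholders):

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("copartition-join-sketch"))
val partitioner = new HashPartitioner(200) // choose based on data volume

val left = sc.textFile("/data/left")
  .map(line => (line.split(",")(0), line))
  .partitionBy(partitioner)
val right = sc.textFile("/data/right")
  .map(line => (line.split(",")(0), line))
  .partitionBy(partitioner)

// Both sides share the partitioner, so the join is co-partitioned.
val joined = left.join(right)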
Thank you,
Bryan Jeffrey
Jerry,
Thank you for the note. It sounds like you were able to get further than I have
been - any insight? Is it just a Spark 1.4.1 vs. Spark 1.5 difference?
Regards,
Bryan Jeffrey
-Original Message-
From: "Jerry Lam"
Sent: 10/28/2015 6:29 PM
To: "Bryan Jeffrey"
Cc: "Susan
storage location for the data. That seems very hacky
though, and likely to result in maintenance issues.
Regards,
Bryan Jeffrey
-Original Message-
From: "Yana Kadiyska"
Sent: 10/28/2015 8:32 PM
To: "Bryan Jeffrey"
Cc: "Susan Zhang" ; "user"
Subj
Anyone have thoughts or a similar use-case for SparkSQL / Cassandra?
Regards,
Bryan Jeffrey
-Original Message-
From: "Bryan Jeffrey"
Sent: 11/4/2015 11:16 AM
To: "user"
Subject: Cassandra via SparkSQL/Hive JDBC
Hello.
I have been working to add SparkSQL
uses
reduceByKeyAndWindow. Is there another method to seed the initial counts
beyond updateStateByKey?
Regards,
Bryan Jeffrey
.
Is there an alternative method to initialize state?
InputQueueStream joined to window would seem to work, but InputQueueStream does
not allow checkpointing
Sent from Outlook Mail
From: Tathagata Das
Sent: Sunday, November 22, 2015 8:01 PM
To: Bryan
Cc: user
Subject: Re: Initial State
There is
Cheng,
That's exactly what I was hoping for - native support for writing DateTime
objects. As it stands, Spark 1.5.2 seems to leave no option but to do manual
conversion (to nanos, Timestamp, etc.) prior to writing records to Hive.
Regards,
Bryan Jeffrey
Sent from Outlook Mail
From: Cheng
?
Regards,
Bryan Jeffrey
Sent from Outlook Mail
From: Cheng Lian
Sent: Tuesday, November 24, 2015 6:49 AM
To: Bryan;user
Subject: Re: DateTime Support - Hive Parquet
I see - then this is actually irrelevant to Parquet. I guess we can support
Joda DateTime in Spark SQL reflective schema inference
Akhil,
This looks like the issue. I'll update my path to include the (soon to be
added) winutils and associated DLLs.
Thank you,
Bryan
-Original Message-
From: "Akhil Das"
Sent: 9/14/2015 6:46 AM
To: "Bryan Jeffrey"
Cc: "user"
Subject: Re: Probl
Tathagata,
Simple batch jobs do work. The cluster has a good set of resources and a
limited input volume on the given Kafka topic.
The job works on the small 3-node standalone-configured cluster I have setup
for test.
Regards,
Bryan Jeffrey
-Original Message-
From: "Tathagat
Also - I double checked - we're setting the master to "yarn-cluster"
-Original Message-
From: "Tathagata Das"
Sent: 9/23/2015 2:38 PM
To: "Bryan"
Cc: "user" ; "Hari Shreedharan"
Subject: Re: Yarn Shutting Down Spark Processing
Marcelo,
The error below is from the application logs. The Spark streaming context is
initialized and actively processing data when YARN claims that the context is
not initialized.
There are a number of errors, but they're all associated with the ssc shutting
down.
Regards,
Bryan Je
Srinivas,
Interestingly, I did have the metrics jar packaged as part of my main jar. It
worked well both on the driver and locally, but not on the executors.
Regards,
Bryan Jeffrey
Get Outlook for Android<https://aka.ms/ghei36>
From: Srinivas V
Sent: Saturday
On Thu, Jul 2, 2020 at 2:33 PM Bryan Jeffrey
wrote:
> Srinivas,
>
> I finally broke a little bit of time free to look at this issue. I
> reduced the scope of my ambitions and simply cloned the ConsoleSink and
> ConsoleReporter classes. After doing so I can see the original versi
Jungtaek,
How would you contrast stateful streaming with checkpoint vs. the idea of
writing updates to a Delta Lake table, and then using the Delta Lake table
as a streaming source for our state stream?
Thank you,
Bryan
On Mon, Sep 28, 2020 at 9:50 AM Debabrata Ghosh
wrote:
> Thank
Prateek,
I believe that one task is created per Cassandra partition. How is your
data partitioned?
Regards,
Bryan Jeffrey
On Thu, Mar 10, 2016 at 10:36 AM, Prateek . wrote:
> Hi,
>
>
>
> I have a Spark Batch job for reading timeseries data from Cassandra which
&g
Are you trying to save predictions on a dataset to a file, or the model
produced after training with ALS?
On Thu, Mar 10, 2016 at 7:57 PM, Shishir Anshuman wrote:
> hello,
>
> I am new to Apache Spark and would like to get the Recommendation output
> of the ALS algorithm in a file.
> Please sugg
Steve & Adam,
I would be interesting in hearing the outcome here as well. I am seeing
some similar issues in my 1.4.1 pipeline, using stateful functions
(reduceByKeyAndWindow and updateStateByKey).
Regards,
Bryan Jeffrey
On Mon, Mar 14, 2016 at 6:45 AM, Steve Loughran
wrote:
>
> &
/scala/org/apache/spark/examples/mllib/RecommendationExample.scala#L62
On Fri, Mar 11, 2016 at 8:18 PM, Shishir Anshuman wrote:
> The model produced after training.
>
> On Fri, Mar 11, 2016 at 10:29 PM, Bryan Cutler wrote:
>
>> Are you trying to save predictions on a dataset to a
Cody et. al,
I am seeing a similar error. I've increased the number of retries. Once
I've got a job up and running I'm seeing it retry correctly. However, I am
having trouble getting the job started - the number of retries does not seem
to help with startup behavior.
Thoughts?
titions =
kafkaWritePartitions)
detectionWriter.write(dataToWriteToKafka)
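For a more generic version of the same idea, here is a sketch of one way to
write a DStream back to Kafka (this is not necessarily what detectionWriter
does; the broker list, serializers, and topic name are placeholders):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.dstream.DStream

def writeToKafka(results: DStream[String]): Unit = {
  results.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // Create the (non-serializable) producer on the executor, once per partition.
      val props = new Properties()
      props.put("bootstrap.servers", "broker1:9092")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      partition.foreach(value => producer.send(new ProducerRecord[String, String]("output-topic", value)))
      producer.close()
    }
  }
}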
Hope that helps!
Bryan Jeffrey
On Thu, Apr 21, 2016 at 2:08 PM, Alexander Gallego
wrote:
> Thanks Ted.
>
> KafkaWordCount (producer) does not operate on a DStream[T]
>
> ```scala
>
>
ughts?
Regards,
Bryan Jeffrey
ook for or known
bugs in similar instances?
Regards,
Bryan Jeffrey
This is currently being worked on, planned for 2.1 I believe
https://issues.apache.org/jira/browse/SPARK-7159
On May 28, 2016 9:31 PM, "Stephen Boesch" wrote:
> Thanks Phuong But the point of my post is how to achieve without using
> the deprecated the mllib pacakge. The mllib package already ha
In that mode, it will run on the application master, on whichever node that is
as specified in your YARN conf.
On Jun 5, 2016 4:54 PM, "Saiph Kappa" wrote:
> Hi,
>
> In yarn-cluster mode, is there any way to specify on which node I want the
> driver to run?
>
> Thanks.
>
ld run in the yarn
> conf? I haven't found any useful information regarding that.
>
> Thanks.
>
> On Mon, Jun 6, 2016 at 4:52 PM, Bryan Cutler wrote:
>
>> In that mode, it will run on the application master, whichever node that
>> is as specified in your yarn conf.
Hello.
I am looking at the option of moving RDD-based operations to Dataset-based
operations. We are calling 'reduceByKey' on some pair RDDs we have. What
would the equivalent be in the Dataset interface? I do not see a simple
reduceByKey replacement.
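For concreteness, a sketch of the groupByKey / reduceGroups approach that seems
to be the nearest Dataset analogue (Spark 2.0 SparkSession API; the Event case
class and sample values are made up):

import org.apache.spark.sql.SparkSession

case class Event(key: String, count: Long)

val spark = SparkSession.builder.appName("reduce-sketch").getOrCreate()
import spark.implicits._

val events = Seq(Event("a", 1L), Event("a", 2L), Event("b", 5L)).toDS()

val reduced = events
  .groupByKey(_.key)
  .reduceGroups((a, b) => Event(a.key, a.count + b.count))
  .map(_._2) // reduceGroups yields (key, reducedValue); keep the reduced value

reduced.show()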
Regards,
Bryan Jeffrey
It would also be nice if there was a better example of joining two
Datasets. I am looking at the documentation here:
http://spark.apache.org/docs/latest/sql-programming-guide.html. It seems a
little bit sparse - is there a better documentation source?
Regards,
Bryan Jeffrey
On Tue, Jun 7, 2016
All,
Thank you for the replies. It seems as though the Dataset API is still far
behind the RDD API. This is unfortunate as the Dataset API potentially
provides a number of performance benefits. I will move to using it in a
more limited set of cases for the moment.
Thank you!
Bryan Jeffrey
MyTopic,43]))
Thank you,
Bryan Jeffrey
Cody,
We already set maxRetries. We're still seeing the issue - when the leader is
shifted, for example, the direct stream reader does not appear to handle this
correctly. We're running 1.6.1.
Bryan Jeffrey
On Mon, Jun 13, 2016 at 10:37 AM, Cody Koeninger wrote:
> http://spark.ap
The stack trace you provided seems to hint that you are calling "predict"
on an RDD with Vectors that are not the same size as the number of features
in your trained model; they should be equal. If that's not the issue, it
would be easier to troubleshoot if you could share your code and possibly
s
The problem might be that you are evaluating with "predictionLabel" instead
of "prediction", where predictionLabel is the prediction index mapped to
the original label strings - at least according to the
RandomForestClassifierExample, not sure if your code is exactly the same.
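For illustration, a minimal evaluator sketch following that example's column
names (assuming the indexed label column is "indexedLabel" - adjust to your
pipeline):

import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

// Evaluate against the numeric "prediction" column, not the re-mapped label string.
val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("indexedLabel")
  .setPredictionCol("prediction")
  .setMetricName("f1")
// evaluator.evaluate(predictions), where predictions contains both columns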
On Tue, Jun 28, 2016
Are you fitting the VectorIndexer to the entire data set and not just
training or test data? If you are able to post your code and some data to
reproduce, that would help in troubleshooting.
On Tue, Jun 28, 2016 at 4:40 PM, Rich Tarro wrote:
> Thanks for the response, but in my case I reversed
Hi Felix,
I think the problem you are describing has been fixed in later versions,
check out this JIRA https://issues.apache.org/jira/browse/SPARK-13803
On Wed, Jun 29, 2016 at 9:27 AM, Mich Talebzadeh
wrote:
> Fine. in standalone mode spark uses its own scheduling as opposed to Yarn
> or anyt
Can you try running the example like this
./bin/run-example sql.RDDRelation
I know there are some jars in the example folders, and running them this
way adds them to the classpath
On Jul 7, 2016 3:47 AM, "kevin" wrote:
> hi,all:
> I build spark use:
>
> ./make-distribution.sh --name "hadoop2.7
es, and index them.
// Set maxCategories so features with > 4 distinct values are treated as
continuous.
val featureIndexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("indexedFeatures")
  .setMaxCategories(4)
  .fit(digits)
Hope that helps!
On Fri, Jul 1, 2016
Hi Rory, for starters what version of Spark are you using? I believe that
in a 1.5.? release (I don't know which one off the top of my head) there
was an addition that would also display the config property when a timeout
happened. That might help some if you are able to upgrade.
On Jul 18, 2016
rwise you might have luck trying a more recent version of
Spark, such as 1.6.2 or even 2.0.0 (soon to be released) which no longer
uses Akka and the ActorSystem. Hope that helps!
On Tue, Jul 19, 2016 at 2:29 AM, Rory Waite wrote:
> Sorry Bryan, I should have mentioned that I'm running 1.6
Hi JG,
If you didn't know this, Spark MLlib has 2 APIs, one of which uses
DataFrames. Take a look at this example
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaLinearRegressionWithElasticNetExample.java
This example uses a Dataset, which is t
ML has a DataFrame-based API, while MLlib is RDD-based and will be deprecated
as of Spark 2.0.
On Thu, Jul 21, 2016 at 10:41 PM, VG wrote:
> Why do we have these 2 packages ... ml and mlib?
> What is the difference in these
>
>
>
> On Fri, Jul 22, 2016 at 11:09 AM, Bryan Cutler
Everett, I had the same question today and came across this old thread.
Not sure if there has been any more recent work to support this.
http://apache-spark-developers-list.1001551.n3.nabble.com/Using-UDFs-in-Java-without-registration-td12497.html
On Thu, Jul 21, 2016 at 10:10 AM, Everett Anderso
ress
this (and would likely be willing to go fix it myself). Should I just
create a ticket?
Thank you,
Bryan Jeffrey
algorithm?
Thank you,
Bryan Jeffrey
That's the correct fix. I have this done along with a few other Java
examples that still use the old MLlib Vectors in this PR that's waiting for
review: https://github.com/apache/spark/pull/14308
On Jul 28, 2016 5:14 AM, "Robert Goodman" wrote:
> I changed import in the sample from
>
> import
The algorithm update is just broken into 2 steps: trainOn - to learn/update
the cluster centers, and predictOn - to predict cluster assignments on data.
The StreamingKMeansExample you reference breaks the data up into training and
test sets because you might want to score the predictions. If you don't care
ab
You will need to cast bestModel to include the MLWritable trait. The class
Model does not mix it in by default. For instance:
cvModel.bestModel.asInstanceOf[MLWritable].save("/my/path")
Alternatively, you could save the CV model directly, which takes care of
this
cvModel.save("/my/path")
On F
Hi Roberto,
1. How do they differ in terms of performance?
They both use alternating least squares matrix factorization; the main
difference is that ml.recommendation.ALS uses DataFrames as input, which has
built-in optimizations and should give better performance.
2. Am I correct to assume ml.recommen
I had a bunch of library dependencies that were still using Scala 2.10
versions. I updated them to 2.11 and everything has worked fine since.
On Wed, Dec 16, 2015 at 3:12 AM, Ashwin Sai Shankar
wrote:
> Hi Bryan,
> I see the same issue with 1.5.2, can you pls let me know what w
Hi Andy,
Regarding the foreachRDD return value, this JIRA, which will be in 1.6, should
take care of that and make things a little simpler:
https://issues.apache.org/jira/browse/SPARK-4557
On Dec 15, 2015 6:55 PM, "Andy Davidson"
wrote:
> I am writing a JUnit test for some simple streaming code. I
ct().toString());
total += rdd.count();
}
}
MyFunc f = new MyFunc();
inputStream.foreachRDD(f);
// f.total will have the count of all RDDs
Hope that helps some!
-bryan
On Wed, Dec 16, 2015 at 8:37 AM, Bryan Cutler wrote:
> Hi Andy,
>
> Regarding the foreachrdd return valu
This is a known issue https://issues.apache.org/jira/browse/SPARK-9844. As
Noorul said, it is probably safe to ignore as the executor process is
already destroyed at this point.
On Mon, Dec 21, 2015 at 8:54 PM, Noorul Islam K M wrote:
> carlilek writes:
>
> > My users use Spark 1.5.1 in standa
Hi Andrew,
I know that older versions of Spark could not run PySpark on YARN in
cluster mode. I'm not sure if that is fixed in 1.6.0 though. Can you try
setting the deploy-mode option to "client" when calling spark-submit?
Bryan
On Thu, Jan 7, 2016 at 2:39 PM, weineran
rk-submit --master yarn --deploy-mode client
--driver-memory 4g --executor-memory 2g --executor-cores 1
./examples/src/main/python/pi.py 10
That is a good sign that local jobs and Java examples work, probably just a
small configuration issue :)
Bryan
On Wed, Jan 13, 20
"+str(sys.version_info) +"\n"+
str([(k,os.environ[k]) for k in os.environ if "PY" in k]))
On Thu, Jan 14, 2016 at 8:37 AM, Andrew Weiner <
andrewweiner2...@u.northwestern.edu> wrote:
> Hi Bryan,
>
> I ran "$> python --version" on every node on
solved as part of this JIRA
https://issues.apache.org/jira/browse/SPARK-12183
Bryan
On Thu, Jan 14, 2016 at 8:12 AM, Rachana Srivastava <
rachana.srivast...@markmonitor.com> wrote:
> Tried using 1.6 version of Spark that takes numberOfFeatures fifth
> argument in the API but s
If you are able to just train the RandomForestClassificationModel from ML
directly instead of training the old model and converting, then that would
be the way to go.
On Thu, Jan 14, 2016 at 2:21 PM,
wrote:
> Thanks so much Bryan for your response. Is there any workaround?
>
>
Glad you got it going! It wasn't very obvious what needed to be set; maybe it
is worth explicitly stating this in the docs since it seems to have come up a
couple of times before too.
Bryan
On Fri, Jan 15, 2016 at 12:33 PM, Andrew Weiner <
andrewweiner2...@u.northwestern.edu> wr
nd so I am sure we're doing consistent hashing.
The 'reduceAdd' function is adding to a map. The 'inverseReduceFunction' is
subtracting from the map. The filter function is removing items where the
number of entries in the map is zero. Has anyone seen this error before?
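For context, a sketch of the pattern I am describing (the types, durations, and
partition count here are simplified placeholders, not the actual job):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("invertible-window-sketch")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("/tmp/checkpoint")

// Placeholder stream: key -> single-entry count map.
val pairs = ssc.socketTextStream("localhost", 9999)
  .map(line => (line, Map(line -> 1L)))

def reduceAdd(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
  (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap

def inverseReduce(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
  b.foldLeft(a) { case (m, (k, v)) =>
    val remaining = m.getOrElse(k, 0L) - v
    if (remaining <= 0L) m - k else m.updated(k, remaining)
  }

val windowed = pairs.reduceByKeyAndWindow(
  reduceAdd _, inverseReduce _, Seconds(300), Seconds(10), 4,
  filterFunc = { case (_, counts) => counts.nonEmpty })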
Regards,
Bryan Jeffrey
Excuse me - I should have mentioned: I am running Spark 1.4.1, Scala 2.11.
I am running in streaming mode receiving data from Kafka.
Regards,
Bryan Jeffrey
On Mon, Feb 1, 2016 at 9:19 PM, Bryan Jeffrey
wrote:
> Hello.
>
> I have a reduceByKeyAndWindow function with an invertable fun
From within a Spark job you can use a periodic listener:
ssc.addStreamingListener(PeriodicStatisticsListener(Seconds(60)))

class PeriodicStatisticsListener(timePeriod: Duration) extends StreamingListener {
  private val logger = LoggerFactory.getLogger("Application")

  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    // e.g. log the record count and processing delay for the completed batch
    logger.info(s"Batch completed: ${batchCompleted.batchInfo.numRecords} records, " +
      s"processing delay ${batchCompleted.batchInfo.processingDelay.getOrElse(-1L)} ms")
  }
}
Arko,
Check this out: https://github.com/Microsoft/SparkCLR
This is a Microsoft-authored C# language binding for Spark.
Regards,
Bryan Jeffrey
On Tue, Feb 9, 2016 at 3:13 PM, Arko Provo Mukherjee <
arkoprovomukher...@gmail.com> wrote:
> Doesn't seem to be supported, but
Could you elaborate on where the issue is? You say calling
model.latestModel.clusterCenters.foreach(println) doesn't show an updated
model, but that is just a single statement to print the centers once.
Also, is there any reason you don't predict on the test data like this?
model.predictOnValues(t
Can you share more of your code to reproduce this issue? The model should
be updated with each batch, but can't tell what is happening from what you
posted so far.
On Fri, Feb 19, 2016 at 10:40 AM, krishna ramachandran
wrote:
> Hi Bryan
> Agreed. It is a single statement to print
2, 4, 6226.40232139]
>>
>> [1, 2, 785.84266]
>>
>> [5, 1, 6706.05424139]
>>
>> and monitor. Please let me know if I missed something.
>>
>> Krishna
>>
>> On Fri
Using flatMap on a string will treat it as a sequence of characters, which is
why you are getting an RDD of Char. I think you just want to do a map instead,
like this:
val timestamps = stream.map(event => event.getCreatedAt.toString)
On Feb 25, 2016 8:27 AM, "Dominik Safaric" wrote:
> Recently, I've implemen
I'm not exactly sure how you would like to set up your LDA model, but I
noticed there was no Python example for LDA in Spark. I created this issue
to add it https://issues.apache.org/jira/browse/SPARK-13500. Keep an eye
on this if it could be of help.
bryan
On Wed, Feb 24, 2016 at 8:
ek.mis...@xerox.com> wrote:
> Hello Bryan,
>
>
>
> Thank you for the update on the JIRA. I took your code and tried it with
> mine, but I get an error with the vector being created. Please see my code
> below and advise.
>
> My input file has some conte
Hello.
Is there a suggested method and/or some example code to write results from
a Spark streaming job back to Kafka?
I'm using Scala and Spark 1.4.1.
Regards,
Bryan Jeffrey
Nukunj,
No, I'm not calling setMaster at all. This ended up being a foolish
configuration problem with my slaves file.
Regards,
Bryan Jeffrey
On Fri, Sep 25, 2015 at 11:20 PM, N B wrote:
> Bryan,
>
> By any chance, are you calling SparkConf.setMaster("loc
cessGivenRole'. Looking at the method calls, the function
that is called appears to be the same. I was hoping an example might
shed some light on the issue.
Regards,
Bryan Jeffrey
On Thu, Oct 8, 2015 at 7:04 AM, Aniket Bhatnagar wrote:
> Here is an example:
>
>
> interval).map(n => (n % interval, n / interval))
> val counts = eventsStream.map(event => {
> (event.timestamp - event.timestamp % interval, event)
> }).updateStateByKey[Long](PrintEventCountsByInterval.counter _, new
> HashPartitioner(3), initialRDD = initialRDD)
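For reference, a self-contained sketch of the same updateStateByKey-with-initialRDD
overload (word-count state; the host, port, checkpoint path, and partition count
are placeholders):

import org.apache.spark.{HashPartitioner, SparkConf}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("initial-state-sketch")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("/tmp/checkpoint")

// Seed state: counts known before the stream starts.
val initialRDD = ssc.sparkContext.parallelize(Seq(("hello", 1L), ("world", 1L)))

val updateFunc = (values: Seq[Long], state: Option[Long]) =>
  Some(values.sum + state.getOrElse(0L))

val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map(word => (word, 1L))
  .updateStateByKey[Long](updateFunc, new HashPartitioner(2), initialRDD)

counts.print()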
All,
I'm seeing the following error compiling Spark 1.4.1 w/ Scala 2.11 & Hive
support. Any ideas?
mvn -Dhadoop.version=2.6.1 -Dscala-2.11 -DskipTests -Pyarn -Phive
-Phive-thriftserver package
[INFO] Spark Project Parent POM .. SUCCESS [4.124s]
[INFO] Spark Launcher Proje
All,
The error resolved to a bad version of jline being pulled from Maven. The jline
version is defined as 'scala.version' -- the 2.11 version does not exist in
Maven. Instead the following should be used:
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>jline</artifactId>
  <version>2.11.0-M3</version>
</dependency>
Regards,
Bry
me to a persistent Hive table accomplished? Has
anyone else run into the same issue?
Regards,
Bryan Jeffrey
of the Spark documentation, but do not see the version
specified anywhere - it would be a good addition.
Thank you,
Bryan Jeffrey
straightforward way to write to partitioned tables using Spark
SQL? I understand that the read performance for partitioned data is far
better - are there other performance improvements that might be better to
use instead of partitioning?
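For reference, the most direct route I am aware of is DataFrameWriter.partitionBy -
a sketch (column names and the output path are made up; whether this meets the
Hive-table requirement depends on your setup):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

val sc = new SparkContext(new SparkConf().setAppName("partitioned-write-sketch"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val events = Seq(("2015-10-28", "a"), ("2015-10-29", "b")).toDF("eventDate", "id")

events.write
  .mode(SaveMode.Append)
  .partitionBy("eventDate") // creates eventDate=... subdirectories
  .parquet("/warehouse/events")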
Regards,
Bryan Jeffrey
luded in the data I get odd string
conversion issues
(3) When partitioning without maps I see frequent out of memory issues
I'll update this email when I've got a more concrete example of problems.
Regards,
Bryan Jeffrey
On Wed, Oct 28, 2015 at 1:33 PM, Susan Zhang w
d this happens every time. Is this a
known issue? Is there a workaround?
Regards,
Bryan Jeffrey
On Wed, Oct 28, 2015 at 3:13 PM, Bryan Jeffrey
wrote:
> Susan,
>
> I did give that a shot -- I'm seeing a number of oddities:
>
> (1) 'Partition By' appears onl
.hadoop.hive.serde2.MetadataTypedColumnsetSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
This seems like a pretty big bug associated with persistent tables. Am I
missing a step somewhere?
Thank you,
Bryan Jeffrey
On Wed,
Deenar,
This worked perfectly - I moved to SQL Server and things are working well.
Regards,
Bryan Jeffrey
On Thu, Oct 29, 2015 at 8:14 AM, Deenar Toraskar
wrote:
> Hi Bryan
>
> For your use case you don't need to have multiple metastores. The default
> metastore uses embedd
conversion prior to
insertion?
Regards,
Bryan Jeffrey
efresh the table on a periodic basis.
Is there a more straightforward way to do this? Is it possible to register
the Cassandra table with Hive so that the SparkSQL thrift server instance
can just read data directly?
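For concreteness, a sketch of the kind of registration I have in mind, via the
connector's data source (the keyspace and table names are illustrative):

import org.apache.spark.sql.SQLContext

def registerCassandraTable(sqlContext: SQLContext): Unit = {
  sqlContext.sql(
    """CREATE TEMPORARY TABLE detectionresult
      |USING org.apache.spark.sql.cassandra
      |OPTIONS (keyspace "c2", table "detectionresult")""".stripMargin)
}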
Regards,
Bryan Jeffrey
val conf = new SparkConf()
  .set("spark.driver.allowMultipleContexts", "true")
  .setAppName(appName)
  .setMaster(master)
new StreamingContext(conf, Seconds(seconds))
}
Regards,
Bryan Jeffrey
On Wed, Nov 4, 2015 at 9:49 AM, Ted Yu wrote:
> Are you trying to speed up tests w
Obviously the manually calculated fields are correct. However, the
dynamically calculated (string) partition for idAndSource is a random field
from within my case class. I've duplicated this with several other classes
and have seen the same result (I use this example because it's very simple).
Any idea if this is a known bug? Is there a workaround?
Regards,
Bryan Jeffrey
Mohammed,
That is great. It looks like a perfect scenario. Would I be able to make
the created DF queryable over the Hive JDBC/ODBC server?
Regards,
Bryan Jeffrey
On Wed, Nov 11, 2015 at 9:34 PM, Mohammed Guller
wrote:
> Short answer: yes.
>
>
>
> The Spark Cassandra Connect
Yes, I do - I found your example of doing that later in your slides. Thank
you for your help!
On Thu, Nov 12, 2015 at 12:20 PM, Mohammed Guller
wrote:
> Did you mean Hive or Spark SQL JDBC/ODBC server?
>
>
>
> Mohammed
>
>
>
> *From:* Bryan Jeffrey [mailto:bryan
ra OPTIONS (
keyspace "c2", table "detectionresult" );
]Error: java.io.IOException: Failed to open native connection to Cassandra
at {10.0.0.4}:9042 (state=,code=0)
This seems to be connecting to local host regardless of the value I set
spark.cassandra.connection.host to.
Regards,
Br
Answer: In beeline run the following: SET
spark.cassandra.connection.host="10.0.0.10"
On Thu, Nov 12, 2015 at 1:13 PM, Bryan Jeffrey
wrote:
> Mohammed,
>
> While you're willing to answer questions, is there a trick to getting the
> Hive Thrift server to connect to
master
spark://10.0.0.4:7077 --packages
com.datastax.spark:spark-cassandra-connector_2.11:1.5.0-M1 --hiveconf
"spark.cores.max=2" --hiveconf "spark.executor.memory=2g"
Do I perhaps need to include an additional library to do the default
conversion?
Regards,
Bryan Jeffrey
On Th
/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala#L350
-bryan
On Tue, Nov 17, 2015 at 3:06 AM, frula00
wrote:
> Hi,
> I'm working in Java, with Spark 1.3.1 - I am trying to extract data from
> the
> RDD returned by
> org.apache.spark.mllib.clustering.DistributedLD
Hello.
I'm seeing an error creating a Hive Context moving from Spark 1.4.1 to
1.5.2. Has anyone seen this issue?
I'm invoking the following:
new HiveContext(sc) // sc is a Spark Context
I am seeing the following error:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding i