If you are using 127.0.0.1, please check /etc/hosts and either comment out
the 127.0.1.1 entry or name it localhost.
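For illustration only (the host name is the one from the exception quoted below), the relevant /etc/hosts entries would look roughly like this:

127.0.0.1   localhost
127.0.1.1   localhost            # per the advice above: comment this out or name it localhost
127.0.0.1   dhcp-10-35-14-100    # make the failing host name resolvable as well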
On Sat, Mar 21, 2015 at 9:57 AM, Ted Yu wrote:
> bq. Caused by: java.net.UnknownHostException: dhcp-10-35-14-100: Name or
> service not known
>
> Can you check your DNS?
>
> Che
No, I didn't mean local cluster. I mean run in local, like in IDE.
On Mon, 16 Mar 2015 23:12 xu Peng wrote:
> Hi David,
>
> You can try the local-cluster.
>
> the numbers in local-cluster[2,2,1024] mean there are 2 workers, 2 cores
> per worker, and 1024 MB of memory per worker
>
> Best Regards
>
> Peng Xu
>
> 2015-03-16
Hi Sean,
It's getting strange now. If I run from the IDE, my executor memory is always
set to 6.7G, no matter what value I set in code. I have checked my
environment variables, and there's no value of 6.7 or 12.5.
Any idea?
Thanks,
David
On Tue, 17 Mar 2015 00:35 null wrote:
> Hi Xi Shen,
>
> You c
Hey Eason!
Weird problem indeed. More information will probably help to find the issue:
Have you searched the logs for peculiar messages?
What does your Spark environment look like? #workers, #threads, etc.?
Does it work if you create separate receivers for the topics?
Regards,
Jeff
2015-03-21 2:27
Hi,
I'm not completely sure about this either, but this is what we are doing
currently:
Configure your logging to write to STDOUT, not to a file explicitly. Spark
will capture stdout and stderr and separate the messages into an app/driver
folder structure in the configured worker directory.
We the
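In case it helps, a minimal log4j.properties along those lines (assuming the standard log4j 1.x setup that Spark ships with) might look like:

# write everything to stdout; the worker captures it per app/driver
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n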
bq. Requesting 1 new executor(s) because tasks are backlogged
1 executor was requested.
Which hadoop release are you using ?
Can you check resource manager log to see if there is some clue ?
Thanks
On Fri, Mar 20, 2015 at 4:17 PM, Manoj Samel
wrote:
> Forgot to add - the cluster is idle othe
hey mike!
you'll definitely want to increase your parallelism by adding more shards to
the stream - as well as spinning up 1 receiver per shard and unioning all the
shards per the KinesisWordCount example that is included with the kinesis
streaming package.
you'll need more cores (cluster) or t
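A rough sketch of the receiver-per-shard + union pattern described above (the stream name, endpoint, and shard count are placeholders, an existing SparkContext sc is assumed, and the exact KinesisUtils.createStream signature depends on the Spark version):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

val ssc = new StreamingContext(sc, Seconds(2))
val numShards = 4  // placeholder: use your stream's actual shard count

// one receiver (input DStream) per shard, then union them into a single DStream
val shardStreams = (0 until numShards).map { _ =>
  KinesisUtils.createStream(ssc, "myStream", "https://kinesis.us-east-1.amazonaws.com",
    Seconds(2), InitialPositionInStream.LATEST, StorageLevel.MEMORY_AND_DISK_2)
}
val unifiedStream = ssc.union(shardStreams)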
If you are running from your IDE, then I don't know what you are
running or in what mode. The discussion here concerns using standard
mechanisms like spark-submit to configure executor memory. Please try
these first instead of trying to directly invoke Spark, which will
require more understanding o
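For reference, the standard spark-submit flags for this look roughly like the following (all values are placeholders):

# --master can be local[*], spark://host:7077, yarn, etc.
spark-submit \
  --master local[4] \
  --driver-memory 4g \
  --executor-memory 4g \
  --class com.example.MyApp \
  my-app.jar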
Mike:
Once hadoop 2.7.0 is released, you should be able to enjoy the enhanced
performance of s3a.
See HADOOP-11571
Cheers
On Sat, Mar 21, 2015 at 8:09 AM, Chris Fregly wrote:
> hey mike!
>
> you'll definitely want to increase your parallelism by adding more shards
> to the stream - as well as s
Hi,
I wonder if someone can help suggest a solution to my problem, I had a simple
process working using Strings and now
want to convert to RDD[Char], the problem is when I end up with a nested call
as follow:
1) Load a text file into an RDD[Char]
val inputRDD = sc.textFile("myFile.txt
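Not having the rest of the snippet, one possible way to go from the loaded lines to an RDD[Char] is to flatMap each line into its characters, e.g.:

// note: textFile strips newlines, so they are not present in the result
val charRDD: org.apache.spark.rdd.RDD[Char] = sc.textFile("myFile.txt").flatMap(_.toSeq)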
Hi,
Apologies for the generic question.
I am developing predictive models for the first time, and the model will
be deployed in production very soon.
Could somebody help me with model deployment in production? I have
read quite a bit on model deployment and have read some books on Datab
I am consistently running into this ArrayIndexOutOfBoundsException issue
when using trainImplicit. I have tried changing the partitions and
switching to JavaSerializer. But they don't seem to help. I see that this
is the same as https://issues.apache.org/jira/browse/SPARK-3080. My lambda
is 0.01, r
Is there a module in spark streaming that lets you listen to
the alerts/conditions as they happen in the streaming module? Generally
spark streaming components will execute on large set of clusters like hdfs
or Cassandra, however when it comes to alerting you generally can't send it
directly from t
1. make sure your secret key doesn't have a "/" in it. If it does, generate a
new key.
2. jets3t and hadoop JAR versions need to be in sync; jets3t 0.9.0 was picked
up in Hadoop 2.4 and not AFAIK
3. Hadoop 2.6 has a new S3 client, "s3a", which is compatible with s3n data. It
uses the AWS toolkit
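A short sketch of wiring the credentials into the Hadoop configuration from Spark (the key values and bucket path are placeholders; the property names are the standard s3n/s3a ones):

// s3n (jets3t-based)
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

// s3a (Hadoop 2.6+, AWS SDK based)
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

val lines = sc.textFile("s3n://my-bucket/path/to/data")  // or s3a://...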
Thank you for your help Akhil! We found that, starting from version 1.2.0
and beyond (v1.1.1 and below are fine), we can no longer connect remotely
from our laptop to the remote Spark cluster, although it still works if the
client is on the cluster itself. Not sure if this is related
th
I believe that you can get what you want by using HiveQL instead of the
pure programmatic API. This is a little verbose, so perhaps a specialized
function would also be useful here. I'm not sure I would call it
saveAsExternalTable as there are also "external" spark sql data source
tables that have
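To illustrate the HiveQL route (the table name, location, and schema below are made up; this assumes a HiveContext named sqlContext and a registered temp table "myData"):

sqlContext.sql("""
  CREATE EXTERNAL TABLE my_external_table (key INT, value STRING)
  STORED AS PARQUET
  LOCATION '/user/hive/warehouse/my_external_table'
""")
sqlContext.sql("INSERT INTO TABLE my_external_table SELECT key, value FROM myData")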
>
> Now, I am not able to directly use my RDD object and have it implicitly
> become a DataFrame. It can be used as a DataFrameHolder, of which I could
> write:
>
> rdd.toDF.registerTempTable("foo")
>
The rationale here was that we added a lot of methods to DataFrame and made
the implicits more
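For context, the Spark 1.3 pattern looks roughly like this (the case class and data are placeholders; the implicits come from the SQLContext):

// in the spark-shell or with a top-level case class
import sqlContext.implicits._

case class Record(id: Int, name: String)
val rdd = sc.parallelize(Seq(Record(1, "a"), Record(2, "b")))

val df = rdd.toDF()       // explicit conversion replaces the old implicit one
df.registerTempTable("foo")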
I have a couple of data frames that I pulled from SparkSQL and the primary
key of one is a foreign key of the same name in the other. I'd rather not
have to specify each column in the SELECT statement just so that I can
rename this single column.
When I try to join the data frames, I get an excep
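One possible workaround, assuming the shared key column is called "id" and the two frames are df1 and df2 (the other column names are placeholders): alias the frames so the key reference is unambiguous, then keep a single copy of it in the projection:

import org.apache.spark.sql.functions.col

val joined = df1.as("a").join(df2.as("b"), col("a.id") === col("b.id"))
val result = joined.select(col("a.id").as("id"), col("a.x"), col("b.y"))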
In the log, I saw
MemoryStorage: MemoryStore started with capacity 6.7GB
But I still cannot find where to set this storage capacity.
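In case it is useful: when everything runs in one local JVM (as it does from the IDE), the MemoryStore capacity is derived from that JVM's heap rather than from the executor memory setting; with the 1.x defaults it is roughly spark.storage.memoryFraction (0.6) * safety fraction (0.9) * max heap, which would put 6.7 GB at about a 12.5 GB heap. So the knob is the heap itself, for example a VM option in the IDE run configuration:

# IDE run configuration -> VM options (value is a placeholder)
-Xmx2g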
On Sat, 21 Mar 2015 20:30 Xi Shen wrote:
> Hi Sean,
>
> It's getting strange now. If I run from the IDE, my executor memory is always
> set to 6.7G, no matter wha
Yeah, I think it is harder to troubleshoot the properties issues in an IDE.
But the reason I stick to the IDE is that if I use spark-submit, the BLAS
native cannot be loaded. Maybe I should open another thread to discuss
that.
Thanks,
David
On Sun, 22 Mar 2015 10:38 Xi Shen wrote:
> In the log, I
Hi,
I use the *OpenBLAS* DLL, and have configured my application to work in the
IDE. When I start my Spark application from the IntelliJ IDE, I can see in the
log that the native lib is loaded successfully.
But if I use *spark-submit* to start my application, the native lib still
cannot be loaded. I saw th
Hi,
Does anyone have concrete recommendations on how to reduce Spark's logging
verbosity?
We have attempted on several occasions to address this by setting various
log4j properties, both in configuration property files and in
$SPARK_HOME/conf/spark-env.sh; however, all of those attempts have failed.
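For reference, the commonly suggested approach (assuming the standard setup) is to copy $SPARK_HOME/conf/log4j.properties.template to $SPARK_HOME/conf/log4j.properties and raise the levels there, e.g.:

# $SPARK_HOME/conf/log4j.properties
log4j.rootCategory=WARN, console
# quiet down particularly chatty components further
log4j.logger.org.apache.spark=WARN
log4j.logger.org.eclipse.jetty=ERROR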
Can you try the --driver-library-path option ?
spark-submit --driver-library-path /opt/hadoop/lib/native ...
Cheers
On Sat, Mar 21, 2015 at 4:58 PM, Xi Shen wrote:
> Hi,
>
> I use the *OpenBLAS* DLL, and have configured my application to work in
> IDE. When I start my Spark application from In
Hello,
I am trying to install Spark 1.3.0 on my mac. Earlier, I was working with
Spark 1.1.0. Now I come across this error:
sbt.ResolveException: unresolved dependency:
org.apache.spark#spark-network-common_2.10;1.3.0: configuration not public
in org.apache.spark#spark-network-common_2.10;1.3.0
bq. the BLAS native cannot be loaded
Have you tried specifying --driver-library-path option ?
Cheers
On Sat, Mar 21, 2015 at 4:42 PM, Xi Shen wrote:
> Yeah, I think it is harder to troubleshoot the properties issues in an IDE.
> But the reason I stick to the IDE is that if I use spark-submit, the
Hi Shashidhar,
Our team at PredictionIO is trying to solve the production deployment of
model. We built a powered-by-Spark framework (also certified on Spark by
Databricks) that allows a user to build models with everything available
from the Spark API, persist the model automatically with version
Hi,
I have two big RDDs, and I need to do some math on each pair of elements from
them. Traditionally this would be a nested for-loop, but with RDDs it causes a
nested RDD, which is prohibited.
Currently I am collecting one of them and then doing a nested for-loop to
avoid the nested RDD, but I would like to know if t
You can do this with the 'cartesian' product method on RDD. For example:
val rdd1 = ...
val rdd2 = ...
val combinations = rdd1.cartesian(rdd2).filter{ case (a,b) => a < b }
Reza
On Sat, Mar 21, 2015 at 10:37 PM, Xi Shen wrote:
> Hi,
>
> I have two big RDD, and I need to do some math against e
What do you mean by "not distinct"?
It does work for me:
Code:
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkContext, SparkConf}
val ssc = new StreamingContext(sc, Seconds(1))
val data =
ssc.textFileStream("/home/akhld/mobi/loca