Hi,
Quick question on data type transformation when creating an RDD object. I want to create a Person object with "name" and DOB (date of birth):
case class Person(name: String, DOB: java.sql.Date)
Then I want to create an RDD from a text file without the header, e.g. "name"
and "DOB". I have pr
Hi,
Can we configure Spark to enable SSE (Server-Side Encryption) when saving files
to S3?
much appreciated!
thanks
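One hedged possibility, assuming a Hadoop version whose S3 connector supports server-side encryption (the property below is the s3n setting; the s3a analogue is fs.s3a.server-side-encryption-algorithm):

// Untested sketch: ask the S3 connector to use SSE-S3 (AES256) when writing.
sc.hadoopConfiguration.set("fs.s3n.server-side-encryption-algorithm", "AES256")
val rdd = sc.parallelize(1 to 10).map(_.toString)
rdd.saveAsTextFile("s3n://mybucket/encrypted-output/")  // hypothetical bucket/path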
Hi,
I have a question about the number of workers Spark uses to parallelize the
loading of files with sc.textFile. When I use sc.textFile to access multiple
files in AWS S3, it seems to use only 2 workers regardless of how many
worker nodes I have in my cluster. So how does Spark confi
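A hedged aside on what may be happening: the number of tasks comes from the input splits, not from the cluster size, and sc.textFile's default minPartitions is math.min(defaultParallelism, 2), which would match the "only 2" observation. You can pass a larger hint:

// The second argument is a minimum-partitions hint (paths hypothetical).
val lines = sc.textFile("s3n://mybucket/myfolder/", 64)
// Note: non-splittable codecs (e.g. gzip) still yield one partition per file.
println(lines.partitions.length)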
Hi Xiangrui,
For the following problem, I found an issue ticket you posted earlier:
https://issues.apache.org/jira/browse/HADOOP-10614
I wonder whether this has been fixed in Spark 1.5.2, as I believe it has. Any
suggestions on how to fix it?
Thanks
Hao
From: Lin, Hao [mailto:hao@finra.org
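HADOOP-10614 describes CBZip2InputStream not being thread-safe, so a hedged mitigation until a fixed Hadoop is in place is to avoid decompressing bz2 streams concurrently within one executor JVM, for example:

// Hypothetical mitigation: one task per executor JVM so bz2 streams are not
// decoded concurrently (equivalent to passing --executor-cores 1 to spark-submit).
val conf = new org.apache.spark.SparkConf().set("spark.executor.cores", "1")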
Hi Robert,
I just use textFile. Here is the simple code:
val fs3File = sc.textFile("s3n://my bucket/myfolder/")
fs3File.count
Do you suggest I use sc.parallelize instead?
many thanks
From: Robert Collich [mailto:rcoll...@gmail.com]
Sent: Monday, February 01, 2016 6:54 PM
To: Lin,
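A hedged note on the parallelize question: sc.parallelize distributes a driver-side in-memory collection and does not read files, so it would only apply after collecting the data locally. For files, textFile plus a repartition is the usual route (paths hypothetical):

// parallelize is for driver-side collections, not files:
val small = sc.parallelize(Seq("a", "b", "c"))
// for files, reshuffle into more partitions after reading:
val fs3File = sc.textFile("s3n://mybucket/myfolder/").repartition(32)
fs3File.count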
If you look at the Spark docs, the variable SPARK_WORKER_INSTANCES can still be
specified, yet the SPARK_EXECUTOR_INSTANCES
http://spark.apache.org/docs/1.5.2/spark-standalone.html
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Monday, February 01, 2016 5:45 PM
To: Lin, Hao
Cc: user
Subject
When I try to read multiple bz2 files from S3, I get the following warning
messages. What is the problem here?
16/02/01 22:30:30 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
10.162.67.248): java.lang.ArrayIndexOutOfBoundsException: -1844424343
at
org.apache.hadoop.io.compr
Can I still use SPARK_WORKER_INSTANCES in conf/spark-env.sh? The following is
what I got after setting this parameter and running spark-shell:
SPARK_WORKER_INSTANCES was detected (set to '32').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --num-executors to sp
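A hedged sketch of where the truncated warning is heading: specify the executor count on the spark-submit command line, or set the equivalent property (which, as far as I know, is honored on YARN; in standalone mode SPARK_WORKER_INSTANCES governed workers per machine instead):

// On the command line: spark-submit --num-executors 32 ...
// Or programmatically, assuming your cluster manager honors it:
val conf = new org.apache.spark.SparkConf().set("spark.executor.instances", "32")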
To: Lin, Hao
Cc: user@spark.apache.org
Subject: Re: how to access local file from Spark sc.textFile("file:///path
to/myfile")
Hm, are you referencing a local file from your remote workers? That won't work,
as the file only exists on one machine (I presume).
On Fri, Dec 11, 2015 at 5:
Yes to your question. I spun up a cluster, logged in to the master as the root
user, ran spark-shell, and referenced a local file on the master machine.
From: Vijay Gharge [mailto:vijay.gha...@gmail.com]
Sent: Friday, December 11, 2015 12:50 PM
To: Lin, Hao
Cc: user@spark.apache.org
Subject: Re
Here you go, thanks.
-rw-r--r-- 1 root root 658M Dec 9 2014 /root/2008.csv
From: Vijay Gharge [mailto:vijay.gha...@gmail.com]
Sent: Friday, December 11, 2015 12:31 PM
To: Lin, Hao
Cc: user@spark.apache.org
Subject: Re: how to access local file from Spark sc.textFile("file:///path
to/m
Hi,
I have a problem accessing a local file, for example:
sc.textFile("file:///root/2008.csv").count()
fails with the error: File file:/root/2008.csv does not exist.
The file clearly exists, since if I mistype the file name as a non-existing
one, it shows instead:
Error: Input path does not exi
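A hedged workaround sketch, tying this back to the reply above: a file:// path is resolved on every worker, so the file must exist at that same path on every node. Options include copying it to all nodes, or putting it on shared storage first (paths hypothetical):

// Option 1: place the same /root/2008.csv on every worker node.
// Option 2: upload the file to shared storage (HDFS, S3), then read it there:
val count = sc.textFile("hdfs:///data/2008.csv").count()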
Hi Andy, quick question: does Spark-Notebook include its own Spark engine, or do
I need to install Spark separately and point Spark Notebook to it? thanks
From: Lin, Hao [mailto:hao@finra.org]
Sent: Tuesday, December 08, 2015 7:01 PM
To: andy petrella; Jörn Franke
Cc: user@spark.apache.org
Thanks Andy, I will certainly give your suggestion a try.
From: andy petrella [mailto:andy.petre...@gmail.com]
Sent: Tuesday, December 08, 2015 1:21 PM
To: Lin, Hao; Jörn Franke
Cc: user@spark.apache.org
Subject: Re: Graph visualization tool for GraphX
Hello Lin,
This is indeed a tough
specific ☺. Thanks
hao
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: Tuesday, December 08, 2015 11:31 AM
To: Lin, Hao
Cc: user@spark.apache.org
Subject: Re: Graph visualization tool for GraphX
I am not sure about your use case. How should a human interpret many terabytes
of data in one large
Hi,
Can anyone recommend a good graph visualization tool for GraphX that can
handle truly large data (~ TB)?
Thanks so much
Hao
Thanks, I will keep an eye on it.
From: Michal Klos [mailto:michal.klo...@gmail.com]
Sent: Friday, December 04, 2015 1:50 PM
To: Lin, Hao
Cc: user
Subject: Re: Is Temporary Access Credential (AccessKeyId, SecretAccessKey +
SecurityToken) support by Spark?
We were looking into this as well
Hi,
Does anyone know whether Spark running in AWS supports temporary access
credentials (AccessKeyId, SecretAccessKey + SecurityToken) for accessing S3? I
only see references to specifying fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey,
with no mention of a security token. Apparently this is only
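A hedged sketch for later readers: newer Hadoop s3a connectors (2.8+, if I recall correctly) accept a session token through TemporaryAWSCredentialsProvider; the property names below come from that connector and may not exist in older builds:

val hc = sc.hadoopConfiguration
hc.set("fs.s3a.access.key", "<AccessKeyId>")       // placeholders, not real keys
hc.set("fs.s3a.secret.key", "<SecretAccessKey>")
hc.set("fs.s3a.session.token", "<SecurityToken>")
hc.set("fs.s3a.aws.credentials.provider",
  "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")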
I actually don't have the folder /tmp/hive created on my master node; is that a
problem?
From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: Wednesday, December 02, 2015 5:40 PM
To: Lin, Hao; user@spark.apache.org
Subject: RE: starting spark-shell throws /tmp/hive on HDFS should be wri
Mich, did you run this locally or on EC2 (I use EC2)? Is this problem
universal, or specific to, say, EC2? Many thanks
From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: Wednesday, December 02, 2015 5:01 PM
To: Lin, Hao; user@spark.apache.org
Subject: RE: starting spark-shell throws /tmp
I have the same problem on my side using version 1.5.0. Just wondering if
anyone has any update on this. Should I go back to an earlier version, like
1.4.0?
thanks
From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: Friday, November 20, 2015 5:43 PM
To: user@spark.apache.org
Subject: FW
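For later readers, a hedged note: this error usually means the /tmp/hive scratch directory (local or on HDFS, depending on configuration) is not writable by the user running spark-shell. A commonly cited fix is to relax its permissions:

hdfs dfs -chmod -R 777 /tmp/hive   # if the scratch dir is on HDFS
chmod -R 777 /tmp/hive             # if it is the local directory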
It seems that the data size is only 2.9 MB, far less than the default RDD
size. How about putting more data into Kafka? And what about the number of
topic partitions in Kafka?
Best regards,
Lin Hao XU
IBM Research China
Email: xulin...@cn.ibm.com
My Flickr: http://www.flickr.com/photos/xulinhao
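To make the partition point concrete, a hedged sketch against the Spark 1.x direct-stream API (broker address and topic are placeholders, and an existing StreamingContext ssc is assumed): each RDD partition maps 1:1 to a Kafka topic partition, so more topic partitions means more parallel tasks.

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc,                                           // existing StreamingContext
  Map("metadata.broker.list" -> "broker:9092"),  // placeholder broker
  Set("mytopic"))                                // placeholder topic
// Each batch RDD has one partition per Kafka topic partition:
stream.foreachRDD(rdd => println(rdd.partitions.length))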
For your question, I think the discussion at this link can help:
http://apache-spark-user-list.1001560.n3.nabble.com/Error-related-to-serialisation-in-spark-streaming-td6801.html
Best regards,
Lin Hao XU
IBM Research China
Email: xulin...@cn.ibm.com
My Flickr: http://www.flickr.com/photos
BTW, in the Spark web UI, the ACL is marked as root.
Best regards,
Lin Hao XU
IBM Research China
Email: xulin...@cn.ibm.com
My Flickr: http://www.flickr.com/photos/xulinhao/sets
From: Dean Wampler
To: Lin Hao Xu/China/IBM@IBMCN
Cc: Hai Shan Wu/China/IBM@IBMCN, user
Date: 2015/04
Actually, to simplify the problem, we run our program on a single machine
with 4 slave workers. Since everything is on a single machine, I think all the
slave workers run with root privileges.
BTW, if we have a cluster, how do we make sure the slaves on remote machines
run the program as root?
Best regards,
Lin Hao XU
3. We also tested List<PcapNetworkInterface> nifs = Pcaps.findAllDevs() in
a standard Java program, and it worked like a champion.
Best regards,
Lin Hao XU
IBM Research China
Email: xulin...@cn.ibm.com
My Flickr: http://www.flickr.com/photos/xulinhao/sets
From: Dean Wampler
To: Hai Shan Wu/China/IBM@IBMCN