Hello, I am a Spark user. I use the "spark-shell.cmd" startup command in
Windows cmd. The first startup is normal, but after I use "ctrl+c" to
force-close the Spark window, it can't start normally again. The error
message is as follows:
Hi All,
I am new to Spark; I just want to know which technology is good/best for
learning Spark:
1) Scala 2) Java 3) Python
I know Spark supports all 3 languages, but which one is best?
Thanks su
language in the production environment.
Learning Scala requires some time. If you're very comfortable with Java or
Python, you can go with that while familiarizing yourself with Scala at the
same time.
Cheers
On Thu, Jun 25, 2015 at 12:04 PM, spark user wrote:
.jets3t.service.S3ServiceException: S3 HEAD request failed for
'/user%2Fdidi' - ResponseCode=400, ResponseMessage=Bad Request
What does the user have to do here? I am using key & secret!
How can I simply create an RDD from a text file on S3?
Thanks
Didi
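A minimal sketch of reading an S3 text file into an RDD, assuming the Hadoop
s3a connector and placeholder bucket/credentials (the thread above used
jets3t, so the exact property names may differ):

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: read a text file from S3 into an RDD. The bucket name and
// credentials are placeholders, not values from the original message.
val sc = new SparkContext(new SparkConf().setAppName("S3ReadExample"))

// Hand the key and secret to the Hadoop S3A connector.
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

val rdd = sc.textFile("s3a://your-bucket/user/didi/input.txt")
println(rdd.count())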
Hi, can you help me with how to load data from an S3 bucket to Redshift? If
you have sample code, can you please send it to me?
Thanks su
Hope this helps, best, /Shahab
On Wed, Jul 8, 2015 at 12:57 AM, spark user wrote:
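A sketch of the usual route with the spark-redshift connector; the JDBC URL,
table name, bucket, and credentials are placeholder assumptions:

// Load files from S3 into a DataFrame, then write it to Redshift via the
// spark-redshift connector (com.databricks:spark-redshift). Assumes an
// existing sqlContext, e.g. from spark-shell.
val df = sqlContext.read.json("s3n://your-bucket/input/")

df.write
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://host:5439/db?user=USER&password=PASS")
  .option("dbtable", "my_table")
  .option("tempdir", "s3n://your-bucket/tmp") // staging area for COPY
  .mode("append")
  .save()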
Hi All,
To start a new project in Spark, which technology is better: Java 8 or Scala?
I am a Java developer. Can I start with Java 8, or do I need to learn Scala?
Which one is the better technology for quickly starting a POC project?
Thanks
- su
Does DataFrame support nested JSON, to dump directly to a database?
For simple JSON it works fine:
{"id":2,"name":"Gerald","email":"gbarn...@zimbio.com","city":"Štoky","country":"Czech
Republic","ip":"92.158.154.75"},
But for nested JSON it fails to load:
root |-- rows: array (nullable = true)
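A sketch of flattening a nested array column with explode() before writing to
a flat table; the file path and the field names ("rows", "id", "name") are
assumptions for illustration:

import org.apache.spark.sql.functions.{col, explode}

// Flatten a nested JSON array before writing to a database. Assumes an
// existing sqlContext; path and field names are illustrative.
val df = sqlContext.read.json("examples/nested.json")
df.printSchema() // e.g. root |-- rows: array (nullable = true)

// explode() turns each array element into its own row, yielding a flat
// schema that maps cleanly onto a relational table.
val flat = df
  .select(explode(col("rows")).as("row"))
  .select(col("row.id"), col("row.name"))
flat.show()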
I struggled a lot in Scala, almost 10 days with no improvement, but when I
switched to Java 8 things were so smooth, and I used DataFrame with Redshift
and Hive and all are looking good. If you are very good in Scala then go with
Scala; otherwise Java is the best fit.
This is just my opinion, because I am a Java developer.
Hi Roberto,
I have a question regarding HiveContext.
When you create a HiveContext, where do you define the Hive connection
properties?
Suppose Hive is not on the local machine and I need to connect to it; how will
HiveContext know the database info like URL, username, and password?
String username = "";
String password = "";
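A sketch of the usual answer: HiveContext takes no URL, username, or password
directly; it reads the metastore configuration from hive-site.xml on the
classpath, or you can point it at a remote metastore explicitly. The thrift
URI below is a placeholder:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// HiveContext has no explicit connection parameters; it picks up the
// metastore location from hive-site.xml, or from a config set by hand.
val sc = new SparkContext(new SparkConf().setAppName("HiveExample"))
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.metastore.uris", "thrift://metastore-host:9083")

hiveContext.sql("SHOW TABLES").show()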
Hi all,
Can we create a DataFrame from an Excel sheet or a CSV file? In the example
below it seems they support only JSON:
DataFrame df =
sqlContext.read().json("examples/src/main/resources/people.json");
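CSV works through the spark-csv package on Spark 1.x (and is built in as
.csv() from Spark 2.0); Excel has no built-in source, so exporting to CSV is
the usual route. A sketch with an illustrative path and options:

// Read a CSV file into a DataFrame via the spark-csv package
// (com.databricks:spark-csv). Path and options are illustrative.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // first line holds column names
  .option("inferSchema", "true") // guess column types from the data
  .load("examples/src/main/resources/people.csv")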
case class Record(keyAttr: String, attr1: String, attr2: String, attr3:
String)
val ds = sparkSession.createDataset(rdd).as[Record]
val attr1Counts = ds.groupBy("keyAttr", "attr1").count()
val attr2Counts = ds.groupBy("keyAttr", "attr2").count()
val attr3Counts = ds.groupBy("keyAttr", "attr3").count()
Hi All,
I'm trying to create a Dataset from an RDD and do groupBy on the Dataset. The
groupBy stage runs with 200 partitions, although the RDD had 5000 partitions.
I also seem to have no way to change those 200 partitions on the Dataset to
some other, larger number. This seems to be affecting the parallelism.
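The 200 comes from spark.sql.shuffle.partitions, which defaults to 200 for all
Dataset/DataFrame shuffles regardless of the input RDD's partitioning. A
sketch of raising it, reusing the names from the snippet above:

// Raise the shuffle parallelism for Dataset/DataFrame operations
// (groupBy, join, ...); 5000 mirrors the RDD partitioning mentioned above.
sparkSession.conf.set("spark.sql.shuffle.partitions", "5000")

// Subsequent shuffles now produce 5000 partitions.
val attr1Counts = ds.groupBy("keyAttr", "attr1").count()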
Hi All,
I have a UDAF that seems to perform poorly when its input is skewed. I have
been debugging the UDAF implementation, but I don't see any code that is
causing the performance to degrade. More details on the data and the
experiments I have run:
DataSet: Assume 3 columns, column1 being the key
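Not a diagnosis of this particular UDAF, but a common mitigation for skewed
aggregation keys is two-stage aggregation with a salt. The sketch below uses
sum as a stand-in for the UDAF (the trick requires an aggregate whose partial
results can be recombined); df, column1, and column2 are assumptions:

import org.apache.spark.sql.functions.{col, floor, rand, sum}

// Salt the key so the hot key is spread over up to 16 tasks.
val salted = df.withColumn("salt", floor(rand() * 16))

// Stage 1: partial aggregates per (key, salt).
val partial = salted.groupBy(col("column1"), col("salt"))
  .agg(sum(col("column2")).as("partialSum"))

// Stage 2: recombine the partials per original key.
val result = partial.groupBy(col("column1"))
  .agg(sum(col("partialSum")).as("total"))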
Trying again. Hoping to find some help in figuring out the performance
bottleneck we are observing.
Thanks,
Bharath
On Sun, Oct 30, 2016 at 11:58 AM, Spark User wrote:
Hi All,
It seems like the heap usage of
org.apache.spark.deploy.yarn.ApplicationMaster keeps growing continuously,
and the driver eventually crashes with an OOM.
More details:
I have a Spark Streaming app that runs on Spark 2.0. spark.driver.memory is
10G and spark.yarn.driver.memoryOverhead is 204
one has solved a similar problem.
Thanks,
Bharath
On Mon, Oct 31, 2016 at 11:40 AM, Spark User wrote:
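Not a confirmed diagnosis for this thread, but one frequent cause of steadily
growing driver heap in long-running streaming apps is the per-job and
per-stage UI/listener state the driver retains. These settings cap that
bookkeeping; the values are illustrative:

import org.apache.spark.SparkConf

// Each retained-* setting defaults to 1000; lowering them bounds the
// driver-side bookkeeping for long-running streaming apps.
val conf = new SparkConf()
  .set("spark.ui.retainedJobs", "200")
  .set("spark.ui.retainedStages", "200")
  .set("spark.streaming.ui.retainedBatches", "100")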
Spark has more support for Scala; by that I mean more APIs are available
for Scala compared to Python or Java. Also, Scala code is more concise
and easier to read. Java is very verbose.
On Thu, Feb 9, 2017 at 10:21 PM, Irving Duran wrote:
> I would say Java, since it will be somewhat similar t
My take on the 2-3 tasks per CPU core is that you want to ensure you are
utilizing the cores to the max, which means it will help you with scaling
and performance. The question would be why not 1 task per core? The reason
is that you can probably get a good handle on the average execution time
per
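A sketch of turning that rule of thumb into a partition count; the cluster
numbers are illustrative assumptions:

import org.apache.spark.SparkConf

// 2-3 tasks per core: size the partition count from the cluster shape.
val executorCount    = 10
val coresPerExecutor = 4
val tasksPerCore     = 3
val targetPartitions = executorCount * coresPerExecutor * tasksPerCore // 120

val conf = new SparkConf()
  .set("spark.default.parallelism", targetPartitions.toString)

// Or per shuffle, assuming `rdd` is an RDD[(String, Int)]:
val counts = rdd.reduceByKey(_ + _, targetPartitions)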
>> From: 方孝健(玄弟)
>> Sent: Friday, February 10, 2017, 12:35
>> To: spark-dev; spark-user
>> Subject: Driver hung and happened out of memory while writing to console
>> progress bar
>>
>> [Stage 172:==> (10328 + 93)
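Not a confirmed fix from this thread, but the console progress bar named in
the subject line can be switched off:

import org.apache.spark.SparkConf

// Disable the console progress bar; whether this resolves the reported
// driver OOM is an assumption, not a conclusion from the thread.
val conf = new SparkConf().set("spark.ui.showConsoleProgress", "false")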