Hello, I am a Spark user. I use the "spark-shell.cmd" startup command in
Windows cmd. The first startup is normal, but after I use "ctrl+c" to
force-close the Spark window, it can't start normally again. The error
message is as follows:
Hi All,
I am new to Spark; I just want to know which technology is good/best for
learning Spark:
1) Scala 2) Java 3) Python
I know Spark supports all 3 languages, but which one is best?
Thanks su
language in the production environment.
Learning Scala requires some time. If you're very comfortable with Java or
Python, you can go with that while familiarizing yourself with Scala at the
same time.
Cheers
On Thu, Jun 25, 2015 at 12:04 PM, spark user wrote:
.jets3t.service.S3ServiceException: S3 HEAD request failed for
'/user%2Fdidi' - ResponseCode=400, ResponseMessage=Bad Request
What does the user have to do here? I am using key & secret!
How can I simply create an RDD from a text file on S3?
Thanks
Didi
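A minimal sketch of reading an S3 text file into an RDD, assuming the Hadoop
s3a connector and placeholder bucket/credentials (the thread above used
jets3t, so the exact property names may differ):

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: read a text file from S3 into an RDD. The bucket name and
// credentials are placeholders, not values from the original message.
val sc = new SparkContext(new SparkConf().setAppName("S3ReadExample"))

// Hand the key and secret to the Hadoop S3A connector.
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

val rdd = sc.textFile("s3a://your-bucket/user/didi/input.txt")
println(rdd.count())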
Hi, can you help me with how to load data from an S3 bucket to Redshift? If
you have sample code, can you please send it to me?
Thanks su
Hope this helps, best, /Shahab
On Wed, Jul 8, 2015 at 12:57 AM, spark user wrote:
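A sketch of the usual route with the spark-redshift connector; the JDBC URL,
table name, bucket, and credentials are placeholder assumptions:

// Load files from S3 into a DataFrame, then write it to Redshift via the
// spark-redshift connector (com.databricks:spark-redshift). Assumes an
// existing sqlContext, e.g. from spark-shell.
val df = sqlContext.read.json("s3n://your-bucket/input/")

df.write
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://host:5439/db?user=USER&password=PASS")
  .option("dbtable", "my_table")
  .option("tempdir", "s3n://your-bucket/tmp") // staging area for COPY
  .mode("append")
  .save()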
Hi All,
To start a new project in Spark, which technology is better: Java 8 or Scala?
I am a Java developer. Can I start with Java 8, or do I need to learn Scala?
Which one is the better technology for quickly starting a POC project?
Thanks
- su
Does DataFrame support nested JSON, to dump directly to a database?
For simple JSON it works fine:
{"id":2,"name":"Gerald","email":"gbarn...@zimbio.com","city":"Štoky","country":"Czech
Republic","ip":"92.158.154.75"},
But for nested JSON it fails to load:
root |-- rows: array (nullable = true)
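A sketch of flattening a nested array column with explode() before writing to
a flat table; the file path and the field names ("rows", "id", "name") are
assumptions for illustration:

import org.apache.spark.sql.functions.{col, explode}

// Flatten a nested JSON array before writing to a database. Assumes an
// existing sqlContext; path and field names are illustrative.
val df = sqlContext.read.json("examples/nested.json")
df.printSchema() // e.g. root |-- rows: array (nullable = true)

// explode() turns each array element into its own row, yielding a flat
// schema that maps cleanly onto a relational table.
val flat = df
  .select(explode(col("rows")).as("row"))
  .select(col("row.id"), col("row.name"))
flat.show()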
I struggled a lot in Scala, almost 10 days with no improvement, but when I
switched to Java 8 things were so smooth, and I used DataFrame with Redshift
and Hive and all are looking good. If you are very good in Scala then go with
Scala; otherwise Java is the best fit.
This is just my opinion, because I am a Java developer.
Hi Roberto,
I have a question regarding HiveContext.
When you create a HiveContext, where do you define the Hive connection
properties?
Suppose Hive is not on the local machine and I need to connect to it; how will
HiveContext know the database info like URL, username, and password?
String username = "";
String password = "";
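A sketch of the usual answer: HiveContext takes no URL, username, or password
directly; it reads the metastore configuration from hive-site.xml on the
classpath, or you can point it at a remote metastore explicitly. The thrift
URI below is a placeholder:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// HiveContext has no explicit connection parameters; it picks up the
// metastore location from hive-site.xml, or from a config set by hand.
val sc = new SparkContext(new SparkConf().setAppName("HiveExample"))
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.metastore.uris", "thrift://metastore-host:9083")

hiveContext.sql("SHOW TABLES").show()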
Hi all,
Can we create a DataFrame from an Excel sheet or a CSV file? In the example
below it seems they support only JSON:
DataFrame df =
sqlContext.read().json("examples/src/main/resources/people.json");
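CSV works through the spark-csv package on Spark 1.x (and is built in as
.csv() from Spark 2.0); Excel has no built-in source, so exporting to CSV is
the usual route. A sketch with an illustrative path and options:

// Read a CSV file into a DataFrame via the spark-csv package
// (com.databricks:spark-csv). Path and options are illustrative.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // first line holds column names
  .option("inferSchema", "true") // guess column types from the data
  .load("examples/src/main/resources/people.csv")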
case class Record(keyAttr: String, attr1: String, attr2: String, attr3:
String)
val ds = sparkSession.createDataset(rdd).as[Record]
val attr1Counts = ds.groupBy("keyAttr", "attr1").count()
val attr2Counts = ds.groupBy("keyAttr", "attr2").count()
val attr3Counts = ds.groupBy("keyAttr", "attr3").count()
Hi All,
I'm trying to create a Dataset from an RDD and do groupBy on the Dataset. The
groupBy stage runs with 200 partitions, although the RDD had 5000 partitions.
I also seem to have no way to change those 200 partitions on the Dataset to
some other, larger number. This seems to be affecting the parallelism.
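The 200 comes from spark.sql.shuffle.partitions, which defaults to 200 for all
Dataset/DataFrame shuffles regardless of the input RDD's partitioning. A
sketch of raising it, reusing the names from the snippet above:

// Raise the shuffle parallelism for Dataset/DataFrame operations
// (groupBy, join, ...); 5000 mirrors the RDD partitioning mentioned above.
sparkSession.conf.set("spark.sql.shuffle.partitions", "5000")

// Subsequent shuffles now produce 5000 partitions.
val attr1Counts = ds.groupBy("keyAttr", "attr1").count()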
Hi All,
I have a UDAF that seems to perform poorly when its input is skewed. I have
been debugging the UDAF implementation, but I don't see any code that is
causing the performance to degrade. More details on the data and the
experiments I have run:
DataSet: Assume 3 columns, column1 being the key
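Not a diagnosis of this particular UDAF, but a common mitigation for skewed
aggregation keys is two-stage aggregation with a salt. The sketch below uses
sum as a stand-in for the UDAF (the trick requires an aggregate whose partial
results can be recombined); df, column1, and column2 are assumptions:

import org.apache.spark.sql.functions.{col, floor, rand, sum}

// Salt the key so the hot key is spread over up to 16 tasks.
val salted = df.withColumn("salt", floor(rand() * 16))

// Stage 1: partial aggregates per (key, salt).
val partial = salted.groupBy(col("column1"), col("salt"))
  .agg(sum(col("column2")).as("partialSum"))

// Stage 2: recombine the partials per original key.
val result = partial.groupBy(col("column1"))
  .agg(sum(col("partialSum")).as("total"))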
Trying again. Hoping to find some help in figuring out the performance
bottleneck we are observing.
Thanks,
Bharath
On Sun, Oct 30, 2016 at 11:58 AM, Spark User wrote:
Hi All,
It seems like the heap usage of
org.apache.spark.deploy.yarn.ApplicationMaster keeps growing continuously,
and the driver eventually crashes with an OOM.
More details:
I have a Spark Streaming app that runs on Spark 2.0. spark.driver.memory is
10G and spark.yarn.driver.memoryOverhead is 204
one has solved a similar problem.
Thanks,
Bharath
On Mon, Oct 31, 2016 at 11:40 AM, Spark User wrote:
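Not a confirmed diagnosis for this thread, but one frequent cause of steadily
growing driver heap in long-running streaming apps is the per-job and
per-stage UI/listener state the driver retains. These settings cap that
bookkeeping; the values are illustrative:

import org.apache.spark.SparkConf

// Each retained-* setting defaults to 1000; lowering them bounds the
// driver-side bookkeeping for long-running streaming apps.
val conf = new SparkConf()
  .set("spark.ui.retainedJobs", "200")
  .set("spark.ui.retainedStages", "200")
  .set("spark.streaming.ui.retainedBatches", "100")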
Spark has more support for Scala; by that I mean more APIs are available
for Scala compared to Python or Java. Also, Scala code is more concise
and easier to read. Java is very verbose.
On Thu, Feb 9, 2017 at 10:21 PM, Irving Duran wrote:
> I would say Java, since it will be somewhat similar t
My take on the 2-3 tasks per CPU core is that you want to ensure you are
utilizing the cores to the max, which means it will help you with scaling
and performance. The question would be why not 1 task per core? The reason
is that you can probably get a good handle on the average execution time
per
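A sketch of turning that rule of thumb into a partition count; the cluster
numbers are illustrative assumptions:

import org.apache.spark.SparkConf

// 2-3 tasks per core: size the partition count from the cluster shape.
val executorCount    = 10
val coresPerExecutor = 4
val tasksPerCore     = 3
val targetPartitions = executorCount * coresPerExecutor * tasksPerCore // 120

val conf = new SparkConf()
  .set("spark.default.parallelism", targetPartitions.toString)

// Or per shuffle, assuming `rdd` is an RDD[(String, Int)]:
val counts = rdd.reduceByKey(_ + _, targetPartitions)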
>> From: 方孝健(玄弟)
>> Sent: Friday, February 10, 2017, 12:35
>> To: spark-dev; spark-user
>> Subject: Driver hung and happened out of memory while writing to console
>> progress bar
>>
>> [Stage 172:==> (10328 + 93)
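Not a confirmed fix from this thread, but the console progress bar named in
the subject line can be switched off:

import org.apache.spark.SparkConf

// Disable the console progress bar; whether this resolves the reported
// driver OOM is an assumption, not a conclusion from the thread.
val conf = new SparkConf().set("spark.ui.showConsoleProgress", "false")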