Consider the following 2 scenarios:
*Scenario #1*
val pagecounts = sc.textFile("data/pagecounts")
pagecounts.checkpoint
pagecounts.count
*Scenario #2*
val pagecounts = sc.textFile("data/pagecounts")
pagecounts.count
The total time shown in the Spark shell Application UI was different for both
scenarios.
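For reference, a minimal sketch of what Scenario #1 needs in order to checkpoint at all (the directory below is just an example path, and the cache() call is optional): a checkpoint directory has to be set first, and the checkpoint data is written by re-running the lineage after the first action, so some extra time in Scenario #1 over Scenario #2 is expected.
  sc.setCheckpointDir("hdfs:///tmp/checkpoints")  // example path; required before checkpoint()
  val pagecounts = sc.textFile("data/pagecounts")
  pagecounts.cache()       // avoids recomputing the lineage when the checkpoint is written
  pagecounts.checkpoint()
  pagecounts.count()       // first action: runs the count, then writes the checkpoint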
From: Ratika Prasad
Sent: Monday, October 05, 2015 2:39 PM
To: u...@spark.apache.org
Cc: Ameeta Jayarajan
Subject: Spark error while running in spark mode
Hi,
When we run our Spark component in cluster mode as below, we get the following
error
./bin/spark-submit --class
com.coupons.stream.pr
The missing artifacts are uploaded now. Things should propagate in the next
24 hours. If there are still issues past that point, ping this thread. Thanks!
- Patrick
On Mon, Oct 5, 2015 at 2:41 PM, Nicholas Chammas wrote:
> Thanks for looking into this Josh.
>
> On Mon, Oct 5, 2015 at 5:39 PM Josh Rosen wrote:
I meant to say just copy everything to a local hdfs, and then don't use
caching ...
On Mon, Oct 5, 2015 at 4:52 PM, Jegan wrote:
> I am sorry, I didn't understand it completely. Are you suggesting to copy
> the files from S3 to HDFS? Actually, that is what I am doing. I am reading
> the files using Spark and persisting them locally.
Hi Michael,
Thanks for pointing me to the branch. What are the build instructions for
building the hive 1.2.1 release branch for Spark 1.5?
Weide
On Mon, Oct 5, 2015 at 12:06 PM, Michael Armbrust
wrote:
> I think this is the most up to date branch (used in Spark 1.5):
> https://github.com/pwendell/hive/tree/release-1.2.1-spark
I am sorry, I didn't understand it completely. Are you suggesting to copy
the files from S3 to HDFS? Actually, that is what I am doing. I am reading
the files using Spark and persisting them locally.
Or did you actually mean to ask the producer to write the files directly to
HDFS instead of S3? I am
You can write the data to local hdfs (or local disk) and just load it from
there.
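A rough sketch of that suggestion (the bucket and paths below are placeholders): copy the S3 data into HDFS once, then point every subsequent job at the HDFS copy.
  // one-time copy from S3 into the cluster's HDFS (names are made up)
  val fromS3 = sc.textFile("s3n://my-bucket/input/")
  fromS3.saveAsTextFile("hdfs:///data/input-copy")
  // later jobs read the local copy instead of going back to S3
  val data = sc.textFile("hdfs:///data/input-copy")
  data.count()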
On Mon, Oct 5, 2015 at 4:37 PM, Jegan wrote:
> Thanks for your suggestion Ted.
>
> Unfortunately, at this point in time I cannot go beyond 1000 partitions. I
> am writing this data to BigQuery, and it has a limit of
Thanks for your suggestion, Ted.
Unfortunately, at this point in time I cannot go beyond 1000 partitions. I
am writing this data to BigQuery, and it has a limit of 1000 jobs per day
per table (they have some limits on this). I currently create 1 load job
per partition. Is there any other work-around?
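For illustration, a sketch of one possible decoupling (the numbers, names, and paths are made up, and grouping several files into one load job is an assumption about the BigQuery side): use enough Spark partitions that no single output file gets near 2GB, then build each load job from a group of the written files so the job count stays under the daily quota.
  // repartition finely so no single part file approaches the 2GB limit
  val repartitioned = exportRdd.repartition(4000)   // exportRdd and 4000 are placeholders
  repartitioned.saveAsTextFile("hdfs:///staging/bigquery-export")
  // then build each BigQuery load job from several of the written part-* files,
  // so partitions and load jobs are no longer one-to-one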
That sounds fine to me; we already do the filtering, so populating that
field would be pretty simple.
On Sun, Sep 27, 2015 at 2:08 PM Michael Armbrust
wrote:
> We have to try and maintain binary compatibility here, so probably the
> easiest thing to do here would be to add a method to the class.
Could you tell us a way to reproduce this failure? Reading from JSON or Parquet?
On Mon, Oct 5, 2015 at 4:28 AM, Eugene Morozov
wrote:
> Hi,
>
> We're building our own framework on top of Spark, and we give users a pretty
> complex schema to work with. That requires us to build dataframes by
>
As a workaround, can you set the number of partitions higher in the
sc.textFile method?
Cheers
On Mon, Oct 5, 2015 at 3:31 PM, Jegan wrote:
> Hi All,
>
> I am facing the below exception when the size of the file being read in a
> partition is above 2GB. This is apparently because of Java's limitation on
> memory-mapped files: it supports mapping only files of up to 2GB.
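Concretely, that is just the optional second argument to sc.textFile (the path and the 500 below are illustrative); asking for more input partitions keeps each one well under the 2GB limit.
  // request at least 500 input partitions instead of the default split count
  val lines = sc.textFile("hdfs:///data/large-input", 500)
  lines.count()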
Hi All,
I am facing the below exception when the size of the file being read in a
partition is above 2GB. This is apparently because of Java's limitation on
memory-mapped files: it supports mapping only files of up to 2GB.
Caused by: java.lang.IllegalArgumentException: Size exceeds
Integer.MAX_VALUE
at s
What happens when a whole node running your " per node streaming engine
(built-in checkpoint and recovery)" fails? Can its checkpoint and recovery
mechanism handle whole node failure? Can you recover from the checkpoint on
a different node?
Spark and Spark Streaming were designed with the idea th
If RDDs from the same DStream are not guaranteed to run on the same worker,
then the question becomes:
is it possible to specify an unlimited duration in ssc to have a continuous
stream (as opposed to a discretized one)?
Say we have a per-node streaming engine (with built-in checkpoint and recovery)
we'd like to integ
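For reference, the discretized model is visible right in the API; a minimal skeleton (the 1-second interval, the app name, and the paths are placeholders): the StreamingContext constructor always takes a batch duration, so there is no unlimited setting, and recovery on a different node is driven by the checkpoint directory rather than by the batch interval.
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("streaming-example")  // placeholder app name
  val ssc = new StreamingContext(conf, Seconds(1))            // a batch interval is mandatory
  ssc.checkpoint("hdfs:///tmp/streaming-checkpoints")         // enables recovery on another node
  // ... define DStream sources and operations here ...
  ssc.start()
  ssc.awaitTermination()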
Thanks for looking into this Josh.
On Mon, Oct 5, 2015 at 5:39 PM Josh Rosen wrote:
> I'm working on a fix for this right now. I'm planning to re-run a modified
> copy of the release packaging scripts which will emit only the missing
> artifacts (so we won't upload new artifacts with different SHAs for the builds which *did* succeed).
I'm working on a fix for this right now. I'm planning to re-run a modified
copy of the release packaging scripts which will emit only the missing
artifacts (so we won't upload new artifacts with different SHAs for the
builds which *did* succeed).
I expect to have this finished in the next day or so.
Hi all,
I have a process that takes only 40 seconds in local mode, while the same process
in stand-alone mode, with the node used for local mode as the only available
node, takes forever: RDD actions hang.
I could only "sort this out" by turning speculation on, so the same task
hanging is
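For anyone hitting the same thing, "turning speculation on" here is just a configuration flag; a minimal sketch (the app name is a placeholder):
  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("example")             // placeholder
    .set("spark.speculation", "true")  // re-launches straggling tasks on other executors
  val sc = new SparkContext(conf)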
I think this is the most up to date branch (used in Spark 1.5):
https://github.com/pwendell/hive/tree/release-1.2.1-spark
On Mon, Oct 5, 2015 at 1:03 PM, weoccc wrote:
> Hi,
>
> I would like to know the location of the Spark Hive GitHub repository that the
> Spark build depends on. I was told it used to be here,
> https://github.com/pwendell/hive, but it seems it is no longer there.
Hi,
I would like to know the location of the Spark Hive GitHub repository that the
Spark build depends on. I was told it used to be here,
https://github.com/pwendell/hive, but it seems it is no longer there.
Thanks a lot,
Weide
Thanks Yin, I'll put together a JIRA and a PR tomorrow.
Ewan
-- Original message--
From: Yin Huai
Date: Mon, 5 Oct 2015 17:39
To: Ewan Leith;
Cc: dev@spark.apache.org;
Subject: Re: Dataframe nested schema inference from Json without type conflicts
Hello Ewan,
Adding a JSON-specific option makes sense.
Hello Ewan,
Adding a JSON-specific option makes sense. Can you open a JIRA for this?
Also, sending out a PR will be great. For JSONRelation, I think we can pass
all user-specific options to it (see
org.apache.spark.sql.execution.datasources.json.DefaultSource's
createRelation) just like what we do
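From the user side, I'd imagine such an option being passed like any other data source option; a sketch with a purely made-up option name, since nothing like it exists yet:
  // "treatConflictingTypesAsString" is a hypothetical name used only for illustration
  val df = sqlContext.read
    .format("json")
    .option("treatConflictingTypesAsString", "true")
    .load("hdfs:///data/events.json")   // placeholder path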
I've done some digging today and, as a quick and ugly fix, altering the case
statement of the JSON inferField function in InferSchema.scala
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala
to have
case VALUE_ST
Blaž said:
Also missing is
http://s3.amazonaws.com/spark-related-packages/spark-1.5.1-bin-hadoop1.tgz
which breaks spark-ec2 script.
This is the package I am referring to in my original email.
Nick said:
It appears that almost every version of Spark up to and including 1.5.0 has
included a -bin
Hi,
We're building our own framework on top of Spark, and we give users a pretty
complex schema to work with. That requires us to build dataframes by
ourselves: we transform business objects into rows and struct types and use
these two to create a dataframe.
Everything was fine until I started to
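For context, this is the kind of construction I mean (the schema and values below are a made-up minimal example):
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

  // business objects flattened into Rows by hand
  val rows = sc.parallelize(Seq(Row("alice", 42L), Row("bob", 7L)))
  // the matching schema, also built by hand
  val schema = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("count", LongType, nullable = true)))
  val df = sqlContext.createDataFrame(rows, schema)
  df.show()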
Actions trigger jobs. A job is made up of stages. A stage is made up of
tasks. Executor threads execute tasks.
Does that answer your question?
On Mon, Oct 5, 2015 at 12:52 PM, Guna Prasaad wrote:
> What is the difference between a task and a job in spark and
> spark-streaming?
>
> Regards,
> Guna
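A tiny illustration of that hierarchy (the input path is a placeholder):
  val lines = sc.textFile("hdfs:///data/input")                            // lazy: no job yet
  val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)  // still lazy
  counts.count()  // the action triggers one job; the shuffle splits it into two stages,
                  // and each stage runs one task per partition on the executors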
What is the difference between a task and a job in spark and
spark-streaming?
Regards,
Guna
Also missing is
http://s3.amazonaws.com/spark-related-packages/spark-1.5.1-bin-hadoop1.tgz
which breaks spark-ec2 script.
On Mon, Oct 5, 2015 at 5:20 AM, Ted Yu wrote:
> hadoop1 package for Scala 2.10 wasn't in RC1 either:
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/