Below works for me:

import org.apache.avro.Schema
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext._  // brings in saveAsNewAPIHadoopFile for pair RDDs (Spark 1.x)

val job = Job.getInstance()
val schema = Schema.create(Schema.Type.STRING)
AvroJob.setOutputKeySchema(job, schema)
records.map(item => (new AvroKey[String](item.getGridsumId), NullWritable.get()))
  .saveAsNewAPIHadoopFile(args(1), classOf[AvroKey[String]], classOf[NullWritable],
    classOf[AvroKeyOutputFormat[String]], job.getConfiguration)
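For verification, something like the following should read those files back (an untested sketch: it assumes a SparkContext named sc and the same output path in args(1)):

import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

// Read the AvroKey[String] records written above and print a few of the string payloads.
val avroRdd = sc.newAPIHadoopFile[AvroKey[String], NullWritable, AvroKeyInputFormat[String]](args(1))
avroRdd.map { case (key, _) => key.datum() }.take(10).foreach(println)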
Hi Michael,
Thanks for your reply. Is this the correct way to load data from Spark
into Parquet? Somehow it doesn't feel right. When we followed the steps
described for storing the data into Hive tables, everything was smooth: we
used HiveContext and the table was automatically recognised by Hive
Hi,
I am a complete newbie to Spark and MapReduce frameworks and have a basic
question on the reduce function. I was working on the word count example
and was stuck at the reduce stage, where the sum happens.
I am trying to understand the working of reduceByKey in Spark using
Java as the programming language.
YES! This worked! thanks!
---
I think your questions revolve around the reduce function here, which
is a function of 2 arguments returning 1, whereas in a Reducer, you
implement a function of many-to-many.
This API is simpler if less general. Here you provide an associative
operation that can reduce any 2 values down to 1 (e.g
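For illustration, a minimal Scala sketch of the word count mentioned earlier (untested; the input path is made up):

// reduceByKey applies the 2-argument function (a, b) => a + b pairwise
// until each key (word) is reduced to a single summed count.
val counts = sc.textFile("hdfs:///path/to/input.txt")
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey((a, b) => a + b)
counts.take(10).foreach(println)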
Excellent, thank you!
On Sat, Aug 2, 2014 at 4:46 AM, Aaron Davidson wrote:
> Ah, that's unfortunate, that definitely should be added. Using a
> pyspark-internal method, you could try something like
>
> javaIterator = rdd._jrdd.toLocalIterator()
> it = rdd._collect_iterator_through_file(javaIterator)
Hi Team,
Could you please help me to resolve the above compilation issue.
Regards,
Rajesh
On Sat, Aug 2, 2014 at 2:02 AM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:
> Hi Team,
>
> I'm not able to print the values from Spark Sql JavaSchemaRDD. Please find
> below my code
>
>
Hi Rajesh,
Can you recheck the version and your code again?
I tried the similar code below and it works fine (compiles and executes)...
// Apply a schema to an RDD of Java Beans and register it as a table.
JavaSchemaRDD schemaPeople = sqlCtx.applySchema(people, Person.class);
schemaPeople.registerAsTable("people");
I noticed a misspelling in the compilation error (an extra letter 'a'):
new Function*a*
But in your code the spelling was right, so I'm a bit confused.
On Fri, Aug 1, 2014 at 1:32 PM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:
> Hi Team,
>
> I'm not able to print the values from Spark Sql JavaSch
Hi,
I am running Spark on a single-node cluster. I am able to run the example code in
Spark, such as SparkPageRank.scala and SparkKMeans.scala, with the following command:
bin/run-example org.apache.spark.examples.SparkPageRank
Now I want to run the PageRank.scala that is in GraphX. Do we have a
similar co
Try this:
./bin/run-example graphx.LiveJournalPageRank <…>
On Aug 2, 2014, at 5:55 PM, Deep Pradhan wrote:
> Hi,
> I am running Spark in a single node cluster. I am able to run the codes in
> Spark like SparkPageRank.scala, SparkKMeans.scala by the following command,
> bin/run-examples org.ap
Hi,
I just implemented our algorithm (like personalised PageRank) using the Pregel API,
and it seems to work well.
But I am wondering whether I can compute only some selected vertices (hubs), rather
than doing an "update" on every vertex…
Is it possible to do this using the Pregel API?
Or, more realistically, only hu
Hi,
I have implemented a Low Level Kafka Consumer for Spark Streaming using the
Kafka Simple Consumer API. This API gives better control over Kafka offset
management and recovery from failures. As the present Spark KafkaUtils uses
the High Level Kafka Consumer API, I wanted to have better control
At 2014-08-02 21:29:33 +0530, Deep Pradhan wrote:
> How should I run graphx codes?
At the moment it's a little more complicated to run the GraphX algorithms than
the Spark examples due to SPARK-1986 [1]. There is a driver program in
org.apache.spark.graphx.lib.Analytics which you can invoke usi
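As a rough alternative (a sketch, not from the thread; the edge-list path is made up), PageRank can also be run directly from the Spark shell through the GraphX API:

import org.apache.spark.graphx.GraphLoader

// Load an edge list and run PageRank until the scores converge (tolerance 0.0001).
val graph = GraphLoader.edgeListFile(sc, "hdfs:///path/to/edges.txt")
val ranks = graph.pageRank(0.0001).vertices
ranks.sortBy(_._2, ascending = false).take(10).foreach(println)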
At 2014-08-02 19:04:22 +0200, Yifan LI wrote:
> But I am thinking of if I can compute only some selected vertexes(hubs), not
> to do "update" on every vertex…
>
> is it possible to do this using Pregel API?
The Pregel API already only runs vprog on vertices that received messages in
the previou
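To make that concrete, a rough sketch (not from the thread; the (isHub, rank) vertex attribute and the hub test are made up) in which sendMsg only targets hub vertices, so after the first round only hubs are updated:

import org.apache.spark.graphx._

// Hypothetical graph whose vertex attribute is (isHub: Boolean, rank: Double).
def runOnHubsOnly(graph: Graph[(Boolean, Double), Double], numIter: Int) =
  graph.pregel(initialMsg = 0.0, maxIterations = numIter)(
    // vprog: after the initial round, only invoked on vertices that received a message.
    vprog = (id, attr, msgSum) => (attr._1, attr._2 + msgSum),
    // sendMsg: emit messages only to destinations flagged as hubs, so other vertices stay inactive.
    sendMsg = triplet =>
      if (triplet.dstAttr._1) Iterator((triplet.dstId, triplet.srcAttr._2 * triplet.attr))
      else Iterator.empty,
    // mergeMsg: combine messages arriving at the same vertex.
    mergeMsg = (a, b) => a + b)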
I ran into this issue as well. The workaround suggested by Shivaram of copying
the jar and ivy manually works for me.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Fri, Aug 1, 2014 at 3:31 P
Hi Patrick,
In Impala 1.3.1, when you update tables and metadata, do you still need to run
'invalidate metadata' in impala-shell? My understanding is that it is a pull
architecture that refreshes the metastore on the catalogd in Impala; I am not sure
whether this still applies in this case since you are updatin
I am not a Mesos expert... but it sounds like there is some mismatch
between the size that Mesos is giving you and the maximum heap size of the
executors (-Xmx).
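If it helps, a hedged sketch of making the executor heap explicit rather than relying on defaults (the 4g value is only an example):

import org.apache.spark.{SparkConf, SparkContext}

// spark.executor.memory is what Spark turns into the executor's maximum heap (-Xmx).
val conf = new SparkConf()
  .setAppName("mesos-memory-example")
  .set("spark.executor.memory", "4g")
val sc = new SparkContext(conf)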
On Fri, Aug 1, 2014 at 12:07 AM, Gurvinder Singh wrote:
> It is not getting out of memory exception. I am using Mesos as cluster
> ma
We are investigating various ways to integrate with Tachyon. I'll note
that you can already use saveAsParquetFile and
parquetFile(...).registerAsTable("tableName") (soon to be registerTempTable
in Spark 1.1) to store data into tachyon and query it with Spark SQL.
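A rough sketch of that pattern (assuming an existing SchemaRDD named schemaRdd, a SQLContext named sqlContext, and an illustrative tachyon:// path; registerAsTable is the Spark 1.0 name noted above):

// Write the data out as Parquet on Tachyon, then read it back and query it with Spark SQL.
schemaRdd.saveAsParquetFile("tachyon://master:19998/tables/people.parquet")
val people = sqlContext.parquetFile("tachyon://master:19998/tables/people.parquet")
people.registerAsTable("people")
sqlContext.sql("SELECT COUNT(*) FROM people").collect().foreach(println)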
On Fri, Aug 1, 2014 at 1:42 AM,
The number of partitions (which decides the number of tasks) is fixed after
any shuffle and can be configured through SQLConf using 'spark.sql.shuffle.partitions'
(i.e. sqlContext.set(...) or "SET spark.sql.shuffle.partitions=..." in SQL).
It is possible we will auto-select this based on statistics
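For example (both forms are taken from the note above; 200 is just an illustrative value):

// Set the SQLConf key directly, or issue a SQL SET statement.
sqlContext.set("spark.sql.shuffle.partitions", "200")
sqlContext.sql("SET spark.sql.shuffle.partitions=200")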
Folks,
When my MacBook's IP address changes, spark-shell throws errors when I restart
it. It somehow remembers the old address. I worked around this by using
SPARK_LOCAL_IP=
Mohit
[spark-shell welcome banner]
I am aware of how to run LiveJournalPageRank.
However, I tried what Ankur had suggested, and I got the result. I have one
question about that: running it either via bin/run-example or by invoking
Analytics in GraphX, both of them finally call Analytics, right? So why not
club all the code in the