Below works for me:

import org.apache.avro.Schema
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext._  // brings in saveAsNewAPIHadoopFile for pair RDDs (Spark 1.x)

val job = Job.getInstance()
val schema = Schema.create(Schema.Type.STRING)
AvroJob.setOutputKeySchema(job, schema)
records.map(item => (new AvroKey[String](item.getGridsumId), NullWritable.get()))
  .saveAsNewAPIHadoopFile(args(1), classOf[AvroKey[String]], classOf[NullWritable],
    classOf[AvroKeyOutputFormat[String]], job.getConfiguration)
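For verification, something like the following should read those files back (an untested sketch: it assumes a SparkContext named sc and the same output path in args(1)):

import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

// Read the AvroKey[String] records written above and print a few of the string payloads.
val avroRdd = sc.newAPIHadoopFile[AvroKey[String], NullWritable, AvroKeyInputFormat[String]](args(1))
avroRdd.map { case (key, _) => key.datum() }.take(10).foreach(println)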
Hi Michael,
Thanks for your reply. Is this the correct way to load data from Spark
into Parquet? Somehow it doesn't feel right. When we followed the steps
described for storing the data into Hive tables, everything was smooth: we
used HiveContext and the table was automatically recognised by Hive
Hi,
I am a complete newbie to Spark and MapReduce frameworks and have a basic
question on the reduce function. I was working on the word count example
and was stuck at the reduce stage, where the sum happens.
I am trying to understand the working of reduceByKey in Spark using
Java as the programming language.
YES! This worked! thanks!
---
I think your questions revolve around the reduce function here, which
is a function of 2 arguments returning 1, whereas in a Reducer, you
implement a function of many-to-many.
This API is simpler if less general. Here you provide an associative
operation that can reduce any 2 values down to 1 (e.g
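For illustration, a minimal Scala sketch of the word count mentioned earlier (untested; the input path is made up):

// reduceByKey applies the 2-argument function (a, b) => a + b pairwise
// until each key (word) is reduced to a single summed count.
val counts = sc.textFile("hdfs:///path/to/input.txt")
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey((a, b) => a + b)
counts.take(10).foreach(println)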
Excellent, thank you!
On Sat, Aug 2, 2014 at 4:46 AM, Aaron Davidson wrote:
> Ah, that's unfortunate, that definitely should be added. Using a
> pyspark-internal method, you could try something like
>
> javaIterator = rdd._jrdd.toLocalIterator()
> it = rdd._collect_iterator_through_file(javaIterator)
Hi Team,
Could you please help me to resolve the above compilation issue.
Regards,
Rajesh
On Sat, Aug 2, 2014 at 2:02 AM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:
> Hi Team,
>
> I'm not able to print the values from Spark Sql JavaSchemaRDD. Please find
> below my code
>
>
Hi Rajesh,
Can you recheck the version and your code again?
I tried the similar code below and it works fine (compiles and executes)...
// Apply a schema to an RDD of Java Beans and register it as a table.
JavaSchemaRDD schemaPeople = sqlCtx.applySchema(people, Person.class);
schemaPeople.registerAsTable("people");
I noticed a misspelling in the compilation error (an extra letter 'a'):
new Function*a*
But in your code the spelling was right, so I'm a bit confused.
On Fri, Aug 1, 2014 at 1:32 PM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:
> Hi Team,
>
> I'm not able to print the values from Spark Sql JavaSch
Hi,
I am running Spark on a single-node cluster. I am able to run the example code in
Spark, such as SparkPageRank.scala and SparkKMeans.scala, with the following command:
bin/run-example org.apache.spark.examples.SparkPageRank
Now I want to run the PageRank.scala that is in GraphX. Do we have a
similar co
Try this:
./bin/run-example graphx.LiveJournalPageRank <…>
On Aug 2, 2014, at 5:55 PM, Deep Pradhan wrote:
> Hi,
> I am running Spark in a single node cluster. I am able to run the codes in
> Spark like SparkPageRank.scala, SparkKMeans.scala by the following command,
> bin/run-examples org.ap
Hi,
I just implemented our algorithm (like personalised PageRank) using the Pregel API,
and it seems to work well.
But I am wondering whether I can compute only some selected vertices (hubs), rather
than doing an "update" on every vertex…
Is it possible to do this using the Pregel API?
Or, more realistically, only hu
Hi,
I have implemented a Low Level Kafka Consumer for Spark Streaming using the
Kafka Simple Consumer API. This API gives better control over Kafka offset
management and recovery from failures. As the present Spark KafkaUtils uses
the High Level Kafka Consumer API, I wanted to have better control
At 2014-08-02 21:29:33 +0530, Deep Pradhan wrote:
> How should I run graphx codes?
At the moment it's a little more complicated to run the GraphX algorithms than
the Spark examples due to SPARK-1986 [1]. There is a driver program in
org.apache.spark.graphx.lib.Analytics which you can invoke usi
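As a rough alternative (a sketch, not from the thread; the edge-list path is made up), PageRank can also be run directly from the Spark shell through the GraphX API:

import org.apache.spark.graphx.GraphLoader

// Load an edge list and run PageRank until the scores converge (tolerance 0.0001).
val graph = GraphLoader.edgeListFile(sc, "hdfs:///path/to/edges.txt")
val ranks = graph.pageRank(0.0001).vertices
ranks.sortBy(_._2, ascending = false).take(10).foreach(println)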
At 2014-08-02 19:04:22 +0200, Yifan LI wrote:
> But I am thinking of if I can compute only some selected vertexes(hubs), not
> to do "update" on every vertex…
>
> is it possible to do this using Pregel API?
The Pregel API already only runs vprog on vertices that received messages in
the previou
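To make that concrete, a rough sketch (not from the thread; the (isHub, rank) vertex attribute and the hub test are made up) in which sendMsg only targets hub vertices, so after the first round only hubs are updated:

import org.apache.spark.graphx._

// Hypothetical graph whose vertex attribute is (isHub: Boolean, rank: Double).
def runOnHubsOnly(graph: Graph[(Boolean, Double), Double], numIter: Int) =
  graph.pregel(initialMsg = 0.0, maxIterations = numIter)(
    // vprog: after the initial round, only invoked on vertices that received a message.
    vprog = (id, attr, msgSum) => (attr._1, attr._2 + msgSum),
    // sendMsg: emit messages only to destinations flagged as hubs, so other vertices stay inactive.
    sendMsg = triplet =>
      if (triplet.dstAttr._1) Iterator((triplet.dstId, triplet.srcAttr._2 * triplet.attr))
      else Iterator.empty,
    // mergeMsg: combine messages arriving at the same vertex.
    mergeMsg = (a, b) => a + b)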
I ran into this issue as well. The workaround suggested by Shivaram of copying
the jar and ivy manually works for me.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Fri, Aug 1, 2014 at 3:31 P
Hi Patrick,
In Impala 1.3.1, when you update tables and metadata, do you still need to run
'invalidate metadata' in impala-shell? My understanding is that it is a pull
architecture that refreshes the metastore on the catalogd in Impala; I am not sure
whether this still applies in this case since you are updatin
I am not a Mesos expert... but it sounds like there is some mismatch
between the size that Mesos is giving you and the maximum heap size of the
executors (-Xmx).
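If it helps, a hedged sketch of making the executor heap explicit rather than relying on defaults (the 4g value is only an example):

import org.apache.spark.{SparkConf, SparkContext}

// spark.executor.memory is what Spark turns into the executor's maximum heap (-Xmx).
val conf = new SparkConf()
  .setAppName("mesos-memory-example")
  .set("spark.executor.memory", "4g")
val sc = new SparkContext(conf)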
On Fri, Aug 1, 2014 at 12:07 AM, Gurvinder Singh wrote:
> It is not getting out of memory exception. I am using Mesos as cluster
> ma
We are investigating various ways to integrate with Tachyon. I'll note
that you can already use saveAsParquetFile and
parquetFile(...).registerAsTable("tableName") (soon to be registerTempTable
in Spark 1.1) to store data into tachyon and query it with Spark SQL.
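A rough sketch of that pattern (assuming an existing SchemaRDD named schemaRdd, a SQLContext named sqlContext, and an illustrative tachyon:// path; registerAsTable is the Spark 1.0 name noted above):

// Write the data out as Parquet on Tachyon, then read it back and query it with Spark SQL.
schemaRdd.saveAsParquetFile("tachyon://master:19998/tables/people.parquet")
val people = sqlContext.parquetFile("tachyon://master:19998/tables/people.parquet")
people.registerAsTable("people")
sqlContext.sql("SELECT COUNT(*) FROM people").collect().foreach(println)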
On Fri, Aug 1, 2014 at 1:42 AM,
The number of partitions (which decides the number of tasks) is fixed after
any shuffle and can be configured through SQLConf using 'spark.sql.shuffle.partitions'
(i.e. sqlContext.set(...) or "SET spark.sql.shuffle.partitions=..." in SQL).
It is possible we will auto-select this based on statistics
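For example (both forms are taken from the note above; 200 is just an illustrative value):

// Set the SQLConf key directly, or issue a SQL SET statement.
sqlContext.set("spark.sql.shuffle.partitions", "200")
sqlContext.sql("SET spark.sql.shuffle.partitions=200")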
Folks,
When my MacBook's IP address changes, spark-shell throws errors when I restart
it. It somehow remembers the old address. I worked around this by using
SPARK_LOCAL_IP=
Mohit
[spark-shell welcome banner]
I am aware of how to run LiveJournalPageRank.
However, I tried what Ankur had suggested, and I got the result. I have one
question about that: running it either via bin/run-example or by invoking
Analytics in GraphX, both of them finally call Analytics, right? So why not
club all the code in the