Re: Spark/HIVE Insert Into values Error

2014-10-25 Thread arthur.hk.c...@gmail.com
Hi, I have already found out how to “insert into HIVE_TABLE values (…..)”. Regards, Arthur. On 18 Oct, 2014, at 10:09 pm, Cheng Lian wrote: > Currently Spark SQL uses Hive 0.12.0, which doesn't support the INSERT INTO > ... VALUES ... syntax. > > On 10/18/14 1:33 AM, arthur.hk.c...@gma
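
Arthur does not say which workaround he settled on, but a common one on Hive 0.12 is to express the values as a small RDD, register it as a temporary table, and use INSERT INTO ... SELECT, which Hive 0.12 does accept. A minimal sketch (Spark 1.1-era API; the table and column names are hypothetical, and a Hive table people(name STRING, age INT) is assumed to exist):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object InsertValuesWorkaround {
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("insert-values-workaround"))
    val hiveContext = new HiveContext(sc)
    import hiveContext._  // Spark 1.1: brings sql() and the RDD-to-SchemaRDD implicit into scope

    // The "values" to insert, expressed as an RDD of case classes.
    val rows = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25)))
    rows.registerTempTable("new_people")

    // Hive 0.12 rejects INSERT INTO ... VALUES but accepts INSERT INTO ... SELECT.
    sql("INSERT INTO TABLE people SELECT name, age FROM new_people")

    sc.stop()
  }
}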

Re: Spark as Relational Database

2014-10-25 Thread Soumya Simanta
1. What data store do you want to store your data in? HDFS, HBase, Cassandra, S3, or something else? 2. Have you looked at SparkSQL (https://spark.apache.org/sql/)? One option is to process the data in Spark and then store it in the relational database of your choice. On Sat, Oct 25, 2014 at 1
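
A rough sketch of that second option (Spark 1.1-era API; the input path, connection string, and table names are hypothetical, and the JDBC driver is assumed to be on the classpath): aggregate with Spark SQL, then write the small result set into the relational database over plain JDBC, one connection per partition.

import java.sql.DriverManager
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SparkToRdbms {
  case class Sale(product: String, amount: Double)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spark-to-rdbms"))
    val sqlContext = new SQLContext(sc)
    import sqlContext._  // RDD-to-SchemaRDD implicit plus sql()

    // Parse the raw data and register it as a table for SQL-style processing.
    val sales = sc.textFile("hdfs:///data/sales.csv")
      .map(_.split(","))
      .map(fields => Sale(fields(0), fields(1).toDouble))
    sales.registerTempTable("sales")

    val totals = sql("SELECT product, SUM(amount) AS total FROM sales GROUP BY product")

    // Push the aggregated result into the relational database,
    // opening one JDBC connection per partition rather than per row.
    totals.foreachPartition { rows =>
      val conn = DriverManager.getConnection("jdbc:postgresql://dbhost/warehouse", "user", "pass")
      val stmt = conn.prepareStatement("INSERT INTO product_totals (product, total) VALUES (?, ?)")
      rows.foreach { row =>
        stmt.setString(1, row.getString(0))
        stmt.setDouble(2, row.getDouble(1))
        stmt.executeUpdate()
      }
      stmt.close()
      conn.close()
    }
    sc.stop()
  }
}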

Spark as Relational Database

2014-10-25 Thread Peter Wolf
Hello all, We are considering Spark for our organization. It is obviously a superb platform for processing massive amounts of data... how about retrieving it? We are currently storing our data in a relational database in a star schema. Retrieving our data requires doing many complicated joins a

Re: Multitenancy in Spark - within/across spark context

2014-10-25 Thread RJ Nowling
Ashwin, What is your motivation for needing to share RDDs between jobs? Optimizing for reusing data across jobs? If so, you may want to look into Tachyon. My understanding is that Tachyon acts like a caching layer, and you can designate when data will be reused in multiple jobs so it knows to keep
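
A minimal sketch of that pattern across two separate applications, assuming a Tachyon master at tachyon://tachyon-master:19998, the Tachyon Hadoop client on the classpath, and hypothetical paths; the first job materializes the shared data, the second reads it back without recomputing it:

import org.apache.spark.{SparkConf, SparkContext}

object ProducerJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("producer"))
    val cleaned = sc.textFile("hdfs:///raw/events").filter(_.nonEmpty)
    // Materialize once; other jobs read it back from Tachyon's in-memory storage.
    cleaned.saveAsTextFile("tachyon://tachyon-master:19998/shared/events-cleaned")
  }
}

object ConsumerJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("consumer"))
    val cleaned = sc.textFile("tachyon://tachyon-master:19998/shared/events-cleaned")
    println(cleaned.count())
  }
}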

Re: Read a TextFile(1 record contains 4 lines) into a RDD

2014-10-25 Thread Xiangrui Meng
If your file is not very large, try sc.wholeTextFiles("...").values.flatMap(_.split("\n").grouped(4).map(_.mkString("\n"))) -Xiangrui On Sat, Oct 25, 2014 at 12:57 AM, Parthus wrote: > Hi, > > It might be a naive question, but I still wish that somebody could help me > handle it. > > I have a
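
The same one-liner unpacked with comments, as a sketch (assumes an existing SparkContext sc; the input path is a placeholder):

val records = sc.wholeTextFiles("hdfs:///data/records")  // RDD[(fileName, fileContents)]
  .values                                                 // keep only the file contents
  .flatMap { contents =>
    contents.split("\n")          // the individual lines of one file
      .grouped(4)                 // every 4 consecutive lines form one record
      .map(_.mkString("\n"))      // glue each group back into a single record string
  }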

Asymmetric spark cluster memory utilization

2014-10-25 Thread Manas Kar
Hi, I have a spark cluster that has 5 machines with 32 GB memory each and 2 machines with 24 GB each. I believe spark.executor.memory assigns the same executor memory to all executors. How can I use 32 GB of memory on the first 5 machines and 24 GB on the other 2? Thanks ..Manas

Re: Bug in Accumulators...

2014-10-25 Thread Rishi Yadav
Works fine on the Spark 1.1.0 REPL. On Sat, Oct 25, 2014 at 1:41 PM, octavian.ganea wrote: > There is for sure a bug in the Accumulators code. > > More specifically, the following code works as expected: > > def main(args: Array[String]) { > val conf = new SparkConf().setAppName("EL LBP SP

Re: Shuffle issues in the current master

2014-10-25 Thread DB Tsai
Hi Andrew, We were running the master branch from after SPARK-3613. We will give it another shot against the current master now that Josh has fixed a couple of issues in shuffle. Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.c

Bug in Accumulators...

2014-10-25 Thread octavian.ganea
There is for sure a bug in the Accumulators code. More specifically, the following code works as expected: def main(args: Array[String]) { val conf = new SparkConf().setAppName("EL LBP SPARK") val sc = new SparkContext(conf) val accum = sc.accumulator(0) sc.parallelize(Arr
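
A self-contained variant of the same pattern (the input values are made up), with the accumulator read on the driver only after the action has finished:

import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("EL LBP SPARK")
    val sc = new SparkContext(conf)
    val accum = sc.accumulator(0)
    sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
    println(accum.value)  // 10
    sc.stop()
  }
}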

Accumulators : Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkContext

2014-10-25 Thread octavian.ganea
Hi all, I tried to use accumulators without any success so far. My code is simple: val sc = new SparkContext(conf) val accum = sc.accumulator(0) val partialStats = sc.textFile(f.getAbsolutePath()) .map(line => { val key = line.split("\t").head; (key , line)} )
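
The usual cause of "Task not serializable: ... SparkContext" is a closure that drags in an object holding the SparkContext. A sketch of the pattern that avoids it (names and the input path are hypothetical): keep the accumulator in a local val so the map closure captures only the accumulator, never sc or its enclosing object.

import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorNoCapture {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("accumulator-example"))
    val accum = sc.accumulator(0)        // local val: safe to reference inside closures

    val keyed = sc.textFile("hdfs:///input/stats.tsv")
      .map { line =>
        accum += 1                       // captures only accum, not sc
        val key = line.split("\t").head
        (key, line)
      }

    keyed.count()                        // accumulator updates happen during the action
    println(accum.value)
    sc.stop()
  }
}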

NullPointerException when using Accumulators on cluster

2014-10-25 Thread octavian.ganea
Hi, I have a simple accumulator that needs to be passed to a foo() function inside a map job: val myCounter = sc.accumulator(0) val myRDD = sc.textFile(inputpath) // :spark.RDD[String] myRDD.flatMap(line => foo(line)) def foo(line: String) = { myCounter += 1 // line throwing NullPointerExcep
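
One frequent cause of this NullPointerException is foo() reading the accumulator through a field of an enclosing object that is never initialized on the executors. A sketch of one workaround (names and the path are hypothetical): pass the accumulator into foo() explicitly, so the closure serializes the Accumulator itself.

import org.apache.spark.{Accumulator, SparkConf, SparkContext}

object CounterInMap {
  // The accumulator arrives as a parameter instead of being looked up on a shared object.
  def foo(line: String, counter: Accumulator[Int]): Seq[String] = {
    counter += 1
    line.split("\\s+").toSeq
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("counter-in-map"))
    val myCounter = sc.accumulator(0)
    val myRDD = sc.textFile("hdfs:///input/data")  // spark.RDD[String]
    val words = myRDD.flatMap(line => foo(line, myCounter))
    words.count()                                  // updates happen while the action runs
    println(myCounter.value)
    sc.stop()
  }
}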

Re: spark-submit memory too larger

2014-10-25 Thread marylucy
Version: Spark 1.1.0, 42 workers, 40 GB memory per worker. Running GraphX componentgraph takes five hours. > On Oct 25, 2014, 1:27, "Sameer Farooqui" wrote: > > That does seem a bit odd. How many Executors are running under this Driver? > > Does the spark-submit process start out using ~60GB of memory righ

Read a TextFile(1 record contains 4 lines) into a RDD

2014-10-25 Thread Parthus
Hi, It might be a naive question, but I still wish that somebody could help me handle it. I have a text file in which every 4 lines represent one record. Since the SparkContext.textFile() API treats each line as a record, it does not fit my case. I know that SparkContext.hadoopFile or newAPIHadoo
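
When the file is too large for wholeTextFiles, one alternative sketch (assumes an existing SparkContext sc; the path is a placeholder) is to number every line and group each run of 4 consecutive line numbers back into one record:

val records = sc.textFile("hdfs:///data/records")
  .zipWithIndex()                                      // (line, global line number)
  .map { case (line, idx) => (idx / 4, (idx % 4, line)) }
  .groupByKey()                                        // the 4 lines of one record share a key
  .map { case (_, parts) =>
    parts.toSeq.sortBy(_._1).map(_._2).mkString("\n")  // restore the original line order
  }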