[no subject]

2014-03-26 Thread Hahn Jiang
Hi all, I wrote a Spark program on YARN. When I use a small input file, my program runs well, but the job fails if the input size is more than 40G. The error log: java.io.FileNotFoundException (java.io.FileNotFoundException: /home/work/data12/yarn/nodemanager/usercache/appcache/application

Re: [BLOG] : Shark on Cassandra

2014-03-26 Thread Rohit Rai
Thanks a lot for this post, Brian! It was on our todo list like forever! :) Founder & CEO, Tuplejump, Inc. - www.tuplejump.com - The Data Engineering Platform On Wed, Mar 26, 2014 at 10:51 AM, Matei Zaharia wrote: > Very cool, thanks for posting this! > > Matei > >

Re: Announcing Spark SQL

2014-03-26 Thread Rohit Rai
Great work, guys! Have been looking forward to this... The blog mentions support for reading from HBase/Avro. What will be the recommended approach for this? Will it be writing custom wrappers for SQLContext like in HiveContext, or using Hive's "EXTERNAL TABLE" support? I ask this becaus

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Sai Prasanna
Thanks Chen, it's a bit clearer now and it's up and running. 1) In the WebUI, only the memory used per node is given. I can find it in the logs, but is there a port over which I can monitor memory usage, GC memory overhead, and RDD creation in the UI?

Not getting it

2014-03-26 Thread lannyripple
Hi all, I've got something which I think should be straightforward, but it's not, so I'm not getting it. I have an 8-node Spark 0.9.0 cluster also running HDFS. Workers have 16g of memory using 8 cores. In HDFS I have a CSV file of 110M lines of 9 columns (e.g., [key,a,b,c...]). I have another f

Re: Announcing Spark SQL

2014-03-26 Thread Michael Armbrust
> > Any plans to make the SQL typesafe using something like Slick ( > http://slick.typesafe.com/) > I would really like to do something like that, and maybe we will in a couple of months. However, in the near term, I think the top priorities are going to be performance and stability. Michael

Spark preferred compression format

2014-03-26 Thread Debasish Das
Hi, What's the splittable compression format that works with Spark right now? We are looking into bzip2 / lzo / gzip... gzip is not splittable, so it is not a good option. Between bzip2 and lzo I am confused. Thanks. Deb
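For illustration, a minimal sketch of the practical consequence for gzip (the HDFS path is hypothetical): a .gz file is not splittable, so it arrives as a single partition, and a common workaround is to repartition right after reading.

    // non-splittable input: the whole file lands in one partition
    val lines = sc.textFile("hdfs:///data/input.gz")
    // shuffle into 64 partitions so downstream stages get parallelism
    val spread = lines.repartition(64)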

Re: Announcing Spark SQL

2014-03-26 Thread Soumya Simanta
Very nice. Any plans to make the SQL typesafe using something like Slick ( http://slick.typesafe.com/) Thanks ! On Wed, Mar 26, 2014 at 5:58 PM, Michael Armbrust wrote: > Hey Everyone, > > This already went out to the dev list, but I wanted to put a pointer here > as well to a new feature we a

Re: Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-26 Thread Scott Clasen
The web-ui shows 3 executors, the driver and one spark task on each worker. I do see that there were 8 successful tasks and the ninth failed like so... java.lang.Exception (java.lang.Exception: Could not compute split, block input-0-1395860790200 not found) org.apache.spark.rdd.BlockRDD.compute(B

Re: All pairs shortest paths?

2014-03-26 Thread Matei Zaharia
Yeah, if you’re just worried about statistics, maybe you can do sampling (do single-pair paths from 100 random nodes and you get an idea of what percentage of nodes have what distribution of neighbors in a given distance). Matei On Mar 26, 2014, at 5:55 PM, Ryan Compton wrote: > Much thanks,

Re: All pairs shortest paths?

2014-03-26 Thread Ryan Compton
Much thanks, I suspected this would be difficult. I was hoping to generate some "4 degrees of separation"-like statistics. Looks like I'll just have to work with a subset of my graph. On Wed, Mar 26, 2014 at 5:20 PM, Matei Zaharia wrote: > All-pairs distances is tricky for a large graph because y

Re: Announcing Spark SQL

2014-03-26 Thread Christopher Nguyen
+1 Michael, Reynold et al. This is key to some of the things we're doing. -- Christopher T. Nguyen Co-founder & CEO, Adatao linkedin.com/in/ctnguyen On Wed, Mar 26, 2014 at 2:58 PM, Michael Armbrust wrote: > Hey Everyone, > > This already went out to the dev list, but I wan

Re: All pairs shortest paths?

2014-03-26 Thread Matei Zaharia
All-pairs distances is tricky for a large graph because you need O(V^2) storage. Do you want to just quickly query the distance between two vertices? In that case you can do single-source shortest paths, which I believe exists in GraphX, or at least is very quick to implement on top of its Prege
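For reference, a minimal sketch of single-source shortest paths using GraphX's Pregel operator; the input Graph with Double edge weights (named graph) and the source vertex id are assumptions.

    import org.apache.spark.graphx._

    val sourceId: VertexId = 42L
    // initialize distances: 0 for the source, infinity elsewhere
    val init = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)
    val sssp = init.pregel(Double.PositiveInfinity)(
      (id, dist, newDist) => math.min(dist, newDist),              // vertex program: keep the shorter distance
      triplet =>                                                    // send a message only if it improves the neighbor
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        else Iterator.empty,
      (a, b) => math.min(a, b))                                     // merge messages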

Re: Announcing Spark SQL

2014-03-26 Thread Matei Zaharia
Congrats Michael & co for putting this together — this is probably the neatest piece of technology added to Spark in the past few months, and it will greatly change what users can do as more data sources are added. Matei On Mar 26, 2014, at 3:22 PM, Ognen Duzlevski wrote: > Wow! > Ognen >

Re: coalescing RDD into equally sized partitions

2014-03-26 Thread Walrus theCat
For the record, I tried this, and it worked. On Wed, Mar 26, 2014 at 10:51 AM, Walrus theCat wrote: > Oh so if I had something more reasonable, like RDD's full of tuples of > say, (Int,Set,Set), I could expect a more uniform distribution? > > Thanks > > > On Mon, Mar 24, 2014 at 11:11 PM, Ma

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Jaonary Rabarisoa
The issue and a workaround can be found here https://github.com/apache/spark/pull/181 On Wed, Mar 26, 2014 at 10:12 PM, Aniket Mokashi wrote: > context.objectFile[ReIdDataSetEntry]("data") -not sure how this is > compiled in scala. But, if it uses some sort of ObjectInputStream, you need > to be

Re: All pairs shortest paths?

2014-03-26 Thread Ryan Compton
To clarify: I don't need the actual paths, just the distances. On Wed, Mar 26, 2014 at 3:04 PM, Ryan Compton wrote: > No idea how feasible this is. Has anyone done it?

Re: Announcing Spark SQL

2014-03-26 Thread Ognen Duzlevski
Wow! Ognen On 3/26/14, 4:58 PM, Michael Armbrust wrote: Hey Everyone, This already went out to the dev list, but I wanted to put a pointer here as well to a new feature we are pretty excited about for Spark 1.0. http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-usi

Re: Announcing Spark SQL

2014-03-26 Thread Sean Owen
(Long since taken by a W3C RDF project I'm afraid... http://www.w3.org/TR/rdf-sparql-query/ ) On Wed, Mar 26, 2014 at 10:12 PM, Bingham, Skyler wrote: > Fantastic! Although, I think they missed an obvious name choice: SparkQL > (pronounced sparkle) :)

Re: Announcing Spark SQL

2014-03-26 Thread daniel queiroz
Well done guys!! Thanks! 2014-03-26 19:10 GMT-03:00 Nicholas Chammas : > This is so, so COOL. YES. I'm excited about using this once I'm a bit more > comfortable with Spark. > > Nice work, people! > > > On Wed, Mar 26, 2014 at 5:58 PM, Michael Armbrust > wrote: > >> Hey Everyone, >> >> This al

RE: Announcing Spark SQL

2014-03-26 Thread Bingham, Skyler
Fantastic! Although, I think they missed an obvious name choice: SparkQL (pronounced sparkle) :) Skyler From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Wednesday, March 26, 2014 3:58 PM To: user@spark.apache.org Subject: Announcing Spark SQL Hey Everyone, This already went out to

Re: Announcing Spark SQL

2014-03-26 Thread Nicholas Chammas
This is so, so COOL. YES. I'm excited about using this once I'm a bit more comfortable with Spark. Nice work, people! On Wed, Mar 26, 2014 at 5:58 PM, Michael Armbrust wrote: > Hey Everyone, > > This already went out to the dev list, but I wanted to put a pointer here > as well to a new feature

Re: Announcing Spark SQL

2014-03-26 Thread Andy Robb
Thanks for sending this out! Andy Robb Senior Product Manager Phone 1.650.265.7612 ar...@walmartlabs.com On Mar 26, 2014, at 2:58 PM, Michael Armbrust wrote: > Hey Everyone, > > This already went out to the dev list, but I wanted to put a pointer here as > well to a new feature we are pretty

All pairs shortest paths?

2014-03-26 Thread Ryan Compton
No idea how feasible this is. Has anyone done it?

Announcing Spark SQL

2014-03-26 Thread Michael Armbrust
Hey Everyone, This already went out to the dev list, but I wanted to put a pointer here as well to a new feature we are pretty excited about for Spark 1.0. http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-using-Spark.html Michael

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread giive chen
This response is for Sai. The easiest way to verify your current spark-shell setting is just to type "sc.master". If your setting is correct, it should return scala> sc.master res0: String = spark://master.ip.url.com:5050 If your SPARK_MASTER_IP is not set correctly, it will respond scala> sc.mast

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Aniket Mokashi
context.objectFile[ReIdDataSetEntry]("data") - not sure how this is compiled in Scala. But, if it uses some sort of ObjectInputStream, you need to be careful - ObjectInputStream uses the root classloader to load classes and does not work with jars that are added to the thread context classloader (TCCL). Apache commons has ClassLoaderOb
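A minimal sketch of the workaround hinted at here, assuming commons-io is on the classpath and a hypothetical file name; ClassLoaderObjectInputStream deserializes with an explicit classloader instead of the root one.

    import java.io.FileInputStream
    import org.apache.commons.io.input.ClassLoaderObjectInputStream

    // use the thread context classloader, which can see jars added at runtime
    val in = new ClassLoaderObjectInputStream(
      Thread.currentThread().getContextClassLoader,
      new FileInputStream("data.bin"))
    val obj = in.readObject()
    in.close()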

Re: YARN problem using an external jar in worker nodes Inbox x

2014-03-26 Thread Sandy Ryza
Hi Sung, Are you using yarn-standalone mode? Have you specified the --addJars option with your external jars? -Sandy On Wed, Mar 26, 2014 at 1:17 PM, Sung Hwan Chung wrote: > Hello, (this is Yarn related) > > I'm able to load an external jar and use its classes within > ApplicationMaster. I w

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Jaonary Rabarisoa
It seems to be an old problem: http://mail-archives.apache.org/mod_mbox/spark-user/201311.mbox/%3c7f6aa9e820f55d4a96946a87e086ef4a4bcdf...@eagh-erfpmbx41.erf.thomson.com%3E https://groups.google.com/forum/#!topic/spark-users/Q66UOeA2u-I Does anyone have a solution? On Wed, Mar 26, 2014 at 5

Re: rdd.saveAsTextFile problem

2014-03-26 Thread Tathagata Das
Can you give us a more detailed exception + stack trace from the log? It should be in the driver log. If not, please take a look at the executor logs through the web UI to find the stack trace. TD On Tue, Mar 25, 2014 at 10:43 PM, gaganbm wrote: > Hi Folks, > > Is this issue resolved ? If yes

YARN problem using an external jar in worker nodes Inbox x

2014-03-26 Thread Sung Hwan Chung
Hello, (this is Yarn related) I'm able to load an external jar and use its classes within ApplicationMaster. I wish to use this jar within worker nodes, so I added sc.addJar(pathToJar) and ran. I get the following exception: org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 4 times

Re: Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-26 Thread Tathagata Das
Does the Spark application's web UI give any indication of what kind of resources it is getting from Mesos? TD On Wed, Mar 26, 2014 at 12:18 PM, Scott Clasen wrote: > I have a mesos cluster which runs marathon. > > I am using marathon to launch a long running spark streaming job which > consumes

Re: streaming questions

2014-03-26 Thread Tathagata Das
foreachRDD is the underlying function that is used behind print and saveAsTextFiles. If you are curious, here is the actual source https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L759 Yes, since the RDD is used twice, its proba

Re: streaming questions

2014-03-26 Thread Diana Carroll
Thanks, Tathagata, very helpful. A follow-up question below... On Wed, Mar 26, 2014 at 2:27 PM, Tathagata Das wrote: > > > *Answer 3:* You can do something like > wordCounts.foreachRDD((rdd: RDD[...], time: Time) => { > if (rdd.take(1).size == 1) { > // There exists at least one element i

Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-26 Thread Scott Clasen
I have a mesos cluster which runs marathon. I am using marathon to launch a long running spark streaming job which consumes a Kafka input stream. With one worker node in the cluster, I can successfully launch the driver job in marathon, which in turn launches a task in mesos via spark (spark is

Re: interleave partitions?

2014-03-26 Thread Walrus theCat
Answering my own question here. This may not be efficient, but this is what I came up with: rdd1.coalesce(N).glom.zip(rdd2.coalesce(N).glom).map { case(x,y) => x++y} On Wed, Mar 26, 2014 at 11:11 AM, Walrus theCat wrote: > Hi, > > I want to do something like this: > > rdd3 = rdd1.coalesce(N).p
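Spelled out, a minimal sketch of the same idea, flattened back to individual elements; it assumes two RDD[Int]s named rdd1 and rdd2, and both sides must end up with the same number of partitions for zip to work.

    // glom turns each partition into a single Array, so the two glommed RDDs
    // have exactly one element per partition and can be zipped safely
    val N = 8
    val interleaved = rdd1.coalesce(N).glom()
      .zip(rdd2.coalesce(N).glom())
      .flatMap { case (xs, ys) => xs ++ ys }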

RE: streaming questions

2014-03-26 Thread Adrian Mocanu
Hi Diana, I'll answer Q3: You can check if an RDD is empty in several ways. Someone here mentioned that using an iterator was safer: val isEmpty = rdd.mapPartitions(iter => Iterator(! iter.hasNext)).reduce(_&&_) You can also check with a fold or rdd.count rdd.reduce(_ + _) // can't handle em
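A minimal sketch putting these checks side by side; rdd stands for any RDD[Int].

    // safe: emits one Boolean per partition, then ANDs them together
    val isEmpty = rdd.mapPartitions(iter => Iterator(!iter.hasNext)).reduce(_ && _)

    // also works, but scans the whole dataset
    val isEmptyByCount = rdd.count() == 0

    // a bare reduce, by contrast, throws on an empty RDD:
    // rdd.reduce(_ + _)  // UnsupportedOperationException: empty collection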

Re: streaming questions

2014-03-26 Thread Tathagata Das
*Answer 1:* Make sure you are using master as "local[n]" with n > 1 (assuming you are running it in local mode). The way Spark Streaming works is that it assigns a core to the data receiver, and so if you run the program with only one core (i.e., with local or local[1]), then it won't have resources

streaming questions

2014-03-26 Thread Diana Carroll
I'm trying to understand Spark streaming, hoping someone can help. I've kinda-sorta got a version of Word Count running, and it looks like this: import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.streaming.StreamingContext._ object StreamingWordCount { def m
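For reference, a minimal self-contained sketch of a 0.9-style streaming word count along these lines; the socket source on localhost:9999 is an assumption standing in for whatever input the original program used.

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    object StreamingWordCount {
      def main(args: Array[String]) {
        // "local[2]": one core for the receiver, at least one for processing
        val ssc = new StreamingContext("local[2]", "StreamingWordCount", Seconds(5))
        val lines = ssc.socketTextStream("localhost", 9999)
        val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
        wordCounts.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }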

interleave partitions?

2014-03-26 Thread Walrus theCat
Hi, I want to do something like this: rdd3 = rdd1.coalesce(N).partitions.zip(rdd2.coalesce(N).partitions) I realize the above will get me something like Array[(partition,partition)]. I hope you see what I'm going for here -- any tips on how to accomplish this? Thanks

Re: coalescing RDD into equally sized partitions

2014-03-26 Thread Walrus theCat
Oh so if I had something more reasonable, like RDD's full of tuples of say, (Int,Set,Set), I could expect a more uniform distribution? Thanks On Mon, Mar 24, 2014 at 11:11 PM, Matei Zaharia wrote: > This happened because they were integers equal to 0 mod 5, and we used the > default hashCod
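If the goal is simply partitions of roughly equal size regardless of the values, a minimal alternative sketch is to force a shuffle while coalescing, which redistributes elements instead of relying on their hashCode.

    // repartition(n) is equivalent to coalesce(n, shuffle = true)
    val evenlySized = rdd.coalesce(5, shuffle = true)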

RE: closures & moving averages (state)

2014-03-26 Thread Adrian Mocanu
Tried with reduce and it's giving me pretty weird results that make no sense, i.e. one number for an entire RDD: val smaStream = inputStream.reduce{ case(t1,t2) => { val sma = average.addDataPoint(t1) sma }} Tried with transform and that worked correctly, but unfortunately, it
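For contrast, a minimal sketch of the two variants described here; inputStream is assumed to be a DStream[Double] and average the poster's Sma instance (assumed serializable). Note that a closure-captured mutable object is copied to each task, so its state is per-partition, not global.

    // reduce collapses each batch's RDD to a single value, hence "one number per RDD"
    val oneNumberPerBatch = inputStream.reduce((t1, t2) => t1 + t2)

    // transform applies an RDD-to-RDD function, so every element passes through the Sma
    val smaStream = inputStream.transform(rdd => rdd.map(x => average.addDataPoint(x)))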

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Yana Kadiyska
I might be way off here but are you looking at the logs on the worker machines? I am running an older version (0.8) and when I look at the error log for the executor process I see the exact location where the executor process tries to load the jar from...with a line like this: 14/03/26 13:57:11 IN

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Jaonary Rabarisoa
In fact, It may be related to object serialization : 14/03/26 17:02:19 INFO TaskSetManager: Serialized task 3.0:1 as 2025 bytes in 1 ms 14/03/26 17:02:19 WARN TaskSetManager: Lost TID 6 (task 3.0:0) 14/03/26 17:02:19 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: value.mode

Re: Change print() in JavaNetworkWordCount

2014-03-26 Thread Sourav Chandra
This works just fine without any compilation error: def print() { def foreachFunc = (rdd: RDD[T], time: Time) => { val total = rdd.collect().toList println ("---") println ("Time: " + time) println ("-
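The same effect can also be had without modifying the DStream source, as a minimal sketch using foreachRDD; lines stands in for any DStream[String], and collecting every batch to the driver only makes sense for small batches.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.Time

    lines.foreachRDD { (rdd: RDD[String], time: Time) =>
      // pull the whole batch back to the driver and print it
      val total = rdd.collect()
      println("-------------------------------------------")
      println("Time: " + time)
      total.foreach(println)
    }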

Re: Using an external jar in the driver, in yarn-standalone mode.

2014-03-26 Thread Sandy Ryza
Andrew, Spark automatically deploys the jar on the DFS cache if it's included with the addJars option. It then still needs to be SparkContext.addJar'd to get it to the executors. -Sandy On Wed, Mar 26, 2014 at 6:14 AM, Julien Carme wrote: > Hello Andrew, > > Thanks for the tip, I accessed the
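A minimal sketch of that second step, with a hypothetical jar path; addJar makes the jar available to the executors.

    // the path must be resolvable by the driver (local file, HDFS, or HTTP)
    sc.addJar("hdfs:///user/me/libs/external-lib.jar")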

Re: closures & moving averages (state)

2014-03-26 Thread Benjamin Black
Perhaps you want reduce rather than map? On Wednesday, March 26, 2014, Adrian Mocanu wrote: > I'm passing a moving average function during the map phase like this: > > val average= new Sma(window=3) > > stream.map(x=> average.addNumber(x)) > > > > where > > class Sma extends Serializable {

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Ognen Duzlevski
Have you looked through the logs fully? I have seen this (in my limited experience) pop up as a result of previous exceptions/errors, also as a result of being unable to serialize objects etc. Ognen On 3/26/14, 10:39 AM, Jaonary Rabarisoa wrote: I notice that I get this error when I'm trying to

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Jaonary Rabarisoa
I notice that I get this error when I'm trying to load an objectFile with val viperReloaded = context.objectFile[ReIdDataSetEntry]("data") On Wed, Mar 26, 2014 at 3:58 PM, Jaonary Rabarisoa wrote: > Here the output that I get : > > [error] (run-main-0) org.apache.spark.SparkException: Job abort

closures & moving averages (state)

2014-03-26 Thread Adrian Mocanu
I'm passing a moving average function during the map phase like this: val average= new Sma(window=3) stream.map(x=> average.addNumber(x)) where class Sma extends Serializable { .. } I also tried to put creation of object average in an object like I saw in another post: object Average {

Re: Change print() in JavaNetworkWordCount

2014-03-26 Thread Eduardo Costa Alfaia
Thanks, guys, for the reply, but I have found this piece of code in streaming: def print() { 585 def foreachFunc = (rdd: RDD[T], time: Time) => { 586 //val first11 = rdd.take(11) 587 //val first100 = rdd.take(100) 588 val total = rdd.collect() 589 println ("---

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Jaonary Rabarisoa
Here the output that I get : [error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task 1.0:1 failed 4 times (most recent failure: Exception failure in TID 6 on host 172.166.86.36: java.lang.ClassNotFoundException: value.models.ReIdDataSetEntry) org.apache.spark.SparkException: Job ab

Shark drop table partitions

2014-03-26 Thread vinay Bajaj
Hi, Is there any way to drop all or a subset of partitions of a table in one run? In Hive, to drop a partition from a table, this works: ALTER TABLE foo DROP PARTITION(ds = 'date') ...but it should also work to drop all partitions prior to a date: ALTER TABLE foo DROP PARTITION(ds < 'd

Re: spark-shell on standalone cluster gives error " no mesos in java.library.path"

2014-03-26 Thread Christoph Böhm
Hi, I have a similar issue to the user below: I'm running Spark 0.8.1 (standalone). When I test the streaming NetworkWordCount example as in the docs with local[2], it works fine. As soon as I want to connect to my cluster using [NetworkWordCount master …], it says: --- Failed to load native Mes

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Ognen Duzlevski
Have you looked at the individual nodes logs? Can you post a bit more of the exception's output? On 3/26/14, 8:42 AM, Jaonary Rabarisoa wrote: Hi all, I got java.lang.ClassNotFoundException even with "addJar" called. The jar file is present in each node. I use the version of spark from gith

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
and, yes, I think that picture is a bit misleading, though in the following paragraph it has mentioned that “ Because the driver schedules tasks on the cluster, it should be run close to the worker nodes, preferably on the same local area network. If you’d like to send requests to the cluster

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
The master does more work than that, actually; I just explained why he should set MASTER_IP correctly. A simplified list: 1. maintain the worker status 2. maintain in-cluster driver status 3. maintain executor status (the worker tells the master what happened on the executor, -- Nan Zhu On Wed

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Yana Kadiyska
Nan (or anyone who feels they understand the cluster architecture well), can you clarify something for me? From reading this user group and your explanation above, it appears that the cluster master is only involved in this during application startup -- to allocate executors (from what you wrote so

java.lang.ClassNotFoundException

2014-03-26 Thread Jaonary Rabarisoa
Hi all, I got java.lang.ClassNotFoundException even with "addJar" called. The jar file is present in each node. I use the version of spark from github master. Any ideas ? Jaonary

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
All you need to do is ensure your Spark cluster is running well (you can check by accessing the Spark UI to see if all workers are displayed); then you have to set the correct SPARK_MASTER_IP on the machine where you run spark-shell. In more detail: when you run bin/spark-shell, it will

Re: Using an external jar in the driver, in yarn-standalone mode.

2014-03-26 Thread Julien Carme
Hello Andrew, Thanks for the tip, I accessed the Classpath Entries on the yarn monitoring (in case of yarn it is not localhost:4040 but yarn_master:8088//proxy/[application_id]/environment). I saw that my jar was actually on the CLASSPATH and was available to my application. I realized that I cou

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Sai Prasanna
Nan Zhu, it's the latter; I want to distribute the tasks to the cluster [machines available]. If I set the SPARK_MASTER_IP at the other machines and set the slave IPs in /conf/slaves at the master node, will the interactive shell code run at the master get distributed across multiple machines?

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
What do you mean by run across the cluster? Do you want to start the spark-shell across the cluster, or do you want to distribute tasks to multiple machines? If the former, yes, as long as you indicate the right master URL. If the latter, also yes; you can observe the distributed task in the

Distributed running in Spark Interactive shell

2014-03-26 Thread Sai Prasanna
Is it possible to run across a cluster using the Spark interactive shell? To be more explicit, is the procedure similar to running standalone master-slave Spark? I want to execute my code in the interactive shell on the master node, and it should run across the cluster [say 5 nodes]. Is the procedure

Re: Spark Streaming - Shared hashmaps

2014-03-26 Thread Bryan Bryan
Thanks for your support. This is my idea of the project; I'm a newbie, so please forgive my misunderstandings: Spark Streaming will collect requests, for example: create a table, append records to a table, erase a table (it's just an example). With Spark Streaming I can filter the messages by key

Re: Spark Streaming - Shared hashmaps

2014-03-26 Thread Tathagata Das
When you say "launch long-running tasks" does it mean long running Spark jobs/tasks, or long-running tasks in another system? If the rate of requests from Kafka is not low (in terms of records per second), you could collect the records in the driver, and maintain the "shared bag" in the driver. A

Re: tracking resource usage for spark-shell commands

2014-03-26 Thread Bharath Bhushan
Thanks for the response Mayur. I was seeing the webui of 0.9.0 spark. I see lots of detailed statistics in the newer 1.0.0-snapshot version. The only thing I found missing was the actual code that I had typed in at the spark-shell prompt, but I can always get it from the shell history. On 26-Ma

Re: Shark does not give any results with SELECT count(*) command

2014-03-26 Thread qingyang li
i have found such log on bigdata003: 14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection from [bigdata001/192.168.1.101] 14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection from [bigdata002/192.168.1.102] 14/03/25 17:08:49 INFO network.ConnectionManager: Accepte

Re: How to set environment variable for a spark job

2014-03-26 Thread santhoma
OK, it was working. I printed System.getenv(..) for both env variables and they gave the correct values. However, it did not give me the intended result. My intention was to load a native library from LD_LIBRARY_PATH, but it looks like the library is loaded from the value of -Djava.library.path. Value o
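A minimal sketch for checking this from inside the job; "mynative" is a hypothetical library name. System.loadLibrary resolves against java.library.path, and an explicit -Djava.library.path overrides whatever LD_LIBRARY_PATH would otherwise have contributed to it.

    println("java.library.path = " + System.getProperty("java.library.path"))
    println("LD_LIBRARY_PATH   = " + System.getenv("LD_LIBRARY_PATH"))
    System.loadLibrary("mynative")  // throws UnsatisfiedLinkError if not found on java.library.path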

Re: Shark does not give any results with SELECT count(*) command

2014-03-26 Thread qingyang li
Hi Praveen, I can start the server on bigdata001 using "/bin/shark --service sharkserver", and I can also connect to this server using "./bin/shark -h bigdata001". But the problem is still there: run "select count(*) from b" on bigdata001, no result, no error. Run "select count(*) from b" on bigdata0

Spark Streaming - Shared hashmaps

2014-03-26 Thread Bryan Bryan
Hi there, I have read about the two fundamental shared features in Spark (broadcast variables and accumulators), but here is what I need. I'm using Spark Streaming in order to get requests from Kafka; these requests may launch long-running tasks, and I need to control them: 1) Keep them in a

Re: Shark does not give any results with SELECT count(*) command

2014-03-26 Thread Praveen R
Oh k. You must be running shark server on bigdata001 to use it from other machines. ./bin/shark --service sharkserver # runs shark server on port 1 You could connect to shark server as ./bin/shark -h , this should work unless there is a firewall blocking it. You might use telnet bigdata001 10

Re: Shark does not give any results with SELECT count(*) command

2014-03-26 Thread qingyang li
Hi Praveen, thanks for replying. I am using hive-0.11, which comes from AMPLab. At the beginning, the hive-site.xml from AMPLab was empty, so I copied one hive-site.xml from my cluster and then removed some attributes and also added some attributes. I think that is not the reason for my problem; I think

Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-26 Thread qingyang li
Egor, I encountered the same problem that you asked about in this thread: http://mail-archives.apache.org/mod_mbox/spark-user/201402.mbox/%3CCAMrx5DwJVJS0g_FE7_2qwMu4Xf0y5VfV=tlyauv2kh5v4k6...@mail.gmail.com%3E Have you fixed this problem? I am using Shark to read a table which I have created on h