Re: Visualizing Spark Streaming data

2015-03-20 Thread Roger Hoover
Hi Harut, Jeff's right that Kibana + Elasticsearch can take you quite far out of the box. Depending on your volume of data, you may only be able to keep recent data around, though. Another option that is custom-built for handling many dimensions at query time (not as separate metrics) is Druid (h…

Re: Running a spark-submit compatible app in spark-shell

2014-06-04 Thread Roger Hoover
…2014 at 8:42 AM, Roger Hoover wrote: > Thanks, Andrew. I'll give it a try. > On Mon, May 26, 2014 at 2:22 PM, Andrew Or wrote: >> Hi Roger, >> This was due to a bug in the Spark shell code, and is fixed in the latest master (and RC11). Her…

Re: Using Spark on Data size larger than Memory size

2014-06-05 Thread Roger Hoover
Hi Aaron, When you say that sorting is being worked on, can you elaborate a little more please? In particular, I want to sort the items within each partition (not globally) without necessarily bringing them all into memory at once. Thanks, Roger On Sat, May 31, 2014 at 11:10 PM, Aaron Davidso…

Re: Using Spark on Data size larger than Memory size

2014-06-05 Thread Roger Hoover
I think it would be very handy to be able to specify that you want sorting during a partitioning stage. On Thu, Jun 5, 2014 at 4:42 PM, Roger Hoover wrote: > Hi Aaron, > When you say that sorting is being worked on, can you elaborate a little more please? > In particular,…

Re: Using Spark on Data size larger than Memory size

2014-06-06 Thread Roger Hoover
> As far as the work that Aaron mentioned is happening, I think he might be referring to the discussion and code surrounding > https://issues.apache.org/jira/browse/SPARK-983 > Cheers! > Andrew > On Thu, Jun 5, 2014 at 5:16 PM, Roger Hoover wrote: …
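The sort capability Roger asks about later became directly expressible: in newer Spark releases, repartitionAndSortWithinPartitions performs the sort inside the shuffle itself, so no partition has to be ordered in user memory. A minimal sketch with made-up data (the SparkContext._ import supplies the pair-RDD implicits on older releases):

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD implicits on older Spark releases

// Sort within each partition (not globally) as part of the shuffle stage.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("partition-sort"))
val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3), ("a", 4)))
val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(2))

// glom() exposes each partition as an array so the per-partition order is visible.
sorted.glom().collect().foreach(part => println(part.mkString(", ")))
```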

Re: Low Level Kafka Consumer for Spark

2014-08-30 Thread Roger Hoover
I have this same question. Isn't there somewhere that the Kafka range metadata can be saved? From my naive perspective, it seems like it should be very similar to HDFS lineage. The original HDFS blocks are kept somewhere (in the driver?) so that if an RDD partition is lost, it can be recomputed.
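The direct Kafka integration that arrived after this thread makes the answer concrete: each batch RDD carries the exact offset ranges it was computed from, so the driver can persist them and replay the same range if a partition is lost, much like HDFS block lineage. A sketch assuming that later API, with hypothetical broker and topic names:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val ssc = new StreamingContext(sc, Seconds(10))   // assumes an existing SparkContext `sc`
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> "localhost:9092"), Set("events"))

stream.foreachRDD { rdd =>
  // Each batch RDD knows exactly which Kafka offsets it covers; storing these
  // (topic, partition, fromOffset, untilOffset) tuples durably makes
  // recomputing a lost partition deterministic.
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach(r => println(s"${r.topic}/${r.partition}: ${r.fromOffset} -> ${r.untilOffset}"))
}
ssc.start()
ssc.awaitTermination()
```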

Re: Is Spark the right tool for me?

2014-12-02 Thread Roger Hoover
"I’ve also considered to use Kafka to message between Web UI and the pipes, I think it will fit. Chaining the pipes together as a workflow and implementing, managing and monitoring these long running user tasks with locality as I need them is still causing me headache." You can look at Apache Sam

Re: Spark - ready for prime time?

2014-04-10 Thread Roger Hoover
Can anyone comment on their experience running Spark Streaming in production? On Thu, Apr 10, 2014 at 10:33 AM, Dmitriy Lyubimov wrote: > On Thu, Apr 10, 2014 at 9:24 AM, Andrew Ash wrote: >> The biggest issue I've come across is that the cluster is somewhat unstable when under memor…

How to cogroup/join pair RDDs with different key types?

2014-04-14 Thread Roger Hoover
Hi, I'm trying to figure out how to join two RDDs with different key types and would appreciate any suggestions. Say I have two RDDs: ipToUrl of type (IP, String) and ipRangeToZip of type (IPRange, String). How can I join/cogroup these two RDDs together to produce a new RDD of type (IP, (String, St…
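When the range table is small enough to fit in memory, one common workaround is to avoid the mismatched-key join entirely: broadcast the ranges and resolve each IP on the map side. A sketch, with a hypothetical IPRange type and IPs simplified to Longs:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical range type with a containment test.
case class IPRange(start: Long, end: Long) {
  def contains(ip: Long): Boolean = ip >= start && ip <= end
}

// Broadcast the small range table and look each IP up on the map side,
// producing (IP, (url, zip)) without a shuffle join.
def joinByRange(ipToUrl: RDD[(Long, String)],
                ranges: Seq[(IPRange, String)]): RDD[(Long, (String, String))] = {
  val bcRanges = ipToUrl.sparkContext.broadcast(ranges)
  ipToUrl.flatMap { case (ip, url) =>
    bcRanges.value.collectFirst {
      case (range, zip) if range.contains(ip) => (ip, (url, zip))
    }
  }
}
```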

Re: How to cogroup/join pair RDDs with different key types?

2014-04-15 Thread Roger Hoover
…of CIDR notations and do the join then, but you're starting to have the cartesian product work against you on scale at that point. > Andrew > On Tue, Apr 15, 2014 at 1:07 AM, Roger Hoover wrote: >> Hi, >> I'm trying to figure out how to…

Re: How to cogroup/join pair RDDs with different key types?

2014-04-15 Thread Roger Hoover
I'm thinking of creating a union type for the key so that IPRange and IP types can be joined. On Tue, Apr 15, 2014 at 10:44 AM, Roger Hoover wrote: > Andrew, > Thank you very much for your feedback. Unfortunately, the ranges are not of predictable size, but you gave me…
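A union key along the lines Roger describes could look like the following in Scala; this is a hypothetical illustration of the idea, not code from the thread:

```scala
// One key type both datasets can share: exact addresses and ranges become
// cases of a sealed trait, so RDD[(IPKey, String)] can hold either side.
sealed trait IPKey
case class ExactIP(ip: Long) extends IPKey
case class RangeIP(start: Long, end: Long) extends IPKey

// A custom Partitioner would then need to route an ExactIP and any RangeIP
// that covers it to the same partition, e.g. by bucketing the address space.
```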

Re: How to cogroup/join pair RDDs with different key types?

2014-04-16 Thread Roger Hoover
Ah, in case this helps others, looks like RDD.zipPartitions will accomplish step 4. On Tue, Apr 15, 2014 at 10:44 AM, Roger Hoover wrote: > Andrew, > Thank you very much for your feedback. Unfortunately, the ranges are not of predictable size, but you gave me an idea of how…
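For readers following along, a sketch of that final step: assuming both RDDs have the same number of partitions, each partition is sorted by key, and right-side keys are unique, zipPartitions hands over both iterators so a streaming merge-join can run without materializing either side:

```scala
import org.apache.spark.rdd.RDD

def mergeJoin(left: RDD[(Long, String)],
              right: RDD[(Long, String)]): RDD[(Long, (String, String))] =
  left.zipPartitions(right) { (l, r) =>
    val rb = r.buffered
    l.flatMap { case (k, v) =>
      // Advance the right iterator past keys smaller than k; both sides
      // are consumed in order, so neither partition is held in memory.
      while (rb.hasNext && rb.head._1 < k) rb.next()
      if (rb.hasNext && rb.head._1 == k) Iterator((k, (v, rb.head._2)))
      else Iterator.empty
    }
  }
```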

Re: How to cogroup/join pair RDDs with different key types?

2014-04-16 Thread Roger Hoover
…d help with? > On Wed, Apr 16, 2014 at 7:11 PM, Roger Hoover wrote: >> Ah, in case this helps others, looks like RDD.zipPartitions will accomplish step 4. >> On Tue, Apr 15, 2014 at 10:44 AM, Roger Hoover wrote: >>> Andrew, …

Running a spark-submit compatible app in spark-shell

2014-04-27 Thread Roger Hoover
Hi, From the meetup talk about the 1.0 release, I saw that spark-submit will be the preferred way to launch apps going forward. How do you recommend launching such jobs in a development cycle? For example, how can I load an app that's expecting to be given to spark-submit into spark-shell? Also…
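One approach that fits this development cycle (all names below are hypothetical): start the shell with the application jar on its classpath, then drive the app with the shell's pre-built SparkContext instead of letting the app construct its own:

```scala
// Launch the shell with the app jar on the classpath:
//   spark-shell --jars target/scala-2.10/myapp.jar
// Then, inside the shell:
import com.example.MyApp   // hypothetical application object from the jar
MyApp.run(sc)              // assumes an entry point that accepts the shell's SparkContext
```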

Re: Running a spark-submit compatible app in spark-shell

2014-04-28 Thread Roger Hoover
…method and just call that method from the SBT shell; that should work. > Matei > On Apr 27, 2014, at 3:14 PM, Roger Hoover wrote: > Hi, > From the meetup talk about the 1.0 release, I saw that spark-submit will be the preferred way…

Re: Running a spark-submit compatible app in spark-shell

2014-04-28 Thread Roger Hoover
…ll fails. When I do that in the Scala REPL, it works. BTW, I'm using the latest code from the master branch (8421034e793c0960373a0a1d694ce334ad36e747). On Mon, Apr 28, 2014 at 3:40 PM, Roger Hoover wrote: > Matei, thank you. That seemed to work, but I'm not able to import a class…

Re: Running a spark-submit compatible app in spark-shell

2014-04-28 Thread Roger Hoover
…I think either this or the --jars flag should work, but it's possible there is a bug with the --jars flag when calling the REPL. > On Mon, Apr 28, 2014 at 4:30 PM, Roger Hoover wrote: >> A couple of issues: 1) the jar doesn't show up on the classpa…

Re: How to declare Tuple return type for a function

2014-04-29 Thread Roger Hoover
The return type should be RDD[(Int, Int, Int)] because sc.textFile() returns an RDD. Try adding an import for the RDD type to get rid of the compile error: import org.apache.spark.rdd.RDD On Mon, Apr 28, 2014 at 6:22 PM, SK wrote: > Hi, > I am a new user of Spark. I have a class that define…
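A minimal reconstruction of the pattern in question (function name and input format are hypothetical); the RDD import is what lets the return-type annotation compile:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD   // without this, RDD[(Int, Int, Int)] does not resolve

// Parse comma-separated lines of three integers into a triple-typed RDD.
def parseTriples(sc: SparkContext, path: String): RDD[(Int, Int, Int)] =
  sc.textFile(path).map { line =>
    val f = line.split(",")
    (f(0).toInt, f(1).toInt, f(2).toInt)
  }
```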

Re: Running a spark-submit compatible app in spark-shell

2014-05-27 Thread Roger Hoover
…/apache/spark/commit/8edbee7d1b4afc192d97ba192a5526affc464205. > Try it now and it should work. :) > Andrew > 2014-05-26 10:35 GMT+02:00 Perttu Ranta-aho: > Hi Roger, > Were you able to solve this? > -Perttu > On Tue, Apr 29, 2014…