Re: Processing audio/video/images

2014-06-02 Thread Philip Ogren
I asked a question related to Marcelo's answer a few months ago. The discussion there may be useful: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-URI-td1054.html On 06/02/2014 06:09 PM, Marcelo Vanzin wrote: Hi Jamal, If what you want is to process lots of files in parallel, the b

Re: Unit test failure: Address already in use

2014-06-18 Thread Philip Ogren
In my unit tests I have a base class that all my tests extend that has a setup and teardown method that they inherit. They look something like this: var spark: SparkContext = _ @Before def setUp() { Thread.sleep(100L) //this seems to give spark more time to reset from the
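
A minimal sketch of the kind of base class described above, assuming JUnit 4 and spark-core are on the test classpath. The class name `SparkTestBase`, the application name, and the body of `tearDown` are assumptions for illustration; the preview above is cut off before it shows the original teardown.

```scala
import org.apache.spark.SparkContext
import org.junit.{After, Before}

// Hypothetical base class; test classes extend it and inherit setUp/tearDown.
abstract class SparkTestBase {
  var spark: SparkContext = _

  @Before
  def setUp(): Unit = {
    // Short pause, as in the message above, before creating a new context.
    Thread.sleep(100L)
    spark = new SparkContext("local", "unit-test")
  }

  @After
  def tearDown(): Unit = {
    if (spark != null) {
      spark.stop()
      spark = null
    }
    // Assumption: clearing this property lets the next context choose a fresh
    // driver port, a habit common in test suites from the Spark 0.9/1.0 era.
    System.clearProperty("spark.driver.port")
  }
}
```

Stopping the context in `tearDown` matters more than the sleep: only one context should be live at a time, so each test has to release its context before the next one starts.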

Multiple SparkContexts with different configurations in same JVM

2014-07-10 Thread Philip Ogren
In various previous versions of Spark (and I believe the current version, 1.0.0, as well) we have noticed that it does not seem possible to have a "local" SparkContext and a SparkContext connected to a cluster via either a Spark Cluster (i.e. using the Spark resource manager) or a YARN cluster.

Re: Announcing Spark 1.0.1

2014-07-14 Thread Philip Ogren
Hi Patrick, This is great news but I nearly missed the announcement because it had scrolled off the folder view that I have Spark users list messages go to. 40+ new threads since you sent the email out on Friday evening. You might consider having someone on your team create a spark-announce

relationship of RDD[Array[String]] to Array[Array[String]]

2014-07-21 Thread Philip Ogren
It is really nice that Spark RDDs provide functions that are often equivalent to functions found in Scala collections. For example, I can call: myArray.map(myFx) and equivalently myRdd.map(myFx) Awesome! My question is this. Is it possible to write code that works on either an RDD or a

Re: relationship of RDD[Array[String]] to Array[Array[String]]

2014-07-21 Thread Philip Ogren
ethod-parameter-forwarding-possible-in-scala I'm not seeing a way to utilize implicit conversions in this case. Since Scala is statically (albeit inferred) typed, I don't see a way around having a common supertype. On Monday, July 21, 2014 11:01 AM, Philip Ogren wrote: It is really nice t
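
One pattern sometimes used for this kind of question is a type class, which supplies the shared `map` operation without requiring a common supertype. This is not something proposed in the thread; the names `Mappable`, `Lengths`, and `tokenCounts` are hypothetical, and the RDD instance is shown only as a comment because it assumes spark-core on the classpath.

```scala
import scala.reflect.ClassTag

// Hypothetical type class: "a container F whose elements can be mapped".
trait Mappable[F[_]] {
  def map[A, B: ClassTag](fa: F[A])(f: A => B): F[B]
}

object Mappable {
  // Instance for plain Scala arrays.
  implicit val arrayMappable: Mappable[Array] = new Mappable[Array] {
    def map[A, B: ClassTag](fa: Array[A])(f: A => B): Array[B] = fa.map(f)
  }

  // An RDD instance would have the same shape (assumption, needs spark-core):
  // implicit val rddMappable: Mappable[RDD] = new Mappable[RDD] {
  //   def map[A, B: ClassTag](fa: RDD[A])(f: A => B): RDD[B] = fa.map(f)
  // }
}

object Lengths {
  // Written once; works for any F that has a Mappable instance in scope.
  def tokenCounts[F[_]](rows: F[Array[String]])(implicit m: Mappable[F]): F[Int] =
    m.map(rows)(_.length)
}
```

With the array instance in scope, `Lengths.tokenCounts(Array(Array("a", "b"), Array("c")))` yields an `Array[Int]` holding the length of each row. The `ClassTag` bound is there because both `Array.map` and `RDD.map` need one for the result element type.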

creating a distributed index

2014-08-01 Thread Philip Ogren
Suppose I want to take my large text data input and create a distributed inverted index in Spark on each string in the input (imagine an in-memory Lucene index - not what I'm doing but it's analogous). It seems that I could do this with mapPartitions so that each element in a partition gets a
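
A rough sketch of the one-index-per-partition idea, assuming spark-core is on the classpath. The `Index` type here is a plain token-to-documents map standing in for a real index such as Lucene, and `PartitionIndex`, `buildIndex`, `indexPartitions`, and `search` are hypothetical names, not anything from the thread.

```scala
import org.apache.spark.rdd.RDD

object PartitionIndex {
  // Stand-in for a real index: token -> documents containing that token.
  type Index = Map[String, Set[String]]

  // Builds one in-memory index from all documents in a single partition.
  def buildIndex(docs: Iterator[String]): Index =
    docs.foldLeft(Map.empty[String, Set[String]]) { (index, doc) =>
      doc.split("\\s+").filter(_.nonEmpty).foldLeft(index) { (acc, token) =>
        acc.updated(token, acc.getOrElse(token, Set.empty[String]) + doc)
      }
    }

  // One index per partition; cached so the indexes are not rebuilt per query.
  def indexPartitions(docs: RDD[String]): RDD[Index] =
    docs.mapPartitions(iter => Iterator(buildIndex(iter))).cache()

  // Queries every partition's index and gathers the matches on the driver.
  def search(indexes: RDD[Index], token: String): Array[String] =
    indexes.flatMap(_.getOrElse(token, Set.empty[String])).collect()
}
```

Each partition contributes exactly one element to the resulting RDD, so a query is a `flatMap` over a handful of indexes rather than over every document.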

Re: creating a distributed index

2014-08-04 Thread Philip Ogren
each of the ten indexes and aggregate the n-best matches from the ten sets of results. Would this be possible with IndexedRDD or some other feature of Spark? Thanks, Philip On 08/01/2014 04:26 PM, Ankur Dave wrote: At 2014-08-01 14:50:22 -0600, Philip Ogren wrote: It seems that I cou

Re: creating a distributed index

2014-08-04 Thread Philip Ogren
tch`(myquery)) I'm sure it won't take much imagination to figure out how to do the matching in a batch way. If anyone has done anything along these lines I'd love to have some feedback. Thanks, Philip On 08/04/2014 09:46 AM, Philip Ogren wrote: This looks like a really c

Spark assembly for YARN/CDH5

2014-10-16 Thread Philip Ogren
Does anyone know if there are Spark assemblies created and available for download that have been built for CDH5 and YARN? Thanks, Philip

Re: Is there a way to get the current progress of the job?

2014-04-01 Thread Philip Ogren
Hi DB, Just wondering if you ever got an answer to your question about monitoring progress - either offline or through your own investigation. Any findings would be appreciated. Thanks, Philip On 01/30/2014 10:32 PM, DB Tsai wrote: Hi guys, When we're running a very long job, we would lik

Re: Is there a way to get the current progress of the job?

2014-04-02 Thread Philip Ogren
SON - but I can't seem to figure out how to do this or if it is possible. Any advice is appreciated. Thanks, Philip On 04/01/2014 09:43 AM, Philip Ogren wrote: Hi DB, Just wondering if you ever got an answer to your question about monitoring progress - either offline or through your own

Re: Is there a way to get the current progress of the job?

2014-04-03 Thread Philip Ogren
arbitrary format and will be deprecated soon. If you find this feature useful, you can test it out by building the master branch of Spark yourself, following the instructions in https://github.com/apache/spark/pull/42. Andrew On Wed, Apr 2, 2014 at 3:39 PM, Philip Ogren <mailto:philip

Re: Is there a way to get the current progress of the job?

2014-04-03 Thread Philip Ogren
consider exposing the JobProgressListener directly - I think it's been factored nicely so it's fairly decoupled from the UI. The concern is this is a semi-internal piece of functionality and something we might, e.g. want to change the API of over time. - Patrick On Wed, Apr

RDD.tail()

2014-04-14 Thread Philip Ogren
Has there been any thought to adding a tail() method to RDD? It would be really handy to skip over the first item in an RDD when it contains header information. Even better would be a drop(int) function that would allow you to skip over several lines of header information. Our attempts to do
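
For illustration, a `drop`-like helper can be sketched with `mapPartitionsWithIndex`, assuming spark-core is on the classpath. The object and method names are hypothetical, and the sketch only behaves like a true `drop(n)` when all `n` header lines fall in the first partition (for example, a short header at the top of a single text file).

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

object RddDrop {
  // Skips the first n elements of partition 0 and leaves other partitions alone.
  // Assumption: the elements to skip all live in partition 0.
  def drop[T: ClassTag](rdd: RDD[T], n: Int): RDD[T] =
    rdd.mapPartitionsWithIndex { (index, iter) =>
      if (index == 0) iter.drop(n) else iter
    }
}
```

`RddDrop.drop(lines, 1)` then plays the role of the `tail()` described above for a file with a single header line.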

Re: Opinions stratosphere

2014-05-02 Thread Philip Ogren
Great reference! I just skimmed through the results without reading much of the methodology - but it looks like Spark outperforms Stratosphere fairly consistently in the experiments. It's too bad the data sources only range from 2GB to 8GB. Who knows if the apparent pattern would extend out

Re: Spark unit testing best practices

2014-05-14 Thread Philip Ogren
Have you actually found this to be true? I have found Spark local mode to be quite good about blowing up if there is something non-serializable and so my unit tests have been great for detecting this. I have never seen something that worked in local mode that didn't work on the cluster becaus

Re: Use SparkListener to get overall progress of an action

2014-05-23 Thread Philip Ogren
Hi Pierre, I asked a similar question on this list about 6 weeks ago. Here is one answer I got that is of particular note: In the upcoming release of Sp
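
As a rough sketch of the listener approach discussed in these threads, the class below counts tasks as stages are submitted and as tasks finish. It assumes spark-core 1.0 or later on the classpath; `ProgressListener` and `fractionDone` are hypothetical names, and the estimate is crude: the total only grows as stages are submitted, and retried or failed tasks are counted as finished.

```scala
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted, SparkListenerTaskEnd}

// Hypothetical listener that tracks a coarse "tasks finished / tasks known" ratio.
class ProgressListener extends SparkListener {
  private val totalTasks = new AtomicInteger(0)
  private val finishedTasks = new AtomicInteger(0)

  // Each submitted stage announces how many tasks it will run.
  override def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted): Unit = {
    totalTasks.addAndGet(stageSubmitted.stageInfo.numTasks)
  }

  // Called once per task completion, successful or not.
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    finishedTasks.incrementAndGet()
  }

  // Fraction of known tasks that have ended; 0.0 before any stage is submitted.
  def fractionDone: Double = {
    val total = totalTasks.get
    if (total == 0) 0.0 else finishedTasks.get.toDouble / total
  }
}
```

The listener would be registered with `sc.addSparkListener(new ProgressListener)` before running the action, and `fractionDone` polled from another thread.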