Re: [Stream] Checkpointing | chmod: cannot access `/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-00001-attempt-215': No such file or directory

2014-08-31 Thread Aniket Bhatnagar
Hi everyone. It turns out that I had Chef installed and its chmod took precedence over cygwin's chmod in the PATH. I fixed the environment variable and now it's working fine. On 1 September 2014 11:48, Aniket Bhatnagar wrote: > On my local (windows) dev environment, I have been trying to

[Stream] Checkpointing | chmod: cannot access `/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-00001-attempt-215': No such file or directory

2014-08-31 Thread Aniket Bhatnagar
On my local (Windows) dev environment, I have been trying to get Spark Streaming running to test my real-time(ish) jobs. I have set the checkpoint directory to /tmp/spark and have installed the latest cygwin. I keep getting the following error: org.apache.hadoop.util.Shell$ExitCodeException: chmod: ca
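For reference, the checkpoint directory in question is the one passed to StreamingContext.checkpoint; a minimal sketch (the master, app name, batch interval and path here are illustrative, not the original job's settings):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[2]").setAppName("CheckpointDemo")
    val ssc = new StreamingContext(conf, Seconds(10))

    // On Windows/cygwin, Hadoop's local filesystem shells out to chmod when
    // writing checkpoint files, so the chmod found first on the PATH matters.
    ssc.checkpoint("/tmp/spark")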

Re: HELP! EXPORT DATA FROM HIVE TO SQL SERVER

2014-08-31 Thread Gordon Wang
Try Sqoop? What do you mean by exporting results to SQL Server? On Mon, Sep 1, 2014 at 10:41 AM, churly lin wrote: > I am working on hive from spark now. I use sparkSQL(HiveFormSpark) for > calculating data and save the results in hive table. > And now, I need export the results in hive table t
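If Sqoop doesn't fit, another option is to read the Hive table from Spark and push rows out over plain JDBC; a rough sketch, assuming a SQL Server JDBC driver on the classpath and a hypothetical results table and schema:

    import java.sql.DriverManager
    import org.apache.spark.sql.hive.HiveContext

    val hiveCtx = new HiveContext(sc)
    val rows = hiveCtx.hql("SELECT id, score FROM results")  // hql() in Spark 1.0.x

    rows.foreachPartition { part =>
      // One connection per partition keeps connection churn manageable.
      val conn = DriverManager.getConnection(
        "jdbc:sqlserver://dbhost:1433;databaseName=mydb", "user", "pass")
      val stmt = conn.prepareStatement(
        "INSERT INTO results_out (id, score) VALUES (?, ?)")
      try {
        part.foreach { row =>
          stmt.setInt(1, row.getInt(0))
          stmt.setDouble(2, row.getDouble(1))
          stmt.executeUpdate()
        }
      } finally {
        stmt.close()
        conn.close()
      }
    }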

HELP! EXPORT DATA FROM HIVE TO SQL SERVER

2014-08-31 Thread churly lin
Hi all: I am working on Hive from Spark now. I use Spark SQL (HiveFromSpark) for calculating data and save the results in a Hive table. Now I need to export the results in the Hive table to SQL Server. Is there a way to do this? Thank you all. --

RE: how to filter value in spark

2014-08-31 Thread Liu, Raymond
You could use cogroup to combine the RDDs into one RDD for cross-reference processing, e.g. a.cogroup(b).filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }.map { case (k, (l, r)) => (k, l) }. Best Regards, Raymond Liu -Original Message- From: marylucy [mailto:qaz163wsx_...@hotmail.com] Sent:
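Spelled out as a runnable snippet (the keys and values here are invented):

    val a = sc.parallelize(Seq(("k1", 1), ("k2", 2), ("k3", 3)))
    val b = sc.parallelize(Seq(("k2", "x"), ("k3", "y")))

    // cogroup yields (key, (values-from-a, values-from-b)); keeping only keys
    // present on both sides filters a against b.
    val filtered = a.cogroup(b)
      .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
      .map { case (k, (l, r)) => (k, l) }

    filtered.collect().foreach(println)  // (k2,...) and (k3,...)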

RE: The concurrent model of spark job/stage/task

2014-08-31 Thread Liu, Raymond
1, 2: As the docs mention, "if they were submitted from separate threads" means, say, you fork your main thread and invoke an action in each thread. Jobs and stages are always numbered in order, but the numbering reflects their generation order, not necessarily their execution order. In your case, if you just call mu
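A minimal illustration of submitting jobs from separate threads (the RDD and actions are made up):

    val rdd = sc.parallelize(1 to 1000000)

    // Each thread triggers its own action, so the two jobs can be scheduled
    // concurrently; job/stage IDs reflect generation order, not completion order.
    val t1 = new Thread(new Runnable { def run() { println(rdd.sum()) } })
    val t2 = new Thread(new Runnable { def run() { println(rdd.count()) } })
    t1.start(); t2.start()
    t1.join(); t2.join()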

Spark+OpenCV: Real Time Image Processing

2014-08-31 Thread Varuzhan
Hi everybody! Right now I'm doing something like this: 1) a user uploads an image to the server, 2) the server processes that image using a database and Java + OpenCV, 3) the server returns some generated result to the user. This is slow now, and if there are many users it will get slower and maybe will not

Re: What does "appMasterRpcPort: -1" indicate ?

2014-08-31 Thread Tao Xiao
Thanks Yi, I think your answers make sense. We can see a series of messages with "appMasterRpcPort: -1" followed by a message with "appMasterRpcPort: 0", perhaps that means we were waiting for the application master to be started ("appMasterRpcPort: -1"), and later the application master got start

numpy digitize

2014-08-31 Thread filipus
Hi folks, is there a function in Spark like numpy's "digitize" which discretizes a numerical variable? Or, even better, is there a way to use the functionality of the decision tree builder in Spark MLlib, which splits data into bins in such a way that the split variable best predicts the target valu
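There is no built-in digitize, but the same effect falls out of a binary search against sorted bin edges; a sketch with invented edges and values:

    import java.util.Arrays

    val edges = Array(0.0, 10.0, 50.0, 100.0)  // sorted bin boundaries (illustrative)
    val values = sc.parallelize(Seq(3.2, 17.5, 75.0, 120.0))

    // Mimic numpy.digitize: map each value to the index of the bin it falls in.
    val binned = values.map { v =>
      val i = Arrays.binarySearch(edges, v)
      val bin = if (i >= 0) i + 1 else -(i + 1)
      (v, bin)
    }
    binned.collect()  // (3.2,1), (17.5,2), (75.0,3), (120.0,4)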

Re: Mapping Hadoop Reduce to Spark

2014-08-31 Thread Matei Zaharia
mapPartitions just gives you an Iterator of the values in each partition, and lets you return an Iterator of outputs. For instance, take a look at  https://github.com/apache/spark/blob/master/core/src/test/java/org/apache/spark/JavaAPISuite.java#L694. Matei On August 31, 2014 at 12:26:51 PM, Ste
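In Scala the shape is the same idea in fewer characters; a tiny sketch (the data is made up), with the Java form at the JavaAPISuite link above:

    val rdd = sc.parallelize(1 to 10, 2)

    // mapPartitions receives each partition's values as an Iterator and must
    // return an Iterator of outputs; here each partition collapses to its sum.
    val sums = rdd.mapPartitions(iter => Iterator(iter.sum))
    sums.collect()  // Array(15, 40) with this data and partitioning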

Re: Mapping Hadoop Reduce to Spark

2014-08-31 Thread Matei Zaharia
Just to be clear, no operation requires all the keys to fit in memory, only the values for each specific key. All the values for each individual key need to fit, but the system can spill to disk across keys. Right now it's for both sides of it, unless you do a broadcast join by hand with somethi
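A broadcast join by hand looks roughly like this (datasets invented); only the small side must fit in memory, since it is collected and shipped to every executor while the large side streams through without a shuffle:

    val small = sc.parallelize(Seq(("k1", "a"), ("k2", "b")))
    val large = sc.parallelize(Seq(("k1", 1), ("k2", 2), ("k3", 3)))

    // Collect the small side and broadcast it as a lookup map.
    val smallMap = sc.broadcast(small.collectAsMap())
    val joined = large.flatMap { case (k, v) =>
      smallMap.value.get(k).map(s => (k, (v, s)))
    }
    joined.collect()  // (k1,(1,a)), (k2,(2,b))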

Re: This always tries to connect to HDFS: user$ export MASTER=local[NN]; pyspark --master local[NN] ...

2014-08-31 Thread Sean Owen
I think you're saying it's looking for "/foo" on HDFS and not on your local file system? If so, I would suggest either prefixing your local paths with "file:" to be unambiguous, or unsetting HADOOP_HOME and HADOOP_CONF_DIR. On Sun, Aug 31, 2014 at 10:17 PM, didata wrote: > Hello friends: > > I use th
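For example (the file name is illustrative):

    // The "file:" scheme forces the local filesystem even when the Hadoop
    // configuration points the default filesystem at HDFS.
    val lines = sc.textFile("file:///foo/data.txt")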

Re: Low Level Kafka Consumer for Spark

2014-08-31 Thread RodrigoB
Just a comment on the recovery part. Is it correct to say that the current Spark Streaming recovery design does not consider re-computations (upon metadata lineage recovery) that depend on blocks of data of the received stream? https://issues.apache.org/jira/browse/SPARK-1647 Just to illustrate a

This always tries to connect to HDFS: user$ export MASTER=local[NN]; pyspark --master local[NN] ...

2014-08-31 Thread didata
Hello friends: I use the Cloudera/CDH5 version of Spark (v1.0.0 Spark RPMs), but the following is also true when using the Apache Spark distribution built against a local Hadoop/YARN installation. The problem: if the following directory exists, */etc/hadoop/conf/*, and the pertinent

Re: Spark Streaming checkpoint recovery causes IO re-execution

2014-08-31 Thread RodrigoB
Hi Yana, You are correct. What needs to be added is that besides RDDs being checkpointed, metadata representing the execution of computations is also checkpointed in Spark Streaming. Upon driver recovery, the last batches (the ones already executed and the ones that should have been executed whi

Re: Mapping Hadoop Reduce to Spark

2014-08-31 Thread Koert Kuipers
Matei, it is good to hear that the restriction that keys need to fit in memory no longer applies to combineByKey. However, join requiring keys to fit in memory is still a big deal to me. Does it apply to both sides of the join, or only one (while the other side is streaming)? On Sat, Aug 30, 201

Re: Mapping Hadoop Reduce to Spark

2014-08-31 Thread Steve Lewis
Is there a sample of how to do this? I see 1.1 is out but cannot find samples of mapPartitions. A Java sample would be very useful. On Sat, Aug 30, 2014 at 10:30 AM, Matei Zaharia wrote: > In 1.1, you'll be able to get all of these properties using sortByKey, and > then mapPartitions on top to i

Re: What does "appMasterRpcPort: -1" indicate ?

2014-08-31 Thread Yi Tian
I think -1 means your application master has not been started yet. > On 31 Aug 2014, at 23:02, Tao Xiao wrote: > > I'm using CDH 5.1.0, which bundles Spark 1.0.0 with it. > > Following How-to: Run a Simple Apache Spark App in CDH 5 , I tried to submit > my job in local mode, Spark Standalone mode and

What does "appMasterRpcPort: -1" indicate ?

2014-08-31 Thread Tao Xiao
I'm using CDH 5.1.0, which bundles Spark 1.0.0. Following "How-to: Run a Simple Apache Spark App in CDH 5", I tried to submit my job in local mode, Spark Standalone mode and YARN mode. I successfully submitted my job in local mode and Standalone mode; however, I noticed the following messag

Re: jdbcRDD from JAVA

2014-08-31 Thread Sean Owen
https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/JdbcRDD.html#JdbcRDD(org.apache.spark.SparkContext, scala.Function0, java.lang.String, long, long, int, scala.Function1, scala.reflect.ClassTag) I don't think there is a completely Java-friendly version of this class. However you s
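For reference, the Scala usage spells out what that last parameter is: a function mapping each ResultSet row to a value. A sketch with invented connection details (note the SQL must contain two '?' placeholders, which JdbcRDD fills in with each partition's key range):

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.rdd.JdbcRDD

    val rdd = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:mysql://dbhost/mydb", "user", "pass"),
      "SELECT id, name FROM people WHERE id >= ? AND id <= ?",
      1L, 1000L, 3,  // lower bound, upper bound, number of partitions
      (rs: ResultSet) => (rs.getLong(1), rs.getString(2)))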

jdbcRDD from JAVA

2014-08-31 Thread Ahmad Osama
Hi, is there a simple example of JdbcRDD from Java (not Scala)? I'm trying to figure out the last parameter in the constructor of JdbcRDD. Thanks.

Re: Spark Master/Slave and HA

2014-08-31 Thread Sean Owen
The Master doesn't do work. I don't quite understand the rest; there's not a "Spark slave" role. You can have master and workers, and even your driver, on one machine; what's the error? On Sun, Aug 31, 2014 at 12:53 AM, arthur.hk.c...@gmail.com wrote: > Hi, > > I have few questions about Spark Ma

Re: How can a "deserialized Java object" be stored on disk?

2014-08-31 Thread Sean Owen
Yes, there's no such thing as writing a deserialized form to disk. However there are other persistence levels that store *serialized* forms in memory. The meaning here is that the objects are not serialized in memory in the JVM. Of course, they are serialized on disk. On Sun, Aug 31, 2014 at 5:02
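Concretely (the storage level choice is just an example):

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.parallelize(1 to 100)

    // MEMORY_ONLY keeps deserialized Java objects on the JVM heap, while
    // MEMORY_ONLY_SER keeps serialized bytes in memory; anything that
    // persists or spills to disk is serialized by necessity.
    rdd.persist(StorageLevel.MEMORY_ONLY_SER)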