Re: collect on hadoopFile RDD returns wrong results

2014-09-18 Thread vasiliy
ing rdd.collect and
> rdd.foreach(println)
>
> Thanks
> Best Regards
>
> On Wed, Sep 17, 2014 at 12:26 PM, vasiliy <zadonskiyd@> wrote:
>
>> it also appears in streaming hdfs fileStream

Re: collect on hadoopFile RDD returns wrong results

2014-09-17 Thread vasiliy
full code example:

    def main(args: Array[String]) {
      val conf = new SparkConf().setAppName("ErrorExample").setMaster("local[8]")
        .set("spark.serializer", classOf[KryoSerializer].getName)
      val sc = new SparkContext(conf)
      val rdd = sc.hadoopFile(
        "hdfs://./user.avro"

Re: collect on hadoopFile RDD returns wrong results

2014-09-16 Thread vasiliy
it also appears in streaming hdfs fileStream

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/collect-on-hadoopFile-RDD-returns-wrong-results-tp14368p14425.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

collect on hadoopFile RDD returns wrong results

2014-09-16 Thread vasiliy
Hello. I have a hadoopFile RDD and I tried to collect its items to the driver program, but it returns an array of identical records (each equal to the last record of my file). My code looks like this:

    val rdd = sc.hadoopFile(
      "hdfs:///data.avro",
      classOf[org.apache.avro.mapred.AvroInputFormat[MyAv
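
A common cause of this symptom, though the snippet above does not confirm it for this case, is that Hadoop record readers reuse one mutable object for every record, so collecting without copying stores many references to the same object. The sketch below reproduces the effect in plain Scala without Spark; `Holder`, `ReuseDemo`, and `reusingIterator` are hypothetical names invented for illustration, not part of the original code.

```scala
// Hypothetical mutable record, standing in for a reused Avro/Writable wrapper.
final class Holder(var value: String)

object ReuseDemo {
  // Mimics a record reader that hands back the same mutable object for every record.
  def reusingIterator(data: Seq[String]): Iterator[Holder] = {
    val shared = new Holder("")
    data.iterator.map { s => shared.value = s; shared }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq("a", "b", "c")

    // Collecting the references directly: every slot points at the same object,
    // which ends up holding the last record.
    val naive = reusingIterator(data).toArray.map(_.value)
    println(naive.mkString(","))  // prints "c,c,c"

    // Copying each record before collecting preserves the individual values.
    val copied = reusingIterator(data).map(h => new Holder(h.value)).toArray.map(_.value)
    println(copied.mkString(","))  // prints "a,b,c"
  }
}
```

If this is the cause, the analogous step on an RDD would be a `map` that copies each record (or extracts the needed fields into immutable values) before calling `collect`.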

Re: Spark SQL Thrift JDBC server deployment for production

2014-09-16 Thread vasiliy
it works, thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Thrift-JDBC-server-deployment-for-production-tp13947p14345.html

Re: can fileStream() or textFileStream() remember state?

2014-09-11 Thread vasiliy
When you get a stream from sc.fileStream(), Spark processes only files whose timestamp is later than the current timestamp, so data already in HDFS should not be processed again. You may have another problem: Spark will not process files that were moved into your HDFS folder between application restarts.
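
The behaviour described above can be illustrated with a small plain-Scala model. This is only a sketch of the selection rule as stated in this message, not Spark's actual implementation; `FileInfo`, `NewFilesDemo`, `selectNew`, and the timestamps are hypothetical.

```scala
// Hypothetical model of a file in the monitored directory.
final case class FileInfo(name: String, modTime: Long)

object NewFilesDemo {
  // Keep only files whose timestamp is strictly later than the reference time.
  def selectNew(files: Seq[FileInfo], startTime: Long): Seq[FileInfo] =
    files.filter(_.modTime > startTime)

  def main(args: Array[String]): Unit = {
    val files = Seq(
      FileInfo("old.avro", 100L),             // present before the first run
      FileInfo("during-downtime.avro", 250L), // arrived while the application was stopped
      FileInfo("fresh.avro", 400L)            // arrives after the restart
    )
    val restartTime = 300L

    // Only the file newer than the restart is picked up; the file that
    // arrived during downtime is skipped along with the old one.
    println(selectNew(files, restartTime).map(_.name).mkString(","))  // prints "fresh.avro"
  }
}
```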

Spark SQL Thrift JDBC server deployment for production

2014-09-10 Thread vasiliy
Hi, I have a question about the Spark SQL Thrift JDBC server. Is there a best practice for Spark SQL deployment? If I understand correctly, the script ./sbin/start-thriftserver.sh starts the Thrift JDBC server in local mode. Are there script options for running this server in yarn-cluster mode?