Spark-13979: issues with hadoopConf

2016-07-03 Thread Gil Vernik
Hello, does anyone have ideas about this one: https://issues.apache.org/jira/browse/SPARK-13979 ? Do others see the same issues? Thanks, Gil.

new object store driver for Spark

2016-03-22 Thread Gil Vernik
We recently released an object store connector for Spark (https://github.com/SparkTC/stocator). Currently this connector contains a driver for Swift-based object stores (like SoftLayer or any other Swift cluster), but it can easily support additional object stores. There is a pending patch to s

how to send additional configuration to the RDD after it was lazily created

2015-09-17 Thread Gil Vernik
Hi, I have the following case, which I am not sure how to resolve. My code uses HadoopRDD and creates various RDDs on top of it (MapPartitionsRDD, and so on). After all RDDs were lazily created, my code "knows" some new information, and I want the "compute" method of the HadoopRDD to be awar

Re: [spark-csv] how to build with Hadoop 2.6.0?

2015-08-19 Thread Gil Vernik
efault comes from. From: Mohit Jaggi To: Gil Vernik/Haifa/IBM@IBMIL Cc: Dev Date: 19/08/2015 21:47 Subject: Re: [spark-csv] how to build with Hadoop 2.6.0? spark-csv should not depend on Hadoop. On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik wrote: I would like to build spark-cs

[spark-csv] how to build with Hadoop 2.6.0?

2015-08-16 Thread Gil Vernik
I would like to build spark-csv with Hadoop 2.6.0. I noticed that when I build it with sbt/sbt ++2.10.4 package, it builds against Hadoop 2.2.0 (at least, this is what I saw in the .ivy2 repository). How do I specify 2.6.0 during the spark-csv build? By the way, is it possible to build spark-csv using mav

Re: possible issues with listing objects in the HadoopFSrelation

2015-08-12 Thread Gil Vernik
sparkContext.hadoopFile, then FileInputFormat will provide all the partitions and splits, but if I access the same bucket from code that relies on HadoopFSRelation, then the partitions will be created by HadoopFSRelation? Thanks, Gil. From: Cheng Lian To: Gil Vernik/Haifa/IBM@IBMIL, Dev Date

possible issues with listing objects in the HadoopFSrelation

2015-08-10 Thread Gil Vernik
Just some thoughts; I hope I didn't miss something obvious. HadoopFSRelation calls the FileSystem class directly to list the files in the path. It looks like it implements basically the same logic as the FileInputFormat.listStatus method (located in hadoop-mapreduce-client-core). The point is that
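For reference, the listing logic being compared can be sketched roughly as follows. This is a minimal plain-Java sketch that uses java.nio instead of Hadoop's FileSystem API; it reproduces only one piece of FileInputFormat.listStatus, the default filter that skips names starting with "_" or "." (such as _SUCCESS). The class InputListing and its methods are hypothetical; the sketch ignores globs, recursive listing, and user-supplied PathFilters, and it is not Spark or Hadoop code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical sketch of FileInputFormat-style listing; not Hadoop or Spark code. */
public final class InputListing {

    /** Mirrors the default hidden-file rule: names starting with "_" or "." are skipped. */
    public static boolean isVisible(String fileName) {
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }

    /** Lists visible regular files directly under dir (no recursion, no glob expansion). */
    public static List<Path> listInputFiles(Path dir) {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)                     // this sketch keeps plain files only
                   .filter(p -> isVisible(p.getFileName().toString()))
                   .sorted()                                         // deterministic order for printing
                   .forEach(result::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        listInputFiles(dir).forEach(System.out::println);
    }
}
```

Run it with a directory argument; files such as _SUCCESS or .crc files are left out of the listing.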

Re: problems with build of latest the master

2015-07-15 Thread Gil Vernik
From: Sean Owen To: Gil Vernik/Haifa/IBM@IBMIL Cc: Ted Yu, Dev, Josh Rosen, Steve Loughran Date: 15/07/2015 21:41 Subject: Re: problems with build of latest the master You shouldn't get the dependencies you need from Spark, right? You declare direct dependencies. A

Re: problems with build of latest the master

2015-07-15 Thread Gil Vernik
o add a dependency on it. From: Ted Yu To: Josh Rosen Cc: Steve Loughran, Gil Vernik/Haifa/IBM@IBMIL, Dev Date: 15/07/2015 18:28 Subject: Re: problems with build of latest the master If I understand correctly, hadoop-openstack is not currently a dependency in Spark. On J

Re: problems with build of latest the master

2015-07-14 Thread Gil Vernik
-all. I guess this is needed for Hadoop version 2.6.0, but perhaps the latest Hadoop versions have the same Mockito version as Spark uses. Gil Vernik. From: Gil Vernik/Haifa/IBM@IBMIL To: Dev Date: 14/07/2015 12:23 Subject: problems with build of latest the

problems with build of latest the master

2015-07-14 Thread Gil Vernik
I just checked out master and tried to build it with mvn -Dhadoop.version=2.6.0 -DskipTests clean package and got: [ERROR] /Users/gilv/Dev/Spark/spark/core/src/test/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriterSuite.java:117: error: cannot find symbol [ERROR] when(shuffleMemo

Re: question related partitions of the DataFrame

2015-07-14 Thread Gil Vernik
data? For example, if I create a DataFrame from a HadoopRDD, does that mean the DataFrame has the same partitions as the HadoopRDD? Thanks, Gil. From: Gil Vernik/Haifa/IBM@IBMIL To: Dev Date: 12/07/2015 13:06 Subject: question related partitions of the DataFrame Hi, DataFrame exte

question related partitions of the DataFrame

2015-07-12 Thread Gil Vernik
Hi, DataFrame extends RDDApi, which provides RDD-like methods. My question is: is a DataFrame a sort of standalone RDD with its own partitions, or does it depend on the underlying RDD that was used to load the data into its partitions? It's written that DataFrame has the ability to scale from kilobyt

TableScan vs PrunedScan

2015-07-07 Thread Gil Vernik
Hi all, I wanted to experiment a little with TableScan and PrunedScan. My first test was to print columns from various SQL queries. To make this test easier, I took spark-csv and replaced TableScan with PrunedScan. I then changed the buildScan method of CsvRelation from def BuildScan =
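In Spark's Data Sources API, TableScan declares buildScan() with no arguments, while PrunedScan declares buildScan(requiredColumns: Array[String]) so that the relation only produces the columns a query needs; both return an RDD[Row]. The sketch below mimics that contract in plain Java, with Lists standing in for RDDs and String arrays standing in for Rows. The interfaces and CsvRelationSketch are hypothetical stand-ins, not the Spark or spark-csv classes, and the CSV parsing is a naive split on commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class PrunedScanSketch {

    /** Stand-in for Spark's TableScan: every column of every row. */
    public interface TableScan {
        List<String[]> buildScan();
    }

    /** Stand-in for Spark's PrunedScan: only the requested columns, in request order. */
    public interface PrunedScan {
        List<String[]> buildScan(String[] requiredColumns);
    }

    /** Hypothetical CSV relation over in-memory lines (no quoting or escaping support). */
    public static final class CsvRelationSketch implements TableScan, PrunedScan {
        private final List<String> header;
        private final List<String> lines;

        public CsvRelationSketch(String headerLine, List<String> lines) {
            this.header = Arrays.asList(headerLine.split(","));
            this.lines = lines;
        }

        @Override
        public List<String[]> buildScan() {
            List<String[]> rows = new ArrayList<>();
            for (String line : lines) {
                rows.add(line.split(",", -1)); // -1 keeps trailing empty fields
            }
            return rows;
        }

        @Override
        public List<String[]> buildScan(String[] requiredColumns) {
            // Print which columns were requested, mirroring the column-printing experiment.
            System.out.println("requiredColumns = " + Arrays.toString(requiredColumns));
            int[] indexes = new int[requiredColumns.length];
            for (int i = 0; i < requiredColumns.length; i++) {
                indexes[i] = header.indexOf(requiredColumns[i]);
                if (indexes[i] < 0) {
                    throw new IllegalArgumentException("unknown column: " + requiredColumns[i]);
                }
            }
            List<String[]> rows = new ArrayList<>();
            for (String[] full : buildScan()) {
                String[] pruned = new String[indexes.length];
                for (int i = 0; i < indexes.length; i++) {
                    pruned[i] = full[indexes[i]];
                }
                rows.add(pruned);
            }
            return rows;
        }
    }

    public static void main(String[] args) {
        CsvRelationSketch relation = new CsvRelationSketch(
                "year,make,model", Arrays.asList("2012,Tesla,S", "1997,Ford,E350"));
        // Roughly what a query selecting model and year would request.
        for (String[] row : relation.buildScan(new String[] {"model", "year"})) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Running main prints the requested column names once and then the pruned rows; a real PrunedScan relation would receive requiredColumns from Spark's planner instead of from main.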

Re: saveAsTextFile and tmp files generations in tasks

2015-04-15 Thread Gil Vernik
Thanks a lot for the info. Does this explain the generation of 2 temp files per task (one temp file that is renamed to another)? I understand why there is one temp file per task, but I am still not sure why there were 2 per task. Thanks, Gil. From: Imran Rashid To: Gil Vernik/Haifa

saveAsTextFile and tmp files generations in tasks

2015-04-14 Thread Gil Vernik
created in memory? And the last one: where is the code that is responsible for this? Thanks a lot, Gil Vernik.

Re: One corrupt gzip in a directory of 100s

2015-04-01 Thread Gil Vernik
I actually saw the same issue when we analyzed a container with a few hundred GBs of zip files: one was corrupted, and Spark failed the entire job with an exception. I like SPARK-6593, since it can also cover additional cases, not just corrupted zip files. From: Dale Richardso
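The skip-instead-of-fail behavior discussed in this thread can be illustrated outside Spark. The minimal plain-Java sketch below fully decompresses each gzip input with java.util.zip.GZIPInputStream and treats any IOException (bad header, truncated stream, or trailer mismatch) as a corrupt file to report and skip, rather than letting one file abort the whole run. SkipCorruptGzip and its methods are hypothetical names; this is not how Spark or SPARK-6593 handles it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class SkipCorruptGzip {

    /** Returns the decompressed text, or null if the gzip data is corrupt or truncated. */
    public static String readGzipOrNull(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Bad header, truncated stream, or CRC/length mismatch all surface as IOException.
            return null;
        }
    }

    /** Reads every input, skipping (and reporting) corrupt ones instead of failing everything. */
    public static List<String> readAllSkippingCorrupt(Map<String, byte[]> inputs) {
        List<String> contents = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : inputs.entrySet()) {
            String text = readGzipOrNull(entry.getValue());
            if (text == null) {
                System.err.println("skipping corrupt input: " + entry.getKey());
            } else {
                contents.add(text);
            }
        }
        return contents;
    }

    /** Demo helper: gzip a string in memory (the trailer is written when the stream closes). */
    public static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] good = gzip("line 1\nline 2\n");
        Map<String, byte[]> inputs = new LinkedHashMap<>();
        inputs.put("a.gz", good);
        inputs.put("b.gz", Arrays.copyOf(good, good.length / 2)); // simulated truncated file
        System.out.println(readAllSkippingCorrupt(inputs));
    }
}
```

The trade-off is that corruption is only detected by reading each file to the end, and skipped files silently shrink the result, so reporting them (as the sketch does on stderr) matters.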

parquet support - some questions about code

2015-03-18 Thread Gil Vernik
Hi, I am trying to better understand the code for Parquet support. In particular, I got lost trying to understand ParquetRelation and ParquetRelation2. Is ParquetRelation2 the new code that should completely replace ParquetRelation? (I think there is some remark in the code notifying this

Re: problems with Parquet in Spark 1.3.0

2015-03-16 Thread Gil Vernik
I just noticed this one: https://issues.apache.org/jira/browse/SPARK-6351 (https://github.com/apache/spark/pull/5039). I verified it, and it resolves my issues with Parquet and the swift:// namespace. From: Gil Vernik/Haifa/IBM@IBMIL To: dev Date: 16/03/2015 02:11 PM Subject

problems with Parquet in Spark 1.3.0

2015-03-16 Thread Gil Vernik
be accessed via "file://"? I will be glad to dig into this in case it's a bug, but I would like to know if this is intentional in Spark 1.3.0 (I can access the swift:// namespace from SparkContext; only sqlContext has this issue). Thanks, Gil Vernik. scala>

Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-02-09 Thread Gil Vernik
download were built with Jackson 1.8.8, which makes them impossible to use with Hadoop 2.6.0 jars. Thanks, Gil Vernik. From: Sean Owen To: Ted Yu Cc: Gil Vernik/Haifa/IBM@IBMIL, dev Date: 18/01/2015 08:23 PM Subject: Re: run time exceptions in Spark 1.2.0 manual build

run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-01-17 Thread Gil Vernik
into this issue. Is there any particular reason Spark needs Jackson 1.8.8 rather than 1.9.13? Can we remove 1.8.8 and use 1.9.13 for Avro? It looks to me like everything works fine when Spark is built with Jackson 1.9.13, but I am not an expert and not sure what should be tested. Thanks, Gil Vernik.

Re: Apache Spark and Swift object store

2014-06-14 Thread Gil Vernik
Spark and Swift object store On 06/08/2014 04:01 AM, Gil Vernik wrote: > Hello everyone, > > I would like to initiate a discussion about integrating Apache Spark and > OpenStack Swift. > (https://issues.apache.org/jira/browse/SPARK-938 was created a while ago) > > I created a patc

Apache Spark and Swift object store

2014-06-08 Thread Gil Vernik
greatly for the exposure of Spark. The integration between Spark and Swift is very similar to how Spark integrates with S3. It will be great to hear comments, suggestions, and remarks from the community! All the best, Gil Vernik.

question about Spark repositories in GitHub

2014-05-19 Thread Gil Vernik
Streaming branches? Thanking you in advance, Gil Vernik.