Hello,
Any ideas about this one: https://issues.apache.org/jira/browse/SPARK-13979?
Do others see the same issue?
Thanks
Gil.
We recently released an object store connector for Spark.
https://github.com/SparkTC/stocator
Currently this connector contains a driver for Swift-based object stores
(like SoftLayer or any other Swift cluster), but it can easily support
additional object stores.
There is a pending patch to s
Hi,
I have the following case, which I am not sure how to resolve.
My code uses HadoopRDD and creates various RDDs on top of it
(MapPartitionsRDD, and so on).
After all the RDDs were lazily created, my code "knows" some new information,
and I want the "compute" method of the HadoopRDD to be aware of it.
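One pattern that may fit, as a sketch (the path and names here are hypothetical): a task closure is serialized when the action runs, not when the RDD is defined, so a value captured in the closure can be filled in after the RDDs are built.

import org.apache.spark.{SparkConf, SparkContext}

object LateInfoSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("late-info").setMaster("local[*]"))

    var lateInfo = "unknown"                          // not yet known when the RDDs are built
    val base = sc.textFile("hdfs:///tmp/input")       // HadoopRDD under the hood
    val mapped = base.map(line => s"$lateInfo:$line") // lazily defined, nothing runs yet

    lateInfo = "now-known"                            // the "new information" arrives
    mapped.take(5).foreach(println)                   // tasks observe "now-known"
    sc.stop()
  }
}

If the information must be visible inside HadoopRDD.compute itself, rather than in a transformation on top of it, a custom RDD subclass is probably needed instead.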
efault comes from.
From: Mohit Jaggi
To: Gil Vernik/Haifa/IBM@IBMIL
Cc: Dev
Date: 19/08/2015 21:47
Subject: Re: [spark-csv] how to build with Hadoop 2.6.0?
spark-csv should not depend on hadoop
On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik wrote:
I would like to build spark-cs
I would like to build spark-csv with Hadoop 2.6.0.
I noticed that when I build it with sbt/sbt ++2.10.4 package, it builds
with Hadoop 2.2.0 (at least that is what I saw in the .ivy2 repository).
How do I specify 2.6.0 during the spark-csv build? By the way, is it possible
to build spark-csv using Maven?
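If the goal is just to pin the transitive Hadoop artifacts, one sbt-side sketch (the module name is an assumption; check spark-csv's actual build definition):

// in build.sbt: force the transitively resolved Hadoop client to 2.6.0
dependencyOverrides += "org.apache.hadoop" % "hadoop-client" % "2.6.0"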
sparkContext.hadoopFile, then FileInputFormat will provide all the
partitions and splits, but if I access the same bucket from some code
that relies on HadoopFSRelation, then the partitions will be created by
HadoopFSRelation?
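For concreteness, a minimal sketch of the first path (the swift:// URI is a placeholder):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// here the splits, and hence the RDD partitions, come from FileInputFormat.getSplits
val rdd = sc.hadoopFile("swift://container.provider/path",
  classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
println(rdd.partitions.length)  // one partition per InputSplit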
Thanks
Gil.
From: Cheng Lian
To: Gil Vernik/Haifa/IBM@IBMIL, Dev
Date
Just some thoughts; I hope I didn't miss something obvious.
HadoopFSRelation calls the FileSystem class directly to list files in the
path.
It looks like it implements basically the same logic as the
FileInputFormat.listStatus method (located in
hadoop-mapreduce-client-core).
The point is that
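For reference, a sketch of the direct listing that HadoopFSRelation does (the path is a placeholder):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// list the input files by asking the FileSystem directly,
// rather than going through FileInputFormat.listStatus
val fs = FileSystem.get(new Configuration())
fs.listStatus(new Path("/data")).foreach(s => println(s.getPath))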
From: Sean Owen
To: Gil Vernik/Haifa/IBM@IBMIL
Cc: Ted Yu , Dev , Josh
Rosen , Steve Loughran
Date: 15/07/2015 21:41
Subject: Re: problems with build of the latest master
You shouldn't get dependencies you need from Spark, right? You declare
direct dependencies. A
o add a dependency on it.
From: Ted Yu
To: Josh Rosen
Cc: Steve Loughran , Gil
Vernik/Haifa/IBM@IBMIL, Dev
Date: 15/07/2015 18:28
Subject: Re: problems with build of the latest master
If I understand correctly, hadoop-openstack is not currently a dependency
in Spark.
On J
-all
I guess this is needed for Hadoop version 2.6.0, but perhaps the latest Hadoop
versions have the same mockito version as Spark uses.
Gil Vernik.
From: Gil Vernik/Haifa/IBM@IBMIL
To: Dev
Date: 14/07/2015 12:23
Subject: problems with build of the latest
I just did a checkout of the master and tried to build it with
mvn -Dhadoop.version=2.6.0 -DskipTests clean package
Got:
[ERROR] /Users/gilv/Dev/Spark/spark/core/src/test/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriterSuite.java:117: error: cannot find symbol
[ERROR] when(shuffleMemo
data?
For example, if I create a DataFrame from a HadoopRDD, does it mean that the
DataFrame has the same partitions as the HadoopRDD?
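A quick way to inspect this, as a sketch (the path is a placeholder, and any exchange/shuffle in the plan will of course repartition):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val rdd = sc.textFile("hdfs:///tmp/input", 8)  // HadoopRDD-backed, 8 partitions requested
val df  = rdd.map(Tuple1.apply).toDF("line")
println(df.rdd.partitions.length)              // for a plain scan this follows the source RDD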
Thanks
Gil.
From: Gil Vernik/Haifa/IBM@IBMIL
To: Dev
Date: 12/07/2015 13:06
Subject: question related to partitions of the DataFrame
Hi,
DataFrame exte
Hi,
DataFrame extends RDDApi, which provides RDD-like methods.
My question is: is DataFrame a sort of standalone RDD with its own
partitions, or does it depend on the underlying RDD that was used to load the
data into its partitions? It's written that DataFrame has the ability to scale
from kilobytes
Hi All,
I wanted to experiment a little bit with TableScan and PrunedScan.
My first test was to print the columns from various SQL queries.
To make this test easier, I just took spark-csv and replaced TableScan
with PrunedScan.
I then changed the buildScan method of CsvRelation from
def buildScan =
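For context, a sketch of the shape of that change (CsvRelation internals elided; requiredColumns is what Spark SQL pushes down to the relation):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.sources.{BaseRelation, PrunedScan}

// TableScan:  def buildScan(): RDD[Row]
// PrunedScan: def buildScan(requiredColumns: Array[String]): RDD[Row]
abstract class CsvLikeRelation extends BaseRelation with PrunedScan {
  override def buildScan(requiredColumns: Array[String]): RDD[Row] = {
    println(s"columns requested: ${requiredColumns.mkString(", ")}")
    // ... produce rows containing only the requested columns ...
    ???
  }
}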
Thanks a lot for the info on it.
Does this explain the two temp files generated per task (one temp file that is
renamed to another)?
I understand why there is one temp file per task, but I am still not sure why
there were two per task.
Thanks
Gil.
From: Imran Rashid
To: Gil Vernik/Haifa
created in memory?
And the last one: where is the code responsible for this?
Thanks a lot,
Gil Vernik.
I actually saw the same issue, when we analyzed a container with a few
hundred GBs of zip files - one was corrupted and Spark exited with an
exception on the entire job.
I like SPARK-6593, since it can also cover additional cases, not just
corrupted zip files.
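As a workaround sketch (not what SPARK-6593 proposes; the paths are placeholders, and gzip is used here because textFile decompresses it natively, so a corrupt archive makes its job fail):

import scala.util.{Failure, Success, Try}

val paths = Seq("swift://container.provider/a.gz", "swift://container.provider/b.gz")
val counts = paths.flatMap { p =>
  // run one small job per file so a corrupt archive only fails that file
  Try(sc.textFile(p).count()) match {
    case Success(n) => Some(p -> n)
    case Failure(e) => println(s"skipping $p: ${e.getMessage}"); None
  }
}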
From: Dale Richardso
Hi,
I am trying to better understand the code for Parquet support.
In particular, I got lost trying to understand ParquetRelation and
ParquetRelation2. Is ParquetRelation2 the new code that should
completely replace ParquetRelation? (I think there is a remark in the
code noting this
I just noticed this one:
https://issues.apache.org/jira/browse/SPARK-6351
https://github.com/apache/spark/pull/5039
I verified it, and this resolves my issues with Parquet and the swift://
namespace.
From: Gil Vernik/Haifa/IBM@IBMIL
To: dev
Date: 16/03/2015 02:11 PM
Subject
be accessed via "file://"?
I will be glad to dig into this in case it's a bug, but I would like to know
whether this is intentional in Spark 1.3.0.
(I can access the swift:// namespace from SparkContext; only sqlContext
has this issue.)
Thanks,
Gil Vernik.
scala>
download were built with Jackson 1.8.8, which makes them impossible to use
with Hadoop 2.6.0 jars.
Thanks
Gil Vernik.
From: Sean Owen
To: Ted Yu
Cc: Gil Vernik/Haifa/IBM@IBMIL, dev
Date: 18/01/2015 08:23 PM
Subject: Re: run time exceptions in Spark 1.2.0 manual build
into this
issue.
Is there any particular need in Spark for Jackson 1.8.8 rather than 1.9.13?
Can we remove 1.8.8 and use 1.9.13 for Avro?
It looks to me like everything works fine when Spark is built with Jackson
1.9.13, but I am not an expert and am not sure what should be tested.
Thanks,
Gil Vernik.
Spark and Swift object store
On 06/08/2014 04:01 AM, Gil Vernik wrote:
> Hello everyone,
>
> I would like to initiate a discussion about integrating Apache Spark and
> OpenStack Swift.
> (https://issues.apache.org/jira/browse/SPARK-938 was created a while ago)
>
> I created a patc
greatly for the exposure of Spark.
The integration between Spark and Swift is very similar to how Spark
integrates with S3.
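For anyone who wants to try it, a configuration sketch (hadoop-openstack style keys; the service name "softlayer", the endpoint, and the credentials are placeholders to verify against your setup):

// wire a Swift endpoint so Spark can read swift:// paths
val hconf = sc.hadoopConfiguration
hconf.set("fs.swift.impl", "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem")
hconf.set("fs.swift.service.softlayer.auth.url", "https://identity.example.com/v2.0/tokens")
hconf.set("fs.swift.service.softlayer.tenant", "tenant")
hconf.set("fs.swift.service.softlayer.username", "user")
hconf.set("fs.swift.service.softlayer.password", "secret")

val data = sc.textFile("swift://container.softlayer/data.csv")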
It will be great to hear comments / suggestions / remarks from the community!
All the best,
Gil Vernik.
Streaming branches?
Thank you in advance,
Gil Vernik.