Re: Super slow caching in 1.3?

2015-04-27 Thread Christian Perez
atabricks.com] > Sent: Thursday, April 16, 2015 7:23 PM > To: Evo Eftimov > Cc: Christian Perez; user > > > Subject: Re: Super slow caching in 1.3? > > > > Here are the types that we specialize, other types will be much slower. > This is only for Spark SQL, normal RDD

Re: Pyspark where do third parties libraries need to be installed under Yarn-client mode

2015-04-24 Thread Christian Perez
le.com/Pyspark-where-do-third-parties-libraries-need-to-be-installed-under-Yarn-client-mode-tp22639.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >> >> ----- >> To unsubscrib

Re: Super slow caching in 1.3?

2015-04-16 Thread Christian Perez
g back to > kryo and even then there are some locking issues). > > If so, would it be possible to try caching a flattened version? > > CACHE TABLE flattenedTable AS SELECT ... FROM parquetTable > > On Mon, Apr 6, 2015 at 5:00 PM, Christian Perez wrote: >> >> Hi al

Super slow caching in 1.3?

2015-04-06 Thread Christian Perez
Hi all, Has anyone else noticed very slow time to cache a Parquet file? It takes 14 s per 235 MB (1 block) uncompressed node local Parquet file on M2 EC2 instances. Or are my expectations way off... Cheers, Christian -- Christian Perez Silicon Valley Data Science Data Analyst christ

Re: input size too large | Performance issues with Spark

2015-04-02 Thread Christian Perez
we want to use Spark to provide us the capability to process our >> in-memory data structure very fast as well as scale to a larger volume >> when >> required in the future. >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-l

Re: persist(MEMORY_ONLY) takes lot of time

2015-04-02 Thread Christian Perez
> http://apache-spark-user-list.1001560.n3.nabble.com/persist-MEMORY-ONLY-takes-lot-of-time-tp22343.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > --------- > To unsubscribe, e-mail: use

Re: saveAsTable broken in v1.3 DataFrames?

2015-03-20 Thread Christian Perez
Any other users interested in a feature DataFrame.saveAsExternalTable() for making _useful_ external tables in Hive, or am I the only one? Bueller? If I start a PR for this, will it be taken seriously? On Thu, Mar 19, 2015 at 9:34 AM, Christian Perez wrote: > Hi Yin, > > Thank

Re: saveAsTable broken in v1.3 DataFrames?

2015-03-19 Thread Christian Perez
>> property, there will be a field called "spark.sql.sources.provider" and the >> value will be "org.apache.spark.sql.parquet.DefaultSource". You can also >> look at your files in the file system. They are stored by Parquet. >> >> Thanks, >> >> Yi

saveAsTable broken in v1.3 DataFrames?

2015-03-19 Thread Christian Perez
alized properly on receive. I'm tracing execution through source code... but before I get any deeper, can anyone reproduce this behavior? Cheers, Christian -- Christian Perez Silicon Valley Data Science Data Analyst christ...@svds.com @cp_phd -