I'd like to know whether the broadcast object gets serialized again when it is accessed by the
executor during the execution of a task.
I know that it gets serialized once when sent from the driver to the worker. My question is about
what happens inside the worker when the executor JVMs access it.
thanks
Jeff
could someone please comment on this? thanks
From: jeffsar...@hotmail.com
To: user@spark.apache.org
Subject: Access to broadcasted variable
Date: Thu, 18 Feb 2016 14:44:07 -0500
I'd like to know if the broadcasted object gets serialized when accessed by the
executor during the execution of a task?
Subject: RE: Access to broadcasted variable
From: shixi...@databricks.com
To: jeffsar...@hotmail.com
CC: user@spark.apache.org
The broadcast object is serialized on the driver and sent to the executors. In the executor,
the bytes are deserialized to get the broadcast object back.
On Fri, Feb 19, 2016 at 5:54 AM, jeff saremi
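For illustration, a minimal sketch of that lifecycle; the app name, lookup map and data are placeholders, not from the thread:
```
import org.apache.spark.{SparkConf, SparkContext}

// a minimal sketch, not from the thread; the app name and lookup map are placeholders
val sc = new SparkContext(new SparkConf().setAppName("broadcast-sketch"))

val lookup = Map("a" -> 1, "b" -> 2)
val bcLookup = sc.broadcast(lookup)                  // serialized once on the driver

val counts = sc.parallelize(Seq("a", "b", "a"))
  .map(word => bcLookup.value.getOrElse(word, 0))    // first .value call on an executor
  .collect()                                         // deserializes the bytes; later tasks reuse the object
```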
Is there any way to let Spark know ahead of time what size of RDD to expect as the
result of a flatMap() operation?
And would that help in terms of performance?
For instance, if I have an RDD of 1 million rows and I know that my flatMap()
will produce 100 million rows, is there a way to indicate that?
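As far as I know there is no such size hint on flatMap itself; a minimal sketch of a common workaround (placeholder numbers, assuming an existing SparkContext sc) is to adjust the partition count around the expansion:
```
// a minimal sketch with placeholder numbers (not from the thread): repartitioning
// around the flatMap keeps the ~100x larger output spread over enough tasks
val inputRdd = sc.parallelize(1 to 1000000)      // stand-in for the 1M-row RDD
val expanded = inputRdd
  .repartition(1000)                             // spread the expansion over more tasks
  .flatMap(i => Iterator.fill(100)(i))           // each row expands to ~100 rows
```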
There are executor logs and driver logs. Most of them are not intuitive enough
to mean anything to us.
Are there any notes, documents, or talks on how to decipher these logs and use them to
troubleshoot our applications' performance?
thanks
Jeff
So that it is available even in offline mode? I can't seem to find any notes on that.
thanks
jeff
CC: user@spark.apache.org
To: jeffsar...@hotmail.com
Are you talking about a package which is listed on http://spark-packages.org ?
The package should come with installation instructions, right?
On Oct 4, 2015, at 8:55 PM, jeff saremi wrote:
So that it is available even in offline mode? I can't seem to find any notes on that.
So we tried reading a SequenceFile in Spark and realized that all our records
have ended up becoming the same.
Then one of us found this:
Note: Because Hadoop's RecordReader class re-uses the same Writable object for
each record, directly caching the returned RDD or directly passing it to an
aggregation or shuffle operation will create many references to the same object.
If you plan to directly cache, sort, or aggregate Hadoop writable objects, you
should first copy them using a map function.
… instead of creating Java objects. As you've pointed out, this is
at the expense of making the code more verbose when caching.
-Sandy
On Fri, Nov 13, 2015 at 10:29 AM, jeff saremi wrote:
So we tried reading a sequencefile in Spark and realized that all our records
have ended up becoming the same
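A minimal sketch of the usual fix, assuming an existing SparkContext sc, LongWritable/Text key and value types and a placeholder path: copy each record into plain JVM objects before caching:
```
import org.apache.hadoop.io.{LongWritable, Text}

// a minimal sketch, assuming LongWritable keys, Text values and a placeholder path;
// mapping to plain JVM types copies the data out of the reused Writable instances
val rdd = sc.sequenceFile("hdfs:///path/to/seqfile", classOf[LongWritable], classOf[Text])
  .map { case (k, v) => (k.get, v.toString) }
  .cache()
```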
I've tried desperately to create an RDD from a matrix I have. Every combination
failed.
I have a sparse matrix returned from a call to
dv = DictVectorizer()
sv_tf = dv.fit_transform(tf)
which is supposed to be a matrix of document terms and their frequencies.
I need to convert this to an
I'd like to know how -- from within Java/Spark -- I can access the dependent
files which I deploy using the "--files" option on the command line.
I wish someone added this to the documentation.
From: jeff saremi
Sent: Thursday, January 19, 2017 9:56 AM
To: Sidney Feiner
Cc: user@spark.apache.org
Subject: Re: Spark-submit: where do --files go?
Thanks Sidney
From: Sidney Feiner
Sent: Thursday, January 19, 2017 9:52 AM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: Spark-submit: where do --files go?
Every executor creates a directory with your submitted files, and you can access
every file in it by name.
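A minimal sketch of reading such a file from job code; the file name is a placeholder for whatever was passed to --files:
```
import org.apache.spark.SparkFiles

// a minimal sketch, assuming the job was submitted with --files lookup.txt;
// SparkFiles.get resolves the local path of the copy in the container's work directory
val localPath = SparkFiles.get("lookup.txt")
val lines = scala.io.Source.fromFile(localPath).getLines().toList
```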
I have a function which does regex matching in Scala. When I test it in the
REPL I get the expected results.
When I use it as a UDF in Spark SQL I get completely incorrect results.
Function:
class UrlFilter (filters: Seq[String]) extends Serializable {
val regexFilters = filters.map(new Regex(_))
r
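For context, a self-contained sketch of how such a filter is typically wired up as a Spark SQL UDF; the class, regex, table and UDF names here are assumed, not the thread's actual code:
```
import scala.util.matching.Regex

// a minimal, self-contained sketch with assumed names (not the thread's actual code):
// a Serializable regex filter used as a Spark SQL UDF
class UrlFilterSketch(filters: Seq[String]) extends Serializable {
  private val regexFilters: Seq[Regex] = filters.map(new Regex(_))
  def matches(url: String): Boolean = regexFilters.exists(_.findFirstIn(url).isDefined)
}

val urlFilter = new UrlFilterSketch(Seq("""^https?://example\.com/.*"""))
spark.udf.register("matchesFilter", (url: String) => urlFilter.matches(url))   // spark is an assumed SparkSession
spark.sql("SELECT url FROM urls WHERE matchesFilter(url)")                      // urls is an assumed table
```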
Never mind!
My data had a space at the end which was not showing up in manual testing.
thanks
From: jeff saremi
Sent: Tuesday, June 20, 2017 2:48:06 PM
To: user@spark.apache.org
Subject: Bizarre diff in behavior between Scala REPL and Spark SQL UDF
I have
You can do a map() using a select and functions/UDFs. But how do you process a
partition using SQL?
… approach in SQL.
From: Ryan
Sent: Sunday, June 25, 2017 7:18:32 PM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: What is the equivalent of mapPartitions in SparkSQL?
Why would you like to do so? I think there's no need for us to explicitly ask
for a forEachPartition in spark sql because tu
… as such, forcing us to stay conservative and just make do without SQL. I'm sure
we're not alone here.
From: Aaron Perrin
Sent: Tuesday, June 27, 2017 4:50:25 PM
To: Ryan; jeff saremi
Cc: user@spark.apache.org
Subject: Re: What is the equivalent of mapPartitions in SparkSQL?
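A minimal sketch, not from the thread, of the closest typed-API equivalent, Dataset.mapPartitions:
```
import org.apache.spark.sql.SparkSession

// a minimal sketch, not from the thread: the typed Dataset API has mapPartitions,
// the closest equivalent to RDD.mapPartitions above the RDD level
val spark = SparkSession.builder().appName("mapPartitions-sketch").getOrCreate()
import spark.implicits._

val ds = spark.range(0, 1000).as[Long]
val perPartitionSums = ds.mapPartitions { rows =>
  // the whole partition arrives as one iterator, e.g. to share one connection per partition
  Iterator.single(rows.sum)
}
perPartitionSums.show()
```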
I tried this query in 1.6 and it failed:
SELECT * FROM Table1 EXCEPT ALL SELECT * FROM Table2
Exception in thread "main" java.lang.RuntimeException: [1.32] failure: ``(''
expected but `all' found
thanks
Jeff
EXCEPT is not the same as EXCEPT ALL.
Had they implemented EXCEPT ALL in SparkSQL, one could have easily obtained
EXCEPT by adding a distinct() to the results.
From: hareesh makam
Sent: Thursday, July 6, 2017 12:48:18 PM
To: jeff saremi
Cc: user@spark.apache.org
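A minimal sketch of what does work in 1.6 (sqlContext, table1DF and table2DF are assumed names):
```
// a minimal sketch with assumed names (sqlContext, table1DF, table2DF): Spark 1.6
// does support EXCEPT (set semantics, duplicates removed), just not EXCEPT ALL;
// later Spark releases added EXCEPT ALL
val diffSql = sqlContext.sql("SELECT * FROM Table1 EXCEPT SELECT * FROM Table2")
val diffApi = table1DF.except(table2DF)   // same distinct-style semantics via the DataFrame API
```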
On the Spark status UI you can click Stages in the menu and see active (and
completed) stages. For an active stage, you can see Succeeded/Total and a
count of failed tasks in parentheses.
I'm looking for a way to go straight to the failed tasks and list their errors.
Currently I must go into the details
Thank you. That helps
From: 周康
Sent: Monday, July 24, 2017 8:04:51 PM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: How to list only errors for a stage
Maybe you can click the header of the Status column in the Tasks section; then
failed tasks will appear first.
I have the simplest job, which I'm running against 100 TB of data. The job keeps
failing with ExecutorLostFailures on containers killed by Yarn for exceeding
memory limits.
I have varied executor-memory from 32 GB to 96 GB, and
spark.yarn.executor.memoryOverhead from 8192 to 36000, and similar c
From: yohann jardin
Sent: Thursday, July 27, 2017 11:15:39 PM
To: jeff saremi; user@spark.apache.org
Subject: Re: How to configure spark on Yarn cluster
Check the executor page of the Spark UI to see whether your storage level is the
limiting factor.
Also, instead of starting with 100 TB of data
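For reference, a minimal sketch of where the settings varied in this thread live as configuration keys; the values are placeholders, not tuning advice:
```
import org.apache.spark.{SparkConf, SparkContext}

// a minimal sketch with placeholder values (not tuning advice): the settings being
// varied in this thread, expressed as configuration keys
val conf = new SparkConf()
  .setAppName("yarn-memory-sketch")
  .set("spark.executor.memory", "32g")
  .set("spark.yarn.executor.memoryOverhead", "8192")   // in MB; the off-heap headroom Yarn enforces
val sc = new SparkContext(conf)
```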
We have a not-too-complex and not-too-large Spark job that keeps dying with
this error.
I have researched it and have not seen any convincing explanation of why it happens.
I am not using a shuffle service. Which server is the one that is refusing the
connection?
If I go to the server that is being reported
… the link you included. Thank you. Yes, this is the same problem; however, it
looks like no one has come up with a solution for it yet.
From: yohann jardin
Sent: Friday, July 28, 2017 10:47:40 AM
To: jeff saremi; user@spark.apache.org
Subject: Re: How to configure spark on Yarn cluster
spark.network.timeout=1000s ^
From: Juan Rodríguez Hortalá
Sent: Friday, July 28, 2017 4:20:40 PM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: Job keeps aborting because of
org.apache.spark.shuffle.FetchFailedException: Failed to connect to
server/ip:39232
Hi Je
Asking this on a tangent:
Is there any way for the shuffle data to be replicated to more than one server?
thanks
From: jeff saremi
Sent: Friday, July 28, 2017 4:38:08 PM
To: Juan Rodríguez Hortalá
Cc: user@spark.apache.org
Subject: Re: Job keeps aborting because of
org.apache.spark.shuffle.FetchFailedException: Failed to connect to
server/ip:39232
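For reference, a minimal sketch of the spark.network.timeout workaround quoted earlier in the thread; 1000s is the value from the thread, not a recommendation:
```
import org.apache.spark.{SparkConf, SparkContext}

// a minimal sketch of the timeout setting quoted in this thread; 1000s is the
// value from the thread, not a recommendation
val conf = new SparkConf()
  .setAppName("fetchfailed-sketch")
  .set("spark.network.timeout", "1000s")
val sc = new SparkContext(conf)
```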
Calling cache/persist fails all our jobs (I have posted 2 threads on this), and
we're giving up hope of finding a solution.
So I'd like to find a workaround for that:
If I save an RDD to HDFS and read it back, can I use it in more than one
operation?
Example (using cache):
// do a whole bunch
… the same effect as in my sample code, without the use of cache().
If I use myrdd.count(), would that be a good alternative?
thanks
From: lucas.g...@gmail.com
Sent: Tuesday, August 1, 2017 11:23:04 AM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: How can i remove the need for calling cache
Thanks Vadim. I'll try that
From: Vadim Semenov
Sent: Tuesday, August 1, 2017 12:05:17 PM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: How can i remove the need for calling cache
You can use `.checkpoint()`:
```
val sc: SparkContext
sc.setCheckpointDir("hdfs:///tmp/checkpointDirectory")
minimized even without an explicit cache call.
On Tue, Aug 1, 2017 at 11:05 AM, jeff saremi
mailto:jeffsar...@hotmail.com>> wrote:
Calling cache/persist fails all our jobs (i have posted 2 threads on this).
And we're giving up hope in finding a solution.
So I'd like to find a wor
hoping for
From: Vadim Semenov
Sent: Tuesday, August 1, 2017 12:05:17 PM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: How can i remove the need for calling cache
You can use `.checkpoint()`:
```
val sc: SparkContext
sc.setCheckpointDir("hdfs:///tmp/checkpointDirectory&quo
thanks Vadim. yes this is a good option for us. thanks
From: Vadim Semenov
Sent: Wednesday, August 2, 2017 6:24:40 PM
To: Suzen, Mehmet
Cc: jeff saremi; user@spark.apache.org
Subject: Re: How can i remove the need for calling cache
So if you just save an RDD to
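Pulling the suggestion together, a minimal sketch of the checkpoint()-based workaround; the paths and the example RDD are placeholders:
```
import org.apache.spark.{SparkConf, SparkContext}

// a minimal sketch (paths and the example RDD are placeholders) of the checkpoint()
// approach suggested in this thread
val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))
sc.setCheckpointDir("hdfs:///tmp/checkpointDirectory")

val rdd = sc.textFile("hdfs:///tmp/input").map(_.length)
rdd.checkpoint()               // mark the RDD to be materialized in the checkpoint dir
rdd.count()                    // the first action triggers writing the checkpoint
rdd.filter(_ > 10).count()     // later actions read the checkpointed data, not the lineage
```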
I'm using a statement like the following to load my dataframe from a text file.
Upon encountering the first error, the whole thing throws an exception and
processing stops.
I'd like to continue loading even if that results in zero rows in my dataframe.
How can I do that?
thanks
spark.read.
____
From: jeff saremi
Sent: Tuesday, September 12, 2017 2:32:03 PM
To: user@spark.apache.org
Subject: Continue reading dataframe from file despite errors
I'm using a statement like the following to load my dataframe from some text
file
Upon encountering the first error,
thanks Suresh. it worked nicely
From: Suresh Thalamati
Sent: Tuesday, September 12, 2017 2:59:29 PM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: Continue reading dataframe from file despite errors
Try the CSV option("mode", "dropmalformed"),
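A minimal sketch of that suggestion; the input path is a placeholder:
```
import org.apache.spark.sql.SparkSession

// a minimal sketch of the suggestion above; the path is a placeholder
val spark = SparkSession.builder().appName("dropmalformed-sketch").getOrCreate()
val df = spark.read
  .option("mode", "DROPMALFORMED")   // silently drop lines that fail to parse
  .csv("hdfs:///path/to/input")
```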
I have this line which works in the Spark interactive console but fails in
IntelliJ.
Using Spark 2.1.1 in both cases:
Exception in thread "main" java.lang.RuntimeException: Multiple sources found
for csv (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat,
com.databricks.spark.csv.
"com.databricks...
____
From: jeff saremi
Sent: Tuesday, September 12, 2017 3:38:00 PM
To: user@spark.apache.org
Subject: Multiple Sources found for csv
I have this line which works in the Spark interactive console but fails in
IntelliJ.
Using Spark 2.1.1 in both cases:
Exception in thread "main" java.lang.RuntimeException: Multiple sources found
for csv
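A minimal sketch of one common workaround (the path is a placeholder, and spark is an assumed SparkSession): refer to the built-in source by its fully qualified class so the short name "csv" is no longer ambiguous, or simply remove the com.databricks:spark-csv dependency from the IntelliJ project:
```
// a minimal sketch of one workaround (path is a placeholder): name the built-in source
// by its fully qualified class so the short name "csv" is no longer ambiguous when
// com.databricks:spark-csv is also on the classpath
val df = spark.read
  .format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
  .option("header", "true")
  .load("hdfs:///path/to/input.csv")
```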