On 28 Sep 2017, at 15:27, Daniel Siegmann <dsiegm...@securityscorecard.io> wrote:
Can you kindly explain how Spark uses parallelism for a bigger (say 1 GB) text
file? Does it use InputFormat to create multiple splits and create 1 partition
per split? Also, in the case of S3 or NFS, how does
> On 28 Sep 2017, at 14:45, ayan guha wrote:
>
> Hi
>
> Can you kindly explain how Spark uses parallelism for a bigger (say 1 GB) text
> file? Does it use InputFormat to create multiple splits and create 1
> partition per split?
Yes, input formats give you their splits; this is usually used to
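A minimal sketch of checking how a plain text file ends up partitioned, assuming a hypothetical 1 GB file on HDFS (the path and partition hint are made up):

import org.apache.spark.sql.SparkSession

object PartitionCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionCheck")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Each input split (typically one per HDFS block) becomes one partition.
    val rdd = sc.textFile("hdfs:///data/big-file.txt")
    println(s"Default partitions: ${rdd.getNumPartitions}")

    // A minimum number of partitions can be requested, in which case Spark
    // may split the input more finely than the block size.
    val rdd2 = sc.textFile("hdfs:///data/big-file.txt", minPartitions = 16)
    println(s"With minPartitions = 16: ${rdd2.getNumPartitions}")

    spark.stop()
  }
}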
Vadim's "scheduling within an application" approach turned out to be
excellent, at least on a single node with the CPU usage reaching about
90%. I directly implemented the code template that Vadim kindly
provided:
parallel_collection_paths.foreach(
path => {
val lines = spa
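The snippet above is cut off in the archive. As a rough, hypothetical sketch of the general pattern (not Vadim's actual code), driving one Spark job per path from a Scala parallel collection might look like this; the paths and the filter predicate are made up:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ParallelJobs").getOrCreate()

// Hypothetical list of input paths, turned into a parallel collection so that
// several jobs are submitted concurrently from the driver
// ("scheduling within an application").
val paths = Seq("s3a://bucket/part-0000.json.gz",
                "s3a://bucket/part-0001.json.gz").par

paths.foreach { path =>
  // Each iteration runs in its own driver thread and submits its own job.
  val lines = spark.sparkContext.textFile(path)
  val kept  = lines.filter(_.contains("\"some_field\""))  // assumed simple filter
  kept.saveAsTextFile(path + ".filtered")
}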
Hi Jeroen,
I do not believe that I completely agree with the idea that you will be
spending more time and memory that way.
But if that were the case, why are you not using DataFrames and a UDF?
Regards,
Gourav
On Sun, Oct 1, 2017 at 6:17 PM, Jeroen Miller
wrote:
> On Fri, Sep 29, 2017 at 1
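Regarding the DataFrame/UDF suggestion above, a minimal sketch of that approach; the input path, column name, and predicate are all assumptions:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder().appName("UdfFilter").getOrCreate()

// Hypothetical input and field names.
val df = spark.read.json("s3a://bucket/events/*.json.gz")

// A simple UDF applying the same kind of predicate as a raw text filter would.
val wanted = udf((v: String) => v != null && v.startsWith("interesting"))

df.filter(wanted(col("event_type")))
  .write
  .json("s3a://bucket/events-filtered/")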
Hammad,
The recommended way to implement this logic would be to:
1. Create a SparkSession.
2. Create a StreamingContext using the SparkContext embedded in the
   SparkSession.
3. Use the single SparkSession instance for the SQL operations within the
   foreachRDD.
It's important to note that spark operations c
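A minimal sketch of those three steps, reusing the app name and batch interval from the original post; the input source and the SQL inside foreachRDD are assumptions:

import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

val spark = SparkSession.builder()
  .appName("TransformerStreamPOC")
  .master("local[2]")
  .getOrCreate()

// Reuse the SparkContext already embedded in the SparkSession instead of
// allowing multiple contexts.
val ssc = new StreamingContext(spark.sparkContext, Seconds(60))

val lines = ssc.socketTextStream("localhost", 9999)  // assumed source
lines.foreachRDD { rdd =>
  // The same SparkSession instance is used for SQL work inside foreachRDD.
  import spark.implicits._
  val df = rdd.toDF("line")
  df.createOrReplaceTempView("lines")
  spark.sql("SELECT count(*) FROM lines").show()
}

ssc.start()
ssc.awaitTermination()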
Hi
The question is getting to the list.
I have no experience with HBase... though, having seen similar issues when
saving a DataFrame somewhere else, it might have to do with the properties you
need to set to let Spark know it is dealing with HBase. Don't you need to set
some properties on the Spark context
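For what it's worth, a hypothetical sketch of passing HBase connection properties through the Spark configuration; the hosts and whether your HBase connector picks these up from the Hadoop configuration are assumptions:

import org.apache.spark.sql.SparkSession

// HBase client settings can be forwarded via the spark.hadoop.* prefix so
// they end up in the Hadoop Configuration Spark exposes to connectors.
val spark = SparkSession.builder()
  .appName("HBaseWrite")
  .config("spark.hadoop.hbase.zookeeper.quorum", "zk1,zk2,zk3")          // assumed hosts
  .config("spark.hadoop.hbase.zookeeper.property.clientPort", "2181")
  .getOrCreate()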
Hello,
*Background:*
I have a Spark Streaming context:
SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("TransformerStreamPOC");
conf.set("spark.driver.allowMultipleContexts", "true"); *<== this*
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(60));
On Fri, Sep 29, 2017 at 12:20 AM, Gourav Sengupta
wrote:
> Why are you not using JSON reader of SPARK?
Since the filter I want to perform is so simple, I do not want to
spend time and memory to deserialise the JSON lines.
Jeroen
--
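A minimal sketch of that line-level filter, reading the files as plain text and keeping lines on a cheap substring match so that discarded records are never parsed; the paths and the predicate are made up:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("JsonLineFilter").getOrCreate()

// Treat each JSON line as an opaque string and filter on a substring,
// avoiding JSON deserialisation entirely.
val kept = spark.read.textFile("s3a://bucket/logs/*.json.gz")
  .filter(_.contains("\"status\":\"error\""))          // assumed predicate
kept.write.text("s3a://bucket/logs-errors/")

// For comparison, spark.read.json would parse every record up front:
// val df = spark.read.json("s3a://bucket/logs/*.json.gz")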
Hello,
Is there a way to find the DDL of a “temporary” view created in the current
session with Spark SQL?
For example:
create or replace temporary view tmp_v as
select c1 from table_x;
“Show create table” does not work for this case as it is not a table.
“Describe” could show the c
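A minimal sketch of inspecting the temporary view from the example above (the base table table_x is assumed to exist); DESCRIBE and the catalog API return the columns, though not the view's defining query:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("TempViewInfo").getOrCreate()

// Mirror of the example from the question.
spark.sql("CREATE OR REPLACE TEMPORARY VIEW tmp_v AS SELECT c1 FROM table_x")

// DESCRIBE works on a temporary view and lists its columns and types.
spark.sql("DESCRIBE tmp_v").show()

// The catalog API exposes the same column-level metadata.
spark.catalog.listColumns("tmp_v").show()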
Hi,
Set the inferSchema option to true in spark-csv. You may also want to set
the mode option. See the README below:
https://github.com/databricks/spark-csv/blob/master/README.md
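For example, a minimal sketch with Spark 2.x's built-in CSV reader (the spark-csv package is the 1.x equivalent); the file path, header setting, and mode value are assumptions:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("CsvRead").getOrCreate()

val df = spark.read
  .option("header", "true")        // assumed: first line holds column names
  .option("inferSchema", "true")   // sample the data to guess column types
  .option("mode", "DROPMALFORMED") // or PERMISSIVE / FAILFAST
  .csv("hdfs:///data/input.csv")   // hypothetical path

df.printSchema()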
Best,
Anastasios
On 01.10.2017 at 07:58, "Kanagha Kumar" wrote:
Hi,
I'm trying to read data from HDFS in spark as datafr