Re: [Spark 2.x Core] Job writing out an extra empty part-0000* file

2018-04-19 Thread klrmowse
Well... it turns out that the extra part-* file goes away when I limit --num-executors to 1 or 2 (leaving it at the default maxes it out, which in turn produces an extra empty part-file). I guess the test data I'm using only requires that many executors.
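If the empty part files themselves are the problem, a common workaround (a sketch with assumed paths, not something from this thread) is to coalesce to a small partition count before writing:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("no-empty-parts").getOrCreate()
    val df = spark.read.json("hdfs:///data/input.json")  // hypothetical input path

    // Collapse to a small number of partitions before writing so that idle
    // executors don't contribute empty part-* files to the output directory.
    df.coalesce(2).write.parquet("hdfs:///tmp/out")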

Re: How to bulk insert using spark streaming job

2018-04-19 Thread scorpio
You need to insert per partition per batch. Normally, database drivers meant for Spark have a bulk-update feature built in: they take an RDD and do a bulk insert per partition. In case the DB driver you are using doesn't provide this feature, you can aggregate records per partition and then send them out to the DB.
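A sketch of the per-partition pattern described above, using plain JDBC batching inside foreachPartition (the connection URL, table, and columns are hypothetical):

    import java.sql.DriverManager
    import org.apache.spark.streaming.dstream.DStream

    // Insert each partition of each micro-batch as a single JDBC batch.
    def bulkInsert(stream: DStream[(String, String)]): Unit =
      stream.foreachRDD { rdd =>
        rdd.foreachPartition { records =>
          // One connection per partition, opened on the executor
          val conn = DriverManager.getConnection("jdbc:postgresql://db-host/app", "user", "pass")
          val stmt = conn.prepareStatement("INSERT INTO events(id, payload) VALUES (?, ?)")
          try {
            records.foreach { case (id, payload) =>
              stmt.setString(1, id)
              stmt.setString(2, payload)
              stmt.addBatch()
            }
            stmt.executeBatch()  // one round-trip per partition, not per record
          } finally {
            stmt.close()
            conn.close()
          }
        }
      }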

Re: How to bulk insert using spark streaming job

2018-04-19 Thread ayan guha
by writing code, I suppose :) Jokes apart, I think you need to articulate the problem in more detail for others to help. Do you mean you want to batch up data in memory and then write it as a chunk? Where do you want to insert? Etc etc... On Fri, Apr 20, 2018 at 1:08 PM, amit kumar singh wrote: > H

How to bulk insert using spark streaming job

2018-04-19 Thread amit kumar singh
How to bulk insert using a Spark Streaming job? Sent from my iPhone

Re: Stream writing parquet files

2018-04-19 Thread Christopher Piggott
As a follow-up question, what happened to org.apache.spark.sql.parquet.RowWriteSupport? It seems like it would help me. On Thu, Apr 19, 2018 at 9:23 PM, Christopher Piggott wrote: > I am trying to write some parquet files and running out of memory. I'm > giving my workers each 16GB and the da

Stream writing parquet files

2018-04-19 Thread Christopher Piggott
I am trying to write some Parquet files and am running out of memory. I'm giving my workers 16GB each, and the data is 102 columns * 65536 rows - not really all that much. The content of each row is a short string. I am trying to create the file by dynamically allocating a StructType of StructField
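A minimal sketch of the dynamic schema construction being described (the column names and write path are hypothetical):

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val spark = SparkSession.builder.appName("parquet-write").getOrCreate()

    // Build the schema dynamically: 102 string columns named c0..c101
    val schema = StructType((0 until 102).map(i => StructField(s"c$i", StringType, nullable = true)))

    // Stand-in data: 65536 rows of 102 short strings each
    val rows = spark.sparkContext.parallelize(Seq.fill(65536)(Row.fromSeq(Seq.fill(102)("x"))))

    spark.createDataFrame(rows, schema).write.parquet("hdfs:///tmp/out.parquet")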

[Mesos] Are InverseOffers ignored?

2018-04-19 Thread Prateek Sharma
Hello everyone, I wanted to know how Spark currently handles Mesos InverseOffers. My requirement is to be able to dynamically shrink the number of executors/tasks. As far as I could tell by looking at the code, there is no explicit handler for the acceptInverseOffer event from Mesos, so I am guessing
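For the shrinking part of the question, Spark's own mechanism (separate from Mesos inverse offers) is dynamic allocation, which on Mesos also requires the external shuffle service on the agents. A sketch of the relevant settings, with illustrative values:

    import org.apache.spark.sql.SparkSession

    // Dynamic allocation lets Spark release idle executors at runtime.
    val spark = SparkSession.builder
      .appName("dyn-alloc-sketch")
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.shuffle.service.enabled", "true")
      .config("spark.dynamicAllocation.minExecutors", "1")
      .config("spark.dynamicAllocation.maxExecutors", "10")
      .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
      .getOrCreate()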

Re: Spark parse fixed length file [Java]

2018-04-19 Thread lsn24
Thanks for the response, JayeshLalwani. Clearly, in my case the issue was with my approach, not with the memory. The job was taking a much longer time even for a smaller dataset. Thanks again!

Re: Spark parse fixed length file [Java]

2018-04-19 Thread lsn24
I was able to solve it by writing a Java method (to slice and dice the data) and invoking the method/function from spark.map. This transformed the data way faster than my previous approach. Thanks, geoHeil, for the pointer.
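The poster used a Java method; a Scala sketch of the same slice-and-dice idea (the field offsets and input path are hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("fixed-length-parse").getOrCreate()

    // Hypothetical fixed-length layout: id (0-9), name (10-29), amount (30-39)
    def parseFixedLength(line: String): (String, String, String) =
      (line.substring(0, 10).trim, line.substring(10, 30).trim, line.substring(30, 40).trim)

    // Slice each record inside map, avoiding per-field regex or split overhead
    val parsed = spark.sparkContext
      .textFile("hdfs:///data/fixed_length.txt")
      .map(parseFixedLength)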

INSERT INTO TABLE_PARAMS fails during ANALYZE TABLE

2018-04-19 Thread Michael Shtelma
Hi everybody, I wanted to test CBO with histograms enabled. In order to do this, I have enabled the property spark.sql.statistics.histogram.enabled. In this test, Derby was used as the database for the Hive metastore. The problem is that in some cases the values that are inserted into table TABLE_PARAMS e
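For reference, a sketch of the setup being described: enabling histogram statistics and then computing column statistics (the table and column names are hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("cbo-histograms").enableHiveSupport().getOrCreate()

    // Enable CBO and histogram collection, then compute per-column statistics,
    // which is what writes the serialized stats into the metastore's TABLE_PARAMS
    spark.sql("SET spark.sql.cbo.enabled=true")
    spark.sql("SET spark.sql.statistics.histogram.enabled=true")
    spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS price, quantity")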

hdfs file partition

2018-04-19 Thread 崔苗
Hi, when I create a dataset by reading a json file from hdfs ,I found the partition number of the dataset not equals to the file blocks, so what define the partition number of the dataset when I read file from hdfs ?