Well... it turns out that the extra part-* file goes away when I limit
--num-executors to 1 or 2 (leaving it at the default maxes it out, which in
turn produces an extra empty part-file).
I guess the test data I'm using only needs that many executors.
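For reference, the number of part-* files tracks the number of partitions at
write time rather than --num-executors, so coalescing before the write is
another way to avoid the empty file. A minimal sketch, with the input/output
paths purely illustrative:

import org.apache.spark.sql.SparkSession

object PartFileCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("part-file-count").getOrCreate()

    // Hypothetical input; any DataFrame source behaves the same way.
    val df = spark.read.parquet("hdfs:///data/input")

    // Each partition becomes one part-* file, so empty partitions can leave
    // empty files behind. Coalescing to a fixed count sidesteps that,
    // independent of how many executors the job happens to get.
    df.coalesce(2)
      .write
      .mode("overwrite")
      .parquet("hdfs:///data/output")

    spark.stop()
  }
}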
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
You need to insert per partition, per batch. Database drivers meant for Spark
normally have a bulk-update feature built in: they take an RDD and do a bulk
insert per partition.
If the database driver you are using doesn't provide this feature, you can
aggregate the records per partition yourself and then send them to the
database.
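A minimal sketch of that per-partition batching pattern over plain JDBC; the
connection URL, credentials, and the events table are all hypothetical, and a
PostgreSQL driver is assumed to be on the classpath:

import java.sql.DriverManager
import org.apache.spark.sql.SparkSession

object BulkInsertPerPartition {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("bulk-insert").getOrCreate()
    import spark.implicits._

    // Hypothetical batch of records; in a streaming job this would be one micro-batch.
    val records = Seq((1L, "a"), (2L, "b"), (3L, "c")).toDF("id", "payload")

    records.rdd.foreachPartition { rows =>
      // One connection and one batched statement per partition.
      val conn = DriverManager.getConnection(
        "jdbc:postgresql://db-host:5432/appdb", "app_user", "secret")
      conn.setAutoCommit(false)
      val stmt = conn.prepareStatement("INSERT INTO events (id, payload) VALUES (?, ?)")
      try {
        rows.foreach { row =>
          stmt.setLong(1, row.getLong(0))
          stmt.setString(2, row.getString(1))
          stmt.addBatch()
        }
        stmt.executeBatch() // one bulk round trip per partition instead of per row
        conn.commit()
      } finally {
        stmt.close()
        conn.close()
      }
    }

    spark.stop()
  }
}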
By writing code, I suppose :) Jokes apart, I think you need to describe the
problem in more detail for others to help.
Do you mean you want to batch up data in memory and then write it as a chunk?
Where do you want to insert it? Etc., etc.
On Fri, Apr 20, 2018 at 1:08 PM, amit kumar singh wrote:
> H
How to bulk insert using a Spark Streaming job?
Sent from my iPhone
As a follow-up question, what happened
to org.apache.spark.sql.parquet.RowWriteSupport? It seems like it would
help me.
On Thu, Apr 19, 2018 at 9:23 PM, Christopher Piggott wrote:
> I am trying to write some parquet files and running out of memory. I'm
> giving my workers each 16GB and the da
I am trying to write some Parquet files and running out of memory. I'm giving
my workers 16GB each, and the data is 102 columns * 65536 rows - not really
all that much. The content of each row is a short string.
I am trying to create the file by dynamically allocating a StructType of
StructFields.
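A sketch of building such a schema dynamically and writing Parquet. The
column names, row contents, and output path are generated/illustrative, and
parquet.block.size is just one knob that bounds the writer's in-memory
buffering, not necessarily the cause of the OOM here:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object DynamicSchemaParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dynamic-schema").getOrCreate()

    // Smaller Parquet row groups mean smaller in-memory write buffers per task
    // (the writer buffers one row group across all columns before flushing).
    spark.sparkContext.hadoopConfiguration.setInt("parquet.block.size", 32 * 1024 * 1024)

    // Build the 102-column schema dynamically from generated field names.
    val columnNames = (1 to 102).map(i => s"col_$i")
    val schema = StructType(columnNames.map(name => StructField(name, StringType, nullable = true)))

    // 65536 rows of short strings; in a real job these come from the actual source.
    val rows = spark.sparkContext
      .parallelize(1 to 65536, numSlices = 8)
      .map(i => Row.fromSeq(columnNames.map(c => s"$c-$i")))

    spark.createDataFrame(rows, schema)
      .write
      .mode("overwrite")
      .parquet("hdfs:///data/wide-table")

    spark.stop()
  }
}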
Hello everyone,
I wanted to know how Spark currently handles Mesos InverseOffers.
My requirement is to be able to dynamically shrink the number of
executors/tasks.
As far as I could tell by looking at the code, there is no explicit handler
for the acceptInverseOffer event from Mesos, so I am guessing
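Not an answer on inverse offers specifically, but for shrinking executors at
runtime the usual mechanism is dynamic allocation. A configuration sketch
with illustrative values (whether this covers the Mesos inverse-offer path is
a separate question):

import org.apache.spark.sql.SparkSession

object DynamicAllocationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-allocation")
      // Let Spark release idle executors instead of holding them for the job's lifetime.
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.minExecutors", "1")
      .config("spark.dynamicAllocation.maxExecutors", "20")
      // Executors idle longer than this are released.
      .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
      // Dynamic allocation needs an external shuffle service (or, in newer
      // Spark versions, shuffle tracking) so shuffle data survives executor loss.
      .config("spark.shuffle.service.enabled", "true")
      .getOrCreate()

    // ... job logic ...

    spark.stop()
  }
}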
Thanks for the response, JayeshLalwani. Clearly in my case the issue was with
my approach, not with the memory.
The job was taking much longer even for a smaller dataset.
Thanks again!
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---
I was able to solve it by writing a Java method (to slice and dice the data)
and invoking that method from spark.map. This transformed the data much
faster than my previous approach.
Thanks, geoHeil, for the pointer.
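A Scala sketch of the same pattern (the original poster used a Java method);
the helper name, transformation logic, and paths below are made up:

import org.apache.spark.sql.SparkSession

object SliceAndDiceExample {

  // Plain function holding the row-level transformation logic; keeping it
  // outside the Spark-specific code makes it easy to unit test.
  def sliceAndDice(line: String): String =
    line.split(",").map(_.trim.toUpperCase).mkString("|")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("slice-and-dice").getOrCreate()
    import spark.implicits._

    // Hypothetical input; any Dataset[String] source works the same way.
    val raw = spark.read.textFile("hdfs:///data/raw.csv")

    // Invoke the helper from map, so Spark applies it once per record.
    val transformed = raw.map(sliceAndDice)

    transformed.write.text("hdfs:///data/transformed")
    spark.stop()
  }
}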
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---
Hi everybody,
I wanted to test the CBO with histograms enabled.
To do this, I enabled the property spark.sql.statistics.histogram.enabled.
In this test, Derby was used as the database for the Hive metastore.
The problem is that in some cases the values that are inserted into the table
TABLE_PARAMS e
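For context, a sketch of how those histogram statistics get computed and
inspected; the sales table and its column names are hypothetical:

import org.apache.spark.sql.SparkSession

object HistogramStatsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cbo-histograms")
      .enableHiveSupport() // statistics are persisted in the Hive metastore
      .config("spark.sql.cbo.enabled", "true")
      .config("spark.sql.statistics.histogram.enabled", "true")
      .getOrCreate()

    // Equi-height histograms are computed by ANALYZE ... FOR COLUMNS and stored
    // as table parameters in the metastore (the TABLE_PARAMS table when Derby
    // backs it).
    spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS price, quantity")

    // Inspect the collected statistics, including the histogram, for one column.
    spark.sql("DESCRIBE EXTENDED sales price").show(truncate = false)

    spark.stop()
  }
}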
Hi,
when I create a Dataset by reading a JSON file from HDFS, I found that the
number of partitions of the Dataset does not equal the number of file blocks.
So what determines the partition number of the Dataset when I read a file
from HDFS?
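For the DataFrame/Dataset reader, the partition count comes from file-split
planning (spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes,
together with the default parallelism), not directly from the HDFS block
count. A small sketch for checking it, with the path and values illustrative:

import org.apache.spark.sql.SparkSession

object JsonPartitionCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-partitions")
      // Upper bound on bytes packed into one input partition (134217728 = 128 MB, the default).
      .config("spark.sql.files.maxPartitionBytes", "134217728")
      // Estimated cost of opening a file; small files get packed together.
      .config("spark.sql.files.openCostInBytes", "4194304")
      .getOrCreate()

    // Hypothetical HDFS path.
    val ds = spark.read.json("hdfs:///data/events.json")

    // The split planning above decides this number, so it usually differs
    // from the raw HDFS block count.
    println(s"partitions = ${ds.rdd.getNumPartitions}")

    spark.stop()
  }
}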