When I restart my streaming program, this bug appears and it kills my program.
I am using Spark 1.4.1.
15/11/22 03:20:00 WARN CheckpointWriter: Error in attempt 1 of writing
checkpoint to hdfs://streaming/user/dm/order_predict/streaming_
v2/10/checkpoint/checkpoint-144813360
org.apa
Those are empty partitions. I don't see the number of partitions specified in
the code, which implies the default parallelism config is being used and is
set to a very high number, equal to the sum of empty and non-empty files.
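For what it's worth, a rough sketch of pinning the partition count yourself
instead of relying on spark.default.parallelism (the paths and the number 32
below are just placeholders):

// Placeholder paths; the point is only that the partition count is explicit.
val raw = sc.textFile("hdfs:///input/path")
val keyed = raw.map(line => (line.take(8), line))

// Either ask for a fixed partition count in the shuffle...
val reduced = keyed.reduceByKey(_ + _, 32)

// ...or coalesce just before writing so empty partitions are merged away.
reduced.coalesce(32).saveAsTextFile("hdfs:///output/path")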
Regards
Sab
On 21-Nov-2015 11:59 pm, "Andy Davidson"
wrote:
> I start working o
I want to read file data and check whether each file line is already present in
Cassandra; if it is present it needs to be merged, otherwise it is a fresh
insert into C*. The file data just contains name and address in JSON format.
In Cassandra, the student table has a UUID as its primary key and there is a
secondary index on name.
Once data is me
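A rough sketch of one way to do the merge-or-insert with the
spark-cassandra-connector; the keyspace, table, and helper names below are made
up, and a real job would use a proper JSON parser:

// Sketch only: "school"/"student", the columns, and the merge rule are placeholders.
import java.util.UUID
import com.datastax.spark.connector._

case class Student(id: UUID, name: String, address: String)

// Placeholder parser; real code would use a JSON library such as json4s.
def parseLine(line: String): Student = {
  val fields = line.split(",").map(_.trim)
  Student(UUID.randomUUID(), fields(0), fields(1))
}

val fromFile = sc.textFile("hdfs:///input/students.json").map(parseLine)

// Read the existing table once and join on name (the secondary-indexed column).
val existing = sc.cassandraTable[Student]("school", "student").keyBy(_.name)

val upserts = fromFile.keyBy(_.name).leftOuterJoin(existing).map {
  case (_, (fresh, Some(old))) => old.copy(address = fresh.address) // merge: keep the existing UUID
  case (_, (fresh, None))      => fresh                             // fresh insert
}

upserts.saveToCassandra("school", "student")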
Hi
I am trying to figure out which datastore I can use for storing data to be
used with GraphX. Is there a good graph database out there which I can use
for efficient graph data storage/retrieval?
thanks,
ravi
Hi,
I would like to know how/where the serialized closures are shipped: are they
sent once per executor or copied to each task? From my understanding they are
copied with each task, but the online documentation contains misleading
information.
For example, on the
http://spark.apache.org/docs
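My understanding is that the closure travels inside every serialized task, and
that the usual way to ship large read-only data only once per executor is a
broadcast variable. A minimal sketch (the lookup map is a placeholder):

// Placeholder data standing in for something large and read-only.
val lookup: Map[String, Int] = Map("a" -> 1, "b" -> 2)

// Captured directly, `lookup` travels inside the serialized closure of every task.
val viaClosure = sc.parallelize(Seq("a", "b", "c")).map(k => lookup.getOrElse(k, 0))

// Broadcast once; each task only carries a small handle and the value is
// fetched to each executor.
val bcLookup = sc.broadcast(lookup)
val viaBroadcast = sc.parallelize(Seq("a", "b", "c")).map(k => bcLookup.value.getOrElse(k, 0))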
I started working on a very simple ETL pipeline for a POC. It reads in a data
set of tweets stored as JSON strings in HDFS, randomly selects 1% of the
observations, and writes them to HDFS. It seems to run very slowly.
E.g., writing 4720 observations takes 1:06:46.577795. I
Also noticed that R
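A minimal sketch of the job as described (paths and the seed are placeholders),
reading the JSON lines, sampling roughly 1%, and writing the sample back:

// Placeholder input/output paths.
val tweets = sc.textFile("hdfs:///data/tweets/*.json")

// sample(withReplacement = false, fraction = 0.01) keeps ~1% without collecting
// anything to the driver.
val onePercent = tweets.sample(withReplacement = false, fraction = 0.01, seed = 42L)

// coalesce keeps the number of small output files down for a small sample.
onePercent.coalesce(1).saveAsTextFile("hdfs:///data/tweets_sample")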
Hi
I have a few doubts:
1. Does rdd.saveAsNewAPIHadoopFile(outputDir, keyClass, valueClass,
outputFormatClass) shuffle data, or will it always create the same number of
files in the output dir as there are partitions in the RDD? (See the sketch
after this list.)
2. How to use multiple outputs in saveAsNewAPIHadoopFile to have file names
generated fro
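Regarding question 1, a sketch of the usual pattern (class and path names are
placeholders): the save call itself does not shuffle, it writes one part-file
per RDD partition, so the file count is controlled by repartitioning or
coalescing beforehand:

import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// Placeholder paths; NullWritable keys make TextOutputFormat write only the values.
val pairs = sc.textFile("hdfs:///input").map(line => (NullWritable.get(), new Text(line)))

pairs
  .coalesce(10)                 // 10 partitions => 10 part-files in the output dir
  .saveAsNewAPIHadoopFile(
    "hdfs:///output",
    classOf[NullWritable],
    classOf[Text],
    classOf[TextOutputFormat[NullWritable, Text]])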
Hi all,
I am having trouble understanding how an RDD will be partitioned after
calling the mapToPair function.
Could anyone give me more information about partitioning in this function?
I have a simple application doing the following job:
JavaPairInputDStream messages =
KafkaUtils.createDirectStream(...
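As a sketch of how to check this (topic and broker names are placeholders):
map/mapToPair is a narrow transformation, so each batch RDD keeps one partition
per Kafka topic partition, before and after the mapping step:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("partition-check")
val ssc = new StreamingContext(conf, Seconds(10))

// Placeholder broker and topic.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("my_topic"))

// Mapping to pairs does not repartition; the partition count stays the same.
val pairs = stream.map { case (key, value) => (value.length, value) }

pairs.foreachRDD { rdd =>
  println(s"partitions in this batch: ${rdd.partitions.length}")
}

ssc.start()
ssc.awaitTermination()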
Instead of sending the results of the one spark app directly to the other
one, you could write the results to a Kafka topic which is consumed by your
other spark application.
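A rough sketch of that hand-off (topic name, broker address, and the `results`
stream name are placeholders for whatever the first application computes):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.dstream.DStream

// `results` stands in for the DStream produced by the first application.
def publish(results: DStream[String]): Unit = {
  results.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // One producer per partition, created on the executor, never serialized.
      val props = new Properties()
      props.put("bootstrap.servers", "broker1:9092")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      records.foreach(r => producer.send(new ProducerRecord[String, String]("results_topic", r)))
      producer.close()
    }
  }
}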
On Fri, Nov 20, 2015 at 12:07 PM Saiph Kappa wrote:
> I think my problem persists whether I use Kafka or sockets. Or am I
Hi,
I found that if a column value is too long, the spark shell only shows a partial result.
such as:
sqlContext.sql("select url from tableA").show(10)
It cannot show the whole URL here, so how do I adjust it? Thanks
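If your Spark version's show() accepts a second truncate argument you can pass
false; otherwise collecting the rows and printing them yourself always works.
For example:

// Newer releases: disable truncation of long column values.
sqlContext.sql("select url from tableA").show(10, false)

// Fallback that works on any version: pull the rows back and print them in full.
sqlContext.sql("select url from tableA").take(10).foreach(println)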
Hello,
I have a set of X data (around 30M entries); I have to run a batch job to merge
entries which are similar, and at the end I will have around X/2 entries.
At the moment I've done the basics: open the files, map to a usable object,
but I'm stuck at the merge part...
The merge condition is composed from vario
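A sketch of the usual shape for this kind of merge (the entry type, similarity
key, and merge function below are placeholders for your real condition): derive
a key that is identical for similar records and fold each group down with
reduceByKey:

// Placeholder record type and parsing.
case class Entry(key: String, payload: String)

def similarityKey(e: Entry): String = e.key.toLowerCase               // placeholder condition
def merge(a: Entry, b: Entry): Entry = a.copy(payload = a.payload + "|" + b.payload)

val entries = sc.textFile("hdfs:///input/entries")
  .map(line => Entry(line.takeWhile(_ != ','), line))

val merged = entries
  .keyBy(similarityKey)
  .reduceByKey(merge)          // one distributed pass; nothing is collected to the driver
  .values

merged.saveAsObjectFile("hdfs:///output/merged")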
Hi guys
Is it possible to add a new partition to a persistent table using Spark SQL?
The following call works and data gets written to the correct directories,
but no partition metadata is added to the Hive metastore.
In addition, I see nothing preventing an arbitrary schema being appended to
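One workaround sketch, assuming a HiveContext is available (table name,
partition column, and path are placeholders): register the partition explicitly
with plain HiveQL after the files are written:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Tell the metastore about the directory that was just written.
hiveContext.sql(
  """ALTER TABLE events ADD IF NOT EXISTS PARTITION (dt='2015-11-22')
    |LOCATION 'hdfs:///warehouse/events/dt=2015-11-22'""".stripMargin)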