[...] of files generated.
>
> On 28 Nov 2016 8:29 p.m., "Kevin Tran" wrote:
>
>> Hi Denny,
>> Thank you for your input. I also use a 128 MB block size, but my Spark
>> app still generates too many files, each only ~14 KB! That's why I'm
>> asking if [...]
>
> [...] presentation Data
> Storage Tips for Optimal Spark Performance
> <https://spark-summit.org/2015/events/data-storage-tips-for-optimal-spark-performance/>.
>
>
> On Sun, Nov 27, 2016 at 9:44 PM Kevin Tran wrote:
Hi Everyone,
Does anyone know the best practice for writing Parquet files from Spark?
When my Spark app writes data to Parquet, the output directory ends up
containing heaps of very small Parquet files (such as
e73f47ef-4421-4bcc-a4db-a56b110c3089.parquet). Each Parquet file is only
~15 KB.
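Many tiny files usually means the job writes one file per partition and the data is spread over many near-empty partitions. A common approach is to coalesce to a partition count derived from the output volume and the HDFS block size before writing. A minimal sketch of that sizing arithmetic, with made-up example sizes (the `df.coalesce(...)` call is shown only as a comment):

```java
public class PartitionSizing {
    // Target number of output files: total bytes over desired file size,
    // rounded up, never less than one.
    static int targetPartitions(long totalBytes, long targetFileBytes) {
        return (int) Math.max(1, (totalBytes + targetFileBytes - 1) / targetFileBytes);
    }

    public static void main(String[] args) {
        long totalBytes = 900L * 1024 * 1024;  // example: ~900 MB of output
        long blockSize  = 128L * 1024 * 1024;  // HDFS block size
        int n = targetPartitions(totalBytes, blockSize);
        System.out.println(n);                 // 8 files of ~112 MB each
        // In a Spark job this count would drive something like:
        //   df.coalesce(n).write().parquet(outputPath);
    }
}
```

The count here is computed from a known output size; in a real job you would estimate it from the input size or a sampled row width.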
Hi Everyone,
Does anyone know how we can extract the timestamp from a Kafka message in
Spark Streaming?
JavaPairInputDStream<String, String> messagesDStream =
    KafkaUtils.createDirectStream(
        ssc,
        String.class,
        String.class,
        StringDecoder.class,
        StringDecoder.class,
        kafkaParam[...]
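With the Kafka 0.8 direct stream shown above (the `StringDecoder` API), messages carry no broker timestamp; per-record timestamps only arrived with Kafka 0.10's `ConsumerRecord.timestamp()`. A common workaround on 0.8 is to have the producer embed the event time in the payload and parse it out in the stream. A minimal sketch, assuming a hypothetical `<epochMillis>|<body>` payload convention:

```java
public class TimestampedPayload {
    // Hypothetical payload convention: "<epochMillis>|<body>".
    // Split on the FIRST '|' so the body itself may contain pipes.
    static long eventTime(String payload) {
        return Long.parseLong(payload.substring(0, payload.indexOf('|')));
    }

    static String body(String payload) {
        return payload.substring(payload.indexOf('|') + 1);
    }
}
```

Inside the stream this would run in the `call(...)` of a `map`, e.g. `eventTime(tuple2._2())`.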
Hi Everyone,
I tried, in cluster mode on YARN:
* spark-submit --jars /path/sqldriver.jar
* --driver-class-path
* spark-env.sh:
  SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/path/*"
* spark-defaults.conf:
  spark.driver.extraClassPath
  spark.executor.extraClassPath
None of them works for me!
Does anyone [...]
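One setup frequently suggested for this situation (a sketch, not a guaranteed fix; the path is a placeholder): in YARN cluster mode the driver runs on a cluster node, so the classpath must be set before launch via configuration, and the jar must exist at the same local path on every node that can host the driver or an executor.

```
# spark-defaults.conf -- /path/sqldriver.jar is a placeholder; the jar must
# be present at this path on every node for cluster mode on YARN.
spark.driver.extraClassPath    /path/sqldriver.jar
spark.executor.extraClassPath  /path/sqldriver.jar
```

Combining this with `--jars /path/sqldriver.jar` on spark-submit ships the jar to the executors as well.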
[Executor task launch worker-0] INFO
org.apache.spark.executor.Executor - Finished task 0.0 in stage 12.0 (TID
12). 2518 bytes result sent to driver
Does anyone have any ideas?
On Wed, Sep 7, 2016 at 7:30 PM, Kevin Tran wrote:
Hi Everyone,
Does anyone know why the call() function is called *3 times* for each
message that arrives?
JavaDStream<String> message = messagesDStream.map(
    new Function<Tuple2<String, String>, String>() {
      @Override
      public String call(Tuple2<String, String> tuple2) {
        return tuple2._2();
      }
    });

message.foreachRDD(rdd -> { [...]
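A likely cause (offered as a hypothesis, not an answer from this thread): Spark transformations are lazy, so each action on `message` re-runs the whole lineage and invokes `call()` once per element per action, unless the RDD/DStream is cached. Three actions would mean three invocations per message. The same effect can be sketched in plain Java by re-running a lazy pipeline three times:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyRecompute {
    static final AtomicInteger calls = new AtomicInteger();

    // Analogue of map(): a lazy pipeline; nothing runs until a terminal op.
    static Stream<String> mapped(List<String> source) {
        return source.stream().map(s -> {
            calls.incrementAndGet();
            return s.toUpperCase();
        });
    }

    static int run() {
        calls.set(0);
        List<String> source = List.of("a", "b");
        // Three terminal operations = three traversals, like three Spark
        // actions on an uncached RDD each re-invoking call().
        mapped(source).collect(Collectors.toList());
        mapped(source).collect(Collectors.toList());
        mapped(source).collect(Collectors.toList());
        return calls.get();  // 2 elements x 3 traversals = 6
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In Spark the fix for that pattern is `message.cache()` (or persisting the mapped RDD) before running multiple actions.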
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
Hi everyone,
Please give me your opinions on the best ID generator for an ID field in
Parquet. The candidates I'm considering:

UUID.randomUUID();

AtomicReference<Long> currentTime =
    new AtomicReference<>(System.currentTimeMillis());
AtomicLong counter = new AtomicLong(0);
Thanks,
Kevin.
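For comparison, a sketch of both candidates. The time-plus-counter layout below is a made-up, simplified Snowflake-style scheme, not anything from Spark: random UUIDs are collision-safe but unordered and compress poorly in Parquet, while a timestamp-plus-counter id is compact, roughly time-ordered, and encodes well.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

public class IdGen {
    private static final AtomicLong counter = new AtomicLong();

    // Hypothetical simplified Snowflake-style layout: high bits hold the
    // millisecond timestamp, low 22 bits a wrapping per-process counter.
    // (A real scheme would also reserve bits for a worker/machine id.)
    static long nextTimeId(long epochMillis) {
        return (epochMillis << 22) | (counter.getAndIncrement() & 0x3FFFFFL);
    }

    static String nextUuid() {
        return UUID.randomUUID().toString();
    }
}
```

Note the counter-based ids are only unique within one process; across executors the UUID (or a scheme with a worker-id field) is the safer choice.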
https://issues.apache.org/jir
Hi,
Does anyone know the best practices for storing data in Parquet files?
Does a Parquet file have a size limit (1 TB)?
Should we use SaveMode.Append for a long-running streaming app?
How should we store the data in HDFS (directory structure, ...)?
Thanks,
Kevin.
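On the directory-structure question, a common convention is Hive-style date partitioning, which `write().partitionBy("year", "month", "day")` produces automatically in Spark. A sketch of the resulting path layout (the root path is a made-up example):

```java
import java.time.LocalDate;

public class PartitionPath {
    // Hive-style date partitioning, e.g.
    // /data/events/year=2016/month=11/day=28
    static String partitionDir(String root, LocalDate d) {
        return String.format("%s/year=%d/month=%02d/day=%02d",
                root, d.getYear(), d.getMonthValue(), d.getDayOfMonth());
    }

    public static void main(String[] args) {
        System.out.println(partitionDir("/data/events", LocalDate.of(2016, 11, 28)));
    }
}
```

Partitioning this way lets queries with date predicates skip whole directories, and keeps each append (one partition per batch interval or per day) isolated.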
Hi,
I wrote to a Parquet file as follows:
+--------------------------+
|                      word|
+--------------------------+
| THIS IS MY CHARACTERS ...|
|// ANOTHER LINE OF CHAC...|
+--------------------------+
These lines are not the full text; the values appear trimmed. Does anyone
know how many characters a StringType column can hold?
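Almost certainly the stored data is intact: `StringType` has no fixed length limit, and the trimming above is display-only. `DataFrame.show()` truncates each cell to 20 characters by default (pass `show(false)` to see full values). The display rule can be sketched as:

```java
public class ShowTruncate {
    // DataFrame.show() truncates cells to 20 chars by default: strings
    // longer than 20 keep their first 17 chars plus "...". The underlying
    // data is never modified; show(false) disables the truncation.
    static String truncateCell(String s) {
        return s.length() > 20 ? s.substring(0, 17) + "..." : s;
    }

    public static void main(String[] args) {
        System.out.println(truncateCell("THIS IS MY CHARACTERS FULL TEXT"));
    }
}
```

Reading the Parquet file back and collecting a row (rather than calling `show()`) confirms the full string survives.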
Hi Everyone,
Does anyone know how to write Parquet files after parsing data in Spark
Streaming?
Thanks,
Kevin.