Hello, Kylin and Spark users,
A new doc has been added to the Apache Kylin website on how to use Kylin as
a data source in Spark;
it should help users who want to analyze the aggregated Cube data with
Spark.
https://kylin.apache.org/docs23/tutorial/spark.html
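In outline, reading from Kylin in Spark can go through Kylin's JDBC driver
with Spark's JDBC data source; a minimal sketch (host, project, table, and
credentials below are placeholders, not taken from the doc):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kylin-jdbc").getOrCreate()

// Read an aggregated Kylin cube through Kylin's JDBC driver.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:kylin://kylin-host:7070/my_project") // placeholder host and project
  .option("driver", "org.apache.kylin.jdbc.Driver")
  .option("user", "ADMIN")                                  // Kylin's default credentials
  .option("password", "KYLIN")
  .option("dbtable", "kylin_sales")                         // placeholder table
  .load()
df.show()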
Thanks for your attention.
--
Best
Thank you for the reply.
I checked the Web UI and found that the total number of tasks is 10.
So I changed the number of cores from 1 to 10, and now it works well.
But I haven't figured out what is happening.
My assumption is that each job consists of 10 tasks by default and each
task occupies 1 core.
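If that assumption holds (one task per RDD partition, and spark.task.cpus =
1 core per task), then a 10-partition RDD needs 10 cores to run all its
tasks in a single wave. A small sketch to check it (the numbers are
illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("tasks-vs-cores")
  .config("spark.cores.max", "10") // total cores for the app (standalone/Mesos)
  .getOrCreate()

// An RDD with 10 partitions yields 10 tasks per stage; with 10 cores
// available they can all run concurrently.
val rdd = spark.sparkContext.parallelize(1 to 1000, 10)
println(rdd.getNumPartitions) // 10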
I have written a simple four-line Spark program to process data in a
Phoenix table:
String queryString = getQueryFullString(); // builds "SELECT col FROM table" against the Phoenix table
JavaPairRDD<NullWritable, PhoenixRecordWritable> phRDD = jsc.newAPIHadoopRDD(
        configuration,
        PhoenixInputFormat.class,     // the original is cut off at "Ph"; presumably Phoenix's InputFormat
        NullWritable.class,           // presumed key class
        PhoenixRecordWritable.class); // presumed value class
I am reading data from Kafka using structured streaming, and I need to save
the data to InfluxDB. In the regular DStreams-based approach I did this as
follows:
val messages: DStream[(String, String)] = kafkaStream.map(record =>
  (record.topic, record.value))
messages.foreachRDD { rdd =>
  // ... (truncated in the original; the per-RDD InfluxDB write went here)
}
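In structured streaming, the closest equivalent I can see is foreachBatch
(available from Spark 2.4); a sketch, where writeToInfluxDB is a
hypothetical helper standing in for the actual InfluxDB client code:

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("kafka-to-influx").getOrCreate()

val messages = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder
  .option("subscribe", "mytopic")                   // placeholder
  .load()
  .selectExpr("CAST(topic AS STRING) AS topic", "CAST(value AS STRING) AS value")

val query = messages.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Per-micro-batch sink, analogous to the foreachRDD body above.
    writeToInfluxDB(batch) // hypothetical helper, not a Spark or InfluxDB API
  }
  .start()
query.awaitTermination()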
Yes. The JSON files compressed by Flume or Spark work well with Spark, but
the JSON files I compressed myself cannot be read by Spark due to a codec
problem. It seems Spark can only read files compressed with hadoop-snappy
(https://code.google.com/archive/p/hadoop-snappy/).
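For reference, letting Spark itself do the compression produces files it
can read back; a minimal sketch, assuming spark and a DataFrame df are
already in scope (the path is a placeholder):

// Spark writes the JSON with its own snappy support.
df.write.option("compression", "snappy").json("/data/out")
val back = spark.read.json("/data/out") // reads back fine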
Regards,
Junfeng Chen
I made some snappy-compressed JSON files with the plain snappy codec
(https://github.com/xerial/snappy-java), which it seems Spark cannot read
correctly.
So how can I make the existing snappy files recognized by Spark? Are there
any tools to convert them?
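One route I can think of (a sketch, assuming the files were written with
snappy-java's SnappyOutputStream and that Hadoop's native snappy libraries
are available) is to decompress with snappy-java and re-compress through
Hadoop's SnappyCodec:

import java.io.{FileInputStream, FileOutputStream}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.compress.SnappyCodec
import org.apache.hadoop.util.ReflectionUtils
import org.xerial.snappy.SnappyInputStream

// Decompress the xerial-framed file, then re-write it in Hadoop's framing.
// File names are placeholders.
val in = new SnappyInputStream(new FileInputStream("data.json.snappy"))
val codec = ReflectionUtils.newInstance(classOf[SnappyCodec], new Configuration())
val out = codec.createOutputStream(new FileOutputStream("data-hadoop.json.snappy"))
val buf = new Array[Byte](64 * 1024)
Iterator.continually(in.read(buf)).takeWhile(_ > 0).foreach(n => out.write(buf, 0, n))
in.close(); out.close()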
Thanks!
Regards,
Junfeng Chen
Hi,
I am having a peculiar problem with our Spark jobs in our cluster, where
the job ends with the message:
No current assignment for partition iomkafkaconnector-deliverydata-dev-2
We have a setup with 4 Kafka partitions and 4 Spark executors, so each
partition should be directly assigned to one executor.
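As far as I can tell, Kafka raises "No current assignment for partition"
when the consumer seeks or commits on a partition it is not currently
assigned, e.g. after a rebalance; sharing a group.id across jobs is one
common trigger. For reference, a minimal subscribe-based direct stream for
this topic (the broker address and group.id are placeholders, and ssc is
assumed to be the StreamingContext):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker:9092",        // placeholder
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "deliverydata-spark", // placeholder; keep it unique per job
  "auto.offset.reset"  -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// One Spark task per Kafka partition: 4 partitions -> 4 concurrent tasks.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Seq("iomkafkaconnector-deliverydata-dev"), kafkaParams)
)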