Hi Gaurav,
You can process IoT data using Spark. But where will you store the
raw/processed data - Cassandra, Hive, or HBase?
You might want to look at a Hadoop cluster for data storage and
processing (Spark on YARN).
For processing streaming data, you might also explor
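If the suggestion here is Spark Structured Streaming, a minimal ingest sketch could look like the following (the Kafka topic, broker address, and HDFS paths are placeholders, not details from this thread; the spark-sql-kafka-0-10 package is assumed to be on the classpath):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("iot-ingest")
  .getOrCreate()

// Read raw IoT events from a (hypothetical) Kafka topic.
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "iot-events")
  .load()

// Land the raw payloads on HDFS as Parquet so Hive / batch Spark jobs
// can pick them up later.
val query = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")
  .writeStream
  .format("parquet")
  .option("path", "hdfs:///data/iot/raw")
  .option("checkpointLocation", "hdfs:///checkpoints/iot-raw")
  .start()

query.awaitTermination()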
Hi,
Good day.
My setup:
1. Single node Hadoop 2.7.3 on Ubuntu 16.04.
2. Hive 2.1.1 with metastore in MySQL.
3. Spark 2.1.0 configured using hive-site.xml to use MySQL metastore.
4. The VERSION table contains SCHEMA_VERSION = 2.1.0
Hive
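For reference, a minimal sketch of how Spark 2.1.0 is typically pointed at such a metastore, assuming hive-site.xml sits in Spark's conf directory (the queries below are only a sanity check, not from this thread):

import org.apache.spark.sql.SparkSession

// With hive-site.xml on the classpath (metastore in MySQL), enabling Hive
// support lets Spark SQL use the shared metastore.
val spark = SparkSession.builder()
  .appName("hive-metastore-check")
  .enableHiveSupport()
  .getOrCreate()

// Sanity check that Spark sees the same databases/tables as Hive.
spark.sql("SHOW DATABASES").show()
spark.sql("SHOW TABLES").show()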
Hi,
I am new to Spark and would like to learn it.
I think I should start with version 2.0.2.
Or should I first go for version 1.6.x and then move to version 2.0.2?
Please advise.
Thanks in advance.
Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohi
Hi,
The aws CLI already has your access key ID and secret access
key from when you initially configured it.
Is your S3 bucket free of any access restrictions?
Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga
From: Ashic Mahtab [mailto:a
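If the underlying goal is to read that bucket from Spark rather than the aws CLI, a minimal sketch using the s3a connector could look like this (bucket and object paths are placeholders; the hadoop-aws and AWS SDK jars are assumed to be on the classpath):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("s3-read").getOrCreate()

// Pass the same credentials the aws CLI was configured with, taken here
// from environment variables for illustration.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// Read a CSV object from the (hypothetical) bucket.
val df = spark.read.option("header", "true").csv("s3a://my-bucket/path/to/data.csv")
df.show(5)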
Hi
Can you look at Apache Drill as a SQL engine on Hive?
Lohith
Sent from my Sony Xperia™ smartphone
Tapan Upadhyay wrote
Thank you everyone for guidance.
Jorn, our motivation is to move the bulk of ad-hoc queries to Hadoop so that we have
enough bandwidth on our DB for important batch jobs/queries.
Hi Kramer,
Some options:
1. Store in Cassandra with TTL = 24 hours. When you read the full
table, you get only the latest 24 hours of data.
2. Store in Hive as an ORC file and use a timestamp field to filter out the
old data (see the sketch after this list).
3. Try windowing in Spark or Flink (have not used ei
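A minimal sketch of option 2, assuming a Hive-managed ORC table with a timestamp column (the table name "events" and column name "event_ts" are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("last-24h")
  .enableHiveSupport()
  .getOrCreate()

// Keep everything in the ORC-backed Hive table and filter at read time so
// only the most recent 24 hours of rows come back.
val last24h = spark.table("events")
  .where(col("event_ts") >= expr("current_timestamp() - interval 24 hours"))

last24h.show()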
If all SQL results have the same set of columns, you could UNION all the DataFrames:
create an empty DataFrame and union each result into it,
then reassign the new DataFrame to the original before the next union.
Not sure if it is a good idea, but it works.
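A minimal sketch of that pattern, assuming every per-query result DataFrame shares the same schema (the schema and the "results" sequence below stand in for the real per-query outputs):

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("union-results").getOrCreate()

// Shared schema of every per-query result (placeholder columns).
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("value", StringType)
))

// Start from an empty DataFrame with that schema.
var combined: DataFrame = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

// Placeholder for the DataFrames produced by the individual SQL statements.
val results: Seq[DataFrame] = Seq.empty

// Union each result in, reassigning before the next union as described above.
for (df <- results) {
  combined = combined.union(df)
}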
Lohith
Sent from my Sony Xperia™ smartphone
Divya Gehlot wrote
Hi,
If you can format the condition file as a CSV file similar
to the main file, then you can join the two DataFrames and select only the required
columns.
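A minimal sketch of that approach, with made-up file paths, join key, and column names:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-join").getOrCreate()

// Main data file and the condition file, both CSV with headers
// (paths and column names are placeholders).
val main = spark.read.option("header", "true").csv("hdfs:///data/main.csv")
val conditions = spark.read.option("header", "true").csv("hdfs:///data/conditions.csv")

// Join on the shared key and keep only the columns that are needed.
val joined = main.join(conditions, Seq("id"))
  .select("id", "value", "condition")

joined.show()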
Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga
From: Divya Gehlot [mailto:divya.htco...@gm
Hi Arun,
You can do df.agg(max(...), min(...)).
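For example (the column name and toy data are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, min}

val spark = SparkSession.builder().appName("agg-min-max").getOrCreate()
import spark.implicits._

// Toy DataFrame standing in for the real data.
val df = Seq(1.0, 3.5, 2.2).toDF("price")

// Max and min of the column in a single aggregation.
df.agg(max("price"), min("price")).show()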
Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga
From: Arunkumar Pillai [mailto:arunkumar1...@gmail.com]
Sent: Thursday, February 04, 2016 14.53
To: user@spark.apache.org
Subject: Need to user univariate