Hi,
We are newbies learning Spark. We are running a Scala query against our
Parquet table. Whenever we fire the query in Jupyter, only part of the
results is shown in the UI page. So we are trying to store
the results into a table in Parquet format. By default, in Spark all
the ta
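For the store-the-results part of the question, a minimal sketch of writing a query result out as a Parquet-backed table, assuming a SparkSession named `spark` already exists; the query and the `mydb.results` table name are hypothetical:

```scala
// Sketch: persist the full query result as a Parquet table instead of
// relying on the truncated notebook display.
// Assumes `spark` is an existing SparkSession.
val resultDf = spark.sql("SELECT * FROM my_parquet_table")  // hypothetical query

// Save as a managed table in Parquet format.
resultDf.write
  .mode("overwrite")           // replace any previous run's output
  .format("parquet")
  .saveAsTable("mydb.results") // hypothetical database.table name

// Alternatively, write plain Parquet files to a path:
resultDf.write.mode("overwrite").parquet("/warehouse/results")
```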
+1.
We even see performance degradation when comparing Spark SQL with Hive.
We have a table of 260 columns and have executed the same query in Hive and in
Spark. In Hive it takes 66 seconds for 1 GB of data, whereas in Spark it takes 4 minutes.
On 6/9/2016 3:19 PM, Gavin Yue wrote:
Could you print out the s
Hi,
Is there any way to dynamically execute a string of Scala code
against the Spark engine? We are dynamically creating a Scala file and would
like to submit it to Spark, but currently Spark accepts
only a JAR file as input for remote job submission. Is there any other
way to s
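One option worth noting: `spark-shell -i script.scala` will run a plain .scala file without packaging a JAR. For evaluating a Scala code *string* at runtime, the Scala 2 ToolBox (from scala-compiler) can compile and run it; a minimal sketch with no Spark involved — to target Spark, the generated code would additionally need a SparkSession in scope:

```scala
// Sketch: compile and evaluate a string of Scala code at runtime
// using the Scala 2 ToolBox API (requires scala-compiler on the classpath).
import scala.reflect.runtime.universe
import scala.tools.reflect.ToolBox

val toolbox = universe.runtimeMirror(getClass.getClassLoader).mkToolBox()

// This string stands in for the dynamically generated file's contents.
val code = """List(1, 2, 3).map(_ * 2).sum"""
val result = toolbox.eval(toolbox.parse(code)).asInstanceOf[Int]
// result == 12
```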
want to do so?
Ideally there would be a better approach than solving such problems as
mentioned below.
A sample example would help us understand the problem.
Regards,
Kiran
From: Mahender Sarangam
<mailto:mahender.bigd...@outlook.com>
Date: Wednesday, October 26, 2016 at 2:05 PM
To: user <m
Hi,
We are converting our Hive logic, which uses the lateral view and explode
functions. Is there any built-in function in Scala for performing a lateral
view explode?
Below is our query in Hive; temparray is a temp table with c0 and c1 columns.
SELECT id, CONCAT_WS(',', collect_list(LineID)) as Li
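Spark's DataFrame API does have built-in equivalents: `explode` in `org.apache.spark.sql.functions` covers LATERAL VIEW explode, and `collect_list` plus `concat_ws` cover the aggregation in the query above. The semantics can be sketched in plain Scala collections (the `id`/line values are hypothetical stand-ins for the c0/c1 columns):

```scala
// Plain-Scala model of LATERAL VIEW explode followed by
// CONCAT_WS(',', collect_list(...)). The Spark SQL counterparts are
// functions.explode, functions.collect_list and functions.concat_ws.

// One row per id, holding an array column (like temparray's c0, c1).
val rows: Seq[(Int, Seq[String])] = Seq(
  (1, Seq("L1", "L2")),
  (2, Seq("L3"))
)

// LATERAL VIEW explode: one output row per array element.
val exploded: Seq[(Int, String)] =
  rows.flatMap { case (id, lines) => lines.map(id -> _) }

// collect_list + CONCAT_WS(','): group back per id, join with commas.
val concatenated: Map[Int, String] =
  exploded.groupBy(_._1).map { case (id, pairs) =>
    id -> pairs.map(_._2).mkString(",")
  }
// concatenated == Map(1 -> "L1,L2", 2 -> "L3")
```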
Hi All,
Is there any support for theta joins in Spark? We want to identify the
country based on a range of IP addresses (which we have in our DB).
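Spark does accept non-equi (theta) join conditions in `DataFrame.join`, e.g. `events.join(ranges, events("ip") >= ranges("ipStart") && events("ip") <= ranges("ipEnd"))` (column names hypothetical). The lookup logic itself, modeled in plain Scala:

```scala
// Plain-Scala model of a range (theta) join: map a numeric IP to a
// country by finding which [start, end] interval contains it.
// In Spark the same predicate can be used directly as a join condition.
final case class IpRange(start: Long, end: Long, country: String)

// Hypothetical sample ranges (IPs converted to numbers so they compare).
val ranges = Seq(
  IpRange(100L, 199L, "US"),
  IpRange(200L, 299L, "IN")
)

def countryOf(ip: Long): Option[String] =
  ranges.find(r => ip >= r.start && ip <= r.end).map(_.country)
// countryOf(150L) == Some("US"); countryOf(500L) == None
```

Note that a naive non-equi join is evaluated as a broadcast/nested-loop comparison in Spark, so keeping the range table small (broadcastable) matters for performance.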
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Hi,
I'm new to Spark and big data. We are doing a PoC and building our
warehouse application using Spark. Can anyone share guidance with me,
like naming conventions for HDFS names, table names, UDFs, and DB names? Any
sample architecture diagram would also help.
-Mahens
Hi,
I'm new to Spark and Scala and need help transforming nested JSON using Scala.
We have an upstream system returning JSON like
{
"id": 100,
"text": "Hello, world."
Users : [ "User1": {
"name": "Brett",
"id": 200,
"Type" : "Employee"
"empid":"2"
},
"Use
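With Spark, `spark.read.json(path)` infers the nested schema, nested fields are addressed with dot notation (e.g. `$"user.name"`), and array columns are flattened with `explode`. The shape of the transformation, modeled with plain Scala case classes — the field names follow the fragment above, and the structure is hypothetical where the JSON is cut off:

```scala
// Plain-Scala model of flattening a nested record that contains an
// array of users into one flat row per user. In Spark this becomes
// spark.read.json(...) plus explode on the array and dot-notation selects.
final case class User(name: String, id: Int, userType: String, empId: String)
final case class Doc(id: Int, text: String, users: Seq[User])

val doc = Doc(100, "Hello, world.",
  Seq(User("Brett", 200, "Employee", "2")))

// One flat (docId, userName, userType) row per element of the users array.
val flat: Seq[(Int, String, String)] =
  doc.users.map(u => (doc.id, u.name, u.userType))
// flat == Seq((100, "Brett", "Employee"))
```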
Hi,
Can anyone share good tutorials on Spark with Scala, like
videos or blogs for beginners, mostly focusing on writing Scala code?
Thanks in advance.
Hi,
Does anyone have a good architecture document or design principles for building
a warehouse application using Spark?
Is it better to create a HiveContext and perform transformations with HQL,
or to load files directly into a DataFrame and perform the data
transformations there?
We need to implement SCD
Hi,
We are storing our final transformed data in a Hive table in JSON format. While
storing data into the table, all the null fields are converted into \\N; while
reading the table, we see \\N instead of NULL. We tried setting
ALTER TABLE sample set SERDEPROPERTIES ('serialization.null.format' = "\
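For reference, the usual complete form of that statement sets the property to an empty string so NULLs are not written as the \N marker. This is a sketch of the common fix; the exact behavior depends on the SerDe backing the table:

```sql
ALTER TABLE sample SET SERDEPROPERTIES ('serialization.null.format' = '');
```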
I’m trying to read multiple .json.gz files from a Blob storage path using the
Scala code below, but I’m unable to read the data from the files or print the
schema. If the files are not compressed as .gz, then we are able to read all the
files into the DataFrame.
I’ve even tried giving *.gz but n
a folder containing
multiple gz files.
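Spark decompresses .gz files transparently based on the file extension, so a wildcard read over the folder should normally work; a minimal sketch, assuming a SparkSession named `spark` (the wasbs:// account, container, and path are hypothetical):

```scala
// Sketch: read every gzipped JSON file under a Blob storage folder.
// Spark picks the gzip codec from the .gz extension automatically; a
// common failure mode is files whose real format differs from their
// extension (e.g. renamed files that are not actually gzip).
val df = spark.read
  .json("wasbs://container@account.blob.core.windows.net/data/*.json.gz") // hypothetical path
df.printSchema()
df.show(5)
```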
From: Mahender Sarangam
<mailto:mahender.bigd...@outlook.com>
Sent: Monday, October 1, 2018 2:00 AM
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Unable to read multiple JSON.Gz File.
I’m trying to read multiple .json.gz
Hi,
We have a daily data pull which brings in almost 50 GB of data from an upstream
system. We are using Spark SQL to process the 50 GB and finally insert it into a
Hive target table. Now we are copying the whole Hive target table to SQL
(a SQL staging table) and implement a merge from staging
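The staging-table copy can be done straight from Spark over JDBC instead of exporting the Hive table separately; a sketch, assuming a SparkSession named `spark`, with hypothetical connection details and table names (the MERGE itself would still run on the SQL side):

```scala
// Sketch: write the transformed Hive data directly into a SQL staging
// table over JDBC, refreshing it on each daily run.
import java.util.Properties

val props = new Properties()
props.setProperty("user", "etl_user")      // hypothetical credentials
props.setProperty("password", "secret")
props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")

val df = spark.table("hive_db.target_table")  // hypothetical Hive table
df.write
  .mode("overwrite")  // replace staging contents before the merge runs
  .jdbc("jdbc:sqlserver://host:1433;databaseName=dw", "dbo.Staging", props)
```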