Hello Asmath,
We had a similar challenge recently.
When you write back to Hive, you are creating files on HDFS, and how many you
create depends on your batch window.
If you increase your batch window, let's say from 1 min to 5 mins, you will end
up creating 5x fewer files.
The other factor is your partitioning. F
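For reference, a minimal sketch of widening the batch interval in a Spark
Streaming job (the app name and the 5-minute duration are only illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, StreamingContext}

    // Illustrative only: with a 5-minute batch window each micro-batch writes
    // one set of files to HDFS instead of five sets per 5 minutes.
    val conf = new SparkConf().setAppName("hive-writer")
    val ssc = new StreamingContext(conf, Minutes(5))

The number of files per batch then comes down to how the data is partitioned
before the write, which is the second factor mentioned above.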
ld not read all the content, which is probably also not
> happening.
>
> On 24. Oct 2017, at 18:16, Siva Gudavalli <gudavalli.s...@yahoo.com.INVALID> wrote:
>
>>
>> Hello,
>>
>> I have an update here.
>>
>> spark SQL is push
xplain at
:33
== Physical Plan ==
TakeOrderedAndProject(limit=10, orderBy=[id#192 DESC], output=[id#192])
+- ConvertToSafe
   +- Project [id#192]
      +- Filter (usr#199 = AA0YP)
         +- HiveTableScan [id#192,usr#199], MetastoreRelation default, hlogsv5, None,
            [(cdt#189 = 20171003),(usrpartkey#191 = hhhUsers)]
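For context, a query of roughly this shape would produce a plan like the one
above (table, column names, and literals are taken from the plan; the
HiveContext/SparkContext setup is assumed):

    import org.apache.spark.sql.hive.HiveContext

    // Assumes an existing SparkContext `sc`; hlogsv5 is the Hive table from the plan.
    val hiveContext = new HiveContext(sc)
    val df = hiveContext.sql(
      """SELECT id FROM hlogsv5
        |WHERE cdt = 20171003 AND usrpartkey = 'hhhUsers' AND usr = 'AA0YP'
        |ORDER BY id DESC LIMIT 10""".stripMargin)
    df.explain()   // prints the physical plan, similar to the one shown above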
Hello,
I am working with Spark SQL to query a Hive managed table (in ORC format).
I have my data organized by partitions and was asked to set indexes for every
50,000 rows by setting ('orc.row.index.stride'='50000').
Let's say, after evaluating the partitions, there are around 50 files in which
the data is
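A minimal sketch of how such a table might be declared (the table and column
names are borrowed from the plan above, the column types are guesses; only the
stride property comes from the question):

    // Hypothetical DDL: an ORC-backed, partitioned Hive managed table with a
    // row-group index entry every 50,000 rows. Run through a HiveContext.
    hiveContext.sql(
      """CREATE TABLE hlogsv5 (id STRING, usr STRING)
        |PARTITIONED BY (cdt INT, usrpartkey STRING)
        |STORED AS ORC
        |TBLPROPERTIES ('orc.row.index.stride'='50000')""".stripMargin)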
Hello,
I have my data stored in Parquet file format. My data is already partitioned by
dates and key. Now I want the data in each file to be sorted by a new Code column.
date1
  -> key1
       -> paqfile1
       -> paqfile2
  -> key2
       -> paqfile1
       -> paqfile2
date2
  -> k
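A minimal sketch of one way to get that layout (Spark 1.6+; the column names
date, key, and code, as well as the paths, are assumptions):

    // Assumed columns: date and key are the existing partition columns,
    // code is the new column to sort by inside each file. Paths are placeholders.
    val df = sqlContext.read.parquet("/data/in")

    df.repartition(df("date"), df("key"))      // group rows for each (date, key) together
      .sortWithinPartitions("code")            // rows inside each output file sorted by code
      .write
      .partitionBy("date", "key")
      .parquet("/data/out_sorted")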
> Java serialization.
>
> On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli
> wrote:
>
>> hello,
>>
>> i am writing a spark streaming application to read data from kafka. I am
>> using no receiver approach and enabled checkpointing to make sure I am not
>> rea
Hello,
I am writing a Spark Streaming application to read data from Kafka. I am using
the no-receiver (direct) approach and have enabled checkpointing to make sure I
am not reading messages again in case of failure (exactly-once semantics).
I have a quick question: how does checkpointing need to be configured to handle
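For reference, a minimal sketch of the direct-stream-plus-checkpoint setup
(broker address, topic, and checkpoint path are placeholders; the Spark 1.x
Kafka 0.8 integration is assumed):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val checkpointDir = "hdfs:///tmp/checkpoints/kafka-app"   // placeholder path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("kafka-direct")
      val ssc = new StreamingContext(conf, Seconds(60))
      ssc.checkpoint(checkpointDir)                           // enable metadata checkpointing

      val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
      val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set("mytopic"))

      stream.foreachRDD { rdd => /* process and write out idempotently */ }
      ssc
    }

    // Recover from the checkpoint if one exists, otherwise build a fresh context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()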
Ref: https://issues.apache.org/jira/browse/SPARK-11953
In Spark 1.3.1 we have two methods, i.e. createJDBCTable and insertIntoJDBC.
They are replaced with write.jdbc() in Spark 1.4.1.
createJDBCTable allows you to perform CREATE TABLE ... i.e. DDL on the table,
followed by INSERT (DML).
insertIntoJDBC
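A rough sketch of the two styles, assuming a DataFrame df and a placeholder
JDBC URL (the 1.3.1 methods were deprecated in 1.4 and later removed):

    import java.util.Properties
    import org.apache.spark.sql.SaveMode

    val url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"   // placeholder

    // Spark 1.3.1 style: DDL and DML as two separate calls.
    df.createJDBCTable(url, "MYTABLE", allowExisting = false)   // CREATE TABLE + INSERT
    df.insertIntoJDBC(url, "MYTABLE", overwrite = false)        // INSERT into an existing table

    // Spark 1.4.1+ style: a single writer API.
    val props = new Properties()
    df.write.mode(SaveMode.Append).jdbc(url, "MYTABLE", props)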
Hi,
I am trying to write a DataFrame from Spark 1.4.1 to Oracle 11g.
I am using
dataframe.write.mode(SaveMode.Append).jdbc(url, tablename, properties)
but this always tries to create a table.
I would like to insert records into an existing table instead of creating a
new one each single time. Plea
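For what it's worth, a minimal sketch of the writer call with the Oracle driver
set explicitly (URL, credentials, and table name are placeholders; whether
Append still issues a CREATE TABLE in 1.4.1 is what the JIRA above tracks):

    import java.util.Properties
    import org.apache.spark.sql.SaveMode

    // Placeholders: adjust URL, credentials, and table name for your environment.
    val url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
    val props = new Properties()
    props.setProperty("user", "scott")
    props.setProperty("password", "tiger")
    props.setProperty("driver", "oracle.jdbc.OracleDriver")

    // SaveMode.Append is meant to insert into an existing table rather than recreate it.
    dataframe.write.mode(SaveMode.Append).jdbc(url, "MYTABLE", props)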