Hi Team,
I want to update col3 in table1 if col1 from table2 is less than col1 in
table1, updating each record in table1. I'm not getting the correct
output.
Table 1:
col1|col2|col3
2020-11-17T20:50:57.777+|1|null
Table 2:
col1|col2|col3
2020-11-17T21:19:06.508+|1|win
2020-11-17T20
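A rough, untested sketch of one way to express this in PySpark; the join key
(col2 here) is an assumption, so swap in the real one:

from pyspark.sql import functions as F

# Join table1 to table2 on an assumed key (col2), then overwrite col3
# only where table2.col1 sorts before table1.col1.
t1 = spark.table("table1")
t2 = (spark.table("table2")
      .select(F.col("col1").alias("t2_col1"),
              F.col("col2").alias("t2_col2"),
              F.col("col3").alias("t2_col3")))
result = (t1.join(t2, t1.col2 == t2.t2_col2, "left")
            .withColumn("col3",
                        F.when(F.col("t2_col1") < F.col("col1"), F.col("t2_col3"))
                         .otherwise(F.col("col3")))
            .drop("t2_col1", "t2_col2", "t2_col3"))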
Hi All,
I have the following info in the data column.
<1000> date=2020-08-01 time=20:50:04 name=processing id=123 session=new
packt=20 orgin=null address=null dest=fgjglgl
Here I want to create a separate column for each of the above key-value pairs
after the integer <1000>, separated by spaces.
Is there a
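One untested sketch, assuming the raw line sits in a string column named
data: strip the leading <...> tag, parse the space-separated key=value pairs
with str_to_map, then promote each key to its own column.

from pyspark.sql import functions as F

df2 = (df
       .withColumn("body", F.regexp_replace("data", r"^<\d+>\s*", ""))
       .withColumn("kv", F.expr("str_to_map(body, ' ', '=')")))
for k in ["date", "time", "name", "id", "session", "packt", "orgin", "address", "dest"]:
    df2 = df2.withColumn(k, F.col("kv").getItem(k))
df2 = df2.drop("body", "kv")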
Hi,
When I'm using option 1, it completely overwrites the whole table; this is
not expected here. I'm running this for multiple tables with different hours.
When I'm using option 2, I'm getting the following error:
Predicate references non-partition column 'json_feeds_flatten_data'. Only
the partition colum
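For reference, replaceWhere predicates may only mention partition columns, so
the usual fix is to filter the incoming data and predicate on the partition
column itself; the partition column name and path below are examples:

(df.filter("part_date = '2020-11-17'")
   .write.format("delta")
   .mode("overwrite")
   .option("replaceWhere", "part_date = '2020-11-17'")
   .save("/delta/events"))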
Hi Team,
I'm very new to Spark Structured Streaming. Could you please guide me on how
to schedule/orchestrate a Spark Structured Streaming job? Is there any
scheduler similar to Airflow? I know Airflow doesn't support streaming jobs.
Thanks
Anbu
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.c
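One common workaround worth noting here: run the stream with a one-shot
trigger so that an external scheduler such as Airflow can launch it like a
batch job. A minimal sketch; the paths and the example schema are placeholders:

from pyspark.sql.types import StructType, StructField, StringType

input_schema = StructType([StructField("value", StringType())])  # example schema

query = (spark.readStream.format("json")
         .schema(input_schema)                     # file sources need a schema
         .load("s3://bucket/input/")
         .writeStream.format("delta")
         .option("checkpointLocation", "s3://bucket/checkpoints/job1")
         .trigger(once=True)                       # drain available data, then stop
         .start("s3://bucket/output/"))
query.awaitTermination()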
Hi Team,
I'm facing weird behavior in a PySpark DataFrame (Databricks Delta, Spark
3.0.0 supported).
I have tried the below two options to write the processed DataFrame data into
a Delta table with respect to the partition columns in the table. Actually,
overwrite mode completely overwrites the whole ta
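For context, newer Delta releases support a dynamic partition-overwrite mode
where mode("overwrite") replaces only the partitions present in the incoming
DataFrame; on older versions, replaceWhere (shown earlier in this thread) is
the way. A sketch, with an example table name:

spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
(df.write.format("delta")
   .mode("overwrite")
   .saveAsTable("events_delta"))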
Hi Team,
While working on the JSON data, we flattened the unstructured data into a
structured format, so here we have Spark data types like
Array<Struct> fields and Array data type columns
in the Databricks Delta table.
While loading the data from the Databricks Spark connector to Snowflake we
noticed
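A frequent workaround when the connector rejects nested types is to serialize
the complex columns to JSON strings before the Snowflake write; the column
name and connection options below are assumptions:

from pyspark.sql import functions as F

df_out = df.withColumn("nested_col", F.to_json("nested_col"))
(df_out.write.format("snowflake")      # Databricks shorthand for the connector
        .options(**sf_options)         # assumed dict of connection settings
        .option("dbtable", "TARGET_TABLE")
        .mode("append")
        .save())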
Hello All,
I have a column in a DataFrame which is struct type. I want to find the size
of the column in bytes; it is failing while loading into Snowflake.
I can see size functions available to get the length. How do we calculate the
size in bytes for a column in a PySpark DataFrame?
pyspark.sql.f
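There is no built-in byte-size function for a struct column, but a serialized
approximation is easy to compute; a sketch, assuming the column is named
struct_col:

from pyspark.sql import functions as F

df2 = df.withColumn("col_size_bytes",
                    F.expr("octet_length(to_json(struct_col))"))
df2.agg(F.max("col_size_bytes"), F.avg("col_size_bytes")).show()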
Hello All,
We have data in a column in a PySpark DataFrame having array-of-struct type
with multiple nested fields present. If the value is not blank, it saves the
data in the same array-of-struct type in the Spark Delta table.
Please advise on the below case:
if the same column comes in as blank
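One way to keep the Delta schema stable when the column arrives blank is to
write a typed NULL (or a typed empty array); a sketch with an invented column
name and struct schema:

from pyspark.sql import functions as F

df2 = df.withColumn(
    "events",
    F.when(F.col("events").isNull() | (F.size("events") == 0),
           F.lit(None).cast("array<struct<k1:string,k2:string>>"))
     .otherwise(F.col("events")))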
Hello Sir,
Could you please advise on the below scenario in PySpark 2.4.3 on Databricks,
to load the data into the Delta table?
I want to load the DataFrame with this column "data" into the table as Map
type in the Databricks Spark Delta table. Could you please advise on this
scenario? How to convert
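Assuming the "data" column holds a JSON object as a string, a minimal sketch
is to parse it straight into a map<string,string> and append to the Delta
table; the table name is an example:

from pyspark.sql import functions as F
from pyspark.sql.types import MapType, StringType

df2 = df.withColumn("data", F.from_json("data", MapType(StringType(), StringType())))
df2.write.format("delta").mode("append").saveAsTable("target_table")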
Hi sir,
Could you please help me with the below two cases in Databricks PySpark,
processing terabytes of JSON data read from an AWS S3 bucket?
case 1:
Currently I'm reading multiple tables sequentially to get the day count
from each table.
For example: table_list.csv having one column with multip
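For the sequential-count part, one standard trick is to fire the per-table
counts from a thread pool; Spark runs actions issued from separate Python
threads as concurrent jobs on the same session. A sketch (the table_name
header and the date filter are assumptions):

from concurrent.futures import ThreadPoolExecutor

tables = [r["table_name"] for r in
          spark.read.option("header", True).csv("table_list.csv").collect()]

def day_count(t):
    # assumed date column; replace with the real one
    return t, spark.table(t).filter("event_date = current_date()").count()

with ThreadPoolExecutor(max_workers=8) as pool:
    for name, cnt in pool.map(day_count, tables):
        print(name, cnt)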
Thank you so much, Farhan, for the help.
Please help me with the design approach for this problem. What is the best
way to structure this code to get better results?
I have some clarification on the code.
I want to take the daily record count of the ingestion source vs. the
Databricks Delta Lake table vs. Snowflake
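A bare-bones sketch of that reconciliation, with every path, table, and
option name standing in for the real ones:

src_cnt = spark.read.json("s3://bucket/raw/2020-11-17/").count()
delta_cnt = spark.table("delta_table").filter("ingest_date = '2020-11-17'").count()
sf_cnt = (spark.read.format("snowflake")
          .options(**sf_options)
          .option("query", "SELECT COUNT(*) FROM T WHERE DT = '2020-11-17'")
          .load().collect()[0][0])
print(src_cnt, delta_cnt, sf_cnt, src_cnt == delta_cnt == sf_cnt)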
Hi,
I have a question on the design of a monitoring PySpark script over the
large amount of source JSON data coming from more than 100 Kafka topics.
These multiple topics are stored under separate buckets in AWS S3. Each of
the Kafka topics has terabytes of JSON data with respect to the
partition
Hello,
version = spark 2.4.3
I have 3 different sources of JSON log data having the same schema (same
column order) in the raw data, and I want to add one new column
"src_category" for all 3 different sources, to distinguish the source
category, and merge all 3 different sources into th
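A minimal sketch of the usual pattern, with invented DataFrame names: tag
each source with a literal src_category, then union (a plain union is fine
here since the column order matches):

from functools import reduce
from pyspark.sql import functions as F

sources = {"src_a": df_a, "src_b": df_b, "src_c": df_c}
tagged = [d.withColumn("src_category", F.lit(name)) for name, d in sources.items()]
merged = reduce(lambda left, right: left.union(right), tagged)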
Hi,
I have a raw source DataFrame having 2 columns as below:
timestamp
2019-11-29 9:30:45
message_log
<123>NOV 29 10:20:35 ips01 sfids: connection:
tcp,bytes:104,user:unknown,url:unknown,host:127.0.0.1
How do we break each of the above key-values into separate columns using
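One untested approach: peel off the syslog header with a regex, then split
the comma-separated key:value pairs into a map and lift them into columns:

from pyspark.sql import functions as F

df2 = (df
       .withColumn("body", F.regexp_extract("message_log", r"connection:\s*(.*)", 1))
       .withColumn("kv", F.expr("str_to_map(body, ',', ':')")))
df2 = df2.select("timestamp", "message_log",
                 F.col("kv")["bytes"].alias("bytes"),
                 F.col("kv")["user"].alias("user"),
                 F.col("kv")["url"].alias("url"),
                 F.col("kv")["host"].alias("host"))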
Hello Guha,
The number of keys will be different for each event id. For example, if
event id 005 has 10 keys, then I have to flatten all those 10 keys in the
final output. There is no fixed number of keys for each event id.
001 -> 2 keys
002 -> 4 keys
003 -> 5 keys
above event id h
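For a variable number of keys, one sketch is to collect the distinct key
names that actually occur and generate a column per key (this assumes
eve_data is already a map column; rows missing a key simply get null):

from pyspark.sql import functions as F

keys = [r["k"] for r in
        df.select(F.explode(F.map_keys("eve_data")).alias("k")).distinct().collect()]
flat = df.select("eve_id", *[F.col("eve_data")[k].alias(k) for k in keys])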
Hello Sir,
I have a scenario to flatten the different combinations of map type (key
value) in a column called eve_data, like below.
How do we flatten the map type into proper columns using PySpark?
1) Source DataFrame having 2 columns (event id, data)
eve_id,eve_data
001, "k1":"abc",
"k2":"
Hi All,
I have a scenario in Spark (Scala)/Hive:
Day 1:
I have a file with 5 columns which needs to be processed and loaded into
Hive tables.
Day 2:
The next day, the same feed (file) has 8 columns (additional fields) which
need to be processed and loaded into Hive tables.
How do we approach this pro
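One common approach, sketched here in PySpark against a Hive table (the
table and column names are invented): evolve the table first, then append;
older rows read the new columns as NULL.

# Day 2: add the three new columns before loading the wider feed.
spark.sql("ALTER TABLE db.events ADD COLUMNS (col6 STRING, col7 STRING, col8 STRING)")
df_day2.write.insertInto("db.events")   # appends by position into all 8 columns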
Hi All,
Could you please help me fix the below issue using Spark 2.4, Scala 2.12?
How do we extract the multiple values in the given file name pattern using a
Spark/Scala regular expression? Please
give me some idea on the below approach.
object Driver {
private val filePattern =
xyzabc_so
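For comparison, here is the same idea as a PySpark sketch (the Scala version
is analogous; the pattern and capture groups are invented for illustration):
extract several values from the file name with one regex.

from pyspark.sql import functions as F

pattern = r"xyzabc_(\d{8})_(\d+)\.csv"
df2 = (spark.read.csv("/data/in/")
       .withColumn("file", F.input_file_name())
       .withColumn("file_date", F.regexp_extract("file", pattern, 1))
       .withColumn("file_seq", F.regexp_extract("file", pattern, 2)))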
Hello All, could you please help me fix the below questions.
Question 1:
I have tried the below options while writing the final data to a CSV file,
to ignore double quotes in the same CSV file. Nothing has worked. I'm using
Spark version 2.2 and Scala version 2.11.
option("quote", "\"")
.optio
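The usual workaround here is to disable the quote character entirely on
write; some Spark versions want "\u0000" rather than an empty string. A
sketch in PySpark, with an example output path:

(df.write
   .option("header", "true")
   .option("quote", "\u0000")   # effectively "no quote character"
   .csv("/out/path"))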
Thanks, Jacek Laskowski sir, but I didn't get the point here.
Please advise whether the below is what you are expecting:
dataset1.as("t1")
  .join(dataset3.as("t2"),
    col("t1.col1") === col("t2.col1"), JOINTYPE.Inner)
  .join(dataset4.as("t3"), col("t3.col1") === col("t1.col1"),
    JOINTYPE.Inner)
  .select("id", lit(refe
Hi Sir,
Could you please advise how to fix the below issue with withColumn in the
Spark 2.2 / Scala 2.11 joins:
def processing(spark: SparkSession,
    dataset1: Dataset[Reference],
    dataset2: Dataset[DataCore],
    dataset3: Dataset[ThirdPartyData],
    dataset4: Dataset[OtherData],
    date: String): Dataset[DataMerge