Hi,
SPARK is just one of the technologies out there now; several other
technologies far outperform SPARK or are at least as good as SPARK.
Regards,
Gourav
On Sat, Jul 2, 2022 at 7:42 PM Sid wrote:
> So as per the discussion, shuffle stages output is also stored on disk and
> not in
First of all, define "far outperforming". For sure, there is no GOD
system that does everything perfectly.
Which use cases are you referring to? It would be interesting for the
community to see some comparisons.
a.
On 5/7/22 12:29, Gourav Sengupta wrote:
Hi,
SPARK is just one of the tec
Hi all,
We are trying to read csv/json files that have been snappy/lz4 compressed
with Spark. The files were compressed with the lz4 command line tool and the
python snappy library.
Neither succeeded, while other formats (bzip2 & gzip) worked fine.
I've read in some places that the codec is not f
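
A small diagnostic sketch for the situation described above: before blaming the reader, it can help to check which container format a compressed file actually uses. One commonly reported cause of this kind of failure (an assumption here, not something confirmed in this thread) is that the codec on the reading side expects a different framing than the one the lz4 command line tool or the python snappy library wrote. The helper names `sniff_compression` and `sniff_file` are hypothetical; the magic bytes are the published ones for gzip, bzip2, the LZ4 frame format and the Snappy framing format.

```python
# Leading "magic" bytes of common compression containers (prefix, label).
MAGIC_PREFIXES = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\x04\x22\x4d\x18", "lz4-frame"),              # LZ4 frame magic 0x184D2204, little-endian
    (b"\xff\x06\x00\x00sNaPpY", "snappy-framed"),    # Snappy framing stream identifier
]


def sniff_compression(header: bytes) -> str:
    """Return a label for the container format suggested by the leading bytes."""
    for prefix, label in MAGIC_PREFIXES:
        if header.startswith(prefix):
            return label
    # Raw (unframed) snappy or lz4 blocks carry no magic bytes and land here,
    # as does uncompressed text.
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_compression(fh.read(16))
```

Calling `sniff_file` on one of the failing files shows whether it carries a recognisable frame header at all. A result of "unknown" for a file that is known to be compressed suggests a raw block format with no header.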
Hi Team,
I still need help understanding exactly how reading works.
Thanks,
Sid
On Mon, Jun 20, 2022 at 2:23 PM Sid wrote:
> Hi Team,
>
> Can somebody help?
>
> Thanks,
> Sid
>
> On Sun, Jun 19, 2022 at 3:51 PM Sid wrote:
>
>> Hi,
>>
>> I already have a partitioned JSON dataset in s3 like
"*but I am getting the issue of the duplicate column which was present in
the old dataset.*"
So you have answered your question!
spark.read.option("multiline","true").json("path").filter(
    col("edl_timestamp") > last_saved_timestamp)

As you have figured out, Spark reads all the JSON files in "path" t
Ehh.. What is "*duplicate column*"? I don't think Spark supports that.
duplicate column = duplicate rows
On Tue, Jul 5, 2022 at 22:13 Bjørn Jørgensen wrote:
> "*but I am getting the issue of the duplicate column which was present in
> the old dataset.*"
>
> So you have answered your question!
Hi.
I’ve spent the last couple of hours trying to chase down an issue with
writing/reading parquet files. I was trying to save (and then read in) a
parquet file with a schema that sets my non-nullability details correctly.
After having no success for some time, I posted to Stack Overflow abou