Incomplete data when reading from S3

2016-03-18 Thread Blaž Šnuderl
…why this happens and possible fixes? Regards, Blaž Šnuderl

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread Blaž Šnuderl
Try setting S3 credentials using the keys specified here: https://github.com/Aloisius/hadoop-s3a/blob/master/README.md Blaz On Dec 30, 2015 6:48 PM, "KOSTIANTYN Kudriavtsev" < kudryavtsev.konstan...@gmail.com> wrote: > Dear Spark community, > > I faced the following issue when trying to access data on
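
The suggestion above amounts to pointing the Hadoop S3A connector at explicit credentials. Below is a minimal PySpark sketch, assuming the property names used by the Hadoop S3A connector (fs.s3a.access.key / fs.s3a.secret.key) and placeholder bucket and key values; sc._jsc.hadoopConfiguration() is an internal handle commonly used to reach the Hadoop configuration from PySpark.

    from pyspark import SparkConf, SparkContext

    # Sketch: set S3A credentials on the underlying Hadoop configuration,
    # then read through the s3a:// scheme. Keys and bucket are placeholders.
    conf = SparkConf().setAppName("s3a-credentials-example")
    sc = SparkContext(conf=conf)

    hadoop_conf = sc._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")  # placeholder
    hadoop_conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")  # placeholder

    rdd = sc.textFile("s3a://your-bucket/path/to/data")
    print(rdd.count())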

Re: pyspark: calculating row deltas

2016-01-10 Thread Blaž Šnuderl
This can be done using spark.sql and window functions. Take a look at https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html On Sun, Jan 10, 2016 at 11:07 AM, Franc Carter wrote: > > Sure, for a dataframe that looks like this > > ID Year Value > 1 2012 100 > 1
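
A minimal sketch of the window-function approach the reply points to, computing per-ID deltas of Value between consecutive Years. The column names follow the sample in the thread, the rows are purely illustrative, and the code is written against the current pyspark.sql API (SparkSession) rather than the HiveContext setup of the Spark 1.x era the thread dates from.

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("row-deltas").getOrCreate()

    # Illustrative data shaped like the thread's sample (ID, Year, Value).
    df = spark.createDataFrame(
        [(1, 2012, 100), (1, 2013, 110), (1, 2014, 95)],
        ["ID", "Year", "Value"],
    )

    # Delta = current Value minus the previous Year's Value within each ID.
    w = Window.partitionBy("ID").orderBy("Year")
    df.withColumn("delta", F.col("Value") - F.lag("Value").over(w)).show()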

Pyspark 2.1.0 weird behavior with repartition

2017-01-30 Thread Blaž Šnuderl
I am loading a simple text file using pyspark. Repartitioning it seems to produce garbage data. I got these results using Spark 2.1 prebuilt for Hadoop 2.7, in the pyspark shell. >>> sc.textFile("outc").collect() [u'a', u'b', u'c', u'd', u'e', u'f', u'g', u'h', u'i', u'j', u'k', u'l'] >>> sc.textFil
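
For reference, a minimal sketch of the call pattern the report describes: the file name and line contents follow the transcript above, while the partition count is an assumption since the snippet is cut off.

    from pyspark import SparkContext

    sc = SparkContext(appName="repartition-check")

    # Collect the file as-is, then again after a repartition; per the report,
    # the repartitioned result came back corrupted on Spark 2.1 / Hadoop 2.7.
    before = sc.textFile("outc").collect()
    after = sc.textFile("outc").repartition(4).collect()  # 4 is illustrative

    print(before)
    print(sorted(after) == sorted(before))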