It depends on the data and the analysis you want to do.
>
> > On 21. Feb 2018, at 21:54, Kane Kim wrote:
> >
> > Hello,
> >
> > Which format is better supported in Spark, Parquet or ORC?
> > Will Spark use the internal sorting of Parquet/ORC files (and how can I test that)?
> > Can Spark save sorted Parquet/ORC files?
> >
> > Thanks!
>
Hello,
Which format is better supported in Spark, Parquet or ORC?
Will Spark use the internal sorting of Parquet/ORC files (and how can I test that)?
Can Spark save sorted Parquet/ORC files?
Thanks!
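On the sorting questions, a minimal PySpark sketch (paths and column names are
made up for illustration) of one way to write a sorted Parquet file with Spark
2.x and look at the result. sortWithinPartitions only guarantees ordering
inside each output file, not globally, and df.write.orc(...) works the same
way for ORC.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sorted-parquet-sketch").getOrCreate()

# Hypothetical input; any DataFrame with a sortable column works.
df = spark.range(0, 1000000).withColumnRenamed("id", "key")

# Sort within each partition before writing, so every Parquet file is
# internally ordered by "key" (there is no global order across files).
(df.repartition(8, "key")
   .sortWithinPartitions("key")
   .write.mode("overwrite")
   .parquet("/tmp/sorted_parquet"))

# One way to test: read it back and inspect; tools like parquet-tools can
# also show the per-column min/max statistics that sorted files produce.
back = spark.read.parquet("/tmp/sorted_parquet")
back.show(5)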
...make it look like your time is correct when it is skewed.
>
> cheers
>
> On Fri, Feb 13, 2015 at 5:51 AM, Kane Kim wrote:
>
>> The thing is that my time is perfectly valid...
>>
>> On Tue, Feb 10, 2015 at 10:50 PM, Akhil Das
>> wrote:
>>
>>> It's with the...
> telnet s3.amazonaws.com 80
> GET / HTTP/1.0
>
>
> [image: Inline image 1]
>
> Thanks
> Best Regards
>
> On Wed, Feb 11, 2015 at 6:43 AM, Kane Kim wrote:
>
>> I'm getting this warning when using s3 input:
>> 15/02/11 00:58:37 WARN RestStorageService:
I'm getting this warning when using s3 input:
15/02/11 00:58:37 WARN RestStorageService: Adjusted time offset in response to RequestTimeTooSkewed error. Local machine and S3 server disagree on the time by approximately 0 seconds. Retrying connection.
After that there are tons of 403/Forbidden errors.
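A rough Python 3 sketch (not from the thread) of the same check as the telnet
session quoted above: compare the Date header S3 returns with the local clock
to see how far the machine has drifted. RequestTimeTooSkewed shows up once the
difference gets large (on the order of 15 minutes), and the real fix is to
sync the clock, e.g. with NTP.

import time
from email.utils import parsedate_to_datetime
from urllib.request import urlopen
from urllib.error import HTTPError

# Ask S3 for any response (an error reply is fine) and read its Date header.
try:
    resp = urlopen("https://s3.amazonaws.com/", timeout=10)
except HTTPError as e:  # error responses still carry a Date header
    resp = e

server_time = parsedate_to_datetime(resp.headers["Date"]).timestamp()
skew = time.time() - server_time
print("local clock differs from S3 by %.1f seconds" % skew)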
Sometimes I'm getting this exception:

Traceback (most recent call last):
  File "/opt/spark-1.2.0-bin-hadoop2.4/python/pyspark/daemon.py", line 162, in manager
    code = worker(sock)
  File "/opt/spark-1.2.0-bin-hadoop2.4/python/pyspark/daemon.py", line 64, in worker
    outfile.flush()
IOError:
Found it - I used saveAsHadoopFile.
On Mon, Feb 9, 2015 at 9:11 AM, Kane Kim wrote:
> Hi, how can I compress output with gzip using the Python API?
>
> Thanks!
>
Hi, how can I compress output with gzip using the Python API?
Thanks!
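A minimal sketch along the lines of the saveAsHadoopFile answer above (output
path made up). saveAsHadoopFile expects a key/value RDD, hence the map to
(None, line) pairs; Hadoop's TextOutputFormat then writes only the values.

from pyspark import SparkContext

sc = SparkContext(appName="gzip-output-sketch")

lines = sc.parallelize(["foo", "bar", "baz"])

# Turn the lines into pairs with an empty key, then ask TextOutputFormat
# to write gzip-compressed part files.
(lines.map(lambda x: (None, x))
      .saveAsHadoopFile(
          "/tmp/gzipped_output",
          "org.apache.hadoop.mapred.TextOutputFormat",
          compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec"))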
> ...on my integration EC2 cluster and got odd results for stopping the
> workers (no workers found), but the start script... seemed to work. My
> integration cluster was running and functioning after executing both
> scripts, but I didn't make any changes to spark-env either.
>
>
Hi,
I'm trying to change a setting as described here:
http://spark.apache.org/docs/1.2.0/ec2-scripts.html
export SPARK_WORKER_CORES=6
Then I ran ~/spark-ec2/copy-dir /root/spark/conf to distribute it to the
slaves, but it had no effect. Do I have to restart the workers?
How do I do that with spark-ec2?
Thanks.
I submit a Spark job from a machine behind a firewall and I can't open any
incoming connections to that box. Does the driver absolutely need to accept
incoming connections? Is there any workaround for that case?
Thanks.
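The driver does need to accept connections from the executors. Two common
workarounds: submit in cluster deploy mode so the driver itself runs inside
the cluster, or pin the driver's listening ports so the firewall can be opened
for just those. A hedged sketch of the second option; the port numbers are
arbitrary examples and the available settings vary a bit between versions.

from pyspark import SparkConf, SparkContext

# Fix the ports the driver listens on so firewall rules can allow them.
conf = (SparkConf()
        .setAppName("fixed-ports-sketch")
        .set("spark.driver.port", "51000")
        .set("spark.blockManager.port", "51100")
        .set("spark.ui.port", "4040"))

sc = SparkContext(conf=conf)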
I'm getting "SequenceFile doesn't work with GzipCodec without native-hadoop
code!" Where can I get those libs and where do I put them in Spark?
Also, can I save a plain text file (like saveAsTextFile) as gzip?
Thanks.
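On the last question: yes, assuming a PySpark version where saveAsTextFile
takes a compressionCodecClass argument (recent releases do); a small sketch
with a made-up path. GzipCodec falls back to the JVM's built-in gzip when the
native-hadoop libraries are missing, so plain text output is usually fine; it
is SequenceFile compression that insists on the native code.

from pyspark import SparkContext

sc = SparkContext(appName="gzip-textfile-sketch")

rdd = sc.parallelize(["one", "two", "three"])

# Plain text output with each part file gzip-compressed.
rdd.saveAsTextFile(
    "/tmp/gzipped_text",
    compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")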
On Wed, Feb 4, 2015 at 11:10 PM, Kane Kim wrote:
> How to save...
How can I save an RDD with gzip compression?
Thanks.
I'm trying to process 5 TB of data, not doing anything fancy, just
map/filter and reduceByKey. I spent the whole day today trying to get it
processed, but never succeeded. I've tried to deploy to EC2 with the script
provided with Spark on pretty beefy machines (100 r3.2xlarge nodes).
Really frustrated that...
How can I reduce the number of output files? Is there a parameter to saveAsTextFile?
Thanks.
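Each partition becomes one part file, so the usual answer is to coalesce (or
repartition) before saveAsTextFile rather than a parameter on the save call
itself. A small sketch with made-up paths; the target of 16 is an arbitrary
example.

from pyspark import SparkContext

sc = SparkContext(appName="fewer-output-files-sketch")

result = sc.textFile("/tmp/input").map(lambda line: line.upper())

# coalesce merges partitions without a full shuffle; each remaining
# partition is written as one part-xxxxx file.
result.coalesce(16).saveAsTextFile("/tmp/output")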
I'm trying to process a large dataset; mapping/filtering works OK, but as
soon as I try to reduceByKey, I get out-of-memory errors:
http://pastebin.com/70M5d0Bn
Any ideas how I can fix that?
Thanks.
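A common first step for out-of-memory errors in reduceByKey is to raise the
number of reduce partitions so each task holds less data. A hedged sketch with
made-up paths and an arbitrary partition count, using reduceByKey's
numPartitions argument.

from pyspark import SparkContext

sc = SparkContext(appName="reduce-partitions-sketch")

pairs = sc.textFile("/tmp/big_input").map(lambda line: (line.split("\t")[0], 1))

# Spreading the shuffle over more partitions keeps each reduce task small
# enough to fit in executor memory; 2000 is only an example value.
counts = pairs.reduceByKey(lambda a, b: a + b, numPartitions=2000)
counts.saveAsTextFile("/tmp/counts")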
Related question: is the execution of different stages optimized? I.e.,
will a map followed by a filter require two passes over the data, or will
they be combined into a single one?
On Tue, Jan 20, 2015 at 4:33 AM, Bob Tiernay wrote:
> I found the following to be a good discussion of the same topic:
>
> http://apache-spar
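On the pipelining question: map and filter are narrow transformations, so
Spark fuses them into a single stage and the data is traversed once; only a
shuffle (e.g. reduceByKey) introduces a stage boundary. A small sketch showing
how to check that with toDebugString (the exact output format varies by
version).

from pyspark import SparkContext

sc = SparkContext(appName="pipelining-sketch")

rdd = (sc.parallelize(range(1000))
         .map(lambda x: x * 2)
         .filter(lambda x: x % 3 == 0))

# Both transformations appear under the same stage in the lineage dump.
print(rdd.toDebugString())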
I want to add some Java options when submitting an application:
--conf "spark.executor.extraJavaOptions=-XX:+UnlockCommercialFeatures -XX:+FlightRecorder"
But it looks like they don't get set. Where can I add them to make this work?
Thanks.
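A sketch of one way that does work for executor options: set
spark.executor.extraJavaOptions on the SparkConf before the SparkContext is
created (or put the same line in spark-defaults.conf). The flags are the ones
from the question; note that driver-side JVM options cannot be set this way,
because the driver JVM is already running by then.

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("executor-jvm-options-sketch")
        .set("spark.executor.extraJavaOptions",
             "-XX:+UnlockCommercialFeatures -XX:+FlightRecorder"))

sc = SparkContext(conf=conf)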