Hi,
I'm trying to read a 60GB HDFS file using Spark's textFile("hdfs_file_path",
minPartitions).
How can I control the number of tasks by increasing the split size? With the
default split size of 250 MB, several tasks are created, but I would like
to have a specific number of tasks created while reading from HDFS.
I suggest reading "Hadoop Application Architectures" (O'Reilly) by Mark Grover,
Ted Malaska and others. There you can find answers to some of your questions.
> On Oct 10, 2017, at 9:00, Mahender Sarangam
> wrote:
>
> Hi,
>
> I'm new to Spark and big data; we are doing some POC and buildin
I have also tried these. And none of them actually compile.
dataset.map(new MapFunction<String, Seq<Map<String, String>>>() {
    @Override
    public Seq<Map<String, String>> call(String input) throws Exception {
        List<Map<String, String>> temp = new ArrayList<>();
        temp.add(new HashMap<String, String>());
        return JavaConverters.asScalaBufferConverter(temp).asScala();
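For reference, a minimal compilable sketch of just the Java-to-Scala collection conversion the snippet above appears to attempt (the class name and the "value" key are made up for illustration):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import scala.collection.JavaConverters;
import scala.collection.Seq;

public class SeqConversionSketch {
    // Wraps a single input string into a one-element Scala Seq of maps.
    public static Seq<Map<String, String>> toScalaSeq(String input) {
        List<Map<String, String>> temp = new ArrayList<>();
        Map<String, String> row = new HashMap<>();
        row.put("value", input); // hypothetical key
        temp.add(row);
        // asScalaBufferConverter(...).asScala() yields a mutable Buffer, which is a Seq
        return JavaConverters.asScalaBufferConverter(temp).asScala();
    }
}

Note that wiring this into Dataset.map from Java also requires passing an Encoder for the return type as the second argument to map, which may be where the compilation errors are coming from.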
I need to have a location column inside my DataFrame so that I can do
spatial queries and geometry operations. Are there any third-party packages
that perform this kind of operation? I have seen a few like GeoSpark and
Magellan, but they don't support operations where spatial and logical
operators are combined.
> On Oct 4, 2017, at 2:08 AM, Nicolas Paris wrote:
>
> Hi
>
> I wonder about the differences between accessing Hive tables in two different ways:
> - with JDBC access
> - with sparkContext
>
> I would say that JDBC is better since it uses Hive, which is based on
> MapReduce / Tez and thus works on disk.
> Using Spark
Hi All,
I am constantly hitting an error: "ApplicationMaster:
SparkContext did not initialize after waiting for 100 ms" while running my
Spark code in YARN cluster mode.
Here is the command I am using: spark-submit --master yarn
--deploy-mode cluster spark_code.py
Hi
My environment:
Windows 10,
Spark 1.6.1 built for Hadoop 2.6.0 Build
Python 2.7
Java 1.8
Issue:
Go to C:\Spark
The command:
bin\spark-submit --master local C:\Spark\examples\src\main\python\pi.py 10
gives:
File "<stdin>", line 1
bin\spark-submit --master local C:\Spark\examples\src\main\python\p
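Judging from the File "<stdin>", line 1 SyntaxError above, the spark-submit line was most likely typed at the Python interpreter prompt rather than at a Windows command prompt; running the same command from cmd.exe in C:\Spark, e.g.

bin\spark-submit --master local C:\Spark\examples\src\main\python\pi.py 10

should avoid that particular error.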
Write your own input format/datasource or split the file yourself beforehand
(not recommended).
> On 10. Oct 2017, at 09:14, Kanagha Kumar wrote:
>
> Hi,
>
> I'm trying to read a 60GB HDFS file using Spark's textFile("hdfs_file_path",
> minPartitions).
>
> How can I control the number of tasks by
That is not correct, IMHO. If I am not wrong, Spark will still load data into
the executors, running some stats on the data itself to identify
partitions.
On Tue, Oct 10, 2017 at 9:23 PM, 郭鹏飞 wrote:
>
> > On Oct 4, 2017, at 2:08 AM, Nicolas Paris wrote:
> >
> > Hi
> >
> > I wonder about the differences between accessing
I have not tested this, but you should be able to pass any MapReduce-like
conf to the underlying Hadoop config. Essentially you should be able to
control the split behaviour as you can in a MapReduce program (as Spark
uses the same input format).
On Tue, Oct 10, 2017 at 10:21 PM, Jörn Franke
Hi,
Which spatial operations do you require exactly? Also, I don't follow what
you mean by combining logical operators?
I have created a library that wraps Lucene's spatial functionality here:
https://github.com/zouzias/spark-lucenerdd/wiki/Spatial-search
You could give the library a try; it
Try increasing the `spark.yarn.am.waitTime` parameter; by default it's set
to 100ms, which might not be enough in certain cases.
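For example, the wait time can be raised on the spark-submit command line from the original post (the 300s value below is only an illustration):

spark-submit --master yarn --deploy-mode cluster --conf spark.yarn.am.waitTime=300s spark_code.py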
On Tue, Oct 10, 2017 at 7:02 AM, Debabrata Ghosh
wrote:
> Hi All,
> I am constantly hitting an error : "ApplicationMaster:
> SparkContext did not in
That's probably better directed to AWS support.
On Sun, Oct 8, 2017 at 9:54 PM, Tushar Sudake wrote:
> Hello everyone,
>
> I'm using 'r4.8xlarge' instances on EMR for my Spark Application.
> To each node, I'm attaching one 512 GB EBS volume.
>
> By logging in to the nodes, I tried verifying t
What about something like GeoMesa?
Anastasios Zouzias wrote on Tue, Oct 10, 2017 at
15:29:
> Hi,
>
> Which spatial operations do you require exactly? Also, I don't follow what
> you mean by combining logical operators?
>
> I have created a library that wraps Lucene's spatial functionality here:
Thanks for the inputs!!
I passed in spark.mapred.max.split.size and spark.mapred.min.split.size set to
the split size I wanted. It didn't have any effect.
I also tried passing in spark.dfs.block.size, with all the params set to
the same value.
JavaSparkContext.fromSparkContext(spark.sparkContext()).te
There are a number of packages for geospatial analysis, depending on the features
you need. Here are a few I know of and/or have used:
Magellan: https://github.com/harsha2010/magellan
MrGeo: https://github.com/ngageoint/mrgeo
GeoMesa: http://www.geomesa.org/documentation/tutorials/spark.html
GeoSpark
Hi all,
GeoMesa integrates with Spark SQL and allows for queries like:
select * from chicago where case_number = 1 and st_intersects(geom,
st_makeBox2d(st_point(-77, 38), st_point(-76, 39)))
GeoMesa does this by calling package-protected Spark methods to
implement geospatial user-defined types
Something along the lines of:
Dataset<Row> df = spark.read().json(jsonDf); ?
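To make that concrete, here is a minimal, hedged sketch of the suggested approach, assuming the JSON rows sit in a Dataset<String>; the sample JSON strings and column names are made up:

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonToDatasetSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("json-sketch").master("local[*]").getOrCreate();
        // Hypothetical sample data standing in for the Dataset<String> from the question
        Dataset<String> jsonDs = spark.createDataset(
                Arrays.asList("{\"name\":\"a\",\"value\":1}", "{\"name\":\"b\",\"value\":2}"),
                Encoders.STRING());
        // spark.read().json(Dataset<String>) infers the schema from the JSON strings
        Dataset<Row> df = spark.read().json(jsonDs);
        df.select("name", "value").show(); // pick out the specific columns needed
        spark.stop();
    }
}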
From: kant kodali [mailto:kanth...@gmail.com]
Sent: Saturday, October 07, 2017 2:31 AM
To: user @spark
Subject: How to convert Array of Json rows into Dataset of specific columns in
Spark 2.2.0?
I have a Dataset ds which
Why can't you do this in Magellan?
Can you post a sample query that you are trying to run that has spatial and
logical operators combined? Maybe I am not understanding the issue properly
Ram
On Tue, Oct 10, 2017 at 2:21 AM, Imran Rajjad wrote:
> I need to have a location column inside my Datafr
Have you seen this:
https://stackoverflow.com/questions/42796561/set-hadoop-configuration-values-on-spark-submit-command-line
? Please try and let us know.
On Wed, Oct 11, 2017 at 2:53 AM, Kanagha Kumar
wrote:
> Thanks for the inputs!!
>
> I passed in spark.mapred.max.split.size, spark.mapred.mi
Maybe you need to set the parameters for the mapreduce API and not the mapred
API. I do not recall offhand how they differ, but the Hadoop web page should
tell you ;-)
> On 10. Oct 2017, at 17:53, Kanagha Kumar wrote:
>
> Thanks for the inputs!!
>
> I passed in spark.mapred.max.split.size, s
Thanks Ayan!
Finally it worked!! Thanks a lot everyone for the inputs!
Once I prefixed the params with "spark.hadoop", I see the number of tasks
getting reduced.
I'm setting the following params:
--conf spark.hadoop.dfs.block.size
--conf spark.hadoop.mapreduce.input.fileinputformat.split.minsize
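For completeness, the same settings can also be applied programmatically on the underlying Hadoop configuration before reading. A hedged sketch (the 1 GB split size is only illustrative, and the path is the placeholder from the original question):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class SplitSizeSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("split-size-sketch").getOrCreate();
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
        // Raising the minimum split size forces larger splits, hence fewer tasks:
        // roughly 60 tasks for a 60 GB file with 1 GB splits (illustrative value).
        long splitSize = 1024L * 1024 * 1024;
        jsc.hadoopConfiguration().set(
                "mapreduce.input.fileinputformat.split.minsize", String.valueOf(splitSize));
        JavaRDD<String> lines = jsc.textFile("hdfs_file_path");
        System.out.println("partitions: " + lines.getNumPartitions());
        spark.stop();
    }
}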
Is Hive from Spark via JDBC working for you? If it does, I would be
interested in your setup :-)
We can't get this working. See bug here, especially my last comment:
https://issues.apache.org/jira/browse/SPARK-21063
Regards
Andreas
I am able to connect to Spark via JDBC - tested with Squirrel. I am referencing
all the jars of current Spark distribution under
/usr/hdp/current/spark2-client/jars/*
Thanks,
Reema
-Original Message-
From: weand [mailto:andreas.we...@gmail.com]
Sent: Tuesday, October 10, 2017 5:14 PM
Hi,
I do not think that Spark will automatically determine the partitions;
actually, it does not. If a table has a few million records, it all goes
through the driver.
Of course, I have only tried JDBC connections with Aurora, Oracle and Postgres.
Regards
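Related to the partitioning point above: Spark's JDBC reader does not split a table on its own, so unless you give it explicit partitioning hints it reads everything through a single task. A hedged sketch (URL, credentials, table, column and bounds are all hypothetical):

import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JdbcPartitionSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("jdbc-partition-sketch").getOrCreate();
        Properties props = new Properties();
        props.setProperty("user", "scott");      // hypothetical credentials
        props.setProperty("password", "tiger");
        // Without the partition column, bounds and numPartitions arguments,
        // the whole table would be read through a single partition.
        Dataset<Row> df = spark.read().jdbc(
                "jdbc:postgresql://dbhost:5432/mydb", // hypothetical URL
                "big_table",                          // hypothetical table
                "id",                                 // numeric column to partition on
                1L, 10000000L,                        // lower/upper bound of id
                16,                                   // number of partitions
                props);
        System.out.println("partitions: " + df.rdd().getNumPartitions());
        spark.stop();
    }
}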
Thanks Vadim!
> On 10-Oct-2017, at 11:09 PM, Vadim Semenov
> wrote:
>
> Try increasing the `spark.yarn.am.waitTime` parameter, it's by default set to
> 100ms which might not be enough in certain cases.
>
>> On Tue, Oct 10, 2017 at 7:02 AM, Debabrata Ghosh
>> wrote:
>> H
Thanks guys for the response.
Basically I am migrating an Oracle PL/SQL procedure to Spark (Java). In
Oracle I have a table with a geometry column, on which I am able to do a
"where col = 1 and geom.within(another_geom)"
I am looking for an uncomplicated way to port such queries to Spark. I
will g
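One possible way to express a "col = 1 and geometry within" filter in plain Spark SQL, sketched below, is to register a UDF backed by a geometry library. This assumes JTS on the classpath (1.15+ uses the org.locationtech package; older releases use com.vividsolutions), geometries stored as WKT strings, and a hypothetical registered table my_table; it is only a sketch, not the approach from the thread:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.types.DataTypes;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.io.WKTReader;

public class SpatialUdfSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("spatial-udf-sketch").getOrCreate();

        // Hypothetical UDF: both geometries arrive as WKT strings.
        spark.udf().register("geom_within", (UDF2<String, String, Boolean>) (wktA, wktB) -> {
            WKTReader reader = new WKTReader();
            Geometry a = reader.read(wktA);
            Geometry b = reader.read(wktB);
            return a.within(b);
        }, DataTypes.BooleanType);

        // Hypothetical table with columns col (int), geom and other_geom (WKT strings)
        Dataset<Row> result = spark.sql(
                "SELECT * FROM my_table WHERE col = 1 AND geom_within(geom, other_geom)");
        result.show();
        spark.stop();
    }
}

This mixes the spatial predicate freely with ordinary logical operators, which is essentially what the Oracle query above does.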