Hi,
I have a 4-node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB of RAM.
I have around 600,000+ JSON files on HDFS. Each file is small, around 1 KB in size.
The total data is around 16 GB, and the Hadoop block size is 256 MB.
My application reads these files with sc.textFile() (or sc.jsonFile() tri
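For illustration, a minimal Scala sketch of this read path (the directory path, partition count and table name are assumptions, not from the post):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("small-json-files"))
    val sqlContext = new SQLContext(sc)

    // Read the whole directory of small JSON files as text, then coalesce to
    // tame the huge number of tiny partitions produced by ~600,000 1 KB files.
    val lines = sc.textFile("hdfs:///data/json/").coalesce(64)
    val records = sqlContext.jsonRDD(lines)
    records.registerTempTable("records")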
Hello,
I want to use Spark SQL to aggregate some columns of the data.
For example, I have a large dataset with columns:
time, src, dst, val1, val2
I want to calculate sum(val1) and sum(val2) for all unique pairs of src and
dst.
I tried forming a SQL query:
SELECT a.time, a.src, a.dst, sum(
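A sketch of one way to phrase that aggregation, assuming a SQLContext named sqlContext and a registered table name "data" (both assumptions); since the goal is one row per unique (src, dst) pair, time is left out of the grouping:

    val sums = sqlContext.sql(
      "SELECT src, dst, SUM(val1) AS sum_val1, SUM(val2) AS sum_val2 " +
      "FROM data GROUP BY src, dst")
    sums.collect().foreach(println)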
Hi SM,
Apologies for the delayed response.
No, the issue is with Spark 1.2.0; there is a bug in that release.
Spark recently put out the 1.3.0 release, so it may have been fixed there.
I am not planning to test it soon; maybe after some time.
You can give it a try.
Regards,
Shailesh
ng "spark.shuffle.blockTransferService" to "nio".
Can anyone please let me know?
I don't want to open all ports on the network, so I am interested in the property
by which I can configure this new port.
Shailesh
Hello,
Recently I upgraded my setup from Spark 1.1 to Spark 1.2.
I have a 4-node Ubuntu Spark cluster.
With Spark 1.1, I used to write my Spark Scala program in Eclipse on my Windows
development host and submit the job to the Ubuntu cluster from Eclipse (the
Windows machine).
As on my network not all
0.1 is
> guaranteed to work, as should any other version from the past few years).
>
> On Tue, Jan 20, 2015 at 6:16 PM, Shailesh Birari
> wrote:
>
>> Hi Frank,
>>
>> It's a normal Eclipse project where I added the Scala and Spark libraries as
>> user libraries.
>
are using Maven (or what) to build, but if you can pull up
> your build's dependency tree, you will likely find com.google.guava being
> brought in by one of your dependencies.
>
> Regards,
>
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-3
are mixing versions
> of Spark then, with some that still refer to unshaded Guava. Make sure
> you are not packaging Spark with your app and that you don't have
> other versions lying around.
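For what it's worth, a minimal sbt sketch of that advice (the thread mentions Maven as another possibility; the build tool and version numbers here are assumptions): mark Spark as "provided" so it is compiled against but not packaged into the application jar.

    // build.sbt -- version numbers are examples only
    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"  % "1.2.0" % "provided",
      "org.apache.spark" %% "spark-mllib" % "1.2.0" % "provided"
    )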
Hello,
I recently upgraded my setup from Spark 1.1 to Spark 1.2.
My existing applications are working fine on the Ubuntu cluster.
But when I try to execute a Spark MLlib application from Eclipse (the Windows
node), it gives a java.lang.NoClassDefFoundError:
com/google/common/base/Preconditions exception.
Not
Have you tried setting the host name/port to your Windows machine?
Also specify the following ports for Spark, and make sure the ports you specify
are not blocked (on the Windows machine); a sketch follows the list.
spark.fileserver.port
spark.broadcast.port
spark.replClassServer.port
spark.blockManager.port
spark.executor.
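For illustration, a sketch of how these could be set on the SparkConf; the host name and port numbers below are placeholders, not values from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("remote-submit")
      .set("spark.driver.host", "windows-dev-host")   // the Windows machine running the driver
      .set("spark.driver.port", "51800")
      .set("spark.fileserver.port", "51801")
      .set("spark.broadcast.port", "51802")
      .set("spark.replClassServer.port", "51803")
      .set("spark.blockManager.port", "51804")
    val sc = new SparkContext(conf)

With fixed ports like these, only a known set of ports has to be opened between the Windows host and the cluster.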
Yes, I am using Spark 1.1.0 and have used rdd.registerTempTable().
I tried adding sqlContext.cacheTable(), but it took 59 seconds (more than
before).
I also tried changing the schema to use the Long data type in some fields, but
the conversion seems to take more time.
Is there any way to specify an index?
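For reference, a minimal sketch of the calls mentioned above (rdd and sqlContext are the ones from the post; the table name is made up):

    // Register the SchemaRDD as a temporary table, then cache it so that
    // queries after the first one read from the in-memory columnar cache.
    rdd.registerTempTable("records")
    sqlContext.cacheTable("records")
    sqlContext.sql("SELECT COUNT(*) FROM records").collect()  // first action materialises the cache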
Hello,
I have written a Spark SQL application which reads data from HDFS and queries it.
The data size is around 2 GB (30 million records). The schema and the query I am
running are as below.
The query takes around 5+ seconds to execute.
I tried adding
rdd.persist(StorageLevel.MEMORY_AND
Thanks. By setting the driver host to the Windows machine and specifying some
ports (driver, fileserver, broadcast, etc.) it worked perfectly. I need to specify
those ports, as not all ports are open on my machine.
For the driver host name, I was assuming Spark should get it, since in the case of
Linux we are not settin
Yes, this is doable.
I am submitting the Spark job using:

    // Constructor arguments: master URL, application name, Spark home on the
    // cluster, and the jar(s) to ship to the executors.
    JavaSparkContext spark = new JavaSparkContext(sparkMaster,
        "app name", System.getenv("SPARK_HOME"),
        new String[] {"application JAR"});

To run this, first you have to create the application jar, and in the above API
specify
Can anyone please help me here?
One more update:
Now I tried setting spark.driver.host to the Spark master node and
spark.driver.port to 51800 (an available open port), but it is failing with a bind
error. I was hoping that it would start the driver on the supplied host:port, and
since it is a Unix node there should not be any issue.
Can you
Hello,
I am able to submit a job to the Spark cluster from my Windows desktop, but the
executors are not able to run.
When I check the Spark UI (which is on Windows, as the driver is there) it shows
me JAVA_HOME, CLASS_PATH and other environment variables related to Windows.
I tried setting spark.executor.e
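The property name above is cut off, but one knob in this area is SparkConf.setExecutorEnv (the spark.executorEnv.* properties); a sketch with placeholder paths, not necessarily the setting tried in the post:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("remote-submit")
      // Explicitly set environment variables for the executors on the Linux
      // workers (paths here are examples only).
      .setExecutorEnv("JAVA_HOME", "/usr/lib/jvm/java-7-openjdk-amd64")
      .setExecutorEnv("SPARK_HOME", "/opt/spark")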
Thanks Sameer for the quick reply.
I will try to implement it.
Shailesh
Hello,
Spark 1.1.0, Hadoop 2.4.1
I have written a Spark Streaming application, and I am getting a
FileAlreadyExistsException from rdd.saveAsTextFile(outputFolderPath).
Here, briefly, is what I am trying to do.
My application creates a text file stream using the Java streaming context. The
input file is on
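Sameer's suggestion is not quoted in these snippets, but one common way to avoid FileAlreadyExistsException in a streaming job is to give every batch its own output path, for example via DStream.saveAsTextFiles, which suffixes the prefix with the batch time; a sketch with assumed paths and batch interval:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("text-stream")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Watch an HDFS directory for newly arriving text files.
    val lines = ssc.textFileStream("hdfs:///input/")

    // Each batch is written under a distinct, time-suffixed directory
    // (prefix-<batch time in ms>.txt), so batches never collide.
    lines.saveAsTextFiles("hdfs:///output/batch", "txt")

    ssc.start()
    ssc.awaitTermination()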
Hi Xianguri,
After setting the SVD rank to a smaller value (200), it's working.
Thanks,
Shailesh
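For context, a rough sketch of the MLlib call with the smaller rank; the input matrix here is made-up random data, and an existing SparkContext sc is assumed:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // A matrix of random doubles, standing in for the data from the thread.
    val rows = sc.parallelize(Seq.fill(10000)(Vectors.dense(Array.fill(1000)(math.random))))
    val mat = new RowMatrix(rows)

    // Asking for only the top 200 singular values keeps the factors small
    // enough to avoid the driver-side OutOfMemoryError.
    val svd = mat.computeSVD(200, computeU = true)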
Note: the data is random numbers (doubles).
Any suggestions/pointers will be highly appreciated.
Thanks,
Shailesh