Let us assume that you want to build an integration test setup where
you run all participating components in Docker.
You create a docker-compose.yml with four Docker images, something like this:
# Start docker-compose.yml
version: '2'
services:
  myapp:
    build: myapp_dir
    links:
      - ka
Hi,
Any hint about getting the location of a particular RDD partition on the
cluster? Or a workaround?
The parallelize method partitions an RDD into splits as specified, or as per
the default parallelism configuration. Does parallelize actually distribute
the partitions across the cluster?
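For reference, a minimal sketch (Scala, spark-shell, assuming Spark 1.6): parallelize only defines the splits; nothing is shipped to the executors until an action runs. One workaround for seeing where partitions actually execute is to record the host name inside each task. The partition count of 8 below is illustrative.

import java.net.InetAddress

val rdd = sc.parallelize(1 to 1000, 8)       // request 8 partitions
println(rdd.partitions.length)               // 8; still only a driver-side description

// Record (partitionIndex, host, elementCount) from inside the tasks.
val placement = rdd.mapPartitionsWithIndex { (idx, it) =>
  Iterator((idx, InetAddress.getLocalHost.getHostName, it.size))
}.collect()
placement.foreach(println)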
Hi All, IS NOT NULL is not working in programmatic SQL. Check below for the
input, output, and code.
Input
10,IN
11,PK
12,US
13,UK
14,US
15,IN
16,
17,AS
18,AS
19,IR
20,As
val cntdat = sc.textFile("/user/poc_hortonworks/radha/gsd/sample.txt")
case class CNT(id: Int, code: String)
val cntdf = cntd
It doesn't look like you have a NULL field; you have a string-valued
field containing an empty string.
On Sun, Jul 10, 2016 at 3:19 PM, Radha krishna wrote:
> Hi All, IS NOT NULL is not working in programmatic SQL. Check below for the
> input, output, and code.
>
> Input
>
> 10,IN
> 11,PK
> 12,US
> 13,UK
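For reference, a hedged sketch of the suggested fix (Scala, spark-shell, Spark 1.6), reusing the thread's names cntdat and CNT; the parsing with split(",", -1) and the table name "cnt" are assumptions. Since the missing codes are empty strings rather than SQL NULLs, filter on the empty string, or turn empty strings into real NULLs first:

import sqlContext.implicits._

val cntdf = cntdat
  .map(_.split(",", -1))                        // -1 keeps the trailing empty field of lines like "16,"
  .map(p => CNT(p(0).trim.toInt, p(1).trim))
  .toDF()
cntdf.registerTempTable("cnt")

// Filter the empty string directly ...
sqlContext.sql("SELECT id, code FROM cnt WHERE code <> ''").show()
// ... or map empty strings to NULL so IS NOT NULL behaves as expected.
sqlContext.sql("SELECT id, code FROM cnt " +
  "WHERE (CASE WHEN code = '' THEN NULL ELSE code END) IS NOT NULL").show()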
OK, thank you. How can I achieve the requirement?
On Sun, Jul 10, 2016 at 8:44 PM, Sean Owen wrote:
> It doesn't look like you have a NULL field; you have a string-valued
> field containing an empty string.
>
> On Sun, Jul 10, 2016 at 3:19 PM, Radha krishna wrote:
> > Hi All, IS NOT NULL is not working in
Hi
With sqlContext we can register a UDF like
this: sqlContext.udf.register("sample_fn", sample_fn _)
But this UDF is limited to that particular sqlContext only. I wish to make
the registration persistent, so that I can access the same UDF in any
subsequent SQLContext.
Or is there any other way to
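For reference, a hedged sketch of one common workaround (Spark 1.6): plain Scala UDFs registered with sqlContext.udf.register are scoped to that context, so keep the registrations in a helper and call it on every SQLContext you create. The UDF body below is a placeholder, not the thread's actual sample_fn.

import org.apache.spark.sql.SQLContext

object Udfs {
  def sample_fn(s: String): Int = s.length      // placeholder implementation

  def registerAll(sqlContext: SQLContext): Unit = {
    sqlContext.udf.register("sample_fn", sample_fn _)
    // add further UDF registrations here so every context gets the same set
  }
}

// Usage whenever a new context is created:
// val sqlContext = new SQLContext(sc)
// Udfs.registerAll(sqlContext)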
I want to apply a null comparison to a column in sqlContext.sql; is there any
way to achieve this?
On Jul 10, 2016 8:55 PM, "Radha krishna" wrote:
> Ok thank you, how to achieve the requirement.
>
> On Sun, Jul 10, 2016 at 8:44 PM, Sean Owen wrote:
>
>> It doesn't look like you have a NULL field,
I can't seem to find a link to the Spark KEYS file. I am trying to
validate the sigs on the 1.6.2 release artifacts and I need to
import 0x7C6C105FFC8ED089. Is there a KEYS file available for
download somewhere? Apologies if I am just missing an obvious link.
Phil
---
Hi,
So far I have been using Spark "embedded" in my app. Now, I'd like to run it on
a dedicated server.
This is how far I am:
- fresh Ubuntu 16, server name is mocha / IP 10.0.100.120, installed Scala
2.10, installed Spark 1.6.2, recompiled
- Pi test works
- UI on port 8080 works
Log says:
Spark Comm
Not sure where you see " 0x7C6C105FFC8ED089". I think the release is signed
with the key https://people.apache.org/keys/committer/pwendell.asc .
I think this tutorial can be helpful:
http://www.apache.org/info/verification.html
On Mon, Jul 11, 2016 at 12:57 AM, Phil Steitz wrote:
> I can't seem
I tested that:
I set:
_JAVA_OPTIONS=-Djava.net.preferIPv4Stack=true
SPARK_LOCAL_IP=10.0.100.120
I still have the warning in the log:
16/07/10 14:10:13 WARN Utils: Your hostname, micha resolves to a loopback
address: 127.0.1.1; using 10.0.100.120 instead (on interface eno1)
16/07/10 14:10:13 WAR
Hi,
I know I am asking again, but I tried running the same thing on a Mac as well,
since some answers on the internet suggested it could be an issue with the
Windows environment, but still nothing works.
Can anyone at least suggest whether it's a bug in Spark or something
else?
Would be really g
On 7/10/16 10:57 AM, Shuai Lin wrote:
> Not sure where you see " 0x7C6C105FFC8ED089". I
That's the key ID for the key below.
> think the release is signed with the
> key https://people.apache.org/keys/committer/pwendell.asc .
Thanks! That key matches. The project should publish a KEYS file
[1]
Hi,
One of the solutions is to use `spark-csv` (see:
https://github.com/databricks/spark-csv#features).
To load NULLs, you can use the `nullValue` option there.
// maropu
On Mon, Jul 11, 2016 at 1:14 AM, Radha krishna wrote:
> I want to apply null comparison to a column in sqlcontext.sql, is there
> any way to
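For reference, a hedged sketch of that suggestion (Spark 1.6 with the Databricks spark-csv package; exact option support depends on the spark-csv version). The nullValue option turns the chosen string into a real SQL NULL, after which IS NOT NULL works. The schema and path reuse the thread's example.

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = true),
  StructField("code", StringType, nullable = true)))

val cntdf = sqlContext.read
  .format("com.databricks.spark.csv")
  .schema(schema)
  .option("nullValue", "")                      // treat empty strings as NULL
  .load("/user/poc_hortonworks/radha/gsd/sample.txt")

cntdf.registerTempTable("cnt")
sqlContext.sql("SELECT id, code FROM cnt WHERE code IS NOT NULL").show()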
It appears I had issues in my /etc/hosts... it seems OK now.
> On Jul 10, 2016, at 2:13 PM, Jean Georges Perrin wrote:
>
> I tested that:
>
> I set:
>
> _JAVA_OPTIONS=-Djava.net.preferIPv4Stack=true
> SPARK_LOCAL_IP=10.0.100.120
> I still have the warning in the log:
>
> 16/07/10 14:10:13
I have my dev environment on my Mac. I have a dev Spark server on a freshly
installed physical Ubuntu box.
I had some connection issues, but it is now all fine.
In my code, running on the Mac, I have:
SparkConf conf = new
SparkConf().setAppName("myapp").setMaster("spark://10.0
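For reference, an equivalent Scala sketch of that (truncated) Java snippet; the IP is the one mentioned earlier in the thread, and port 7077 (the standalone master's default) is an assumption here.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("myapp")
  .setMaster("spark://10.0.100.120:7077")       // assumed default standalone port
val sc = new SparkContext(conf)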
Hi everybody,
I installed Spark 1.6.1. I have two parquet files, but when I need to show
records using unionAll, Spark crashes and I don't understand what happens.
When I use show() on only one parquet file, it works correctly.
Code with the fault:
path = '/data/train_parquet/'
train_df = sqlContext.r
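For reference, a hedged Scala sketch of the pattern being described (the thread's own snippet is Python and truncated); the first path is from the thread, the second is a hypothetical placeholder. unionAll in Spark 1.6 requires both DataFrames to have the same schema: same column names, order, and types.

val trainDf = sqlContext.read.parquet("/data/train_parquet/")
val otherDf = sqlContext.read.parquet("/data/other_parquet/")   // hypothetical second file
val allDf = trainDf.unionAll(otherDf)
allDf.show()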
Hi,
What's the schema of the parquet files?
Also, could you show us the stack trace when the error happens?
// maropu
On Mon, Jul 11, 2016 at 11:42 AM, Javier Rey wrote:
> Hi everybody,
>
> I installed Spark 1.6.1, I have two parquet files, but when I need show
> registers using unionAll, Spark cra
Hi everybody,
We are using Spark to query big data, and currently we're using Zeppelin to
provide a UI for technical users.
Now we also need to provide a UI for business users, so we use Oracle BI tools
and set up a Spark Thrift Server (STS) for it.
When I run both, Zeppelin and STS throw an error:
I
Is this terminating the execution, or does the Spark application still run
after this error?
One thing is for sure: it is looking for a local file on the driver (i.e. your Mac) at
location: file:/Users/jgp/Documents/Data/restaurants-data.json
On Mon, Jul 11, 2016 at 12:33 PM, Jean Georges Perrin wrote:
>
> I have m
Good for the file :)
No, it goes on... as if it were waiting for something.
jg
> On Jul 10, 2016, at 22:55, ayan guha wrote:
>
> Is this terminating the execution, or does the Spark application still run after
> this error?
>
> One thing is for sure: it is looking for a local file on the driver (i.e. your Mac)
Hi
Can you try using the JDBC interpreter with STS? We have been using Zeppelin+STS
on YARN for a few months now without much issue.
On Mon, Jul 11, 2016 at 12:48 PM, Chanh Le wrote:
> Hi everybody,
> We are using Spark to query big data and currently we’re using Zeppelin to
> provide a UI for technical us
Yes, it is expected to move on. If it looks like it is waiting for something,
my first instinct would be to check network connectivity, e.g. your
cluster must have access back to your Mac to read the file (it is probably
waiting to time out).
On Mon, Jul 11, 2016 at 12:59 PM, Jean Georges Perrin wr
Hi Ayan,
That is a brilliant idea. Thank you very much. I will try it this way.
Regards,
Chanh
> On Jul 11, 2016, at 10:01 AM, ayan guha wrote:
>
> Hi
>
> Can you try using the JDBC interpreter with STS? We have been using Zeppelin+STS
> on YARN for a few months now without much issue.
>
> On Mon, Jul 11,
Hi,
ISTM multiple SparkContexts are not recommended in Spark.
See: https://issues.apache.org/jira/browse/SPARK-2243
// maropu
On Mon, Jul 11, 2016 at 12:01 PM, ayan guha wrote:
> Hi
>
> Can you try using the JDBC interpreter with STS? We have been using Zeppelin+STS
> on YARN for a few months now without
The log explicitly says "java.lang.OutOfMemoryError: Java heap space", so
perhaps you need to allocate more JVM memory for Spark?
// maropu
On Mon, Jul 11, 2016 at 11:59 AM, Javier Rey wrote:
> Also the problem appears when I used clause: unionAll
>
> 2016-07-10 21:58 GMT-05:00 Javier Rey :
>
>> This i
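For reference, a hedged sketch of "allocate more JVM memory": the executor heap can be raised on the SparkConf, while the driver heap must be set before the driver JVM starts (e.g. via spark-submit --driver-memory), so the driver setting below only takes effect in that case. The 4g values are illustrative, not recommendations.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("parquet-union")                  // hypothetical app name
  .set("spark.executor.memory", "4g")
  .set("spark.driver.memory", "4g")             // only effective if set before the driver JVM launches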
Hi,
Since Dataset will be the major API in Spark 2.0, why will MLlib be
DataFrame-based, with 'future development will focus on the DataFrame-based API'?
Is there any plan to change MLlib from DataFrame-based to Dataset-based?
Thanks,
lujinhong
--
Hi Team,
I have a Spark application up & running on a 10-node standalone cluster.
When I launch the application in cluster mode I am able to create a separate
log file for the driver & executors (common to all executors).
But my requirement is to create a separate log file for each executor. Is it
fe
>
> at least links to the keys used to sign releases on the
> download page
+1 for that.
On Mon, Jul 11, 2016 at 3:35 AM, Phil Steitz wrote:
> On 7/10/16 10:57 AM, Shuai Lin wrote:
> > Not sure where you see " 0x7C6C105FFC8ED089". I
>
> That's the key ID for the key below.
> > think the releas
I would suggest you run the Scala version of the example first, so you can
tell whether it's a problem with the data you provided or a problem with the
Java code.
On Mon, Jul 11, 2016 at 2:37 AM, Biplob Biswas
wrote:
> Hi,
>
> I know i am asking again, but I tried running the same thing on mac as
>
Hi,
I can use a JDBC connection to connect from the SQuirreL client to the Spark
Thrift Server, and this works fine.
I have Zeppelin 0.6.0, which works OK with the default Spark interpreter.
I configured the JDBC interpreter to connect to the Spark Thrift Server as follows:
[image: Inline images 1]
I can use beeli
Hi Ayan,
I tested it and it works fine, but one more confusion: what if my (technical) users want
to write some code in Zeppelin to apply things to a Hive table?
Zeppelin and STS can't share a SparkContext; does that mean we need separate processes?
Is there any way to use the same SparkContext as STS?
Regards,
Chan
DataFrame is a special case of Dataset, so they mean the same thing.
Actually, the ML pipeline API will accept Dataset[_] instead of DataFrame in
Spark 2.0.
More accurately, we can say that MLlib will focus on the Dataset-based API for
further development.
Thanks
Yanbo
2016-07-10 20:35 GMT-0
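For reference, a small sketch of why the two names coincide in Spark 2.0: DataFrame is a type alias for Dataset[Row], so the DataFrame-based API and a Dataset-based API over Row are the same thing.

import org.apache.spark.sql.{DataFrame, Dataset, Row}

def describe(df: DataFrame): Unit = df.printSchema()
def describeAsDataset(ds: Dataset[Row]): Unit = describe(ds)    // compiles: DataFrame is Dataset[Row]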
Hi Swaroop,
Would you mind sharing your code so that others can help you figure out
what caused this error?
I can run the isotonic regression examples fine.
Thanks
Yanbo
2016-07-08 13:38 GMT-07:00 dsp :
> Hi I am trying to perform Isotonic Regression on a data set with 9 features
> and a label
Hi
When you say "Zeppelin and STS", I am assuming you mean "Spark Interpreter"
and "JDBC interpreter" respectively.
Through Zeppelin, you can either run your own spark application (by using
Zeppelin's own spark context) using spark interpreter OR you can access
STS, which is a spark application