"Something like that" I've never tried it out myself so I'm only
guessing having a brief look at the API.
Best regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
On Sat,
Jacek, so I create a cache in the ForeachWriter, write to it in every
process() call, and flush it on close()? Something like that?
2017-02-09 12:42 GMT-08:00 Jacek Laskowski :
> Hi,
>
> Yes, that's ForeachWriter.
>
> Yes, it works element by element. You're looking for mapPartitions
> and ForeachWriter h
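For the record, a minimal sketch of the buffer-in-process(), flush-in-close()
pattern being discussed. ForeachWriter is Scala/Java-only in 2.1; PySpark only
exposes the same open/process/close contract through writeStream.foreach(...)
in later releases, and send_batch below is a hypothetical stand-in for the
actual write to the sink:

    class BufferingWriter:
        def open(self, partition_id, epoch_id):
            self.buffer = []         # the per-partition cache
            return True              # True = process this partition

        def process(self, row):
            self.buffer.append(row)  # accumulate instead of writing row by row

        def close(self, error):
            if error is None and self.buffer:
                send_batch(self.buffer)  # hypothetical batched write to the sink

    # query = df.writeStream.foreach(BufferingWriter()).start()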
Hi Nick,
Because we use *RandomSignProjectionLSH*, the only parameter for LSH is
the number of hashes. I tried with a small number of hashes (2), but the
error still happens, and it happens when I call the similarity join. After
the transformation, the size of the dataset is about 4G.
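RandomSignProjectionLSH is not part of stock Spark, so the sketch below uses
the built-in BucketedRandomProjectionLSH (Scala in 2.1, Python from 2.2) as a
stand-in just to show where the knobs and the join sit; the column names,
bucketLength, and distance threshold are all illustrative:

    from pyspark.ml.feature import BucketedRandomProjectionLSH

    # dataset: your DataFrame with a "features" vector column
    lsh = BucketedRandomProjectionLSH(inputCol="features", outputCol="hashes",
                                      numHashTables=2, bucketLength=4.0)
    model = lsh.fit(dataset)

    # The hash tables are expanded during the join, so memory pressure
    # usually shows up here rather than in transform().
    joined = model.approxSimilarityJoin(dataset, dataset, 2.0,
                                        distCol="EuclideanDistance")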
2017-02-11 3:07
What other params are you using for the LSH transformer?
Are the issues occurring during transform or during the similarity join?
On Fri, 10 Feb 2017 at 05:46, nguyen duc Tuan wrote:
> Hi Das,
> In general, I will apply them to larger datasets, so I want to use LSH,
> which is more scalable t
Bumping this thread.
Translating "where not(username is not null)" into a filter of
[IsNotNull(username),
Not(IsNotNull(username))] seems like a rather severe bug.
Spark 1.6.2:
explain select count(*) from parquet_table where not( username is not null)
== Physical Plan ==
TungstenAggregate(key=
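A quick sketch for reproducing this on your own build, assuming a SparkSession
named spark: the predicate is logically equivalent to "username is null", so
the two plans below should push down the same filter, and any divergence shows
the bug.

    df = spark.table("parquet_table")
    df.filter("not(username is not null)").explain(True)
    df.filter("username is null").explain(True)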
Hello Community,
I have the following Python code that calls an external command:
rdd.pipe('run.sh', env=os.environ).collect()
run.sh can exit with status 1 or 0; how can I get the exit code
from Python? Thanks!
Xuchen
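pipe() only exposes the command's stdout, so one workaround is to have a
wrapper echo the exit status as a sentinel line and pick it out afterwards.
This is a sketch; the EXIT: marker is arbitrary, not a Spark feature:

    import os

    # Wrap run.sh so its exit status is appended to stdout as a marked line
    # (one EXIT: line per partition, since pipe runs once per partition).
    out = rdd.pipe("sh -c 'run.sh; echo EXIT:$?'", env=os.environ).collect()
    codes = [int(line[5:]) for line in out if line.startswith("EXIT:")]
    data = [line for line in out if not line.startswith("EXIT:")]

    # If your PySpark version has the checkCode flag, a non-zero exit
    # raises an error instead of being silently ignored:
    # rdd.pipe('run.sh', env=os.environ, checkCode=True).collect()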
Thank you very much for your answers. Now I understand better what I have
to do! Thank you!
On Wed, 8 Feb 2017 at 22:37, Gourav Sengupta
wrote:
> Hi,
>
> I am not quite sure of your use case here, but I would use spark-submit
> and submit sequential jobs as steps to an EMR cluster.
>
>
> Regar
This isn't related to the progress bar; it just happened while in that
section of code. Something else is taking memory in the driver, usually a
broadcast table or some other structure that requires a lot of memory and
lives on the driver.
You should check your driver memory settings and the query pla
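A minimal sketch of where to look, assuming a SparkSession named spark; the
table name and the 4g figure are illustrative, not recommendations:

    df = spark.sql("SELECT * FROM some_table")  # hypothetical query
    df.explain(True)  # look for BroadcastExchange / BroadcastHashJoin

    # Broadcast relations are collected on the driver before being shipped;
    # -1 disables automatic broadcast joins entirely.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    # The driver heap itself must be set at launch time, e.g.:
    #   spark-submit --driver-memory 4g ...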
Hi all,
I've read the docs for Spark SQL 2.1.0 but I'm still having issues with the
warehouse and related details.
I'm not using Hive proper, so my hive-site.xml consists only of:
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/mnt/data/spark/metastore_db;create=true</value>
  </property>
I've set "sp
Hi Das,
In general, I will apply them to larger datasets, so I want to use LSH,
which is more scalable than the approaches you suggested. Have you
tried LSH in Spark 2.1.0 before? If so, how do you set the
parameters/configuration to make it work?
Thanks.
2017-02-10 19:21 GMT+07:00 Debasish
If it is 7M rows and 700K features (or say 1M features), brute-force row
similarity will run fine as well... check out SPARK-4823... you can compare
quality with the approximate variant...
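SPARK-4823 tracks a rowSimilarities API; as of this writing the closest
built-in is columnSimilarities, the DIMSUM method on RowMatrix, sketched here
on a hypothetical RDD of vectors. A positive threshold switches it to the
approximate variant mentioned above:

    from pyspark.mllib.linalg.distributed import RowMatrix

    mat = RowMatrix(vectors_rdd)          # vectors_rdd: hypothetical RDD of Vectors
    exact = mat.columnSimilarities()      # brute-force cosine similarities
    approx = mat.columnSimilarities(0.1)  # DIMSUM sampling above this threshold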
On Feb 9, 2017 2:55 AM, "nguyen duc Tuan" wrote:
> Hi everyone,
> Since Spark 2.1.0 introduces LSH (http://spark.ap
Hello Spark fans,
I would like to tell you about a tool we want to share with the big data
community. I think it can also be handy for Spark users.
We created a new utility, HDFS Shell, to work with HDFS data more easily.
https://github.com/avast/hdfs-shell
*Feature highlights*
- HDFS DFS command
Hi,
I'm new to Spark Streaming and want to run some end-to-end tests with Spark
and Kafka.
My program runs, but nothing arrives at the Kafka topic. Can someone
please help me?
Where is my mistake? Does someone have a running example of writing a DStream
to Kafka 0.10.1.0?
The program looks
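There is no built-in Kafka sink for DStreams, so the usual pattern is a
producer per partition inside foreachRDD. A minimal sketch, assuming the
kafka-python package and placeholder broker/topic names; forgetting to
flush() before the task ends is a common reason nothing shows up on the
topic:

    from kafka import KafkaProducer  # assumes the kafka-python package

    def send_partition(records):
        # One producer per partition: producers are not serializable and
        # cannot be created on the driver and shipped to executors.
        producer = KafkaProducer(bootstrap_servers="localhost:9092")
        for record in records:
            producer.send("my-topic", str(record).encode("utf-8"))
        producer.flush()  # make sure buffered messages actually reach Kafka
        producer.close()

    # dstream: your DStream
    dstream.foreachRDD(lambda rdd: rdd.foreachPartition(send_partition))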
Hi,
I have multiple Hive configurations (hive-site.xml), and because of that I
am not able to put a single Hive configuration in the Spark *conf* directory.
I want to add this configuration file at the start of any *spark-submit* or
*spark-shell*. This conf file is huge, so *--conf* is not an option for me.
Did anybody get the above mail?
Thanks
On Fri, Feb 10, 2017 at 11:51 AM, Shivam Sharma <28shivamsha...@gmail.com>
wrote:
> Hi,
>
> I have multiple Hive configurations (hive-site.xml), and because of that
> I am not able to put a single Hive configuration in the Spark *conf* directory.
> I want to add this