"Something like that" I've never tried it out myself so I'm only
guessing having a brief look at the API.
Best regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
On Sat,
Jacek, so I create a cache in the ForeachWriter, write to it in every
process() call, and flush it on close()? Something like that?
2017-02-09 12:42 GMT-08:00 Jacek Laskowski :
> Hi,
>
> Yes, that's ForeachWriter.
>
> Yes, it works element by element. You're looking for mapPartitions
> and ForeachWriter h
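For the record, a minimal sketch of the buffer-in-process(), flush-in-close()
pattern being discussed. ForeachWriter is Scala/Java-only in 2.1; PySpark only
exposes the same open/process/close contract through writeStream.foreach(...)
in later releases, and send_batch below is a hypothetical stand-in for the
actual write to the sink:

    class BufferingWriter:
        def open(self, partition_id, epoch_id):
            self.buffer = []         # the per-partition cache
            return True              # True = process this partition

        def process(self, row):
            self.buffer.append(row)  # accumulate instead of writing row by row

        def close(self, error):
            if error is None and self.buffer:
                send_batch(self.buffer)  # hypothetical batched write to the sink

    # query = df.writeStream.foreach(BufferingWriter()).start()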
Hi Nick,
Because we use *RandomSignProjectionLSH*, the only parameter for LSH is
the number of hashes. I tried with a small number of hashes (2), but the
error still happens, and it happens when I call the similarity join. After
the transformation, the size of the dataset is about 4G.
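RandomSignProjectionLSH is not part of stock Spark, so the sketch below uses
the built-in BucketedRandomProjectionLSH (Scala in 2.1, Python from 2.2) as a
stand-in just to show where the knobs and the join sit; the column names,
bucketLength, and distance threshold are all illustrative:

    from pyspark.ml.feature import BucketedRandomProjectionLSH

    # dataset: your DataFrame with a "features" vector column
    lsh = BucketedRandomProjectionLSH(inputCol="features", outputCol="hashes",
                                      numHashTables=2, bucketLength=4.0)
    model = lsh.fit(dataset)

    # The hash tables are expanded during the join, so memory pressure
    # usually shows up here rather than in transform().
    joined = model.approxSimilarityJoin(dataset, dataset, 2.0,
                                        distCol="EuclideanDistance")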
2017-02-11 3:07
What other params are you using for the LSH transformer?
Are the issues occurring during transform or during the similarity join?
On Fri, 10 Feb 2017 at 05:46, nguyen duc Tuan wrote:
> Hi Das,
> In general, I will apply them to larger datasets, so I want to use LSH,
> which is more scalable t
Bumping this thread.
Translating "where not(username is not null)" into a filter of
[IsNotNull(username),
Not(IsNotNull(username))] seems like a rather severe bug.
Spark 1.6.2:
explain select count(*) from parquet_table where not( username is not null)
== Physical Plan ==
TungstenAggregate(key=
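A quick sketch for reproducing this on your own build, assuming a SparkSession
named spark: the predicate is logically equivalent to "username is null", so
the two plans below should push down the same filter, and any divergence shows
the bug.

    df = spark.table("parquet_table")
    df.filter("not(username is not null)").explain(True)
    df.filter("username is null").explain(True)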
Hello Community,
I have the following Python code that calls an external command:
rdd.pipe('run.sh', env=os.environ).collect()
run.sh can exit with status 1 or 0; how can I get the exit code
from Python? Thanks!
Xuchen
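pipe() only exposes the command's stdout, so one workaround is to have a
wrapper echo the exit status as a sentinel line and pick it out afterwards.
This is a sketch; the EXIT: marker is arbitrary, not a Spark feature:

    import os

    # Wrap run.sh so its exit status is appended to stdout as a marked line
    # (one EXIT: line per partition, since pipe runs once per partition).
    out = rdd.pipe("sh -c 'run.sh; echo EXIT:$?'", env=os.environ).collect()
    codes = [int(line[5:]) for line in out if line.startswith("EXIT:")]
    data = [line for line in out if not line.startswith("EXIT:")]

    # If your PySpark version has the checkCode flag, a non-zero exit
    # raises an error instead of being silently ignored:
    # rdd.pipe('run.sh', env=os.environ, checkCode=True).collect()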
Thank you very much for your answers. Now I understand better what I have
to do! Thank you!
On Wed, 8 Feb 2017 at 22:37, Gourav Sengupta
wrote:
> Hi,
>
> I am not quite sure of your use case here, but I would use spark-submit
> and submit sequential jobs as steps to an EMR cluster.
>
>
> Regar
This isn't related to the progress bar; it just happened while in that
section of code. Something else is taking memory in the driver, usually a
broadcast table or some other structure that requires a lot of memory and
lives on the driver.
You should check your driver memory settings and the query pla
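A minimal sketch of where to look, assuming a SparkSession named spark; the
table name and the 4g figure are illustrative, not recommendations:

    df = spark.sql("SELECT * FROM some_table")  # hypothetical query
    df.explain(True)  # look for BroadcastExchange / BroadcastHashJoin

    # Broadcast relations are collected on the driver before being shipped;
    # -1 disables automatic broadcast joins entirely.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    # The driver heap itself must be set at launch time, e.g.:
    #   spark-submit --driver-memory 4g ...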
Hi all,
I've read the docs for Spark SQL 2.1.0 but I'm still having issues with the
warehouse and related details.
I'm not using Hive proper, so my hive-site.xml consists only of:
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/mnt/data/spark/metastore_db;create=true</value>
  </property>
I've set "sp
Hi Das,
In general, I will apply them to larger datasets, so I want to use LSH,
which is more scalable than the approaches you suggested. Have you
tried LSH in Spark 2.1.0 before? If so, how do you set the
parameters/configuration to make it work?
Thanks.
2017-02-10 19:21 GMT+07:00 Debasish
If it is 7M rows and 700K features (or say 1M features), brute-force row
similarity will run fine as well... check out SPARK-4823... you can compare
quality with the approximate variant...
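SPARK-4823 tracks a rowSimilarities API; as of this writing the closest
built-in is columnSimilarities, the DIMSUM method on RowMatrix, sketched here
on a hypothetical RDD of vectors. A positive threshold switches it to the
approximate variant mentioned above:

    from pyspark.mllib.linalg.distributed import RowMatrix

    mat = RowMatrix(vectors_rdd)          # vectors_rdd: hypothetical RDD of Vectors
    exact = mat.columnSimilarities()      # brute-force cosine similarities
    approx = mat.columnSimilarities(0.1)  # DIMSUM sampling above this threshold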
On Feb 9, 2017 2:55 AM, "nguyen duc Tuan" wrote:
> Hi everyone,
> Since Spark 2.1.0 introduces LSH (http://spark.ap
Hello Spark fans,
I would like to tell you about a tool we want to share with the big data
community. I think it can also be handy for Spark users.
We created a new utility, HDFS Shell, to work with HDFS data more easily.
https://github.com/avast/hdfs-shell
*Feature highlights*
- HDFS DFS command
Hi,
I'm new to Spark Streaming and want to run some end-to-end tests with Spark
and Kafka.
My program runs, but nothing arrives at the Kafka topic. Can someone
please help me?
Where is my mistake? Does someone have a running example of writing a DStream
to Kafka 0.10.1.0?
The program looks
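There is no built-in Kafka sink for DStreams, so the usual pattern is a
producer per partition inside foreachRDD. A minimal sketch, assuming the
kafka-python package and placeholder broker/topic names; forgetting to
flush() before the task ends is a common reason nothing shows up on the
topic:

    from kafka import KafkaProducer  # assumes the kafka-python package

    def send_partition(records):
        # One producer per partition: producers are not serializable and
        # cannot be created on the driver and shipped to executors.
        producer = KafkaProducer(bootstrap_servers="localhost:9092")
        for record in records:
            producer.send("my-topic", str(record).encode("utf-8"))
        producer.flush()  # make sure buffered messages actually reach Kafka
        producer.close()

    # dstream: your DStream
    dstream.foreachRDD(lambda rdd: rdd.foreachPartition(send_partition))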
Hi,
I have multiple Hive configurations (hive-site.xml), and because of that I
am not able to put a single Hive configuration in the Spark *conf* directory.
I want to add this configuration file at the start of any *spark-submit* or
*spark-shell*. This conf file is huge, so *--conf* is not an option for me.
Did anybody get the above mail?
Thanks
On Fri, Feb 10, 2017 at 11:51 AM, Shivam Sharma <28shivamsha...@gmail.com>
wrote:
> Hi,
>
> I have multiple Hive configurations (hive-site.xml), and because of that
> I am not able to put a single Hive configuration in the Spark *conf* directory.
> I want to add this