Hadoop version 2.7.3
On Tue, Jun 20, 2017 at 11:12 PM, yohann jardin wrote:
> Which version of Hadoop are you running on?
>
> *Yohann Jardin*
> On 6/21/2017 at 1:06 AM, N B wrote:
>
> Ok some more info about this issue to see if someone can shine a light on
> what could be going on. I turned o
https://spark.apache.org/docs/2.1.0/building-spark.html#specifying-the-hadoop-version
Hadoop 2.2.0 is only the default build version; other versions can still be
built. The package you downloaded is prebuilt for Hadoop 2.7, as stated on the
download page, so don't worry.
Yohann Jardin
Which version of Hadoop are you running on?
Yohann Jardin
On 6/21/2017 at 1:06 AM, N B wrote:
Ok some more info about this issue to see if someone can shine a light on what
could be going on. I turned on debug logging for
org.apache.spark.streaming.scheduler in the driver process and this is
If you do an action, most intermediate calculations would be gone for the next
iteration.
What I would do is persist every iteration, then after some (say 5) I would
write to disk and reload. At that point you should call unpersist to free the
memory as it is no longer relevant.
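A minimal sketch of that pattern, assuming the iterative job can be expressed as a function over a DataFrame (the names `step`, `tmpPath` and the 5-iteration interval are illustrative, not the poster's code):

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

// Sketch: persist each iteration, and every 5 iterations truncate the lineage
// by writing to disk and reloading, then unpersist data that is no longer needed.
def iterate(spark: SparkSession, initial: DataFrame, step: DataFrame => DataFrame,
            iterations: Int, tmpPath: String): DataFrame = {
  var current = initial
  for (i <- 1 to iterations) {
    val next = step(current).persist(StorageLevel.MEMORY_AND_DISK)
    next.count()                      // action to materialize the persisted data
    current.unpersist()               // previous iteration is no longer relevant

    current = if (i % 5 == 0) {
      val path = s"$tmpPath/iter_$i"
      next.write.mode("overwrite").parquet(path)  // cut the lineage
      next.unpersist()
      spark.read.parquet(path)
    } else {
      next
    }
  }
  current
}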
Thanks,
I had downloaded the prebuilt package labeled "Spark 2.1.1 prebuilt with
Hadoop 2.7 or later" from the direct download link on spark.apache.org.
However, I am seeing compatibility errors running against a deployed HDFS
2.7.3. (See my earlier message about Flume DStream producing 0 records
after H
I have already seen one example where data is generated using Spark, so no reason
to think it's a bad idea as far as I know.
You can check the code here; I'm not very sure, but I think there is something
there which generates data for the TPC-DS benchmark and you can provide how much
data you want in
Unsubscribe
Thanks & Best Regards,
Engr. Palash Gupta
Consultant, OSS/CEM/Big Data
Skype: palash2494
https://www.linkedin.com/in/enggpalashgupta
You should make HBase a data source (it seems we already have an HBase connector?),
create a DataFrame from HBase, and do the join in Spark SQL.
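A minimal sketch of the join part, assuming the HBase table has already been loaded as a DataFrame by whatever connector you use (`hbaseDF`, the view names and the columns are placeholders):

import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch: join an HBase-backed DataFrame with the result of the Hive/Carbon query.
// `hbaseDF` stands in for whatever your HBase connector returns; column names
// (row_key, extra_col) are placeholders.
def joinWithHbase(spark: SparkSession, df: DataFrame, hbaseDF: DataFrame): DataFrame = {
  df.createOrReplaceTempView("hive_result")
  hbaseDF.createOrReplaceTempView("hbase_table")
  spark.sql(
    """SELECT h.*, b.extra_col
      |FROM hive_result h
      |JOIN hbase_table b ON h.row_key = b.row_key""".stripMargin)
}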
> On 21 Jun 2017, at 10:17 AM, sunerhan1...@sina.com wrote:
>
> Hello,
> My scenario is like this:
> 1.val df=hivecontext/carboncontex.sql("sql")
>
After investigation, it looks like my Spark 2.1.1 jars got corrupted during
download - all good now... ;)
> On Jun 20, 2017, at 4:14 PM, Jean Georges Perrin wrote:
>
> Hey all,
>
> i was giving a run to 2.1.1 and got an error on one of my test program:
>
> package net.jgp.labs.spark.l000_ing
Ok some more info about this issue to see if someone can shine a light on
what could be going on. I turned on debug logging for
org.apache.spark.streaming.scheduler in the driver process and this is what
gets thrown in the logs and keeps throwing it even after the downed HDFS
node is restarted. Usi
never mind!
I had a space at the end of my data which was not showing up in manual testing.
thanks
From: jeff saremi
Sent: Tuesday, June 20, 2017 2:48:06 PM
To: user@spark.apache.org
Subject: Bizarre diff in behavior between Scala REPL and sparkSQL UDF
I have this function which does regex matching in Scala. When I test it in the
REPL I get the expected results.
When I use it as a UDF in Spark SQL I get completely incorrect results.
Function:
class UrlFilter (filters: Seq[String]) extends Serializable {
val regexFilters = filters.map(new Regex(_))
r
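The rest of the class is cut off in the digest; a plausible completion, purely as a sketch (the `matches` method and its semantics are my assumption, not the poster's original code):

import scala.util.matching.Regex

// Sketch of what the full class might look like; the `matches` method and
// its semantics are assumptions, not the original code.
class UrlFilter(filters: Seq[String]) extends Serializable {
  val regexFilters: Seq[Regex] = filters.map(new Regex(_))

  // True if the URL matches at least one of the configured patterns.
  def matches(url: String): Boolean =
    regexFilters.exists(_.findFirstIn(url).isDefined)
}

As the "never mind" follow-up earlier in this digest notes, the discrepancy turned out to be a trailing space in the data rather than the UDF logic itself.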
It's in the spark-catalyst_2.11-2.1.1.jar since the logical query plans and
optimization also need to know about types.
On Tue, Jun 20, 2017 at 1:14 PM, Jean Georges Perrin wrote:
> Hey all,
>
> i was giving a run to 2.1.1 and got an error on one of my test program:
>
> package net.jgp.labs.spar
Hi,
How do we bootstrap the streaming job with the previous state when we do a
code change and redeploy? We use updateStateByKey to maintain the state and
store session objects and LinkedHashMaps in the checkpoint.
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.
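For context, a minimal sketch of the checkpointed updateStateByKey setup being described (the input source, the state type, which is a running count here, and the checkpoint path are placeholders, not the original job):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch of a checkpointed updateStateByKey job.
object StatefulJobSketch {

  def createContext(checkpointDir: String): StreamingContext = {
    val conf = new SparkConf().setAppName("stateful-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    val events = ssc.socketTextStream("localhost", 9999).map(line => (line, 1L))
    val updateCount: (Seq[Long], Option[Long]) => Option[Long] =
      (newValues, state) => Some(newValues.sum + state.getOrElse(0L))
    events.updateStateByKey(updateCount).print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    val checkpointDir = "hdfs:///tmp/streaming-checkpoint"  // placeholder path
    // getOrCreate restores the job from the checkpoint if one exists, which is
    // exactly what becomes problematic when the code behind the checkpointed
    // state changes between deployments.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext(checkpointDir))
    ssc.start()
    ssc.awaitTermination()
  }
}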
Thanks Vadim & Jörn... I will look into those.
jg
> On Jun 20, 2017, at 2:12 PM, Vadim Semenov wrote:
>
> You can launch one permanent spark context and then execute your jobs within
> the context. And since they'll be running in the same context, they can share
> data easily.
>
> These t
Hey all,
i was giving a run to 2.1.1 and got an error on one of my test program:
package net.jgp.labs.spark.l000_ingestion;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org
You can launch one permanent spark context and then execute your jobs
within the context. And since they'll be running in the same context, they
can share data easily.
These two projects provide the functionality that you need:
https://github.com/spark-jobserver/spark-jobserver#persistent-context-
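Stripped of the job-server machinery, the underlying idea is a single long-lived session whose cached tables later jobs can reuse; a minimal sketch (the "jobs", table names and logic are illustrative):

import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch: one long-lived session; "jobs" are just functions submitted to it,
// sharing data through cached temp views. All names are illustrative.
object SharedContextSketch {

  // "Program A": produce a result and keep it cached for later jobs.
  def jobA(spark: SparkSession): Unit = {
    val resultA = spark.range(0, 1000000).selectExpr("id", "id * 2 AS doubled")
    resultA.cache().createOrReplaceTempView("result_a")
  }

  // "Program C": a later job in the same context reuses the cached data
  // without going through disk.
  def jobC(spark: SparkSession): DataFrame =
    spark.sql("SELECT sum(doubled) AS total FROM result_a")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shared-context").getOrCreate()
    jobA(spark)
    jobC(spark).show()
    spark.stop()
  }
}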
You could express it all in one program; alternatively, use the Ignite in-memory
file system or the Ignite shared RDD (not sure if DataFrame is supported).
> On 20. Jun 2017, at 19:46, Jean Georges Perrin wrote:
>
> Hey,
>
> Here is my need: program A does something on a set of data and produces
> resu
Hi Assaf,
Thanks for the suggestion on checkpointing - I'll need to read up more on
that.
My current implementation seems to be crashing with a GC memory limit
exceeded error if I'm keeping multiple persist calls for a large number of
files.
Thus, I was also thinking about the constant calls to p
Hey,
Here is my need: program A does something on a set of data and produces
results, program B does that on another set, and finally, program C combines
the data of A and B. Of course, the easy way is to dump all on disk after A and
B are done, but I wanted to avoid this.
I was thinking of c
BTW, this is running on Spark 2.1.1.
I have been trying to debug this issue and what I have found till now is
that it is somehow related to the Spark WAL. The directory named
/receivedBlockMetadata seems to stop getting
written to after the point of an HDFS node being killed and restarted. I
have
Unsubscribe
Sent from my iPhone
And we will be having a webinar on July 27 going into some more details. Stay
tuned.
Cheers
Jules
Sent from my iPhone
Pardon the dumb thumb typos :)
> On Jun 20, 2017, at 7:00 AM, Michael Mior wrote:
>
> It's still in the early stages, but check out Deep Learning Pipelines from
> Databricks
It is fine, but you have to design it so that generated rows are written in large
blocks for optimal performance.
The trickiest part of data generation is the conceptual part, such as the
probabilistic distributions etc.
You also have to check that you use a good random generator; for some cases
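A minimal sketch of generating synthetic data in parallel with Spark (the schema, distributions, sizes and output path are illustrative; the per-partition RNG seeding is the detail worth getting right):

import scala.util.Random
import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession

// Sketch: generate synthetic rows in parallel. Each partition seeds its own
// RNG from the partition id, so runs are reproducible and partitions uncorrelated.
object DataGeneratorSketch {
  case class Event(id: Long, value: Double, category: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("data-generator").getOrCreate()
    import spark.implicits._

    val rows = 100000000L   // illustrative size
    val partitions = 200

    val events = spark.range(0, rows, 1, partitions).as[Long]
      .mapPartitions { ids =>
        val rng = new Random(42L + TaskContext.getPartitionId())
        ids.map(id => Event(id, rng.nextGaussian(), s"cat_${rng.nextInt(10)}"))
      }

    // Few large files per partition means large blocks and better downstream reads.
    events.write.mode("overwrite").parquet("hdfs:///tmp/synthetic_events")  // placeholder path
    spark.stop()
  }
}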
Hi
Spark is a data analyzer, but would it be possible to use Spark as a data
generator or simulator?
My simulation can be very large, and I think a parallelized simulation using
Spark (in the cloud) could work.
Is that a good or a bad idea?
Regards
Esa Heikkinen
Hi,
I have seen that Databricks has higher-order functions
(https://docs.databricks.com/_static/notebooks/higher-order-functions.html,
https://databricks.com/blog/2017/05/24/working-with-nested-data-using-higher-order-functions-in-sql-on-databricks.html)
which basically allow doing generic ope
It's still in the early stages, but check out Deep Learning Pipelines from
Databricks
https://github.com/databricks/spark-deep-learning
--
Michael Mior
mm...@apache.org
2017-06-20 0:36 GMT-04:00 Gaurav1809 :
> Hi All,
>
> Similar to how we have machine learning library called ML, do we have
> a
Correction.
On Tue, Jun 20, 2017 at 5:27 PM, sujeet jog wrote:
> , Below is the query, looks like from physical plan, the query is same as
> that of cqlsh,
>
> val query = s"""(select * from model_data
> where TimeStamp > \'$timeStamp+\' and TimeStamp <=
> \'$startTS+\'
>
Below is the query; from the physical plan, it looks like the query is the same
as that of cqlsh:
val query = s"""(select * from model_data
where TimeStamp > \'$timeStamp+\' and TimeStamp <=
\'$startTS+\'
and MetricID = $metricID)"""
println("Model query" + query)
val df
Hi,
Personally, I would inspect how dates are managed. What does your Spark code
look like? What does the explain say? Does TimeStamp get parsed the same
way?
Best,
On Tue, Jun 20, 2017 at 12:52 PM, sujeet jog wrote:
> Hello,
>
> I have a table as below
>
> CREATE TABLE analytics_db.ml_forecas
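A quick sketch of the kind of inspection suggested above, assuming `df` is the DataFrame read from the Cassandra table (column names follow the thread; everything else is an assumption):

import java.sql.Timestamp
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Sketch: inspect how the timestamp predicate ends up in the query plan.
// `df` is assumed to be the DataFrame read from the table in question.
def inspectTimestampFilter(df: DataFrame, from: Timestamp, metricId: Int): Unit = {
  val filtered = df.filter(col("MetricID") === metricId && col("TimeStamp") > from)
  // explain(true) prints the parsed, analyzed, optimized and physical plans,
  // showing how the timestamp literal is parsed and whether the comparison
  // is pushed down to the source.
  filtered.explain(true)
}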
Hello,
I have a table as below
CREATE TABLE analytics_db.ml_forecast_tbl (
    "MetricID" int,
    "TimeStamp" timestamp,
    "ResourceID" timeuuid,
    "Value" double,
    PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
)
select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
'20
Unsubscribe
Sent from Yahoo Mail on Android
Hi Edwin,
I have faced a similar issue as well and this behaviour is very abrupt. I
even created a question on StackOverflow but no solution yet.
https://stackoverflow.com/questions/43496205/spark-job-processing-time-increases-to-4s-without-explanation
For us, we sometimes had this constant delay
Hi all,
https://issues.apache.org/jira/browse/SPARK-19680
Is there any method to patch this issue? I met the same problem.
2017-06-20
lk_spark