So I think caching large data is not a best practice.
At 2019-01-16 12:24:34, "大啊" wrote:
Hi, Tomas.
Thanks for your question, it gave me something to think about. But best practice
is usually to cache smaller data.
I think caching large data will consume too much memory or disk space.
Spilling the cached data in Parquet format may be a good improvement.
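A small sketch of the two options being weighed here: caching with a disk-spillable, serialized storage level versus materializing the hot data as Parquet. Paths and column names below are assumptions, not from this thread.
'''
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

val events = spark.read.parquet("/data/events")    // hypothetical source

// Option 1: cache with a serialized storage level that can spill to disk.
events.persist(StorageLevel.MEMORY_AND_DISK_SER)
events.count()                                      // materialize the cache

// Option 2: write only the hot slice out as Parquet and read it back,
// keeping executor memory free.
events.filter($"event_date" === "2019-01-01")
  .write.mode("overwrite").parquet("/tmp/hot_events")
val hot = spark.read.parquet("/tmp/hot_events")
'''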
At 2019-01-16 02:20:56, "Tomas Bartalos" wrote:
Hi Mohit,
I’m not sure that there is a “correct” answer here, but I tend to use classes
whenever the input or output data represents something meaningful (such as a
domain model object). I would recommend against creating many temporary classes
for each and every transformation step as that
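A minimal sketch of that guideline: one case class for the domain object, with plain tuples for intermediate results. The Event fields here are assumptions, just to illustrate the pattern.
'''
import org.apache.spark.sql.SparkSession

// One class for data that represents something meaningful.
case class Event(id: Long, userId: Long, amount: Double)

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

val events = spark.read.parquet("/data/events").as[Event]   // hypothetical path

// Intermediate steps can stay as tuples/columns; no throwaway classes needed.
val totalPerUser = events
  .groupByKey(_.userId)
  .mapValues(_.amount)
  .reduceGroups(_ + _)        // Dataset[(Long, Double)]
'''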
Glad to hear this.
Congrats, great work Dongjoon.
Dongjoon Hyun wrote on Tuesday, January 15, 2019, at 3:47 PM:
> We are happy to announce the availability of Spark 2.2.3!
>
> Apache Spark 2.2.3 is a maintenance release, based on the branch-2.2
> maintenance branch of Spark. We strongly recommend all 2.2.x users to
> upgrade to this st
You should check the active threads in your app. Since your pool uses
non-daemon threads, that will prevent the app from exiting.
spark.stop() should have stopped the Spark jobs in other threads, at
least. But if something is blocking one of those threads, or if
something is creating a non-daemon
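A short sketch of both checks (names are assumptions): dump the live non-daemon threads to see what is still running, and/or build the pool from daemon threads so it cannot keep the JVM alive.
'''
import java.util.concurrent.{Executors, ThreadFactory}
import scala.collection.JavaConverters._

// 1) List the non-daemon threads still alive when the app appears to hang.
Thread.getAllStackTraces.keySet.asScala
  .filter(t => t.isAlive && !t.isDaemon)
  .foreach(t => println(s"non-daemon thread still running: ${t.getName}"))

// 2) Or create the pool with daemon threads (and still call pool.shutdown()
//    once the submitted work is finished).
val daemonFactory = new ThreadFactory {
  override def newThread(r: Runnable): Thread = {
    val t = new Thread(r)
    t.setDaemon(true)
    t
  }
}
val pool = Executors.newFixedThreadPool(3, daemonFactory)
'''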
I submitted a Spark job through the ./spark-submit command. The code executed
successfully; however, the application got stuck when trying to quit Spark.
My code snippet:
'''
{
val spark = SparkSession.builder.master(...).getOrCreate
val pool = Executors.newFixedThreadPool(3)
implicit val xc = E
Fellow Spark Coders,
I am trying to move from using DataFrames to Datasets for a reasonably
large code base. Today the code looks like this:
df_a = read_csv
df_b = df_a.withColumn(some_transform_that_adds_more_columns)
// repeat the above several times
With Datasets, this will require defining
ca
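One common middle ground, sketched below with made-up column and class names: keep the withColumn pipeline untyped and define a single case class only for the final result, converting with .as[...] at the end rather than introducing a class per step.
'''
import org.apache.spark.sql.SparkSession

// Hypothetical final shape of the pipeline's output.
case class Enriched(id: Long, amount: Double, amountWithTax: Double)

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Hypothetical input path and columns.
val df_a = spark.read.option("header", "true").csv("/data/input.csv")

val df_b = df_a
  .withColumn("id", $"id".cast("long"))
  .withColumn("amount", $"amount".cast("double"))
  .withColumn("amountWithTax", $"amount".cast("double") * 1.2)

// Only the final step becomes typed; no case class per intermediate step.
val enriched = df_b.select($"id", $"amount", $"amountWithTax").as[Enriched]
'''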
Hello,
I'm using the Spark Thrift Server and I'm looking for the best-performing
solution to query a hot set of data. I'm processing records with a nested
structure, containing subtypes and arrays; one record takes up several KB.
I tried to make some improvement with CACHE TABLE:
cache table event_jan_01 as
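For reference, a sketch of the statement's general form; the SELECT body, table, and column names here are placeholders, not the original query.
'''
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// Cache a hot slice under a name the Thrift Server can query; the predicate
// and table name are assumptions for illustration.
spark.sql("""
  CACHE TABLE event_jan_01 AS
  SELECT * FROM events WHERE event_date = '2019-01-01'
""")

// Release the memory once the hot set changes.
spark.sql("UNCACHE TABLE event_jan_01")
'''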
Hi,
Considering a directed graph with 15,000 vertices and 14,000 edges, I wonder
why GraphX (Pregel) takes much more time than a Java implementation of the
graph to get all the vertices from a given vertex down to the leaves?
By the nature of the graph, we can almost consider it a tree.
The Java implementa
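For context, a reachability computation of this kind in GraphX usually looks like the sketch below (the toy edges are assumptions); every Pregel superstep involves distributed joins and shuffles over RDDs, which is much heavier than an in-memory Java traversal on a graph of only ~15,000 vertices.
'''
import org.apache.spark.graphx._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
val sc = spark.sparkContext

// Toy graph standing in for the real one.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(2L, 4L, 1)))
val graph = Graph.fromEdges(edges, defaultValue = false)

val sourceId: VertexId = 1L

// Flood a "visited" flag from the source along out-edges until it stops spreading.
val reachable = graph
  .mapVertices((id, _) => id == sourceId)
  .pregel(false, activeDirection = EdgeDirection.Out)(
    (_, visited, msg) => visited || msg,                      // vertex program
    t => if (t.srcAttr && !t.dstAttr) Iterator((t.dstId, true))
         else Iterator.empty,                                 // send along out-edges
    (a, b) => a || b                                          // merge messages
  )

reachable.vertices.filter(_._2).collect()                     // vertices reachable from sourceId
'''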
Hi all,
I want to re-send the previous SPIP on introducing a DataFrame-based graph
component to collect more feedback. It supports property graphs, Cypher
graph queries, and graph algorithms built on top of the DataFrame API. If
you are a GraphX user or your workload is essentially graph queries,
Hi,
In my problem, data is stored both in a database and on HDFS. I created an
application in which, according to the query, Spark loads the data, processes
the query, and returns the answer.
I'm looking for a service that accepts SQL queries and returns the answers
(like a database's command line). Is there a way that my
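A rough sketch of the setup being described (connection details, table names, and paths are all assumptions): register both sources as temporary views, then serve SQL against them via spark.sql or the Spark Thrift Server.
'''
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// Hypothetical database table read over JDBC.
val dbOrders = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")
  .option("dbtable", "public.orders")
  .option("user", "spark").option("password", "***")
  .load()

// Hypothetical historical data on HDFS.
val hdfsOrders = spark.read.parquet("hdfs:///data/orders_history")

dbOrders.createOrReplaceTempView("orders_db")
hdfsOrders.createOrReplaceTempView("orders_hdfs")

// Incoming SQL can now be answered from either source.
spark.sql("SELECT count(*) FROM orders_db").show()
spark.sql("SELECT count(*) FROM orders_hdfs").show()
'''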