You're getting InternalRow instances. They probably contain the data you want,
but InternalRow's toString representation doesn't render the underlying field
values the way Row's does, so printing it can look wrong even when the data is fine.
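As a language-agnostic illustration of the general point (this is plain Python, not Spark's actual InternalRow; the class name is made up), a value can hold exactly the data you want while its default string representation hides it:

```python
# Plain-Python analogy (not Spark code): a container whose default
# repr does not show its contents, much like InternalRow vs. Row.
class OpaqueRow:
    def __init__(self, values):
        self.values = values  # the data really is here

row = OpaqueRow([1, "a", 3.5])
print(row)          # something like <__main__.OpaqueRow object at 0x...>
print(row.values)   # [1, 'a', 3.5] -- the data was there all along
```

The fix in Spark is the same in spirit: convert to the external representation (Row) before inspecting, rather than trusting the internal type's toString.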
On Thu, Mar 21, 2019 at 3:28 PM Long, Andrew wrote:
> Hello Friends,
>
> I'm working on a performance improvement that reads additional parquet
---------- Forwarded message ---------
From: asma zgolli
Date: Thu, Mar 21, 2019 at 18:15
Subject: Cross Join
To:
Hello,
I need to cross my data, so I'm executing a cross join on two DataFrames:
C = A.crossJoin(B)
A has 50 records
B has 5 records
The result I'm getting with Spark 2.0 is a da
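A cross join produces the Cartesian product of its inputs, so with 50 records in A and 5 in B the expected result is 50 × 5 = 250 rows. A minimal pure-Python sketch of that cardinality (plain Python standing in for Spark's crossJoin; the record contents are made up for illustration):

```python
from itertools import product

# Stand-ins for the two DataFrames: 50 records in A, 5 in B.
A = [("a", i) for i in range(50)]
B = [("b", j) for j in range(5)]

# A cross join pairs every row of A with every row of B.
C = [(left, right) for left, right in product(A, B)]

print(len(C))  # 250 = 50 * 5
```

If the Spark result has a different row count, the join being executed is probably not the plain Cartesian product.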
I am trying to read a stream using my custom data source (V2 API, Spark 2.3),
and it fails *in the second iteration* with the following exception while
reading pruned columns:

Query [id=xxx, runId=yyy] terminated with exception: assertion failed:
Invalid batch: a#660,b#661L,c#662,d#663, ... 26 more fields
Hey,
We have a cluster of 10 nodes, each with 128 GB of memory. We are about
to run Spark and Alluxio on the cluster. How should we allocate the memory
between the Spark executor and the Alluxio worker on each machine? Are there
any recommendations? Thanks!
Best,
Andy Li
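There is no universal split, and the thread above gives no numbers, so the following is purely an illustrative sketch (every figure is an assumption, not a recommendation; verify the exact property names against your Spark and Alluxio versions): one common pattern is to give each daemon a fixed share and leave headroom for the OS.

```properties
# spark-defaults.conf -- hypothetical split of a 128 GB node
spark.executor.memory            64g   # Spark executor heap (illustrative)
spark.executor.memoryOverhead    8g    # off-heap overhead (illustrative)

# alluxio-site.properties -- hypothetical worker cache size
alluxio.worker.memory.size=40g         # Alluxio worker RAM cache (illustrative)

# remaining ~16 GB left for the OS and other daemons
```

The right ratio depends on how much of the working set you want cached in Alluxio versus held in executor memory for shuffles and caching inside Spark.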