Fantastic - glad to see that it's in the pipeline!
On Wed, Jan 7, 2015 at 11:27 AM, Michael Armbrust
wrote:
> I want to support this but we don't yet. Here is the JIRA:
> https://issues.apache.org/jira/browse/SPARK-3851
>
> On Tue, Jan 6, 2015 at 5:23 PM, Adam Gilmore
for high performance, but there is
the potential for new fields to be added to the JSON structure, so we want
to be able to handle that every time we encode to Parquet (we'll be doing
it "incrementally" for performance).
On Mon, Jan 5, 2015 at 3:44 PM, Adam Gilmore wrote:
> I saw
> s/latest/sql-programming-guide.html#configuration
>
> On Mon, Jan 5, 2015 at 3:38 PM, Adam Gilmore
> wrote:
>
>> Hi all,
>>
>> I have a question regarding predicate pushdown for Parquet.
>>
>> My understanding was this would use the metadata in Parquet's blocks/pages
>> to skip entire chunks that won't match without needing to decode the values
>> and filter on every value in the table.
Hi all,
I have a question regarding predicate pushdown for Parquet.
My understanding was this would use the metadata in Parquet's blocks/pages
to skip entire chunks that won't match without needing to decode the values
and filter on every value in the table.
I was testing a scenario where I had
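The skipping behaviour described above can be sketched in plain Python (illustrative only, not the actual Parquet reader): each row group carries min/max statistics, and a group whose range cannot satisfy the predicate is skipped without decoding any of its values.

```python
# Sketch of the min/max statistics idea behind Parquet predicate pushdown
# (illustrative, not real Parquet internals). A row group whose [min, max]
# range cannot satisfy the predicate is skipped without decoding it.

row_groups = [
    {"min": 0,   "max": 99,  "values": list(range(0, 100))},
    {"min": 100, "max": 199, "values": list(range(100, 200))},
    {"min": 200, "max": 299, "values": list(range(200, 300))},
]

def scan_greater_than(groups, threshold):
    """Evaluate `value > threshold`, skipping groups via statistics."""
    decoded = 0
    hits = []
    for group in groups:
        if group["max"] <= threshold:
            continue                      # stats prove no row can match: skip
        decoded += len(group["values"])   # otherwise decode the whole group
        hits.extend(v for v in group["values"] if v > threshold)
    return hits, decoded

hits, decoded = scan_greater_than(row_groups, 250)
print(len(hits), decoded)  # 49 matches, and only 100 of 300 values decoded
```

Without pushdown, all 300 values would be decoded and filtered one by one; with it, two of the three row groups are never touched.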
Just an update on this - I found that the script by Amazon was the culprit
- not exactly sure why. When I installed Spark manually onto the EMR (and
did the manual configuration of all the EMR stuff), it worked fine.
On Mon, Dec 22, 2014 at 11:37 AM, Adam Gilmore
wrote:
> Hi all,
>
>
> ...a single Parquet file (which is an HDFS
> directory with multiple part-files) are identical.
>
> On 12/22/14 1:11 PM, Adam Gilmore wrote:
>
>Hi all,
>
> I understand that Parquet allows for schema versioning automatically in
> the format; however, I'm not sure whether Spark supports this.
Hi all,
I understand that Parquet allows for schema versioning automatically in the
format; however, I'm not sure whether Spark supports this.
I'm saving a SchemaRDD to a Parquet file, registering it as a table, then
doing an insertInto with a SchemaRDD that has an extra column.
The second SchemaRDD
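For what a schema-merging read of those two files would have to do, here is a hedged sketch in plain Python (not Spark's actual merge logic; names are illustrative): rows from the older file are projected onto the union schema, with the missing column filled with null.

```python
# Sketch: reading back two Parquet files whose schemas differ by one
# column (illustrative plain Python, not Spark's implementation). Rows
# from the older file get None for the column they lack.

old_rows = [{"id": 1, "name": "a"}]           # written before 'score' existed
new_rows = [{"id": 2, "name": "b", "score": 0.7}]

merged_columns = ["id", "name", "score"]      # union of both schemas

def align(rows, columns):
    """Project each row onto the merged schema, null-filling missing fields."""
    return [{col: row.get(col) for col in columns} for row in rows]

table = align(old_rows, merged_columns) + align(new_rows, merged_columns)
for row in table:
    print(row)
# {'id': 1, 'name': 'a', 'score': None}
# {'id': 2, 'name': 'b', 'score': 0.7}
```

Whether Spark performs this reconciliation automatically at the time of this thread is exactly the open question tracked in SPARK-3851 above.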
Hi all,
I've just launched a new Amazon EMR cluster and used the script at:
s3://support.elasticmapreduce/spark/install-spark
to install Spark (this script was upgraded to support 1.2).
I know there are tools to launch a Spark cluster in EC2, but I want to use
EMR.
Everything installs fine; however