> , because schema verification is a good thing I would assume?
>
> On Tue, Feb 9, 2016 at 3:25 PM, Alexandr Dzhagriev wrote:
>
>> Hi Koert,
>>
>> As far as I can see, you are using Derby:
>>
>> Using direct SQL, underlying DB is DERBY
Hi Koert,
As far as I can see, you are using Derby:
Using direct SQL, underlying DB is DERBY
not MySQL, which is used for the metastore. That means Spark couldn't find
hive-site.xml on your classpath. Can you check that, please?
Thanks, Alex.
On Tue, Feb 9, 2016 at 8:58 PM, Koert Kuipers wrote:
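A minimal sketch of the fix being suggested: either put hive-site.xml on the
Spark classpath, or point the HiveContext at the metastore explicitly. The
thrift URI and the MetastoreCheck name below are placeholders, not from the
thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object MetastoreCheck {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("MetastoreCheck"))
    val hiveContext = new HiveContext(sc)
    // Placeholder address of the Thrift service in front of the MySQL-backed
    // metastore. Normally this comes from hive-site.xml on the classpath.
    hiveContext.setConf("hive.metastore.uris", "thrift://metastore-host:9083")
    // If the startup log still says "underlying DB is DERBY", the metastore
    // configuration is not being picked up.
    hiveContext.sql("SHOW TABLES").collect().foreach(println)
  }
}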
Hello all,
I looked through the Cassandra Spark integration
(https://github.com/datastax/spark-cassandra-connector) and couldn't find
any usages of the BulkOutputWriter
(http://www.datastax.com/dev/blog/bulk-loading) - an awesome tool for
creating local SSTables, which could later be uploaded to
Hi Sebastian,
Do you have any updates on the issue? I faced pretty much the same problem,
and disabling Kryo plus raising spark.network.timeout to 600s helped. So for
my job it takes about 5 minutes to broadcast the variable (~5 GB in my case),
but then it's fast, I mean much faster than shuffling
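A minimal sketch of the configuration described above. Both keys are standard
Spark settings; the values are the ones mentioned in the mail, and the app
name is a placeholder:

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastTimeoutExample {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("BroadcastTimeoutExample")
      // Stay on Java serialization, i.e. do not switch spark.serializer to Kryo.
      .set("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
      // Raise the network timeout so the ~5 GB broadcast has time to transfer.
      .set("spark.network.timeout", "600s")
    val sc = new SparkContext(conf)
    // ... build and broadcast the large variable as usual, e.g.:
    // val big = sc.broadcast(largeValue)
  }
}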
On Mon, Feb 1, 2016 at 9:55 AM, Alexandr Dzhagriev wrote:
>
>> Hi,
>>
>> That's another thing: the Record case class should be outside. I ran it
>> with spark-submit.
>>
>> Thanks, Alex.
>>
>> On Mon, Feb 1, 2016 at 6:41 PM, Ted
e.spark.sql.Dataset.<init>(Dataset.scala:80)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:91)
> at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:488)
> at org.apache.spark.sql.SQLImplicits.rddToDatasetHolder(SQLImplicits.scala:71)
> ... 53 elided
>
> On Mon, Feb 1, 2016 at 9:09 AM, Ale
nfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:130)
Thanks, Alex.
On Mon, Feb 1, 2016 at 6:03 PM, Alexandr Dzhagriev wrote:
> Hi Ted,
>
> That doesn't help either, as one method delegates to
Have you tried:
>
> agg(collect_list($"b"))
>
> On Mon, Feb 1, 2016 at 8:50 AM, Alexandr Dzhagriev wrote:
>
>> Hello,
>>
>> I'm trying to run the following example code:
>>
>> import org.apache.spark.sql.hive.HiveContext
>>
Hello,
I'm trying to run the following example code:
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.functions._
case class RecordExample(a: Int, b: String)
object ArrayExample {
  def main(args: Array[String]) {
    va
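The snippet is cut off in the archive. A minimal self-contained version of the
kind of program this thread is about could look like the sketch below; the
sample rows are hypothetical, collect_list is a Hive UDAF in Spark 1.x (hence
the HiveContext), and RecordExample is defined outside the object, matching
the remark earlier in the thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._

case class RecordExample(a: Int, b: String)

object ArrayExample {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("ArrayExample"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // Hypothetical sample data; the original rows are not in the snippet.
    val records = sc.parallelize(Seq(
      RecordExample(1, "x"), RecordExample(1, "y"), RecordExample(2, "z")
    )).toDF()

    // Collect all b values for each a into an array column, as suggested
    // earlier in the thread with agg(collect_list($"b")).
    records.groupBy($"a").agg(collect_list($"b")).show()
  }
}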
Hello Sateesh,
I think you can use a file stream, e.g.
streamingContext.fileStream[KeyClass, ValueClass, InputFormatClass](dataDirectory)
to create a stream and then process the RDDs as you are doing now.
Thanks, Alex.
On Thu, Jan 28, 2016 at 10:56 AM, Sateesh Karuturi <sateesh.karutu...@gma
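A concrete sketch of that suggestion. The directory, batch interval, and the
text-file key/value types are assumptions; fileStream itself is the standard
StreamingContext API quoted above:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileStreamExample {
  def main(args: Array[String]) {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("FileStreamExample"), Seconds(10))

    // Watch a (hypothetical) directory for newly arriving files.
    val lines = ssc
      .fileStream[LongWritable, Text, TextInputFormat]("hdfs:///data/incoming")
      .map(_._2.toString)

    // Process each batch as an ordinary RDD, as in the original code.
    lines.foreachRDD { rdd =>
      rdd.take(10).foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}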