Hi, I want to ask about an issue I have faced while using Spark. I load dataframes
from parquet files. Some dataframes' parquet files have lots of partitions and more
than 10 million rows.
Running a "where id = x" query on the dataframe scans all partitions. When saving
to an RDD object/parquet there is a partition column. The men
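For what it's worth, a minimal sketch of how partition pruning is usually triggered with the 1.4-style writer; df, the paths and the column choice are placeholders, and partitioning by a very high-cardinality column can create too many directories:

// Write the data partitioned by the column used in the filter, so a query such as
// "where id = x" can prune to the matching directory instead of scanning everything.
df.write.partitionBy("id").parquet("/data/events_by_id")

val pruned = sqlContext.read.parquet("/data/events_by_id").filter("id = 42")
pruned.count()   // should only touch the id=42 partition directory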
I have a hunch I want to share: I feel that data is not being deallocated in
memory (at least not like in 1.3). Once it goes in-memory it just stays there.
Spark SQL works fine; the same query, when run on a new shell, won't throw
that error, but when run on a shell which has been used for other queries
I will second this. I very rarely used to get out-of-memory errors in 1.3.
Now I get these errors all the time. I feel that I could work in the 1.3
spark-shell for long periods of time without Spark throwing that error,
whereas in 1.4 the shell needs to be restarted or gets killed frequently.
_Master.columns("LeadSource","Utm_Source","Utm_Medium","Utm_Campaign"),
"left")
When I do this I get the error: too many arguments for method apply.
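A sketch of how the multi-column left join could be written instead, assuming both frames (called bookings and master here as placeholders) carry these column names; df.columns(...) fails because columns is just an Array[String], so the join condition has to be a Column expression:

// Multi-column left join via an explicit Column expression (1.3-era DataFrame API).
val joined = bookings.join(master,
  bookings("LeadSource")   === master("LeadSource")   &&
  bookings("Utm_Source")   === master("Utm_Source")   &&
  bookings("Utm_Medium")   === master("Utm_Medium")   &&
  bookings("Utm_Campaign") === master("Utm_Campaign"),
  "left")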
Thanks
Bipin
wide tables.
>
> Cheng
>
>
> On 6/15/15 5:48 AM, Bipin Nag wrote:
>
> Hi Davies,
>
> I have tried recent 1.4 and 1.5-snapshot builds to 1) open the parquet and save
> it again, or 2) apply a schema to the RDD and save the dataframe as parquet, but
> now I get this error (right in t
bug. My error
doesn't show up in newer versions, so this is the problem to fix now.
Thanks
On 13 June 2015 at 06:31, Davies Liu wrote:
> Maybe it's related to a bug, which is fixed by
> https://github.com/apache/spark/pull/6558 recently.
>
> On Fri, Jun 12, 2015 at 5:3
have to change it
properly.
Thanks for helping out.
Bipin
On 12 June 2015 at 14:57, Cheng Lian wrote:
> On 6/10/15 8:53 PM, Bipin Nag wrote:
>
> Hi Cheng,
>
> I am using Spark 1.3.1 binary available for Hadoop 2.6. I am loading an
> existing parquet file, then repartitioni
,
1, lastpk, 1, JdbcRDD.resultSetToObjectArray)
myRDD.saveAsObjectFile("rawdata/"+name);
For applying schema and saving the parquet:
val myschema = schemamap(name)
val myrdd =
sc.objectFile[Array[Object]]("/home/bipin/rawdata/"+name).map(x =>
org.apache.spark.s
Hi,
When I try to save my data frame as a parquet file I get the following
error:
java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to
org.apache.spark.sql.types.Decimal
at
org.apache.spark.sql.parquet.RowWriteSupport.writePrimitive(ParquetTableSupport.scala:220)
wrote:
> I suspect that Bookings and Customerdetails both have a PolicyType field,
> one is string and the other is an int.
>
>
> Cheng
>
>
> On 6/8/15 9:15 PM, Bipin Nag wrote:
>
> Hi Jeetendra, Cheng
>
> I am using following code for joining
>
> val
e joined DataFrame whose PolicyType is string to an
>> existing Parquet file whose PolicyType is int? The exception indicates that
>> Parquet found a column with conflicting data types.
>>
>> Cheng
>>
>>
>> On 6/8/15 5:29 PM, bipin wrote:
>>
>>
Hi, I get this error message when saving a table:
parquet.io.ParquetDecodingException: The requested schema is not compatible
with the file schema. incompatible types: optional binary PolicyType (UTF8)
!= optional int32 PolicyType
at
parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompa
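In case it helps, a sketch of one way to reconcile the conflicting types before appending, assuming the joined DataFrame is called joined and the existing Parquet data stores PolicyType as int32; the variable name and output path are placeholders:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col

// Rebuild the column list, casting only PolicyType to int so the appended rows
// match the schema already stored in the Parquet file.
val aligned = joined.select(joined.columns.map {
  case "PolicyType" => col("PolicyType").cast("int").as("PolicyType")
  case other        => col(other)
}: _*)
aligned.save("/path/to/existing/parquet", "parquet", SaveMode.Append)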
"/home/bipin/rawdata/"+name)
But I get
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to
org.apache.spark.sql.Row
How can I work around this? Is there a better way?
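A possible workaround (just a sketch, not tested against this data): createDataFrame expects an RDD[Row], so wrap each Array[Object] in a Row rather than casting. Names follow the snippet above; the output path is illustrative and the element types still have to line up with myschema.

import org.apache.spark.sql.Row

// Wrap each Array[Object] in a Row, then apply the schema and save as parquet.
val rowRDD = sc.objectFile[Array[Object]]("/home/bipin/rawdata/" + name)
  .map(arr => Row.fromSeq(arr))
val df = sqlContext.createDataFrame(rowRDD, myschema)
df.saveAsParquetFile("/home/bipin/parquet/" + name)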
1,2,3.
On 29 April 2015 at 18:04, Manoj Awasthi wrote:
> Sorry but I didn't fully understand the grouping. This line:
>
> >> The group must only take the closest previous trigger. The first one
> hence shows alone.
>
> Can you please explain further?
>
>
>
Hi, I have a ddf with schema (CustomerID, SupplierID, ProductID, Event,
CreatedOn); the first 3 are long ints, Event can only be 1, 2 or 3, and
CreatedOn is a timestamp. How can I make group triplets/doublets/singlets out
of them such that I can infer that a customer registered events from 1 to 2 and
if p
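One reading of the requirement (an assumption, since the question is truncated): within each (CustomerID, SupplierID, ProductID), order the events by CreatedOn and start a new group whenever the event number does not increase, so 1,2,3 becomes a triplet, 1,2 a doublet, and a lone 2 or 3 a singlet. A sketch with placeholder names:

import java.sql.Timestamp

case class EventRow(customerId: Long, supplierId: Long, productId: Long,
                    event: Int, createdOn: Timestamp)

// Group a customer's events into 1 -> 2 -> 3 chains; anything that breaks the
// increasing sequence starts a new group.
def groupEvents(rows: Seq[EventRow]): Seq[Seq[EventRow]] =
  rows.sortBy(_.createdOn.getTime).foldLeft(Vector.empty[Vector[EventRow]]) { (acc, r) =>
    if (acc.nonEmpty && r.event > acc.last.last.event)
      acc.init :+ (acc.last :+ r)   // continue the current chain
    else
      acc :+ Vector(r)              // otherwise start a new group
  }

// Usage on an RDD[EventRow] (hypothetical name events):
// val grouped = events.groupBy(r => (r.customerId, r.supplierId, r.productId))
//   .mapValues(rs => groupEvents(rs.toSeq))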
I have looked into the sqlContext documentation but there is nothing on how to
merge two data-frames. How can I do this?
Thanks
Bipin
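In case it helps, a minimal sketch assuming "merge" means stacking the rows of two frames that share a schema; df1 and df2 are placeholders:

// unionAll keeps every row from both frames (1.3-era API); add .distinct() afterwards
// if duplicate rows should be dropped.
val merged = df1.unionAll(df2)
merged.count()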
Hi all,
I am facing an issue: whenever I run a job on my Mesos cluster, I cannot see
any progress in my terminal. It shows:
[Stage 0:> (0 + 0) / 204]
I have set up the cluster on AWS EC2 manually. I first run the Mesos master and
slaves, then run s
Looks like a good option. BTW v3.0 is around the corner.
http://slick.typesafe.com/news/2015/04/02/slick-3.0.0-RC3-released.html
Thanks
I am running the queries from spark-sql. I don't think it can communicate
with the Thrift server. Can you tell me how I should run the queries to make it
work?
I was running the spark shell and spark-sql with the --jars option containing the
paths when I got my error. I am not sure what the correct way to add jars is. I
tried placing the jar inside the directory you said, but I still get the error.
I will give the code you posted a try. Thanks.
Hi, I imported a table from MS SQL Server with Sqoop 1.4.5 in parquet format.
But when I try to load it from the Spark shell, it throws an error like:
scala> val df1 = sqlContext.load("/home/bipin/Customer2")
scala.collection.parallel.CompositeThrowable: Multiple exceptions thrown
duri
s some thoughts - credit to Cheng Lian for this -
> about making the JDBC data source extensible for third party support
> possibly via slick.
>
>
> On Mon, Apr 6, 2015 at 10:41 PM bipin wrote:
>
>> Hi, I am trying to pull data from ms-sql server. I have tried using the
Hi, I am trying to pull data from an MS SQL server. I have tried using the
spark.sql.jdbc data source:
CREATE TEMPORARY TABLE c
USING org.apache.spark.sql.jdbc
OPTIONS (
url "jdbc:sqlserver://10.1.0.12:1433\;databaseName=dbname\;",
dbtable "Customer"
);
But it shows java.sql.SQLException: No suitable driver fou
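For reference, a sketch of the programmatic equivalent, assuming the Microsoft JDBC driver jar is already on the classpath (e.g. passed with --jars or --driver-class-path when starting the shell) and that its driver class is com.microsoft.sqlserver.jdbc.SQLServerDriver; naming the driver explicitly is one common way around "No suitable driver found":

// 1.3-era load() with the jdbc data source; url and dbtable follow the snippet above,
// the driver class name is an assumption about the Microsoft JDBC driver in use.
val customers = sqlContext.load("jdbc", Map(
  "url"     -> "jdbc:sqlserver://10.1.0.12:1433;databaseName=dbname",
  "dbtable" -> "Customer",
  "driver"  -> "com.microsoft.sqlserver.jdbc.SQLServerDriver"
))
customers.printSchema()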