Hi everyone,
I'm trying to read a text file encoded as UTF-16LE, but I'm getting weird
characters like this:
�� W h e n
My code is this one:
sparkSession
.read
.format("text")
.option("charset", "UTF-16LE")
.load("textfile.txt")
I'm using Spark 2.3.1. Any idea how to fix it?
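In case the text source ignores the charset option here, a workaround sketch (untested) that reads the files as binary and decodes them explicitly; the path and the line splitting are assumptions:
import java.nio.charset.StandardCharsets
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Read each file as raw bytes, decode the whole file as UTF-16LE, then split into lines.
val lines = spark.sparkContext
  .binaryFiles("textfile.txt")
  .values
  .map(stream => new String(stream.toArray, StandardCharsets.UTF_16LE))
  .flatMap(_.split("\r?\n"))

val df = lines.toDF("value")
df.show(5, truncate = false)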
Hi,
I have the following issue,
case class Item (c1: String, c2: String, c3: Option[BigDecimal])
import sparkSession.implicits._
val result = df.as[Item].groupByKey(_.c1).mapGroups((key, value) => value)
But I get the following error at compile time:
Unable to find encoder for type stored in a Dataset
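If I understand the error, the lambda passed to mapGroups returns the Iterator itself, and Spark has no encoder for Iterator. A hedged sketch of two ways around it (the intent behind each is an assumption):
import sparkSession.implicits._

// Option A: keep every Item per key; flatMapGroups only needs an encoder for Item.
val flattened = df.as[Item].groupByKey(_.c1).flatMapGroups((key, items) => items)

// Option B: return something with a known encoder, e.g. a (key, count) tuple.
val counts = df.as[Item].groupByKey(_.c1).mapGroups((key, items) => (key, items.size))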
I'm trying to build an application where it is necessary to do bulkGets and
bulkLoads on HBase.
I think that I could use this component:
https://github.com/hortonworks-spark/shc
Is it a good option?
But I can't import it in my project: sbt cannot resolve the HBase
connector.
This is my build.sbt:
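For reference, a sketch of the kind of build.sbt I'd expect to need; as far as I know the shc artifacts are published to the Hortonworks repository rather than Maven Central, and the exact version string and repository URL below are assumptions to check against the project's README:
name := "hbase-bulk-app"   // hypothetical project name
scalaVersion := "2.11.12"

resolvers += "Hortonworks Repository" at "http://repo.hortonworks.com/content/groups/public/"

// Group and artifact come from the shc project; the version suffix is an assumption.
libraryDependencies += "com.hortonworks" % "shc-core" % "1.1.1-2.1-s_2.11"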
Hi.
I'm testing "spark-testing-base". For example:
class MyFirstTest extends FunSuite with SharedSparkContext{
def tokenize(f: RDD[String]) = {
f.map(_.split(" ").toList)
}
test("really simple transformation"){
val input = List("hi", "hi miguel", "bye")
val expected = List(List(
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IntelliJ
>>
>> Thanks
>> Best Regards
>>
>> On Sun, Nov 29, 2015 at 9:48 PM, Masf wrote:
>>
>>> Hi
>>>
>>> Is it possible to debug spark locally with IntelliJ or another IDE?
>>>
>>> Thanks
>>>
>>> --
>>> Regards.
>>> Miguel Ángel
>>>
>>
>>
>
--
Regards.
Miguel Ángel
Hi Ardo,
Is there a tutorial for debugging with IntelliJ?
Thanks
Regards.
Miguel.
On Sun, Nov 29, 2015 at 5:32 PM, Ndjido Ardo BAR wrote:
> hi,
>
> IntelliJ is just great for that!
>
> cheers,
> Ardo.
>
> On Sun, Nov 29, 2015 at 5:18 PM, Masf wrote:
>
>> Hi
>>
&
Hi
Is it possible to debug spark locally with IntelliJ or another IDE?
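To make the question concrete, what I have in mind is running something like this from the IDE with breakpoints set (the object and app name are made up); with a "local[*]" master everything runs in one JVM, so breakpoints inside the lambdas should be reachable:
import org.apache.spark.{SparkConf, SparkContext}

object DebugLocally {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("debug-locally").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val total = sc.parallelize(1 to 10).map(_ * 2).sum()
    println(total)   // a breakpoint on the map lambda above should be hit in local mode
    sc.stop()
  }
}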
Thanks
--
Regards.
Miguel Ángel
filter function
> <http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext>
> as the second argument.
>
> Thanks
> Best Regards
>
> On Wed, Aug 19, 2015 at 10:46 PM, Masf wrote:
>
>> Hi.
>>
>> I'd like
Hi.
I have a dataframe and I want to insert its data into a partitioned Parquet
table in Hive.
In Spark 1.4 I can use
df.write.partitionBy("x","y").format("parquet").mode("append").saveAsTable("tbl_parquet")
but in Spark 1.3 I can't. How can I do it?
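A hedged sketch of the route I'd try in 1.3: register the dataframe as a temporary table and do the partitioned insert through HiveContext SQL. The table name and the partition columns x, y come from the example above; everything else is an assumption:
import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)
hc.setConf("hive.exec.dynamic.partition", "true")
hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

df.registerTempTable("df_tmp")
// Assumes tbl_parquet already exists, stored as Parquet and partitioned by (x, y),
// and that x and y are the last columns in the SELECT.
hc.sql("INSERT INTO TABLE tbl_parquet PARTITION (x, y) SELECT * FROM df_tmp")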
Thanks
--
Regards
Miguel
Hi.
I'd like to read Avro files using this library
https://github.com/databricks/spark-avro
I need to load several files from a folder, but not all of them. Is there some
functionality to filter the files to load?
And... is it possible to know the names of the files loaded from a folder?
My problem is
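For the first question, a sketch of what I have in mind, assuming the format name from the spark-avro README and made-up paths; for the file names, I believe later Spark releases added an input_file_name() function, though I haven't checked from which version:
// Glob patterns in the path are handled by the underlying Hadoop input format.
val some = sqlContext.read
  .format("com.databricks.spark.avro")
  .load("/data/avro/events-2015-08-1*.avro")

// Or load an explicit list of files and union them.
val paths = Seq("/data/avro/a.avro", "/data/avro/b.avro")
val picked = paths
  .map(p => sqlContext.read.format("com.databricks.spark.avro").load(p))
  .reduce(_ unionAll _)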
Hi.
I have 2 dataframes with 1 and 12 partitions respectively. When I do an inner
join between these dataframes, the result contains 200 partitions. Why?
df1.join(df2, df1("id") === df2("id"), "Inner") => returns 200 partitions
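For reference, the 200 matches the default of spark.sql.shuffle.partitions, which controls how many partitions the join's shuffle produces. A hedged sketch of tuning it (the value 12 is only an example):
sqlContext.setConf("spark.sql.shuffle.partitions", "12")
val joined = df1.join(df2, df1("id") === df2("id"), "inner")
println(joined.rdd.partitions.length)   // should now reflect the configured value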
Thanks!!!
--
Regards.
Miguel Ángel
Hi.
I think that it's possible to do:
*df.select($"*", lit(null).as("col17", lit(null).as("col18",
lit(null).as("col19",, lit(null).as("col26")*
Any other advice?
Miguel.
On Wed, May 27, 2015 at 5:02 PM, Masf wrote:
> Hi.
>
Hi.
I have a DataFrame with 16 columns (df1) and another with 26 columns(df2).
I want to do a UnionAll. So, I want to add 10 columns to df1 in order to
have the same number of columns in both dataframes.
Is there some alternative to "withColumn"?
Thanks
--
Regards.
Miguel Ángel
3) minimum,
> sum(case when endrscp>100 then 1 else 0 end test from j'
>
> Let me know if this works.
> On 26 May 2015 23:47, "Masf" wrote:
>
>> Hi
>> I don't know how it works. For example:
>>
>> val result = joinedData.groupBy("co
o it?
Thanks
Regards.
Miguel.
On Tue, May 26, 2015 at 12:35 AM, ayan guha wrote:
> Case when col2>100 then 1 else col2 end
> On 26 May 2015 00:25, "Masf" wrote:
>
>> Hi.
>>
>> In a dataframe, How can I execution a conditional sentence in a
>>
Hi.
In a dataframe, how can I use a conditional expression inside an
aggregation? For example, can I translate this SQL statement to a DataFrame?:
SELECT name, SUM(CASE WHEN table.col2 > 100 THEN 1 ELSE table.col1 END)
FROM table
GROUP BY name
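A hedged sketch of one possible DataFrame translation, assuming a Spark version where functions.when and otherwise are available (df stands for the table above):
import org.apache.spark.sql.functions.{col, sum, when}

val result = df.groupBy("name")
  .agg(sum(when(col("col2") > 100, 1).otherwise(col("col1"))).as("conditional_sum"))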
Thanks
--
Regards.
Miguel
Hi.
I have a Spark application where I store the results into a table (with
HiveContext). Some of these columns allow nulls. In Scala, these columns are
represented as Option[Int] or Option[Double], depending on the data type.
For example:
val hc = new HiveContext(sc)
var col1: Option[Integer
Hi Eric.
Q1:
When I read parquet files, I've observed that Spark generates as many
partitions as there are parquet files in the path.
Q2:
To reduce the number of partitions you can use rdd.repartition(x), where x is the
number of partitions. Depending on your case, repartition can be a heavy operation.
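A minimal sketch of both options; as far as I know, coalesce avoids a full shuffle when reducing the partition count, while repartition always shuffles (rdd stands for the data read from the parquet path, and 8 is just an example):
val reducedWithoutShuffle = rdd.coalesce(8)    // merges existing partitions, no full shuffle
val reducedWithShuffle = rdd.repartition(8)    // full shuffle, but evens out partition sizes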
Regards.
Miguel
Hi guys
Regarding parquet files: I have Spark 1.2.0, and reading 27 parquet files
(250MB/file) takes 4 minutes.
I have a cluster with 4 nodes and that seems too slow to me.
The "load" function is not available in Spark 1.2, so I can't test it.
Regards.
Miguel.
On Mon, Apr 13, 2015 at 8:12 PM,
00)?
>
> --- Original Message ---
>
> From: "Masf"
> Sent: April 9, 2015 11:45 PM
> To: user@spark.apache.org
> Subject: Increase partitions reading Parquet File
>
> Hi
>
> I have this statement:
>
> val file =
> SqlContext.parquetfile("hd
Hi.
I'm using Spark SQL 1.2. I have this query:
CREATE TABLE test_MA STORED AS PARQUET AS
SELECT
field1
,field2
,field3
,field4
,field5
,COUNT(1) AS field6
,MAX(field7)
,MIN(field8)
,SUM(field9 / 100)
,COUNT(field10)
,SUM(IF(field11 < -500, 1, 0))
,MAX(field12)
,SUM(IF(field13 = 1, 1, 0))
,SUM(I
2015 at 7:53 AM, Masf wrote:
>
>>
>> Hi.
>>
>> In Spark SQL 1.2.0, with HiveContext, I'm executing the following
>> statement:
>>
>> CREATE TABLE testTable STORED AS PARQUET AS
>> SELECT
>> field1
>> FROM table1
>>
>>
Hi.
In Spark SQL 1.2.0, with HiveContext, I'm executing the following statement:
CREATE TABLE testTable STORED AS PARQUET AS
SELECT
field1
FROM table1
field1 is SMALLINT. If table1 is in text format everything is OK, but if table1
is in Parquet format, Spark returns the following error:
15/04/
Hi Ted.
Spark 1.2.0 and Hive 0.13.1.
Regards.
Miguel Angel.
On Tue, Mar 31, 2015 at 10:37 AM, Ted Yu wrote:
> Which Spark and Hive release are you using ?
>
> Thanks
>
>
>
> > On Mar 27, 2015, at 2:45 AM, Masf wrote:
> >
> > Hi.
> >
> > In HiveC
y/limits.conf set the next values:
>
> Have you done the above modification on all the machines in your Spark
> cluster ?
>
> If you use Ubuntu, be sure that the /etc/pam.d/common-session file
> contains the following line:
>
> session required pam_limits.so
>
>
> On M
the machines to get the ulimit effect (or
> relogin). What operation are you doing? Are you doing too many
> repartitions?
>
> Thanks
> Best Regards
>
> On Mon, Mar 30, 2015 at 4:52 PM, Masf wrote:
>
>> Hi
>>
>> I have a problem with temp data in Spark.
Hi
I have a problem with temp data in Spark. I have set
spark.shuffle.manager to "SORT". In /etc/security/limits.conf I set the following
values:
* soft nofile 100
* hard nofile 100
In spark-env.sh I set ulimit -n 100
I've restarted the spark service and it
Hi.
In HiveContext, when I execute the statement "DROP TABLE IF EXISTS TestTable"
and TestTable doesn't exist, Spark returns an error:
ERROR Hive: NoSuchObjectException(message:default.TestTable table not found)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_
ow function support in 1.4.0. But it's not a promise
> yet.
>
> Cheng
>
> On 3/26/15 7:27 PM, Arush Kharbanda wrote:
>
> Its not yet implemented.
>
> https://issues.apache.org/jira/browse/SPARK-1442
>
> On Thu, Mar 26, 2015 at 4:39 PM, Masf wrote:
>
>
Hi.
Are the Windowing and Analytics functions supported in Spark SQL (with
HiveContext or not)? For example in Hive is supported
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
Is there a tutorial or documentation where I can see all the features supported by
Spark SQL?
Hi
Spark 1.2.1 uses Scala 2.10. Because of this, your program fails with Scala
2.11.
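If the project is built with sbt, a minimal sketch of the change (the Scala patch version and the "provided" scope are assumptions):
name := "SparkEpiFast"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"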
Regards
On Thu, Mar 19, 2015 at 8:17 PM, Vijayasarathy Kannan wrote:
> My current simple.sbt is
>
> name := "SparkEpiFast"
>
> version := "1.0"
>
> scalaVersion := "2.11.4"
>
> libraryDependencies += "org.apach
it).
> Can you try HiveContext for now?
>
> On Fri, Mar 13, 2015 at 4:48 AM, Masf wrote:
>
>> Hi.
>>
>> I have a query in Spark SQL and I can not covert a value to BIGINT:
>> CAST(column AS BIGINT) or
>> CAST(0 AS BIGINT)
>>
>> The output is:
Hi.
I'm running Spark 1.2.0. I have HiveContext and I execute the following
query:
select sum(field1 / 100) from table1 group by field2;
field1 in the Hive metastore is a smallint. The schema detected by HiveContext
is an int32:
fileSchema: message schema {
optional int32 field1;
...
t; means here.
>
> On Mon, Mar 16, 2015 at 11:11 AM, Masf wrote:
> > Hi all.
> >
> > When I specify the number of partitions and save this RDD in parquet
> format,
> > my app fail. For example
> >
> > selectTest.coalesce(28).saveAsParquetFile("hdfs
Hi all.
When I specify the number of partitions and save this RDD in parquet
format, my app fails. For example:
selectTest.coalesce(28).saveAsParquetFile("hdfs://vm-clusterOutput")
However, it works well if I store data in text
selectTest.coalesce(28).saveAsTextFile("hdfs://vm-clusterOutput")
M
Hi.
I have a query in Spark SQL and I cannot convert a value to BIGINT:
CAST(column AS BIGINT) or
CAST(0 AS BIGINT)
The output is:
Exception in thread "main" java.lang.RuntimeException: [34.62] failure:
``DECIMAL'' expected but identifier BIGINT found
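A hedged sketch of the HiveContext route suggested in the quoted reply above, since the HiveQL parser accepts BIGINT (the table and column names are placeholders):
import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)
val casted = hc.sql("SELECT CAST(col1 AS BIGINT) AS col1_bigint FROM table1")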
Thanks!!
Regards.
Miguel Ángel
read recursively.
>
>
> You could give it a try
> https://s3.amazonaws.com/sigmoidanalytics-builds/spark-1.2.0-bin-spark-1.2.0-hadoop2.4.0.tgz
>
> Thanks
> Best Regards
>
> On Wed, Mar 11, 2015 at 9:45 PM, Masf wrote:
>
>> Hi all
>>
>> Is it possible to read recursively folders to read parquet files?
>>
>>
>> Thanks.
>>
>> --
>>
>>
>> Regards.
>> Miguel Ángel
>>
>
>
--
Regards.
Miguel Ángel
Hi all
Is it possible to read folders recursively in order to load parquet files?
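To make it concrete, what I'd like is something along these lines; I believe Hadoop glob patterns in the path can reach nested folders, but the nesting depth has to be known, and I haven't verified this on 1.2 (paths are made up):
// One level of subfolders, picked up via a glob in the path.
val df = sqlContext.parquetFile("hdfs:///data/events/*/*.parquet")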
Thanks.
--
Regards.
Miguel Ángel