Hi All,
I have a directory which has 12 files. I want to read each file whole, so I am
reading them with wholeTextFiles(dirpath, numPartitions).
I run spark-submit with --num-executors 12 --executor-cores 1, and pass
numPartitions = 12.
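For reference, the read looks roughly like this (simplified; the path is a
placeholder):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class WholeDirRead {
  public static void main(String[] args) {
    JavaSparkContext jsc =
        new JavaSparkContext(new SparkConf().setAppName("whole-dir-read"));
    // (fileName, fileContent) pairs; the 12 is only a minimum-partitions hint
    JavaPairRDD<String, String> files = jsc.wholeTextFiles("/path/to/dir", 12);
    System.out.println("partitions = " + files.partitions().size());
    jsc.stop();
  }
}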
However, when I run the job I see that the stage which reads the directory runs
as a single task.
Hi All,
Where should we use SparkContext stop() vs close()? Should we stop the context
first and then close it?
Are there general guidelines around this? When I stop and later try to close, I
get an "RPC already closed" error.
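My understanding (possibly wrong) is that JavaSparkContext.close() just
delegates to stop(), so with try-with-resources an explicit stop() should not
be needed at all. A rough sketch of what I mean:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class StopVsClose {
  public static void main(String[] args) {
    // try-with-resources calls close() for us, which (as far as I can tell)
    // simply calls stop(); stopping first and then closing later seems to be
    // what triggers the "RPC already closed" error
    try (JavaSparkContext jsc =
             new JavaSparkContext(new SparkConf().setAppName("stop-vs-close"))) {
      System.out.println(jsc.parallelize(Arrays.asList(1, 2, 3)).count());
    }
  }
}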
Thanks,
Pradeep
HBase Spark module will be available with HBase 2.0. Is that out yet?
> On Jul 22, 2016, at 8:50 PM, Def_Os wrote:
>
> So it appears it should be possible to use HBase's new hbase-spark module, if
> you follow this pattern:
> https://hbase.apache.org/book.html#_sparksql_dataframes
>
> Unfortuna
Hi All,
Can someone please confirm whether the direct approach for reading from Kafka
in Spark Streaming is still experimental, or whether it can be used in
production.
I see the documentation and a talk from TD suggesting the advantages of the
approach, but the docs state it is an "experimental" feature.
Please suggest.
Thanks,
Hi All,
Can you please advise on best practices for running streaming jobs in
production that read from Kafka.
How do we trigger them (through a start script?), and what are the best ways to
monitor that the application is running and to alert when it is down?
Thanks,
Pradeep
Hi All,
I am connecting Spark 1.6 streaming to Kafka 0.8.2 with Kerberos. I ran spark
streaming in debug mode, but do not see any log saying it connected to Kafka or
the topic. How can I enable that?
My Spark Streaming job runs, but no messages are fetched from Kafka. Please
suggest.
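Is raising the logger levels programmatically the right way to see this,
something along these lines? (Rough sketch, assuming the log4j 1.x that ships
with Spark 1.6; the logger names are my guess.)

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class KafkaLogLevels {
  public static void main(String[] args) {
    // Guessing at the relevant loggers; set these before creating the
    // streaming context so the receiver startup is covered too.
    Logger.getLogger("org.apache.spark.streaming.kafka").setLevel(Level.DEBUG);
    Logger.getLogger("kafka").setLevel(Level.DEBUG);
    Logger.getLogger("org.apache.zookeeper").setLevel(Level.DEBUG);
  }
}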
Thanks,
Yes.
Sent from my iPhone
> On May 20, 2016, at 10:11 AM, Sahil Sareen wrote:
>
> I'm not sure if this happens on small files or big ones as I have a mix of
> them always.
> Did you see this only for big files?
>
>> On Fri, May 20, 2016 at 7:36 PM, Mail.com wrote:
Hi Sahil,
I have seen this with high GC time. Do you ever get this error with
small-volume files?
Pradeep
> On May 20, 2016, at 9:32 AM, Sahil Sareen wrote:
>
> Hey all
>
> I'm using Spark-1.6.1 and occasionally seeing executors lost and hurting my
> application performance due to these errors.
> ... topic name, KafkaUtils doesn't
> fetch any messages. So, check you have specified the topic name correctly.
>
> ~Muthu
Hi Yogesh,
Can you try a map operation to extract what you need, with whatever parser you
are using? You could also look at the spark-xml package.
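Something along these lines, perhaps (untested sketch; it assumes each record
in the stream is a complete XML document, which textFileStream does not
guarantee, and "title" is just a placeholder tag):

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.w3c.dom.Document;

public class XmlStreamSketch {
  public static void main(String[] args) throws Exception {
    JavaStreamingContext jssc = new JavaStreamingContext(
        new SparkConf().setAppName("xml-stream"), Durations.seconds(30));

    JavaDStream<String> xml = jssc.textFileStream("/path/to/xml/dir");

    JavaDStream<String> titles = xml.map(new Function<String, String>() {
      @Override
      public String call(String record) throws Exception {
        // parse one record and pull out a single element as an example
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(record.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("title").item(0).getTextContent();
      }
    });

    titles.print();
    jssc.start();
    jssc.awaitTermination();
  }
}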
Thanks,
Pradeep
> On May 19, 2016, at 4:39 AM, Yogesh Vyas wrote:
>
> Hi,
> I had xml files which I am reading through textFileStream, and then
> filtering out
Adding back users.
> On May 18, 2016, at 11:49 AM, Mail.com wrote:
>
> Hi Uladzimir,
>
> I run it as below.
>
> Spark-submit --class com.test --num-executors 4 --executor-cores 5 --queue
> Dev --master yarn-client --driver-memory 512M --executor-memory 512M test.jar
Hi Muthu,
Are you on Spark 1.4.1 and Kafka 0.8.2? I have a similar issue even for simple
string messages.
The console producer and consumer work fine, but Spark always returns empty
RDDs. I am using the receiver-based approach.
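For reference, my receiver setup is roughly the following (hosts, group and
topic renamed):

import java.util.Collections;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class ReceiverSketch {
  public static void main(String[] args) throws Exception {
    JavaStreamingContext jssc = new JavaStreamingContext(
        new SparkConf().setAppName("kafka-receiver"), Durations.seconds(10));

    // one receiver thread for the topic; the names below are placeholders
    Map<String, Integer> topics = Collections.singletonMap("my-topic", 1);
    JavaPairReceiverInputDStream<String, String> messages =
        KafkaUtils.createStream(jssc, "zk-host:2181", "my-group", topics);

    messages.count().print();   // always prints 0 for me
    jssc.start();
    jssc.awaitTermination();
  }
}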
Thanks,
Pradeep
> On May 16, 2016, at 8:19 PM, Ramaswamy, Muthuraman wrote:
Hi,
I have seen multiple videos on Spark tuning which show how to determine the
number of cores, number of executors, and memory size for a job.
In all that I have seen, it seems each job has to be given the maximum
resources allowed in the cluster.
How do we factor in input size as well? I am processing a 1 GB compressed
Hi All,
I am trying to get Spark 1.4.1 (Java) to work with Kafka 0.8.2 in a
Kerberos-enabled cluster (HDP 2.3.2).
Is there any document I can refer to?
Thanks,
Pradeep
Hi Arun,
Could you try using StAX or JAXB?
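A rough StAX sketch of what I mean (the element names are placeholders for
whatever your records contain):

import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxSketch {
  public static void main(String[] args) throws Exception {
    String xml = "<record><id>1</id><name>foo</name></record>";
    XMLStreamReader reader =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    while (reader.hasNext()) {
      // walk the events and grab the text of the element we care about
      if (reader.next() == XMLStreamConstants.START_ELEMENT
          && reader.getLocalName().equals("name")) {
        System.out.println(reader.getElementText());   // prints "foo"
      }
    }
    reader.close();
  }
}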
Thanks,
Pradeep
> On May 12, 2016, at 8:35 PM, Hyukjin Kwon wrote:
>
> Hi Arunkumar,
>
>
> I guess your records are self-closing ones.
>
> There is an issue open here, https://github.com/databricks/spark-xml/issues/92
>
> This is about XmlInputFormat
> ... use coalesce(1) (the spelling is wrong) and after that
> do the partitions.
>
> Regards,
> Gourav
Hi,
I have to write a tab-delimited file and need one directory for each unique
value of a column.
I tried using spark-csv with partitionBy, and it seems that is not supported.
Is there any other option for doing this?
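The only workaround I can think of is to collect the distinct values and write
each filtered subset separately, roughly like the sketch below (column name and
paths are placeholders), but that means one pass per value:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

public class SplitByColumn {
  // df is whatever DataFrame I already have; "country" stands in for the column
  static void writeByValue(DataFrame df, String outDir) {
    for (Row r : df.select("country").distinct().collect()) {
      String value = r.getString(0);
      df.filter(df.col("country").equalTo(value))
        .write()
        .format("com.databricks.spark.csv")
        .option("delimiter", "\t")
        .save(outDir + "/country=" + value);
    }
  }
}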
Regards,
Pradeep
Can you try creating your own schema and using it to read the XML?
I had a similar issue and resolved it with a custom schema, specifying each
attribute in it.
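Roughly what I did, sketched below; the field names are placeholders and
rowTag has to match your file (in spark-xml, attributes get an "_" prefix by
default):

import java.util.Arrays;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class XmlWithSchema {
  static DataFrame read(SQLContext sqlContext, String path) {
    // spell out every field instead of relying on schema inference
    StructType schema = DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("id", DataTypes.LongType, true),
        DataTypes.createStructField("name", DataTypes.StringType, true),
        DataTypes.createStructField("_lang", DataTypes.StringType, true)));

    return sqlContext.read()
        .format("com.databricks.spark.xml")
        .option("rowTag", "record")
        .schema(schema)
        .load(path);
  }
}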
Pradeep
> On May 1, 2016, at 9:45 AM, Hyukjin Kwon wrote:
>
> To be more clear,
>
> If you set the rowTag as "
Hi All,
I am reading an entire directory of gz XML files with wholeTextFiles.
I understand that, being gz, the individual files are not splittable, and
wholeTextFiles reads each file whole, but why is the entire directory read by
one executor as a single task? I have provided the number of executors as the
number of files in that directory.
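As a workaround I am considering repartitioning right after the read so that at
least the downstream parsing spreads out; a rough sketch:

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class GzXmlRead {
  static JavaPairRDD<String, String> read(JavaSparkContext jsc, String dir) {
    // each gz file still has to be read whole by one task, but repartitioning
    // spreads the per-file parsing work across executors afterwards
    return jsc.wholeTextFiles(dir, 12).repartition(12);
  }
}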
Hi,
I have a DataFrame and need to write it to a tab-separated file using Spark 1.4
and Java.
Can someone please suggest?
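Would something along these lines be the right approach (assuming the spark-csv
package is an option)?

import org.apache.spark.sql.DataFrame;

public class TsvWrite {
  static void save(DataFrame df, String path) {
    // spark-csv lets the delimiter be overridden, so tab should work
    df.write()
      .format("com.databricks.spark.csv")
      .option("delimiter", "\t")
      .save(path);
  }
}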
Thanks,
Pradeep
I get an error with a message that states the max number of cores allowed.
> On Apr 20, 2016, at 11:21 AM, Shushant Arora
> wrote:
>
> I am running a spark application on yarn cluster.
>
> say I have available vcores in the cluster as 100. And I start the spark
> application with --num-executors 20
You might look at using JAXB or StAX. If it is simple enough, use DataFrames
with an auto-generated schema.
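For the JAXB route, a minimal sketch (the Page class and its fields are just an
example; map them to your own XML):

import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlRootElement;

public class JaxbSketch {
  @XmlRootElement(name = "page")
  public static class Page {
    public String title;
    public long id;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<page><title>Hello</title><id>42</id></page>";
    // JAXB binds the public fields to the matching element names
    Page p = (Page) JAXBContext.newInstance(Page.class)
        .createUnmarshaller()
        .unmarshal(new StringReader(xml));
    System.out.println(p.title + " / " + p.id);   // Hello / 42
  }
}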
Pradeep
> On Apr 18, 2016, at 6:37 PM, Jinan Alhajjaj wrote:
>
> Thank you for your help.
> I would like to parse the XML file using Java, not Scala. Can you please
> provide me with an example