The input is a JSON request, which would be decoded in myJob() and processed
further.
I am not sure what is wrong with the code below; it emits errors about
unimplemented methods (runJob/validate).
Any pointers on this would be helpful.
jobserver-0.8.0
object MyJobServer extends SparkSessionJob {
type JobData
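For comparison, a minimal sketch of the shape the 0.8.x SparkSessionJob API expects, based on the spark-jobserver example jobs (the JobData/JobOutput choices and the "input.json" config key are illustrative assumptions, so check them against your build):

import com.typesafe.config.Config
import org.apache.spark.sql.SparkSession
import org.scalactic._
import spark.jobserver.SparkSessionJob
import spark.jobserver.api.{JobEnvironment, SingleProblem, ValidationProblem}

object MyJobServer extends SparkSessionJob {
  // Both abstract types must be bound, otherwise the trait stays unimplemented
  type JobData = String        // e.g. the raw JSON request (illustrative)
  type JobOutput = Long

  // validate() extracts JobData from the request config, or reports a problem
  def validate(spark: SparkSession, runtime: JobEnvironment,
               config: Config): JobData Or Every[ValidationProblem] =
    if (config.hasPath("input.json")) Good(config.getString("input.json"))
    else Bad(One(SingleProblem("no input.json supplied")))

  // runJob() receives the validated JobData and does the actual work
  def runJob(spark: SparkSession, runtime: JobEnvironment,
             data: JobData): JobOutput = {
    // decode the JSON here and process it further
    data.length.toLong
  }
}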
Is there a way to run Spark-JobServer in Eclipse? Any pointers in this
regard?
Folks,
I have a time series table with each record being 350 columns.
The primary key is ((date, bucket), objectid, timestamp).
The objective is to read one day's worth of data, which comes to around 12k
partitions; each partition has around 25MB of data.
I see only 1 task active during the read operation…
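In case an illustration helps: a hedged sketch of two knobs that usually drive read parallelism with the DataStax spark-cassandra-connector (the keyspace, table, and column names are placeholders, and the split-size option name should be checked against your connector version):

import org.apache.spark.sql.SparkSession

// Smaller input splits => more read tasks for the same day of data
val spark = SparkSession.builder()
  .config("spark.cassandra.input.split.size_in_mb", "16")
  .getOrCreate()

val ts = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_keyspace", "table" -> "timeseries_tbl"))  // placeholders
  .load()

// If the scan still surfaces as a single task, redistribute explicitly after the load
val oneDay = ts.filter(ts("date") === "2018-03-14").repartition(64)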
Folks,
Can you share your experience of running Spark under Docker on a single
local / standalone node?
Is anybody using it in production environments? We have an existing
Docker Swarm deployment, and I want to run Spark in a separate fat VM
hooked into / controlled by Docker Swarm.
I know there is…
Folks,
Does anybody have production experience running dockerized Spark
applications on DC/OS, and can the Spark cluster run in a mode other than
Spark standalone?
What are the major differences between running Spark with the Mesos cluster
manager vs. running Spark as a dockerized container under DC/OS?
Correction.
On Tue, Jun 20, 2017 at 5:27 PM, sujeet jog wrote:
> Below is the query; from the physical plan it looks like the query is the
> same as the one issued from cqlsh:
>
> val query = s"""(select * from model_data
>   where TimeStamp > \'$timeStamp+000…
[…physical plan excerpt: output includes GlobalThresholdMax#9049, GlobalThresholdMin#9050,
Hi85#9051, Hi99#9052, Low85#9053, Low99#9054;
PushedFilters: [IsNotNull(TimeStamp), IsNotNull(MetricID), EqualTo(MetricID,1)];
ReadSchema: struct…]
…wrote:
> Hi,
>
> Personally I would inspect how dates are managed. How does your Spark code
> …
Hello,
I have a table as below:

CREATE TABLE analytics_db.ml_forecast_tbl (
  "MetricID" int,
  "TimeStamp" timestamp,
  "ResourceID" timeuuid,
  "Value" double,
  PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
)

select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
'20…
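A hedged sketch of reading the same table through the spark-cassandra-connector while keeping the quoted, case-sensitive column names intact (the timestamp literal below is a placeholder):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().getOrCreate()

val forecast = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "analytics_db", "table" -> "ml_forecast_tbl"))
  .load()

// The columns were created quoted, so reference them with their exact case
val slice = forecast
  .filter(col("MetricID") === 1 && col("TimeStamp") > "2017-06-20 00:00:00")

slice.explain()   // the MetricID/TimeStamp predicates should show up under PushedFilters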
I generally use the Play Framework APIs for complex JSON structures:
https://www.playframework.com/documentation/2.5.x/ScalaJson#Json
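For illustration, a small hedged sketch of that style with play-json 2.5.x (the Sensor case class and its fields are made up for the example):

import play.api.libs.json._

case class Sensor(id: Int, value: Double)

object Sensor {
  // Macro-generated Reads/Writes for the case class
  implicit val format: Format[Sensor] = Json.format[Sensor]
}

val result: JsResult[Sensor] = Json.parse("""{"id": 1, "value": 16.95}""").validate[Sensor]
result match {
  case JsSuccess(sensor, _) => println(s"parsed: $sensor")
  case error: JsError       => println(s"bad json: $error")
}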
On Wed, Oct 12, 2016 at 11:34 AM, Kappaganthu, Sivaram (ES) <
sivaram.kappagan...@adp.com> wrote:
> Hi,
>
> Does this mean that handling any Json with kind of bel…
Hi,
I have an RDD of n rows; I want to transform this into a JSON RDD, and also
add some more information. Any idea how to accomplish this?
For example, I have an RDD with n rows of data like the below:
16.9527493170273,20.1989561393151,15.7065424947394
17.9527493170273,21.1989561393151,15.70654249…
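One hedged way to sketch this (meant for the spark-shell; the field names and the extra "metric" label are invented for the example): wrap each numeric row plus the added information in a case class, then let Spark render the JSON.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// illustrative shape: three values per row plus one piece of added information
case class Sample(metric: String, v1: Double, v2: Double, v3: Double)

val rows = spark.sparkContext.parallelize(Seq(
  (16.9527493170273, 20.1989561393151, 15.7065424947394),
  (17.9527493170273, 21.1989561393151, 15.70654249)))

val jsonRdd = rows
  .map { case (a, b, c) => Sample("cpu_util", a, b, c) }  // "cpu_util" is a placeholder label
  .toDF()
  .toJSON          // one JSON string per row
  .rdd

jsonRdd.collect().foreach(println)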
Hi,
Is there a way to partition a set of data with n keys into exactly n
partitions?
For example:
a tuple of 1008 rows with key x,
a tuple of 1008 rows with key y, and so on, for a total of 10 keys (x, y, etc.)
Total records = 10080
NumOfKeys = 10
I want to partition the 10080 elements into exactly 10 partitions…
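A hedged sketch of the usual approach: a custom Partitioner that assigns each known key its own partition index (the key values below are placeholders):

import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Exactly one partition per known key
class ExactKeyPartitioner(keys: Seq[String]) extends Partitioner {
  private val index = keys.zipWithIndex.toMap
  override def numPartitions: Int = keys.size
  override def getPartition(key: Any): Int = index(key.asInstanceOf[String])
}

val sc = new SparkContext(new SparkConf().setAppName("exact-partitions").setMaster("local[*]"))
val keys = Seq("x", "y", "z")                             // placeholder for the 10 real keys
val data = sc.parallelize(for (k <- keys; i <- 1 to 1008) yield (k, i))

val exact = data.partitionBy(new ExactKeyPartitioner(keys))
println(exact.partitions.length)                          // == keys.size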
On Sep 9, 2016 at 11:45 AM, Jakob Odersky wrote:
> > Hi Sujeet,
> >
> > going sequentially over all parallel, distributed data seems like a
> > counter-productive thing to do. What are you trying to accomplish?
> >
> > regards,
> > --Jakob
Hi,
Is there a way to iterate over a DataFrame with n partitions sequentially?
Thanks,
Sujeet
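Two hedged ways this is commonly sketched in Spark 2.x (the tiny DataFrame here is only a stand-in):

import org.apache.spark.sql.SparkSession
import scala.collection.JavaConverters._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._
val df = (1 to 100).toDF("n").repartition(4)   // stand-in DataFrame with 4 partitions

// 1) toLocalIterator() streams one partition at a time to the driver,
//    so the partitions are consumed sequentially
df.toLocalIterator().asScala.foreach(row => println(row))

// 2) or visit each partition index explicitly, one after another
(0 until df.rdd.getNumPartitions).foreach { i =>
  val rows = df.rdd.mapPartitionsWithIndex(
    (idx, it) => if (idx == i) it else Iterator.empty,
    preservesPartitioning = true).collect()
  println(s"partition $i has ${rows.length} rows")
}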
There was an inherent bug in my code which caused this.
Hi,
I have a table with the definition below. When I write any records to this
table, the varchar(20) gets changed to text, and it also loses the
primary key index.
Any idea how to write data with Spark SQL without losing the primary key
index and data types?
MariaDB [analytics]> show columns…
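A hedged sketch of the usual workaround: pre-create the MariaDB table yourself (so VARCHAR(20) and the primary key survive) and have Spark append into it instead of recreating it; the JDBC URL, credentials, table name, and the stand-in df are all placeholders. Newer Spark releases (2.2+) also expose a createTableColumnTypes write option, but appending to a pre-created table works on older versions too.

import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
val df = Seq((1, "abc")).toDF("id", "name")            // stand-in for the real data

val props = new Properties()
props.setProperty("user", "analytics")                 // placeholder credentials
props.setProperty("password", "secret")
props.setProperty("driver", "org.mariadb.jdbc.Driver")

// The table already exists in MariaDB with VARCHAR(20) columns and its primary key;
// SaveMode.Append inserts into it rather than dropping and recreating it.
df.write
  .mode(SaveMode.Append)
  .jdbc("jdbc:mariadb://dbhost:3306/analytics", "my_table", props)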
>> On 14 August 2016 at 17:42, Michael Armbrust wrote:
>>
>>> As described here <http://spark.ap…
Hi,
Is there a way to call a stored procedure using Spark?
Thanks,
Sujeet
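As far as I know there is no DataFrame-level API for this; a hedged sketch of the common workaround is plain JDBC with a CallableStatement, run on the driver (or inside foreachPartition if it has to happen per partition). The URL, credentials, and procedure name below are hypothetical.

import java.sql.DriverManager

val conn = DriverManager.getConnection(
  "jdbc:mysql://dbhost:3306/analytics", "user", "password")   // placeholders
try {
  // {call name(?)} is the standard JDBC escape syntax for stored procedures
  val stmt = conn.prepareCall("{call refresh_forecast(?)}")    // hypothetical procedure
  stmt.setInt(1, 42)
  stmt.execute()
  stmt.close()
} finally {
  conn.close()
}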
Hi,
Is it possible to update certain column records in a DB from Spark?
For example, I have 10 rows with 3 columns which are read via Spark SQL;
I want to update specific column entries and write back to the DB, but since
RDDs are immutable I believe this would be difficult. Is there a
workaround?
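A hedged sketch of one workaround: compute the new values into a DataFrame and push UPDATE statements back over plain JDBC from each partition. The table, column names, URL, and the updatedDf DataFrame are all stand-ins.

import java.sql.DriverManager
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
val updatedDf = Seq((1, 42.0), (2, 17.5)).toDF("id", "value")   // stand-in for the new values

updatedDf.foreachPartition { rows =>
  val conn = DriverManager.getConnection(
    "jdbc:mysql://dbhost:3306/analytics", "user", "password")   // placeholders
  val stmt = conn.prepareStatement("UPDATE my_table SET value = ? WHERE id = ?")
  try {
    rows.foreach { row =>
      stmt.setDouble(1, row.getAs[Double]("value"))
      stmt.setInt(2, row.getAs[Int]("id"))
      stmt.executeUpdate()
    }
  } finally {
    conn.close()
  }
}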
Is there a way we can run multiple tasks concurrently on a single core in
local mode?
For example: I have 5 partitions ~ 5 tasks and only a single core; I want
these tasks to run concurrently, and to specify that they use / run on a
single core.
The machine itself has, say, 4 cores, but I want to utilize only one.
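For what it's worth, a hedged sketch: in local mode the N in local[N] is the number of worker threads, not physical cores, so five tasks can run concurrently while the OS time-slices them on one core; actually pinning the JVM to a specific core would have to happen outside Spark (e.g. with taskset on Linux).

import org.apache.spark.sql.SparkSession

// 5 scheduler threads regardless of the physical core count:
// all 5 partitions run "concurrently", interleaved by the OS scheduler.
val spark = SparkSession.builder()
  .master("local[5]")
  .appName("single-core-concurrency")
  .getOrCreate()

val squares = spark.sparkContext.parallelize(1 to 50, numSlices = 5).map(x => x * x)
println(squares.count())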
Thanks Todd.
On Thu, Jul 21, 2016 at 9:18 PM, Todd Nist wrote:
> You can set the dbtable to this:
>
> .option("dbtable", "(select * from master_schema where 'TID' = '100_0')")
>
> HTH,
>
> Todd
I have a table of size 5GB and want to load selective rows into a dataframe
instead of loading the entire table into memory.
Memory is a constraint for me, hence I would like to periodically load a
few sets of rows and perform dataframe operations on them.
For the "dbtable" option, is there a way to perform…
>>>> On Wed, Jun 29, 2016 at 5:53 PM, Xinh Huynh wrote:
>>>> > There is some new SparkR functionality coming in Spark 2.0, such as
>>>> > "dapply". You could use SparkR to load a Parquet file and then run…
Try Spark's RDD pipe: you can invoke the R script from pipe, push the
data you want processed onto the Rscript's stdin, …
On Wed, Jun 29, 2016 at 7:10 PM, Gilad Landau
wrote:
> Hello,
>
>
>
> I want to use R code as part of spark application (the same way I would do
> with Scala/Python). I want to
Check if this helps:

from multiprocessing import Process
import os

def training():
    print("Training Workflow")
    cmd = "spark/bin/spark-submit ./ml.py &"
    os.system(cmd)

w_training = Process(target=training)
w_training.start()
On Wed, Jun 29, 2016 at 6:28 PM, Joaquin Alzola
wrote:
> Hi,
>
Try invoking the R script from Spark using the RDD pipe method: get the
work done in R and receive the model back as an RDD.
For example:
rdd.pipe("...")
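A slightly fuller hedged sketch of the same idea (sc as in the spark-shell; the script path is hypothetical): pipe streams each RDD element as a line to the external process's stdin, and every line the process prints becomes an element of the result RDD.

// Each record becomes one line on the R script's stdin;
// each line the script prints comes back as one element of the result RDD.
val input = sc.parallelize(Seq("12,45", "14,50", "10,35", "11,50"))
val modelOutput = input.pipe("Rscript /path/to/forecast.R")   // hypothetical script
modelOutput.collect().foreach(println)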
On Mon, May 30, 2016 at 3:57 PM, Sun Rui wrote:
> Unfortunately no. Spark does not support loading external models (for
> example, PMML) for now.
> May…
>> …cut answer is NOT to use local mode in prod.
>> Others may have different opinions on this.
> On 28 May 2016 at 18:03, sujeet jog wrote:
>
>> Thanks Ted,
>>
>> Thanks Mich, yes, I see that I can run two applications by submitting
>> these; probably the driv…
>>> …this job. If you start the next JVM then,
>>> assuming it is working, it will be using port 4041 and so forth.
>>>
>>> In actual fact, try the command "free" to see how much free memory you
>>> have.
>>>
>>> HTH
Hi,
I have a question w.r.t. the production deployment mode of Spark.
I have 3 applications which I would like to run independently on a single
machine, and I need to run the drivers on the same machine.
The amount of resources I have is also limited: roughly 4-5GB RAM and 3-4
cores.
For deployment in standalone mode…
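Purely as a hedged illustration of how three drivers could share such a box under a local standalone master, capping each application's share (the master URL and numbers are placeholders; driver memory itself is normally set on spark-submit with --driver-memory, since the driver JVM is already running by the time SparkConf is read):

import org.apache.spark.sql.SparkSession

// One of the three applications: cap it at 1 core and ~1GB of executor memory
val spark = SparkSession.builder()
  .master("spark://localhost:7077")          // placeholder standalone master URL
  .appName("app-1")
  .config("spark.cores.max", "1")            // standalone-mode cap on cores for this app
  .config("spark.executor.memory", "1g")
  .getOrCreate()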
I had a few questions w.r.t. Spark deployment and the way I want to use it;
it would be helpful if you could answer a few.
I plan to use Spark on an embedded switch, which has a limited set of
resources, say 1 or 2 dedicated cores and 1.5GB of memory.
I want to model network traffic with time series a…
It depends on the trade-offs you wish to make.
Python being an interpreted language, the speed of execution will be lower,
but since it is a very commonly used language, people can jump in hands-on
quickly.
Scala programs run on the JVM, so it is obvious you will get good
execution speed.
> …a bit more?
>
> Since the row keys are not sorted in your example, there is a chance that
> you get nondeterministic results when you aggregate on groups of two
> successive rows.
>
> Thanks
>
> On Mon, Mar 28, 2016 at 9:21 AM, sujeet jog wrote:
>
>> Hi,
Hi,
I have an RDD like this:
[ 12, 45 ]
[ 14, 50 ]
[ 10, 35 ]
[ 11, 50 ]
I want to aggregate the values of the first two rows into one row, and
subsequently the next two rows into another single row...
I don't have a key to aggregate on for using some of the aggregate PySpark
functions; how can I achieve this?
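One hedged way to sketch it (shown in Scala for the spark-shell; zipWithIndex and reduceByKey go by the same names in PySpark): attach each row's position as a synthetic key, divide it by 2 so consecutive pairs share a key, then reduce element-wise.

val rows = sc.parallelize(Seq(Seq(12, 45), Seq(14, 50), Seq(10, 35), Seq(11, 50)))

val pairedUp = rows
  .zipWithIndex()                                   // (row, position), preserving RDD order
  .map { case (row, idx) => (idx / 2, row) }        // rows 0,1 -> key 0; rows 2,3 -> key 1; ...
  .reduceByKey((a, b) => a.zip(b).map { case (x, y) => x + y })  // element-wise sum per pair
  .sortByKey()
  .values

pairedUp.collect().foreach(println)                 // List(26, 95) then List(21, 85)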
Hi,
I have been working on a POC on some time series related stuff. I'm using
Python since I need Spark Streaming and SparkR does not yet have a Spark
Streaming front end. A couple of algorithms I want to use are not yet
present in the Spark-TS package, so I'm thinking of invoking an external R
script for…