I think this same question was asked just a week ago? Same company and setup.
https://mail-archives.apache.org/mod_mbox/spark-user/202104.mbox/%3CLNXP123MB2604758548BE38E8D3F369EC8A7B9%40LNXP123MB2604.GBRP123.PROD.OUTLOOK.COM%3E
On Wed, Apr 7, 2021 at 11:17 AM SRITHALAM, ANUPAMA (Risk Value Stream) <[email protected]> wrote:

> Classification: Limited
>
> Hi Team,
>
> We are trying to use Gradient Boosting Classification algorithm and in
> Python we tried using Sklearn library and in Pyspark we are using ML
> library.
>
> We have around 45k dataset which is used for training and that dataset is
> taking around 3 to 4 hours in python but in Pyspark it is taking more than
> 18 hours for the same hyper parameters used between Python and Pyspark.
>
> We tried Pyspark by repartitioning the dataframe and can see a little
> improvement in performance but still we are not able to get timings near to
> Python.
>
> We have live run which need to evaluation predictions for 40million plus
> data and data resides in Hadoop. So it is difficult to get that huge amount
> to data to different system and convert to Pandas dataframe and run against
> Python.
>
> So we are trying to train the same model against Pyspark so, that I can do
> the evaluation against trained model in Pyspark but, here the concern that
> we have is the time taken for training is very high and we want to check
> what will be the general approach followed in these kind of scenarios.
>
> Thanks,
> Anupama.
>
> Lloyds Banking Group plc.
