On Wednesday, December 16, 2015 at 8:57:30 PM UTC-5, Josef Pktd wrote:
> On Wednesday, December 16, 2015 at 9:50:35 AM UTC-5, Robert wrote:
> > On Wednesday, December 16, 2015 at 6:34:21 AM UTC-5, Mark Lawrence wrote:
> > > On 16/12/2015 10:44, Robert wrote:
> > > > Hi,
> > > >
> > > > When I run the following code, there is an error:
> > > >
> > > > ValueError: For numerical factors, num_columns must be an int
> > > >
> > > >
> > > > ================
> > > > import numpy as np
> > > > import pandas as pd
> > > > from patsy import dmatrices
> > > > from sklearn.linear_model import LogisticRegression
> > > >
> > > > X = [0.5,0.75,1.0,1.25,1.5,1.75,1.75,2.0,2.25,2.5,2.75,3.0,3.25,
> > > >      3.5,4.0,4.25,4.5,4.75,5.0,5.5]
> > > > y = [0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1]
> > > >
> > > > zipped = list(zip(X,y))
> > > > df = pd.DataFrame(zipped,columns = ['study_hrs','p_or_f'])
> > > >
> > > > y, X = dmatrices('p_or_f ~ study_hrs', df, return_type="dataframe")
> > > > =======================
> > > >
> > > > I have checked that 'df' is this type:
> > > > =============
> > > > type(df)
> > > > Out[25]: pandas.core.frame.DataFrame
> > > > =============
> > > >
> > > > I cannot figure out where the problem is. Can you help me?
> > > > Thanks.
> > > >
> > > > Error message:
> > > > ..........
> > > >
> > > >
> > > > ---------------------------------------------------------------------------
> > > > ValueError                                Traceback (most recent call last)
> > > > C:\Users\rj\pyprj\stackoverflow_logisticregression0.py in <module>()
> > > >      17 df = pd.DataFrame(zipped,columns = ['study_hrs','p_or_f'])
> > > >      18
> > > > ---> 19 y, X = dmatrices('p_or_f ~ study_hrs', df, return_type="dataframe")
> > > >      20
> > > >      21 y = np.ravel(y)
> > > >
> > > > C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\highlevel.pyc in dmatrices(formula_like, data, eval_env, NA_action, return_type)
> > > >     295     eval_env = EvalEnvironment.capture(eval_env, reference=1)
> > > >     296     (lhs, rhs) = _do_highlevel_design(formula_like, data, eval_env,
> > > > --> 297                                       NA_action, return_type)
> > > >     298     if lhs.shape[1] == 0:
> > > >     299         raise PatsyError("model is missing required outcome variables")
> > > >
> > > > C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\highlevel.pyc in _do_highlevel_design(formula_like, data, eval_env, NA_action, return_type)
> > > >     150         return iter([data])
> > > >     151     design_infos = _try_incr_builders(formula_like, data_iter_maker, eval_env,
> > > > --> 152                                       NA_action)
> > > >     153     if design_infos is not None:
> > > >     154         return build_design_matrices(design_infos, data,
> > > >
> > > > C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\highlevel.pyc in _try_incr_builders(formula_like, data_iter_maker, eval_env, NA_action)
> > > >      55                                       data_iter_maker,
> > > >      56                                       eval_env,
> > > > ---> 57                                       NA_action)
> > > >      58     else:
> > > >      59         return None
> > > >
> > > > C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\build.pyc in design_matrix_builders(termlists, data_iter_maker, eval_env, NA_action)
> > > >     704                                          factor_states[factor],
> > > >     705                                          num_columns=num_column_counts[factor],
> > > > --> 706                                          categories=None)
> > > >     707             else:
> > > >     708                 assert factor in cat_levels_contrasts
> > > >
> > > > C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\design_info.pyc in __init__(self, factor, type, state, num_columns, categories)
> > > >      86         if self.type == "numerical":
> > > >      87             if not isinstance(num_columns, int):
> > > > ---> 88                 raise ValueError("For numerical factors, num_columns "
> > > >      89                                  "must be an int")
> > > >      90             if categories is not None:
> > > >
> > > > ValueError: For numerical factors, num_columns must be an int
> > > >
> > >
> > > Slap the ValueError into a search engine and the first hit is
> > > https://groups.google.com/forum/#!topic/pystatsmodels/KcSzNqDxv-Q
>
> This was fixed in patsy 0.4.1 as discussed in this statsmodels thread.
> You need to upgrade patsy from 0.4.0.
>
> AFAIR, the type checking was too strict and broke with recent numpy versions.
>
> Josef
>
> > >
> > > --
> > > My fellow Pythonistas, ask not what our language can do for you, ask
> > > what you can do for our language.
> > >
> > > Mark Lawrence
> >
> > Hi,
> > I don't see a solution to my problem. I found the following demo code at
> >
> > https://patsy.readthedocs.org/en/v0.1.0/API-reference.html#patsy.dmatrix
> >
> > It doesn't work on Canopy either. Does it work on your computer?
> > Thanks,
> >
> > /////////////
> > demo_data("a", "x", nlevels=3)
> > Out[134]:
> > {'a': ['a1', 'a2', 'a3', 'a1', 'a2', 'a3'],
> >  'x': array([ 1.76405235, 0.40015721, 0.97873798, 2.2408932 , 1.86755799,
> >        -0.97727788])}
> >
> > mat = dmatrix("a + x", demo_data("a", "x", nlevels=3))

Thanks. It is right.
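
For anyone who lands on this thread with the same ValueError, here is a minimal sketch of how one might check whether the installed patsy predates the 0.4.1 fix Josef mentions. It is not taken from patsy or statsmodels: the helper names (release_tuple, needs_upgrade, installed_patsy_version) are made up for illustration, it assumes Python 3.8+ for importlib.metadata, and it assumes plain dotted version strings. The only facts taken from the thread are the version numbers 0.4.0 and 0.4.1.

```python
import re
from importlib import metadata

# From the thread: the error shows up with patsy 0.4.0 and was fixed in 0.4.1.
FIXED_IN = (0, 4, 1)


def release_tuple(version):
    """Turn '0.4.1' or '0.4.1+dev' into (0, 4, 1), ignoring non-numeric suffixes."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '1.0' to three components.
    return tuple(parts + [0] * (3 - len(parts)))


def needs_upgrade(version, fixed_in=FIXED_IN):
    """True if the given version string is older than the release with the fix."""
    return release_tuple(version) < fixed_in


def installed_patsy_version():
    """Return the installed patsy version string, or None if patsy is absent."""
    try:
        return metadata.version("patsy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_patsy_version()
    if found is None:
        print("patsy is not installed")
    elif needs_upgrade(found):
        print("patsy %s predates 0.4.1, so an upgrade is worth trying" % found)
    else:
        print("patsy %s should already include the fix" % found)
```

On an older interpreter without importlib.metadata, printing patsy.__version__ should give the same information. How to upgrade depends on the distribution (Canopy has its own package manager), so the script only reports and leaves the upgrade step to you.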