Hi Piotr, Piotr Sroka <pio...@cadence.com> wrote on Thu, 6 Jun 2019 16:19:51 +0100:
> Hi Miquel > > > The 05/12/2019 14:24, Miquel Raynal wrote: > >EXTERNAL MAIL > > > > > >EXTERNAL MAIL > > > > > >Hi Piotr, > > > >Sorry for de delay. > > > >Piotr Sroka <pio...@cadence.com> wrote on Thu, 21 Mar 2019 09:33:58 > >+0000: > > > >> The 03/05/2019 19:09, Miquel Raynal wrote: > >> >EXTERNAL MAIL > >> > > >> > > >> >Hi Piotr, > >> > > >> >Piotr Sroka <pio...@cadence.com> wrote on Tue, 19 Feb 2019 16:18:23 > >> >+0000: > >> > > >> >> This patch adds driver for Cadence HPNFC NAND controller. > >> >> > >> >> Signed-off-by: Piotr Sroka <pio...@cadence.com> > >> >> --- > >> >> Changes for v2: > >> >> - create one universal wait function for all events instead of one > >> >> function per event. > >> >> - split one big function executing nand operations to separate > >> >> functions one per each type of operation. > >> >> - add erase atomic operation to nand operation parser > >> >> - remove unnecessary includes. > >> >> - remove unused register defines > >> >> - add support for multiple nand chips > >> >> - remove all code using legacy functions > >> >> - remove chip dependents parameters from dts bindings, they were > >> >> attached to the SoC specific compatible at the driver level > >> >> - simplify interrupt handling > >> >> - simplify timing calculations > >> >> - fix calculation of maximum supported cs signals > >> >> - simplify ecc size calculation > >> >> - remove header file and put whole code to one c file > >> >> --- > >> >> drivers/mtd/nand/raw/Kconfig | 8 + > >> >> drivers/mtd/nand/raw/Makefile | 1 + > >> >> drivers/mtd/nand/raw/cadence-nand-controller.c | 3288 > >> >> ++++++++++++++++++++++++ > >> > > >> >This driver is way too massive, I am pretty sure it can shrink a > >> >little bit more. > >> >[...] > >> > > >> I will try to make it shorer but it will be difucult to achive. It is > >> because - there are a lot of calculation needed for PHY - ECC are > >> interleaved with data (like on marvell-nand or gpmi-nand). > >> Therefore: + RAW mode is complicated + protecting BBM increases > >> number of lines of source code > >> - need to support two DMA engines internal and external (slave) We will > >> see on next patch version what is the result. That page layout looks: > >> > > > >Maybe you don't need to support both internal and external DMA? > > > >I am pretty sure there are rooms for size reduction. > > I describe how it works in general and maybe you help me chose better > solution. > > HW controller can work in 3 modes. PIO - can work in master or slave DMA > CDMA - needs Master DMA for accessing command descriptors. > Generic mode - can use only Slave DMA. > > Generic mode is neccessery to implement functions other than page > program, page read, block erase. So it is essential. I cannot avoid > to use Slave DMA. This deserves a nice comment at the top. > > I change CDMA mode to PIO mode. Then I can use only slave DMA. But CDMA has a > feature which is not present in PIO mode. The feature > gives possibility to point DMA engine two buffers to transfer. It is > used to point data buffer and oob bufer. In PIO mode I would need to > copy data buffer and oob buffer to third buffer. Next transfer data from > third buffer. > In that solution we need to copy all data by CPU and then use DMA. > Controller needs always transfer oob because of HW ECC restrictions. Such > change will decrease performce for all data transfers. > I think performance is more important in that case. What is your > opinion? [...] Indeed > >> > > >> >What is this for? > >> Fucntions enables/disables hardware detection of erased data > >> pages. > > > > >Ok, the name is not very explicit , maybe you could tell this with a > >comment. > > > Ok. > > >> >> + > >> >> +/* hardware initialization */ > >> >> +static int cadence_nand_hw_init(struct cdns_nand_ctrl *cdns_ctrl) > >> >> +{ > >> >> + int status = 0; > >> >> + u32 reg; > >> >> + > >> >> + status = cadence_nand_wait_for_value(cdns_ctrl, CTRL_STATUS, > >> >> + 1000000, > >> >> + CTRL_STATUS_INIT_COMP, > >> >> false); > >> >> + if (status) > >> >> + return status; > >> >> + > >> >> + reg = readl(cdns_ctrl->reg + CTRL_VERSION); > >> >> + > >> >> + dev_info(cdns_ctrl->dev, > >> >> + "%s: cadence nand controller version reg %x\n", > >> >> + __func__, reg); > >> >> + > >> >> + /* disable cache and multiplane */ > >> >> + writel(0, cdns_ctrl->reg + MULTIPLANE_CFG); > >> >> + writel(0, cdns_ctrl->reg + CACHE_CFG); > >> >> + > >> >> + /* clear all interrupts */ > >> >> + writel(0xFFFFFFFF, cdns_ctrl->reg + INTR_STATUS); > >> >> + > >> >> + cadence_nand_get_caps(cdns_ctrl); > >> >> + cadence_nand_read_bch_cfg(cdns_ctrl); > >> > > >> >No, you cannot rely on the bootloader's configuration. And I suppose > >> >this is what the first call to read_bch_cfg does? > >> I do not realy on boot loader. Just read NAND flash > >> controller configuration from read only capabilities registers. > > > >Ok, if these are RO registers, it's fine. But maybe don't call the > >function "read bch config" which suggest that this is something you can > >change. > > > ok. > > >> > >> > >> >> + > >> >> +#define TT_OOB_AREA 1 > >> >> +#define TT_MAIN_OOB_AREAS 2 > >> >> +#define TT_RAW_PAGE 3 > >> >> +#define TT_BBM 4 > >> >> +#define TT_MAIN_OOB_AREA_EXT 5 > >> >> + > >> >> +/* prepare size of data to transfer */ > >> >> +static int > >> >> +cadence_nand_prepare_data_size(struct nand_chip *chip, > >> >> + int transfer_type) > >> >> +{ > >> >> + struct cdns_nand_ctrl *cdns_ctrl = > >> >> to_cdns_nand_ctrl(chip->controller); > >> >> + struct cdns_nand_chip *cdns_chip = to_cdns_nand_chip(chip); > >> >> + u32 sec_size = 0, last_sec_size, offset = 0, sec_cnt = 1; > >> >> + u32 ecc_size = chip->ecc.bytes; > >> >> + u32 data_ctrl_size = 0; > >> >> + u32 reg = 0; > >> >> + > >> >> + if (cdns_ctrl->curr_trans_type == transfer_type) > >> >> + return 0; > >> >> + > >> >> + switch (transfer_type) { > >> > > >> >Please turn the controller driver as dumb as possible. You should not > >> >care which part of the OOB area you are accessing. > >> It is a bit confusing for me how accessing OOB should be implemented. > >> I know that read_oob function is called to check BBM value when BBT is > >> initialized. It is also a bit confusing for me why the raw version is > >> not used for that purpose. In current implementation if you write oob > >> by write_page function next > >> read oob by read_oob function then data will be the same. > >> If I implement dump functions read_oob and write_oob then > >> 1. ECC must be disabled for these functions > >> 2. oob data accessing by write_page/read_page will be different > >> (different offsets) that the data accessing by read_oob/write_oob > >> functions > > > >No, I fear this is not acceptable. > > > >> If above described "functionalities" are acceptable I will change > >> implementation of write_oob and read_oob functions. > >> The write_page and read_page must be implemented in that way as it is now. > >> Let me know which solution is preffered. > > > >If this is too complicated to just write the oob, why not fallback on > >read/write_page (with oob_required and a dummy data buffer)? > > I considered it. Actually, it would simplify the code. The disadvantage > of using the same function is that the each write/read oob will cause full > page > read/write. In current version only last sector is read/write together > with oob. This will affect the performance degradation of oob write/read > function. So I do not know what is more important. 1. OOB functions > performance, > 2. simplier code. Honestly I don't think slowing down a bit OOB access is critical as, with recent software layers like UBI/UBIFS we do not access OOB only that much. So here I would choose 2. >> > >> >> + case TT_OOB_AREA: > >> >> + offset = cdns_chip->main_size - cdns_chip->sector_size; > >> >> + ecc_size = ecc_size * (offset / cdns_chip->sector_size); > >> >> + offset = offset + ecc_size; > >> >> + sec_cnt = 1; > >> >> + last_sec_size = cdns_chip->sector_size > >> >> + + cdns_chip->avail_oob_size; > >> >> + break; > >> >> + case TT_MAIN_OOB_AREA_EXT: > >> >> + sec_cnt = cdns_chip->sector_count; > >> >> + last_sec_size = cdns_chip->sector_size; > >> >> + sec_size = cdns_chip->sector_size; > >> >> + data_ctrl_size = cdns_chip->avail_oob_size; > >> >> + break; > >> >> + case TT_MAIN_OOB_AREAS: > >> >> + sec_cnt = cdns_chip->sector_count; > >> >> + last_sec_size = cdns_chip->sector_size > >> >> + + cdns_chip->avail_oob_size; > >> >> + sec_size = cdns_chip->sector_size; > >> >> + break; > >> >> + case TT_RAW_PAGE: > >> >> + last_sec_size = cdns_chip->main_size + > >> >> cdns_chip->oob_size; > >> >> + break; > >> >> + case TT_BBM: > >> >> + offset = cdns_chip->main_size + cdns_chip->bbm_offs; > >> >> + last_sec_size = 8; > >> >> + break; > >> >> + default: > >> >> + dev_err(cdns_ctrl->dev, "Data size preparation > >> >> failed\n"); > >> >> + return -EINVAL; > >> >> + } > >> >> + > >> >> + reg = 0; > >> >> + reg |= FIELD_PREP(TRAN_CFG_0_OFFSET, offset); > >> >> + reg |= FIELD_PREP(TRAN_CFG_0_SEC_CNT, sec_cnt); > >> >> + writel(reg, cdns_ctrl->reg + TRAN_CFG_0); > >> >> + > >> >> + reg = 0; > >> >> + reg |= FIELD_PREP(TRAN_CFG_1_LAST_SEC_SIZE, last_sec_size); > >> >> + reg |= FIELD_PREP(TRAN_CFG_1_SECTOR_SIZE, sec_size); > >> >> + writel(reg, cdns_ctrl->reg + TRAN_CFG_1); > >> >> + > >> >> + reg = readl(cdns_ctrl->reg + CONTROL_DATA_CTRL); > >> >> + reg &= ~CONTROL_DATA_CTRL_SIZE; > >> >> + reg |= FIELD_PREP(CONTROL_DATA_CTRL_SIZE, data_ctrl_size); > >> >> + writel(reg, cdns_ctrl->reg + CONTROL_DATA_CTRL); > >> >> + > >> >> + cdns_ctrl->curr_trans_type = transfer_type; > >> >> + > >> >> + return 0; > >> >> +} > >> >> + > [...] >> >> + > >> [...] >> + /* > >> >> + * the idea of those calculation is to get the optimum value > >> >> + * for tRP and tRH timings if it is NOT possible to sample data > >> >> + * with optimal tRP/tRH settings the parameters will be extended > >> >> + */ > >> >> + if (sdr->tRC_min <= clk_period && > >> >> + sdr->tRP_min <= (clk_period / 2) && > >> >> + sdr->tREH_min <= (clk_period / 2)) { > >> > > >> >Will this situation really happen? > >> I think yes for follwing values trc_min 20000 ps > >> trp_min 10000 ps > >> treh_min 7000 ps > >> clk_period 20000 ps > > > >Ok, you may add a comment stating that this may be the case in EDO mode > >5. > I did not anwer clearly last time. It was just an example. The result of that > "if" depends on NAND flash device timing mode and NAND flash controller > clock. Minumum value of clk is 20MHz (50ns). So it may be a case for > Asynchronous Mode 1 if > NAND flash controller clock is 20MHz. I will add this info in comment. I am not sure to understand correctly what you mean. Please try to write a nice comment and we'll see. > >> [...] > >> >> + } > >> >> + > >> >> + if (cdns_ctrl->caps2.is_phy_type_dll) { > >> > > >> >Is the else part allowed? > Register accessed in this block does not exists if is_phy_type_dll is 0. So > they are preveted to be accessed. the else is not needed. > >> > > >> following register does not exist if caps2.is_phy_type_dll is 0 >> + > >> u32 tpre_cnt = calc_cycl(tpre, clk_period); > >> >> + u32 tcdqss_cnt = calc_cycl(tcdqss + if_skew, > >> >> clk_period); > >> >> + u32 tpsth_cnt = calc_cycl(tpsth + if_skew, clk_period); > >> >> + > >> >> + u32 trpst_cnt = calc_cycl(trpst + if_skew, clk_period) > >> >> + 1; > >> >> + u32 twpst_cnt = calc_cycl(twpst + if_skew, clk_period) > >> >> + 1; > >> >> + u32 tcres_cnt = calc_cycl(tcres + if_skew, clk_period) > >> >> + 1; > >> >> + u32 tcdqsh_cnt = calc_cycl(tcdqsh + if_skew, > >> >> clk_period) + 5; > >> >> + > >> >> + tcr_cnt = calc_cycl(tcr + if_skew, clk_period); > >> >> + /* > >> >> + * skew not included because this timing defines > >> >> duration of > >> >> + * RE or DQS before data transfer > >> >> + */ > >> >> + tpsth_cnt = tpsth_cnt + 1; > >> >> + reg = FIELD_PREP(TOGGLE_TIMINGS0_TPSTH, tpsth_cnt); > >> >> + reg |= FIELD_PREP(TOGGLE_TIMINGS0_TCDQSS, tcdqss_cnt); > >> >> + reg |= FIELD_PREP(TOGGLE_TIMINGS0_TPRE, tpre_cnt); > >> >> + reg |= FIELD_PREP(TOGGLE_TIMINGS0_TCR, tcr_cnt); > >> >> + t->toggle_timings_0 = reg; > >> >> + dev_dbg(cdns_ctrl->dev, "TOGGLE_TIMINGS_0_SDR\t%x\n", > >> >> reg); > >> >> + > >> >> + //toggle_timings_1 - tRPST,tWPST > >> >> + reg = FIELD_PREP(TOGGLE_TIMINGS1_TCDQSH, tcdqsh_cnt); > >> >> + reg |= FIELD_PREP(TOGGLE_TIMINGS1_TCRES, tcres_cnt); > >> >> + reg |= FIELD_PREP(TOGGLE_TIMINGS1_TRPST, trpst_cnt); > >> >> + reg |= FIELD_PREP(TOGGLE_TIMINGS1_TWPST, twpst_cnt); > >> >> + t->toggle_timings_1 = reg; > >> >> + dev_dbg(cdns_ctrl->dev, "TOGGLE_TIMINGS_1_SDR\t%x\n", > >> >> reg); > >> >> + } > >> [...] > > >> >This function is so complicated !!! How can this even work? Really, it > >> >is hard to get into the code and follow, I am sure you can do > >> >something. > >> Yes it is complicated but works, I will try to simplify it... [...] > > > >Yes please! > > > >> >> + "CS %d already assigned\n", cs); > >> >> + return -EINVAL; > >> >> + } > >> >> + > >> >> + cdns_chip->cs[i] = cs; > >> >> + } > >> >> + > >> >> + chip = &cdns_chip->chip; > >> >> + chip->controller = &cdns_ctrl->controller; > >> >> + nand_set_flash_node(chip, np); > >> >> + > >> >> + mtd = nand_to_mtd(chip); > >> >> + mtd->dev.parent = cdns_ctrl->dev; > >> >> + > >> >> + /* > >> >> + * Default to HW ECC engine mode. If the nand-ecc-mode property > >> >> is given > >> >> + * in the DT node, this entry will be overwritten in > >> >> nand_scan_ident(). > >> >> + */ > >> >> + chip->ecc.mode = NAND_ECC_HW; > >> >> + > >> >> + /* > >> >> + * Save a reference value for timing registers before > >> >> + * ->setup_data_interface() is called. > >> >> + */ > >> >> + cadence_nand_get_timings(cdns_ctrl, &cdns_chip->timings); > >> > > >> >You cannot rely on the Bootloader's configuration. This driver should > >> >derive it. > >> I do not relay on the Bootloader's configuration in any part. I just > >> init timings structure base on current values of registers to do not > >> have rubish in timing structure. Values will be calculated by driver when > >> setup_data_interface is called. In case set_timings is called before > >> setup_data_interface > > > >Does this really happens? I am pretty sure it is taken care of by the > >core. I don't think you should rely on what's in the registers at boot > >time. > Ok I will check it one more time and remove if not needed. > > > > > > >> then we write the same valus to timing registers > >> which are preset in registres. To be shorter timing registers will stay > >> unchanged. >> + ret = nand_scan(chip, cdns_chip->nsels); > >> >> + if (ret) { > >> >> + dev_err(cdns_ctrl->dev, "could not scan the nand > >> >> chip\n"); > >> >> + return ret; > >> >> + } > >> >> + > >> >> + ret = mtd_device_register(mtd, NULL, 0); > >> >> + if (ret) { > >> >> + dev_err(cdns_ctrl->dev, > >> >> + "failed to register mtd device: %d\n", ret); > >> >> + nand_release(chip); > >> > > >> >I think you should call nand_cleanup instead of nand_release here has > >> >the mtd device is not registered yet. > ok > > >> >> + return ret; > >> >> + } > >> >> + > >> >> + list_add_tail(&cdns_chip->node, &cdns_ctrl->chips); > >> >> + > >> >> + return 0; > >> >> +} > >> >> + > >> >> +static int cadence_nand_chips_init(struct cdns_nand_ctrl *cdns_ctrl) > >> >> +{ > >> >> + struct device_node *np = cdns_ctrl->dev->of_node; > >> >> + struct device_node *nand_np; > >> >> + int max_cs = cdns_ctrl->caps2.max_banks; > >> >> + int nchips; > >> >> + int ret; > >> >> + > >> >> + nchips = of_get_child_count(np); > >> >> + > >> >> + if (nchips > max_cs) { > >> >> + dev_err(cdns_ctrl->dev, > >> >> + "too many NAND chips: %d (max = %d CS)\n", > >> >> + nchips, max_cs); > >> >> + return -EINVAL; > >> >> + } > >> >> + > >> >> + for_each_child_of_node(np, nand_np) { > >> >> + ret = cadence_nand_chip_init(cdns_ctrl, nand_np); > >> >> + if (ret) { > >> >> + of_node_put(nand_np); > >> >> + return ret; > >> >> + } > >> > > >> >If nand_chip_init() fails on another chip than the first one, there is > >> >some garbage collection to do. > ok > > >> >> + } > >> >> + > >> >> + return 0; > >> >> +} > >> >> + > >> >> +static int cadence_nand_init(struct cdns_nand_ctrl *cdns_ctrl) > >> >> +{ > >> >> + dma_cap_mask_t mask; > >> >> + int ret = 0; > >> >> + > >> >> + cdns_ctrl->cdma_desc = dma_alloc_coherent(cdns_ctrl->dev, > >> >> + > >> >> sizeof(*cdns_ctrl->cdma_desc), > >> >> + > >> >> &cdns_ctrl->dma_cdma_desc, > >> >> + GFP_KERNEL); > >> >> + if (!cdns_ctrl->dma_cdma_desc) > >> >> + return -ENOMEM; > >> >> + > >> >> + cdns_ctrl->buf_size = 16 * 1024; > >> > > >> >s/1024/SZ_1K/ > >> > > >> >> + cdns_ctrl->buf = kmalloc(cdns_ctrl->buf_size, GFP_KERNEL); > >> > > >> >If you use kmalloc here then this buffer will always be DMA-able, > >> >right? > >> Right I have seen such solution in another driver. > >> > >> > >> Thanks for revieving this patch. Please answer on my question how write_oob > >> and read_oob functions should be implemented. > >> > >> > > >> > > >> >Thanks, > >> >Miquèl > >> > >> Thanks > >> Piotr Sroka > > > >Thanks, > >Miquèl > > Thanks > Piotr Thanks, Miquèl