On Fri, Jun 6, 2014 at 10:57 AM, Dan Williams <dan.j.willi...@intel.com> wrote: > On Fri, 2014-06-06 at 09:47 +0800, Ming Lei wrote: >> Hi Tejun, >> >> On Thu, Jun 5, 2014 at 9:41 PM, Tejun Heo <t...@kernel.org> wrote: >> > Hello, >> > >> > (cc'ing ahci_xgene folks) >> > >> > On Thu, Jun 05, 2014 at 09:24:04PM +0800, Ming Lei wrote: >> >> On Thu, Jun 5, 2014 at 9:12 PM, Dan Williams <dan.j.willi...@intel.com> >> >> wrote: >> >> > On Thu, Jun 5, 2014 at 1:53 AM, Ming Lei <ming....@canonical.com> wrote: >> >> >> Hi Dan and Tejun, >> >> >> >> >> >> Looks the commit 8a4aeec8d(libata/ahci: accommodate tag ordered >> >> >> controllers) causes below sata failure[1] on APM AHCI controller. >> >> >> >> >> >> And the error does disappear after reverting the commit. >> >> > >> >> > Can you give some more details about the workload and the disk? What >> >> > is the APM AHCI? >> >> >> >> The commit causes the APM arm64 based system not bootable, and not >> >> special workload, just when mounting rootfs or reading init files. >> >> >> >> APM AHCI: >> >> drivers/ata/ahci_xgene.c >> > >> > Urgh... so, the controller can't handle tags allocated in round-robin >> > instead of lowest-first? Ming, can you please test with different >> > controllers / disks / ssds and see whether the problem stays with the >> > controller? Loc, can you guys please look into it so that at least >> > the next revision hardware can fix the issue? >> >> The problem has been observed on several apm arm64 boards >> with different disks. >> >> > >> > Provided that the tag allocation itself isn't broken, which isn't too >> > likely given the lack of bug reports on other controllers, we'll need >> > to blacklist ahci_xgene somehow. Either disable NCQ support on it or >> > implement an alternative lowest-first tag allocation for it. If this >> > actually is the controller freaking out about tags allocated in a >> > different order, I'm not sure how much confidence we can put in its >> > NCQ implementation tho and it might be a better idea to simply plug >> > NCQ support on it until we understand the details of the problem. >> > >> > So, something like the following? Ming, can you please verify whether >> > the following makes the machine boot again? >> >> Looks the problem still persists after applying your patch of disabling >> NCQ. > > How about the following patch? > > 8<--------- > libata: allow xgene-ahci to opt-out of tag ordered submission > > From: Dan Williams <dan.j.willi...@intel.com> > > Ming Lei reports: > "Looks the commit 8a4aeec8d(libata/ahci: accommodate tag ordered > controllers) causes below sata failure[1] on APM AHCI controller. > > And the error does disappear after reverting the commit. > > ata4.00: exception Emask 0x40 SAct 0xff00 SErr 0x800 action 0x6 frozen > ata4: SError: { HostInt } > ata4.00: failed command: READ FPDMA QUEUED > ata4.00: cmd 60/08:40:e0:a4:88/00:00:04:00:00/40 tag 8 ncq 4096 in > res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x44 (timeout)" > > Maintain tag ordered submission as the default, but allow xgene-ahci to > continue with the old behavior. > > Reported-by: Ming Lei <ming....@canonical.com> > Signed-off-by: Dan Williams <dan.j.willi...@intel.com> > --- > drivers/ata/ahci_xgene.c | 1 + > drivers/ata/libata-core.c | 42 +++++++++++++++++++++++++++++++++++++++--- > drivers/ata/libata.h | 2 ++ > include/linux/libata.h | 1 + > 4 files changed, 43 insertions(+), 3 deletions(-) > > diff --git a/drivers/ata/ahci_xgene.c b/drivers/ata/ahci_xgene.c > index 77c89bf171f1..9be069f5f58a 100644 > --- a/drivers/ata/ahci_xgene.c > +++ b/drivers/ata/ahci_xgene.c
+#include "libata.h" is needed, otherwise build will fail. > @@ -300,6 +300,7 @@ static struct ata_port_operations xgene_ahci_ops = { > .host_stop = xgene_ahci_host_stop, > .hardreset = xgene_ahci_hardreset, > .read_id = xgene_ahci_read_id, > + .qc_new = ata_qc_new_fifo_order, > }; > > static const struct ata_port_info xgene_ahci_port_info = { > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c > index 943cc8b83e59..dd554354791f 100644 > --- a/drivers/ata/libata-core.c > +++ b/drivers/ata/libata-core.c > @@ -83,6 +83,7 @@ const struct ata_port_operations ata_base_port_ops = { > .error_handler = ata_std_error_handler, > .sched_eh = ata_std_sched_eh, > .end_eh = ata_std_end_eh, > + .qc_new = ata_qc_new_tag_order, > }; > > const struct ata_port_operations sata_port_ops = { > @@ -4784,14 +4785,17 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words) > } > > /** > - * ata_qc_new - Request an available ATA command, for queueing > + * ata_qc_new_tag_order - Request an available ATA command, for queueing > * @ap: target port > * > + * Accomodates controllers that issue commands in tag order rather > + * than FIFO order (Intel AHCI). > + * > * LOCKING: > * None. > */ > > -static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap) > +struct ata_queued_cmd *ata_qc_new_tag_order(struct ata_port *ap) > { > struct ata_queued_cmd *qc = NULL; > unsigned int i, tag; > @@ -4819,6 +4823,38 @@ static struct ata_queued_cmd *ata_qc_new(struct > ata_port *ap) > } > > /** > + * ata_qc_new_fifo_order - Request an available ATA command, for queueing > + * @ap: target port > + * > + * Allocate first available tag, for hosts that maintain fifo issue > + * order on the wire, or otherwise cannot handle tag order. > + * > + * LOCKING: > + * None. > + */ > + > +struct ata_queued_cmd *ata_qc_new_fifo_order(struct ata_port *ap) > +{ > + struct ata_queued_cmd *qc = NULL; > + unsigned int i; > + > + /* no command while frozen */ > + if (unlikely(ap->pflags & ATA_PFLAG_FROZEN)) > + return NULL; > + > + /* the last tag is reserved for internal command. */ > + for (i = 0; i < ATA_MAX_QUEUE - 1; i++) > + if (!test_and_set_bit(i, &ap->qc_allocated)) { > + qc = __ata_qc_from_tag(ap, i); > + qc->tag = i; > + break; > + } > + > + return qc; > +} > +EXPORT_SYMBOL_GPL(ata_qc_new_fifo_order); > + > +/** > * ata_qc_new_init - Request an available ATA command, and initialize it > * @dev: Device from whom we request an available command structure > * > @@ -4831,7 +4867,7 @@ struct ata_queued_cmd *ata_qc_new_init(struct > ata_device *dev) > struct ata_port *ap = dev->link->ap; > struct ata_queued_cmd *qc; > > - qc = ata_qc_new(ap); > + qc = ap->ops->qc_new(ap); > if (qc) { > qc->scsicmd = NULL; > qc->ap = ap; > diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h > index 45b5ab3a95d5..381ba41de464 100644 > --- a/drivers/ata/libata.h > +++ b/drivers/ata/libata.h > @@ -90,6 +90,8 @@ extern int ata_down_xfermask_limit(struct ata_device *dev, > unsigned int sel); > extern unsigned int ata_dev_set_feature(struct ata_device *dev, > u8 enable, u8 feature); > extern void ata_sg_clean(struct ata_queued_cmd *qc); > +extern struct ata_queued_cmd *ata_qc_new_tag_order(struct ata_port *ap); > +extern struct ata_queued_cmd *ata_qc_new_fifo_order(struct ata_port *ap); > extern void ata_qc_free(struct ata_queued_cmd *qc); > extern void ata_qc_issue(struct ata_queued_cmd *qc); > extern void __ata_qc_complete(struct ata_queued_cmd *qc); > diff --git a/include/linux/libata.h b/include/linux/libata.h > index 5ab4e3a76721..852686837d1c 100644 > --- a/include/linux/libata.h > +++ b/include/linux/libata.h > @@ -880,6 +880,7 @@ struct ata_port_operations { > void (*qc_prep)(struct ata_queued_cmd *qc); > unsigned int (*qc_issue)(struct ata_queued_cmd *qc); > bool (*qc_fill_rtf)(struct ata_queued_cmd *qc); > + struct ata_queued_cmd *(*qc_new)(struct ata_port *ap); > > /* > * Configuration and exception handling After applying this patch again -next tree, the apm mustang board can boot again. Tested-by: Ming Lei <ming....@canonical.com> Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/