Re: [PATCH v3 1/6] gpu: host1x: Enable Tegra186 syncpoint protection

2017-09-30 Thread Mikko Perttunen

On 09/30/2017 05:41 AM, Dmitry Osipenko wrote:

On 28.09.2017 15:50, Mikko Perttunen wrote:

..
diff --git a/drivers/gpu/host1x/hw/channel_hw.c 
b/drivers/gpu/host1x/hw/channel_hw.c
index 8447a56c41ca..b929d7f1e291 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -147,6 +147,9 @@ static int channel_submit(struct host1x_job *job)
  
  	syncval = host1x_syncpt_incr_max(sp, user_syncpt_incrs);
  
+	/* assign syncpoint to channel */

+   host1x_hw_syncpt_assign_to_channel(host, sp, ch);
+


Since you've preserved the comment, what about extending it with a brief
explanation of what the 'assignment' actually does? For example, that CDMA will
stop execution on touching any syncpoint other than the assigned one.


Whoops, I actually forgot to remove that :) I think it would be best to
remove the comment here and have a proper description of the
feature somewhere else.


Mikko


Re: [PATCH v2] VFS: Handle lazytime in do_mount()

2017-09-30 Thread Markus Trippelsdorf
On 2017.09.19 at 17:25 +0200, Lukas Czerner wrote:
> On Tue, Sep 19, 2017 at 12:37:24PM +0200, Markus Trippelsdorf wrote:
> > Since commit e462ec50cb5fa ("VFS: Differentiate mount flags (MS_*) from
> > internal superblock flags") the lazytime mount option didn't get passed
> > on anymore.
> > 
> > Fix the issue by handling the option in do_mount().
> > 
> > Signed-off-by: Markus Trippelsdorf 
> > ---
> >  fs/namespace.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 54059b142d6b..b633838b8f02 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -2823,7 +2823,8 @@ long do_mount(const char *dev_name, const char __user 
> > *dir_name,
> > SB_MANDLOCK |
> > SB_DIRSYNC |
> > SB_SILENT |
> > -   SB_POSIXACL);
> > +   SB_POSIXACL |
> > +   SB_LAZYTIME);
> 
> Looks good. Although I still think this could be a per-mountpoint option.
> 
> Regardless of that, you can add
> Reviewed-by: Lukas Czerner 

Ping?
Al, could you please take a look?
Thanks.

-- 
Markus


Re: [PATCH] RDMA/hns: return 0 rather than return a garbage status value

2017-09-30 Thread Wei Hu (Xavier)

Thanks,  Colin Ian King

Acked-by: Wei Hu (Xavier) 

On 2017/9/30 4:13, Colin King wrote:

From: Colin Ian King 

For the case where hr_qp->state == IB_QPS_RESET, an uninitialized
value in ret is being returned by function hns_roce_v2_query_qp.
Fix this by setting ret to 0 for this specific return condition.

Detected by CoverityScan, CID#1457203 ("Uninitialized scalar variable")

Signed-off-by: Colin Ian King 
---
  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 4870b51caab9..791dae72e775 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -2805,6 +2805,7 @@ static int hns_roce_v2_query_qp(struct ib_qp *ibqp, 
struct ib_qp_attr *qp_attr,
  
  	if (hr_qp->state == IB_QPS_RESET) {

qp_attr->qp_state = IB_QPS_RESET;
+   ret = 0;
goto done;
}
  





Re: [PATCH v3 2/8] platform/x86: dell-smbios: Introduce a WMI-ACPI interface

2017-09-30 Thread Pali Rohár
On Saturday 30 September 2017 02:51:27 Darren Hart wrote:
> > +DELL SMBIOS DRIVER
> > +M: Pali Rohár 
> > +M: Mario Limonciello 
> > +S: Maintained
> > +F: drivers/platform/x86/dell-smbios.*
> 
> Pali, do you agree with this?

Yes, no problem.

> > -static int __init dell_smbios_init(void)
> > +static int dell_smbios_wmi_probe(struct wmi_device *wdev)
> > +{
> > +   /* no longer need the SMI page */
> > +   free_page((unsigned long)buffer);
> > +
> > +   /* WMI buffer should be 32k */
> > +   buffer = (void *)__get_free_pages(GFP_KERNEL, 3);
> 
> Assuming PAGE_SIZE here (I know, this driver, this architecture,
> etc...). But, please use get_order() to determine number of pages
> from a linear size:
> 
> __get_free_pages(GFP_KERNEL, get_order(32768));

I agree that specifying the size explicitly (instead of a page count) leads to
more readable code.
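
As a rough illustration of the suggested style (the 32 KiB constant, includes and
helper name are mine, not taken from the patch), sizing the allocation in bytes
and letting get_order() derive the page order could look like this:

#include <linux/gfp.h>
#include <linux/mm.h>

#define WMI_BUFFER_SIZE	32768	/* the 32k WMI buffer mentioned above */

/* Sketch only: allocate the buffer by size instead of a hard-coded order. */
static void *wmi_alloc_buffer(void)
{
	return (void *)__get_free_pages(GFP_KERNEL,
					get_order(WMI_BUFFER_SIZE));
}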

-- 
Pali Rohár
pali.ro...@gmail.com




[PATCH 0/6] Replace container_of with list_entry

2017-09-30 Thread Srishti Sharma
Replace instances of container_of with list_entry to access the
current list element.
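
For context, list_entry() is just a container_of() wrapper specialized for list
heads, so the series is a readability cleanup rather than a functional change.
From include/linux/list.h:

/* list_entry() forwards directly to container_of() */
#define list_entry(ptr, type, member) \
	container_of(ptr, type, member)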

Srishti Sharma (6):
  Staging: rtl8188eu: core: Use list_entry instead of container_of
  Staging: rtl8188eu: core: Use list_entry instead of container_of
  Staging: rtl8188eu: core: Use list_entry instead of container_of
  Staging: rtl8188eu: core: Use list_entry instead of container_of
  Staging: rtl8188eu: core: Use list_entry instead of container_of
  Staging: rtl8188eu: core: Use list_entry instead of container_of

 drivers/staging/rtl8188eu/core/rtw_ap.c   | 14 +++---
 drivers/staging/rtl8188eu/core/rtw_mlme.c |  8 
 drivers/staging/rtl8188eu/core/rtw_mlme_ext.c |  4 ++--
 drivers/staging/rtl8188eu/core/rtw_recv.c | 14 +++---
 drivers/staging/rtl8188eu/core/rtw_sta_mgt.c  | 12 ++--
 drivers/staging/rtl8188eu/core/rtw_xmit.c | 14 +++---
 6 files changed, 33 insertions(+), 33 deletions(-)

-- 
2.7.4



[PATCH 1/6] Staging: rtl8188eu: core: Use list_entry instead of container_of

2017-09-30 Thread Srishti Sharma
For variables of type struct list_head *, use list_entry to access the
current list element instead of container_of.
Done using the following Coccinelle semantic patch.

@r@
struct list_head* l;
@@

-container_of
+list_entry
   (l,...)

Signed-off-by: Srishti Sharma 
---
 drivers/staging/rtl8188eu/core/rtw_recv.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/rtl8188eu/core/rtw_recv.c 
b/drivers/staging/rtl8188eu/core/rtw_recv.c
index 3fd5f41..af59c16 100644
--- a/drivers/staging/rtl8188eu/core/rtw_recv.c
+++ b/drivers/staging/rtl8188eu/core/rtw_recv.c
@@ -193,7 +193,7 @@ void rtw_free_recvframe_queue(struct __queue *pframequeue,  
struct __queue *pfre
plist = phead->next;
 
while (phead != plist) {
-   hdr = container_of(plist, struct recv_frame, list);
+   hdr = list_entry(plist, struct recv_frame, list);
 
plist = plist->next;
 
@@ -943,7 +943,7 @@ static int validate_recv_ctrl_frame(struct adapter 
*padapter,
xmitframe_plist = xmitframe_phead->next;
 
if (xmitframe_phead != xmitframe_plist) {
-   pxmitframe = container_of(xmitframe_plist, 
struct xmit_frame, list);
+   pxmitframe = list_entry(xmitframe_plist, struct 
xmit_frame, list);
 
xmitframe_plist = xmitframe_plist->next;
 
@@ -1347,7 +1347,7 @@ static struct recv_frame *recvframe_defrag(struct adapter 
*adapter,
 
phead = get_list_head(defrag_q);
plist = phead->next;
-   pfhdr = container_of(plist, struct recv_frame, list);
+   pfhdr = list_entry(plist, struct recv_frame, list);
prframe = pfhdr;
list_del_init(&(prframe->list));
 
@@ -1367,7 +1367,7 @@ static struct recv_frame *recvframe_defrag(struct adapter 
*adapter,
plist = plist->next;
 
while (phead != plist) {
-   pnfhdr = container_of(plist, struct recv_frame, list);
+   pnfhdr = list_entry(plist, struct recv_frame, list);
pnextrframe = pnfhdr;
 
/* check the fragment sequence  (2nd ~n fragment frame) */
@@ -1655,7 +1655,7 @@ static int enqueue_reorder_recvframe(struct 
recv_reorder_ctrl *preorder_ctrl,
plist = phead->next;
 
while (phead != plist) {
-   hdr = container_of(plist, struct recv_frame, list);
+   hdr = list_entry(plist, struct recv_frame, list);
pnextattrib = &hdr->attrib;
 
if (SN_LESS(pnextattrib->seq_num, pattrib->seq_num))
@@ -1690,7 +1690,7 @@ static int recv_indicatepkts_in_order(struct adapter 
*padapter, struct recv_reor
if (list_empty(phead))
return true;
 
-   prhdr = container_of(plist, struct recv_frame, list);
+   prhdr = list_entry(plist, struct recv_frame, list);
pattrib = &prhdr->attrib;
preorder_ctrl->indicate_seq = pattrib->seq_num;
}
@@ -1698,7 +1698,7 @@ static int recv_indicatepkts_in_order(struct adapter 
*padapter, struct recv_reor
/*  Prepare indication list and indication. */
/*  Check if there is any packet need indicate. */
while (!list_empty(phead)) {
-   prhdr = container_of(plist, struct recv_frame, list);
+   prhdr = list_entry(plist, struct recv_frame, list);
prframe = prhdr;
pattrib = &prframe->attrib;
 
-- 
2.7.4



[PATCH 2/6] Staging: rtl8188eu: core: Use list_entry instead of container_of

2017-09-30 Thread Srishti Sharma
For variables of type struct list_head *, use list_entry to access the
current list element instead of container_of.
Done using the following Coccinelle semantic patch.

@r@
struct list_head* l;
@@

-container_of
+list_entry
   (l,...)

Signed-off-by: Srishti Sharma 
---
 drivers/staging/rtl8188eu/core/rtw_sta_mgt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c 
b/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
index 22cf362..f9df4ac 100644
--- a/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
+++ b/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
@@ -152,8 +152,8 @@ u32 _rtw_free_sta_priv(struct   sta_priv *pstapriv)
while (phead != plist) {
int i;
 
-   psta = container_of(plist, struct sta_info,
-   hash_list);
+   psta = list_entry(plist, struct sta_info,
+ hash_list);
plist = plist->next;
 
for (i = 0; i < 16; i++) {
@@ -323,7 +323,7 @@ u32 rtw_free_stainfo(struct adapter *padapter, struct 
sta_info *psta)
plist = phead->next;
 
while (!list_empty(phead)) {
-   prframe = container_of(plist, struct recv_frame, list);
+   prframe = list_entry(plist, struct recv_frame, list);
 
plist = plist->next;
 
@@ -399,7 +399,7 @@ void rtw_free_all_stainfo(struct adapter *padapter)
plist = phead->next;
 
while (phead != plist) {
-   psta = container_of(plist, struct sta_info, hash_list);
+   psta = list_entry(plist, struct sta_info, hash_list);
 
plist = plist->next;
 
@@ -435,7 +435,7 @@ struct sta_info *rtw_get_stainfo(struct sta_priv *pstapriv, 
u8 *hwaddr)
plist = phead->next;
 
while (phead != plist) {
-   psta = container_of(plist, struct sta_info, hash_list);
+   psta = list_entry(plist, struct sta_info, hash_list);
 
if ((!memcmp(psta->hwaddr, addr, ETH_ALEN)) == true) {
/*  if found the matched address */
@@ -493,7 +493,7 @@ u8 rtw_access_ctrl(struct adapter *padapter, u8 *mac_addr)
phead = get_list_head(pacl_node_q);
plist = phead->next;
while (phead != plist) {
-   paclnode = container_of(plist, struct rtw_wlan_acl_node, list);
+   paclnode = list_entry(plist, struct rtw_wlan_acl_node, list);
plist = plist->next;
 
if (!memcmp(paclnode->addr, mac_addr, ETH_ALEN)) {
-- 
2.7.4



[PATCH 3/6] Staging: rtl8188eu: core: Use list_entry instead of container_of

2017-09-30 Thread Srishti Sharma
For variables of type struct list_head *, use list_entry to access the
current list element instead of container_of.
Done using the following Coccinelle semantic patch.

@r@
struct list_head* l;
@@

-container_of
+list_entry
  (l,...)

Signed-off-by: Srishti Sharma 
---
 drivers/staging/rtl8188eu/core/rtw_mlme.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rtl8188eu/core/rtw_mlme.c 
b/drivers/staging/rtl8188eu/core/rtw_mlme.c
index f663e6c..71b1f8a 100644
--- a/drivers/staging/rtl8188eu/core/rtw_mlme.c
+++ b/drivers/staging/rtl8188eu/core/rtw_mlme.c
@@ -198,7 +198,7 @@ struct wlan_network *rtw_find_network(struct __queue 
*scanned_queue, u8 *addr)
plist = phead->next;
 
while (plist != phead) {
-   pnetwork = container_of(plist, struct wlan_network, list);
+   pnetwork = list_entry(plist, struct wlan_network, list);
if (!memcmp(addr, pnetwork->network.MacAddress, ETH_ALEN))
break;
plist = plist->next;
@@ -223,7 +223,7 @@ void rtw_free_network_queue(struct adapter *padapter, u8 
isfreeall)
plist = phead->next;
 
while (phead != plist) {
-   pnetwork = container_of(plist, struct wlan_network, list);
+   pnetwork = list_entry(plist, struct wlan_network, list);
 
plist = plist->next;
 
@@ -342,7 +342,7 @@ struct  wlan_network
*rtw_get_oldest_wlan_network(struct __queue *scanned_queue)
phead = get_list_head(scanned_queue);
 
for (plist = phead->next; plist != phead; plist = plist->next) {
-   pwlan = container_of(plist, struct wlan_network, list);
+   pwlan = list_entry(plist, struct wlan_network, list);
 
if (!pwlan->fixed) {
if (!oldest || time_after(oldest->last_scanned, 
pwlan->last_scanned))
@@ -421,7 +421,7 @@ void rtw_update_scanned_network(struct adapter *adapter, 
struct wlan_bssid_ex *t
plist = phead->next;
 
while (phead != plist) {
-   pnetwork= container_of(plist, struct wlan_network, 
list);
+   pnetwork= list_entry(plist, struct wlan_network, list);
 
if (is_same_network(&(pnetwork->network), target))
break;
-- 
2.7.4



[PATCH 4/6] Staging: rtl8188eu: core: Use list_entry instead of container_of

2017-09-30 Thread Srishti Sharma
For variables of type struct list_head *, use list_entry to access the
current list element instead of container_of. Done using the
following Coccinelle semantic patch.

@r@
struct list_head* l;
@@

-container_of
+list_entry
   (l,...)

Signed-off-by: Srishti Sharma 
---
 drivers/staging/rtl8188eu/core/rtw_ap.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/rtl8188eu/core/rtw_ap.c 
b/drivers/staging/rtl8188eu/core/rtw_ap.c
index 32a4837..35c03d8 100644
--- a/drivers/staging/rtl8188eu/core/rtw_ap.c
+++ b/drivers/staging/rtl8188eu/core/rtw_ap.c
@@ -293,7 +293,7 @@ voidexpire_timeout_chk(struct adapter *padapter)
 
/* check auth_queue */
while (phead != plist) {
-   psta = container_of(plist, struct sta_info, auth_list);
+   psta = list_entry(plist, struct sta_info, auth_list);
plist = plist->next;
 
if (psta->expire_to > 0) {
@@ -327,7 +327,7 @@ voidexpire_timeout_chk(struct adapter *padapter)
 
/* check asoc_queue */
while (phead != plist) {
-   psta = container_of(plist, struct sta_info, asoc_list);
+   psta = list_entry(plist, struct sta_info, asoc_list);
plist = plist->next;
 
if (chk_sta_is_alive(psta) || !psta->expire_to) {
@@ -1149,7 +1149,7 @@ int rtw_acl_add_sta(struct adapter *padapter, u8 *addr)
plist = phead->next;
 
while (phead != plist) {
-   paclnode = container_of(plist, struct rtw_wlan_acl_node, list);
+   paclnode = list_entry(plist, struct rtw_wlan_acl_node, list);
plist = plist->next;
 
if (!memcmp(paclnode->addr, addr, ETH_ALEN)) {
@@ -1209,7 +1209,7 @@ int rtw_acl_remove_sta(struct adapter *padapter, u8 *addr)
plist = phead->next;
 
while (phead != plist) {
-   paclnode = container_of(plist, struct rtw_wlan_acl_node, list);
+   paclnode = list_entry(plist, struct rtw_wlan_acl_node, list);
plist = plist->next;
 
if (!memcmp(paclnode->addr, addr, ETH_ALEN)) {
@@ -1456,7 +1456,7 @@ void associated_clients_update(struct adapter *padapter, 
u8 updated)
 
/* check asoc_queue */
while (phead != plist) {
-   psta = container_of(plist, struct sta_info, asoc_list);
+   psta = list_entry(plist, struct sta_info, asoc_list);
 
plist = plist->next;
 
@@ -1728,7 +1728,7 @@ int rtw_sta_flush(struct adapter *padapter)
 
/* free sta asoc_queue */
while (phead != plist) {
-   psta = container_of(plist, struct sta_info, asoc_list);
+   psta = list_entry(plist, struct sta_info, asoc_list);
 
plist = plist->next;
 
@@ -1856,7 +1856,7 @@ void stop_ap_mode(struct adapter *padapter)
phead = get_list_head(pacl_node_q);
plist = phead->next;
while (phead != plist) {
-   paclnode = container_of(plist, struct rtw_wlan_acl_node, list);
+   paclnode = list_entry(plist, struct rtw_wlan_acl_node, list);
plist = plist->next;
 
if (paclnode->valid) {
-- 
2.7.4



[PATCH 5/6] Staging: rtl8188eu: core: Use list_entry instead of container_of

2017-09-30 Thread Srishti Sharma
For variables of type struct list_head *, use list_entry to access the
current list element instead of container_of. Done using
the following Coccinelle semantic patch.

@r@
struct list_head* l;
@@

-container_of
+list_entry
   (l,...)

Signed-off-by: Srishti Sharma 
---
 drivers/staging/rtl8188eu/core/rtw_mlme_ext.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rtl8188eu/core/rtw_mlme_ext.c 
b/drivers/staging/rtl8188eu/core/rtw_mlme_ext.c
index 611c940..80fe6ae 100644
--- a/drivers/staging/rtl8188eu/core/rtw_mlme_ext.c
+++ b/drivers/staging/rtl8188eu/core/rtw_mlme_ext.c
@@ -1790,7 +1790,7 @@ static void issue_action_BSSCoexistPacket(struct adapter 
*padapter)
u8 *p;
struct wlan_bssid_ex *pbss_network;
 
-   pnetwork = container_of(plist, struct wlan_network, 
list);
+   pnetwork = list_entry(plist, struct wlan_network, list);
 
plist = plist->next;
 
@@ -5470,7 +5470,7 @@ u8 tx_beacon_hdl(struct adapter *padapter, unsigned char 
*pbuf)
xmitframe_plist = xmitframe_phead->next;
 
while (xmitframe_phead != xmitframe_plist) {
-   pxmitframe = container_of(xmitframe_plist, 
struct xmit_frame, list);
+   pxmitframe = list_entry(xmitframe_plist, struct 
xmit_frame, list);
 
xmitframe_plist = xmitframe_plist->next;
 
-- 
2.7.4



[PATCH 6/6] Staging: rtl8188eu: core: Use list_entry instead of container_of

2017-09-30 Thread Srishti Sharma
For variables of type struct list_head *, use list_entry to access the
current list element instead of container_of. Done using the
following Coccinelle semantic patch.

@r@
struct list_head* l;
@@

-container_of
+list_entry
  (l,...)

Signed-off-by: Srishti Sharma 
---
 drivers/staging/rtl8188eu/core/rtw_xmit.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/rtl8188eu/core/rtw_xmit.c 
b/drivers/staging/rtl8188eu/core/rtw_xmit.c
index be2f46e..29e9ee9 100644
--- a/drivers/staging/rtl8188eu/core/rtw_xmit.c
+++ b/drivers/staging/rtl8188eu/core/rtw_xmit.c
@@ -1401,7 +1401,7 @@ void rtw_free_xmitframe_queue(struct xmit_priv 
*pxmitpriv, struct __queue *pfram
plist = phead->next;
 
while (phead != plist) {
-   pxmitframe = container_of(plist, struct xmit_frame, list);
+   pxmitframe = list_entry(plist, struct xmit_frame, list);
 
plist = plist->next;
 
@@ -1432,7 +1432,7 @@ static struct xmit_frame *dequeue_one_xmitframe(struct 
xmit_priv *pxmitpriv, str
xmitframe_plist = xmitframe_phead->next;
 
if (xmitframe_phead != xmitframe_plist) {
-   pxmitframe = container_of(xmitframe_plist, struct xmit_frame, 
list);
+   pxmitframe = list_entry(xmitframe_plist, struct xmit_frame, 
list);
 
xmitframe_plist = xmitframe_plist->next;
 
@@ -1473,7 +1473,7 @@ struct xmit_frame *rtw_dequeue_xframe(struct xmit_priv 
*pxmitpriv, struct hw_xmi
sta_plist = sta_phead->next;
 
while (sta_phead != sta_plist) {
-   ptxservq = container_of(sta_plist, struct tx_servq, 
tx_pending);
+   ptxservq = list_entry(sta_plist, struct tx_servq, 
tx_pending);
 
pframe_queue = &ptxservq->sta_pending;
 
@@ -1811,7 +1811,7 @@ static void dequeue_xmitframes_to_sleeping_queue(struct 
adapter *padapter, struc
plist = phead->next;
 
while (phead != plist) {
-   pxmitframe = container_of(plist, struct xmit_frame, list);
+   pxmitframe = list_entry(plist, struct xmit_frame, list);
 
plist = plist->next;
 
@@ -1878,7 +1878,7 @@ void wakeup_sta_to_xmit(struct adapter *padapter, struct 
sta_info *psta)
xmitframe_plist = xmitframe_phead->next;
 
while (xmitframe_phead != xmitframe_plist) {
-   pxmitframe = container_of(xmitframe_plist, struct xmit_frame, 
list);
+   pxmitframe = list_entry(xmitframe_plist, struct xmit_frame, 
list);
 
xmitframe_plist = xmitframe_plist->next;
 
@@ -1959,7 +1959,7 @@ void wakeup_sta_to_xmit(struct adapter *padapter, struct 
sta_info *psta)
xmitframe_plist = xmitframe_phead->next;
 
while (xmitframe_phead != xmitframe_plist) {
-   pxmitframe = container_of(xmitframe_plist, struct 
xmit_frame, list);
+   pxmitframe = list_entry(xmitframe_plist, struct 
xmit_frame, list);
 
xmitframe_plist = xmitframe_plist->next;
 
@@ -2006,7 +2006,7 @@ void xmit_delivery_enabled_frames(struct adapter 
*padapter, struct sta_info *pst
xmitframe_plist = xmitframe_phead->next;
 
while (xmitframe_phead != xmitframe_plist) {
-   pxmitframe = container_of(xmitframe_plist, struct xmit_frame, 
list);
+   pxmitframe = list_entry(xmitframe_plist, struct xmit_frame, 
list);
 
xmitframe_plist = xmitframe_plist->next;
 
-- 
2.7.4



[RFC PATCH v2 2/8] cpuidle: record the overhead of idle entry

2017-09-30 Thread Aubrey Li
Record the overhead of idle entry in microseconds.
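
For context, the averaging done in cpuidle_entry_end() below is an exponentially
weighted moving average with a 1/8 weight. A minimal sketch of the same update
rule (the helper name is illustrative, not part of the patch):

/*
 * Fold each new sample into the running average with a 1/8 weight,
 * i.e. avg += (sample - avg) / 8, which smooths jitter while still
 * tracking changes in the idle-entry overhead.
 */
static u64 update_overhead_avg(u64 avg_us, u64 sample_us)
{
	s64 diff = sample_us - avg_us;

	return avg_us + (diff >> 3);
}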

Signed-off-by: Aubrey Li 
---
 drivers/cpuidle/cpuidle.c | 33 +
 include/linux/cpuidle.h   | 14 ++
 kernel/sched/idle.c   |  8 +++-
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 60bb64f..4066308 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -302,6 +302,39 @@ void cpuidle_reflect(struct cpuidle_device *dev, int index)
cpuidle_curr_governor->reflect(dev, index);
 }
 
+/* cpuidle_entry_start - record idle entry start */
+void cpuidle_entry_start(void)
+{
+   struct cpuidle_device *dev = cpuidle_get_device();
+
+   if (dev)
+   dev->idle_stat.entry_start = local_clock();
+}
+
+/*
+ * cpuidle_entry_end - record idle entry end, and maintain
+ * the entry overhead average in micro-second
+ */
+void cpuidle_entry_end(void)
+{
+   struct cpuidle_device *dev = cpuidle_get_device();
+   u64 overhead;
+   s64 diff;
+
+   if (dev) {
+   dev->idle_stat.entry_end = local_clock();
+   overhead = div_u64(dev->idle_stat.entry_end -
+   dev->idle_stat.entry_start, NSEC_PER_USEC);
+   diff = overhead - dev->idle_stat.overhead;
+   dev->idle_stat.overhead += diff >> 3;
+   /*
+* limit overhead to 1us
+*/
+   if (dev->idle_stat.overhead == 0)
+   dev->idle_stat.overhead = 1;
+   }
+}
+
 /**
  * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
  */
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index fc1e5d7..cad9b71 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -72,6 +72,15 @@ struct cpuidle_device_kobj;
 struct cpuidle_state_kobj;
 struct cpuidle_driver_kobj;
 
+struct cpuidle_stat {
+   u64 entry_start;/* nanosecond */
+   u64 entry_end;  /* nanosecond */
+   u64 overhead;   /* nanosecond */
+   unsigned intpredicted_us;   /* microsecond */
+   boolpredicted;  /* ever predicted? */
+   boolfast_idle;  /* fast idle? */
+};
+
 struct cpuidle_device {
unsigned intregistered:1;
unsigned intenabled:1;
@@ -89,6 +98,7 @@ struct cpuidle_device {
cpumask_t   coupled_cpus;
struct cpuidle_coupled  *coupled;
 #endif
+   struct cpuidle_stat idle_stat;
 };
 
 DECLARE_PER_CPU(struct cpuidle_device *, cpuidle_devices);
@@ -131,6 +141,8 @@ extern bool cpuidle_not_available(struct cpuidle_driver 
*drv,
 
 extern int cpuidle_select(struct cpuidle_driver *drv,
  struct cpuidle_device *dev);
+extern void cpuidle_entry_start(void);
+extern void cpuidle_entry_end(void);
 extern int cpuidle_enter(struct cpuidle_driver *drv,
 struct cpuidle_device *dev, int index);
 extern void cpuidle_reflect(struct cpuidle_device *dev, int index);
@@ -164,6 +176,8 @@ static inline bool cpuidle_not_available(struct 
cpuidle_driver *drv,
 static inline int cpuidle_select(struct cpuidle_driver *drv,
 struct cpuidle_device *dev)
 {return -ENODEV; }
+static inline void cpuidle_entry_start(void) { }
+static inline void cpuidle_entry_end(void) { }
 static inline int cpuidle_enter(struct cpuidle_driver *drv,
struct cpuidle_device *dev, int index)
 {return -ENODEV; }
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 6c23e30..0951dac 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -210,6 +210,12 @@ static void cpuidle_idle_call(void)
 static void do_idle(void)
 {
/*
+* we record idle entry overhead now, so any deferrable items
+* in idle entry path need to be placed between cpuidle_entry_start()
+* and cpuidle_entry_end()
+*/
+   cpuidle_entry_start();
+   /*
 * If the arch has a polling bit, we maintain an invariant:
 *
 * Our polling bit is clear if we're not scheduled (i.e. if rq->curr !=
@@ -217,10 +223,10 @@ static void do_idle(void)
 * then setting need_resched is guaranteed to cause the CPU to
 * reschedule.
 */
-
__current_set_polling();
quiet_vmstat();
tick_nohz_idle_enter();
+   cpuidle_entry_end();
 
while (!need_resched()) {
check_pgt_cache();
-- 
2.7.4



[RFC PATCH v2 7/8] cpuidle: introduce irq timing to make idle prediction

2017-09-30 Thread Aubrey Li
Introduce the IRQ timings output as a factor in predicting the duration
of the coming idle period.

Signed-off-by: Aubrey Li 
---
 drivers/cpuidle/Kconfig   |  1 +
 drivers/cpuidle/cpuidle.c | 17 -
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index 7e48eb5..8b07e1c 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -5,6 +5,7 @@ config CPU_IDLE
default y if ACPI || PPC_PSERIES
select CPU_IDLE_GOV_LADDER if (!NO_HZ && !NO_HZ_IDLE)
select CPU_IDLE_GOV_MENU if (NO_HZ || NO_HZ_IDLE)
+   select IRQ_TIMINGS
help
  CPU idle is a generic framework for supporting software-controlled
  idle processor power management.  It includes modular cross-platform
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 5d4f0b6..be56cea 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "cpuidle.h"
@@ -342,13 +343,27 @@ void cpuidle_entry_end(void)
 void cpuidle_predict(void)
 {
struct cpuidle_device *dev = cpuidle_get_device();
-   unsigned int overhead_threshold;
+   unsigned int idle_interval, overhead_threshold;
+   u64 now, next_evt;
 
if (!dev)
return;
 
overhead_threshold = dev->idle_stat.overhead * sysctl_fast_idle_ratio;
 
+   /*
+* check irq timings if the next event is coming soon
+*/
+   now = local_clock();
+   local_irq_disable();
+   next_evt = irq_timings_next_event(now);
+   local_irq_enable();
+   idle_interval = div_u64(next_evt - now, NSEC_PER_USEC);
+   if (idle_interval < overhead_threshold) {
+   dev->idle_stat.fast_idle = true;
+   return;
+   }
+
if (cpuidle_curr_governor->predict) {
dev->idle_stat.predicted_us = cpuidle_curr_governor->predict();
/*
-- 
2.7.4



[RFC PATCH v2 3/8] cpuidle: add a new predict interface

2017-09-30 Thread Aubrey Li
For governors that have prediction functionality, add a new predict
interface to the cpuidle framework to call and use it.
---
 drivers/cpuidle/cpuidle.c| 34 ++
 drivers/cpuidle/governors/menu.c |  7 +++
 include/linux/cpuidle.h  |  3 +++
 kernel/sched/idle.c  |  1 +
 4 files changed, 45 insertions(+)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 4066308..ef6f7dd 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -336,6 +336,40 @@ void cpuidle_entry_end(void)
 }
 
 /**
+ * cpuidle_predict - predict whether the coming idle is a fast idle or not
+ */
+void cpuidle_predict(void)
+{
+   struct cpuidle_device *dev = cpuidle_get_device();
+   unsigned int overhead_threshold;
+
+   if (!dev)
+   return;
+
+   overhead_threshold = dev->idle_stat.overhead;
+
+   if (cpuidle_curr_governor->predict) {
+   dev->idle_stat.predicted_us = cpuidle_curr_governor->predict();
+   /*
+* notify idle governor to avoid reduplicative
+* prediction computation
+*/
+   dev->idle_stat.predicted = true;
+   if (dev->idle_stat.predicted_us < overhead_threshold) {
+   /*
+* notify tick subsystem to keep ticking
+* for the coming idle
+*/
+   dev->idle_stat.fast_idle = true;
+   } else
+   dev->idle_stat.fast_idle = false;
+   } else {
+   dev->idle_stat.predicted = false;
+   dev->idle_stat.fast_idle = false;
+   }
+}
+
+/**
  * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
  */
 void cpuidle_install_idle_handler(void)
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 6bed197..90b2a10 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -344,6 +344,12 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev)
if (unlikely(latency_req == 0))
return 0;
 
+   /*don't predict again if idle framework already did it */
+   if (!dev->idle_stat.predicted)
+   menu_predict();
+   else
+   dev->idle_stat.predicted = false;
+
if (CPUIDLE_DRIVER_STATE_START > 0) {
struct cpuidle_state *s = 
&drv->states[CPUIDLE_DRIVER_STATE_START];
unsigned int polling_threshold;
@@ -518,6 +524,7 @@ static struct cpuidle_governor menu_governor = {
.enable =   menu_enable_device,
.select =   menu_select,
.reflect =  menu_reflect,
+   .predict =  menu_predict,
 };
 
 /**
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index cad9b71..9ca0288 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -143,6 +143,7 @@ extern int cpuidle_select(struct cpuidle_driver *drv,
  struct cpuidle_device *dev);
 extern void cpuidle_entry_start(void);
 extern void cpuidle_entry_end(void);
+extern void cpuidle_predict(void);
 extern int cpuidle_enter(struct cpuidle_driver *drv,
 struct cpuidle_device *dev, int index);
 extern void cpuidle_reflect(struct cpuidle_device *dev, int index);
@@ -178,6 +179,7 @@ static inline int cpuidle_select(struct cpuidle_driver *drv,
 {return -ENODEV; }
 static inline void cpuidle_entry_start(void) { }
 static inline void cpuidle_entry_end(void) { }
+static inline void cpuidle_predict(void) { }
 static inline int cpuidle_enter(struct cpuidle_driver *drv,
struct cpuidle_device *dev, int index)
 {return -ENODEV; }
@@ -255,6 +257,7 @@ struct cpuidle_governor {
int  (*select)  (struct cpuidle_driver *drv,
struct cpuidle_device *dev);
void (*reflect) (struct cpuidle_device *dev, int index);
+   unsigned int (*predict)(void);
 };
 
 #ifdef CONFIG_CPU_IDLE
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 0951dac..8704f3c 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -225,6 +225,7 @@ static void do_idle(void)
 */
__current_set_polling();
quiet_vmstat();
+   cpuidle_predict();
tick_nohz_idle_enter();
cpuidle_entry_end();
 
-- 
2.7.4



[RFC PATCH v2 8/8] cpuidle: introduce run queue average idle to make idle prediction

2017-09-30 Thread Aubrey Li
Introduce the scheduler's run queue average idle time as a factor in
making the idle prediction.

Signed-off-by: Aubrey Li 
---
 drivers/cpuidle/cpuidle.c | 12 
 include/linux/cpuidle.h   |  1 +
 kernel/sched/idle.c   |  5 +
 3 files changed, 18 insertions(+)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index be56cea..9424a2d 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -364,6 +364,18 @@ void cpuidle_predict(void)
return;
}
 
+   /*
+* check scheduler if the coming idle is likely a fast idle
+*/
+   idle_interval = div_u64(sched_idle_avg(), NSEC_PER_USEC);
+   if (idle_interval < overhead_threshold) {
+   dev->idle_stat.fast_idle = true;
+   return;
+   }
+
+   /*
+* check the idle governor if the coming idle is likely a fast idle
+*/
if (cpuidle_curr_governor->predict) {
dev->idle_stat.predicted_us = cpuidle_curr_governor->predict();
/*
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 45b8264..387d72b 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -234,6 +234,7 @@ static inline void cpuidle_use_deepest_state(bool enable)
 /* kernel/sched/idle.c */
 extern void sched_idle_set_state(struct cpuidle_state *idle_state);
 extern void default_idle_call(void);
+extern u64 sched_idle_avg(void);
 
 #ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
 void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev, atomic_t *a);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 8704f3c..d23b472 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -30,6 +30,11 @@ void sched_idle_set_state(struct cpuidle_state *idle_state)
idle_set_state(this_rq(), idle_state);
 }
 
+u64 sched_idle_avg(void)
+{
+   return this_rq()->avg_idle;
+}
+
 static int __read_mostly cpu_idle_force_poll;
 
 void cpu_idle_poll_ctrl(bool enable)
-- 
2.7.4



[RFC PATCH v2 6/8] cpuidle: make fast idle threshold tunable

2017-09-30 Thread Aubrey Li
Add a knob to make the fast idle threshold tunable.

Signed-off-by: Aubrey Li 
---
 drivers/cpuidle/cpuidle.c |  3 ++-
 include/linux/cpuidle.h   |  1 +
 kernel/sysctl.c   | 12 
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 6cb7e17..5d4f0b6 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -35,6 +35,7 @@ LIST_HEAD(cpuidle_detected_devices);
 static int enabled_devices;
 static int off __read_mostly;
 static int initialized __read_mostly;
+int sysctl_fast_idle_ratio = 10;
 
 int cpuidle_disabled(void)
 {
@@ -346,7 +347,7 @@ void cpuidle_predict(void)
if (!dev)
return;
 
-   overhead_threshold = dev->idle_stat.overhead;
+   overhead_threshold = dev->idle_stat.overhead * sysctl_fast_idle_ratio;
 
if (cpuidle_curr_governor->predict) {
dev->idle_stat.predicted_us = cpuidle_curr_governor->predict();
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 791db15..45b8264 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -24,6 +24,7 @@ struct module;
 struct cpuidle_device;
 struct cpuidle_driver;
 
+extern int sysctl_fast_idle_ratio;
 
 /
  * CPUIDLE DEVICE INTERFACE *
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6648fbb..97f7e8af 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1229,6 +1230,17 @@ static struct ctl_table kern_table[] = {
.extra2 = &one,
},
 #endif
+#ifdef CONFIG_CPU_IDLE
+   {
+   .procname   = "fast_idle_ratio",
+   .data   = &sysctl_fast_idle_ratio,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = &one,
+   .extra2 = &one_hundred,
+   },
+#endif
{ }
 };
 
-- 
2.7.4



[RFC PATCH v2 5/8] timers: keep sleep length updated as needed

2017-09-30 Thread Aubrey Li
The sleep length indicates how long we will be idle. Currently, it is updated
only on tick-nohz entry. This patch series adds a new requirement on the
tick, so keep the sleep length updated as needed.
---
 kernel/time/tick-sched.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d663fab..94fb9b8 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1008,8 +1008,11 @@ void tick_nohz_irq_exit(void)
  */
 ktime_t tick_nohz_get_sleep_length(void)
 {
+   struct clock_event_device *dev = 
__this_cpu_read(tick_cpu_device.evtdev);
struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
 
+   ts->sleep_length = ktime_sub(dev->next_event, ktime_get());
+
return ts->sleep_length;
 }
 
-- 
2.7.4



[RFC PATCH v2 1/8] cpuidle: menu: extract prediction functionality

2017-09-30 Thread Aubrey Li
The menu governor uses several factors to predict the next
idle interval:
- the next timer
- the recent idle interval history
- the corrected idle interval pattern
These factors are common enough to be extracted into one function.

Signed-off-by: Aubrey Li 
---
 drivers/cpuidle/governors/menu.c | 64 +---
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 61b64c2..6bed197 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -276,6 +276,45 @@ static unsigned int get_typical_interval(struct 
menu_device *data)
 }
 
 /**
+ * menu_predict - predict the coming idle interval
+ *
+ * Return the predicted coming idle interval in micro-second
+ */
+static unsigned int menu_predict(void)
+{
+   struct menu_device *data = this_cpu_ptr(&menu_devices);
+   unsigned int expected_interval;
+   int cpu = smp_processor_id();
+
+   if (!data)
+   return UINT_MAX;
+
+   /* determine the expected residency time, round up */
+   data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length());
+
+   data->bucket = which_bucket(data->next_timer_us, nr_iowait_cpu(cpu));
+
+   /*
+* Force the result of multiplication to be 64 bits even if both
+* operands are 32 bits.
+* Make sure to round up for half microseconds.
+*/
+   data->predicted_us = DIV_ROUND_CLOSEST_ULL((uint64_t)
+   data->next_timer_us * data->correction_factor[data->bucket],
+   RESOLUTION * DECAY);
+
+   expected_interval = get_typical_interval(data);
+   expected_interval = min(expected_interval, data->next_timer_us);
+
+   /*
+* Use the lowest expected idle interval to pick the idle state.
+*/
+   data->predicted_us = min(data->predicted_us, expected_interval);
+
+   return data->predicted_us;
+}
+
+/**
  * menu_select - selects the next idle state to enter
  * @drv: cpuidle driver containing state data
  * @dev: the CPU
@@ -289,7 +328,6 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev)
int first_idx;
int idx;
unsigned int interactivity_req;
-   unsigned int expected_interval;
unsigned long nr_iowaiters, cpu_load;
int resume_latency = dev_pm_qos_raw_read_value(device);
 
@@ -306,24 +344,6 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev)
if (unlikely(latency_req == 0))
return 0;
 
-   /* determine the expected residency time, round up */
-   data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length());
-
-   get_iowait_load(&nr_iowaiters, &cpu_load);
-   data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
-
-   /*
-* Force the result of multiplication to be 64 bits even if both
-* operands are 32 bits.
-* Make sure to round up for half microseconds.
-*/
-   data->predicted_us = 
DIV_ROUND_CLOSEST_ULL((uint64_t)data->next_timer_us *
-data->correction_factor[data->bucket],
-RESOLUTION * DECAY);
-
-   expected_interval = get_typical_interval(data);
-   expected_interval = min(expected_interval, data->next_timer_us);
-
if (CPUIDLE_DRIVER_STATE_START > 0) {
struct cpuidle_state *s = 
&drv->states[CPUIDLE_DRIVER_STATE_START];
unsigned int polling_threshold;
@@ -345,14 +365,10 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev)
}
 
/*
-* Use the lowest expected idle interval to pick the idle state.
-*/
-   data->predicted_us = min(data->predicted_us, expected_interval);
-
-   /*
 * Use the performance multiplier and the user-configurable
 * latency_req to determine the maximum exit latency.
 */
+   get_iowait_load(&nr_iowaiters, &cpu_load);
interactivity_req = data->predicted_us / 
performance_multiplier(nr_iowaiters, cpu_load);
if (latency_req > interactivity_req)
latency_req = interactivity_req;
-- 
2.7.4



[RFC PATCH v2 4/8] tick/nohz: keep tick on for a fast idle

2017-09-30 Thread Aubrey Li
If the next idle period is expected to be a fast idle, we should keep the tick
on before going into idle.

Signed-off-by: Aubrey Li 
---
 drivers/cpuidle/cpuidle.c | 14 ++
 include/linux/cpuidle.h   |  2 ++
 kernel/time/tick-sched.c  |  4 
 3 files changed, 20 insertions(+)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ef6f7dd..6cb7e17 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -370,6 +370,20 @@ void cpuidle_predict(void)
 }
 
 /**
+ * cpuidle_fast_idle - predict whether or not the coming idle is a fast idle
+ * This function can be called in irq exit path, make it as soon as possible
+ */
+bool cpuidle_fast_idle(void)
+{
+   struct cpuidle_device *dev = cpuidle_get_device();
+
+   if (!dev)
+   return false;
+
+   return dev->idle_stat.fast_idle;
+}
+
+/**
  * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
  */
 void cpuidle_install_idle_handler(void)
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 9ca0288..791db15 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -144,6 +144,7 @@ extern int cpuidle_select(struct cpuidle_driver *drv,
 extern void cpuidle_entry_start(void);
 extern void cpuidle_entry_end(void);
 extern void cpuidle_predict(void);
+extern bool cpuidle_fast_idle(void);
 extern int cpuidle_enter(struct cpuidle_driver *drv,
 struct cpuidle_device *dev, int index);
 extern void cpuidle_reflect(struct cpuidle_device *dev, int index);
@@ -180,6 +181,7 @@ static inline int cpuidle_select(struct cpuidle_driver *drv,
 static inline void cpuidle_entry_start(void) { }
 static inline void cpuidle_entry_end(void) { }
 static inline void cpuidle_predict(void) { }
+static inline void cpuidle_fast_idle(void) {return false; }
 static inline int cpuidle_enter(struct cpuidle_driver *drv,
struct cpuidle_device *dev, int index)
 {return -ENODEV; }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c7a899c..d663fab 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -916,6 +917,9 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched 
*ts)
return false;
}
 
+   if (cpuidle_fast_idle())
+   return false;
+
return true;
 }
 
-- 
2.7.4



[RFC PATCH v2 0/8] Introduct cpu idle prediction functionality

2017-09-30 Thread Aubrey Li
We found that under some latency-intensive workloads short idle periods occur
very frequently and the idle entry and exit paths start to dominate, so it is
important to optimize them. To determine the short-idle pattern, we need to
figure out how long the coming idle period will be and what the threshold for a
short idle interval is.

CPU idle prediction functionality is introduced in this proposal to catch
the short-idle pattern.

First, we check the IRQ timings subsystem to see whether an event is
coming soon.
-- https://lwn.net/Articles/691297/

Second, we check the scheduler's idle statistics to see whether we are likely
to go into a short idle.
-- https://patchwork.kernel.org/patch/2839221/

Third, we predict the next idle interval using the prediction
functionality in the idle governor, if it provides one.

For the threshold of the short idle interval, we record timestamps around
idle entry to measure its overhead and multiply that by a tunable parameter:
-- /proc/sys/kernel/fast_idle_ratio

In this proposal we use the output of the idle prediction to skip turning the
tick off when a short idle is determined. Reprogramming the hardware timer
twice (off and on) is expensive for a very short idle. Some further
optimizations can be done based on the same indicator.
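
Putting the three checks together, here is a condensed sketch of the decision
flow (not the literal code; the real logic lives in cpuidle_predict() in this
series and also handles IRQ disabling and caches the verdict in dev->idle_stat):

/* threshold = measured idle-entry overhead * fast_idle_ratio */
static bool coming_idle_is_fast(struct cpuidle_device *dev)
{
	unsigned int threshold_us = dev->idle_stat.overhead *
				    sysctl_fast_idle_ratio;
	u64 now, next_irq;

	/* 1) next event predicted by the IRQ timings subsystem */
	now = local_clock();
	next_irq = irq_timings_next_event(now);
	if (div_u64(next_irq - now, NSEC_PER_USEC) < threshold_us)
		return true;

	/* 2) scheduler's average of recent idle lengths */
	if (div_u64(sched_idle_avg(), NSEC_PER_USEC) < threshold_us)
		return true;

	/* 3) the idle governor's own prediction, if it provides one */
	if (cpuidle_curr_governor->predict)
		return cpuidle_curr_governor->predict() < threshold_us;

	return false;
}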

I observed that when the system is idle, the idle predictor reports 20 long
idles/s and zero fast idles on one CPU. When the workload is running, the idle
predictor reports 72899 fast idles/s and zero long idles on the same CPU.

Aubrey Li (8):
  cpuidle: menu: extract prediction functionality
  cpuidle: record the overhead of idle entry
  cpuidle: add a new predict interface
  tick/nohz: keep tick on for a fast idle
  timers: keep sleep length updated as needed
  cpuidle: make fast idle threshold tunable
  cpuidle: introduce irq timing to make idle prediction
  cpuidle: introduce run queue average idle to make idle prediction

 drivers/cpuidle/Kconfig  |   1 +
 drivers/cpuidle/cpuidle.c| 109 +++
 drivers/cpuidle/governors/menu.c |  69 -
 include/linux/cpuidle.h  |  21 
 kernel/sched/idle.c  |  14 -
 kernel/sysctl.c  |  12 +
 kernel/time/tick-sched.c |   7 +++
 7 files changed, 209 insertions(+), 24 deletions(-)

-- 
2.7.4



Re: [lkp-robot] [blk] 47e0fb461f: BUG:unable_to_handle_kernel

2017-09-30 Thread NeilBrown
On Thu, Sep 21 2017, kernel test robot wrote:

> FYI, we noticed the following commit:
>
> commit: 47e0fb461fca1a68a566c82fcc006cc787312d8c ("blk: make the bioset 
> rescue_workqueue optional.")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: trinity
> with following parameters:
>
>   runtime: 300s
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu IvyBridge -m 420M
>
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):

Interesting.
I cannot see how that bug could be caused by that patch.

I think it is crashing in
static inline bool ata_is_host_link(const struct ata_link *link)
{
return link == &link->ap->link || link == link->ap->slave_link;
}
from
static inline int ata_link_max_devices(const struct ata_link *link)
{
if (ata_is_host_link(link) && link->ap->flags & ATA_FLAG_SLAVE_POSS)
return 2;
return 1;
}
from ata_dev_next().

I think %rdi holds link->ap, so the "link->ap->slave_link" dereference
causes the crash.

link->ap seems to be initialized quite early, and never cleared, so I
don't know how it could be NULL...

Confused.

Thanks,
NeilBrown




Re: [PATCH v2] netlink: do not proceed if dump's start() errs

2017-09-30 Thread Johannes Berg
On Thu, 2017-09-28 at 00:41 +0200, Jason A. Donenfeld wrote:
> Drivers that use the start method for netlink dumping rely on dumpit not
> being called if start fails. For example, ila_xlat.c allocates memory
> and assigns it to cb->args[0] in its start() function. It might fail to
> do that and return -ENOMEM instead. However, even when returning an
> error, dumpit will be called, which, in the example above, quickly
> dereferences the memory in cb->args[0], which will OOPS the kernel.
> This is but one example of how this goes wrong.
> 
> Since start() has always been a function with an int return type, it
> therefore makes sense to use it properly, rather than ignoring it. This
> patch thus returns early and does not call dumpit() when start() fails.
> 
> Signed-off-by: Jason A. Donenfeld 

Reviewed-by: Johannes Berg 


FWIW, I found another (indirect, via genetlink, like ila_xlat.c) in-tree
user that cares and expects the correct failure behaviour:

net/ipv6/seg6.c
.start  = seg6_genl_dumphmac_start,

which can also have memory allocation failures. No others appear to
exist, afaict.

Either way, perhaps it's worth sending this to stable for that reason.
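
For illustration, a minimal sketch of the start()/dumpit() pattern the patch
protects (names are hypothetical, modeled on the ila_xlat.c example in the
quoted description): start() allocates per-dump state into cb->args[0] and
dumpit() dereferences it, so dumpit() must not run once start() has failed.

#include <linux/netlink.h>
#include <linux/slab.h>

struct foo_dump_ctx {
	int cursor;
};

static int foo_dump_start(struct netlink_callback *cb)
{
	struct foo_dump_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);

	if (!ctx)
		return -ENOMEM;	/* dumpit() must not be called after this */

	cb->args[0] = (long)ctx;
	return 0;
}

static int foo_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
{
	struct foo_dump_ctx *ctx = (struct foo_dump_ctx *)cb->args[0];

	/* would dereference a bogus pointer here if start() had failed */
	return ctx->cursor;
}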

johannes


[PATCH v2] Staging: rtl8723bs: Remove unnecessary comments.

2017-09-30 Thread Shreeya Patel
This patch removes unnecessary comments that explain why the calls to
memset have been commented out. These comments are not needed, as they
are not very useful.

Signed-off-by: Shreeya Patel 
---

Changes in v2:
  -Remove some more unnecessary comments and make the 
   commit message more appropriate.


 drivers/staging/rtl8723bs/core/rtw_mlme.c | 3 ---
 drivers/staging/rtl8723bs/core/rtw_mlme_ext.c | 3 ---
 drivers/staging/rtl8723bs/core/rtw_pwrctrl.c  | 2 --
 drivers/staging/rtl8723bs/core/rtw_recv.c | 4 
 drivers/staging/rtl8723bs/core/rtw_xmit.c | 3 ---
 5 files changed, 15 deletions(-)

diff --git a/drivers/staging/rtl8723bs/core/rtw_mlme.c 
b/drivers/staging/rtl8723bs/core/rtw_mlme.c
index 6b77820..5b583f7 100644
--- a/drivers/staging/rtl8723bs/core/rtw_mlme.c
+++ b/drivers/staging/rtl8723bs/core/rtw_mlme.c
@@ -28,9 +28,6 @@ sint  _rtw_init_mlme_priv(struct adapter *padapter)
struct mlme_priv*pmlmepriv = &padapter->mlmepriv;
sintres = _SUCCESS;
 
-   /*  We don't need to memset padapter->XXX to zero, because adapter is 
allocated by vzalloc(). */
-   /* memset((u8 *)pmlmepriv, 0, sizeof(struct mlme_priv)); */
-
pmlmepriv->nic_hdl = (u8 *)padapter;
 
pmlmepriv->pscanned = NULL;
diff --git a/drivers/staging/rtl8723bs/core/rtw_mlme_ext.c 
b/drivers/staging/rtl8723bs/core/rtw_mlme_ext.c
index b6d137f..ca35c1c 100644
--- a/drivers/staging/rtl8723bs/core/rtw_mlme_ext.c
+++ b/drivers/staging/rtl8723bs/core/rtw_mlme_ext.c
@@ -474,9 +474,6 @@ int init_mlme_ext_priv(struct adapter *padapter)
struct mlme_priv *pmlmepriv = &(padapter->mlmepriv);
struct mlme_ext_info *pmlmeinfo = &(pmlmeext->mlmext_info);
 
-   /*  We don't need to memset padapter->XXX to zero, because adapter is 
allocated by vzalloc(). */
-   /* memset((u8 *)pmlmeext, 0, sizeof(struct mlme_ext_priv)); */
-
pmlmeext->padapter = padapter;
 
/* fill_fwpriv(padapter, &(pmlmeext->fwpriv)); */
diff --git a/drivers/staging/rtl8723bs/core/rtw_pwrctrl.c 
b/drivers/staging/rtl8723bs/core/rtw_pwrctrl.c
index aabdaaf..820a061 100644
--- a/drivers/staging/rtl8723bs/core/rtw_pwrctrl.c
+++ b/drivers/staging/rtl8723bs/core/rtw_pwrctrl.c
@@ -1193,8 +1193,6 @@ void rtw_init_pwrctrl_priv(struct adapter *padapter)
 
 void rtw_free_pwrctrl_priv(struct adapter *adapter)
 {
-   /* memset((unsigned char *)pwrctrlpriv, 0, sizeof(struct 
pwrctrl_priv)); */
-
 #ifdef CONFIG_PNO_SUPPORT
if (pwrctrlpriv->pnlo_info != NULL)
printk("** pnlo_info memory leak\n");
diff --git a/drivers/staging/rtl8723bs/core/rtw_recv.c 
b/drivers/staging/rtl8723bs/core/rtw_recv.c
index 68a6303..73e6e41 100644
--- a/drivers/staging/rtl8723bs/core/rtw_recv.c
+++ b/drivers/staging/rtl8723bs/core/rtw_recv.c
@@ -46,9 +46,6 @@ sint _rtw_init_recv_priv(struct recv_priv *precvpriv, struct 
adapter *padapter)
union recv_frame *precvframe;
sintres = _SUCCESS;
 
-   /*  We don't need to memset padapter->XXX to zero, because adapter is 
allocated by vzalloc(). */
-   /* memset((unsigned char *)precvpriv, 0, sizeof (struct  recv_priv)); */
-
spin_lock_init(&precvpriv->lock);
 
_rtw_init_queue(&precvpriv->free_recv_queue);
@@ -65,7 +62,6 @@ sint _rtw_init_recv_priv(struct recv_priv *precvpriv, struct 
adapter *padapter)
res = _FAIL;
goto exit;
}
-   /* memset(precvpriv->pallocated_frame_buf, 0, NR_RECVFRAME * 
sizeof(union recv_frame) + RXFRAME_ALIGN_SZ); */
 
precvpriv->precv_frame_buf = (u8 
*)N_BYTE_ALIGMENT((SIZE_PTR)(precvpriv->pallocated_frame_buf), 
RXFRAME_ALIGN_SZ);
/* precvpriv->precv_frame_buf = precvpriv->pallocated_frame_buf + 
RXFRAME_ALIGN_SZ - */
diff --git a/drivers/staging/rtl8723bs/core/rtw_xmit.c 
b/drivers/staging/rtl8723bs/core/rtw_xmit.c
index 022f654..8cd05f8 100644
--- a/drivers/staging/rtl8723bs/core/rtw_xmit.c
+++ b/drivers/staging/rtl8723bs/core/rtw_xmit.c
@@ -51,9 +51,6 @@ s32   _rtw_init_xmit_priv(struct xmit_priv *pxmitpriv, struct 
adapter *padapter)
struct xmit_frame *pxframe;
sintres = _SUCCESS;
 
-   /*  We don't need to memset padapter->XXX to zero, because adapter is 
allocated by vzalloc(). */
-   /* memset((unsigned char *)pxmitpriv, 0, sizeof(struct xmit_priv)); */
-
spin_lock_init(&pxmitpriv->lock);
spin_lock_init(&pxmitpriv->lock_sctx);
sema_init(&pxmitpriv->xmit_sema, 0);
-- 
2.7.4



Re: [Part2 PATCH v4 02/29] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

2017-09-30 Thread Borislav Petkov
On Fri, Sep 29, 2017 at 05:44:24PM -0500, Brijesh Singh wrote:
> Part1 is based on tip/master and Part2 is based on kvm/master.
> 
> With the current division, we should be able to compile and run part1
> and part2 independently. This patch defines X86_FEATURE_SEV, which is
> currently being used by svm.c, hence I kept the patch in Part2.
> 
> If we move it to Part1 then the Part2 build will fail -- I am okay with
> including it as a precursor to the Part2 series. Is this acceptable?

No no, don't do anything. I was just wondering about the reason for the move.

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v3 4/8] platform/x86: wmi: create character devices when requested by drivers

2017-09-30 Thread Greg Kroah-Hartman
On Fri, Sep 29, 2017 at 06:52:28PM -0700, Darren Hart wrote:
> 
> On Wed, Sep 27, 2017 at 11:02:16PM -0500, Mario Limonciello wrote:
> > For WMI operations that are only Set or Query, read or write sysfs
> > attributes created by WMI vendor drivers make sense.
> > 
> > For other WMI operations that are run on a Method, there needs to be a
> > way to guarantee to userspace that the results from the method call
> > belong to the data requested in the method call.  Sysfs attributes don't
> > work well in this scenario because two userspace processes may be
> > competing to read/write an attribute and step on each other's
> > data.
> > 
> > When a WMI vendor driver declares a set of functions in a
> > file_operations object the WMI bus driver will create a character
> > device that maps to those file operations.
> > 
> > That character device will correspond to this path:
> > /dev/wmi/$driver
> > 
> > This policy is selected as one driver may map and use multiple
> > GUIDs and it would be better to only expose a single character
> > device.
> > 
> > The WMI vendor drivers will be responsible for managing access to
> > this character device and proper locking on it.
> > 
> > When a WMI vendor driver is unloaded the WMI bus driver will clean
> > up the character device.
> > 
> > Signed-off-by: Mario Limonciello 
> > ---
> >  drivers/platform/x86/wmi.c | 98 
> > +++---
> >  include/linux/wmi.h|  1 +
> >  2 files changed, 94 insertions(+), 5 deletions(-)
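
As a purely illustrative sketch of the vendor-driver side described above (the
exact hook for handing the file_operations to the WMI bus is defined by this
series, so everything here is hypothetical): the driver supplies a
file_operations, and the bus exposes it as a character device under
/dev/wmi/$driver.

#include <linux/fs.h>
#include <linux/module.h>

static long example_wmi_ioctl(struct file *filp, unsigned int cmd,
			      unsigned long arg)
{
	/* run the WMI Method for this request and copy the result back,
	 * so the reply stays tied to the caller's own request */
	return 0;
}

static const struct file_operations example_wmi_fops = {
	.owner		= THIS_MODULE,
	.unlocked_ioctl	= example_wmi_ioctl,
};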
> 
> +Greg, Rafael, Matthew, and Christoph
> 
> You each provided feedback regarding the method of exposing WMI methods
> to userspace. This and subsequent patches from Mario lay some of the
> core groundwork.
> 
> They implement an implicit whitelist as only drivers requesting the char
> dev will see it created.
> 
> https://lkml.org/lkml/2017/9/28/8

If you want patches reviewed, it's best to actually cc: us on the patch
itself :(


Re: [PATCH v2 1/7] driver core: emit uevents when device is bound to a driver

2017-09-30 Thread Greg Kroah-Hartman
On Fri, Sep 29, 2017 at 04:23:20PM -0700, Dmitry Torokhov wrote:
> On Fri, Sep 29, 2017 at 07:40:15PM +, Ruhl, Michael J wrote:
> > > -Original Message-
> > > From: dan.j.willi...@gmail.com [mailto:dan.j.willi...@gmail.com] On
> > > Behalf Of Dan Williams
> > > Sent: Friday, September 29, 2017 3:37 PM
> > > To: Dmitry Torokhov 
> > > Cc: Greg Kroah-Hartman ; Tejun Heo
> > > ; Linux Kernel Mailing List  > > ker...@vger.kernel.org>; Guenter Roeck ; Ruhl,
> > > Michael J 
> > > Subject: Re: [PATCH v2 1/7] driver core: emit uevents when device is bound
> > > to a driver
> > > 
> > > On Wed, Jul 19, 2017 at 5:24 PM, Dmitry Torokhov
> > >  wrote:
> > > > There are certain touch controllers that may come up in either normal
> > > > (application) or boot mode, depending on whether firmware/configuration
> > > is
> > > > corrupted when they are powered on. In boot mode the kernel does not
> > > create
> > > > input device instance (because it does not necessarily know the
> > > > characteristics of the input device in question).
> > > >
> > > > Another number of controllers does not store firmware in a non-volatile
> > > > memory, and they similarly need to have firmware loaded before input
> > > device
> > > > instance is created. There are also other types of devices with similar
> > > > behavior.
> > > >
> > > > There is a desire to be able to trigger firmware loading via udev, but 
> > > > it
> > > > has to happen only when driver is bound to a physical device (i2c or 
> > > > spi).
> > > > These udev actions can not use ADD events, as those happen too early, so
> > > we
> > > > are introducing BIND and UNBIND events that are emitted at the right
> > > > moment.
> > > >
> > > > Also, many drivers create additional driver-specific device attributes
> > > > when binding to the device, to provide userspace with additional 
> > > > controls.
> > > > The new events allow userspace to adjust these driver-specific 
> > > > attributes
> > > > without worrying that they are not there yet.
> > > >
> > > > Signed-off-by: Dmitry Torokhov 
> > > 
> > > Hi Dmitry,
> > > 
> > > Mike (cc'd) reports a regression with this change:
> > > 
> > > ---
> > > 
> > > Previously, if I did:
> > > 
> > > # rmmod hfi1
> > > 
> > > The driver would be removed.
> > > 
> > > With 4.14.0-rc2+, when I remove the driver, the PCI bus is
> > > automatically re-probed and the driver re-loaded.
> > > 
> > > ---
> > > 
> > > A bisect points to commit 1455cf8dbfd0 "driver core: emit uevents when
> > > device is bound to a driver". I'm sending this because I have this
> > > mail in my archive, but I'll let Mike follow up with any other
> > > details.
> > 
> > My test environment is RedHat 7.3 GA + 4.14.0-rc2 kernel.
> > 
> > Blacklisting the driver keeps it from being autoloaded, but this didn't 
> > seem correct.
> > 
> > With the 4.13.x branch this did not occur
> 
> Yeah, udev is being stupid. Either change ACTION=="remove" to
> ACTION!="add" in /lib/udev/rules.d/80-drivers.rules or pick this patch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/commit/?h=driver-core-linus&id=6878e7de6af726de47f9f3bec649c3f49e786586

I have this fix queued up for Linus, sorry for the delay in getting it
to him, hope to do that today...

thanks,

greg k-h


[RFC PATCH 1/2] block: Treat all read ops as synchronous

2017-09-30 Thread Jeffy Chen
We added some in/out ops (e.g. REQ_OP_SCSI_IN/OUT), but op_is_sync()
currently only checks for REQ_OP_READ.

So treat all read (non-write) ops as synchronous.
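
For context, op_is_write() in the same header keys off the low bit of the
opcode (write-style ops have odd values), so !op_is_write(op) covers
REQ_OP_READ as well as the newer read-style ops such as REQ_OP_SCSI_IN.
From include/linux/blk_types.h:

static inline bool op_is_write(unsigned int op)
{
	return (op & 1);
}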

Fixes: aebf526b53ae ("block: fold cmd_type into the REQ_OP_ space")
Signed-off-by: Jeffy Chen 
---

 include/linux/blk_types.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index a2d2aa709cef..9b623ea4faa6 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -288,8 +288,7 @@ static inline bool op_is_flush(unsigned int op)
  */
 static inline bool op_is_sync(unsigned int op)
 {
-   return (op & REQ_OP_MASK) == REQ_OP_READ ||
-   (op & (REQ_SYNC | REQ_FUA | REQ_PREFLUSH));
+   return !op_is_write(op) || (op & (REQ_SYNC | REQ_FUA | REQ_PREFLUSH));
 }
 
 typedef unsigned int blk_qc_t;
-- 
2.11.0




[RFC PATCH 2/2] block/cfq: Fix memory leak of async cfqq

2017-09-30 Thread Jeffy Chen
Currently we only unref the async cfqqs in cfq_pd_offline, which would
not be called when CONFIG_CFQ_GROUP_IOSCHED is disabled.

Kmemleak reported:
unreferenced object 0xffc0cd9fc000 (size 240):
  comm "kworker/3:1", pid 52, jiffies 4294673527 (age 97.149s)
  hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 80 55 13 cf c0 ff ff ff  .U..
10 c0 9f cd c0 ff ff ff 00 00 00 00 00 00 00 00  
  backtrace:
[] kmemleak_alloc+0x58/0x8c
[] kmem_cache_alloc+0x184/0x268
[] cfq_get_queue.isra.11+0x144/0x2e0
[] cfq_set_request+0x1bc/0x444
[] elv_set_request+0x88/0x9c
[] get_request+0x494/0x914
[] blk_get_request+0xdc/0x160
[] scsi_execute+0x70/0x23c
[] scsi_test_unit_ready+0xf4/0x1ec

Signed-off-by: Jeffy Chen 
---

 block/cfq-iosched.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 9f342ef1ad42..75773eabecc7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -401,6 +401,7 @@ struct cfq_data {
 
 static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
 static void cfq_put_queue(struct cfq_queue *cfqq);
+static void cfqg_offline(struct cfq_group *cfqg);
 
 static struct cfq_rb_root *st_for(struct cfq_group *cfqg,
enum wl_class_t class,
@@ -1638,17 +1639,8 @@ static void cfq_pd_init(struct blkg_policy_data *pd)
 static void cfq_pd_offline(struct blkg_policy_data *pd)
 {
struct cfq_group *cfqg = pd_to_cfqg(pd);
-   int i;
-
-   for (i = 0; i < IOPRIO_BE_NR; i++) {
-   if (cfqg->async_cfqq[0][i])
-   cfq_put_queue(cfqg->async_cfqq[0][i]);
-   if (cfqg->async_cfqq[1][i])
-   cfq_put_queue(cfqg->async_cfqq[1][i]);
-   }
 
-   if (cfqg->async_idle_cfqq)
-   cfq_put_queue(cfqg->async_idle_cfqq);
+   cfqg_offline(cfqg);
 
/*
 * @blkg is going offline and will be ignored by
@@ -3741,6 +3733,21 @@ static void cfq_init_cfqq(struct cfq_data *cfqd, struct 
cfq_queue *cfqq,
cfqq->pid = pid;
 }
 
+static void cfqg_offline(struct cfq_group *cfqg)
+{
+   int i;
+
+   for (i = 0; i < IOPRIO_BE_NR; i++) {
+   if (cfqg->async_cfqq[0][i])
+   cfq_put_queue(cfqg->async_cfqq[0][i]);
+   if (cfqg->async_cfqq[1][i])
+   cfq_put_queue(cfqg->async_cfqq[1][i]);
+   }
+
+   if (cfqg->async_idle_cfqq)
+   cfq_put_queue(cfqg->async_idle_cfqq);
+}
+
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
 static void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
 {
@@ -4564,6 +4571,7 @@ static void cfq_exit_queue(struct elevator_queue *e)
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
blkcg_deactivate_policy(q, &blkcg_policy_cfq);
 #else
+   cfqg_offline(cfqd->root_group);
kfree(cfqd->root_group);
 #endif
kfree(cfqd);
-- 
2.11.0




Re: [PATCH v2 0/3] Add support rockchip RGB output interface

2017-09-30 Thread Sandy Huang

Do you have any suggestions for this series of patches?
Or should it be applied to drm-misc-next?

On 2017/9/22 11:00, Sandy Huang wrote:

These patches add support for Rockchip RGB output. Some Rockchip CRTCs, like the rv1108,
can directly output parallel and serial RGB data to a panel or to a conversion chip.
So we add this driver to probe the encoder and connector to support this case.

Sandy Huang (3):
   dt-bindings: Add document for rockchip RGB output interface
   drm/rockchip: Add support for Rockchip Soc RGB output interface
   drm/rockchip: vop: Add more RGB output interface type

  .../bindings/display/rockchip/rockchip-rgb.txt |  78 +
  drivers/gpu/drm/rockchip/Kconfig   |   9 +
  drivers/gpu/drm/rockchip/Makefile  |   1 +
  drivers/gpu/drm/rockchip/rockchip_drm_drv.c|   2 +
  drivers/gpu/drm/rockchip/rockchip_drm_drv.h|   1 +
  drivers/gpu/drm/rockchip/rockchip_drm_vop.h|   2 +
  drivers/gpu/drm/rockchip/rockchip_rgb.c| 343 +
  7 files changed, 436 insertions(+)
  create mode 100644 
Documentation/devicetree/bindings/display/rockchip/rockchip-rgb.txt
  create mode 100644 drivers/gpu/drm/rockchip/rockchip_rgb.c





Re: [RFC 0/5] x86/intel_rdt: Better diagnostics

2017-09-30 Thread Borislav Petkov
On Thu, Sep 21, 2017 at 02:08:15PM +0200, Borislav Petkov wrote:
> On Mon, Sep 18, 2017 at 03:18:38PM -0700, Luck, Tony wrote:
> > From: Tony Luck 
> > 
> > Chatting online with Boris to diagnose why his test cases for RDT
> > weren't working, we came up with either a good idea (in which case
> > I credit Boris) or a dumb one (in which case this is all my fault).
> 
> Ha! I can share the fault, no worries :-)
> 
> I'll test them on my box when I get a chance.

Ok, I ran latest tip/master which has your patches:

# mkdir p0 p1
# echo "L3:0=00fff;1=ff000\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
bash: echo: write error: Invalid argument
# cat info/last_cmd_status
non-hex character in mask ff000\nMB:0=50

<--- I think this needs to be fixed in the doc examples to say:

echo -e "L3:0=00fff;1=ff000\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata

i.e., you need to supply -e in order to interpret backslash chars.

# echo -e "L3:0=00fff;1=ff000\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
bash: echo: write error: Invalid argument
# cat info/last_cmd_status
unknown/unsupported resource name 'MB'

<--- Yap, much better.

# mkdir c1
# cat c1/schemata
L3:0=f;1=f
# echo "L3:0=3;1=3" > c1/schemata
# cat info/last_cmd_status
ok
# echo 1 > c1/tasks
# cat info/last_cmd_status
ok
#

Yap, thanks for doing this!

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


[PATCH] crypto: ccp: Build the AMD secure processor driver only with AMD CPU support

2017-09-30 Thread Borislav Petkov
Hi,

just a small Kconfig correction. Feel free to add it to your patchset.

Thx.

---
From: Borislav Petkov 

This is AMD-specific hardware so present it in Kconfig only when AMD
CPU support is enabled.

Signed-off-by: Borislav Petkov 
Cc: Brijesh Singh 
Cc: Tom Lendacky 
Cc: Gary Hook 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: linux-cry...@vger.kernel.org
---
 drivers/crypto/ccp/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/crypto/ccp/Kconfig b/drivers/crypto/ccp/Kconfig
index 627f3e61dcac..f58a6521270b 100644
--- a/drivers/crypto/ccp/Kconfig
+++ b/drivers/crypto/ccp/Kconfig
@@ -1,5 +1,6 @@
 config CRYPTO_DEV_CCP_DD
tristate "Secure Processor device driver"
+   depends on CPU_SUP_AMD
default m
help
  Provides AMD Secure Processor device driver.
-- 
2.13.0

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [Outreachy kernel] [PATCH 6/6] Staging: rtl8188eu: core: Use list_entry instead of container_of

2017-09-30 Thread Julia Lawall


On Sat, 30 Sep 2017, Srishti Sharma wrote:

> For variables of type struct list_head* use list_entry to access
> current list element instead of using container_of. Done by the
> following semantic patch by coccinelle.
>
> @r@
> struct list_head* l;
> @@
>
> -container_of
> +list_entry
>   (l,...)
>
> Signed-off-by: Srishti Sharma 
> ---
>  drivers/staging/rtl8188eu/core/rtw_xmit.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/staging/rtl8188eu/core/rtw_xmit.c 
> b/drivers/staging/rtl8188eu/core/rtw_xmit.c
> index be2f46e..29e9ee9 100644
> --- a/drivers/staging/rtl8188eu/core/rtw_xmit.c
> +++ b/drivers/staging/rtl8188eu/core/rtw_xmit.c
> @@ -1401,7 +1401,7 @@ void rtw_free_xmitframe_queue(struct xmit_priv 
> *pxmitpriv, struct __queue *pfram
>   plist = phead->next;
>
>   while (phead != plist) {
> - pxmitframe = container_of(plist, struct xmit_frame, list);
> + pxmitframe = list_entry(plist, struct xmit_frame, list);
>
>   plist = plist->next;

It looks to me like this loop could be rewritten using
list_for_each_entry_safe.  The entry is because you only do something
interesting with the entry and the safe is because the elements of the
list are deleted along the way.  Perhaps the others can be changed this way
too.
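
For reference, the first of those loops could then collapse to roughly the
following (only a sketch, reusing the names from the driver above; the loop
body stays whatever it is today):

	struct xmit_frame *pxmitframe, *tmp;

	list_for_each_entry_safe(pxmitframe, tmp, phead, list) {
		/*
		 * The _safe variant caches the next element up front,
		 * so the current entry may be unlinked/freed in the body.
		 */
		...
	}

and the manual "plist = plist->next" bookkeeping disappears.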

julia



>
> @@ -1432,7 +1432,7 @@ static struct xmit_frame *dequeue_one_xmitframe(struct 
> xmit_priv *pxmitpriv, str
>   xmitframe_plist = xmitframe_phead->next;
>
>   if (xmitframe_phead != xmitframe_plist) {
> - pxmitframe = container_of(xmitframe_plist, struct xmit_frame, 
> list);
> + pxmitframe = list_entry(xmitframe_plist, struct xmit_frame, 
> list);
>
>   xmitframe_plist = xmitframe_plist->next;
>
> @@ -1473,7 +1473,7 @@ struct xmit_frame *rtw_dequeue_xframe(struct xmit_priv 
> *pxmitpriv, struct hw_xmi
>   sta_plist = sta_phead->next;
>
>   while (sta_phead != sta_plist) {
> - ptxservq = container_of(sta_plist, struct tx_servq, 
> tx_pending);
> + ptxservq = list_entry(sta_plist, struct tx_servq, 
> tx_pending);
>
>   pframe_queue = &ptxservq->sta_pending;
>
> @@ -1811,7 +1811,7 @@ static void dequeue_xmitframes_to_sleeping_queue(struct 
> adapter *padapter, struc
>   plist = phead->next;
>
>   while (phead != plist) {
> - pxmitframe = container_of(plist, struct xmit_frame, list);
> + pxmitframe = list_entry(plist, struct xmit_frame, list);
>
>   plist = plist->next;
>
> @@ -1878,7 +1878,7 @@ void wakeup_sta_to_xmit(struct adapter *padapter, 
> struct sta_info *psta)
>   xmitframe_plist = xmitframe_phead->next;
>
>   while (xmitframe_phead != xmitframe_plist) {
> - pxmitframe = container_of(xmitframe_plist, struct xmit_frame, 
> list);
> + pxmitframe = list_entry(xmitframe_plist, struct xmit_frame, 
> list);
>
>   xmitframe_plist = xmitframe_plist->next;
>
> @@ -1959,7 +1959,7 @@ void wakeup_sta_to_xmit(struct adapter *padapter, 
> struct sta_info *psta)
>   xmitframe_plist = xmitframe_phead->next;
>
>   while (xmitframe_phead != xmitframe_plist) {
> - pxmitframe = container_of(xmitframe_plist, struct 
> xmit_frame, list);
> + pxmitframe = list_entry(xmitframe_plist, struct 
> xmit_frame, list);
>
>   xmitframe_plist = xmitframe_plist->next;
>
> @@ -2006,7 +2006,7 @@ void xmit_delivery_enabled_frames(struct adapter 
> *padapter, struct sta_info *pst
>   xmitframe_plist = xmitframe_phead->next;
>
>   while (xmitframe_phead != xmitframe_plist) {
> - pxmitframe = container_of(xmitframe_plist, struct xmit_frame, 
> list);
> + pxmitframe = list_entry(xmitframe_plist, struct xmit_frame, 
> list);
>
>   xmitframe_plist = xmitframe_plist->next;
>
> --
> 2.7.4
>
> --
> You received this message because you are subscribed to the Google Groups 
> "outreachy-kernel" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to outreachy-kernel+unsubscr...@googlegroups.com.
> To post to this group, send email to outreachy-ker...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/outreachy-kernel/778fb00439b44bd4a9e1ef06b1d619f9e4f5525f.1506755266.git.srishtishar%40gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>


Re: [PATCH] mm, hugetlb: fix "treat_as_movable" condition in htlb_alloc_mask

2017-09-30 Thread Alexandru Moise
On Fri, Sep 29, 2017 at 02:16:10PM -0700, Mike Kravetz wrote:
> Adding Anshuman
> 
> On 09/29/2017 01:43 PM, Alexandru Moise wrote:
> > On Fri, Sep 29, 2017 at 05:13:39PM +0200, Alexandru Moise wrote:
> >>
> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> index 424b0ef08a60..ab28de0122af 100644
> >> --- a/mm/hugetlb.c
> >> +++ b/mm/hugetlb.c
> >> @@ -926,7 +926,7 @@ static struct page *dequeue_huge_page_nodemask(struct 
> >> hstate *h, gfp_t gfp_mask,
> >>  /* Movability of hugepages depends on migration support. */
> >>  static inline gfp_t htlb_alloc_mask(struct hstate *h)
> >>  {
> >> -  if (hugepages_treat_as_movable || hugepage_migration_supported(h))
> >> +  if (hugepages_treat_as_movable && hugepage_migration_supported(h))
> >>return GFP_HIGHUSER_MOVABLE;
> >>else
> >>return GFP_HIGHUSER;
> >> -- 
> >> 2.14.2
> >>
> > 
> > I seem to have terribly misunderstood the semantics of this flag wrt 
> > hugepages,
> > please ignore this for now.
> 
> That is Okay, it made me look at this code more closely.
> 
> static inline bool hugepage_migration_supported(struct hstate *h)
> {
> #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> if ((huge_page_shift(h) == PMD_SHIFT) ||
> (huge_page_shift(h) == PGDIR_SHIFT))
> return true;
> else
> return false;
> #else
> return false;
> #endif
> }

The real problem is that I still get movable hugepages somehow
even when hugepages_treat_as_movable is 0; I need to dig
a bit deeper because this behavior really should be optional.

Tools like mcelog are not hugepage-aware (IIRC), so users should be able
to choose between having their hugepage-using application run for
longer and running with a higher risk of memory corruption.

> 
> So, hugepage_migration_supported() can only return true if
> ARCH_ENABLE_HUGEPAGE_MIGRATION is defined.  Commit c177c81e09e5
> restricts hugepage_migration_support to x86_64.  So,
> ARCH_ENABLE_HUGEPAGE_MIGRATION is only defined for x86_64.
Hmm?

linux$ grep -rin ARCH_ENABLE_HUGEPAGE_MIGRATION *
arch/powerpc/platforms/Kconfig.cputype:311:config ARCH_ENABLE_HUGEPAGE_MIGRATION
arch/x86/Kconfig:2345:config ARCH_ENABLE_HUGEPAGE_MIGRATION

It is present on PPC_BOOK3S_64

../Alex

> 
> Commit 94310cbcaa3c added the ability to migrate gigantic hugetlb pages
> at the PGD level.  This added the check for PGD level pages to
> hugepage_migration_supported(), which is only there if
> ARCH_ENABLE_HUGEPAGE_MIGRATION is defined.  IIUC, this functionality
> was added for powerpc.  Yet, powerpc does not define
> ARCH_ENABLE_HUGEPAGE_MIGRATION (unless I am missing something).
> 
> -- 
> Mike Kravetz


Re: [Part1 PATCH v5.1 02/17] x86/mm: Add Secure Encrypted Virtualization (SEV) support

2017-09-30 Thread Borislav Petkov
On Fri, Sep 29, 2017 at 04:27:47PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> Provide support for Secure Encrypted Virtualization (SEV). This initial
> support defines a flag that is used by the kernel to determine if it is
> running with SEV active.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Borislav Petkov 
> Cc: Andy Lutomirski 
> Cc: linux-kernel@vger.kernel.org
> Cc: x...@kernel.org
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
> 
> Hi Boris,
> 
> Similar to the sme_me_mask, sev_enabled must live in .data section otherwise 
> it
> will get zero'ed in clear_bss() and we will lose the value. I have 
> encountered
> this issue when booting SEV guest using qemu's -kernel option.

Ah, good catch.

> I have removed your R-b since I was not sure if you are still okay with the 
> change.

Sure, looks good still.

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH] nvme-pci: Use PCI bus address for data/queues in CMB

2017-09-30 Thread Abhishek Shah
Hi Keith,

On Fri, Sep 29, 2017 at 8:12 PM, Keith Busch  wrote:
>
> On Fri, Sep 29, 2017 at 10:59:26AM +0530, Abhishek Shah wrote:
> > Currently, NVMe PCI host driver is programming CMB dma address as
> > I/O SQs addresses. This results in failures on systems where 1:1
> > outbound mapping is not used (example Broadcom iProc SOCs) because
> > CMB BAR will be programmed with PCI bus address but NVMe PCI EP will
> > try to access CMB using dma address.
> >
> > To have CMB working on systems without 1:1 outbound mapping, we
> > program PCI bus address for I/O SQs instead of dma address. This
> > approach will work on systems with/without 1:1 outbound mapping.
> >
> > The patch is tested on Broadcom Stingray platform(arm64), which
> > does not have 1:1 outbound mapping, as well as on x86 platform,
> > which has 1:1 outbound mapping.
> >
> > Fixes: 8ffaadf7 ("NVMe: Use CMB for the IO SQes if available")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Abhishek Shah 
> > Reviewed-by: Anup Patel 
> > Reviewed-by: Ray Jui 
> > Reviewed-by: Scott Branden 
>
> Thanks for the patch.
>
> On a similar note, we also break CMB usage in virutalization with direct
> assigned devices: the guest doesn't know the host physical bus address,
> so it sets the CMB queue address incorrectly there, too. I don't know of
> a way to fix that other than disabling CMB.

I don't have much idea on CMB usage in virtualization... will let
someone else comment on this.
>
>
>
> >  static void __iomem *nvme_map_cmb(struct nvme_dev *dev)
> >  {
> > + int rc;
> >   u64 szu, size, offset;
> >   resource_size_t bar_size;
> >   struct pci_dev *pdev = to_pci_dev(dev->dev);
> > @@ -1553,6 +1574,13 @@ static void __iomem *nvme_map_cmb(struct nvme_dev 
> > *dev)
> >
> >   dev->cmb_dma_addr = dma_addr;
> >   dev->cmb_size = size;
> > +
> > + rc = nvme_find_cmb_bus_addr(pdev, dma_addr, size, &dev->cmb_bus_addr);
> > + if (rc) {
> > + iounmap(cmb);
> > + return NULL;
> > + }
> > +
> >   return cmb;
> >  }
>
> Minor suggestion: it's a little simpler if you find the bus address
> before ioremap:
>
> ---
> @@ -1554,6 +1554,10 @@ static void __iomem *nvme_map_cmb(struct nvme_dev *dev)
> size = bar_size - offset;
>
> dma_addr = pci_resource_start(pdev, NVME_CMB_BIR(dev->cmbloc)) + 
> offset;
> +
> +   if (nvme_find_cmb_bus_addr(pdev, dma_addr, size, &dev->cmb_bus_addr))
> +   return NULL;
> +
> cmb = ioremap_wc(dma_addr, size);
> if (!cmb)
> return NULL;
> --

Thanks for the suggestion, will push patch with this change.


Regards,
Abhishek


[PATCH for-next 3/4] RDMA/hns: Update the IRRL table chunk size in hip08

2017-09-30 Thread Wei Hu (Xavier)
As the IRRL specification increases in hip08, the IRRL table
chunk size needs to be updated.
This patch updates the IRRL table chunk size to 256k for hip08.

Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Shaobo Xu 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_device.h |  3 +++
 drivers/infiniband/hw/hns/hns_roce_hem.c| 31 ++---
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  1 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |  2 ++
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  |  1 +
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  2 ++
 6 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
b/drivers/infiniband/hw/hns/hns_roce_device.h
index 9353400..fc2a53d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -236,6 +236,8 @@ struct hns_roce_hem_table {
unsigned long   num_obj;
/*Single obj size */
unsigned long   obj_size;
+   unsigned long   table_chunk_size;
+   unsigned long   hem_alloc_size;
int lowmem;
struct mutexmutex;
struct hns_roce_hem **hem;
@@ -565,6 +567,7 @@ struct hns_roce_caps {
u32 cqe_ba_pg_sz;
u32 cqe_buf_pg_sz;
u32 cqe_hop_num;
+   u32 chunk_sz;   /* chunk size in non multihop mode*/
 };
 
 struct hns_roce_hw {
diff --git a/drivers/infiniband/hw/hns/hns_roce_hem.c 
b/drivers/infiniband/hw/hns/hns_roce_hem.c
index 4a3d1d4..c08bc16 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hem.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hem.c
@@ -36,9 +36,6 @@
 #include "hns_roce_hem.h"
 #include "hns_roce_common.h"
 
-#define HNS_ROCE_HEM_ALLOC_SIZE(1 << 17)
-#define HNS_ROCE_TABLE_CHUNK_SIZE  (1 << 17)
-
 #define DMA_ADDR_T_SHIFT   12
 #define BT_BA_SHIFT32
 
@@ -314,7 +311,7 @@ static int hns_roce_set_hem(struct hns_roce_dev *hr_dev,
 
/* Find the HEM(Hardware Entry Memory) entry */
unsigned long i = (obj & (table->num_obj - 1)) /
- (HNS_ROCE_TABLE_CHUNK_SIZE / table->obj_size);
+ (table->table_chunk_size / table->obj_size);
 
switch (table->type) {
case HEM_TYPE_QPC:
@@ -559,7 +556,7 @@ int hns_roce_table_get(struct hns_roce_dev *hr_dev,
if (hns_roce_check_whether_mhop(hr_dev, table->type))
return hns_roce_table_mhop_get(hr_dev, table, obj);
 
-   i = (obj & (table->num_obj - 1)) / (HNS_ROCE_TABLE_CHUNK_SIZE /
+   i = (obj & (table->num_obj - 1)) / (table->table_chunk_size /
 table->obj_size);
 
mutex_lock(&table->mutex);
@@ -570,8 +567,8 @@ int hns_roce_table_get(struct hns_roce_dev *hr_dev,
}
 
table->hem[i] = hns_roce_alloc_hem(hr_dev,
-  HNS_ROCE_TABLE_CHUNK_SIZE >> PAGE_SHIFT,
-  HNS_ROCE_HEM_ALLOC_SIZE,
+  table->table_chunk_size >> PAGE_SHIFT,
+  table->hem_alloc_size,
   (table->lowmem ? GFP_KERNEL :
GFP_HIGHUSER) | __GFP_NOWARN);
if (!table->hem[i]) {
@@ -720,7 +717,7 @@ void hns_roce_table_put(struct hns_roce_dev *hr_dev,
}
 
i = (obj & (table->num_obj - 1)) /
-   (HNS_ROCE_TABLE_CHUNK_SIZE / table->obj_size);
+   (table->table_chunk_size / table->obj_size);
 
mutex_lock(&table->mutex);
 
@@ -757,8 +754,8 @@ void *hns_roce_table_find(struct hns_roce_dev *hr_dev,
 
if (!hns_roce_check_whether_mhop(hr_dev, table->type)) {
idx = (obj & (table->num_obj - 1)) * table->obj_size;
-   hem = table->hem[idx / HNS_ROCE_TABLE_CHUNK_SIZE];
-   dma_offset = offset = idx % HNS_ROCE_TABLE_CHUNK_SIZE;
+   hem = table->hem[idx / table->table_chunk_size];
+   dma_offset = offset = idx % table->table_chunk_size;
} else {
hns_roce_calc_hem_mhop(hr_dev, table, &mhop_obj, &mhop);
/* mtt mhop */
@@ -815,7 +812,7 @@ int hns_roce_table_get_range(struct hns_roce_dev *hr_dev,
 unsigned long start, unsigned long end)
 {
struct hns_roce_hem_mhop mhop;
-   unsigned long inc = HNS_ROCE_TABLE_CHUNK_SIZE / table->obj_size;
+   unsigned long inc = table->table_chunk_size / table->obj_size;
unsigned long i;
int ret;
 
@@ -846,7 +843,7 @@ void hns_roce_table_put_range(struct hns_roce_dev *hr_dev,
  unsigned long start, unsigned long end)
 {
struct hns_roce_hem_mhop mhop;
-   unsigned long inc = HNS_ROCE_TABLE_CHUNK_SIZE / table->obj_size;
+   unsigned long inc = table->table_chunk_size / table->obj_size;
unsigned long i;
 

[PATCH for-next 2/4] RDMA/hns: Add IOMMU enable support in hip08

2017-09-30 Thread Wei Hu (Xavier)
If the IOMMU is enabled, the length of the sg obtained from
__iommu_map_sg_attrs is not 4kB. When the IOVA is set with the sg
dma address, the IOVA will not be page-contiguous, and the VA
returned from dma_alloc_coherent is a vmalloc address. However,
the VA obtained by page_address() is a discontinuous VA. Under
these circumstances, the IOVA should be calculated based on the
sg length, and the VA returned from dma_alloc_coherent should be
recorded in the hem struct.

Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Shaobo Xu 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_alloc.c |  5 -
 drivers/infiniband/hw/hns/hns_roce_hem.c   | 30 +++---
 drivers/infiniband/hw/hns/hns_roce_hem.h   |  6 ++
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 22 +++---
 4 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_alloc.c 
b/drivers/infiniband/hw/hns/hns_roce_alloc.c
index 3e4c525..a69cd4b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_alloc.c
+++ b/drivers/infiniband/hw/hns/hns_roce_alloc.c
@@ -243,7 +243,10 @@ int hns_roce_buf_alloc(struct hns_roce_dev *hr_dev, u32 
size, u32 max_direct,
goto err_free;
 
for (i = 0; i < buf->nbufs; ++i)
-   pages[i] = virt_to_page(buf->page_list[i].buf);
+   pages[i] =
+   is_vmalloc_addr(buf->page_list[i].buf) ?
+   vmalloc_to_page(buf->page_list[i].buf) :
+   virt_to_page(buf->page_list[i].buf);
 
buf->direct.buf = vmap(pages, buf->nbufs, VM_MAP,
   PAGE_KERNEL);
diff --git a/drivers/infiniband/hw/hns/hns_roce_hem.c 
b/drivers/infiniband/hw/hns/hns_roce_hem.c
index 8388ae2..4a3d1d4 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hem.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hem.c
@@ -200,6 +200,7 @@ static struct hns_roce_hem *hns_roce_alloc_hem(struct 
hns_roce_dev *hr_dev,
   gfp_t gfp_mask)
 {
struct hns_roce_hem_chunk *chunk = NULL;
+   struct hns_roce_vmalloc *vmalloc;
struct hns_roce_hem *hem;
struct scatterlist *mem;
int order;
@@ -227,6 +228,7 @@ static struct hns_roce_hem *hns_roce_alloc_hem(struct 
hns_roce_dev *hr_dev,
sg_init_table(chunk->mem, HNS_ROCE_HEM_CHUNK_LEN);
chunk->npages = 0;
chunk->nsg = 0;
+   memset(chunk->vmalloc, 0, sizeof(chunk->vmalloc));
list_add_tail(&chunk->list, &hem->chunk_list);
}
 
@@ -243,7 +245,15 @@ static struct hns_roce_hem *hns_roce_alloc_hem(struct 
hns_roce_dev *hr_dev,
if (!buf)
goto fail;
 
-   sg_set_buf(mem, buf, PAGE_SIZE << order);
+   if (is_vmalloc_addr(buf)) {
+   vmalloc = &chunk->vmalloc[chunk->npages];
+   vmalloc->is_vmalloc_addr = true;
+   vmalloc->vmalloc_addr = buf;
+   sg_set_page(mem, vmalloc_to_page(buf),
+   PAGE_SIZE << order, offset_in_page(buf));
+   } else {
+   sg_set_buf(mem, buf, PAGE_SIZE << order);
+   }
WARN_ON(mem->offset);
sg_dma_len(mem) = PAGE_SIZE << order;
 
@@ -262,17 +272,25 @@ static struct hns_roce_hem *hns_roce_alloc_hem(struct 
hns_roce_dev *hr_dev,
 void hns_roce_free_hem(struct hns_roce_dev *hr_dev, struct hns_roce_hem *hem)
 {
struct hns_roce_hem_chunk *chunk, *tmp;
+   void *cpu_addr;
int i;
 
if (!hem)
return;
 
list_for_each_entry_safe(chunk, tmp, &hem->chunk_list, list) {
-   for (i = 0; i < chunk->npages; ++i)
+   for (i = 0; i < chunk->npages; ++i) {
+   if (chunk->vmalloc[i].is_vmalloc_addr)
+   cpu_addr = chunk->vmalloc[i].vmalloc_addr;
+   else
+   cpu_addr =
+  lowmem_page_address(sg_page(&chunk->mem[i]));
+
dma_free_coherent(hr_dev->dev,
   chunk->mem[i].length,
-  lowmem_page_address(sg_page(&chunk->mem[i])),
+  cpu_addr,
   sg_dma_address(&chunk->mem[i]));
+   }
kfree(chunk);
}
 
@@ -774,6 +792,12 @@ void *hns_roce_table_find(struct hns_roce_dev *hr_dev,
 
if (chunk->mem[i].length > (u32)offset) {
page = sg_page(&chunk->mem[i]);
+   if (chunk->vmalloc[i].is_vmalloc_addr) {
+ 

[PATCH for-next 0/4] Add Features & Code improvements for hip08

2017-09-30 Thread Wei Hu (Xavier)
This patch-set introduces PBL page size configuration support, IOMMU
support, and updates to the PD&CQE&MTT specification and IRRL table
chunk size for hip08.

Shaobo Xu (1):
  RDMA/hns: Support WQE/CQE/PBL page size configurable feature in hip08

Wei Hu (Xavier) (3):
  RDMA/hns: Add IOMMU enable support in hip08
  RDMA/hns: Update the IRRL table chunk size in hip08
  RDMA/hns: Update the PD&CQE&MTT specification in hip08

 drivers/infiniband/hw/hns/hns_roce_alloc.c  | 34 +++
 drivers/infiniband/hw/hns/hns_roce_cq.c | 21 ++-
 drivers/infiniband/hw/hns/hns_roce_device.h | 13 ++--
 drivers/infiniband/hw/hns/hns_roce_hem.c| 61 +--
 drivers/infiniband/hw/hns/hns_roce_hem.h|  6 ++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  1 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |  2 +
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 23 ---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h  | 10 ++--
 drivers/infiniband/hw/hns/hns_roce_mr.c | 93 -
 drivers/infiniband/hw/hns/hns_roce_qp.c | 46 ++
 11 files changed, 222 insertions(+), 88 deletions(-)

-- 
1.9.1



[PATCH for-next 1/4] RDMA/hns: Support WQE/CQE/PBL page size configurable feature in hip08

2017-09-30 Thread Wei Hu (Xavier)
From: Shaobo Xu 

This patch adds support for the configurable WQE, CQE and PBL page size
feature, which includes base address page size and buffer page size.

Signed-off-by: Shaobo Xu 
Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_alloc.c  | 29 +
 drivers/infiniband/hw/hns/hns_roce_cq.c | 21 ++-
 drivers/infiniband/hw/hns/hns_roce_device.h | 10 ++--
 drivers/infiniband/hw/hns/hns_roce_mr.c | 93 -
 drivers/infiniband/hw/hns/hns_roce_qp.c | 46 ++
 5 files changed, 142 insertions(+), 57 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_alloc.c 
b/drivers/infiniband/hw/hns/hns_roce_alloc.c
index 8c9a33f..3e4c525 100644
--- a/drivers/infiniband/hw/hns/hns_roce_alloc.c
+++ b/drivers/infiniband/hw/hns/hns_roce_alloc.c
@@ -167,12 +167,12 @@ void hns_roce_buf_free(struct hns_roce_dev *hr_dev, u32 
size,
if (buf->nbufs == 1) {
dma_free_coherent(dev, size, buf->direct.buf, buf->direct.map);
} else {
-   if (bits_per_long == 64)
+   if (bits_per_long == 64 && buf->page_shift == PAGE_SHIFT)
vunmap(buf->direct.buf);
 
for (i = 0; i < buf->nbufs; ++i)
if (buf->page_list[i].buf)
-   dma_free_coherent(dev, PAGE_SIZE,
+   dma_free_coherent(dev, 1 << buf->page_shift,
  buf->page_list[i].buf,
  buf->page_list[i].map);
kfree(buf->page_list);
@@ -181,20 +181,27 @@ void hns_roce_buf_free(struct hns_roce_dev *hr_dev, u32 
size,
 EXPORT_SYMBOL_GPL(hns_roce_buf_free);
 
 int hns_roce_buf_alloc(struct hns_roce_dev *hr_dev, u32 size, u32 max_direct,
-  struct hns_roce_buf *buf)
+  struct hns_roce_buf *buf, u32 page_shift)
 {
int i = 0;
dma_addr_t t;
struct page **pages;
struct device *dev = hr_dev->dev;
u32 bits_per_long = BITS_PER_LONG;
+   u32 page_size = 1 << page_shift;
+   u32 order;
 
/* SQ/RQ buf lease than one page, SQ + RQ = 8K */
if (size <= max_direct) {
buf->nbufs = 1;
/* Npages calculated by page_size */
-   buf->npages = 1 << get_order(size);
-   buf->page_shift = PAGE_SHIFT;
+   order = get_order(size);
+   if (order <= page_shift - PAGE_SHIFT)
+   order = 0;
+   else
+   order -= page_shift - PAGE_SHIFT;
+   buf->npages = 1 << order;
+   buf->page_shift = page_shift;
/* MTT PA must be recorded in 4k alignment, t is 4k aligned */
buf->direct.buf = dma_alloc_coherent(dev, size, &t, GFP_KERNEL);
if (!buf->direct.buf)
@@ -209,9 +216,9 @@ int hns_roce_buf_alloc(struct hns_roce_dev *hr_dev, u32 
size, u32 max_direct,
 
memset(buf->direct.buf, 0, size);
} else {
-   buf->nbufs = (size + PAGE_SIZE - 1) / PAGE_SIZE;
+   buf->nbufs = (size + page_size - 1) / page_size;
buf->npages = buf->nbufs;
-   buf->page_shift = PAGE_SHIFT;
+   buf->page_shift = page_shift;
buf->page_list = kcalloc(buf->nbufs, sizeof(*buf->page_list),
 GFP_KERNEL);
 
@@ -220,16 +227,16 @@ int hns_roce_buf_alloc(struct hns_roce_dev *hr_dev, u32 
size, u32 max_direct,
 
for (i = 0; i < buf->nbufs; ++i) {
buf->page_list[i].buf = dma_alloc_coherent(dev,
- PAGE_SIZE, &t,
+ page_size, &t,
  GFP_KERNEL);
 
if (!buf->page_list[i].buf)
goto err_free;
 
buf->page_list[i].map = t;
-   memset(buf->page_list[i].buf, 0, PAGE_SIZE);
+   memset(buf->page_list[i].buf, 0, page_size);
}
-   if (bits_per_long == 64) {
+   if (bits_per_long == 64 && page_shift == PAGE_SHIFT) {
pages = kmalloc_array(buf->nbufs, sizeof(*pages),
  GFP_KERNEL);
if (!pages)
@@ -243,6 +250,8 @@ int hns_roce_buf_alloc(struct hns_roce_dev *hr_dev, u32 
size, u32 max_direct,
kfree(pages);
if (!buf->direct.buf)
goto err_free;
+   } else {
+   buf->direct.buf = NULL;
}
}
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c 
b/drivers/infiniband/h

[PATCH for-next 4/4] RDMA/hns: Update the PD&CQE&MTT specification in hip08

2017-09-30 Thread Wei Hu (Xavier)
This patch updates the PD specification to 16M for hip08, and it
updates the numbers of mtt and cqe segments for the buddy allocator.

As the CQE supports hop num 1 addressing, the CQE specification is
64k. This patch sets the CQE specification to 64k accordingly.

Signed-off-by: Shaobo Xu 
Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h 
b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
index 65ed3f8..6106ad1 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
@@ -47,16 +47,16 @@
 #define HNS_ROCE_V2_MAX_QP_NUM 0x2000
 #define HNS_ROCE_V2_MAX_WQE_NUM0x8000
 #define HNS_ROCE_V2_MAX_CQ_NUM 0x8000
-#define HNS_ROCE_V2_MAX_CQE_NUM0x40
+#define HNS_ROCE_V2_MAX_CQE_NUM0x1
 #define HNS_ROCE_V2_MAX_RQ_SGE_NUM 0x100
 #define HNS_ROCE_V2_MAX_SQ_SGE_NUM 0xff
 #define HNS_ROCE_V2_MAX_SQ_INLINE  0x20
 #define HNS_ROCE_V2_UAR_NUM256
 #define HNS_ROCE_V2_PHY_UAR_NUM1
 #define HNS_ROCE_V2_MAX_MTPT_NUM   0x8000
-#define HNS_ROCE_V2_MAX_MTT_SEGS   0x10
-#define HNS_ROCE_V2_MAX_CQE_SEGS   0x1
-#define HNS_ROCE_V2_MAX_PD_NUM 0x40
+#define HNS_ROCE_V2_MAX_MTT_SEGS   0x100
+#define HNS_ROCE_V2_MAX_CQE_SEGS   0x100
+#define HNS_ROCE_V2_MAX_PD_NUM 0x100
 #define HNS_ROCE_V2_MAX_QP_INIT_RDMA   128
 #define HNS_ROCE_V2_MAX_QP_DEST_RDMA   128
 #define HNS_ROCE_V2_MAX_SQ_DESC_SZ 64
-- 
1.9.1



Re: [PATCH v6 2/2] tracing: Add support for preempt and irq enable/disable events

2017-09-30 Thread Steven Rostedt
On Mon, 25 Sep 2017 17:23:00 -0700
Joel Fernandes  wrote:

> The trace_hardirqs_off API can be called even when IRQs are already
> off. This is unlike the trace_hardirqs_on which checks if IRQs are off
> (at least from some callsites), here are the definitions just for
> reference [1]. I guess we could modify local_irq_disable and
> local_irq_save to check what the HW flags was before calling
> raw_local_irq_save and only then call trace_hardirqs_off if they were
> indeed on and now being turned off, but that adds complexity to it -
> also we have to then modify all the callsites from assembly code to
> conditionally call trace_hardirqs_on/off :(.
> 

> [1] 
> http://elixir.free-electrons.com/linux/latest/source/include/linux/irqflags.h#L89

Yeah, I think the issue for the recursion is basically this:

#define local_irq_restore(flags)\
do {\
if (raw_irqs_disabled_flags(flags)) {   \
raw_local_irq_restore(flags);   \
trace_hardirqs_off();   \
} else {\
trace_hardirqs_on();\
raw_local_irq_restore(flags);   \
}   \
} while (0)


Peter,

Is there any place where we would call local_irq_restore() when
interrupts are enabled?

-- Steve


Re: [PATCH REBASED 3/6] s390: Add __down_read_killable()

2017-09-30 Thread Heiko Carstens
On Fri, Sep 29, 2017 at 07:06:18PM +0300, Kirill Tkhai wrote:
> Similar to __down_write_killable(), add a killable read primitive.
> 
> Signed-off-by: Kirill Tkhai 
> ---
>  arch/s390/include/asm/rwsem.h |   18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h

FWIW, while looking into this patch I realized that we never optimized our
rwsem primitives to make use of new atomic instructions.

The generic rwsem header file however does, since it uses atomic ops which
we did optimize. Even when compiling for old machines the generic version
generates better code. Therefore I will remove the 15 years old s390
implementation and switch to the generic version instead.

The same might be true for alpha and ia64...
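
For comparison, the asm-generic read-killable path boils down to roughly the
following (a sketch; exact helper names and memory-ordering variants may
differ between kernel versions):

	static inline int __down_read_killable(struct rw_semaphore *sem)
	{
		/* fast path: no writer present, just take a reader reference */
		if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) {
			/* slow path can abort with -EINTR on a fatal signal */
			if (IS_ERR(rwsem_down_read_failed_killable(sem)))
				return -EINTR;
		}

		return 0;
	}

which is exactly the kind of code the optimized atomic ops help with.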



[GIT PULL] SCSI fixes for 4.14-rc2

2017-09-30 Thread James Bottomley
Eight mostly minor fixes for recently discovered issues in drivers.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

Dave Carroll (1):
  scsi: aacraid: Fix 2T+ drives on SmartIOC-2000

Guilherme G. Piccoli (1):
  scsi: aacraid: Add a small delay after IOP reset

Hannes Reinecke (2):
  scsi: scsi_transport_fc: Also check for NOTPRESENT in fc_remote_port_add()
  scsi: scsi_transport_fc: set scsi_target_id upon rescan

Martin Wilck (1):
  scsi: ILLEGAL REQUEST + ASC==27 => target failure

Nikola Pajkovsky (1):
  scsi: aacraid: error: testing array offset 'bus' after use

Stefano Brivio (1):
  scsi: lpfc: Don't return internal MBXERR_ERROR code from probe function

Xin Long (1):
  scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse 
nlmsg properly

And the diffstat:

 drivers/scsi/aacraid/aachba.c   | 12 ++--
 drivers/scsi/aacraid/aacraid.h  |  5 +
 drivers/scsi/aacraid/linit.c| 20 
 drivers/scsi/aacraid/src.c  |  2 ++
 drivers/scsi/lpfc/lpfc_init.c   |  1 +
 drivers/scsi/scsi_error.c   |  3 ++-
 drivers/scsi/scsi_transport_fc.c| 14 +++---
 drivers/scsi/scsi_transport_iscsi.c |  2 +-
 8 files changed, 32 insertions(+), 27 deletions(-)

With full diff below.

James

---

diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index a64285ab0728..af3e4d3f9735 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -699,13 +699,13 @@ static void _aac_probe_container1(void * context, struct 
fib * fibptr)
int status;
 
dresp = (struct aac_mount *) fib_data(fibptr);
-   if (!(fibptr->dev->supplement_adapter_info.supported_options2 &
-   AAC_OPTION_VARIABLE_BLOCK_SIZE))
+   if (!aac_supports_2T(fibptr->dev)) {
dresp->mnt[0].capacityhigh = 0;
-   if ((le32_to_cpu(dresp->status) != ST_OK) ||
-   (le32_to_cpu(dresp->mnt[0].vol) != CT_NONE)) {
-   _aac_probe_container2(context, fibptr);
-   return;
+   if ((le32_to_cpu(dresp->status) == ST_OK) &&
+   (le32_to_cpu(dresp->mnt[0].vol) != CT_NONE)) {
+   _aac_probe_container2(context, fibptr);
+   return;
+   }
}
scsicmd = (struct scsi_cmnd *) context;
 
diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 92fabf2b0c24..403a639574e5 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -2701,6 +2701,11 @@ static inline int aac_is_src(struct aac_dev *dev)
return 0;
 }
 
+static inline int aac_supports_2T(struct aac_dev *dev)
+{
+   return (dev->adapter_info.options & AAC_OPT_NEW_COMM_64);
+}
+
 char * get_container_type(unsigned type);
 extern int numacb;
 extern char aac_driver_version[];
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 87cc4a93e637..62beb2596466 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -906,12 +906,14 @@ static int aac_eh_dev_reset(struct scsi_cmnd *cmd)
 
bus = aac_logical_to_phys(scmd_channel(cmd));
cid = scmd_id(cmd);
-   info = &aac->hba_map[bus][cid];
-   if (bus >= AAC_MAX_BUSES || cid >= AAC_MAX_TARGETS ||
-   info->devtype != AAC_DEVTYPE_NATIVE_RAW)
+
+   if (bus >= AAC_MAX_BUSES || cid >= AAC_MAX_TARGETS)
return FAILED;
 
-   if (info->reset_state > 0)
+   info = &aac->hba_map[bus][cid];
+
+   if (info->devtype != AAC_DEVTYPE_NATIVE_RAW &&
+   info->reset_state > 0)
return FAILED;
 
pr_err("%s: Host adapter reset request. SCSI hang ?\n",
@@ -962,12 +964,14 @@ static int aac_eh_target_reset(struct scsi_cmnd *cmd)
 
bus = aac_logical_to_phys(scmd_channel(cmd));
cid = scmd_id(cmd);
-   info = &aac->hba_map[bus][cid];
-   if (bus >= AAC_MAX_BUSES || cid >= AAC_MAX_TARGETS ||
-   info->devtype != AAC_DEVTYPE_NATIVE_RAW)
+
+   if (bus >= AAC_MAX_BUSES || cid >= AAC_MAX_TARGETS)
return FAILED;
 
-   if (info->reset_state > 0)
+   info = &aac->hba_map[bus][cid];
+
+   if (info->devtype != AAC_DEVTYPE_NATIVE_RAW &&
+   info->reset_state > 0)
return FAILED;
 
pr_err("%s: Host adapter reset request. SCSI hang ?\n",
diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index 48c2b2b34b72..0c9361c87ec8 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -740,6 +740,8 @@ static void aac_send_iop_reset(struct aac_dev *dev)
aac_set_intx_mode(dev);
 
src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
+
+   msleep(5000);
 }
 
 static void aac_send_hardware_soft_reset(struct aac_dev *dev)
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lp

[PATCH v2] nvme-pci: Use PCI bus address for data/queues in CMB

2017-09-30 Thread Abhishek Shah
Currently, NVMe PCI host driver is programming CMB dma address as
I/O SQs addresses. This results in failures on systems where 1:1
outbound mapping is not used (example Broadcom iProc SOCs) because
CMB BAR will be programmed with PCI bus address but NVMe PCI EP will
try to access CMB using dma address.

To have CMB working on systems without 1:1 outbound mapping, we
program PCI bus address for I/O SQs instead of dma address. This
approach will work on systems with/without 1:1 outbound mapping.

The patch is tested on Broadcom Stingray platform(arm64), which
does not have 1:1 outbound mapping, as well as on x86 platform,
which has 1:1 outbound mapping.

Fixes: 8ffaadf7 ("NVMe: Use CMB for the IO SQes if available")
Cc: sta...@vger.kernel.org
Signed-off-by: Abhishek Shah 
Reviewed-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/nvme/host/pci.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4a21213..1387050 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -94,6 +94,7 @@ struct nvme_dev {
bool subsystem;
void __iomem *cmb;
dma_addr_t cmb_dma_addr;
+   pci_bus_addr_t cmb_bus_addr;
u64 cmb_size;
u32 cmbsz;
u32 cmbloc;
@@ -1220,7 +1221,7 @@ static int nvme_alloc_sq_cmds(struct nvme_dev *dev, 
struct nvme_queue *nvmeq,
if (qid && dev->cmb && use_cmb_sqes && NVME_CMB_SQS(dev->cmbsz)) {
unsigned offset = (qid - 1) * roundup(SQ_SIZE(depth),
  dev->ctrl.page_size);
-   nvmeq->sq_dma_addr = dev->cmb_dma_addr + offset;
+   nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
nvmeq->sq_cmds_io = dev->cmb + offset;
} else {
nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
@@ -1514,6 +1515,25 @@ static ssize_t nvme_cmb_show(struct device *dev,
 }
 static DEVICE_ATTR(cmb, S_IRUGO, nvme_cmb_show, NULL);
 
+static int nvme_find_cmb_bus_addr(struct pci_dev *pdev,
+ dma_addr_t dma_addr,
+ u64 size,
+ pci_bus_addr_t *bus_addr)
+{
+   struct resource *res;
+   struct pci_bus_region region;
+   struct resource tres = DEFINE_RES_MEM(dma_addr, size);
+
+   res = pci_find_resource(pdev, &tres);
+   if (!res)
+   return -EIO;
+
+   pcibios_resource_to_bus(pdev->bus, &region, res);
+   *bus_addr = region.start + (dma_addr - res->start);
+
+   return 0;
+}
+
 static void __iomem *nvme_map_cmb(struct nvme_dev *dev)
 {
u64 szu, size, offset;
@@ -1547,6 +1567,9 @@ static void __iomem *nvme_map_cmb(struct nvme_dev *dev)
size = bar_size - offset;
 
dma_addr = pci_resource_start(pdev, NVME_CMB_BIR(dev->cmbloc)) + offset;
+   if (nvme_find_cmb_bus_addr(pdev, dma_addr, size, &dev->cmb_bus_addr))
+   return NULL;
+
cmb = ioremap_wc(dma_addr, size);
if (!cmb)
return NULL;
-- 
2.7.4



[PATCH v2] f2fs: order free nid allocator

2017-09-30 Thread Chao Yu
Previously, there was no strict ordering among free nid allocators: if
there are no free nids cached in memory, the current allocator will
try to load them by scanning NAT pages, but after that, these newly
loaded free nids could be grabbed by later allocators, resulting in a
long delay for the original allocator during nid allocation.

This patch tries to refactor alloc_nid flow to serialize allocators.

Signed-off-by: Chao Yu 
---
v2:
- fix deadlock due to incorrect use of down_trylock.
 fs/f2fs/f2fs.h |  3 ++-
 fs/f2fs/node.c | 63 --
 2 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c07690ce9a46..97ac7e6ab14b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -722,7 +722,8 @@ struct f2fs_nm_info {
struct list_head free_nid_list; /* list for free nids excluding 
preallocated nids */
unsigned int nid_cnt[MAX_NID_STATE];/* the number of free node id */
spinlock_t nid_list_lock;   /* protect nid lists ops */
-   struct mutex build_lock;/* lock for build free nids */
+   struct semaphore build_lock;/* lock for build free nids */
+   atomic_t allocator; /* # of free nid allocators */
unsigned char (*free_nid_bitmap)[NAT_ENTRY_BITMAP_SIZE];
unsigned char *nat_block_bitmap;
unsigned short *free_nid_count; /* free nid count of NAT block */
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index b95b2784e7d8..f6464b1faf03 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -2041,9 +2041,9 @@ static void __build_free_nids(struct f2fs_sb_info *sbi, 
bool sync, bool mount)
 
 void build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount)
 {
-   mutex_lock(&NM_I(sbi)->build_lock);
+   down(&NM_I(sbi)->build_lock);
__build_free_nids(sbi, sync, mount);
-   mutex_unlock(&NM_I(sbi)->build_lock);
+   up(&NM_I(sbi)->build_lock);
 }
 
 /*
@@ -2055,22 +2055,30 @@ bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid)
 {
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *i = NULL;
+   bool alloc_failed = false, lock_build = false, ret = false;
+
+   spin_lock(&nm_i->nid_list_lock);
 retry:
 #ifdef CONFIG_F2FS_FAULT_INJECTION
if (time_to_inject(sbi, FAULT_ALLOC_NID)) {
f2fs_show_injection_info(FAULT_ALLOC_NID);
-   return false;
+   goto out;
}
 #endif
-   spin_lock(&nm_i->nid_list_lock);
 
-   if (unlikely(nm_i->available_nids == 0)) {
-   spin_unlock(&nm_i->nid_list_lock);
-   return false;
-   }
+   if (unlikely(nm_i->available_nids == 0))
+   goto out;
 
/* We should not use stale free nids created by build_free_nids */
-   if (nm_i->nid_cnt[FREE_NID] && !on_build_free_nids(nm_i)) {
+   if (nm_i->nid_cnt[FREE_NID] >= atomic_read(&nm_i->allocator) +
+   (alloc_failed ? 0 : 1)) {
+   if (!lock_build) {
+   if (!down_trylock(&nm_i->build_lock))
+   lock_build = true;
+   else
+   goto build;
+   }
+
f2fs_bug_on(sbi, list_empty(&nm_i->free_nid_list));
i = list_first_entry(&nm_i->free_nid_list,
struct free_nid, list);
@@ -2083,14 +2091,38 @@ bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid)
 
update_free_nid_bitmap(sbi, *nid, false, false);
 
-   spin_unlock(&nm_i->nid_list_lock);
-   return true;
+   ret = true;
+   goto out;
+   }
+build:
+   if (!alloc_failed) {
+   alloc_failed = true;
+   atomic_inc(&nm_i->allocator);
}
spin_unlock(&nm_i->nid_list_lock);
 
+   if (lock_build) {
+   lock_build = false;
+   up(&nm_i->build_lock);
+   }
+
/* Let's scan nat pages and its caches to get free nids */
-   build_free_nids(sbi, true, false);
+   down(&nm_i->build_lock);
+   lock_build = true;
+
+   if (nm_i->nid_cnt[FREE_NID] < atomic_read(&nm_i->allocator))
+   __build_free_nids(sbi, true, false);
+
+   spin_lock(&nm_i->nid_list_lock);
goto retry;
+
+out:
+   if (alloc_failed)
+   atomic_dec(&nm_i->allocator);
+   spin_unlock(&nm_i->nid_list_lock);
+   if (lock_build)
+   up(&nm_i->build_lock);
+   return ret;
 }
 
 /*
@@ -2154,7 +2186,7 @@ int try_to_free_nids(struct f2fs_sb_info *sbi, int 
nr_shrink)
if (nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS)
return 0;
 
-   if (!mutex_trylock(&nm_i->build_lock))
+   if (down_trylock(&nm_i->build_lock))
return 0;
 
spin_lock(&nm_i->nid_list_lock);
@@ -2168,7 +2200,7 @@ int try_to_free_nids(struct f2fs

[PATCH] net: stmmac: dwmac-rk: Add RK3128 GMAC support

2017-09-30 Thread David Wu
Add constants and callback functions for the dwmac on the RK3128 SoC.
As can be seen, the base structure is the same; only the registers
and the bits in them moved slightly.

Signed-off-by: David Wu 
---
 .../devicetree/bindings/net/rockchip-dwmac.txt |   1 +
 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 112 +
 2 files changed, 113 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt 
b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
index 6af8eed..9c16ee2 100644
--- a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
+++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
@@ -4,6 +4,7 @@ The device node has following properties.
 
 Required properties:
  - compatible: should be "rockchip,-gamc"
+   "rockchip,rk3128-gmac": found on RK312x SoCs
"rockchip,rk3228-gmac": found on RK322x SoCs
"rockchip,rk3288-gmac": found on RK3288 SoCs
"rockchip,rk3328-gmac": found on RK3328 SoCs
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index 99823f5..13133b3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -83,6 +83,117 @@ struct rk_priv_data {
(((tx) ? soc##_GMAC_TXCLK_DLY_ENABLE : soc##_GMAC_TXCLK_DLY_DISABLE) | \
 ((rx) ? soc##_GMAC_RXCLK_DLY_ENABLE : soc##_GMAC_RXCLK_DLY_DISABLE))
 
+#define RK3128_GRF_MAC_CON00x0168
+#define RK3128_GRF_MAC_CON10x016c
+
+/* RK3128_GRF_MAC_CON0 */
+#define RK3128_GMAC_TXCLK_DLY_ENABLE   GRF_BIT(14)
+#define RK3128_GMAC_TXCLK_DLY_DISABLE  GRF_CLR_BIT(14)
+#define RK3128_GMAC_RXCLK_DLY_ENABLE   GRF_BIT(15)
+#define RK3128_GMAC_RXCLK_DLY_DISABLE  GRF_CLR_BIT(15)
+#define RK3128_GMAC_CLK_RX_DL_CFG(val) HIWORD_UPDATE(val, 0x7F, 7)
+#define RK3128_GMAC_CLK_TX_DL_CFG(val) HIWORD_UPDATE(val, 0x7F, 0)
+
+/* RK3128_GRF_MAC_CON1 */
+#define RK3128_GMAC_PHY_INTF_SEL_RGMII \
+   (GRF_BIT(6) | GRF_CLR_BIT(7) | GRF_CLR_BIT(8))
+#define RK3128_GMAC_PHY_INTF_SEL_RMII  \
+   (GRF_CLR_BIT(6) | GRF_CLR_BIT(7) | GRF_BIT(8))
+#define RK3128_GMAC_FLOW_CTRL  GRF_BIT(9)
+#define RK3128_GMAC_FLOW_CTRL_CLR  GRF_CLR_BIT(9)
+#define RK3128_GMAC_SPEED_10M  GRF_CLR_BIT(10)
+#define RK3128_GMAC_SPEED_100M GRF_BIT(10)
+#define RK3128_GMAC_RMII_CLK_25M   GRF_BIT(11)
+#define RK3128_GMAC_RMII_CLK_2_5M  GRF_CLR_BIT(11)
+#define RK3128_GMAC_CLK_125M   (GRF_CLR_BIT(12) | GRF_CLR_BIT(13))
+#define RK3128_GMAC_CLK_25M(GRF_BIT(12) | GRF_BIT(13))
+#define RK3128_GMAC_CLK_2_5M   (GRF_CLR_BIT(12) | GRF_BIT(13))
+#define RK3128_GMAC_RMII_MODE  GRF_BIT(14)
+#define RK3128_GMAC_RMII_MODE_CLR  GRF_CLR_BIT(14)
+
+static void rk3128_set_to_rgmii(struct rk_priv_data *bsp_priv,
+   int tx_delay, int rx_delay)
+{
+   struct device *dev = &bsp_priv->pdev->dev;
+
+   if (IS_ERR(bsp_priv->grf)) {
+   dev_err(dev, "Missing rockchip,grf property\n");
+   return;
+   }
+
+   regmap_write(bsp_priv->grf, RK3128_GRF_MAC_CON1,
+RK3128_GMAC_PHY_INTF_SEL_RGMII |
+RK3128_GMAC_RMII_MODE_CLR);
+   regmap_write(bsp_priv->grf, RK3128_GRF_MAC_CON0,
+DELAY_ENABLE(RK3128, tx_delay, rx_delay) |
+RK3128_GMAC_CLK_RX_DL_CFG(rx_delay) |
+RK3128_GMAC_CLK_TX_DL_CFG(tx_delay));
+}
+
+static void rk3128_set_to_rmii(struct rk_priv_data *bsp_priv)
+{
+   struct device *dev = &bsp_priv->pdev->dev;
+
+   if (IS_ERR(bsp_priv->grf)) {
+   dev_err(dev, "Missing rockchip,grf property\n");
+   return;
+   }
+
+   regmap_write(bsp_priv->grf, RK3128_GRF_MAC_CON1,
+RK3128_GMAC_PHY_INTF_SEL_RMII | RK3128_GMAC_RMII_MODE);
+}
+
+static void rk3128_set_rgmii_speed(struct rk_priv_data *bsp_priv, int speed)
+{
+   struct device *dev = &bsp_priv->pdev->dev;
+
+   if (IS_ERR(bsp_priv->grf)) {
+   dev_err(dev, "Missing rockchip,grf property\n");
+   return;
+   }
+
+   if (speed == 10)
+   regmap_write(bsp_priv->grf, RK3128_GRF_MAC_CON1,
+RK3128_GMAC_CLK_2_5M);
+   else if (speed == 100)
+   regmap_write(bsp_priv->grf, RK3128_GRF_MAC_CON1,
+RK3128_GMAC_CLK_25M);
+   else if (speed == 1000)
+   regmap_write(bsp_priv->grf, RK3128_GRF_MAC_CON1,
+RK3128_GMAC_CLK_125M);
+   else
+   dev_err(dev, "unknown speed value for RGMII! speed=%d", speed);
+}
+
+static void rk3128_set_rmii_speed(struct rk_priv_data *bsp_priv, int speed)
+{
+   struct device *dev = &bsp_priv->pdev->dev;
+
+   if (IS_ERR(bsp_priv->grf)) {
+   dev_err(dev, "Missing rockchip,grf property\n");
+   return;
+   }
+
+   if (speed == 10) {

Re: [PATCH V7 0/6] block/scsi: safe SCSI quiescing

2017-09-30 Thread Martin Steigerwald
Hi Ming.

Ming Lei - 30.09.17, 14:12:
> Please consider this patchset for V4.15, and it fixes one
> kind of long-term I/O hang issue in either block legacy path
> or blk-mq.
>
> The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.

Isn´t that material for -stable as well?

I´d love to see this go into 4.14. Especially as it's an LTS release.

Thanks,
Martin

> Once SCSI device is put into QUIESCE, no new request except for
> RQF_PREEMPT can be dispatched to SCSI successfully, and
> scsi_device_quiesce() just simply waits for completion of I/Os
> dispatched to SCSI stack. It isn't enough at all.
> 
> Because new request still can be comming, but all the allocated
> requests can't be dispatched successfully, so request pool can be
> consumed up easily.
> 
> Then request with RQF_PREEMPT can't be allocated and wait forever,
> then system hangs forever, such as during system suspend or
> sending SCSI domain alidation in case of transport_spi.
> 
> Both IO hang inside system suspend[1] or SCSI domain validation
> were reported before.
> 
> This patch introduces preempt only mode, and solves the issue
> by allowing RQF_PREEMPT only during SCSI quiesce.
> 
> Both SCSI and SCSI_MQ have this IO deadlock issue, this patch fixes
> them all.
> 
> V7:
>   - add Reviewed-by & Tested-by
>   - one line change in patch 5 for checking preempt request
> 
> V6:
>   - borrow Bart's idea of preempt only, with clean
> implementation(patch 5/patch 6)
>   - needn't any external driver's dependency, such as MD's
>   change
> 
> V5:
>   - fix one tiny race by introducing blk_queue_enter_preempt_freeze()
>   given this change is small enough compared with V4, I added
>   tested-by directly
> 
> V4:
>   - reorganize patch order to make it more reasonable
>   - support nested preempt freeze, as required by SCSI transport spi
>   - check preempt freezing in slow path of of blk_queue_enter()
>   - add "SCSI: transport_spi: resume a quiesced device"
>   - wake up freeze queue in setting dying for both blk-mq and legacy
>   - rename blk_mq_[freeze|unfreeze]_queue() in one patch
>   - rename .mq_freeze_wq and .mq_freeze_depth
>   - improve comment
> 
> V3:
>   - introduce q->preempt_unfreezing to fix one bug of preempt freeze
>   - call blk_queue_enter_live() only when queue is preempt frozen
>   - cleanup a bit on the implementation of preempt freeze
>   - only patch 6 and 7 are changed
> 
> V2:
>   - drop the 1st patch in V1 because percpu_ref_is_dying() is
>   enough as pointed by Tejun
>   - introduce preempt version of blk_[freeze|unfreeze]_queue
>   - sync between preempt freeze and normal freeze
>   - fix warning from percpu-refcount as reported by Oleksandr
> 
> 
> [1] https://marc.info/?t=150340250100013&r=3&w=2
> 
> 
> Thanks,
> Ming
> 
> Ming Lei (6):
>   blk-mq: only run hw queues for blk-mq
>   block: tracking request allocation with q_usage_counter
>   block: pass flags to blk_queue_enter()
>   block: prepare for passing RQF_PREEMPT to request allocation
>   block: support PREEMPT_ONLY
>   SCSI: set block queue at preempt only when SCSI device is put into
> quiesce
> 
>  block/blk-core.c| 63
> +++-- block/blk-mq.c  |
> 14 ---
>  block/blk-timeout.c |  2 +-
>  drivers/scsi/scsi_lib.c | 25 +---
>  fs/block_dev.c  |  4 ++--
>  include/linux/blk-mq.h  |  7 +++---
>  include/linux/blkdev.h  | 27 ++---
>  7 files changed, 107 insertions(+), 35 deletions(-)


-- 
Martin


Re: [PATCH] mm: kill kmemcheck again

2017-09-30 Thread Steven Rostedt
On Wed, 27 Sep 2017 17:02:07 +0200
Michal Hocko  wrote:

> > Now that 2 years have passed, and all distros provide gcc that supports
> > KASAN, kill kmemcheck again for the very same reasons.  
> 
> This is just too large to review manually. How have you generated the
> patch?

I agree. This needs to be taken out piece by piece, not in one go,
where there could be unexpected fallout.

> 
> My compile test battery failed for i386 allyesconfig for some reason
> which is not entirely clear to me (see attached).  I have applied on top
> of dc972a67cc54585bd83ad811c4e9b6ab3dcd427e and that one compiles fine.
> 
> > Cc: Steven Rostedt (VMware) 
> > Cc: David S. Miller 
> > Signed-off-by: Sasha Levin   
> 
> Anyway I fully support this removal. It is a lot of rarely used code and
> KASAN is much more usable.

Now that my default compilers support KASAN, I'm fine with this removal.

-- Steve


Re: [PATCH V7 0/6] block/scsi: safe SCSI quiescing

2017-09-30 Thread Ming Lei
Hi Martin,

On Sat, Sep 30, 2017 at 11:47:10AM +0200, Martin Steigerwald wrote:
> Hi Ming.
> 
> Ming Lei - 30.09.17, 14:12:
> > Please consider this patchset for V4.15, and it fixes one
> > kind of long-term I/O hang issue in either block legacy path
> > or blk-mq.
> >
> > The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
> 
> Isn´t that material for -stable as well?

Yeah, the patch 6 is CCed to -stable.

> 
> I´d love to see this go into 4.14. Especially as it's an LTS release.

I am fine with either 4.14 or 4.15, and it is up to Jens.

Thanks
Ming


[PATCH V5 00/14] blk-mq-sched: improve sequential I/O performance(part 1)

2017-09-30 Thread Ming Lei
Hi Jens,

In Red Hat internal storage tests of the blk-mq scheduler, we
found that I/O performance is much worse with mq-deadline, especially
for sequential I/O on some multi-queue SCSI devices (lpfc, qla2xxx,
SRP...).

It turns out one big issue causes the performance regression: requests
are still dequeued from the sw queue/scheduler queue even when the LLD's
queue is busy, so I/O merging becomes quite difficult, and
sequential I/O degrades a lot.

This issue became one of the main reasons for reverting default SCSI_MQ
in v4.13.

The 1st patch takes direct issue in blk_mq_request_bypass_insert(),
then we can improve dm-mpath's performance in part 2, which will
be posted out soon.

The 2nd six patches improve this situation, and brings back
some performance loss.

With this change, SCSI-MQ sequential I/O performance is
improved a lot; Paolo reported that mq-deadline performance
improved much [2] in his dbench test wrt V2. A performance
improvement on lpfc/qla2xxx was also observed with V1 [1].

Please consider it for V4.15.

[1] http://marc.info/?l=linux-block&m=150151989915776&w=2
[2] https://marc.info/?l=linux-block&m=150217980602843&w=2

V5:
- address some comments from Omar
- add Tested-by & Reviewed-by tags
- use direct issue for blk_mq_request_bypass_insert(), and
start to consider to improve sequential I/O for dm-mpath
- only include part 1(the original patch 1 ~ 6), as suggested
by Omar

V4:
- add Reviewed-by tag
- some trival change: typo fix in commit log or comment,
variable name, no actual functional change

V3:
- totally round robin for picking req from ctx, as suggested
by Bart
- remove one local variable in __sbitmap_for_each_set()
- drop patches of single dispatch list, which can improve
performance on mq-deadline, but cause a bit degrade on
none because all hctxs need to be checked after ->dispatch
is flushed. Will post it again once it is mature.
- rebase on v4.13-rc6 with block for-next

V2:
- dequeue request from sw queues in round roubin's style
as suggested by Bart, and introduces one helper in sbitmap
for this purpose
- improve bio merge via hash table from sw queue
- add comments about using DISPATCH_BUSY state in lockless way,
simplifying handling on busy state,
- hold ctx->lock when clearing ctx busy bit as suggested
by Bart


Ming Lei (7):
  blk-mq: issue rq directly in blk_mq_request_bypass_insert()
  blk-mq-sched: fix scheduler bad performance
  sbitmap: introduce __sbitmap_for_each_set()
  blk-mq: introduce blk_mq_dequeue_from_ctx()
  blk-mq-sched: move actual dispatching into one helper
  blk-mq-sched: improve dispatching from sw queue
  blk-mq-sched: don't dequeue request until all in ->dispatch are
flushed

 block/blk-core.c|   3 +-
 block/blk-mq-debugfs.c  |   1 +
 block/blk-mq-sched.c| 104 ---
 block/blk-mq.c  | 114 +++-
 block/blk-mq.h  |   4 +-
 drivers/md/dm-rq.c  |   2 +-
 include/linux/blk-mq.h  |   3 ++
 include/linux/sbitmap.h |  64 +++
 8 files changed, 238 insertions(+), 57 deletions(-)

-- 
2.9.5



[PATCH V5 1/7] blk-mq: issue rq directly in blk_mq_request_bypass_insert()

2017-09-30 Thread Ming Lei
By issuing the rq directly in blk_mq_request_bypass_insert(),
we can:

1) avoid acquiring hctx->lock.

2) return the dispatch result to dm-rq, so that dm-rq can use
this information to improve I/O performance; part 2 of this
patchset will do that.

3) avoid adding the rq into hctx->dispatch directly, since the
following patch for improving sequential I/O performance uses
hctx->dispatch to decide whether the hctx is busy.

There will be another patch in which we move blk_mq_request_direct_insert()
out, since it is better for dm-rq to deal with this situation, and
the IO scheduler is actually on the dm-rq side.

Signed-off-by: Ming Lei 
---
 block/blk-core.c   |  3 +--
 block/blk-mq.c | 70 ++
 block/blk-mq.h |  2 +-
 drivers/md/dm-rq.c |  2 +-
 4 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 048be4aa6024..4c7fd2231145 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2350,8 +2350,7 @@ blk_status_t blk_insert_cloned_request(struct 
request_queue *q, struct request *
 * bypass a potential scheduler on the bottom device for
 * insert.
 */
-   blk_mq_request_bypass_insert(rq);
-   return BLK_STS_OK;
+   return blk_mq_request_bypass_insert(rq);
}
 
spin_lock_irqsave(q->queue_lock, flags);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 98a18609755e..d1b9fb539eba 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -39,6 +39,8 @@
 
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
+static blk_status_t blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
+   struct request *rq, blk_qc_t *cookie, bool dispatch_only);
 
 static int blk_mq_poll_stats_bkt(const struct request *rq)
 {
@@ -1401,20 +1403,31 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx 
*hctx, struct request *rq,
blk_mq_hctx_mark_pending(hctx, ctx);
 }
 
+static void blk_mq_request_direct_insert(struct blk_mq_hw_ctx *hctx,
+struct request *rq)
+{
+   spin_lock(&hctx->lock);
+   list_add_tail(&rq->queuelist, &hctx->dispatch);
+   spin_unlock(&hctx->lock);
+
+   blk_mq_run_hw_queue(hctx, false);
+}
+
 /*
  * Should only be used carefully, when the caller knows we want to
  * bypass a potential IO scheduler on the target device.
  */
-void blk_mq_request_bypass_insert(struct request *rq)
+blk_status_t blk_mq_request_bypass_insert(struct request *rq)
 {
struct blk_mq_ctx *ctx = rq->mq_ctx;
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
+   blk_qc_t cookie;
+   blk_status_t ret;
 
-   spin_lock(&hctx->lock);
-   list_add_tail(&rq->queuelist, &hctx->dispatch);
-   spin_unlock(&hctx->lock);
-
-   blk_mq_run_hw_queue(hctx, false);
+   ret = blk_mq_try_issue_directly(hctx, rq, &cookie, true);
+   if (ret == BLK_STS_RESOURCE)
+   blk_mq_request_direct_insert(hctx, rq);
+   return ret;
 }
 
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
@@ -1527,9 +1540,14 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx 
*hctx, struct request *rq)
return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);
 }
 
-static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
-   struct request *rq,
-   blk_qc_t *cookie, bool may_sleep)
+/*
+ * 'dispatch_only' means we only try to dispatch it out, and
+ * don't deal with dispatch failure if BLK_STS_RESOURCE or
+ * BLK_STS_IOERR happens.
+ */
+static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
+   struct request *rq, blk_qc_t *cookie, bool may_sleep,
+   bool dispatch_only)
 {
struct request_queue *q = rq->q;
struct blk_mq_queue_data bd = {
@@ -1537,7 +1555,7 @@ static void __blk_mq_try_issue_directly(struct 
blk_mq_hw_ctx *hctx,
.last = true,
};
blk_qc_t new_cookie;
-   blk_status_t ret;
+   blk_status_t ret = BLK_STS_OK;
bool run_queue = true;
 
/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -1546,9 +1564,10 @@ static void __blk_mq_try_issue_directly(struct 
blk_mq_hw_ctx *hctx,
goto insert;
}
 
-   if (q->elevator)
+   if (q->elevator && !dispatch_only)
goto insert;
 
+   ret = BLK_STS_RESOURCE;
if (!blk_mq_get_driver_tag(rq, NULL, false))
goto insert;
 
@@ -1563,26 +1582,32 @@ static void __blk_mq_try_issue_directly(struct 
blk_mq_hw_ctx *hctx,
switch (ret) {
case BLK_STS_OK:
*cookie = new_cookie;
-   return;
+   return ret;
case BLK_STS_RESOURCE:

[PATCH V5 2/7] blk-mq-sched: fix scheduler bad performance

2017-09-30 Thread Ming Lei
When the hw queue is busy, we shouldn't take requests from the
scheduler queue any more, otherwise it is difficult to do
IO merge.

This patch fixes the awful IO performance on some
SCSI devices (lpfc, qla2xxx, ...) when mq-deadline/kyber
is used, by not taking requests if the hw queue is busy.

Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Reviewed-by: Bart Van Assche 
Signed-off-by: Ming Lei 
---
 block/blk-mq-sched.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 4ab69435708c..eca011fdfa0e 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -94,7 +94,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx 
*hctx)
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
-   bool did_work = false;
+   bool do_sched_dispatch = true;
LIST_HEAD(rq_list);
 
/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -125,18 +125,18 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx 
*hctx)
 */
if (!list_empty(&rq_list)) {
blk_mq_sched_mark_restart_hctx(hctx);
-   did_work = blk_mq_dispatch_rq_list(q, &rq_list);
+   do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
} else if (!has_sched_dispatch) {
blk_mq_flush_busy_ctxs(hctx, &rq_list);
blk_mq_dispatch_rq_list(q, &rq_list);
}
 
/*
-* We want to dispatch from the scheduler if we had no work left
-* on the dispatch list, OR if we did have work but weren't able
-* to make progress.
+* We want to dispatch from the scheduler if there was nothing
+* on the dispatch list or we were able to dispatch from the
+* dispatch list.
 */
-   if (!did_work && has_sched_dispatch) {
+   if (do_sched_dispatch && has_sched_dispatch) {
do {
struct request *rq;
 
-- 
2.9.5



[PATCH V5 3/7] sbitmap: introduce __sbitmap_for_each_set()

2017-09-30 Thread Ming Lei
We need to iterate over the ctxs starting from any ctx in a
round-robin way, so introduce this helper.
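
As a rough illustration of the iteration order, here is a minimal
user-space sketch (an assumed simplification, not the sbitmap code in
the patch below): every set bit is visited exactly once, starting from
an arbitrary position and wrapping around.

#include <stdio.h>

#define NBITS 16

/* visit every set bit of 'map', starting at 'start' and wrapping */
static void for_each_set_from(unsigned long map, unsigned int start,
                              void (*fn)(unsigned int))
{
        unsigned int i;

        for (i = 0; i < NBITS; i++) {
                unsigned int bit = (start + i) % NBITS;

                if (map & (1UL << bit))
                        fn(bit);
        }
}

static void show(unsigned int bit)
{
        printf("bit %u\n", bit);
}

int main(void)
{
        /* bits 1, 5 and 12 set; scanning from bit 6 visits 12, 1, 5 */
        for_each_set_from((1UL << 1) | (1UL << 5) | (1UL << 12), 6, show);
        return 0;
}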

Cc: Omar Sandoval 
Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Signed-off-by: Ming Lei 
---
 include/linux/sbitmap.h | 64 -
 1 file changed, 47 insertions(+), 17 deletions(-)

diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index a1904aadbc45..0dcc60e820de 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -211,10 +211,14 @@ bool sbitmap_any_bit_set(const struct sbitmap *sb);
  */
 bool sbitmap_any_bit_clear(const struct sbitmap *sb);
 
+#define SB_NR_TO_INDEX(sb, bitnr) ((bitnr) >> (sb)->shift)
+#define SB_NR_TO_BIT(sb, bitnr) ((bitnr) & ((1U << (sb)->shift) - 1U))
+
 typedef bool (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *);
 
 /**
- * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
+ * __sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
+ * @start: Where to start the iteration.
  * @sb: Bitmap to iterate over.
  * @fn: Callback. Should return true to continue or false to break early.
  * @data: Pointer to pass to callback.
@@ -222,35 +226,61 @@ typedef bool (*sb_for_each_fn)(struct sbitmap *, unsigned 
int, void *);
  * This is inline even though it's non-trivial so that the function calls to 
the
  * callback will hopefully get optimized away.
  */
-static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn,
-   void *data)
+static inline void __sbitmap_for_each_set(struct sbitmap *sb,
+ unsigned int start,
+ sb_for_each_fn fn, void *data)
 {
-   unsigned int i;
+   unsigned int index;
+   unsigned int nr;
+   unsigned int scanned = 0;
 
-   for (i = 0; i < sb->map_nr; i++) {
-   struct sbitmap_word *word = &sb->map[i];
-   unsigned int off, nr;
+   if (start >= sb->depth)
+   start = 0;
+   index = SB_NR_TO_INDEX(sb, start);
+   nr = SB_NR_TO_BIT(sb, start);
 
-   if (!word->word)
-   continue;
+   while (scanned < sb->depth) {
+   struct sbitmap_word *word = &sb->map[index];
+   unsigned int depth = min_t(unsigned int, word->depth - nr,
+  sb->depth - scanned);
 
-   nr = 0;
-   off = i << sb->shift;
+   scanned += depth;
+   if (!word->word)
+   goto next;
+
+   /*
+* On the first iteration of the outer loop, we need to add the
+* bit offset back to the size of the word for find_next_bit().
+* On all other iterations, nr is zero, so this is a noop.
+*/
+   depth += nr;
while (1) {
-   nr = find_next_bit(&word->word, word->depth, nr);
-   if (nr >= word->depth)
+   nr = find_next_bit(&word->word, depth, nr);
+   if (nr >= depth)
break;
-
-   if (!fn(sb, off + nr, data))
+   if (!fn(sb, (index << sb->shift) + nr, data))
return;
 
nr++;
}
+next:
+   nr = 0;
+   if (++index >= sb->map_nr)
+   index = 0;
}
 }
 
-#define SB_NR_TO_INDEX(sb, bitnr) ((bitnr) >> (sb)->shift)
-#define SB_NR_TO_BIT(sb, bitnr) ((bitnr) & ((1U << (sb)->shift) - 1U))
+/**
+ * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
+ * @sb: Bitmap to iterate over.
+ * @fn: Callback. Should return true to continue or false to break early.
+ * @data: Pointer to pass to callback.
+ */
+static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn,
+   void *data)
+{
+   __sbitmap_for_each_set(sb, 0, fn, data);
+}
 
 static inline unsigned long *__sbitmap_word(struct sbitmap *sb,
unsigned int bitnr)
-- 
2.9.5



[PATCH V5 4/7] blk-mq: introduce blk_mq_dequeue_from_ctx()

2017-09-30 Thread Ming Lei
This function is introduced for dequeuing a request
from the sw queue so that we can dispatch it in the
scheduler's way.

More importantly, some SCSI devices may set
q->queue_depth, which is a per-request_queue limit
applied to pending I/O from all hctxs. This
function helps avoid dequeuing too many requests
from the sw queue when ->dispatch isn't
completely flushed.

Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Reviewed-by: Bart Van Assche 
Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 38 ++
 block/blk-mq.h |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d1b9fb539eba..8b49af1ade7f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -882,6 +882,44 @@ void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, 
struct list_head *list)
 }
 EXPORT_SYMBOL_GPL(blk_mq_flush_busy_ctxs);
 
+struct dispatch_rq_data {
+   struct blk_mq_hw_ctx *hctx;
+   struct request *rq;
+};
+
+static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr, void 
*data)
+{
+   struct dispatch_rq_data *dispatch_data = data;
+   struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
+   struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+
+   spin_lock(&ctx->lock);
+   if (unlikely(!list_empty(&ctx->rq_list))) {
+   dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
+   list_del_init(&dispatch_data->rq->queuelist);
+   if (list_empty(&ctx->rq_list))
+   sbitmap_clear_bit(sb, bitnr);
+   }
+   spin_unlock(&ctx->lock);
+
+   return !dispatch_data->rq;
+}
+
+struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
+   struct blk_mq_ctx *start)
+{
+   unsigned off = start ? start->index_hw : 0;
+   struct dispatch_rq_data data = {
+   .hctx = hctx,
+   .rq   = NULL,
+   };
+
+   __sbitmap_for_each_set(&hctx->ctx_map, off,
+  dispatch_rq_from_ctx, &data);
+
+   return data.rq;
+}
+
 static inline unsigned int queued_to_index(unsigned int queued)
 {
if (!queued)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 61aecf398a4b..915de58572e7 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -35,6 +35,8 @@ void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, 
struct list_head *list);
 bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
 bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
bool wait);
+struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
+   struct blk_mq_ctx *start);
 
 /*
  * Internal helpers for allocating/freeing the request map
-- 
2.9.5



[PATCH V5 5/7] blk-mq-sched: move actual dispatching into one helper

2017-09-30 Thread Ming Lei
So that it becomes easy to support to dispatch from
sw queue in the following patch.

No functional change.

Reviewed-by: Bart Van Assche 
Reviewed-by: Omar Sandoval 
Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Signed-off-by: Ming Lei 
---
 block/blk-mq-sched.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index eca011fdfa0e..538f363f39ca 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -89,6 +89,22 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx 
*hctx)
return false;
 }
 
+static void blk_mq_do_dispatch_sched(struct request_queue *q,
+struct elevator_queue *e,
+struct blk_mq_hw_ctx *hctx)
+{
+   LIST_HEAD(rq_list);
+
+   do {
+   struct request *rq;
+
+   rq = e->type->ops.mq.dispatch_request(hctx);
+   if (!rq)
+   break;
+   list_add(&rq->queuelist, &rq_list);
+   } while (blk_mq_dispatch_rq_list(q, &rq_list));
+}
+
 void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
struct request_queue *q = hctx->queue;
@@ -136,16 +152,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx 
*hctx)
 * on the dispatch list or we were able to dispatch from the
 * dispatch list.
 */
-   if (do_sched_dispatch && has_sched_dispatch) {
-   do {
-   struct request *rq;
-
-   rq = e->type->ops.mq.dispatch_request(hctx);
-   if (!rq)
-   break;
-   list_add(&rq->queuelist, &rq_list);
-   } while (blk_mq_dispatch_rq_list(q, &rq_list));
-   }
+   if (do_sched_dispatch && has_sched_dispatch)
+   blk_mq_do_dispatch_sched(q, e, hctx);
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-- 
2.9.5



[PATCH V5 6/7] blk-mq-sched: improve dispatching from sw queue

2017-09-30 Thread Ming Lei
SCSI devices use a host-wide tagset, and the shared
driver tag space is often quite big. Meanwhile there
is also a queue depth for each lun (.cmd_per_lun),
which is often small.

So lots of requests may stay in the sw queue, and we
always flush all of those belonging to the same hw
queue and dispatch them all to the driver; unfortunately
this easily makes the queue busy because of the small
per-lun queue depth. Once these requests are flushed
out, they have to stay in hctx->dispatch, no bio
merge can happen against them any more, and
sequential IO performance is hurt.

This patch improves dispatching from the sw queue when
there is a per-request-queue queue depth, by taking
requests one by one from the sw queue, just like an
IO scheduler does.
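
As a rough model of the idea (an assumed user-space simplification,
not the blk-mq code): issuing one request at a time and stopping at
the first 'busy' answer leaves the rest in the sw queue, where later
bios can still merge with them.

#include <stdbool.h>
#include <stdio.h>

#define LUN_DEPTH 2                     /* small per-lun queue depth */

static int in_flight;
static int sw_queued = 8;               /* requests waiting in the sw queue */

static bool driver_queue_rq(void)
{
        if (in_flight >= LUN_DEPTH)
                return false;           /* device is busy */
        in_flight++;
        return true;
}

int main(void)
{
        while (sw_queued > 0) {
                if (!driver_queue_rq())
                        break;          /* keep the rest merge-able */
                sw_queued--;
        }
        printf("issued %d, still merge-able in sw queue: %d\n",
               in_flight, sw_queued);
        return 0;
}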

Reviewed-by: Omar Sandoval 
Reviewed-by: Bart Van Assche 
Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Signed-off-by: Ming Lei 
---
 block/blk-mq-sched.c   | 53 --
 include/linux/blk-mq.h |  2 ++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 538f363f39ca..3ba112d9dc15 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -105,6 +105,42 @@ static void blk_mq_do_dispatch_sched(struct request_queue 
*q,
} while (blk_mq_dispatch_rq_list(q, &rq_list));
 }
 
+static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
+ struct blk_mq_ctx *ctx)
+{
+   unsigned idx = ctx->index_hw;
+
+   if (++idx == hctx->nr_ctx)
+   idx = 0;
+
+   return hctx->ctxs[idx];
+}
+
+static void blk_mq_do_dispatch_ctx(struct request_queue *q,
+  struct blk_mq_hw_ctx *hctx)
+{
+   LIST_HEAD(rq_list);
+   struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
+   bool dispatched;
+
+   do {
+   struct request *rq;
+
+   rq = blk_mq_dequeue_from_ctx(hctx, ctx);
+   if (!rq)
+   break;
+   list_add(&rq->queuelist, &rq_list);
+
+   /* round robin for fair dispatch */
+   ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
+
+   dispatched = blk_mq_dispatch_rq_list(q, &rq_list);
+   } while (dispatched);
+
+   if (!dispatched)
+   WRITE_ONCE(hctx->dispatch_from, ctx);
+}
+
 void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
struct request_queue *q = hctx->queue;
@@ -142,18 +178,31 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx 
*hctx)
if (!list_empty(&rq_list)) {
blk_mq_sched_mark_restart_hctx(hctx);
do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
-   } else if (!has_sched_dispatch) {
+   } else if (!has_sched_dispatch && !q->queue_depth) {
+   /*
+* If there is no per-request_queue depth, we
+* flush all requests in this hw queue, otherwise
+* pick up request one by one from sw queue for
+* avoiding to mess up I/O merge when dispatch
+* run out of resource, which can be triggered
+* easily by per-request_queue queue depth
+*/
blk_mq_flush_busy_ctxs(hctx, &rq_list);
blk_mq_dispatch_rq_list(q, &rq_list);
}
 
+   if (!do_sched_dispatch)
+   return;
+
/*
 * We want to dispatch from the scheduler if there was nothing
 * on the dispatch list or we were able to dispatch from the
 * dispatch list.
 */
-   if (do_sched_dispatch && has_sched_dispatch)
+   if (has_sched_dispatch)
blk_mq_do_dispatch_sched(q, e, hctx);
+   else
+   blk_mq_do_dispatch_ctx(q, hctx);
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2747469cedaf..fccabe00fb55 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -30,6 +30,8 @@ struct blk_mq_hw_ctx {
 
struct sbitmap  ctx_map;
 
+   struct blk_mq_ctx   *dispatch_from;
+
struct blk_mq_ctx   **ctxs;
unsigned intnr_ctx;
 
-- 
2.9.5



[PATCH V5 7/7] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed

2017-09-30 Thread Ming Lei
During dispatching, we move all requests from hctx->dispatch to
one temporary list, then dispatch them one by one from this list.
Unfortunately, during this period a queue run from another context
may think the queue is idle, then start to dequeue from the sw/scheduler
queue and still try to dispatch because ->dispatch is empty. This
hurts sequential I/O performance because requests are dequeued while
the lld queue is busy.

This patch introduces the BLK_MQ_S_DISPATCH_BUSY state to
make sure that requests aren't dequeued until ->dispatch is
flushed.
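
A simplified single-threaded model of the gating (an assumed
illustration, not the blk-mq code): while requests are parked on the
dispatch list, a busy flag keeps further queue runs from pulling more
requests out of the software queues.

#include <stdbool.h>
#include <stdio.h>

static bool dispatch_busy;
static int parked;                      /* requests sitting on ->dispatch */
static int sw_queued = 8;               /* requests still in the sw queues */

static void park_requests(int n)
{
        parked += n;
        dispatch_busy = true;           /* mark busy before others look */
}

static void run_queue(void)
{
        if (dispatch_busy) {
                /* flush the parked requests first, don't dequeue more */
                printf("flush %d parked request(s)\n", parked);
                parked = 0;
                dispatch_busy = false;
                return;
        }
        printf("dequeue from sw queue (%d left)\n", --sw_queued);
}

int main(void)
{
        run_queue();                    /* dequeues from the sw queue */
        park_requests(2);               /* device was busy, 2 requests parked */
        run_queue();                    /* only flushes the parked ones */
        run_queue();                    /* busy cleared, dequeue again */
        return 0;
}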

Reviewed-by: Bart Van Assche 
Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Signed-off-by: Ming Lei 
---
 block/blk-mq-debugfs.c |  1 +
 block/blk-mq-sched.c   | 53 +-
 block/blk-mq.c |  6 ++
 include/linux/blk-mq.h |  1 +
 4 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 813ca3bbbefc..f1a62c0d1acc 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -182,6 +182,7 @@ static const char *const hctx_state_name[] = {
HCTX_STATE_NAME(SCHED_RESTART),
HCTX_STATE_NAME(TAG_WAITING),
HCTX_STATE_NAME(START_ON_RUN),
+   HCTX_STATE_NAME(DISPATCH_BUSY),
 };
 #undef HCTX_STATE_NAME
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 3ba112d9dc15..c5eac1eee442 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -146,7 +146,6 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx 
*hctx)
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
-   bool do_sched_dispatch = true;
LIST_HEAD(rq_list);
 
/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -177,8 +176,33 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx 
*hctx)
 */
if (!list_empty(&rq_list)) {
blk_mq_sched_mark_restart_hctx(hctx);
-   do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
-   } else if (!has_sched_dispatch && !q->queue_depth) {
+   blk_mq_dispatch_rq_list(q, &rq_list);
+
+   /*
+* We may clear DISPATCH_BUSY just after it
+* is set from another context, the only cost
+* is that one request is dequeued a bit early,
+* we can survive that. Given the window is
+* small enough, no need to worry about performance
+* effect.
+*/
+   if (list_empty_careful(&hctx->dispatch))
+   clear_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
+   }
+
+   /*
+* If DISPATCH_BUSY is set, that means hw queue is busy
+* and requests in the list of hctx->dispatch need to
+* be flushed first, so return early.
+*
+* Wherever DISPATCH_BUSY is set, blk_mq_run_hw_queue()
+* will be run to try to make progress, so it is always
+* safe to check the state here.
+*/
+   if (test_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state))
+   return;
+
+   if (!has_sched_dispatch) {
/*
 * If there is no per-request_queue depth, we
 * flush all requests in this hw queue, otherwise
@@ -187,22 +211,15 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx 
*hctx)
 * run out of resource, which can be triggered
 * easily by per-request_queue queue depth
 */
-   blk_mq_flush_busy_ctxs(hctx, &rq_list);
-   blk_mq_dispatch_rq_list(q, &rq_list);
-   }
-
-   if (!do_sched_dispatch)
-   return;
-
-   /*
-* We want to dispatch from the scheduler if there was nothing
-* on the dispatch list or we were able to dispatch from the
-* dispatch list.
-*/
-   if (has_sched_dispatch)
+   if (!q->queue_depth) {
+   blk_mq_flush_busy_ctxs(hctx, &rq_list);
+   blk_mq_dispatch_rq_list(q, &rq_list);
+   } else {
+   blk_mq_do_dispatch_ctx(q, hctx);
+   }
+   } else {
blk_mq_do_dispatch_sched(q, e, hctx);
-   else
-   blk_mq_do_dispatch_ctx(q, hctx);
+   }
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8b49af1ade7f..7cb3f87334c0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1142,6 +1142,11 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
struct list_head *list)
 
spin_lock(&hctx->lock);
list_splice_init(list, &hctx->dispatch);
+   /*
+* DISPATCH_BUSY won't be cleared until all requests
+* in hctx->dispat

Re: [PATCH V5 00/14] blk-mq-sched: improve sequential I/O performance(part 1)

2017-09-30 Thread Ming Lei
On Sat, Sep 30, 2017 at 06:27:13PM +0800, Ming Lei wrote:
> Hi Jens,
> 
> In Red Hat internal storage test wrt. blk-mq scheduler, we
> found that I/O performance is much bad with mq-deadline, especially
> about sequential I/O on some multi-queue SCSI devcies(lpfc, qla2xxx,
> SRP...)
> 
> Turns out one big issue causes the performance regression: requests
> are still dequeued from sw queue/scheduler queue even when ldd's
> queue is busy, so I/O merge becomes quite difficult to make, then
> sequential IO degrades a lot.
> 
> This issue becomes one of mains reasons for reverting default SCSI_MQ
> in V4.13.
> 
> The 1st patch takes direct issue in blk_mq_request_bypass_insert(),
> then we can improve dm-mpath's performance in part 2, which will
> be posted out soon.
> 
> The 2nd six patches improve this situation, and brings back
> some performance loss.
> 
> With this change, SCSI-MQ sequential I/O performance is
> improved much, Paolo reported that mq-deadline performance
> improved much[2] in his dbench test wrt V2. Also performanc
> improvement on lpfc/qla2xx was observed with V1.[1]
> 
> Please consider it for V4.15.
> 
> [1] http://marc.info/?l=linux-block&m=150151989915776&w=2
> [2] https://marc.info/?l=linux-block&m=150217980602843&w=2
> 
> V5:
>   - address some comments from Omar
>   - add Tested-by & Reveiewed-by tag
>   - use direct issue for blk_mq_request_bypass_insert(), and
>   start to consider to improve sequential I/O for dm-mpath
>   - only include part 1(the original patch 1 ~ 6), as suggested
>   by Omar
> 
> V4:
>   - add Reviewed-by tag
>   - some trival change: typo fix in commit log or comment,
>   variable name, no actual functional change
> 
> V3:
>   - totally round robin for picking req from ctx, as suggested
>   by Bart
>   - remove one local variable in __sbitmap_for_each_set()
>   - drop patches of single dispatch list, which can improve
>   performance on mq-deadline, but cause a bit degrade on
>   none because all hctxs need to be checked after ->dispatch
>   is flushed. Will post it again once it is mature.
>   - rebase on v4.13-rc6 with block for-next
> 
> V2:
>   - dequeue request from sw queues in round roubin's style
>   as suggested by Bart, and introduces one helper in sbitmap
>   for this purpose
>   - improve bio merge via hash table from sw queue
>   - add comments about using DISPATCH_BUSY state in lockless way,
>   simplifying handling on busy state,
>   - hold ctx->lock when clearing ctx busy bit as suggested
>   by Bart
> 
> 
> Ming Lei (7):
>   blk-mq: issue rq directly in blk_mq_request_bypass_insert()
>   blk-mq-sched: fix scheduler bad performance
>   sbitmap: introduce __sbitmap_for_each_set()
>   blk-mq: introduce blk_mq_dequeue_from_ctx()
>   blk-mq-sched: move actual dispatching into one helper
>   blk-mq-sched: improve dispatching from sw queue
>   blk-mq-sched: don't dequeue request until all in ->dispatch are
> flushed
> 
>  block/blk-core.c|   3 +-
>  block/blk-mq-debugfs.c  |   1 +
>  block/blk-mq-sched.c| 104 ---
>  block/blk-mq.c  | 114 +++-
>  block/blk-mq.h  |   4 +-
>  drivers/md/dm-rq.c  |   2 +-
>  include/linux/blk-mq.h  |   3 ++
>  include/linux/sbitmap.h |  64 +++
>  8 files changed, 238 insertions(+), 57 deletions(-)

Oops, the title should have been:

[PATCH V5 0/7] blk-mq-sched: improve sequential I/O performance(part 1)

Sorry for that.

-- 
Ming


Re: [PATCH REBASED 3/6] s390: Add __down_read_killable()

2017-09-30 Thread Martin Schwidefsky
On Sat, 30 Sep 2017 11:20:02 +0200
Heiko Carstens  wrote:

> On Fri, Sep 29, 2017 at 07:06:18PM +0300, Kirill Tkhai wrote:
> > Similar to __down_write_killable(), and read killable primitive.
> > 
> > Signed-off-by: Kirill Tkhai 
> > ---
> >  arch/s390/include/asm/rwsem.h |   18 --
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h  
> 
> FWIW, while looking into this patch I realized that we never optimized our
> rwsem primitives to make use of new atomic instructions.
> 
> The generic rwsem header file however does, since it uses atomic ops which
> we did optimize. Even when compiling for old machines the generic version
> generates better code. Therefore I will remove the 15 years old s390
> implementation and switch to the generic version instead.

Take care not to conflict with the queued spinlock/rwlock patches on the
features branch. 

https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=eb3b7b848fb3dd00f7a57d633d4ae4d194aa7865

Me thinks that what you have in mind is already done.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: [PATCH] phy: rockchip-typec: Check for errors from tcphy_phy_init()

2017-09-30 Thread Guenter Roeck
On Fri, Sep 29, 2017 at 4:58 PM, Douglas Anderson  wrote:
> The function tcphy_phy_init() could return an error but the callers
> weren't checking the return value.  They should.  In at least one case
> while testing I saw the message "wait pma ready timeout" which
> indicates that tcphy_phy_init() really could return an error and we
> should account for it.
>
> Signed-off-by: Douglas Anderson 

Reviewed-by: Guenter Roeck 

> ---
>
>  drivers/phy/rockchip/phy-rockchip-typec.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/phy/rockchip/phy-rockchip-typec.c 
> b/drivers/phy/rockchip/phy-rockchip-typec.c
> index 4d2c57f21d76..38831eebc934 100644
> --- a/drivers/phy/rockchip/phy-rockchip-typec.c
> +++ b/drivers/phy/rockchip/phy-rockchip-typec.c
> @@ -685,8 +685,11 @@ static int rockchip_usb3_phy_power_on(struct phy *phy)
> if (tcphy->mode == new_mode)
> goto unlock_ret;
>
> -   if (tcphy->mode == MODE_DISCONNECT)
> -   tcphy_phy_init(tcphy, new_mode);
> +   if (tcphy->mode == MODE_DISCONNECT) {
> +   ret = tcphy_phy_init(tcphy, new_mode);
> +   if (ret)
> +   goto unlock_ret;
> +   }
>
> /* wait TCPHY for pipe ready */
> for (timeout = 0; timeout < 100; timeout++) {
> @@ -760,10 +763,12 @@ static int rockchip_dp_phy_power_on(struct phy *phy)
>  */
> if (new_mode == MODE_DFP_DP && tcphy->mode != MODE_DISCONNECT) {
> tcphy_phy_deinit(tcphy);
> -   tcphy_phy_init(tcphy, new_mode);
> +   ret = tcphy_phy_init(tcphy, new_mode);
> } else if (tcphy->mode == MODE_DISCONNECT) {
> -   tcphy_phy_init(tcphy, new_mode);
> +   ret = tcphy_phy_init(tcphy, new_mode);
> }
> +   if (ret)
> +   goto unlock_ret;
>
> ret = readx_poll_timeout(readl, tcphy->base + DP_MODE_CTL,
>  val, val & DP_MODE_A2, 1000,
> --
> 2.14.2.822.g60be5d43e6-goog
>


Re: [PATCH REBASED 3/6] s390: Add __down_read_killable()

2017-09-30 Thread Heiko Carstens
On Sat, Sep 30, 2017 at 12:36:12PM +0200, Martin Schwidefsky wrote:
> On Sat, 30 Sep 2017 11:20:02 +0200
> Heiko Carstens  wrote:
> 
> > On Fri, Sep 29, 2017 at 07:06:18PM +0300, Kirill Tkhai wrote:
> > > Similar to __down_write_killable(), and read killable primitive.
> > > 
> > > Signed-off-by: Kirill Tkhai 
> > > ---
> > >  arch/s390/include/asm/rwsem.h |   18 --
> > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/s390/include/asm/rwsem.h 
> > > b/arch/s390/include/asm/rwsem.h  
> > 
> > FWIW, while looking into this patch I realized that we never optimized our
> > rwsem primitives to make use of new atomic instructions.
> > 
> > The generic rwsem header file however does, since it uses atomic ops which
> > we did optimize. Even when compiling for old machines the generic version
> > generates better code. Therefore I will remove the 15 years old s390
> > implementation and switch to the generic version instead.
> 
> Take care not to conflict with the queued spinlock/rwlock patches on the
> features branch. 
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=eb3b7b848fb3dd00f7a57d633d4ae4d194aa7865
> 
> Me thinks that what you have in mind is already done.

No, it's not done. You probably mixed up rwlocks and rwsems?



Re: [PATCH REBASED 3/6] s390: Add __down_read_killable()

2017-09-30 Thread Martin Schwidefsky
On Sat, 30 Sep 2017 12:36:12 +0200
Martin Schwidefsky  wrote:

> On Sat, 30 Sep 2017 11:20:02 +0200
> Heiko Carstens  wrote:
> 
> > On Fri, Sep 29, 2017 at 07:06:18PM +0300, Kirill Tkhai wrote:  
> > > Similar to __down_write_killable(), and read killable primitive.
> > > 
> > > Signed-off-by: Kirill Tkhai 
> > > ---
> > >  arch/s390/include/asm/rwsem.h |   18 --
> > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/s390/include/asm/rwsem.h 
> > > b/arch/s390/include/asm/rwsem.h
> > 
> > FWIW, while looking into this patch I realized that we never optimized our
> > rwsem primitives to make use of new atomic instructions.
> > 
> > The generic rwsem header file however does, since it uses atomic ops which
> > we did optimize. Even when compiling for old machines the generic version
> > generates better code. Therefore I will remove the 15 years old s390
> > implementation and switch to the generic version instead.  
> 
> Take care not to conflict with the queued spinlock/rwlock patches on the
> features branch. 
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=eb3b7b848fb3dd00f7a57d633d4ae4d194aa7865
> 
> Me thinks that what you have in mind is already done.

Argh, pitfall rwlock != rwsem. Using the atomic_ops for the rwsem code makes
a lot of sense. Yes, please..

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: [PATCH] clk: rockchip: Delete an error message for a failed memory allocation in rockchip_clk_register_cpuclk()

2017-09-30 Thread Heiko Stuebner
Am Mittwoch, 27. September 2017, 11:44:30 CEST schrieb SF Markus Elfring:
> From: Markus Elfring 
> Date: Wed, 27 Sep 2017 11:38:17 +0200
> 
> Omit an extra message for a memory allocation failure in this function.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 

applied for 4.15 after shortening the patch subject a bit


Thanks
Heiko


Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message

2017-09-30 Thread Tetsuo Handa
Yang Shi wrote:
> On 9/28/17 1:45 PM, Tetsuo Handa wrote:
> > Yang Shi wrote:
> >> On 9/28/17 12:57 PM, Tetsuo Handa wrote:
> >>> Yang Shi wrote:
>  On 9/27/17 9:36 PM, Tetsuo Handa wrote:
> > On 2017/09/28 6:46, Yang Shi wrote:
> >> Changelog v7 -> v8:
> >> * Adopted Michal’s suggestion to dump unreclaim slab info when 
> >> unreclaimable slabs amount > total user memory. Not only in oom panic 
> >> path.
> >
> > Holding slab_mutex inside dump_unreclaimable_slab() was refrained since 
> > V2
> > because there are
> >
> > mutex_lock(&slab_mutex);
> > kmalloc(GFP_KERNEL);
> > mutex_unlock(&slab_mutex);
> >
> > users. If we call dump_unreclaimable_slab() for non OOM panic path, 
> > aren't we
> > introducing a risk of crash (i.e. kernel panic) for regular OOM path?
> 
>  I don't see the difference between regular oom path and oom path other
>  than calling panic() at last.
> 
>  And, the slab dump may be called by panic path too, it is for both
>  regular and panic path.
> >>>
> >>> Calling a function that might cause kerneloops immediately before calling 
> >>> panic()
> >>> would be tolerable, for the kernel will panic after all. But calling a 
> >>> function
> >>> that might cause kerneloops when there is no plan to call panic() is a 
> >>> bug.
> >>
> >> I got your point. slab_mutex is used to protect the list of all the
> >> slabs, since we are already in oom, there should be not kmem cache
> >> destroy happen during the list traverse. And, list_for_each_entry() has
> >> been replaced to list_for_each_entry_safe() to make the traverse more
> >> robust.
> > 
> > I consider that OOM event and kmem chache destroy event can run concurrently
> > because slab_mutex is not held by OOM event (and unfortunately cannot be 
> > held
> > due to possibility of deadlock) in order to protect the list of all the 
> > slabs.
> > 
> > I don't think replacing list_for_each_entry() with 
> > list_for_each_entry_safe()
> > makes the traverse more robust, for list_for_each_entry_safe() does not 
> > defer
> > freeing of memory used by list element. Rather, replacing 
> > list_for_each_entry()
> > with list_for_each_entry_rcu() (and making relevant changes such as
> > rcu_read_lock()/rcu_read_unlock()/synchronize_rcu()) will make the traverse 
> > safe.
> 
> I'm not sure if rcu could satisfy this case. rcu just can protect  
> slab_caches_to_rcu_destroy list, which is used by SLAB_TYPESAFE_BY_RCU  
> slabs.

I'm not sure why you are talking about SLAB_TYPESAFE_BY_RCU.
What I meant is that

  Upon registration:

// do initialize/setup stuff here
synchronize_rcu(); // <= for dump_unreclaimable_slab()
list_add_rcu(&kmem_cache->list, &slab_caches);

  Upon unregistration:

list_del_rcu(&kmem_cache->list);
synchronize_rcu(); // <= for dump_unreclaimable_slab()
// do finalize/cleanup stuff here

then (if my understanding is correct)

rcu_read_lock();
list_for_each_entry_rcu(s, &slab_caches, list) {
if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
continue;

memset(&sinfo, 0, sizeof(sinfo));
get_slabinfo(s, &sinfo);

if (sinfo.num_objs > 0)
pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
(sinfo.active_objs * s->size) / 1024,
(sinfo.num_objs * s->size) / 1024);
}
rcu_read_unlock();

will make dump_unreclaimable_slab() safe.


Re: [PATCH v2] Staging: rtl8723bs: Remove unnecessary comments.

2017-09-30 Thread Tobin C. Harding
Hi Shreeya,

We don't usually add a period to the subject line for kernel patches. (reason: 
we only have about
52 characters for the commit brief description so best not to waste any).

On Sat, Sep 30, 2017 at 01:30:34PM +0530, Shreeya Patel wrote:
> This patch removes unnecessary comments which are there
> to explain why call to memset is in comments. Both of the
> comments are not needed as they are not very useful.

You may like to read Documentation/process/submitting-patches.rst (specifically
section 2) for tips on writing your git log.

Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour.

Good luck,
Tobin.


Re: [PATCH 0/6] Replace container_of with list_entry

2017-09-30 Thread Tobin C. Harding
On Sat, Sep 30, 2017 at 12:49:00PM +0530, Srishti Sharma wrote:
> Replaces instances of container_of with list_entry to 
> access current list element.
> 
> Srishti Sharma (6):
>   Staging: rtl8188eu: core: Use list_entry instead of container_of
>   Staging: rtl8188eu: core: Use list_entry instead of container_of
>   Staging: rtl8188eu: core: Use list_entry instead of container_of
>   Staging: rtl8188eu: core: Use list_entry instead of container_of
>   Staging: rtl8188eu: core: Use list_entry instead of container_of
>   Staging: rtl8188eu: core: Use list_entry instead of container_of

You may have trouble getting patches merged with duplicate commit logs like 
this. The reason is that
the git index should be grep'able. You may like to squash all of these commits 
into a single patch
since they all do the same thing. The mantra is 'one thing per patch' so this 
makes sense in this
case.

Hope this helps,
Tobin.


Re: random insta-reboots on AMD Phenom II

2017-09-30 Thread Borislav Petkov
On Sat, Sep 30, 2017 at 04:05:16AM +0200, Adam Borowski wrote:
> Any hints how to debug this?

Do

rdmsr -a 0xc0010015

as root and paste it here.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v2] Staging: rtl8723bs: Remove unnecessary comments.

2017-09-30 Thread Shreeya Patel
On Sat, 2017-09-30 at 21:06 +1000, Tobin C. Harding wrote:
> Hi Shreeya,
> 
> We don't usually add a period to the subject line for kernel patches.
> (reason: we only have about
> 52 characters for the commit brief description so best not to waste
> any).
> 
> On Sat, Sep 30, 2017 at 01:30:34PM +0530, Shreeya Patel wrote:
> > 
> > This patch removes unnecessary comments which are there
> > to explain why call to memset is in comments. Both of the
> > comments are not needed as they are not very useful.
> You may like to read Documentation/process/submitting-patches.rst
> (specifically
> section 2) for tips on writing your git log.
> 
> Describe your changes in imperative mood, e.g. "make xyzzy do
> frotz"
> instead of "[This patch] makes xyzzy do frotz" or "[I]
> changed xyzzy
> to do frotz", as if you are giving orders to the codebase to
> change
> its behaviour.
> 
> Good luck,
> Tobin.

Hello,

Thanks for correcting me :)
I'll do the necessary changes and will send as v3.




Re: [PATCH v2 RESEND 2/2] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system

2017-09-30 Thread Baoquan He
Hi Mike,

On 09/28/17 at 07:10am, Mike Travis wrote:
> 
> 
> On 9/28/2017 2:01 AM, Ingo Molnar wrote:
> > 
> > > If on SGI UV system, the kaslr_regions[0].size_tb, namely the size of
> > > the direct mapping section, is incorrect.
> > > 
> > > Its direct mapping size includes two parts:
> > > #1 RAM size of system
> > > #2 MMIOH region size which only SGI UV system has.
> > > 
> > > However, the #2 can only be got till uv_system_init() is called in
> > > native_smp_prepare_cpus(). That is too late for mm KASLR calculation.
> > > That's why I made this hack.
> > > 
> > > I checked uv_system_init() code, seems not easy to know the size of
> > > MMIOH region before or inside kernel_randomize_memory(). I have CCed UV
> > > devel experts, not sure if they have any idea about this. Otherwise,
> > > this patch could be the only way I can think of.
> > > 
> > > Hi Mike and Russ,
> > > 
> > > Is there any chance we can get the size of MMIOH region before mm KASLR
> > > code, namely before we call kernel_randomize_memory()?
> 
> The sizes of the MMIOL and MMIOH areas are tied into the HUB design and how
> it is communicated to BIOS and the kernel.  This is via some of the config
> MMR's found in the HUB and it would be impossible to provide any access to
> these registers as they change with each new UV architecture.
> 
> The kernel does reserve the memory in the EFI memmap.  I can send you a
> console log of the full startup that includes the MMIOH reservations. Note
> that it is dependent on what I/O devices are actually present as UV does not
> map empty slots unless forced (because we'd quickly run out of resources.)
> Also, the EFI memmap entries do not specify the exact usage of the contained
> areas.

Does that mean we can get the size of MMIOH from the EFI entries? If yes,
please help provide a console log including those. If we can get the size
from EFI, it will be more acceptable.

Or I can ask Frank to loan his UV system to me; I'm not sure if he is
doing testing with it.

Thanks
Baoquan

> 
> > 
> > I don't mind system specific quirks to hardware enumeration details, as 
> > long as
> > they don't pollute generic code with such special hacks.
> > 
> > I.e. in this case it's wrong to allow kaslr_regions[0].size_tb to be wrong. 
> > Any
> > other code that relies on it in the future will be wrong as well on UV 
> > systems.
> 
> Which may come into play on other arches with the new upcoming memory
> technologies.
> > 
> > The right quirk would be to fix that up where it gets introduced, or 
> > something
> > like that.
> 
> Yes, does make sense.
> > 
> > Thanks,
> > 
> > Ingo
> > 


[PATCH V5 0/8] blk-mq: improve bio merge for none scheduler

2017-09-30 Thread Ming Lei
Hi,

Patches 1 ~ 2 use q->queue_depth as a hint for setting up
the scheduler queue depth.

Patches 3 ~ 8 improve bio merge via a hash table in the sw queue,
which makes bio merge more efficient than the current approach,
in which only the last 8 requests in the sw queue are checked.

This approach has also been used in the block legacy path for a
long time, and the blk-mq scheduler uses a hash table to do bio
merge too.
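
As a rough sketch of the idea (an assumed user-space simplification,
not the kernel hashing code in the later patches): requests are
indexed by the sector right after their end, so a new bio can find a
back-merge candidate with a single lookup.

#include <stdio.h>

#define NR_BUCKETS 64

struct req {
        unsigned long long pos;         /* start sector */
        unsigned int sectors;           /* length in sectors */
        struct req *hash_next;
};

static struct req *hash[NR_BUCKETS];

static unsigned int bucket(unsigned long long key)
{
        return key % NR_BUCKETS;
}

/* key a request by the sector right after it ends */
static void hash_add(struct req *rq)
{
        unsigned int b = bucket(rq->pos + rq->sectors);

        rq->hash_next = hash[b];
        hash[b] = rq;
}

/* find a request that ends exactly where the new bio starts */
static struct req *find_back_merge(unsigned long long bio_pos)
{
        struct req *rq;

        for (rq = hash[bucket(bio_pos)]; rq; rq = rq->hash_next)
                if (rq->pos + rq->sectors == bio_pos)
                        return rq;
        return NULL;
}

int main(void)
{
        struct req a = { .pos = 100, .sectors = 8 };

        hash_add(&a);
        printf("merge candidate %sfound\n", find_back_merge(108) ? "" : "not ");
        return 0;
}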

V5:
- splitted from previous patchset of 'blk-mq-sched: improve
SCSI-MQ performance V4'

Ming Lei (8):
  blk-mq-sched: introduce blk_mq_sched_queue_depth()
  blk-mq-sched: use q->queue_depth as hint for q->nr_requests
  block: introduce rqhash helpers
  block: move actual bio merge code into __elv_merge
  block: add check on elevator for supporting bio merge via hashtable
from blk-mq sw queue
  block: introduce .last_merge and .hash to blk_mq_ctx
  blk-mq-sched: refactor blk_mq_sched_try_merge()
  blk-mq: improve bio merge from blk-mq sw queue

 block/blk-mq-sched.c | 75 +++---
 block/blk-mq-sched.h | 23 +
 block/blk-mq.c   | 55 ---
 block/blk-mq.h   |  5 +++
 block/blk-settings.c |  2 ++
 block/blk.h  | 55 +++
 block/elevator.c | 93 +++-
 7 files changed, 216 insertions(+), 92 deletions(-)

-- 
2.9.5



[PATCH] vme: Fix integer overflow checking in vme_check_window()

2017-09-30 Thread Dan Carpenter
The controversial part of this patch is that I've changed it so we now
prevent integer overflows for VME_USER types and before we didn't.  I
view it as kernel-hardening.  I looked at a couple places that used
VME_USER types and they seemed pretty suspicious so I'm pretty sure
preventing overflows here is a good idea.

The most common problem this function guards against is cases like VME_A16,
where we don't put an upper bound on "size", so you could have "size" set
to U64_MAX and a valid vme_base would overflow the "vme_base + size"
back into the valid range.

In the VME_A64 case, the integer overflow checking doesn't work because
"U64_MAX + 1" has an integer overflow and it's just a complicated way of
saying zero.  That VME_A64 case is sort of interesting as well because
there is a VME_A64_MAX define which is set to "U64_MAX + 1".  The
compiler will never let anyone use it since it can't be stored in a u64
variable...  With my patch it's now limited to just U64_MAX.

Anyway, I put one integer overflow check at the start of the function
and deleted all existing checks.
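
A small stand-alone demo of the wrap-around test used here
(illustration only): for unsigned 64-bit math, a "base + size" that
wraps past U64_MAX always ends up smaller than "size", so one check at
the top of the function catches it.

#include <stdint.h>
#include <stdio.h>

static int overflows(uint64_t base, uint64_t size)
{
        return base + size < size;      /* true only if the sum wrapped */
}

int main(void)
{
        /* 0x1000 + U64_MAX wraps to 0xfff, back inside the valid range */
        printf("%d\n", overflows(0x1000, UINT64_MAX));  /* 1 */
        printf("%d\n", overflows(0x1000, 0x2000));      /* 0 */
        return 0;
}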

Signed-off-by: Dan Carpenter 

diff --git a/drivers/vme/vme.c b/drivers/vme/vme.c
index 6a3ead42aba8..5b4c898d7509 100644
--- a/drivers/vme/vme.c
+++ b/drivers/vme/vme.c
@@ -208,29 +208,27 @@ int vme_check_window(u32 aspace, unsigned long long 
vme_base,
 {
int retval = 0;
 
+   if (vme_base + size < size)
+   return -EINVAL;
+
switch (aspace) {
case VME_A16:
-   if (((vme_base + size) > VME_A16_MAX) ||
-   (vme_base > VME_A16_MAX))
+   if (vme_base + size > VME_A16_MAX)
retval = -EFAULT;
break;
case VME_A24:
-   if (((vme_base + size) > VME_A24_MAX) ||
-   (vme_base > VME_A24_MAX))
+   if (vme_base + size > VME_A24_MAX)
retval = -EFAULT;
break;
case VME_A32:
-   if (((vme_base + size) > VME_A32_MAX) ||
-   (vme_base > VME_A32_MAX))
+   if (vme_base + size > VME_A32_MAX)
retval = -EFAULT;
break;
case VME_A64:
-   if ((size != 0) && (vme_base > U64_MAX + 1 - size))
-   retval = -EFAULT;
+   /* The VME_A64_MAX limit is actually U64_MAX + 1 */
break;
case VME_CRCSR:
-   if (((vme_base + size) > VME_CRCSR_MAX) ||
-   (vme_base > VME_CRCSR_MAX))
+   if (vme_base + size > VME_CRCSR_MAX)
retval = -EFAULT;
break;
case VME_USER1:


Re: random insta-reboots on AMD Phenom II

2017-09-30 Thread Adam Borowski
On Sat, Sep 30, 2017 at 01:11:37PM +0200, Borislav Petkov wrote:
> On Sat, Sep 30, 2017 at 04:05:16AM +0200, Adam Borowski wrote:
> > Any hints how to debug this?
> 
> Do
> rdmsr -a 0xc0010015
> as root and paste it here.

110
110
110
110
110
110

on both 4.13.4 and 4.14-rc2+.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ We domesticated dogs 36000 years ago; together we chased
⣾⠁⢰⠒⠀⣿⡁ animals, hung out and licked or scratched our private parts.
⢿⡄⠘⠷⠚⠋⠀ Cats domesticated us 9500 years ago, and immediately we got
⠈⠳⣄ agriculture, towns then cities. -- whitroth on /.




[PATCH V5 2/8] blk-mq-sched: use q->queue_depth as hint for q->nr_requests

2017-09-30 Thread Ming Lei
SCSI sets q->queue_depth from shost->cmd_per_lun. q->queue_depth
is per request_queue and is more closely related to the scheduler
queue than the hw queue depth, which can be shared by queues,
such as with TAG_SHARED.

This patch uses q->queue_depth as a hint for computing
q->nr_requests, which should be more effective than the
current way.
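
A tiny stand-alone sketch of the resulting depth computation (an
assumed simplification that mirrors the helper changed below): prefer
q->queue_depth when it is set, double it, and never go below the
legacy default of 128 (BLKDEV_MAX_RQ).

#include <stdio.h>

#define BLKDEV_MAX_RQ 128

static unsigned int sched_queue_depth(unsigned int queue_depth,
                                      unsigned int tag_set_depth)
{
        unsigned int q_depth = queue_depth ? queue_depth : tag_set_depth;

        q_depth = 2 * (q_depth < BLKDEV_MAX_RQ ? q_depth : BLKDEV_MAX_RQ);
        return q_depth > BLKDEV_MAX_RQ ? q_depth : BLKDEV_MAX_RQ;
}

int main(void)
{
        /* e.g. cmd_per_lun = 3, host tag depth = 1024: 2 * 3 = 6 -> 128 */
        printf("%u\n", sched_queue_depth(3, 1024));
        /* no q->queue_depth set: fall back to the tag set depth, 124 -> 128 */
        printf("%u\n", sched_queue_depth(0, 62));
        return 0;
}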

Reviewed-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Signed-off-by: Ming Lei 
---
 block/blk-mq-sched.h | 18 +++---
 block/blk-mq.c   | 27 +--
 block/blk-mq.h   |  1 +
 block/blk-settings.c |  2 ++
 4 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 1d47f3fda1d0..906b10c54f78 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -99,12 +99,24 @@ static inline bool blk_mq_sched_needs_restart(struct 
blk_mq_hw_ctx *hctx)
 static inline unsigned blk_mq_sched_queue_depth(struct request_queue *q)
 {
/*
-* Default to double of smaller one between hw queue_depth and 128,
+* q->queue_depth is more close to scheduler queue, so use it
+* as hint for computing scheduler queue depth if it is valid
+*/
+   unsigned q_depth = q->queue_depth ?: q->tag_set->queue_depth;
+
+   /*
+* Default to double of smaller one between queue depth and 128,
 * since we don't split into sync/async like the old code did.
 * Additionally, this is a per-hw queue depth.
 */
-   return 2 * min_t(unsigned int, q->tag_set->queue_depth,
-  BLKDEV_MAX_RQ);
+   q_depth = 2 * min_t(unsigned int, q_depth, BLKDEV_MAX_RQ);
+
+   /*
+* when queue depth of driver is too small, we set queue depth
+* of scheduler queue as 128 which is the default setting of
+* block legacy code.
+*/
+   return max_t(unsigned, q_depth, BLKDEV_MAX_RQ);
 }
 
 #endif
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7cb3f87334c0..561a663cdd0e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2694,7 +2694,9 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 }
 EXPORT_SYMBOL(blk_mq_free_tag_set);
 
-int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
+static int __blk_mq_update_nr_requests(struct request_queue *q,
+  bool sched_only,
+  unsigned int nr)
 {
struct blk_mq_tag_set *set = q->tag_set;
struct blk_mq_hw_ctx *hctx;
@@ -2713,7 +2715,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, 
unsigned int nr)
 * If we're using an MQ scheduler, just update the scheduler
 * queue depth. This is similar to what the old code would do.
 */
-   if (!hctx->sched_tags) {
+   if (!sched_only && !hctx->sched_tags) {
ret = blk_mq_tag_update_depth(hctx, &hctx->tags,
min(nr, 
set->queue_depth),
false);
@@ -2733,6 +2735,27 @@ int blk_mq_update_nr_requests(struct request_queue *q, 
unsigned int nr)
return ret;
 }
 
+int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
+{
+   return __blk_mq_update_nr_requests(q, false, nr);
+}
+
+/*
+ * When drivers update q->queue_depth, this API is called so that
+ * we can use this queue depth as hint for adjusting scheduler
+ * queue depth.
+ */
+int blk_mq_update_sched_queue_depth(struct request_queue *q)
+{
+   unsigned nr;
+
+   if (!q->mq_ops || !q->elevator)
+   return 0;
+
+   nr = blk_mq_sched_queue_depth(q);
+   return __blk_mq_update_nr_requests(q, true, nr);
+}
+
 static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
int nr_hw_queues)
 {
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 915de58572e7..5bca6ce1f01d 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -37,6 +37,7 @@ bool blk_mq_get_driver_tag(struct request *rq, struct 
blk_mq_hw_ctx **hctx,
bool wait);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *start);
+int blk_mq_update_sched_queue_depth(struct request_queue *q);
 
 /*
  * Internal helpers for allocating/freeing the request map
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 8559e9563c52..c2db38d2ec2b 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -878,6 +878,8 @@ void blk_set_queue_depth(struct request_queue *q, unsigned 
int depth)
 {
q->queue_depth = depth;
wbt_set_queue_depth(q->rq_wb, depth);
+
+   WARN_ON(blk_mq_update_sched_queue_depth(q));
 }
 EXPORT_SYMBOL(blk_set_queue_depth);
 
-- 
2.9.5



[PATCH V5 1/8] blk-mq-sched: introduce blk_mq_sched_queue_depth()

2017-09-30 Thread Ming Lei
The following patch will use a hint to figure out the
default queue depth for the scheduler queue, so introduce
the helper blk_mq_sched_queue_depth() for this purpose.

Reviewed-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Signed-off-by: Ming Lei 
---
 block/blk-mq-sched.c |  8 +---
 block/blk-mq-sched.h | 11 +++
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c5eac1eee442..8c09959bc0d0 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -588,13 +588,7 @@ int blk_mq_init_sched(struct request_queue *q, struct 
elevator_type *e)
return 0;
}
 
-   /*
-* Default to double of smaller one between hw queue_depth and 128,
-* since we don't split into sync/async like the old code did.
-* Additionally, this is a per-hw queue depth.
-*/
-   q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
-  BLKDEV_MAX_RQ);
+   q->nr_requests = blk_mq_sched_queue_depth(q);
 
queue_for_each_hw_ctx(q, hctx, i) {
ret = blk_mq_sched_alloc_tags(q, hctx, i);
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 9267d0b7c197..1d47f3fda1d0 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -96,4 +96,15 @@ static inline bool blk_mq_sched_needs_restart(struct 
blk_mq_hw_ctx *hctx)
return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 }
 
+static inline unsigned blk_mq_sched_queue_depth(struct request_queue *q)
+{
+   /*
+* Default to double of smaller one between hw queue_depth and 128,
+* since we don't split into sync/async like the old code did.
+* Additionally, this is a per-hw queue depth.
+*/
+   return 2 * min_t(unsigned int, q->tag_set->queue_depth,
+  BLKDEV_MAX_RQ);
+}
+
 #endif
-- 
2.9.5



[PATCH V5 3/8] block: introduce rqhash helpers

2017-09-30 Thread Ming Lei
We need these helpers to support using a hash table to improve
bio merge from the sw queue in the following patches.

No functional change.

Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Signed-off-by: Ming Lei 
---
 block/blk.h  | 52 
 block/elevator.c | 36 +++-
 2 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index fcb9775b997d..eb3436d4a73f 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -146,6 +146,58 @@ static inline void blk_clear_rq_complete(struct request 
*rq)
  */
 #define ELV_ON_HASH(rq) ((rq)->rq_flags & RQF_HASHED)
 
+/*
+ * Merge hash stuff.
+ */
+#define rq_hash_key(rq)(blk_rq_pos(rq) + blk_rq_sectors(rq))
+
+#define bucket(head, key)  &((head)[hash_min((key), ELV_HASH_BITS)])
+
+static inline void __rqhash_del(struct request *rq)
+{
+   hash_del(&rq->hash);
+   rq->rq_flags &= ~RQF_HASHED;
+}
+
+static inline void rqhash_del(struct request *rq)
+{
+   if (ELV_ON_HASH(rq))
+   __rqhash_del(rq);
+}
+
+static inline void rqhash_add(struct hlist_head *hash, struct request *rq)
+{
+   BUG_ON(ELV_ON_HASH(rq));
+   hlist_add_head(&rq->hash, bucket(hash, rq_hash_key(rq)));
+   rq->rq_flags |= RQF_HASHED;
+}
+
+static inline void rqhash_reposition(struct hlist_head *hash, struct request 
*rq)
+{
+   __rqhash_del(rq);
+   rqhash_add(hash, rq);
+}
+
+static inline struct request *rqhash_find(struct hlist_head *hash, sector_t 
offset)
+{
+   struct hlist_node *next;
+   struct request *rq = NULL;
+
+   hlist_for_each_entry_safe(rq, next, bucket(hash, offset), hash) {
+   BUG_ON(!ELV_ON_HASH(rq));
+
+   if (unlikely(!rq_mergeable(rq))) {
+   __rqhash_del(rq);
+   continue;
+   }
+
+   if (rq_hash_key(rq) == offset)
+   return rq;
+   }
+
+   return NULL;
+}
+
 void blk_insert_flush(struct request *rq);
 
 static inline struct request *__elv_next_request(struct request_queue *q)
diff --git a/block/elevator.c b/block/elevator.c
index 153926a90901..824cc3e69ac3 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -47,11 +47,6 @@ static DEFINE_SPINLOCK(elv_list_lock);
 static LIST_HEAD(elv_list);
 
 /*
- * Merge hash stuff.
- */
-#define rq_hash_key(rq)(blk_rq_pos(rq) + blk_rq_sectors(rq))
-
-/*
  * Query io scheduler to see if the current process issuing bio may be
  * merged with rq.
  */
@@ -268,14 +263,12 @@ EXPORT_SYMBOL(elevator_exit);
 
 static inline void __elv_rqhash_del(struct request *rq)
 {
-   hash_del(&rq->hash);
-   rq->rq_flags &= ~RQF_HASHED;
+   __rqhash_del(rq);
 }
 
 void elv_rqhash_del(struct request_queue *q, struct request *rq)
 {
-   if (ELV_ON_HASH(rq))
-   __elv_rqhash_del(rq);
+   rqhash_del(rq);
 }
 EXPORT_SYMBOL_GPL(elv_rqhash_del);
 
@@ -283,37 +276,22 @@ void elv_rqhash_add(struct request_queue *q, struct 
request *rq)
 {
struct elevator_queue *e = q->elevator;
 
-   BUG_ON(ELV_ON_HASH(rq));
-   hash_add(e->hash, &rq->hash, rq_hash_key(rq));
-   rq->rq_flags |= RQF_HASHED;
+   rqhash_add(e->hash, rq);
 }
 EXPORT_SYMBOL_GPL(elv_rqhash_add);
 
 void elv_rqhash_reposition(struct request_queue *q, struct request *rq)
 {
-   __elv_rqhash_del(rq);
-   elv_rqhash_add(q, rq);
+   struct elevator_queue *e = q->elevator;
+
+   rqhash_reposition(e->hash, rq);
 }
 
 struct request *elv_rqhash_find(struct request_queue *q, sector_t offset)
 {
struct elevator_queue *e = q->elevator;
-   struct hlist_node *next;
-   struct request *rq;
-
-   hash_for_each_possible_safe(e->hash, rq, next, hash, offset) {
-   BUG_ON(!ELV_ON_HASH(rq));
 
-   if (unlikely(!rq_mergeable(rq))) {
-   __elv_rqhash_del(rq);
-   continue;
-   }
-
-   if (rq_hash_key(rq) == offset)
-   return rq;
-   }
-
-   return NULL;
+   return rqhash_find(e->hash, offset);
 }
 
 /*
-- 
2.9.5
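
To show what the rq_hash_key() convention above buys us, here is a rough
userspace analogue (plain C, hypothetical names, not kernel code): a request
is hashed under its end sector (start + number of sectors), so a bio can look
up a back-merge candidate directly by its own start sector.

#include <stdio.h>

#define NBUCKETS 64     /* stand-in for 1 << ELV_HASH_BITS */

struct fake_rq {
        unsigned long long pos;         /* start sector */
        unsigned int nr_sectors;
        struct fake_rq *next;           /* bucket chain */
};

static struct fake_rq *buckets[NBUCKETS];

/* same idea as rq_hash_key(): the key is the sector right after the request */
static unsigned long long hash_key(const struct fake_rq *rq)
{
        return rq->pos + rq->nr_sectors;
}

static void rq_add(struct fake_rq *rq)
{
        unsigned int b = hash_key(rq) % NBUCKETS;

        rq->next = buckets[b];
        buckets[b] = rq;
}

/* find a request ending exactly where the bio starts -> back-merge candidate */
static struct fake_rq *rq_find(unsigned long long bio_start)
{
        struct fake_rq *rq;

        for (rq = buckets[bio_start % NBUCKETS]; rq; rq = rq->next)
                if (hash_key(rq) == bio_start)
                        return rq;
        return NULL;
}

int main(void)
{
        struct fake_rq rq = { .pos = 2048, .nr_sectors = 8 };

        rq_add(&rq);
        printf("bio at 2056 back-merges: %s\n", rq_find(2056) ? "yes" : "no");
        printf("bio at 4096 back-merges: %s\n", rq_find(4096) ? "yes" : "no");
        return 0;
}

The kernel helper additionally drops non-mergeable requests from the bucket
while scanning, as rqhash_find() does above.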



[PATCH V5 5/8] block: add check on elevator for supporting bio merge via hashtable from blk-mq sw queue

2017-09-30 Thread Ming Lei
blk_mq_sched_try_merge() will be reused in the following patches
to support bio merge into the blk-mq sw queue, so add checks to
the related functions which are called from blk_mq_sched_try_merge().

Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Signed-off-by: Ming Lei 
---
 block/elevator.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/block/elevator.c b/block/elevator.c
index e11c7873fc21..2424aea85393 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -71,6 +71,10 @@ bool elv_bio_merge_ok(struct request *rq, struct bio *bio)
if (!blk_rq_merge_ok(rq, bio))
return false;
 
+   /* We need to support to merge bio from sw queue */
+   if (!rq->q->elevator)
+   return true;
+
if (!elv_iosched_allow_bio_merge(rq, bio))
return false;
 
@@ -449,6 +453,10 @@ static enum elv_merge __elv_merge(struct request_queue *q,
return ELEVATOR_BACK_MERGE;
}
 
+   /* no elevator when merging bio to blk-mq sw queue */
+   if (!e)
+   return ELEVATOR_NO_MERGE;
+
if (e->uses_mq && e->type->ops.mq.request_merge)
return e->type->ops.mq.request_merge(q, req, bio);
else if (!e->uses_mq && e->type->ops.sq.elevator_merge_fn)
@@ -711,6 +719,10 @@ struct request *elv_latter_request(struct request_queue 
*q, struct request *rq)
 {
struct elevator_queue *e = q->elevator;
 
+   /* no elevator when merging bio to blk-mq sw queue */
+   if (!e)
+   return NULL;
+
if (e->uses_mq && e->type->ops.mq.next_request)
return e->type->ops.mq.next_request(q, rq);
else if (!e->uses_mq && e->type->ops.sq.elevator_latter_req_fn)
@@ -723,6 +735,10 @@ struct request *elv_former_request(struct request_queue 
*q, struct request *rq)
 {
struct elevator_queue *e = q->elevator;
 
+   /* no elevator when merging bio to blk-mq sw queue */
+   if (!e)
+   return NULL;
+
if (e->uses_mq && e->type->ops.mq.former_request)
return e->type->ops.mq.former_request(q, rq);
if (!e->uses_mq && e->type->ops.sq.elevator_former_req_fn)
-- 
2.9.5



[PATCH V5 7/8] blk-mq-sched: refactor blk_mq_sched_try_merge()

2017-09-30 Thread Ming Lei
This patch introduces one function, __blk_mq_try_merge(),
which will be reused for bio merge into the sw queue in
the following patch.

No functional change.

Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Reviewed-by: Bart Van Assche 
Signed-off-by: Ming Lei 
---
 block/blk-mq-sched.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 8c09959bc0d0..a58f4746317c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -222,12 +222,11 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx 
*hctx)
}
 }
 
-bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-   struct request **merged_request)
+static bool __blk_mq_try_merge(struct request_queue *q,
+   struct bio *bio, struct request **merged_request,
+   struct request *rq, enum elv_merge type)
 {
-   struct request *rq;
-
-   switch (elv_merge(q, &rq, bio)) {
+   switch (type) {
case ELEVATOR_BACK_MERGE:
if (!blk_mq_sched_allow_merge(q, rq, bio))
return false;
@@ -250,6 +249,15 @@ bool blk_mq_sched_try_merge(struct request_queue *q, 
struct bio *bio,
return false;
}
 }
+
+bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
+   struct request **merged_request)
+{
+   struct request *rq;
+   enum elv_merge type = elv_merge(q, &rq, bio);
+
+   return __blk_mq_try_merge(q, bio, merged_request, rq, type);
+}
 EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
 
 /*
-- 
2.9.5



[PATCH V5 6/8] block: introduce .last_merge and .hash to blk_mq_ctx

2017-09-30 Thread Ming Lei
Prepare for supporting bio merge into the sw queue when no
blk-mq I/O scheduler is in use.

Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Signed-off-by: Ming Lei 
---
 block/blk-mq.h   |  4 
 block/blk.h  |  3 +++
 block/elevator.c | 22 +++---
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.h b/block/blk-mq.h
index 5bca6ce1f01d..85ea8615fecf 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -18,6 +18,10 @@ struct blk_mq_ctx {
unsigned long   rq_dispatched[2];
unsigned long   rq_merged;
 
+   /* bio merge via request hash table */
+   struct request  *last_merge;
+   DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
+
/* incremented at completion time */
unsigned long   cacheline_aligned_in_smp rq_completed[2];
 
diff --git a/block/blk.h b/block/blk.h
index eb3436d4a73f..fa4f232afc18 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -198,6 +198,9 @@ static inline struct request *rqhash_find(struct hlist_head 
*hash, sector_t offs
return NULL;
 }
 
+enum elv_merge elv_merge_ctx(struct request_queue *q, struct request **req,
+struct bio *bio, struct blk_mq_ctx *ctx);
+
 void blk_insert_flush(struct request *rq);
 
 static inline struct request *__elv_next_request(struct request_queue *q)
diff --git a/block/elevator.c b/block/elevator.c
index 2424aea85393..0e13e5c18982 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -471,6 +471,13 @@ enum elv_merge elv_merge(struct request_queue *q, struct 
request **req,
return __elv_merge(q, req, bio, q->elevator->hash, q->last_merge);
 }
 
+enum elv_merge elv_merge_ctx(struct request_queue *q, struct request **req,
+   struct bio *bio, struct blk_mq_ctx *ctx)
+{
+   WARN_ON_ONCE(!q->mq_ops);
+   return __elv_merge(q, req, bio, ctx->hash, ctx->last_merge);
+}
+
 /*
  * Attempt to do an insertion back merge. Only check for the case where
  * we can append 'rq' to an existing request, so we can throw 'rq' away
@@ -516,16 +523,25 @@ void elv_merged_request(struct request_queue *q, struct 
request *rq,
enum elv_merge type)
 {
struct elevator_queue *e = q->elevator;
+   struct hlist_head *hash = e->hash;
+
+   /* we do bio merge on blk-mq sw queue */
+   if (q->mq_ops && !e) {
+   rq->mq_ctx->last_merge = rq;
+   hash = rq->mq_ctx->hash;
+   goto reposition;
+   }
+
+   q->last_merge = rq;
 
if (e->uses_mq && e->type->ops.mq.request_merged)
e->type->ops.mq.request_merged(q, rq, type);
else if (!e->uses_mq && e->type->ops.sq.elevator_merged_fn)
e->type->ops.sq.elevator_merged_fn(q, rq, type);
 
+ reposition:
if (type == ELEVATOR_BACK_MERGE)
-   elv_rqhash_reposition(q, rq);
-
-   q->last_merge = rq;
+   rqhash_reposition(hash, rq);
 }
 
 void elv_merge_requests(struct request_queue *q, struct request *rq,
-- 
2.9.5



[PATCH V5 8/8] blk-mq: improve bio merge from blk-mq sw queue

2017-09-30 Thread Ming Lei
This patch uses a hash table to do bio merge from the sw queue,
so that we align with the way bio merge is done by blk-mq
schedulers and the legacy block path.

It turns out that bio merge via the hash table is more efficient
than the simple merge against the last 8 requests in the sw queue.
On SCSI SRP, ~10% higher IOPS is observed in a sequential I/O test
with this patch.

It is also one step towards a real 'none' scheduler, which lets
the blk-mq scheduler framework become cleaner.

Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Signed-off-by: Ming Lei 
---
 block/blk-mq-sched.c | 49 -
 block/blk-mq.c   | 28 +---
 2 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index a58f4746317c..8262ae71e0cd 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -260,50 +260,25 @@ bool blk_mq_sched_try_merge(struct request_queue *q, 
struct bio *bio,
 }
 EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
 
-/*
- * Reverse check our software queue for entries that we could potentially
- * merge with. Currently includes a hand-wavy stop count of 8, to not spend
- * too much time checking for merges.
- */
-static bool blk_mq_attempt_merge(struct request_queue *q,
+static bool blk_mq_ctx_try_merge(struct request_queue *q,
 struct blk_mq_ctx *ctx, struct bio *bio)
 {
-   struct request *rq;
-   int checked = 8;
+   struct request *rq, *free = NULL;
+   enum elv_merge type;
+   bool merged;
 
lockdep_assert_held(&ctx->lock);
 
-   list_for_each_entry_reverse(rq, &ctx->rq_list, queuelist) {
-   bool merged = false;
-
-   if (!checked--)
-   break;
-
-   if (!blk_rq_merge_ok(rq, bio))
-   continue;
+   type = elv_merge_ctx(q, &rq, bio, ctx);
+   merged = __blk_mq_try_merge(q, bio, &free, rq, type);
 
-   switch (blk_try_merge(rq, bio)) {
-   case ELEVATOR_BACK_MERGE:
-   if (blk_mq_sched_allow_merge(q, rq, bio))
-   merged = bio_attempt_back_merge(q, rq, bio);
-   break;
-   case ELEVATOR_FRONT_MERGE:
-   if (blk_mq_sched_allow_merge(q, rq, bio))
-   merged = bio_attempt_front_merge(q, rq, bio);
-   break;
-   case ELEVATOR_DISCARD_MERGE:
-   merged = bio_attempt_discard_merge(q, rq, bio);
-   break;
-   default:
-   continue;
-   }
+   if (free)
+   blk_mq_free_request(free);
 
-   if (merged)
-   ctx->rq_merged++;
-   return merged;
-   }
+   if (merged)
+   ctx->rq_merged++;
 
-   return false;
+   return merged;
 }
 
 bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
@@ -321,7 +296,7 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, 
struct bio *bio)
if (hctx->flags & BLK_MQ_F_SHOULD_MERGE) {
/* default per sw-queue merge */
spin_lock(&ctx->lock);
-   ret = blk_mq_attempt_merge(q, ctx, bio);
+   ret = blk_mq_ctx_try_merge(q, ctx, bio);
spin_unlock(&ctx->lock);
}
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 561a663cdd0e..9a3a561a63b5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -849,6 +849,18 @@ static void blk_mq_timeout_work(struct work_struct *work)
blk_queue_exit(q);
 }
 
+static void blk_mq_ctx_remove_rq_list(struct blk_mq_ctx *ctx,
+   struct list_head *head)
+{
+   struct request *rq;
+
+   lockdep_assert_held(&ctx->lock);
+
+   list_for_each_entry(rq, head, queuelist)
+   rqhash_del(rq);
+   ctx->last_merge = NULL;
+}
+
 struct flush_busy_ctx_data {
struct blk_mq_hw_ctx *hctx;
struct list_head *list;
@@ -863,6 +875,7 @@ static bool flush_busy_ctx(struct sbitmap *sb, unsigned int 
bitnr, void *data)
sbitmap_clear_bit(sb, bitnr);
spin_lock(&ctx->lock);
list_splice_tail_init(&ctx->rq_list, flush_data->list);
+   blk_mq_ctx_remove_rq_list(ctx, flush_data->list);
spin_unlock(&ctx->lock);
return true;
 }
@@ -892,17 +905,23 @@ static bool dispatch_rq_from_ctx(struct sbitmap *sb, 
unsigned int bitnr, void *d
struct dispatch_rq_data *dispatch_data = data;
struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+   struct request *rq = NULL;
 
spin_lock(&ctx->lock);
if (unlikely(!list_empty(&ctx->rq_list))) {
-   dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
-   list_del_init(&dispatch_data->rq->queuelist);
+   rq = list_entry_rq(ctx->rq_list.ne

[PATCH V5 4/8] block: move actual bio merge code into __elv_merge

2017-09-30 Thread Ming Lei
This allows us to reuse __elv_merge() to merge bios
into requests from the sw queue in the following patches.

No functional change.

Tested-by: Oleksandr Natalenko 
Tested-by: Tom Nguyen 
Tested-by: Paolo Valente 
Signed-off-by: Ming Lei 
---
 block/elevator.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 824cc3e69ac3..e11c7873fc21 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -409,8 +409,9 @@ void elv_dispatch_add_tail(struct request_queue *q, struct 
request *rq)
 }
 EXPORT_SYMBOL(elv_dispatch_add_tail);
 
-enum elv_merge elv_merge(struct request_queue *q, struct request **req,
-   struct bio *bio)
+static enum elv_merge __elv_merge(struct request_queue *q,
+   struct request **req, struct bio *bio,
+   struct hlist_head *hash, struct request *last_merge)
 {
struct elevator_queue *e = q->elevator;
struct request *__rq;
@@ -427,11 +428,11 @@ enum elv_merge elv_merge(struct request_queue *q, struct 
request **req,
/*
 * First try one-hit cache.
 */
-   if (q->last_merge && elv_bio_merge_ok(q->last_merge, bio)) {
-   enum elv_merge ret = blk_try_merge(q->last_merge, bio);
+   if (last_merge && elv_bio_merge_ok(last_merge, bio)) {
+   enum elv_merge ret = blk_try_merge(last_merge, bio);
 
if (ret != ELEVATOR_NO_MERGE) {
-   *req = q->last_merge;
+   *req = last_merge;
return ret;
}
}
@@ -442,7 +443,7 @@ enum elv_merge elv_merge(struct request_queue *q, struct 
request **req,
/*
 * See if our hash lookup can find a potential backmerge.
 */
-   __rq = elv_rqhash_find(q, bio->bi_iter.bi_sector);
+   __rq = rqhash_find(hash, bio->bi_iter.bi_sector);
if (__rq && elv_bio_merge_ok(__rq, bio)) {
*req = __rq;
return ELEVATOR_BACK_MERGE;
@@ -456,6 +457,12 @@ enum elv_merge elv_merge(struct request_queue *q, struct 
request **req,
return ELEVATOR_NO_MERGE;
 }
 
+enum elv_merge elv_merge(struct request_queue *q, struct request **req,
+   struct bio *bio)
+{
+   return __elv_merge(q, req, bio, q->elevator->hash, q->last_merge);
+}
+
 /*
  * Attempt to do an insertion back merge. Only check for the case where
  * we can append 'rq' to an existing request, so we can throw 'rq' away
-- 
2.9.5
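
A compact sketch of the decision the one-hit last_merge cache relies on
(plain C, simplified from blk_try_merge(); discard merging and all the
mergeability checks are left out): the cached request back-merges a bio that
starts right after it and front-merges one that ends right before it; only
on a miss does __elv_merge() fall back to the hash lookup.

#include <stdio.h>

enum merge { NO_MERGE, FRONT_MERGE, BACK_MERGE };

struct fake_rq {
        unsigned long long pos;
        unsigned int nr_sectors;
};

/* roughly what blk_try_merge() decides for a cached candidate request */
static enum merge try_merge(const struct fake_rq *rq,
                            unsigned long long bio_pos, unsigned int bio_sectors)
{
        if (rq->pos + rq->nr_sectors == bio_pos)
                return BACK_MERGE;      /* bio starts where rq ends */
        if (bio_pos + bio_sectors == rq->pos)
                return FRONT_MERGE;     /* bio ends where rq starts */
        return NO_MERGE;
}

int main(void)
{
        struct fake_rq last_merge = { .pos = 1024, .nr_sectors = 8 };

        /* one-hit cache first, as in __elv_merge(); hash lookup only on a miss */
        printf("%d %d %d\n",
               try_merge(&last_merge, 1032, 8),         /* BACK_MERGE  */
               try_merge(&last_merge, 1016, 8),         /* FRONT_MERGE */
               try_merge(&last_merge, 4096, 8));        /* NO_MERGE    */
        return 0;
}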



[PATCH 2/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE

2017-09-30 Thread Ming Lei
If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun
the queue in three situations:

1) if BLK_MQ_S_SCHED_RESTART is set
- queue is rerun after one rq is completed, see blk_mq_sched_restart()
which is run from blk_mq_free_request()

2) BLK_MQ_S_TAG_WAITING is set
- queue is rerun after one tag is freed

3) otherwise
- queue is run immediately in blk_mq_dispatch_rq_list()

This random delay of running the hw queue was introduced by commit
6077c2d706097c0 ("dm rq: Avoid that request processing stalls sporadically"),
which claimed to fix one request processing stall but never
explained the idea behind it; it is a workaround at most.
Even in the recent discussion the question wasn't explained by anyone.

Also, calling blk_mq_delay_run_hw_queue() inside .queue_rq() is
a horrible hack because it keeps BLK_MQ_S_SCHED_RESTART from
working, and will degrade I/O performance a lot.

Finally, this patch makes sure that dm-rq returns
BLK_STS_RESOURCE to blk-mq only when the underlying queue is
out of resources, so we switch to returning DM_MAPIO_DELAY_REQUEUE
if either MPATHF_QUEUE_IO or MPATHF_PG_INIT_REQUIRED is set in
multipath_clone_and_map().

Signed-off-by: Ming Lei 
---
 drivers/md/dm-mpath.c | 4 +---
 drivers/md/dm-rq.c| 1 -
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index e8094d8fbe0d..97e4bd100fa1 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -484,9 +484,7 @@ static int multipath_clone_and_map(struct dm_target *ti, 
struct request *rq,
return DM_MAPIO_KILL;
} else if (test_bit(MPATHF_QUEUE_IO, &m->flags) ||
   test_bit(MPATHF_PG_INIT_REQUIRED, &m->flags)) {
-   if (pg_init_all_paths(m))
-   return DM_MAPIO_DELAY_REQUEUE;
-   return DM_MAPIO_REQUEUE;
+   return DM_MAPIO_DELAY_REQUEUE;
}
 
memset(mpio, 0, sizeof(*mpio));
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index f5e2b6967357..46f012185b43 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -758,7 +758,6 @@ static blk_status_t dm_mq_queue_rq(struct blk_mq_hw_ctx 
*hctx,
/* Undo dm_start_request() before requeuing */
rq_end_stats(md, rq);
rq_completed(md, rq_data_dir(rq), false);
-   blk_mq_delay_run_hw_queue(hctx, 100/*ms*/);
return BLK_STS_RESOURCE;
}
 
-- 
2.9.5
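
To make the intended end result easier to follow, here is a rough userspace
model (plain C; the enum values are stand-ins, not the kernel's) of what
dm_mq_queue_rq() reports back to blk-mq after this change: a plain
DM_MAPIO_REQUEUE becomes BLK_STS_RESOURCE with no delayed queue run, so
situation 1) above (BLK_MQ_S_SCHED_RESTART) is what reruns the queue.

#include <stdio.h>

/* stand-ins for the kernel enums involved; values are illustrative only */
enum map_ret { DM_MAPIO_SUBMITTED, DM_MAPIO_REMAPPED, DM_MAPIO_REQUEUE,
               DM_MAPIO_DELAY_REQUEUE, DM_MAPIO_KILL };
enum blk_ret { BLK_STS_OK, BLK_STS_RESOURCE, BLK_STS_IOERR };

/*
 * Rough model of the status handed back to blk-mq: REQUEUE maps to
 * BLK_STS_RESOURCE without blk_mq_delay_run_hw_queue(), so the rerun
 * is driven by SCHED_RESTART on the next request completion.
 */
static enum blk_ret queue_rq(enum map_ret r)
{
        switch (r) {
        case DM_MAPIO_SUBMITTED:
        case DM_MAPIO_REMAPPED:
                return BLK_STS_OK;
        case DM_MAPIO_REQUEUE:
                return BLK_STS_RESOURCE;   /* no delayed queue run any more */
        case DM_MAPIO_DELAY_REQUEUE:
                return BLK_STS_OK;         /* dm-rq requeues internally, later */
        default:
                return BLK_STS_IOERR;
        }
}

int main(void)
{
        printf("REQUEUE -> %d (BLK_STS_RESOURCE)\n", queue_rq(DM_MAPIO_REQUEUE));
        return 0;
}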



[PATCH 0/5] dm-rq: improve sequential I/O performance

2017-09-30 Thread Ming Lei
Hi,

The 1st patch removes a log message which can be triggered
very easily.

The 2nd patch removes the blk_mq_delay_run_hw_queue() workaround
in the requeue case; it isn't necessary and, worse, it keeps
BLK_MQ_S_SCHED_RESTART from working and degrades I/O performance.

The 3rd patch returns DM_MAPIO_REQUEUE to dm-rq if the underlying
request allocation fails, so that we can return BLK_STS_RESOURCE
from dm-rq to blk-mq and blk-mq can hold back the requests to be
dequeued.

The 4th patch is a preparation for the 5th one: even when the
underlying request allocation succeeds, its queue may be busy,
and we can now get this feedback from blk_insert_cloned_request().
This patch tries to cache the allocated request so that it may be
reused in the next dispatch to the underlying queue.

The 5th patch improves sequential I/O performance by returning
STS_RESOURCE if the underlying queue is busy.

In the commit log of the 5th patch, IOPS data is provided and
we can see sequential I/O performance is improved a lot with this
patchset.

This patchset depends on the following two patchset:

[1] [PATCH V5 0/7] blk-mq-sched: improve sequential I/O performance(part 1)

https://marc.info/?l=linux-block&m=150676854821077&w=2

[2] [PATCH V5 0/8] blk-mq: improve bio merge for none scheduler

https://marc.info/?l=linux-block&m=150677085521416&w=2

Any comments are welcome! 

Thanks,
Ming

Ming Lei (5):
  dm-mpath: remove annoying message of 'blk_get_request() returned -11'
  dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of
BLK_STS_RESOURCE
  dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
  dm-mpath: cache ti->clone during requeue
  dm-rq: improve I/O merge by dealing with underlying STS_RESOURCE

 block/blk-mq.c| 17 +---
 drivers/md/dm-mpath.c | 51 ++
 drivers/md/dm-rq.c| 56 +--
 3 files changed, 80 insertions(+), 44 deletions(-)

-- 
2.9.5



[PATCH 1/5] dm-mpath: remove annoying message of 'blk_get_request() returned -11'

2017-09-30 Thread Ming Lei
It is very normal to see an allocation failure here, so there is
no need to dump it and annoy people.

Signed-off-by: Ming Lei 
---
 drivers/md/dm-mpath.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 11f273d2f018..e8094d8fbe0d 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -499,8 +499,6 @@ static int multipath_clone_and_map(struct dm_target *ti, 
struct request *rq,
if (IS_ERR(clone)) {
/* EBUSY, ENODEV or EWOULDBLOCK: requeue */
bool queue_dying = blk_queue_dying(q);
-   DMERR_LIMIT("blk_get_request() returned %ld%s - requeuing",
-   PTR_ERR(clone), queue_dying ? " (path offline)" : 
"");
if (queue_dying) {
atomic_inc(&m->pg_init_in_progress);
activate_or_offline_path(pgpath);
-- 
2.9.5



[PATCH 3/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure

2017-09-30 Thread Ming Lei
blk-mq will rerun the queue via RESTART after one request completes,
so there is no need to wait a random time before requeuing; we should
trust blk-mq to do it.

More importantly, we need to return BLK_STS_RESOURCE to blk-mq
so that dequeuing from the I/O scheduler can be stopped, which
improves I/O merging.

Signed-off-by: Ming Lei 
---
 drivers/md/dm-mpath.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 97e4bd100fa1..9ee223170ee9 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -500,8 +500,20 @@ static int multipath_clone_and_map(struct dm_target *ti, 
struct request *rq,
if (queue_dying) {
atomic_inc(&m->pg_init_in_progress);
activate_or_offline_path(pgpath);
+   return DM_MAPIO_DELAY_REQUEUE;
}
-   return DM_MAPIO_DELAY_REQUEUE;
+
+   /*
+* blk-mq's SCHED_RESTART can cover this requeue, so
+* we needn't to deal with it by DELAY_REQUEUE. More
+* importantly, we have to return DM_MAPIO_REQUEUE
+* so that blk-mq can get the queue busy feedback,
+* otherwise I/O merge can be hurt.
+*/
+   if (q->mq_ops)
+   return DM_MAPIO_REQUEUE;
+   else
+   return DM_MAPIO_DELAY_REQUEUE;
}
clone->bio = clone->biotail = NULL;
clone->rq_disk = bdev->bd_disk;
-- 
2.9.5



[PATCH 5/5] dm-rq: improve I/O merge by dealing with underlying STS_RESOURCE

2017-09-30 Thread Ming Lei
If the underlying queue returns BLK_STS_RESOURCE, we let dm-rq
handle the requeue instead of blk-mq, so I/O merge can be
improved because the underlying queue's out-of-resource condition
can now be perceived and handled by dm-rq.

Below are IOPS results for mpath on lpfc, measured with fio
(libaio, bs=4k, direct I/O, queue_depth=64, 8 jobs).

1) blk-mq none scheduler
 IOPS(K)  | v4.14-rc2 | v4.14-rc2 with [1][2] | v4.14-rc2 with [1][2][3]
----------+-----------+-----------------------+-------------------------
read      |   53.69   |         40.26         |          94.61
randread  |   24.64   |         30.08         |          35.57
write     |   39.55   |         41.51         |         216.84
randwrite |   33.97   |         34.27         |          33.98

2) blk-mq mq-deadline scheduler
 IOPS(K)  |  v4.14-rc2  | v4.14-rc2 with [1][2] | v4.14-rc2 with [1][2][3]
          | MQ-DEADLINE |      MQ-DEADLINE      |       MQ-DEADLINE
----------+-------------+-----------------------+-------------------------
read      |    23.81    |         21.91         |          89.94
randread  |    38.47    |         38.96         |          38.02
write     |    39.52    |         40.2          |         225.75
randwrite |    34.8     |         33.73         |          33.44

[1] [PATCH V5 0/7] blk-mq-sched: improve sequential I/O performance(part 1)

https://marc.info/?l=linux-block&m=150676854821077&w=2

[2] [PATCH V5 0/8] blk-mq: improve bio merge for none scheduler

https://marc.info/?l=linux-block&m=150677085521416&w=2

[3] this patchset

Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 17 +
 drivers/md/dm-rq.c | 14 --
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9a3a561a63b5..58d2268f9733 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1467,17 +1467,6 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, 
struct request *rq,
blk_mq_hctx_mark_pending(hctx, ctx);
 }
 
-static void blk_mq_request_direct_insert(struct blk_mq_hw_ctx *hctx,
-struct request *rq)
-{
-   spin_lock(&hctx->lock);
-   list_add_tail(&rq->queuelist, &hctx->dispatch);
-   set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
-   spin_unlock(&hctx->lock);
-
-   blk_mq_run_hw_queue(hctx, false);
-}
-
 /*
  * Should only be used carefully, when the caller knows we want to
  * bypass a potential IO scheduler on the target device.
@@ -1487,12 +1476,8 @@ blk_status_t blk_mq_request_bypass_insert(struct request 
*rq)
struct blk_mq_ctx *ctx = rq->mq_ctx;
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
blk_qc_t cookie;
-   blk_status_t ret;
 
-   ret = blk_mq_try_issue_directly(hctx, rq, &cookie, true);
-   if (ret == BLK_STS_RESOURCE)
-   blk_mq_request_direct_insert(hctx, rq);
-   return ret;
+   return blk_mq_try_issue_directly(hctx, rq, &cookie, true);
 }
 
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 2ef524bddd38..feb49c4d6fa2 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -405,7 +405,7 @@ static void end_clone_request(struct request *clone, 
blk_status_t error)
dm_complete_request(tio->orig, error);
 }
 
-static void dm_dispatch_clone_request(struct request *clone, struct request 
*rq)
+static blk_status_t dm_dispatch_clone_request(struct request *clone, struct 
request *rq)
 {
blk_status_t r;
 
@@ -417,6 +417,7 @@ static void dm_dispatch_clone_request(struct request 
*clone, struct request *rq)
if (r != BLK_STS_OK && r != BLK_STS_RESOURCE)
/* must complete clone in terms of original request */
dm_complete_request(rq, r);
+   return r;
 }
 
 static int dm_rq_bio_constructor(struct bio *bio, struct bio *bio_orig,
@@ -490,8 +491,10 @@ static int map_request(struct dm_rq_target_io *tio)
struct request *rq = tio->orig;
struct request *cache = tio->clone;
struct request *clone = cache;
+   blk_status_t ret;
 
r = ti->type->clone_and_map_rq(ti, rq, &tio->info, &clone);
+ again:
switch (r) {
case DM_MAPIO_SUBMITTED:
/* The target has taken the I/O to submit by itself later */
@@ -509,7 +512,14 @@ static int map_request(struct dm_rq_target_io *tio)
/* The target has remapped the I/O so dispatch it */
   

[PATCH 4/5] dm-mpath: cache ti->clone during requeue

2017-09-30 Thread Ming Lei
During requeue, the block layer won't change the request any
more (for example, no further merging), so we can cache ti->clone
and let .clone_and_map_rq check whether the cached clone can be hit.

Signed-off-by: Ming Lei 
---
 drivers/md/dm-mpath.c | 31 ---
 drivers/md/dm-rq.c| 41 +
 2 files changed, 53 insertions(+), 19 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 9ee223170ee9..52e4730541fd 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -457,6 +457,11 @@ do {   
\
 dm_noflush_suspending((m)->ti));   \
 } while (0)
 
+static void multipath_release_clone(struct request *clone)
+{
+   blk_put_request(clone);
+}
+
 /*
  * Map cloned requests (request-based multipath)
  */
@@ -470,7 +475,7 @@ static int multipath_clone_and_map(struct dm_target *ti, 
struct request *rq,
struct block_device *bdev;
struct dm_mpath_io *mpio = get_mpio(map_context);
struct request_queue *q;
-   struct request *clone;
+   struct request *clone = *__clone;
 
/* Do we need to select a new pgpath? */
pgpath = lockless_dereference(m->current_pgpath);
@@ -493,7 +498,23 @@ static int multipath_clone_and_map(struct dm_target *ti, 
struct request *rq,
 
bdev = pgpath->path.dev->bdev;
q = bdev_get_queue(bdev);
-   clone = blk_get_request(q, rq->cmd_flags | REQ_NOMERGE, GFP_ATOMIC);
+
+   /*
+* This request may be from requeue path, and its clone
+* may have been allocated before. We need to check
+* if the cached clone can be hit.
+*/
+   if (clone) {
+   if (clone->q != q) {
+   blk_rq_unprep_clone(clone);
+   multipath_release_clone(clone);
+   clone = NULL;
+   } else
+   goto start_io;
+   }
+
+   if (!clone)
+   clone = blk_get_request(q, rq->cmd_flags | REQ_NOMERGE, 
GFP_ATOMIC);
if (IS_ERR(clone)) {
/* EBUSY, ENODEV or EWOULDBLOCK: requeue */
bool queue_dying = blk_queue_dying(q);
@@ -520,6 +541,7 @@ static int multipath_clone_and_map(struct dm_target *ti, 
struct request *rq,
clone->cmd_flags |= REQ_FAILFAST_TRANSPORT;
*__clone = clone;
 
+ start_io:
if (pgpath->pg->ps.type->start_io)
pgpath->pg->ps.type->start_io(&pgpath->pg->ps,
  &pgpath->path,
@@ -527,11 +549,6 @@ static int multipath_clone_and_map(struct dm_target *ti, 
struct request *rq,
return DM_MAPIO_REMAPPED;
 }
 
-static void multipath_release_clone(struct request *clone)
-{
-   blk_put_request(clone);
-}
-
 /*
  * Map cloned bios (bio-based multipath)
  */
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 46f012185b43..2ef524bddd38 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -221,6 +221,12 @@ static void dm_end_request(struct request *clone, 
blk_status_t error)
blk_rq_unprep_clone(clone);
tio->ti->type->release_clone_rq(clone);
 
+   /*
+* We move the clearing from tio_init in .queue_rq to here because
+* tio->clone may be cached during requeue
+*/
+   tio->clone = NULL;
+
rq_end_stats(md, rq);
if (!rq->q->mq_ops)
blk_end_request_all(rq, error);
@@ -267,11 +273,15 @@ static void dm_requeue_original_request(struct 
dm_rq_target_io *tio, bool delay_
int rw = rq_data_dir(rq);
unsigned long delay_ms = delay_requeue ? 100 : 0;
 
+   /*
+* This request won't be changed any more during requeue,
+* so we cache tio->clone and let .clone_and_map_rq decide
+* to use the cached clone or allocate a new clone, and
+* the cached clone has to be freed before allocating a
+* new one.
+*/
+
rq_end_stats(md, rq);
-   if (tio->clone) {
-   blk_rq_unprep_clone(tio->clone);
-   tio->ti->type->release_clone_rq(tio->clone);
-   }
 
if (!rq->q->mq_ops)
dm_old_requeue_request(rq, delay_ms);
@@ -448,7 +458,6 @@ static void init_tio(struct dm_rq_target_io *tio, struct 
request *rq,
 {
tio->md = md;
tio->ti = NULL;
-   tio->clone = NULL;
tio->orig = rq;
tio->error = 0;
tio->completed = 0;
@@ -456,8 +465,12 @@ static void init_tio(struct dm_rq_target_io *tio, struct 
request *rq,
 * Avoid initializing info for blk-mq; it passes
 * target-specific data through info.ptr
 * (see: dm_mq_init_request)
+*
+* If tio->clone is cached during requeue, we don't
+* clear tio->info, and delay the initialization
+* to .clone_and_map_rq if the cache isn't hit.
 */
-   if (!md->init_tio_pdu)
+   if (!md->init_t

Re: random insta-reboots on AMD Phenom II

2017-09-30 Thread Borislav Petkov
On Sat, Sep 30, 2017 at 01:29:03PM +0200, Adam Borowski wrote:
> On Sat, Sep 30, 2017 at 01:11:37PM +0200, Borislav Petkov wrote:
> > On Sat, Sep 30, 2017 at 04:05:16AM +0200, Adam Borowski wrote:
> > > Any hints how to debug this?
> > 
> > Do
> > rdmsr -a 0xc0010015
> > as root and paste it here.
> 
> 110
> 110
> 110
> 110
> 110
> 110
> 
> on both 4.13.4 and 4.14-rc2+.

Boot into -rc2+ and do as root:

# wrmsr -a 0xc0010015 0x118

If the issue gets fixed then Mr. Luto better revert the new lazy TLB
flushing fun'n'games for 4.14 before it is too late and that kernel
releases b0rked.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


[PATCH] x86/CPU/AMD, mm: Extend with mem_encrypt=sme option

2017-09-30 Thread Borislav Petkov
On Fri, Sep 29, 2017 at 06:06:52PM -0500, Brijesh Singh wrote:
> The mem_encrypt=on activates both SME and SEV. Add a new argument to disable
> the SEV and allow SME. The argument can be useful when SEV has issues and
> we want to disable it.
> 
> early_detect_mem_encrypt() [cpu/amd.com] will need to know the state of
> the mem_encrypt= argument. Since early_detect_mem_encrypt() is not defined
> as __init hence we are not able to use the 'boot_command_line' variable to
> parse the cmdline argument. We introduce a new function me_cmdline_state()
> to get the cmdline state from mem_encrypt.c.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Paolo Bonzini 
> Cc: "Radim Krčmář" 
> Cc: Borislav Petkov 
> Cc: k...@vger.kernel.org
> Cc: x...@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---

Ok, I went and simplified this whole code path a bit because it was
needlessly a bit too complex. Below is the result, only compile-tested.

Brijesh, Tom, guys, please check my logic, I might've missed a case.

Thanks.

---
From: Borislav Petkov 
Date: Sat, 30 Sep 2017 13:33:26 +0200
Subject: [PATCH] x86/CPU/AMD, mm: Extend with mem_encrypt=sme option

Extend the mem_encrypt= cmdline option with the "sme" argument so that
one can enable SME only (i.e., this serves as a SEV chicken bit). While
at it, streamline and document the flow logic here:

1. Check whether the SME CPUID leaf is present

2. Check whether the HW has enabled SME/SEV

3. Only *then* look at any potential command line params because doing
so before is pointless.

3.1 mem_encrypt=on  - enable both SME/SEV
3.2 mem_encrypt=sme - enable only SME
3.3 mem_encrypt=off - disable both

In addition, CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT enables both if
the kernel is built with it enabled.
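
A tiny userspace sketch of the 3.1-3.3 mapping (illustrative only; it
ignores the Kconfig default above and treats unknown values like "off"):

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* maps the mem_encrypt= value to the (SME, SEV) pair described above */
static void parse_mem_encrypt(const char *arg, bool *sme, bool *sev)
{
        if (!strcmp(arg, "on")) {
                *sme = true;  *sev = true;
        } else if (!strcmp(arg, "sme")) {
                *sme = true;  *sev = false;     /* the new SEV chicken bit */
        } else {                                /* "off", or unknown in this sketch */
                *sme = false; *sev = false;
        }
}

int main(void)
{
        const char *vals[] = { "on", "sme", "off" };
        bool sme, sev;

        for (int i = 0; i < 3; i++) {
                parse_mem_encrypt(vals[i], &sme, &sev);
                printf("mem_encrypt=%-3s -> SME=%d SEV=%d\n", vals[i], sme, sev);
        }
        return 0;
}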

While at it, shorten variable names, simplify code flow.

This is based on a patch by Brijesh Singh .

Signed-off-by: Borislav Petkov 
Cc: Brijesh Singh 
Cc: Tom Lendacky 
Cc: k...@vger.kernel.org
Cc: x...@kernel.org
---
 arch/x86/include/asm/mem_encrypt.h |  2 +
 arch/x86/kernel/cpu/amd.c  |  6 +++
 arch/x86/mm/mem_encrypt.c  | 82 +++---
 3 files changed, 49 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 3ba68c92be1b..175310f00202 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -19,6 +19,8 @@
 
 #include 
 
+extern bool sev_enabled;
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 extern u64 sme_me_mask;
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c1234aa0550c..d0669f3966a6 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 # include 
@@ -32,6 +33,8 @@ static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, 
const int *erratum);
  */
 static u32 nodes_per_socket = 1;
 
+bool sev_enabled __section(.data) = false;
+
 static inline int rdmsrl_amd_safe(unsigned msr, unsigned long long *p)
 {
u32 gprs[8] = { 0 };
@@ -588,6 +591,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
if (IS_ENABLED(CONFIG_X86_32))
goto clear_all;
 
+   if (!sev_enabled)
+   goto clear_sev;
+
rdmsrl(MSR_K7_HWCR, msr);
if (!(msr & MSR_K7_HWCR_SMMLOCK))
goto clear_sev;
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 057417a3d9b4..9b83bc1be7c0 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -27,12 +27,14 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mm_internal.h"
 
-static char sme_cmdline_arg[] __initdata = "mem_encrypt";
-static char sme_cmdline_on[]  __initdata = "on";
-static char sme_cmdline_off[] __initdata = "off";
+static char sme_cmd[] __initdata = "mem_encrypt";
+static char sme_cmd_on[]  __initdata = "on";
+static char sme_cmd_off[] __initdata = "off";
+static char sme_cmd_sme[] __initdata = "sme";
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -44,8 +46,6 @@ EXPORT_SYMBOL_GPL(sme_me_mask);
 DEFINE_STATIC_KEY_FALSE(__sev);
 EXPORT_SYMBOL_GPL(__sev);
 
-static bool sev_enabled __section(.data) = false;
-
 /* Buffer used for early in-place encryption by BSP, no locking needed */
 static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
 
@@ -768,13 +768,13 @@ void __init sme_encrypt_kernel(void)
 
 void __init __nostackprotector sme_enable(struct boot_params *bp)
 {
-   const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
+   const char *cmdline_ptr, *cmd, *cmd_on, *cmd_off, *cmd_sme;
unsigned int eax, ebx, ecx, edx;
unsigned long feature_mask;
-   bool active_by_default;
-   unsigned long me_mask;
+   u64 me_mask, msr;
char b

Re: [RESEND RFC PATCH 0/7] sun8i H3 HDMI glue driver for DW HDMI

2017-09-30 Thread Alexey Kardashevskiy
On 21/09/17 06:01, Jernej Skrabec wrote:
> [added media mailing list due to CEC question]
> 
> This patch series adds a HDMI glue driver for Allwinner H3 SoC. For now, only
> video and CEC functionality is supported. Audio needs more tweaks.
> 
> Series is based on the H3 DE2 patch series available on mailing list:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2017-August/522697.html
> (ignore patches marked with [NOT FOR REVIEW NOW] tag)
> 
> Patch 1 adds support for polling plug detection since custom PHY used here
> doesn't support HPD interrupt.
> 
> Patch 2 enables overflow workaround for v1.32a. This HDMI controller exhibits
> same issues as HDMI controller used in iMX6 SoCs.
> 
> Patch 3 adds CLK_SET_RATE_PARENT to hdmi clock.
> 
> Patch 4 adds dt bindings documentation.
> 
> Patch 5 adds actual H3 HDMI glue driver.
> 
> Patch 6 and 7 add HDMI node to DT and enable it where needed.
> 
> Allwinner used DW HDMI controller in a non standard way:
> - register offsets obfuscation layer, which can fortunately be turned off
> - register read lock, which has to be disabled by magic number
> - custom PHY, which have to be initialized before DW HDMI controller
> - non standard clocks
> - no HPD interrupt
> 
> Because of that, I have two questions:
> - Since HPD have to be polled, is it enough just to enable poll mode? I'm
>   mainly concerned about invalidating CEC address here.
> - PHY has to be initialized before DW HDMI controller to disable offset
>   obfuscation and read lock among other things. This means that all clocks 
> have
>   to be enabled in glue driver. This poses a problem, since when using
>   component model, dw-hdmi bridge uses drvdata for it's own private data and
>   prevents glue layer to pass a pointer to unbind function, where clocks 
> should
>   be disabled. I noticed same issue in meson DW HDMI glue driver, where clocks
>   are also not disabled when unbind callback is called. I noticed that when H3
>   SoC is shutdown, HDMI output is still enabled and lastest image is shown on
>   monitor until it is unplugged from power supply. Is there any simple 
> solution
>   to this?
> 
> Chen-Yu,
> TL Lim was unable to obtain any answer from Allwinner about HDMI clocks. I 
> think
> it is safe to assume that divider in HDMI clock doesn't have any effect.
> 
> Branch based on linux-next from 1. September with integrated patches is
> available here:
> https://github.com/jernejsk/linux-1/tree/h3_hdmi_rfc


Out of curiosity I tried this one and got:



[0.071711] sun4i-usb-phy 1c19400.phy: Couldn't request ID GPIO
[0.074809] sun8i-h3-pinctrl 1c20800.pinctrl: initialized sunXi PIO driver
[0.076167] sun8i-h3-r-pinctrl 1f02c00.pinctrl: initialized sunXi PIO driver
[0.148009] [ cut here ]
[0.148035] WARNING: CPU: 0 PID: 1 at
drivers/clk/sunxi-ng/ccu_common.c:41 ccu_nm_set_rate+0x1d0/0x274
[0.148046] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.13.0-rc6-next-20170825-aik-aik #24
[0.148051] Hardware name: Allwinner sun8i Family
[0.148082] [] (unwind_backtrace) from []
(show_stack+0x10/0x14)
[0.148101] [] (show_stack) from []
(dump_stack+0x84/0x98)
[0.148117] [] (dump_stack) from [] (__warn+0xe0/0xfc)
[0.148132] [] (__warn) from []
(warn_slowpath_null+0x20/0x28)
[0.148145] [] (warn_slowpath_null) from []
(ccu_nm_set_rate+0x1d0/0x274)
[0.148161] [] (ccu_nm_set_rate) from []
(clk_change_rate+0x19c/0x250)
[0.148175] [] (clk_change_rate) from []
(clk_core_set_rate_nolock+0x68/0xb0)
[0.148187] [] (clk_core_set_rate_nolock) from []
(clk_set_rate+0x20/0x30)
[0.148202] [] (clk_set_rate) from []
(of_clk_set_defaults+0x200/0x364)
[0.148219] [] (of_clk_set_defaults) from []
(platform_drv_probe+0x18/0xb0)
[0.148233] [] (platform_drv_probe) from []
(driver_probe_device+0x234/0x2e8)
[0.148246] [] (driver_probe_device) from []
(__driver_attach+0xb8/0xbc)
[0.148258] [] (__driver_attach) from [

[PATCH 0/2] Fix two pinctrl issues

2017-09-30 Thread David Wu
They are:
1. Fix the rk3399 gpio0 and gpio1 banks' drive strength offset.
2. Fix the correct routing config for the gmac-m1 pins between rmii and rgmii.

David Wu (2):
  pinctrl: rockchip: Fix the rk3399 gpio0 and gpio1 banks' drv_offset at
pmu grf
  pinctrl: rockchip: Fix the correct routing config for the gmac-m1 pins
of rmii and rgmii

 drivers/pinctrl/pinctrl-rockchip.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

-- 
1.9.1




[PATCH 1/2] pinctrl: rockchip: Fix the rk3399 gpio0 and gpio1 banks' drv_offset at pmu grf

2017-09-30 Thread David Wu
The offset of the gpio0 and gpio1 banks' drive strength is 0x8, not 0x4.
But the mux is 0x4, so we can't use the IOMUX_WIDTH_4BIT flag and
instead give them their actual offsets.

Signed-off-by: David Wu 
---
 drivers/pinctrl/pinctrl-rockchip.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pinctrl/pinctrl-rockchip.c 
b/drivers/pinctrl/pinctrl-rockchip.c
index b5cb785..c7c9beb 100644
--- a/drivers/pinctrl/pinctrl-rockchip.c
+++ b/drivers/pinctrl/pinctrl-rockchip.c
@@ -3456,8 +3456,8 @@ static int rockchip_pinctrl_probe(struct platform_device 
*pdev)
 DRV_TYPE_IO_1V8_ONLY,
 DRV_TYPE_IO_DEFAULT,
 DRV_TYPE_IO_DEFAULT,
-0x0,
-0x8,
+0x80,
+0x88,
 -1,
 -1,
 PULL_TYPE_IO_1V8_ONLY,
@@ -3473,10 +3473,10 @@ static int rockchip_pinctrl_probe(struct 
platform_device *pdev)
DRV_TYPE_IO_1V8_OR_3V0,
DRV_TYPE_IO_1V8_OR_3V0,
DRV_TYPE_IO_1V8_OR_3V0,
-   0x20,
-   0x28,
-   0x30,
-   0x38
+   0xa0,
+   0xa8,
+   0xb0,
+   0xb8
),
PIN_BANK_DRV_FLAGS_PULL_FLAGS(2, 32, "gpio2", DRV_TYPE_IO_1V8_OR_3V0,
  DRV_TYPE_IO_1V8_OR_3V0,
-- 
1.9.1




[PATCH 2/2] pinctrl: rockchip: Fix the correct routing config for the gmac-m1 pins of rmii and rgmii

2017-09-30 Thread David Wu
If the gmac-m1 optimization (bit 10) is selected, the gpio function
of the gmac pins is not valid. When the gmac interface uses rmii mode,
pins such as rx_d2 and rx_d3, which are used by rgmii mode but not by
rmii, could still be used as gpios. So it is more correct for
gmac_rxd0m1 to select bit 2 and for gmac_rxd0m3 to select bit 10.

Signed-off-by: David Wu 
---
 drivers/pinctrl/pinctrl-rockchip.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/pinctrl/pinctrl-rockchip.c 
b/drivers/pinctrl/pinctrl-rockchip.c
index c7c9beb..9e0cabf 100644
--- a/drivers/pinctrl/pinctrl-rockchip.c
+++ b/drivers/pinctrl/pinctrl-rockchip.c
@@ -900,12 +900,19 @@ static void rockchip_get_recalced_mux(struct 
rockchip_pin_bank *bank, int pin,
.route_offset = 0x50,
.route_val = BIT(16) | BIT(16 + 1) | BIT(0),
}, {
-   /* gmac-m1-optimized_rxd0 */
+   /* gmac-m1_rxd0 */
.bank_num = 1,
.pin = 11,
.func = 2,
.route_offset = 0x50,
-   .route_val = BIT(16 + 2) | BIT(16 + 10) | BIT(2) | BIT(10),
+   .route_val = BIT(16 + 2) | BIT(2),
+   }, {
+   /* gmac-m1-optimized_rxd3 */
+   .bank_num = 1,
+   .pin = 14,
+   .func = 2,
+   .route_offset = 0x50,
+   .route_val = BIT(16 + 10) | BIT(10),
}, {
/* pdm_sdi0m0 */
.bank_num = 2,
-- 
1.9.1
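
For readers unfamiliar with the route_val encoding: Rockchip GRF-style
registers take a write-enable mask in the upper 16 bits, so
BIT(16 + n) | BIT(n) sets bit n and BIT(16 + n) alone clears it. A small
standalone sketch of that convention (plain C, illustrative only, not
driver code):

#include <stdio.h>
#include <stdint.h>

#define BIT(n)  (1u << (n))

/* hiword-mask write: upper 16 bits select which of the lower 16 get written */
static uint32_t grf_apply(uint32_t reg, uint32_t route_val)
{
        uint32_t mask = route_val >> 16;
        uint32_t val  = route_val & 0xffff;

        return (reg & ~mask) | (val & mask);
}

int main(void)
{
        uint32_t reg = 0;

        reg = grf_apply(reg, BIT(16 + 2) | BIT(2));     /* gmac-m1_rxd0: set bit 2 */
        reg = grf_apply(reg, BIT(16 + 10) | BIT(10));   /* rxd3: set bit 10 */
        printf("reg = 0x%04x\n", reg);                  /* 0x0404 */
        return 0;
}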




Re: random insta-reboots on AMD Phenom II

2017-09-30 Thread Markus Trippelsdorf
On 2017.09.30 at 13:53 +0200, Borislav Petkov wrote:
> On Sat, Sep 30, 2017 at 01:29:03PM +0200, Adam Borowski wrote:
> > On Sat, Sep 30, 2017 at 01:11:37PM +0200, Borislav Petkov wrote:
> > > On Sat, Sep 30, 2017 at 04:05:16AM +0200, Adam Borowski wrote:
> > > > Any hints how to debug this?
> > > 
> > > Do
> > > rdmsr -a 0xc0010015
> > > as root and paste it here.
> > 
> > 110
> > 110
> > 110
> > 110
> > 110
> > 110
> > 
> > on both 4.13.4 and 4.14-rc2+.
> 
> Boot into -rc2+ and do as root:
> 
> # wrmsr -a 0xc0010015 0x118
> 
> If the issue gets fixed then Mr. Luto better revert the new lazy TLB
> flushing fun'n'games for 4.14 before it is too late and that kernel
> releases b0rked.

The issue does get fixed by setting TlbCacheDis to 1. I have been
running it for the last few weeks without any problems. 
Performance is not affected at all. So it might by easier to just set
the bit for older AMD processors as a boot quirk.
Changing the TLB code so late might not be a good idea...

-- 
Markus
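
For anyone following along, the two values differ in exactly one bit of
MSR 0xc0010015 (HWCR): bit 3, i.e. the TlbCacheDis bit mentioned above.
A trivial standalone check (plain C, just the arithmetic):

#include <stdio.h>

int main(void)
{
        unsigned long old = 0x110, new = 0x118;
        unsigned long diff = old ^ new;

        /* 0x110 ^ 0x118 == 0x8, i.e. only bit 3 (TlbCacheDis in HWCR) changes */
        printf("diff = 0x%lx (bit %d)\n", diff, __builtin_ctzl(diff));
        return 0;
}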


Re: [PATCH 01/12] usb: mtu3: fix error return code in ssusb_gadget_init()

2017-09-30 Thread Sergei Shtylyov

Hello!

On 9/28/2017 3:17 AM, Chunfeng Yun wrote:


When fail to get irq number, platform_get_irq() may return


   Failing. IRQ. :-)


-EPROBE_DEFER, but we ignore it and always return -ENODEV,
so fix it.

Signed-off-by: Chunfeng Yun 
---
  drivers/usb/mtu3/mtu3_core.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/mtu3/mtu3_core.c b/drivers/usb/mtu3/mtu3_core.c
index 99c65b0..9475798 100644
--- a/drivers/usb/mtu3/mtu3_core.c
+++ b/drivers/usb/mtu3/mtu3_core.c
@@ -774,9 +774,9 @@ int ssusb_gadget_init(struct ssusb_mtk *ssusb)
return -ENOMEM;
  
  	mtu->irq = platform_get_irq(pdev, 0);

-   if (mtu->irq <= 0) {
+   if (mtu->irq < 0) {


   This is good as the function no longer returns 0 on error. Even when it 
did, 0 could mean a valid IRQ as well...



dev_err(dev, "fail to get irq number\n");
-   return -ENODEV;
+   return mtu->irq;
}
dev_info(dev, "irq %d\n", mtu->irq);
  


MBR, Sergei


Re: [PATCH V2] PCI: AER: fix deadlock in do_recovery

2017-09-30 Thread Sinan Kaya
On 9/30/2017 1:49 AM, Govindarajulu Varadarajan wrote:
> This patch does a pci_bus_walk and adds all the devices to a list. After
> unlocking (up_read) &pci_bus_sem, we go through the list and call
> err_handler of the devices with devic_lock() held. This way, we dont try
> to hold both locks at same time.

I do like this approach, with some more feedback below.

I need a little bit of help here from someone that knows the get/put device calls.

I understand get_device() and put_device() are there to increment/decrement
reference counters. This patch seems to use them as an alternative to the
device_lock() and device_unlock() API.

If that is a valid assumption, then you can get away with just replacing
device_lock() with get_device() and device_unlock() with put_device() in the
existing code as well. Then you don't need to build a linked list.

A nit: the version history in your commit message belongs in a cover letter.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
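
To make the ordering being discussed concrete, here is a rough kernel-style
sketch (not the actual patch; the structure and function names are made up,
only the core PCI/driver-core calls are real) of "pin references while
walking the bus, then take device_lock() afterwards". Note that
pci_dev_get()/get_device() only pin the object; they do not give the mutual
exclusion that device_lock() provides:

#include <linux/pci.h>
#include <linux/device.h>
#include <linux/list.h>
#include <linux/slab.h>

struct recovery_entry {
        struct list_head node;
        struct pci_dev  *pdev;
};

/* runs under pci_bus_sem (inside pci_walk_bus): only pin the device here */
static int aer_collect_one(struct pci_dev *pdev, void *data)
{
        struct list_head *list = data;
        struct recovery_entry *e = kzalloc(sizeof(*e), GFP_KERNEL);

        if (!e)
                return 0;
        e->pdev = pci_dev_get(pdev);    /* a reference, not a lock */
        list_add_tail(&e->node, list);
        return 0;
}

static void aer_walk_then_report(struct pci_bus *bus)
{
        struct recovery_entry *e, *tmp;
        LIST_HEAD(list);

        pci_walk_bus(bus, aer_collect_one, &list);

        /* pci_bus_sem is no longer held, so device_lock() can't deadlock on it */
        list_for_each_entry_safe(e, tmp, &list, node) {
                device_lock(&e->pdev->dev);
                /* ... invoke the driver's err_handler callback here ... */
                device_unlock(&e->pdev->dev);

                list_del(&e->node);
                pci_dev_put(e->pdev);
                kfree(e);
        }
}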


Nouveau nullptr on NVIDIA NVA8

2017-09-30 Thread Woody Suwalski
Starting with the drm merge af3c8d98508d37541d4bf57f13a984a7f73a328c for
4.13-rc1, the NVidia NVS 3100M display on a Dell Latitude E6410 had a
nullptr crash on startup. As a result, suspend-to-RAM would later lock up.
The crash was traced to a NULL pointer in nv50_mstm_service(), which seems
to be called only from nouveau_connector_hotplug().

Fixed by checking that mstm is not NULL before calling the service function.

[1.176456] Linux agpgart interface v0.103
[1.176610] [drm] radeon kernel modesetting enabled.
[1.17] [drm] amdgpu kernel modesetting enabled.
[1.176749] ACPI Warning: \_SB.PCI0.AGP.VID._DSM: Argument #4 type 
mismatch - Found [Buffer], ACPI requires [Package] (20170531/nsarguments-95)

[1.176780] ACPI: \_SB_.PCI0.AGP_.VID_: failed to evaluate _DSM
[1.176948] nouveau :01:00.0: NVIDIA GT218 (0a8600b1)
[1.196734] nouveau :01:00.0: bios: version 70.18.53.00.04
[1.198112] nouveau :01:00.0: fb: 512 MiB DDR3
[1.251598] [TTM] Zone  kernel: Available graphics memory: 1496332 kiB
[1.251600] [TTM] Initializing pool allocator
[1.251605] [TTM] Initializing DMA pool allocator
[1.251625] nouveau :01:00.0: DRM: VRAM: 512 MiB
[1.251628] nouveau :01:00.0: DRM: GART: 1048576 MiB
[1.251634] nouveau :01:00.0: DRM: TMDS table version 2.0
[1.251637] nouveau :01:00.0: DRM: DCB version 4.0
[1.251641] nouveau :01:00.0: DRM: DCB outp 00: 048003b6 0f200014
[1.251644] nouveau :01:00.0: DRM: DCB outp 01: 02033300 
[1.251647] nouveau :01:00.0: DRM: DCB outp 02: 088223a6 0f220010
[1.251650] nouveau :01:00.0: DRM: DCB outp 03: 08022362 00020010
[1.251652] nouveau :01:00.0: DRM: DCB outp 04: 028113c6 0f220010
[1.251655] nouveau :01:00.0: DRM: DCB outp 05: 02011382 00020010
[1.251657] nouveau :01:00.0: DRM: DCB conn 00: 2047
[1.251660] nouveau :01:00.0: DRM: DCB conn 01: 00101146
[1.251662] nouveau :01:00.0: DRM: DCB conn 02: 00410246
[1.251664] nouveau :01:00.0: DRM: DCB conn 03: 0300
[1.278401] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[1.278403] [drm] Driver supports precise vblank timestamp query.
[1.323205] nouveau :01:00.0: DRM: MM: using COPY for buffer copies
[1.473861] nouveau :01:00.0: DRM: allocated 1440x900 fb: 
0x7, bo 8800b7baa000

[1.476208] fbcon: nouveaufb (fb0) is primary device
[1.830143] BUG: unable to handle kernel NULL pointer dereference at 
0020

[1.830152] IP: nv50_mstm_service+0xc/0xb0
[1.830153] PGD 0
[1.830154] P4D 0

[1.830158] Oops:  [#1] PREEMPT SMP
[1.830159] Modules linked in:
[1.830164] CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 4.13-pingu #1
[1.830166] Hardware name: Dell Inc. Latitude E6410/0K42JR, BIOS A16 
12/05/2013

[1.830171] Workqueue: events nvif_notify_work
[1.830173] task: 8800b79f1680 task.stack: c9154000
[1.830176] RIP: 0010:nv50_mstm_service+0xc/0xb0
[1.830178] RSP: :c9157df0 EFLAGS: 00010286
[1.830180] RAX: 8800b7096800 RBX: 8800b71b9418 RCX: 
8800b7096800
[1.830182] RDX: 8800b7a98b9c RSI: 002b RDI: 

[1.830183] RBP: 0008 R08: 8800b7096818 R09: 

[1.830185] R10:  R11: 0040 R12: 
8800b71b9000
[1.830187] R13:  R14:  R15: 
8800b71b9418
[1.830189] FS:  () GS:8800bb2c() 
knlGS:

[1.830191] CS:  0010 DS:  ES:  CR0: 80050033
[1.830193] CR2: 0020 CR3: 02209000 CR4: 
06e0

[1.830194] Call Trace:
[1.830200]  ? find_encoder+0x33/0x70
[1.830204]  ? nouveau_connector_hotplug+0x56/0x100
[1.830206]  ? nvif_notify_work+0x1f/0xa0
[1.830210]  ? nvkm_notify_work+0x64/0x70
[1.830214]  ? process_one_work+0x1a3/0x320
[1.830217]  ? worker_thread+0x42/0x3d0
[1.830220]  ? kthread+0xf2/0x130
[1.830223]  ? process_one_work+0x320/0x320
[1.830225]  ? kthread_create_on_node+0x40/0x40
[1.830228]  ? call_usermodehelper_exec_async+0x125/0x130
[1.830233]  ? ret_from_fork+0x25/0x30
[1.830234] Code: 89 04 24 e8 d7 2f ca ff 48 89 df e8 2f 72 c8 ff 48 
89 df e8 f7 ac 99 ff 48 83 c4 08 5b c3 90 41 54 55 48 8d 6f 08 53 48 83 
ec 18 <48> 8b 5f 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 c6

[1.830276] RIP: nv50_mstm_service+0xc/0xb0 RSP: c9157df0
[1.830277] CR2: 0020
[1.830281] ---[ end trace 9578c3b6b1cff0d4 ]---
[1.957826] Console: switching to colour frame buffer device 180x56
[1.975000] nouveau :01:00.0: fb0: nouveaufb frame buffer device
[1.975037] [drm] Initialized nouveau 1.3.1 20120801 for :01:00.0 
on minor 0



Signed-off-by: Woody Suwalski 
---

diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c 
b/drivers/gpu/drm/n

ce07a9415f ("locking/lockdep: Make check_prev_add() able to .."): BUG: unable to handle kernel NULL pointer dereference at 00000020

2017-09-30 Thread kernel test robot
Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit ce07a9415f266e181a0a33033a5f7138760240a4
Author: Byungchul Park 
AuthorDate: Mon Aug 7 16:12:51 2017 +0900
Commit: Ingo Molnar 
CommitDate: Thu Aug 10 12:29:06 2017 +0200

locking/lockdep: Make check_prev_add() able to handle external stack_trace

Currently, a space for stack_trace is pinned in check_prev_add(), that
makes us not able to use external stack_trace. The simplest way to
achieve it is to pass an external stack_trace as an argument.

A more suitable solution is to pass a callback additionally along with
a stack_trace so that callers can decide the way to save or whether to
save. Actually crossrelease needs to do other than saving a stack_trace.
So pass a stack_trace and callback to handle it, to check_prev_add().

Signed-off-by: Byungchul Park 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: a...@linux-foundation.org
Cc: boqun.f...@gmail.com
Cc: kernel-t...@lge.com
Cc: kir...@shutemov.name
Cc: npig...@gmail.com
Cc: wal...@google.com
Cc: wi...@infradead.org
Link: 
http://lkml.kernel.org/r/1502089981-21272-5-git-send-email-byungchul.p...@lge.com
Signed-off-by: Ingo Molnar 

70911fdc95  locking/lockdep: Change the meaning of check_prev_add()'s return value
ce07a9415f  locking/lockdep: Make check_prev_add() able to handle external stack_trace
74d83ec2b7  Merge tag 'platform-drivers-x86-v4.14-2' of git://git.infradead.org/linux-platform-drivers-x86
1418b85217  Add linux-next specific files for 20170929

+--------------------------------------------------------+------------+------------+------------+---------------+
|                                                        | 70911fdc95 | ce07a9415f | 74d83ec2b7 | next-20170929 |
+--------------------------------------------------------+------------+------------+------------+---------------+
| boot_successes                                         | 516        | 129        | 167        | 479           |
| boot_failures                                          | 0          | 6          | 43         | 146           |
| BUG:unable_to_handle_kernel                            | 0          | 6          | 24         | 42            |
| Oops:#[##]                                             | 0          | 6          | 24         | 42            |
| EIP:iput                                               | 0          | 5          |            |               |
| Kernel_panic-not_syncing:Fatal_exception               | 0          | 6          | 1          |               |
| EIP:do_raw_spin_trylock                                | 0          | 1          | 1          |               |
| WARNING:kernel_stack                                   | 0          | 0          | 20         | 110           |
| EIP:update_stack_state                                 | 0          | 0          | 23         | 42            |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt  | 0          | 0          | 23         | 42            |
| invoked_oom-killer:gfp_mask=0x                         | 0          | 0          | 0          | 16            |
| Mem-Info                                               | 0          | 0          | 0          | 16            |
| EIP:clear_user                                         | 0          | 0          | 0          | 2             |
| EIP:copy_page_to_iter                                  | 0          | 0          | 0          | 1             |
+--------------------------------------------------------+------------+------------+------------+---------------+

procd: Instance odhcpd::instance1 s in a crash loop 6 crashes, 0 seconds since 
last crash
procd: Instance uhttpd::instance1 s in a crash loop 6 crashes, 0 seconds since 
last crash
procd: Instance dnsmasq::instance1 s in a crash loop 6 crashes, 0 seconds since 
last crash
[  187.661000] Writes:  Total: 2  Max/Min: 0/0   Fail: 0 
procd: - shutdown -
[  220.353842] BUG: unable to handle kernel NULL pointer dereference at 0020
[  220.354946] IP: iput+0x544/0x650
[  220.355441] *pde =  
[  220.355444] 
[  220.356100] Oops:  [#1] PREEMPT SMP
[  220.356647] CPU: 0 PID: 29697 Comm: umount Not tainted 
4.13.0-rc4-00169-gce07a941 #627
[  220.357790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-20161025_171302-gandalf 04/01/2014
[  220.359217] task: c0a0ba00 task.stack: c0a1e000
[  220.359881] EIP: iput+0x544/0x650
[  220.360384] EFLAGS: 00010246 CPU: 0
[  220.360900] EAX: 0001 EBX: c0100218 ECX:  EDX: 
[  220.361778] ESI:  EDI:  EBP: c0a1fdd8 ESP: c0a1fdc0
[  220.362689]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  220.363502] CR0: 80050033 CR2: 0020 CR3: 10a03000 CR4: 0690
[  220.36442

Re: Linux 4.13.4

2017-09-30 Thread Ed Tomlinson

Hi,

This build causes very annoying flickering on my display. I am using the
in-kernel amdgpu module to drive an RX480 with 4G via DisplayPort. When X
is started (KDE) I get flickers that are extremely distracting. The Linux
install is Arch stable and is up to date. Nothing interesting in dmesg.


Reverting the changes to:

drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c|3 
drivers/gpu/drm/amd/amdgpu/psp_v3_1.c  |2 


fixes the issue here.

Thanks
Ed Tomlinson


On Thursday, September 28, 2017 4:33:02 AM EDT, Greg KH wrote:

I'm announcing the release of the 4.13.4 kernel.

All users of the 4.13 kernel series must upgrade.

The updated 4.13.y git tree can be found at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git 
linux-4.13.y

and can be browsed at the normal kernel.org git web browser:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary

thanks,

greg k-h



 Documentation/dev-tools/gdb-kernel-debugging.rst   |6 
 Makefile   |2 
 arch/arc/kernel/entry.S|6 
 arch/arc/mm/tlb.c  |3 
 arch/mips/math-emu/dp_fmax.c   |   84 +--

 arch/mips/math-emu/dp_fmin.c   |   86 +--
 arch/mips/math-emu/dp_maddf.c  |  246 
+
 arch/mips/math-emu/ieee754int.h|4 
 arch/mips/math-emu/ieee754sp.h |4 
 arch/mips/math-emu/sp_fmax.c   |   84 +--

 arch/mips/math-emu/sp_fmin.c   |   86 +--
 arch/mips/math-emu/sp_maddf.c  |  229 
+--

 arch/powerpc/kernel/align.c|  119 ++
 arch/powerpc/platforms/powernv/npu-dma.c   |   12 -
 arch/powerpc/platforms/pseries/hotplug-memory.c|4 
 arch/s390/include/asm/mmu.h|2 
 arch/s390/include/asm/mmu_context.h|8 
 arch/s390/include/asm/tlbflush.h   |   30 --
 block/blk-core.c   |9 
 block/blk-mq.c |   16 +
 block/blk-mq.h |1 
 crypto/algif_skcipher.c|4 
 crypto/scompress.c |4 
 drivers/block/skd_main.c   |   21 +
 drivers/crypto/caam/caamalg_qi.c   |   11 
 drivers/crypto/ccp/ccp-crypto-aes-xts.c|4 
 drivers/crypto/ccp/ccp-dev-v5.c|2 
 drivers/crypto/ccp/ccp-dev.h   |2 
 drivers/crypto/ccp/ccp-ops.c   |   43 ++-
 drivers/devfreq/devfreq.c  |5 
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c|3 
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c  |2 
 drivers/infiniband/hw/hfi1/init.c  |1 
 drivers/infiniband/hw/hfi1/rc.c|3 
 drivers/infiniband/hw/mlx5/mr.c|   18 +
 drivers/infiniband/hw/qib/qib_rc.c |4 
 drivers/input/joystick/xpad.c  |   10 
 drivers/input/serio/i8042-x86ia64io.h  |7 
 drivers/mailbox/bcm-flexrm-mailbox.c   |2 
 drivers/md/bcache/bcache.h |1 
 drivers/md/bcache/request.c|   12 -
 drivers/md/bcache/super.c  |7 
 drivers/md/bcache/sysfs.c  |4 
 drivers/md/bcache/util.c   |   50 ++--

 drivers/md/bcache/writeback.c  |   20 +
 drivers/md/bcache/writeback.h  |   21 +
 drivers/md/bitmap.c|9 
 drivers/media/i2c/adv7180.c|2 
 drivers/media/platform/qcom/venus/helpers.c|2 
 drivers/media/rc/lirc_dev.c|4 
 drivers/media/usb/uvc/uvc_ctrl.c   |7 
 drivers/media/v4l2-core/v4l2-compat-ioctl32.c  |3 
 drivers/misc/cxl/api.c |4 
 drivers/misc/cxl/file.c|8 
 drivers/net/wireless/ath/wcn36xx/main.c|   52 
 drivers/net/wireless/ath/wcn36xx/wcn36xx.h |3 
 drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c |   62 -
 drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.h |3 
 drivers/net/wireless/intel/iwlwifi/mvm/nvm.c   |3 
 drivers/pci/hotplug/pciehp_hpc.c   |8 
 drivers/pci/hotplug/shpchp_hpc.c   |2 
 drivers/pinctrl/pinctrl-amd.c  |   75 ++
 drivers/pinctrl/pinctrl-amd.h  |1 
 drivers/pinctrl/samsung/pinctrl-exynos.c   |8 
 drivers/pinctrl/samsung/pinctrl-s3c24xx.c  |   37 +--

 dri

Re: [PATCH] mm: kill kmemcheck again

2017-09-30 Thread Vegard Nossum
On 30 September 2017 at 11:48, Steven Rostedt  wrote:
> On Wed, 27 Sep 2017 17:02:07 +0200
> Michal Hocko  wrote:
>
>> > Now that 2 years have passed, and all distros provide gcc that supports
>> > KASAN, kill kmemcheck again for the very same reasons.
>>
>> This is just too large to review manually. How have you generated the
>> patch?
>
> I agree. This needs to be taken out piece by piece, not in one go,
> where there could be unexpected fallout.

I have a patch from earlier this year that starts by removing the core
code and defining all the helpers/flags as no-ops so they can be
removed bit by bit at a later time. See the attachment. Pekka signed
off on it too.

I never actually submitted this because I was waiting for MSAN to be
merged in the kernel. It has been compile and run tested on x86_64.


Vegard
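
For context, the "helpers defined as no-ops" step boils down to stubs along
the lines of the sketch below. The helper names and prototypes are recalled
from the kmemcheck API and are meant only to illustrate the shape of the
change; treat the exact signatures as assumptions, not as the contents of
the attached patch:

/*
 * Sketch only: the public kmemcheck helpers reduced to empty inlines and
 * macros, so existing call sites keep compiling while the annotations are
 * removed subsystem by subsystem.
 */
#include <linux/types.h>	/* gfp_t */

struct page;

#define kmemcheck_enabled 0

static inline void kmemcheck_alloc_shadow(struct page *page, int order,
					  gfp_t flags, int node)
{
}

static inline void kmemcheck_free_shadow(struct page *page, int order)
{
}

static inline void kmemcheck_mark_initialized(void *address, unsigned int n)
{
}

static inline void kmemcheck_mark_uninitialized(void *address, unsigned int n)
{
}

/* The annotation macros used by e.g. networking code also become empty. */
#define kmemcheck_annotate_bitfield(ptr, name)	do { } while (0)
#define kmemcheck_annotate_variable(var)	do { } while (0)
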
From b06e2b3b833b02ecb0afb9dd92422e89c7fbb6d9 Mon Sep 17 00:00:00 2001
From: Vegard Nossum 
Date: Thu, 30 Mar 2017 13:26:15 +0200
Subject: [PATCH] kmemcheck: remove core (x86 + mm) code

With KASAN/KMSAN and compiler-based instrumentation, this code is way past
its expiry date. There is zero reason to be using kmemcheck at this point,
as KASAN/KMSAN will be much faster, support SMP, and catch any bug that
kmemcheck would have caught. See the additional rationale and past
discussion at .

I take the approach of first removing all the core x86 and mm code, leaving
behind only include/linux/kmemcheck.h which provides some helpers (now only
dummies as for the !KMEMCHECK case previously) used in e.g. networking code
for special annotations.

We can then send individual (smaller, more reviewable) patches for removing
kmemcheck annotations in other subsystems.
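
To make that follow-up step concrete, such a per-subsystem patch would mostly
delete annotation call sites like the hypothetical one sketched here (the
struct, slab cache and allocation path are made-up placeholders; only the
kmemcheck macros are the point):

#include <linux/slab.h>
#include <linux/kmemcheck.h>

/* Hypothetical object with a bitfield group annotated for kmemcheck. */
struct example_buf {
	kmemcheck_bitfield_begin(flags1);
	unsigned int	cloned:1,
			peeked:1,
			nohdr:1;
	kmemcheck_bitfield_end(flags1);
};

extern struct kmem_cache *example_cache;	/* assumed to be created elsewhere */

static struct example_buf *example_alloc(gfp_t gfp)
{
	struct example_buf *buf = kmem_cache_alloc(example_cache, gfp);

	if (buf)
		/*
		 * With the dummy macros in place this already compiles away;
		 * the cleanup patch simply deletes the annotation line.
		 */
		kmemcheck_annotate_bitfield(buf, flags1);

	return buf;
}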

Once there are no users of the kmemcheck helpers, we can kill off the dummy
helpers as well in a final patch.

Cc: Ingo Molnar 
Cc: Andrew Morton 
Cc: Sasha Levin 
Cc: Steven Rostedt 
Signed-off-by: Vegard Nossum 
Signed-off-by: Pekka Enberg 
---
 Documentation/admin-guide/kernel-parameters.txt |   7 -
 Documentation/dev-tools/index.rst   |   1 -
 Documentation/dev-tools/kmemcheck.rst   | 733 
 MAINTAINERS |  10 -
 arch/arm/include/asm/dma-iommu.h|   1 -
 arch/openrisc/include/asm/dma-mapping.h |   1 -
 arch/x86/Kconfig|   3 +-
 arch/x86/Makefile   |   5 -
 arch/x86/include/asm/dma-mapping.h  |   1 -
 arch/x86/include/asm/kmemcheck.h|  42 --
 arch/x86/include/asm/pgtable_types.h|   8 +-
 arch/x86/include/asm/string_32.h|   9 -
 arch/x86/include/asm/string_64.h|   8 -
 arch/x86/include/asm/xor.h  |   5 +-
 arch/x86/kernel/cpu/intel.c |  15 -
 arch/x86/kernel/traps.c |   5 -
 arch/x86/mm/Makefile|   2 -
 arch/x86/mm/fault.c |   6 -
 arch/x86/mm/init.c  |   5 +-
 arch/x86/mm/kmemcheck/Makefile  |   1 -
 arch/x86/mm/kmemcheck/error.c   | 227 
 arch/x86/mm/kmemcheck/error.h   |  15 -
 arch/x86/mm/kmemcheck/kmemcheck.c   | 658 -
 arch/x86/mm/kmemcheck/opcode.c  | 106 
 arch/x86/mm/kmemcheck/opcode.h  |   9 -
 arch/x86/mm/kmemcheck/pte.c |  22 -
 arch/x86/mm/kmemcheck/pte.h |  10 -
 arch/x86/mm/kmemcheck/selftest.c|  70 ---
 arch/x86/mm/kmemcheck/selftest.h|   6 -
 arch/x86/mm/kmemcheck/shadow.c  | 173 --
 arch/x86/mm/kmemcheck/shadow.h  |  18 -
 include/linux/dma-mapping.h |   8 +-
 include/linux/gfp.h |   2 -
 include/linux/kmemcheck.h   |  59 --
 include/linux/mm_types.h|   8 -
 include/linux/slab.h|  12 +-
 init/main.c |   1 -
 kernel/sysctl.c |  10 -
 lib/Kconfig.debug   |   6 +-
 lib/Kconfig.kmemcheck   |  94 ---
 mm/Kconfig.debug|   1 -
 mm/Makefile |   2 -
 mm/kmemcheck.c  | 125 
 mm/page_alloc.c |  14 -
 mm/slab.c   |  14 -
 mm/slab.h   |   2 -
 mm/slub.c   |  25 +-
 47 files changed, 18 insertions(+), 2547 deletions(-)
 delete mode 100644 Documentation/dev-tools/kmemcheck.rst
 delete mode 100644 arch/x86/include/asm/kmemcheck.h
 delete mode 100644 arch/x86/mm/kmemcheck/Makefile
 delete mode 100644 arch/x86/mm/kmemcheck/error.c
 delet
