"handling registration interrupt
> > > took too long!!\n");
> > > - return -EINVAL;
> > > + "Timestamp offest processing
> > > + reached timeout of %lld ms\n",
> >
> > Typo in offest, you ca
On Thu, Feb 16, 2023 16:39 PM, Stanislaw Gruszka
wrote:
> On Sun, Feb 12, 2023 at 10:44:46PM +0200, Oded Gabbay wrote:
> > From: Ofir Bitton
> >
> > In order for interrupt timestamp to be more accurate we should capture
> > it during the interrupt handling rather t
VENT_SIZE,
> };
>
> diff --git a/drivers/accel/habanalabs/include/gaudi2/gaudi2_fw_if.h
> b/drivers/accel/habanalabs/include/gaudi2/gaudi2_fw_if.h
> index 82f3ca2a3966..8522f24deac0 100644
> --- a/drivers/accel/habanalabs/include/gaudi2/gaudi2_fw_if.h
> +++ b/drivers/accel/habanalabs/include/gaudi2/gaudi2_fw_if.h
> @@ -63,7 +63,10 @@ struct gaudi2_cold_rst_data {
> u32 fake_sig_validation_en : 1;
> u32 bist_skip_enable : 1;
> u32 bist_need_iatu_config : 1;
> - u32 reserved : 24;
> + u32 fake_bis_compliant : 1;
> + u32 wd_rst_cause_arm : 1;
> + u32 wd_rst_cause_arcpid : 1;
> + u32 reserved : 21;
> };
> __le32 data;
> };
Reviewed-by: Ofir Bitton
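The hunk above carves three one-bit flags out of the reserved field, shrinking it from 24 to 21 bits so that the bitfield still overlays the 32-bit `data` word. A minimal sketch of that invariant follows. The union `demo_rst_data` is a hypothetical stand-in, not the driver's definition: the flags that precede the hunk are collapsed into one 8-bit placeholder (assuming the original bitfield filled the word, 32 - 24 = 8), and only the three new flag names and the reserved width come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patched layout; not the driver's definition. */
union demo_rst_data {
	struct {
		unsigned int earlier_flags : 8;       /* placeholder for flags before the hunk */
		unsigned int fake_bis_compliant : 1;  /* added by the patch */
		unsigned int wd_rst_cause_arm : 1;    /* added by the patch */
		unsigned int wd_rst_cause_arcpid : 1; /* added by the patch */
		unsigned int reserved : 21;           /* was 24 before the three flags */
	};
	uint32_t data;
};

/* 8 + 1 + 1 + 1 + 21 == 32, so the bitfield must not outgrow the raw word. */
static_assert(sizeof(union demo_rst_data) == sizeof(uint32_t),
	      "bitfield must still overlay exactly one 32-bit word");
```

Shrinking `reserved` by exactly the number of bits added keeps the firmware-visible word size unchanged, which is why the patch touches both lines together.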
u64 reserved5;
> - __u64 reserved6;
> - __u32 reserved7;
> - __u8 reserved8;
> + __u32 reserved6;
> + __u8 reserved7;
> __u8 revision_id;
> __u16 tpc_interrupt_id;
> + __u32 reserved8;
> __u32 reserved9;
> - __u8 pad3[4];
> __u64 engine_core_interrupt_reg_addr;
> };
>
Reviewed-by: Ofir Bitton
%llu has not
> finished in %u seconds!\n",
> + cs->sequence, timeout_sec);
> break;
>
> default:
> dev_err(hdev->dev,
> - "Command submission %llu has not finished in time!\n",
> - cs->sequence);
> + "Command submission %llu has not finished in %u
> seconds!\n",
> + cs->sequence, timeout_sec);
> break;
> }
>
Reviewed-by: Ofir Bitton
.
>* @comp_name: Name of the component.
> - * @modules_mask: i'th bit (from LSB) is a flag - on if module i in enum
> - * hl_modules is used.
>* @modules_counter: number of set bits in modules_mask.
>* @reserved: reserved for future use.
>* @modules: versions of the component's modules. Elborated explanation in
> @@ -800,9 +774,8 @@ struct hl_component_versions {
> __u8 component[VERSION_MAX_LEN];
> __u8 fw_os[VERSION_MAX_LEN];
> __u8 comp_name[NAME_MAX_LEN];
> - __le16 modules_mask;
> __u8 modules_counter;
> - __u8 reserved[1];
> + __u8 reserved[3];
> struct hl_module_data modules[];
> };
>
Reviewed-by: Ofir Bitton
> - u32 bist_need_iatu_config : 1;
> + u32 reserved1 : 1;
> u32 fake_bis_compliant : 1;
> u32 wd_rst_cause_arm : 1;
> u32 wd_rst_cause_arcpid : 1;
Reviewed-by: Ofir Bitton
if (nr == _IOC_NR(HL_IOCTL_INFO)) {
> ioctl = &hl_ioctls_control[nr];
> } else {
> - dev_err(hdev->dev_ctrl, "invalid ioctl: pid=%d, nr=0x%02x\n",
> + dev_dbg_ratelimited(hdev->dev_ctrl, "invalid ioctl: pid=%d,
> nr=0x%02x\n",
> task_pid_nr(current), nr);
> return -ENOTTY;
> }
Reviewed-by: Ofir Bitton
o.in_reset && !reset_device && !hdev->pldm)
device_is_idle = hdev->asic_funcs->is_device_idle(hdev,
idle_mask,
HL_BUSY_ENGINES_MASK_EXT_SIZE, NULL);
if (!device_is_idle) {
Reviewed-by: Ofir Bitton <obit...@habana.ai>
"failed to scrub memory from hpriv
release (%d)\n", rc);
+ hl_device_reset(hdev, HL_DRV_RESET_HARD);
+ }
}
/* Now we can mark the compute_ctx as not active. Even if a reset is
running in a different
Reviewed-by: Ofir Bitton <obit...@habana.ai>
banalabs/gaudi2/gaudi2.c
> @@ -10650,6 +10650,9 @@ static int gaudi2_ctx_init(struct hl_ctx *ctx)
> {
> int rc;
>
> + if (ctx->asid == HL_KERNEL_ASID_ID)
> + return 0;
> +
> rc = gaudi2_mmu_prepare(ctx->hdev, ctx->asid);
> if (rc)
> return rc;
Reviewed-by: Ofir Bitton
I will be leaving Intel soon; Yaron Avizrat will take over the role
of habanalabs driver maintainer.
Signed-off-by: Ofir Bitton
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index ed2d2dbcec81..a4b36590061e 100644
--- a/MAINTAINERS
Hi Dave, Sima.
As I am about to leave Intel in the coming weeks, I'm stepping
down from the maintainer role of the habanalabs driver.
Yaron Avizrat from Intel will replace me as the new maintainer.
Ofir Bitton (1):
MAINTAINERS: Change habanalabs maintainer
MAINTAINERS | 2 +-
1
_num)
> + event_mask |= GENMASK(input->event_types_num - 1, 0);
>
> WREG32(base_reg + mmSPMU_PMCNTENSET_EL0_OFFSET, event_mask);
> } else {
Reviewed-by: Ofir Bitton
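The quoted hunk builds the counter-enable mask with `GENMASK(input->event_types_num - 1, 0)`, which sets the low `event_types_num` bits. The guard line is cut off in this snippet; presumably it skips the zero case, where `event_types_num - 1` would wrap around. A userspace sketch of the same idea, where `low_bits_mask()` is a hypothetical helper and not the kernel macro:

```c
#include <assert.h>
#include <stdint.h>

/* Return a 64-bit value with the low n bits set (n in 0..64). */
static uint64_t low_bits_mask(unsigned int n)
{
	assert(n <= 64);
	if (n == 0)
		return 0;              /* no events: empty mask, avoids n - 1 wrapping */
	return UINT64_MAX >> (64 - n); /* shift amount stays in 0..63 */
}
```

For example, `low_bits_mask(3)` yields `0x7`, matching what `GENMASK(2, 0)` expresses in the driver.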
ncs gaudi2_funcs = {
> .asic_dma_pool_free = gaudi2_dma_pool_free,
> .cpu_accessible_dma_pool_alloc = gaudi2_cpu_accessible_dma_pool_alloc,
> .cpu_accessible_dma_pool_free = gaudi2_cpu_accessible_dma_pool_free,
> - .asic_dma_unmap_single = gaudi2_dma_unmap_single,
> - .asic_dma_map_single = gaudi2_dma_map_single,
> .hl_dma_unmap_sgtable = hl_dma_unmap_sgtable,
> .cs_parser = gaudi2_cs_parser,
> .asic_dma_map_sgtable = hl_dma_map_sgtable,
Reviewed-by: Ofir Bitton
CPUCP_PACKET_INTS_REGISTER, /* internal */
> - CPUCP_PACKET_ID_MAX /* must be last */
> + CPUCP_PACKET_SOFT_RESET,/* internal */
> + CPUCP_PACKET_INTS_REGISTER, /* internal */
> + CPUCP_PACKET_ID_MAX /* must be last */
> };
>
> #define CPUCP_PACKET_FENCE_VAL 0xFE8CE7A5
Ack for the whole series.
Reviewed-by: Ofir Bitton
On 24/02/2024 1:32, Carl Vanderlip wrote:
>
> On 2/20/2024 8:01 AM, Oded Gabbay wrote:
>> From: Ofir Bitton
>>
>> Today we read PCI VENDOR-ID in order to make sure PCI link is
>> healthy. Apparently the VENDOR-ID might be stored on host and
>> hence, when we
From: Ohad Sharabi
This addition helps log parsers better define the error without the need
to go back and search the device name on former log lines.
Signed-off-by: Ohad Sharabi
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/mmu/mmu.c | 8 +---
1 file changed, 5 insertions
From: Dani Liberman
The extra info will help in better traceability and debug.
Signed-off-by: Dani Liberman
Signed-off-by: Ofir Bitton
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/device.c | 17 ++---
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a
Align embedded headers to latest release.
Signed-off-by: Ofir Bitton
---
.../habanalabs/include/gaudi2/gaudi2_fw_if.h | 27 +
.../include/gaudi2/gaudi2_reg_map.h | 8 +
include/linux/habanalabs/cpucp_if.h | 10 +--
include/linux/habanalabs
Tayar
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/device.c | 25 ++-
drivers/accel/habanalabs/common/firmware_if.c | 10 +---
drivers/accel/habanalabs/common/mmu/mmu.c | 5 ++--
drivers/accel/habanalabs/common/pci/pci.c | 4 +--
4 files changed
From: Farah Kassabri
Align the interrupts related headers to latest release.
Signed-off-by: Farah Kassabri
Signed-off-by: Ofir Bitton
Reviewed-by: Ofir Bitton
---
.../gaudi2/gaudi2_async_ids_map_extended.h| 94 +--
1 file changed, 47 insertions(+), 47 deletions(-)
diff
From: Tal Risin
Expose the server type through debugfs to enable easier access via
scripts.
Signed-off-by: Tal Risin
Reviewed-by: Ofir Bitton
---
Documentation/ABI/testing/debugfs-driver-habanalabs | 6 ++
drivers/accel/habanalabs/common/debugfs.c | 5 +
2 files changed, 11
that event
was received would help in debugging the issue.
Signed-off-by: Farah Kassabri
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/device.c | 12
drivers/accel/habanalabs/common/habanalabs.h | 15 ++-
drivers/accel/habanalabs/gaudi2/gaudi2.c | 3
, all calling functions that may be invoked by user space can issue
prints only if the error code is not -EAGAIN.
Signed-off-by: Ohad Sharabi
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/debugfs.c | 17 +--
drivers/accel/habanalabs/common/device.c | 4 +-
drivers/accel
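The commit message above describes a convention: callers that user space can reach print an error only when the return code is not -EAGAIN. A minimal userspace sketch of that pattern follows; `report_unless_eagain()` is a hypothetical helper used purely for illustration and does not appear in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Print a failure message unless rc is success or -EAGAIN.
 * Returns 1 if a message was printed, 0 if it was suppressed. */
static int report_unless_eagain(const char *what, int rc)
{
	assert(what != NULL);
	if (rc == 0 || rc == -EAGAIN)
		return 0;	/* success, or a retryable condition: stay quiet */
	fprintf(stderr, "%s failed (%d)\n", what, rc);
	return 1;
}
```

Keeping the check in one place means a retryable condition triggered repeatedly from user space cannot flood the log.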
Ohad Sharabi
Signed-off-by: Ofir Bitton
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/device.c | 16
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/accel/habanalabs/common/device.c
b/drivers/accel/habanalabs/common/device.c
index a381ece
addition, this generic function now also considers the sub-minor FW
version, and dead code left over from compatibility with deprecated FW
versions is removed.
Signed-off-by: Ohad Sharabi
Signed-off-by: Ofir Bitton
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/firmware_if.c | 25
From: Tomer Tayar
Future supported ASICs might use the dynamic EQ mechanism with the
firmware, and in that case the EQ size won't be equal to the default
HL_EQ_SIZE_IN_BYTES value.
Add an ASIC property to enable overriding this value.
Signed-off-by: Tomer Tayar
Reviewed-by: Ofir B
From: Tomer Tayar
FW initiates a hard reset upon an MC SEI severe error.
Align the driver to expect this reset and avoid accessing the device
until the reset is done.
Signed-off-by: Tomer Tayar
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/gaudi2/gaudi2.c | 4 ++--
1 file changed, 2
f-by: Tomer Tayar
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/gaudi2/gaudi2.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c
b/drivers/accel/habanalabs/gaudi2/gaudi2.c
index 08276f03c80f..18cc7b773650 100644
--- a/d
From: Tomer Tayar
hl_eq_heartbeat_event_handle() doesn't have ASIC specific code, and
therefore can be moved from Gaudi2-only code to common code, and
possibly used for other ASICs.
Signed-off-by: Tomer Tayar
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/device.c
From: Igal Zeltser
Struct comms_desc_header is deprecated and replaced by struct
comms_msg_header. As a preparation for removing comms_desc_header
from FW, all its usages in the code are replaced by comms_msg_header.
Signed-off-by: Igal Zeltser
Reviewed-by: Ofir Bitton
---
drivers/
preboot and skip the time wasting attempt of trying to load the
boot fit, which will fail due to the error.
Signed-off-by: Farah Kassabri
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/firmware_if.c | 24 +--
1 file changed, 12 insertions(+), 12 deletions(-)
diff
From: Ariel Suller
When reporting TPC events, the dcore and the TPC within the dcore should
be reported and propagated, not the general TPC number.
Signed-off-by: Ariel Suller
Reviewed-by: Ofir Bitton
---
.../gaudi2/gaudi2_async_ids_map_extended.h| 150 +-
1 file changed, 75
.
Signed-off-by: Dani Liberman
Reviewed-by: Ofir Bitton
---
include/linux/habanalabs/cpucp_if.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/habanalabs/cpucp_if.h
b/include/linux/habanalabs/cpucp_if.h
index 1ac1d68193e3..0913415243e8 100644
--- a/include
patch will move the heartbeat thread scheduling to be after driver
is done with all initializations.
Signed-off-by: Farah Kassabri
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/device.c | 54 +++-
1 file changed, 33 insertions(+), 21 deletions(-)
diff --git a
Several timestamp registration debug prints spam the kernel log
whenever dynamic debug is enabled.
Remove those prints.
Signed-off-by: Ofir Bitton
---
.../accel/habanalabs/common/command_submission.c| 13 -
1 file changed, 13 deletions(-)
diff --git a/drivers/accel
From: Tomer Tayar
As the new dynamic EQ includes clock change events which are common and
not ASIC-specific, add a common handler for these events.
Signed-off-by: Tomer Tayar
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/device.c | 46
drivers/accel
From: Vitaly Margolin
Add cpld_timestamp field to cpucp_info structure and return cpld
timestamp as part of cpld version
Signed-off-by: Vitaly Margolin
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/sysfs.c | 5 +++--
include/linux/habanalabs/cpucp_if.h | 3 ++-
2 files
From: Farah Kassabri
A Gaudi2 with PCI revision ID '4' is a Gaudi2D device and should be
detected and initialized as Gaudi2.
Signed-off-by: Farah Kassabri
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/device.c | 4
dri
From: Tomer Tayar
The device debugfs directory was modified to be named as the parent
device name.
Update the description of 'mmu' and 'mmu_error' to use the new path.
Signed-off-by: Tomer Tayar
Reviewed-by: Ofir Bitton
---
Documentation/ABI/testing/debugfs-driver-haba
From: Tal Cohen
When sending the disable PCI message to firmware, an EQ packet might
already be pending. Disabling the EQ interrupt prevents this from
happening.
The interrupt will be re-enabled after reset.
Signed-off-by: Tal Cohen
Reviewed-by: Ofir Bitton
---
drivers
From: Rakesh Ughreja
Network EDMAs use more outstanding transfers, so this needs to be
programmed by EDMA firmware.
Signed-off-by: Rakesh Ughreja
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/gaudi2/gaudi2_security.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/accel
Some systems allow a maximum number of 128 MSI-X interrupts.
Hence we reduce the interrupt count to 128 instead of 512.
Signed-off-by: Ofir Bitton
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/gaudi2/gaudi2P.h| 8
drivers/accel/habanalabs/include/gaudi2/gaudi2.h | 4
From: Farah Kassabri
To improve debuggability when encountering FW issues,
add additional info once the CPU packet timeout expires.
Signed-off-by: Farah Kassabri
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/firmware_if.c | 14 +++---
1 file changed
f the CI with a similar wrap around,
to make it easier to compare the values.
Signed-off-by: Tomer Tayar
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/device.c | 19 ++-
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/accel/habanalabs/common/devi
and
print another reason - allocated command buffers.
Signed-off-by: Ilia Levi
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/device.c | 19 +++---
drivers/accel/habanalabs/common/habanalabs.h | 14 ++-
.../accel/habanalabs/common/habanalabs_drv.c | 2 +-
dri
From: Tomer Tayar
The test packet which is sent to FW for the PQ heartbeat is used also as
the trigger in FW to send the EQ heartbeat event.
Add the time of the last sent packet to the debug info which is printed
upon an EQ heartbeat failure.
Signed-off-by: Tomer Tayar
Reviewed-by: Ofir Bitton
-by: Didi Freiman
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/habanalabs.h | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/accel/habanalabs/common/habanalabs.h
b/drivers/accel/habanalabs/common/habanalabs.h
index a06e5a966f45..6f27ce4fa01b
From: Tomer Tayar
Add a dump of the EQ entries headers upon an EQ heartbeat failure.
Signed-off-by: Tomer Tayar
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common/device.c | 2 ++
drivers/accel/habanalabs/common/habanalabs.h | 1 +
drivers/accel/habanalabs/common/irq.c
cleanup_resources() and accessing this work uninitialized.
As there is no real need to re-initialize this work every time it is
rescheduled, move this initialization to device_early_init() to be done
once and early enough.
Signed-off-by: Tomer Tayar
Reviewed-by: Ofir Bitton
---
drivers/accel/habanalabs/common
From: Oded Gabbay
Because I left habana, Ofir Bitton is now the habanalabs driver
maintainer.
The git repo also changed location to the Habana GitHub website.
Signed-off-by: Oded Gabbay
Acked-by: Daniel Vetter
---
MAINTAINERS | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff
accel/habanalabs: add more info upon cpu pkt timeout
Igal Zeltser (1):
accel/habanalabs: use msg_header instead of desc_header
Ilia Levi (1):
accel/habanalabs: additional print in device-in-use info
Oded Gabbay (1):
MAINTAINERS: Change habanalabs maintainer and git repo path
O
habanalabs: additional print in device-in-use info
Oded Gabbay (1):
MAINTAINERS: Change habanalabs maintainer and git repo path
Ofir Bitton (3):
accel/habanalabs/gaudi2: align embedded specs headers
accel/habanalabs: remove timestamp registration debug prints
accel/habanalbs/gaud