Re: [Bug-apl] Regex support

2017-09-21 Thread Elias Mårtenson
I've implemented the bare minimum needed to get regexes working through
a ⎕RE function. I've attached the diff.

I really need Jürgen to take a look at this, since my code that constructs
the return value cannot possibly be correct. There must be a better way to
handle this which does not involve conversion back and forth between
std::string.

Also, I have the result in a UTF-8-encoded C string, and I try to create
a UTF8_string from it like this:

Value_P field_value(UTF8_string(field.c_str()), LOC);

However, when I test this in APL I get the following result:

  '(..)..(..)$' ⎕RE 'sdklfjfj⍉'
┏→━━┓
┃"lf" "jâ\215\211"┃
┗∊━━┛

It seems the UTF-8 conversion is not done correctly by the UTF8_string
constructor. What did I do wrong?

Regards,
Elias

On 21 September 2017 at 11:38, Xiao-Yong Jin wrote:

> > On Sep 20, 2017, at 9:19 PM, Peter Teeson wrote:
> >
> > (These days performance can hardly be a compelling argument
> > with multiple many-core CPU chips.)
>
> This kind of argument for APL is exactly why Fortran is still alive and
> well.
>
>
Index: configure.ac
===
--- configure.ac(revision 1011)
+++ configure.ac(working copy)
@@ -162,6 +162,8 @@
 fi
 fi
 
+m4_include([m4/ax_path_lib_pcre.m4]) AX_PATH_LIB_PCRE([])
+
 # check if rdtsc (read CPU cycle counter is available.
 # This is expected only on Intel CPUs
 AC_MSG_CHECKING([whether CPU has rdtsc (read CPU cycle counter) opcode])
Index: m4/ax_path_lib_pcre.m4
===
--- m4/ax_path_lib_pcre.m4  (nonexistent)
+++ m4/ax_path_lib_pcre.m4  (working copy)
@@ -0,0 +1,90 @@
+# ===
+# https://www.gnu.org/software/autoconf-archive/ax_path_lib_pcre.html
+# ===
+#
+# SYNOPSIS
+#
+#   AX_PATH_LIB_PCRE [(A/NA)]
+#
+# DESCRIPTION
+#
+#   check for pcre lib and set PCRE_LIBS and PCRE_CFLAGS accordingly.
+#
+#   also provide --with-pcre option that may point to the $prefix of the
+#   pcre installation - the macro will check $pcre/include and $pcre/lib to
+#   contain the necessary files.
+#
+#   the usual two ACTION-IF-FOUND / ACTION-IF-NOT-FOUND are supported and
+#   they can take advantage of the LIBS/CFLAGS additions.
+#
+# LICENSE
+#
+#   Copyright (c) 2008 Guido U. Draheim 
+#
+#   This program is free software; you can redistribute it and/or modify it
+#   under the terms of the GNU General Public License as published by the
+#   Free Software Foundation; either version 3 of the License, or (at your
+#   option) any later version.
+#
+#   This program is distributed in the hope that it will be useful, but
+#   WITHOUT ANY WARRANTY; without even the implied warranty of
+#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
+#   Public License for more details.
+#
+#   You should have received a copy of the GNU General Public License along
+#   with this program. If not, see .
+#
+#   As a special exception, the respective Autoconf Macro's copyright owner
+#   gives unlimited permission to copy, distribute and modify the configure
+#   scripts that are the output of Autoconf when processing the Macro. You
+#   need not follow the terms of the GNU General Public License when using
+#   or distributing such scripts, even though portions of the text of the
+#   Macro appear in them. The GNU General Public License (GPL) does govern
+#   all other use of the material that constitutes the Autoconf Macro.
+#
+#   This special exception to the GPL applies to versions of the Autoconf
+#   Macro released by the Autoconf Archive. When you make and distribute a
+#   modified version of the Autoconf Macro, you may extend this special
+#   exception to the GPL to apply to your modified version as well.
+
+#serial 8
+
+AC_DEFUN([AX_PATH_LIB_PCRE],[dnl
+AC_MSG_CHECKING([lib pcre])
+AC_ARG_WITH(pcre,
+[  --with-pcre[[=prefix]]compile xmlpcre part (via libpcre check)],,
+ with_pcre="yes")
+if test ".$with_pcre" = ".no" ; then
+  AC_MSG_RESULT([disabled])
+  m4_ifval($2,$2)
+else
+  AC_MSG_RESULT([(testing)])
+  AC_CHECK_LIB(pcre, pcre_study)
+  if test "$ac_cv_lib_pcre_pcre_study" = "yes" ; then
+ PCRE_LIBS="-lpcre"
+ AC_MSG_CHECKING([lib pcre])
+ AC_MSG_RESULT([$PCRE_LIBS])
+ m4_ifval($1,$1)
+  else
+ OLDLDFLAGS="$LDFLAGS" ; LDFLAGS="$LDFLAGS -L$with_pcre/lib"
+ OLDCPPFLAGS="$CPPFLAGS" ; CPPFLAGS="$CPPFLAGS -I$with_pcre/include"
+ AC_CHECK_LIB(pcre, pcre_compile)
+ CPPFLAGS="$OLDCPPFLAGS"
+ LDFLAGS="$OLDLDFLAGS"
+ if test "$ac_cv_lib_pcre_pcre_compile" = "yes" ; then
+AC_MSG_RESULT(.setting PCRE_LIBS -L$with_pcre/lib -lpcre)
+PCRE_LIBS="-L$with_pcre/lib -lpcre"
+test -d "$with_pcre/include" && PCRE_CFLAGS="-I$with_pcre/include"
+ 

Re: [Bug-apl] Regex support

2017-09-21 Thread Juergen Sauermann

  
  
Hi Elias,

the UTF8_string constructors look OK, but it can be tricky to properly
interpret indices (the elements of sub in your code) into UTF8-encoded
strings, i.e. whether they mean code points or byte offsets.

My feeling is that you should avoid UTF8_strings completely and go for
the UTF32 option of the library (assuming that UTF32 means code points
encoded as 32-bit integers). APL character strings are almost UTF32
strings (except for gaps between the code points), and they avoid all the
bit shifting needed for UTF8 strings.

Best Regards,
/// Jürgen
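
The reported output "jâ\215\211" is consistent with each byte of the UTF-8
result becoming one character: ⍉ is U+2349, which UTF-8 encodes as the bytes
E2 8D 89, and â is U+00E2 while \215 and \211 are octal for 8D and 89. The
sketch below is not GNU APL code. The helper names utf8_to_utf32 and
bytes_as_code_points are hypothetical, and the decoder is simplified (it
does not reject overlong encodings or surrogates). It only illustrates the
difference between byte offsets and code points mentioned above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-32 code points.
// Malformed or truncated sequences are replaced by U+FFFD.
std::u32string utf8_to_utf32(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if      (lead < 0x80)           { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray byte

        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= bytes.size()) { ok = false; break; }
            const unsigned char c = static_cast<unsigned char>(bytes[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
            ++i;
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}

// Hypothetical helper: widen every byte to its own code point. This
// reproduces the garbled result, it is not a correct conversion.
std::u32string bytes_as_code_points(const std::string& bytes)
{
    std::u32string out;
    for (unsigned char c : bytes) out.push_back(c);
    return out;
}
```

For the second match group, the bytes "j\xE2\x8D\x89" decode to the two code
points U+006A U+2349 with utf8_to_utf32, but widen to four characters
(U+006A U+00E2 U+008D U+0089) with bytes_as_code_points. The full argument
'sdklfjfj⍉' is 11 bytes but 9 code points, so an index into it means
different things depending on which unit the regex library reports.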



Re: [Bug-apl] cast from pointer to smaller type 'int'

2017-09-21 Thread Juergen Sauermann

Hi,

except that it did not compile on my machine:

Thread_context.cc:73:44: error: invalid cast from type ‘pthread_t {aka
long unsigned int}’ to type ‘uint64_t {aka long long unsigned int}’
 << reinterpret_cast<uint64_t>(thread)
   ^
make[1]: *** [apl-Thread_context.o] Error 1

This is primarily because pthread_t is not a pointer on my box.

Best Regards,
/// Jürgen
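
Since pthread_t is an integer on one machine and a pointer on the other (and
could be a struct elsewhere), one way to sidestep the cast entirely is to
print the object's raw bytes. This is only a sketch for debugging output,
not what Thread_context.cc does. The function name thread_id_hex is
hypothetical, and the digits appear in memory order, so the text differs
between little-endian and big-endian hosts.

```cpp
#include <cstddef>
#include <cstring>
#include <pthread.h>
#include <string>

// Hypothetical helper: render the object representation of a pthread_t
// as lowercase hex, two digits per byte, in memory order.
std::string thread_id_hex(const pthread_t& thread)
{
    unsigned char raw[sizeof(pthread_t)];
    std::memcpy(raw, &thread, sizeof raw);  // no cast on pthread_t itself

    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(2 * sizeof raw);
    for (std::size_t i = 0; i < sizeof raw; ++i) {
        out.push_back(digits[raw[i] >> 4]);
        out.push_back(digits[raw[i] & 0x0F]);
    }
    return out;
}
```

The result can be streamed like any other string, for example
out << thread_id_hex(thread). It is meant only to tell threads apart in
debug output; pthread_equal() remains the portable way to compare two
thread IDs.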


On 09/20/2017 10:36 PM, Xiao-Yong Jin wrote:

reinterpret_cast works from smaller sizes to larger sizes.

So simply
reinterpret_cast<uint64_t>(thread)

should work for both of our machines (size_t is uint64_t for me).

... until some pthread implementation decides to hand you a larger sized struct 
for pthread_t.


On Sep 20, 2017, at 3:26 PM, Juergen Sauermann wrote:

Hi Xiao-Yong,

I can compile this:

<< reinterpret_cast(
   reinterpret_cast(thread))

Please let me know if it compiles on your box as well.

Best Regards,
Jürgen


On 09/20/2017 10:00 PM, Juergen Sauermann wrote:

Hi Xiao-Yong,

with reinterpret_cast I am getting this (gcc 4.8):

Thread_context.cc: In member function ‘void
Thread_context::print(std::ostream&) const’:
Thread_context.cc:73:42: error: invalid cast from type ‘pthread_t {aka long
unsigned int}’ to type ‘size_t {aka unsigned int}’
 << reinterpret_cast<size_t>(thread)
   ^
make[1]: *** [apl-Thread_context.o] Error 1

It seems a little ridiculous to me that a good old C-style cast that has
worked fine for the last 10 years cannot be replaced in a portable way by
any of the 3 members of the C++ zoo of casts.

Maybe some intermediate cast to const void * can be done on your machine?

Best Regards,
Jürgen


On 09/19/2017 11:44 PM, Xiao-Yong Jin wrote:

Should have got back to you sooner, but static_cast is not allowed between
pointer types and non-pointer types.

Thread_context.cc:73:11: error: static_cast from 'pthread_t' (aka
'_opaque_pthread_t *') to 'int' is not allowed
<< static_cast<int>(thread)
   ^~~~

I need reinterpret_cast here.  I cannot reinterpret_cast<int> either,
because of the difference in size.
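
The two constraints described here (static_cast for integer handles,
reinterpret_cast to a pointer-sized integer for pointer handles) can be
combined by letting overload resolution pick the cast. The following is a
sketch only, written to be acceptable to a C++11 compiler. The name
handle_to_number is hypothetical, std::uintptr_t is assumed to be available
(it is optional in the standard), and neither overload matches if pthread_t
is a struct.

```cpp
#include <cstdint>
#include <type_traits>

// Selected when the handle is a pointer (e.g. an opaque struct pointer):
// pointer to pointer-sized integer needs reinterpret_cast.
template <typename T>
std::uintptr_t handle_to_number(T* handle)
{
    return reinterpret_cast<std::uintptr_t>(handle);
}

// Selected when the handle is an integer (e.g. unsigned long):
// integer to integer needs static_cast. An integer wider than
// std::uintptr_t would be truncated here.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, std::uintptr_t>::type
handle_to_number(T handle)
{
    return static_cast<std::uintptr_t>(handle);
}
```

With this, out << handle_to_number(thread) would compile in both of the
situations reported in this thread, because template argument deduction
discards whichever overload does not fit the actual pthread_t type.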



On Sep 11, 2017, at 3:01 PM, Juergen Sauermann wrote:

Hi Xiao-Yong,

I see. In this particular case the pthread_t is only used to identify a thread
and to distinguish it from other threads for debugging purposes. So as long as
the compiler does not complain about the cast, everything is fine. Casting to
void * instead of int would also be an option.

/// Jürgen
  


On 09/11/2017 08:38 PM, Xiao-Yong Jin wrote:


I don't think there is a portable way of printing a variable of type pthread_t.  It 
could be a struct, depending on the implementation.  static_cast is 
alright, but may not be useful in the future.




On Sep 11, 2017, at 1:08 PM, Juergen Sauermann wrote:

Hi Xiao-Yong,

thanks, maybe fixed in SVN 1011.
The problem with that error is that the type being cast is not a pointer,
at least on my machine.

/// Jürgen


On 09/11/2017 06:55 PM, Xiao-Yong Jin wrote:



At revision 1010

Thread_context.cc:72:65: error: cast from pointer to smaller type 'int' loses 
information
out << "thread #" << setw(2) << N << ":" << setw(16)  << int(thread)