date:20160903

Extend unicodedata with a name/pattern/regex search for character entity references?

2016-09-03 Thread Veek. M

https://mail.python.org/pipermail//python-ideas/2014-October/029630.htm

Wanted to know if the above link idea had been implemented and if 
there's a module that accepts a pattern like 'cap' and gives you all the 
instances of Unicode 'CAP' characters.
 ⋂ \bigcap
 ⊓ \sqcap
 ∩ \cap
 ♑ \capricornus
 ⪸ \succapprox
 ⪷ \precapprox

(above's from tex)

I found two useful modules in this regard: unicode_tex and unicodedata,
but unicodedata is a builtin which does not do globs or regexes - so it's 
kind of limiting in nature.

Would be nice if you could search html/xml character entity references 
as well.
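
A minimal sketch of the kind of search asked about here, assuming only the standard library: it walks every code point and keeps those whose official Unicode name matches a regular expression. The function name search_names is made up for illustration. Note that Unicode names differ from the tex names listed above (∩ is INTERSECTION, not CAP), so the pattern has to target the Unicode naming.

```python
import re
import sys
import unicodedata


def search_names(pattern, flags=re.IGNORECASE):
    """Yield (character, name) for each code point whose name matches pattern."""
    regex = re.compile(pattern, flags)
    for codepoint in range(sys.maxunicode + 1):
        char = chr(codepoint)
        # unicodedata.name() raises ValueError for unnamed code points
        # unless a default is supplied, so pass an empty string instead.
        name = unicodedata.name(char, "")
        if name and regex.search(name):
            yield char, name


# Example: every character with INTERSECTION as a whole word in its name.
for char, name in search_names(r"\bINTERSECTION\b"):
    print(char, name)
```

Scanning the whole code point range on every call is slow but simple; caching the (character, name) pairs in a list once would be an obvious refinement.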
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Extend unicodedata with a name/pattern/regex search for character entity references?

2016-09-03 Thread Steve D'Aprano

On Sun, 4 Sep 2016 06:47 am, Thomas 'PointedEars' Lahn wrote:

> Your posting is lacking a real name in the “From” header field.


Thomas, if that is really your name, how do we know that:

Thomas 'PointedEars' Lahn 

is a real name? It sounds made up to me. I'm afraid that we're going to have
to insist that you scan your birth certificate, driver's licence and other
forms of ID and send them to us so that we can see proof that this is your
real name.

Please either comply, or give up your stupid and pointless obsession with
trying to be the Internet Police for something that isn't even a real rule.




-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.


Re: Pythons for .Net

2016-09-03 Thread Steve D'Aprano

On Sat, 3 Sep 2016 12:34 pm, Denis Akhiyarov wrote:

> Finally if anyone can contact Christian Heimes (Python Core Developer),
> then please ask him to reply on request to update the license to MIT:
> 
> https://github.com/pythonnet/pythonnet/issues/234
> 
> He is the only contributor that prevents updating to MIT license.


I have emailed him off-list.

Thanks for the information on PythonNet, Denis.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.


Re: Extend unicodedata with a name/pattern/regex search for character entity references?

2016-09-03 Thread Chris Angelico

On Sun, Sep 4, 2016 at 11:51 AM, Steve D'Aprano
 wrote:
> On Sun, 4 Sep 2016 06:47 am, Thomas 'PointedEars' Lahn wrote:
>
>> Your posting is lacking a real name in the “From” header field.
>
>
> Thomas, if that is really your name, how do we know that:
>
> Thomas 'PointedEars' Lahn
>
> is a real name? It sounds made up to me. I'm afraid that we're going to have
> to insist that you scan your birth certificate, driver's licence and other
> forms of ID and send them to us so that we can see proof that this is your
> real name.
>
> Please either comply, or give up your stupid and pointless obsession with
> trying to be the Internet Police for something that isn't even a real rule.
>

His posts aren't making it across the news->list gateway any more.
Killfile him and move on...

ChrisA

Re: Extend unicodedata with a name/pattern/regex search for character entity references?

2016-09-03 Thread Steve D'Aprano

On Sun, 4 Sep 2016 12:19 pm, Chris Angelico wrote:

[...]
>> Please either comply, or give up your stupid and pointless obsession with
>> trying to be the Internet Police for something that isn't even a real
>> rule.
> 
> His posts aren't making it across the news->list gateway any more.
> Killfile him and move on...

But but but... I couldn't do that.

https://www.xkcd.com/386/




-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.


Re: Extend unicodedata with a name/pattern/regex search for character entity references?

2016-09-03 Thread Chris Angelico

On Sun, Sep 4, 2016 at 12:49 PM, Steve D'Aprano
 wrote:
> On Sun, 4 Sep 2016 12:19 pm, Chris Angelico wrote:
>
> [...]
>>> Please either comply, or give up your stupid and pointless obsession with
>>> trying to be the Internet Police for something that isn't even a real
>>> rule.
>>
>> His posts aren't making it across the news->list gateway any more.
>> Killfile him and move on...
>
> But but but... I couldn't do that.
>
> https://www.xkcd.com/386/

Ah, you got me. Can't force anyone to violate standards documents like
RFCs and XKCDs.

ChrisA
who may or may not have recently pushed a commit to make something
XKCD 859 compliant...

Re: Extend unicodedata with a name/pattern/regex search for character entity references?

2016-09-03 Thread Veek. M

Thomas 'PointedEars' Lahn wrote:

> Veek. M wrote:
> 
>> https://mail.python.org/pipermail//python-ideas/2014-October/029630.htm
>> 
>> Wanted to know if the above link idea,
> 
> … which is 404-compliant; the Internet Archive does not have it either
> …
> 
>> had been implemented
> 
> Probably not.
> 
>> and if there's a module that accepts a pattern like 'cap' and gives
>> you all the instances of Unicode 'CAP' characters.
> 
> I do not know any.
> 
>>  ⋂ \bigcap
>>  ⊓ \sqcap
>>  ∩ \cap
>>  ♑ \capricornus
>>  ⪸ \succapprox
>>  ⪷ \precapprox
>> 
>> (above's from tex)
>> 
>> I found two useful modules in this regard: unicode_tex and unicodedata,
>> but unicodedata is a builtin which does not do globs or regexes - so
>> it's kind of limiting in nature.
> 
> Quick hack:
> 
> #
> from unicode_tex import unicode_to_tex_map
> 
> for key, value \
>         in filter(lambda item: "cap" in item[1], unicode_to_tex_map.items()):
>     print(key, value)
> #
> 
> (Optimizations are welcome.)
> 
> It is easy to come up with methods that take a globbing or a regular
> expression (globbing expressions can be turned into regular
> expressions easily) and returns, perhaps as a dictionary or list of
> tuples, only the matching entries.
> 
> Other than that I think you will have to turn the Unicode Character
> Database (which is available via HTTP as one huge text file; see the
> Python Tutorial on “Internet Access” for how to get it dynamically)
> into whatever form suits you for querying it.
>  
>> Would be nice if you could search html/xml character entity
>> references as well.
> 
> For what purpose?
> 
> Your posting is lacking a real name in the “From” header field.
> 
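
The globbing approach suggested in the quoted reply could be sketched roughly as below. This is only an illustration: SAMPLE_MAP is a hypothetical stand-in for unicode_tex.unicode_to_tex_map (filled with the six entries quoted in this thread), and glob_search is a made-up helper name. The glob is converted to a regular expression with the standard library's fnmatch.translate, and the matches come back as a list of tuples.

```python
import fnmatch
import re

# Hypothetical sample standing in for unicode_tex.unicode_to_tex_map;
# the entries are the ones quoted in this thread.
SAMPLE_MAP = {
    "⋂": "\\bigcap",
    "⊓": "\\sqcap",
    "∩": "\\cap",
    "♑": "\\capricornus",
    "⪸": "\\succapprox",
    "⪷": "\\precapprox",
}


def glob_search(mapping, glob):
    """Return [(key, value), ...] for values matching a shell-style glob."""
    # fnmatch.translate() turns the glob into an anchored regular expression.
    regex = re.compile(fnmatch.translate(glob))
    return [(key, value) for key, value in mapping.items() if regex.match(value)]


# "*cap" keeps only the names that end in "cap".
for key, value in glob_search(SAMPLE_MAP, "*cap"):
    print(key, value)
```

Swapping fnmatch.translate(glob) for a user-supplied pattern would give the regular expression variant of the same function.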

Ouch! Sorry for the bad link, Thomas. The link is titled '[Python-ideas] 
Extend unicodedata with a name search' and I suspect this updated link 
(http://code.activestate.com/lists/python-ideas/29504/) may work - 
if it doesn't, you could google the title.

I don't want to dump/replicate the existing Unicode data in module 
'unicodedata'.

Regarding purpose, well I need this for hexchat. I IRC a lot, and I often 
want to ask a question involving math symbols. I've written some 
python (included at the bottom) that translates:
 \help filter_word
into a list of symbols and names (serves as a memory jog). It also 
translates stuff like:
 A \cap B \epsilon C to A ∩ B ε C.
But all this works with a subset of tex - it can't do complicated 
formulas. I wanted to extend it further. I don't think I shall be able 
to subscript integrals easily, but I could make better use of the 
available unicode, which means making it more accessible (hence the 
pattern matching feature) - html/xml entities provide a new way of 
remembering stuff.
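
For the html/xml entity side, one possible sketch uses the html.entities.html5 table from the Python 3 standard library, which maps entity names to their replacement text. The helper name search_entities is invented for this example, and matching on the entity name only is an assumption about what would be useful.

```python
import html.entities
import re


def search_entities(pattern):
    """Return sorted (entity_name, text) pairs whose name matches pattern."""
    regex = re.compile(pattern)
    matches = set()
    for name, text in html.entities.html5.items():
        # Keys normally end in ";"; some legacy names also appear without
        # it, so strip the ";" and let the set drop the duplicates.
        bare = name.rstrip(";")
        if regex.search(bare):
            matches.add((bare, text))
    return sorted(matches)


# Example: every entity whose name starts with "cap".
for name, text in search_entities(r"^cap"):
    print("&{};".format(name), text)
```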

---
Regarding the name (From field), my name *is* Veek.M though I tend to 
shorten it to Vek.M on Google (i think Veek was taken or some such 
thing). Just to be clear, my parents call me something closely related 
to Veek that is NOT Beek or Peek or Squeak or Sneak and my official name 
is something really weird. Identity theft being what it is, I probably 
am lying anyhow about all this, but it sounds funny so :p

import hexchat
import re, unicode_tex, unicodedata

__module_name__ = 'Unicode'
__module_version__ = '0.1'
__module_description__ = r'Substitute \whatever with Unicode char in cmdline input'

#re_repl = unicodedata.lookup('N-ARY UNION')

def debug(*args):
    hexchat.prnt('#{}#'.format(*args))

def print_help(*args):
    hexchat.prnt('{}'.format(*args))

def send_message(word, word_eol, userdata):
    # 65293 is the key value of Return; ignore every other key press.
    if not(word[0] == "65293"):
        return

    msg = hexchat.get_info('inputbox')
    if msg is None:
        return

    # "\help word" lists every symbol whose tex name contains word.
    x = re.match(r'(^\\help)\s+(\w+)', msg)
    if x:
        name_filter = x.groups()[1]
        for key, value in unicode_tex.tex_to_unicode_map.items():
            if name_filter in key:
                print_help(value + ' ' + key)
        hexchat.command("settext %s" % '')
        return

    # Replace each \word in a single pass, so that \cap is not also
    # substituted inside longer names such as \capricornus.
    def repl(match):
        return unicode_tex.tex_to_unicode_map.get(match.group(0), 'err')

    msg = re.sub(r'\\\w+', repl, msg)

    hexchat.command("settext %s" % msg)

hexchat.hook_print('Key Press', send_message)



Re: Extend unicodedata with a name/pattern/regex search for character entity references?

2016-09-03 Thread Rustom Mody

On Saturday, September 3, 2016 at 5:25:48 PM UTC+5:30, Veek. M wrote:
> https://mail.python.org/pipermail//python-ideas/2014-October/029630.htm
> 
> Wanted to know if the above link idea had been implemented and if 
> there's a module that accepts a pattern like 'cap' and gives you all the 
> instances of Unicode 'CAP' characters.
>  ⋂ \bigcap
>  ⊓ \sqcap
>  ∩ \cap
>  ♑ \capricornus
>  ⪸ \succapprox
>  ⪷ \precapprox
> 
> (above's from tex)
> 
> I found two useful modules in this regard: unicode_tex and unicodedata,
> but unicodedata is a builtin which does not do globs or regexes - so it's 
> kind of limiting in nature.
> 
> Would be nice if you could search html/xml character entity references 
> as well.

[Not exactly an answer]

I use a number of things for this:
1. Google
2. Xah Lee’s excellent pages, which often fit my brain better than Wikipedia:
   http://xahlee.info/comp/unicode_index.html
3. emacs’ function ucs-insert, recently renamed to insert-char
   i.e. [in emacs] type Alt-x insert-char
   After that, some kind of TAB-globbing (case-insensitive) works.
   I won't try with Cap (because the number of *CAPITAL* matches is in the thousands!)
   e.g. alphaTAB gives nothing. However, *alphaTAB gives a bunch.
   Narrow to "greek alpha"TAB and you get a bunch

I've tried to describe the idea that we should have a series of levels for
char-input, from most general and unergonomic (google) to most specific and
ergonomic (special-purpose keyboard), as 7 levels near the end of
http://blog.languager.org/2015/01/unicode-and-universe.html

Re: Extend unicodedata with a name/pattern/regex search for character entity references?

2016-09-03 Thread Rustom Mody

On Sunday, September 4, 2016 at 9:32:28 AM UTC+5:30, Veek. M wrote:
> Regarding the name (From field), my name *is* Veek.M though I tend to 
> shorten it to Vek.M on Google (i think Veek was taken or some such 
> thing). Just to be clear, my parents call me something closely related 
> to Veek that is NOT Beek or Peek or Squeak or Sneak and my official name 
> is something really weird. Identity theft being what it is, I probably 
> am lying anyhow about all this, but it sounds funny so :p

Please don't take the name-police bait.
As far as I am concerned, telling someone “I don't like your name” is in the same
bracket as “I don’t like your religion/skin-color/gender/nationality/etc”,
i.e. it's highly offensive.

Re: Extend unicodedata with a name/pattern/regex search for character entity references?

2016-09-03 Thread Jussi Piitulainen

Chris Angelico  writes:

> On Sun, Sep 4, 2016 at 12:49 PM, Steve D'Aprano
>  wrote:
>> On Sun, 4 Sep 2016 12:19 pm, Chris Angelico wrote:
>>
>> [...]
>>>> Please either comply, or give up your stupid and pointless obsession with
>>>> trying to be the Internet Police for something that isn't even a real
>>>> rule.
>>>
>>> His posts aren't making it across the news->list gateway any more.
>>> Killfile him and move on...
>>
>> But but but... I couldn't do that.
>>
>> https://www.xkcd.com/386/
>
> Ah, you got me. Can't force anyone to violate standards documents like
> RFCs and XKCDs.
>
> ChrisA
> who may or may not have recently pushed a commit to make something
> XKCD 859 compliant...

(-:

(You were baiting for that.)